PatternQuery Reference

A PatternQuery is a simple definition of a single transformation to perform on one or more input sets of items to cause changes to or more output sets of items (a set of Items is called an ‘ItemSpace‘). The transformation can involve re-organizing data, doing math, summarizing sets of Items, and so on. When a PatternQuery executes, it performs its single defined transformation as quickly as possible and then stops. A query can be graphical like the trivial one shown here, or text in the ‘i-code’ format:

Query

Each PatternQuery operates on a given InfinityDB database as the primary input and output, but also possibly on additional inputs and outputs as well. For example, an additional input can be data received in a client REST request, and an additional output can be the data sent back in the corresponding REST response to the client: this provides a simple powerful way to define a REST ‘API’ or interface that mediates access to the database. In that case, a PatternQuery is like a remote procedure call.

PatternQueries are declarative rather than procedural, using sets of Items as examples of input and output called ‘pattern’ and ‘result’. A query definition is a set of Items, just like any other data in the database. So, query definitions can be operated on and stored by any tools or systems that can work on databases or any other ItemSpace. See PatternQuery Examples for some sample code, and see PatternQuery Implementation for a discussion of the internals.

This document intersperses Java language concepts and code with the PatternQuery level concepts and code. However, no Java language understanding is necessary to make full use of PatternQueries. In the InfinityDB version 6 server, for example, PatternQueries may be worked with through the web-based database browser UI without any Java code. Knowing basic Java is helpful, because PatternQuery uses concepts that are as similar to Java as possible. Also, the basic InfinityDB ItemSpace concepts are important, and although expressed with Java, the ideas can stand alone.

Essential Concepts

Patterns and ResultsAre simple sets of Items containing Literals, Symbols, and Expressions. Logically, the database is traversed like a tree i.e. recursively corresponding to the pattern Items. A Symbol matches an Item or a variable part of an Item from an input ItemSpace like a number, a string, or a tuple. If individual symbols occur in multiple pattern Items, joins are formed. The result is also a set of Items that define how the symbol values and expressions using symbol values are to be used to insert or delete newly created Items . The traversal is not physically recursive but is optimized for speed and for complex operations.
Symbol attributesEach symbol can be defined to have a minimum, a maximum, a set of restricted data types, strong type checking, and runtime strict detection of undefined values and much more. Symbols can also be ‘named expressions’. Some symbol attributes can be defined as expressions, hence are determined at runtime. Symbols can have selectable ‘actions’. Symbol attributes are specified like Where 'symbol name' attribute value.
ExpressionsThese follow and extend the standard expression syntax of Java, JavaScript, C, and C++. The data types are the 12 InfinityDB component types, plus an ‘item’ data type, being a series of zero or more components. A sub-type of item is the ‘tuple’, which is a sequence of zero or more of the 10 primitive types. Items and tuples resemble the Python tuple datatype but do not nest. Optional strong typing checks for many execution errors in advance. Expressions may occur in the pattern, in the result, and in certain symbol attributes.
Stateful SymbolsSome kinds of symbols have ‘statefulness’ and are affected by the logically recursive navigation of the input items. These include counters, timers, random number generators, hashes, statistics, and reducers. A reducer allows any expression to be re-applied to its previous value, for extremely powerful iterated or recursive operations in a functional style.
Input and Output SymbolsInput and Output symbols allow multiple sources for the pattern input and multiple destinations for the result output. Items can be matched as inputs from data embedded in the query itself, from a stream of input Items, or from remote requests coming in from client programs via the REST protocol. In a REST call, the request content, REST request header, REST request URL items can be input, and there can be output to the REST response content, or the REST response header. Input and Output symbols occur at the root of the pattern or result, and are identified by their kind symbol attribute.

Joins

In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a Pattern Query effectively includes the decision about the use of such ‘indexes’ or ‘inversions’ to be encoded directly, rather than being hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed, given good Item structures. Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins.

Sample Query

A simple query can be specified using the ‘i code’ syntax. This is a text format that simply translates directly into Items that can then be executed. It is easier to read, and can be embedded into Java code easily or sent in email. (i code is extremely simple and should not considered a real language.) The InfinityDB web-based database browser allows direct editing, saving, and executing queries expressed in i code and other forms, such as a graphical format. Here we have a query that examines Items beginning with an ‘Aircraft’ class, finding the id’s and models for each aircraft. The =id symbol represents a particular aircraft identifier, and the =model symbol represents the model of that aircraft. The input of the query is defined by the components after the ‘pattern’ attribute on the pattern line, and the output of the query is the part after the result attribute on the result line, in which the class becomes ‘AircraftModel’, and the =model and =id symbols are reversed. This is a simple ‘inversion’ that produces a re-sorting of all of the aircraft into an additional set of Items in the database. Such an inversion can later be used to find ids given a model with one database access, either by application Java code or other queries.

query {
    pattern Aircraft =id model =model;
    result AircraftModel =model id =id;
    comment 'do a simple inversion of Aircraft';
}

The contents of the database before the query might be:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300

And after the query executes, additional Items appear:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300
AircraftModel 100 id "747"
AircraftModel 200 id "747"
AircraftModel 300 id "747"

(For older applications: If your existing code does not use components of the EntityClass or Attribute data types like Aircraft or model but instead you use constants of some sort, you can just substitute those. The EntityClass and Attribute data types are two of the 12 standard types, and formatted as text they look like Aircraft or model.) The ‘query’, ‘pattern’ and ‘result’ components are Attributes. There is often a ‘Where’ class along with the pattern and result to describe the symbols further. The syntax of the above simply describes two Items for the query definition, which in the standard Item tokenized form look like:

query comment "do a simple inversion of Aircraft"
query pattern Aircraft "=id" model "=model"
query result AircraftModel "=model" id "=id"

Application code or users can rewrite or construct a query in any way desired because the query is actually a set of Items. The text above is translated by iParser into these Items, and it can be re-created and made tidy by iFormatter. The ‘i code’ syntax is trivial. I code can be compiled and executed online, in the database browser. Here is how to execute some i code from Java:

String queryICode = "the query i code";
// Create a VolatileItemSpace with the Items. Or, an existing ItemSpace can be used.
ItemSpace querySpace = new IParser(queryICode).parse();
PatternQuery patternQuery = new PatternQuery(querySpace);
patternQuery.compile();
// InputSpace and outputSpace are often your database but can be any ItemSpace
patternQuery.execute(inputSpace, outputSpace);

Above we used ItemSpaces, but if you work up at the level of the InfinityDBMap API, just substitute those.

Here is a slightly longer query, where we put the result into ‘SelectedAircraft’, and we limit the range of the model to 200 to 300. The execution logically requires scanning the entire set of Aircraft and filtering by range on model. We will show a faster version below. This would rely on a similar execution plan compared to an RDBMS as well without indexes.

query {
    Where 'model' {
        from 200;
        to 300;
    }
    pattern Aircraft =id model =model;
    result SelectedAircraft =id;
}

This query takes four Items , here in tokenized form, and sorted by Item as usual:

query Where "model" from 200
query Where "model" to 300
query pattern Aircraft "=id" model "=model"
query result SelectedAircraft "=id"

Any i code can also be written in a ‘Python style’, even mixed with the ‘Java style’, and both are interchangeably recognized by the compiler. The language is extremely simple:

query:
    Where 'model':
        from 200
        to 300
    pattern Aircraft =id model =model
    result SelectedAircraft =id

The pattern and results can be multi-line like Where, and there can be nesting. To the left of each opening brace or colon is a series of tokens for components to prefix to all of the nested suffixes. The i code format is really just for:

  • Factoring out common prefixes
  • Changing single quotes to double (double quoted strings are allowed as well)
  • Rewriting =symbol to “=symbol” to create standard strings starting with an equal sign
  • Rewriting (expression) to “=expression
  • Changing lists with [..,.. ] to the InfinityDB Index data types, which are similar to longs that serve as list indices.

Here is how this query looks in the InfinityDB Server web-based database browser in table display mode. This tool can be used to create and use PatternQueries via a browser directly on the server for maximum performance. The table display is based only on the set of Items designating the query, according to certain universal formatting rules. The browser is unaware that these Items represent anything other than regular data, until they are submitted for execution. It is simple to edit and execute these Items online. As a consequence of the formatting rules the suffixes of the Where class come out arranged in a nice table at the bottom, one row per symbol, and one column per symbol attribute. The pattern Items fill the stretchy box below the green ‘pattern’ attribute, and the result Items are below the ‘result’ attribute. There are no limits to the practical complexity of a PatternQuery in terms of executability or representation. The display of the PatternQuery in the browser expands and packs down automatically.

Query

Now for almost instant performance, we rewrite the query to use the inversion we created above. This requires only changing the pattern line. The physical scan is more efficient, using the from and to attributes to hop directly to the beginning and end of the needed Items, potentially providing a huge performance improvement. If instead we had specified the model itself rather than a range, there would be only one database access irrespective of database size.

query {
    Where 'model' {
        from 200;
        to 300;
    }
    pattern AircraftModel =model id =id;
    result SelectedAircraft =id;
}

Joins Again

The pattern part can be arbitrarily complex, and can do the equivalent of the relational ‘select’, ‘project’ and ‘Join’, or ‘SPJ’, which is the universally fundamental set of database operations. A join happens when a particular symbol occurs in multiple patterns. There can be multiple joins at the same time on different symbols forming a ‘multi-way’ join, and there can be more than two repetitions of a symbol to form ‘higher order’ joins although these are rarer. A first order join is also known as an ‘equi-join’ or ‘full join’ or just ‘join’. Furthermore, a symbol can occur multiple times within the same pattern forming a ‘lengthwise’ join. ‘Crosswise’ joins connect the suffixes of different patterns, although the distinction between lengthwise and crosswise is only terminology. Patterns that participate in joins are closely related, and their pattern matching requires that all of the related patterns match at once. When such a ‘join set’ of patterns match, all of the symbols in any of the patterns are given values together.

Here is an example of a join:

query {
    pattern {
        Aircraft =id model =m maker =mk;
        AircraftMaker =mk country =c;
    }
    result Results maker =mk id =id model =m country =c;
}

This query joins on the =mk symbol to combine information from two places in the database. The Aircraft Class uses the =id symbol to identify particular kinds of aircraft, but the AircraftMaker Class factors out the common facts about the makers. The results are being inserted under the ‘Results’ Class. The PatternQuery compiler rewrites the pattern Items internally to handle joins in the pattern and to handle a situation called a ‘cross product’ specified by the results.

Joins are a vital and powerful part of any DBMS. Not only do they determine the semantics of the query, but they also provide ways to increase performance tremendously, when applied correctly.

Expressions

There can be extended Java-style expressions within the pattern, result, and some parts of the Where. The syntax is also like C, C++, or JavaScript. These expressions allow limiting the pattern matching, or computing on symbol values for inclusion in the results, or determining certain attributes of the symbols. An expression in i code format is specified inside parentheses. Symbols in expressions correspond to the symbols defined in the pattern Items, but do not have the equals character at the start. Symbol names use standard syntax for identifiers, which are mixtures of letters, digits, and underscore, not starting with a digit.

An expression in the pattern is very efficient, causing a restriction of the matching to the value of the expression, even though the expression value may be determined by other symbols in the pattern. The access performance is similar to that of a literal, and it is very efficient in that there is no iterative scanning done in order to throw away input that does not match.

(A note for standalone Java code: expressions are compiled directly into Java Objects where they are interpreted at runtime, without requiring the actual Java compiler. The compilation takes only tens of microseconds, and a compiled query can be re-used without re-compiling, although there is no way to persist the executable form. The executable form is not thread-safe but each thread can simply compile a new re-usable one for itself. The execution speed is much faster than the database accesses for most operations, and the Java Just-In-Time compiler can optimize the expression evaluation code easily.)

The expression syntax uses only unary, binary and n-ary operators. There are many extensions beyond Java. Here is an example result where we format an item using the n-ary format() operator:

    result AircraftMakerSummary ('Aircraft make %s in %s makes %s model %s'.format(mk, c, id, m));

What Expressions Cannot Do

The expression facility does not provide a general purpose programming language, but instead has ‘functional’ semantics. It is nevertheless very expressive, much more so than SQL for example. This means there are no:

  • variables (but symbols take on various values during execution)
  • while, for, if, else or other block statements (but patterns loop, filter and match)
  • references or pointers
  • classes or interfaces (but queries have ‘interface’ names)
  • arrays (but items and tuples are short sequences of data values called components)
  • Exceptions (except the transient undefined value from e.g. divide by 0)
  • source code files (other than optionally expressing the query’s Items as i code)

What Expressions Can Do

So what is left? A ‘program’ is purely declarative. Instead of having explicit loop statements, all of the expressions logically execute each time a pattern or set of joined patterns matches, assigning values to all of the symbols from the data in the matched input Items that are necessary to fire related result Items. So for each match, the symbols have fixed i.e. immutable values, and all expressions perform fixed computations on those values. Effectively, looping happens in a nested way during the nested pattern matching. For conditionals, patterns can selectively match zero times or can be disabled.

In expressions, instead of if and else, the standard ‘ternary operator’ is used like x ? y : z. This has the value y if x is true, otherwise z. These can be nested or chained arbitrarily. Expressions can be given symbol names and referred to in other expressions: these are called ‘expression symbols’, as inWhere 'my_symbol' equals (other_symbol * 2); All expressions are by default checked using compile-time strong typing to avoid common mistakes. Runtime errors like divide by zero create temporary undefined values that can be caught or can stop the query with an error message, but which never reach any ItemSpace, since they cannot be stored there. There is also runtime type checking, explicit and implicit data type conversion and automatic numeric promotion when necessary.

External Java Code

(As a side note: Currently, arbitrary external Java code cannot be invoked from within an expression. Until the security implications of such integration can be worked out, this will remain the case. Also, PatternQueries, like SQL, are intended to be portable and not sensitive to the local execution environment. Furthermore, environments such as the InfinityDB version 6 server will not usually provide a way to add access to arbitrary class files, at least not without a restart. Possibly the IoT-as-server case will be an exception, where the server is under local control in a dedicated device. Some of the safe java.lang., java.util, and java.math. classes and methods are integrated. )

Item Values

An additional datatype for the value of an expression, the ‘item’ is added to the basic 12 for the purposes of computing the expression. An item represents an immutable sequence of zero or more components, each being of the basic 12 types. (An item is manipulated in the expression interpreter as a simple array of Objects.) Literal items are parenthetical comma-separated lists. like (x,y,z) so an empty item is just () and a singleton item, i.e. an item of length one is x alone or (x). A singleton item value is interchangeable with a single component value.

Item values are not nestable, and when necessary, they are ‘flattened’, so they can be temporarily ‘unflattened’ while being computed. For example (1,(2,3),4,5) is legal and evaluates to (1,2,3,4,5), and the elements can be any expressions. Also they cannot contain undefined, or they will either throw an Exception to stop the query execution in strict mode or will turn into undefined themselves. In fact, the value undefined cannot be stored in the database at all, and an item value or a regular encoded Item can not contain it. The undefined value is transitory, existing only during expression evaluation.

There is a practical limit on item length, in that the encoded form must fit in the chars of a regular encoded Item when stored in the database, which is always up to 1665 chars long. While being evaluated in an expression, however, items are not limited in length.

The compile-time strong type checking takes into account whether any expression value may be an item. The compile-time type of an expression is a set of 12 booleans, one per basic type, plus a boolean for ‘itemness’. Each of the booleans represents a type that can be the result of the evaluation. These can be controlled in the query’s ‘type’ symbol attribute.

Tuples

A ‘tuple’ is an item value or regular encoded Item that happens to contain only primitive elements (primitives exclude the class and attribute i.e. meta types). This is in keeping with the regular mathematical definition of a tuple. Tuple values are not flagged as such while being computed, but the strong compile-time type checking can distinguish them and find many errors early. It is possible to set the data type of a plain symbol to ‘tuple’, and only tuples will be matched by the symbol from the database. (A future feature will allow strong type checking and filtering of items during retrievals to enforce that they contain only a given predetermined subset of the 12 basic types. For example, items of only numbers could be enforced. Currently, a matched item symbol can contain any components and a matched tuple symbol can contain any primitives. )

Quoting Tokens with At Sign

Any value of the 12 basic data types – which are called components – can be written literally in a standard text form called a ‘token’. These conflict sometimes with the Java expression syntax, but they can be embedded in expressions by surrounding them with at-signs like @MyEntityClass Bytes(25_A6) myattribute@ and so on. It is usually possible to hoist such tokens outside the expressions into the surrounding i code syntax where the at-signs go away. So, this at-sign quoting is actually seldom used.

Operators

There are some extensions for operators beyond those in Java. For example, some operators produce ‘undefined‘ values for error conditions, which may be caught or ignored or which may be designated to cause an Exception. Expressions or subexpressions can be cast to (unstrict) or (strict) to control whether undefined should cause an Exception. An Exception stops the execution, optionally printing a message determined in the query. Values that are items of length one are called ‘singletons’ and are treated identically to simple components.

Any value can be tested in any boolean context without error, and the default values are considered false, as is undefined. The default values are false, 0, 0.0f, 0.0, Index(0) or zero-length values including “” or (). A tuple of length > 1 is true. Boolean contexts are operands of logical operators such as ||, &&, or ?:. Catching undefined can be done with (x instanceof undefined ? 0 : x + y) or ((isdefined)x ? x + y : 0), or for some uses, (x ? x + y : 0) and so on.

Plain symbols can be declared selective for a set of chosen types and will skip data of non-selected types found in the database. Constructing items or tuples merely requires placing the desired elements in parentheses with commas, so there is no concatenation operator needed. Such item expressions can include elements which are also items, which are flattened at compile time or at runtime. Compile-time i.e. ‘strong’ type checking can be disabled by setting the strict symbol attribute false on any symbol, so type checking then relies on runtime tests or type adaptation such as numeric type promotion.

Operator Arities

Operators work differently depending on the number of operands, i.e. their ‘arity’. The standard Java operators are all present with the same precedence as Java, C, C++ and JavaScript but are also defined to work over items as well as the basic types as in Java.

Operator ArityDescription
UnaryHave one parameter. These distribute over items, so -(4, 5) is (-4, -5) and (‘hello’, ‘world’).length() is (5, 5). Operators can look like parameterless method invocations, like expr.length(), or can be casts as in (length)expr interchangeably. There are no static methods as there are in Java, so Math.abs(x) becomes (abs)x or x.abs().
BinaryHave two parameters. These distribute over items on both sides. So 3 * (4, 5) is (12, 15). Also (4, 5) * 3 is (12, 15). Furthermore, (3, 4) * (5, 6) is (15, 24). There are no static methods as there are in Java, so Math.max(x,y) becomes x.max(y).
TernaryDistribute on left only, so (‘hello’, ‘world’).substring(1,3) is (‘el’,’or’). There are two right parameters, which are singletons, i.e. items of length 1, often longs.
Item-Ary Operators like get(index), get(start, end), head(end), tail(start) or size() operate on an entire item without distribution for the left parameter. A component as usual is considered to be a singleton item. The right parameters are singletons, i.e. items of length 1, often longs.
ConstructorThe keyword ‘new’ is followed by a type name and a parameter list. Except for new Date(years,…,”timezone“) the parameters are an item of all longs.

The operators are:

OperatorArityDistributiveOperation
parameterItem.format(formatString)Item-AryfalseFormat according to the Java string formatter, with the usual %s and other escapes.
itemOfStrings.join(string)Item-AryfalseSame as Java String.join(). The right string is placed between pairs of strings in the left operand and returned.
strings.simpleDateFormat(formatString)
dates.simpleDateFormat(formatString)
BinarytrueFormat or parse the date with the Java SimpleDateFormat according to a format string like e.g. ‘yyyy-MM-dd’T’HH:mm:ss.SSSZ’ .
string.match(regexPattern)
string.split(regexPattern)
string.split(regexPattern, limit)
string.replaceFirst(regexPattern, replacementString)
string.replaceAll(regexPattern, replacementString)
Binary
Ternary
trueMatch the string or item of strings against the regular expression according to Java regular expressions. The regex is not surrounded by // as in JavaScript but instead is a normal string in quotes, as in Java. The regex is a constant to avoid denial of service attacks. As usual, be careful to write safe regexes (which is hard).
item.formatCSVLine()
item.formatCSVLine(lineEnding)
Item-AryfalseComma-separated line methods. The default line ending is System.lineEnding.
string.parseCSVLine()Unarytrue
(totokens)exprItem-AryfalseCreate a string formatted according to the standard token format. Expr may be a component or item. String components are quoted within the result. The empty Item i.e. () creates an empty string.
(fromtokens)stringUnaryfalseParse a string to create an item according to the standard token formats of the basic types. The parsed string components are expected to be quoted within the given string. The string may be empty.
expr instanceof typeSpecialfalseReturn a boolean determined by expr being of a given type. The type names have no quotes. The types are the 12 basic types class, attribute, string, boolean, double, float, long, date, chars, bytes, bytestring and index. Also the general types number, component, primitive, and meta are recognized. The type ‘item’ is orthogonal to the basic types, returning true only for non-singletons, i.e true unless the length is 1.
(typeof)exprUnaryfalseReturn the single basic type of expr as a string. If it is a non-singleton, return ‘item’.
(isdefined)exprUnaryfalseReturn a true iff expr is not undefined.
(strict)expr or (unstrict)exprSpecialfalseSwitch on or off the runtime check for undefined. A strict expression or subexpression will throw an Exception with an explanatory message if it evaluates to undefined Otherwise by default all runtime errors cause undefined expression values and do nothing else. Note that symbols can have a strict false attribute which deactivates the strong type checking, but symbols may never be undefined.
(double)expr
(float)expr
(long)expr
UnarytrueNumber, boolean, string, or date can be cast to any numeric type. Casts from date return milliseconds. Casts from boolean return 0 or 1, 0.0 or 1.0, or 0.0f or 1.0f. long may be cast to and from bytes of length 8, where they are big-endian.
(date)exprUnarytrueDouble, float, long, string, or date can be cast to date. Numbers are interpreted as milliseconds. Strings use standard ISO like 2023-2-20T08:04:01-0700.
new Date()
new Date(year,month,day)
new Date(year,month,day,
hour,minute
)
new Date(year,month,day,
hour, minute,second
)
new Date(year,month,day,
hour,minute,second
, ‘timezone‘)
newfalseConstruct the current Date if no parameters. Otherwise, create a Date independent of the current time. If present, parameter 7 is the string name of a timezone, falling back to GMT if unparseable, defaulting to the local timezone. The year is 0-based (unlike the original Date(year,…) constructor, which was 1900 based).
new String(itemOfLongs)
new Bytes(itemOfLongs)
new ByteString(itemOfLongs)
new Chars(itemOfLongs)
newfalseConstruct various arrays, each element converted from a long to a byte or char
new RandomLong(count)
new RandomLong(count, seed)
new RandomBytes(byteCount)
new RandomBytes(byteCount, seed)
…more…
Randoms for arrays of long, double, boolean, float, gaussian of count elements, or byte[] of byteCount elements. If a seed is given, it takes effect if currently unset or changed.
(string)exprUnarytrueString casting works on all types. All types may be cast from strings, but produce the string ‘(undefined)’ for meaningless text. Each possible value has a unique text representation that also encodes the type. Note that (type)(string)expr == expr if expr is of type type . However, byte arrays, char arrays, and ByteStrings are converted directly as UTF-8.
(boolean)exprUnarytrueAny type may be cast to boolean except that (boolean)’false’ is false and (boolean)’true’ is true and other strings are undefined. This is required so that (boolean)(string)expr == expr where expr is boolean. For the logical operators && || ?: and ! only undefined, and default values such as false, 0, 0.0f, 0.0 Index(0) or “” are treated as false. Items of size > 1 are true.
(index)exprUnarytrueThe Index data type can be created from any positive numeric type. It is for creating lists.
(bytes)expr
(chars)expr
(bytestring)expr
(string)expr
(bytes)long
(long)bytes
Unarytruebytes, chars, bytestring, and strings may be cast to each other. Strings work with UTF-8 encoding. Longs may be cast to bytes and back, where high bytes are first, which is ‘big-endian’. Chars map to and from bytes big-endian.
(item)exprItem-AryfalseConvert expr to an internal Java Object[]. This is only for external Java code (future). It is effectively a no-op.
Undefined.UNDEFINED
Math.PI
Math.E
System.lineSeparator
File.separator
File.pathSeparator
Long.MAX_VALUE
Long.MIN_VALUE
Double.POSITIVE_INFINITY
Double.NEGATIVE_INFINITY
…the rest of the double and float constants…
Calendar.JANUARY…
Calendar.MONDAY…
A ‘manifest’ constant is identified as in Java, with a package name and all-uppercase identifier. Others are not manifest but actually variable like System.lineSeparator, which is a system property. Look in these packages for the values of these constants: System, File, Random, Math, Calendar, plus the basic types like Long.
(head)expr
expr.head(endIndex)
Item-AryfalseReturn the components of an item before endIndex. On out-of-bounds, return undefined. If endIndex is negative, it designates a position relative to the end of the item. The unary version means head(1)
(tail)expr
expr.tail(startIndex)
Item-AryfalseReturn the part of an item at and after startIndex. On out-of-bounds return undefined. If startIndex is negative, it designates a position relative to the end of the item. The unary version means tail(1)
(isEmpty)exprUnarytrueSame as (length)expr == 0. Not for detecting the empty tuple, which uses expr == () or (size)expr == 0 for example.
(length)exprUnarytrueThe length of a string, char array, byte array, or bytestring. Other data types return 0. Not like (size)
(size)ItemItem-AryfalseThe number of components in an item. If the left parameter is a component, it has length 1. The content of the item is ignored. Not like (length) or .length()
item.get(index)Item-AryfalseGet the component of an item at the given index. If the left parameter is a component, treat it as a singleton item. A negative index is relative to the end of the item.
item.get(startIndex, endIndex)Item-AryfalseGet, from an item, a sub-item from startIndex to before endIndex. If the left parameter is a component, treat it as a singleton item. The result may be an item of length one, which is a singleton and therefore equivalent to a component, or may be empty or any other length. A negative startIndex or endIndex is relative to the end of the item.
stringLikeExpr.substring(startIndex),
stringLikeExpr.substring(startIndex, endIndex)
string.toUpperCase(),
string.toLowerCase(),
string.trim(),
string.replace(searchString, replacementString)
string.indexOf(string)
string.lastIndexOf(string)
string.indexOf(string, startIndex)
string.lastIndexOf(string, startIndex)
string.contains(string)
string.startsWith(string)
string.startsWith(string, startIndex)
string.endsWith(string)
Unary, Binary, TernarytrueString operators based on java.lang.String. For example, “abc”.indexOf(“b”) is 1. Unlike Java, negative indexes are relative to the end. substring() also works on bytes, chars, or bytestrings. Also note regex methods above.
number.abs() or (abs)number
number.max(number)
double.sin() or (sin)double.
… more from java.lang.Math…
Unary, BinarytrueMath based on java.lang.Math class methods. However, instead of Math.max(n,m) use n.max(m). Instead of Math.abs(n) use either (abs)n or n.abs(). Methods with Integer parameters (4 bytes) are not used but the long versions (8 bytes) are used instead.
item.repeat(n, limit)Item-aryfalseMake a new item consisting of min(n,limit) concatenated copies of it. The new item length is limited to 100M (currently) which if exceeded is undefined or causes an error when strict.
Java unary operators minus (-) and binary not (~)UnarytrueJava syntax. Distribute over the operand, if it is an item.
Arithmetic and bit-wise Java binary operators * / % >> >>> << | &BinarytrueJava syntax. Distribute on left and right over items. Left and right operands may have any length, but one operand must be of length 1 or else the lengths must be equal. The size combinations are thus 1:n, n:1 or n:n where n can be >= 0. Precedence is like Java, C, C++, and JavaScript.
Java comparison i.e. equality or relational operators == != > < >= <=Item-AryfalseJava syntax. For example ‘==’ compares all elements of two items. The items may be of different lengths, but to be equal they must be the same length. A component is considered to be identical to a singleton, i.e. an item of length one. For ‘>’ and so on, first the initial components are compared, and if equal, then longer items are larger. The compared values may have different data types in any combination. Precedence is like Java, C, C++, and JavaScript.
Java logical operators && || ?: !Item-AryfalseThese are just like Java. They do not distribute. Only undefined, and default values such as false, 0, 0.0f, 0.0 Index(0) or “” or zero-length values are treated as false. Items of size > 1 are true. No errors are possible except from the expressions after ? or :. Precedence is like Java, C, C++, and JavaScript.

Symbol Attributes

There is a small set of possible symbol attributes that have similar meanings for different kinds of symbols. Symbol attributes are specified under the Where class under the query attribute. For example, a depends_on symbol attribute could be defined with i code Where 'mysymbol' depends_on 'some_other_symbol';

Symbol attribute in WhereValue TypeDescription
kindstringDetermines the purpose of the symbol. If absent, it is a plain symbol and just retrieves data from the input i.e. the database or root symbol. Otherwise, the symbol can only occur once in the pattern because only plain symbols can join. Plain or other kinds can occur multiple times in expressions and the results.
depends_onconstant string The name of a symbol whose value changes are monitored to determine when a ‘reset’ happens . The ‘reset’ for some kinds causes the value of the ‘from’ attribute’s expression to be transferred into the symbol’s state, or else a new random to be generated or the time to be taken. The depended-on symbol must be outer to the depending symbol, i.e. in the pattern, closer to the pattern root.
from, from_exclusiveone or more expressionsThe reset value or starting point or minimum. The type of the expression depends on the kind and type attribute. This can be a series of expressions in one item, effectively designating a ‘tuple’. So Where 'mysymbol' from x 's' 5 (x+2);is equivalent to Where 'mysymbol' from ((x, 's', 5, x+2)); Specifying a from value on a plain symbol is very efficient, and does not cause a ‘filtering’ which means it does not retrieve all possible matches and then throw some away, so it is good for large data.
to, to_exclusiveone or more expressionsThe ending point or maximum. The type of the expression depends on the kind and type attribute, behaving like the from symbol attribute.
typeset of stringsOptional set of the 12 data types, or ‘tuple’ to match a series of zero or more components of the 10 primitive types or ‘item’ to match a series of zero or more components to the end of the Item. Also ‘number’ matches ‘double’ ‘float’ or ‘long’ and ‘meta’ matches ‘class’ (i.e. ‘entityclass’) or ‘attribute’. Furthermore, ‘component’ matches any single component i.e. a singleton tuple, and ‘primitive’ is like ‘component’ but excludes ‘class’ or ‘attribute’ and is the default. For plain symbols, these restrict the set of values matched from the input data. Other kinds do not accept multiple types. The set does not require any special syntax, so it can be like Where 'mysymbol' type { 'double'; 'float'; }
equalsone or more expressionsFor a plain symbol this constrains the matches to the given expressions, so at most one match occurs. Also, a symbol can be an ‘expression symbol’ by having no kind, and only an equals symbol attribute. This can be a series of literals, symbols, and expressions at the end of the item, effectively designating a ‘tuple’ like the from symbol attribute.
firstconstant booleanWhether to use only the lowest value iterated for plain symbols. This is very efficient and takes only one database access, instead of ‘filtering’. The default is false.
lastconstant boolean For plain symbols, whether to use only the highest value. This is very efficient and takes only one database access, instead of ‘filtering’. For a stateful kind of symbol, only the final iteration is ‘passed on’ to the inner pattern. The default is false.
incrementnumeric long expressionFor series and counter kinds, the amount to add per iteration, default 1. For plain symbols, the number of input values to skip from the initial or from value, with positive being forwards, and negative backwards. This does extra database accesses while doing the skipping, so don’t use very large increments.
actionstringWhat do do when a pattern matches and result fires. This applies to results having this symbol as innermost, such as the terminal symbol. The action affects all items that are extensions of the prefix of this symbol in the result.
lengthnumeric long expressionThe size in bits of a random, size in bytes of a secure random.
limitnumeric long expressionFor a plain symbol, the maximum number of database values to iterate over before ‘exiting’ the loop, thereby skipping iterations at the end.
strictconstant booleanWhether to do strict compile-time type checking on the symbol, default true. In very rare cases, disabling strict may be needed. This is not the same as the (strict) or (unstrict) casts in expressions, which determine whether to cause an error on an undefined value and stop the query.
commentstringAn uninterpreted single line of text for humans to read.
enabledboolean expressionif false, make the symbol be apparently absent, causing no iterations. This transitively disables the pattern inwards as well. Note that for joins, each occurrence of the join symbol can be disabled separately if the corresponding outer pattern component is transitively disabled. If all but one such join symbol occurrence is disabled, then there is effectively no join at all. The default is true.
values, min_values, max_valuesnumeric long expressionThe number of values allowed, i.e. the number of iterations allowed. These are the iterations generated by this symbol ‘inwards’ for each iteration ‘received’ from outwards by this symbol. If there are too few or too many, and on_error is defined, then exit with the message in the on_error symbol attribute. If on_error is not provided, a generic message results.
on_errorstring expressionif defined, this is the error message that results when values, min_values, or max_values is violated, causing query termination..
defaultone or more expressionsThe value for the plain symbol to have if there are no matches, i.e. there would be zero iterations. This can be a series of expressions in one item, effectively designating a tuple, as in the from symbol attribute. Counters and Reducers and other stateful symbols fall back to this value if no iterations reach them.
yieldsmulti-value expressionThis expression may be self-referencing, and it determines the value of the symbol in references to it in expressions and results. It is applied to the ‘internal’ value of the symbol as it would be without the yields. This is a shortcut and the effect can be had in other ways except that it is allowed to be multi-valued, which can be very useful.
disable_by_defaultconstant booleanIf true, then if the symbol matches nothing, then consider it to have enabled false. With enabled false, the part of the pattern inner to the symbol effectively is removed from the query. Default false.
logic‘or’, ‘and’ or ‘not’The default is ‘or’ logic. With ‘and’ or ‘not’ logic, the nearest inner symbol must be a plain symbol (having no kind attribute). With ‘and’ logic we have for each v1: if for all v2: (v1, v2) exists then visit(v1, v2). With ‘not’ logic we have for each v1: if for all v2: (v1, v2) does not exist then visit (v1, v2). The ‘not’ logic is apparently self-defeating, but in case v2 is part of a join it is very useful. It is useful to do joins that combine ‘or’, ‘and’ and ‘not’ together, possibly with disable_by_default. The default ‘or’ logic is for all v1: for all v2: visit (v1,v2).

The ‘and’ and ‘not’ require some temporary space in memory, so they are not unlimited, but the space is small, being only the size of v1. The algorithm is an extremely efficient ‘zig-zag’ and is computed combined with a possible join on v2. Joins always use ‘zig-zag’ in any case.
idconstant tuple of strings(Future feature) For a query invoking another query, this identifies the invoked query. Currently only queries occurring under the class Query can invoke each other (later inter-server invocations are planned). In this case, the id is a pair of strings denoting an ‘interface name’ ‘method name’ combination, which together are a ‘query id’ or ‘query name’. The invocation is done via a symbol of kind ‘local query’ with this id symbol attribute. A future security feature will constrain access according to interface name.
fanoutconstant numberThe user can advise the query compiler how many iterations a given symbol is expected to make for each iteration reaching it, which can increase speed by helping in optimization in some cases. For example, for zipcodes per state, the fanout might be 100, or for pets per street address, it might be 0.3. The compiler’s plan printout shows the expected total number of iterations based on fanout, so if the fanouts are set to match the real data, execution time can be guessed. If there are 300M persons, set the fanout of the =person symbol to 3e8. This can help detect unwanted nested loops or ‘cross products’. Series symbols’ fanouts are computed from the from, to, and increment if possible. Sometimes fanouts can be determined to be 0.0 to 1.0. A compiled query’s text ‘plan’ printout includes this, visible in the web-based database browser. Defaullt fanouts are: Filter: 0.3, Literal or Expression: 0.5, Stateful (i.e. counters etc) 1.0, ItemScanner: 2.5, Series: 3.0 and PlainSymbols: 4.0.
summaryconstant booleanFor stateful symbols, control whether only the ‘bottom line’ is output, or else all of the preceding ‘running’ outputs are also output. For example, with summary true, a counter will output just its final value, or if false, all of the individual values it takes on. This works by suppressing output, but does not affect the state or state transitions of the stateful symbol. Default true.
resetconstant booleanFor stateful symbols, control when the ‘reset’ event occurs. By default, stateful symbols are reset each time the nearest outer loop reaches its final iteration, but with reset false, a loop allows its outer loop to do the resetting. A ‘loop’ is a plain symbol or series kind symbol. The depends_on symbol attribute can be used instead if that is simpler, for tighter control. Default true.
all_must_matchconstant booleanIf true, then each of the inner patterns must match at least once, or else the symbol does not match. There is an additional preceding test phase of iterations of the inner patterns to see if some combination of symbol values can be found that cause a match. Without this, the compiler feels free to omit checking for matches of symbols where it can as long as the output is not affected. There is thus an assumption that symbols do match something even though the value of the symbol may not be needed. Without this assumption, execution is too expensive. The test phase sometimes is quick, but can be as long as the regular matching. Default false.
requiredconstant booleanIf true, the symbol must match, or else no output will occur for any result. This is similar to all_must_match but much more efficient, and global. No extra test phase occurs as for all_must_match. Default false.

Actions

Each symbol has an ‘action’ that determines what happens when patterns ‘match’ and results ‘fire’. An action is a symbol attribute, and can be omitted, to get default ‘copy’ behavior. When a pattern or set of joined patterns match, the symbols they contain have values assigned to them for the purpose of creating result Items. A result ‘fires’ when all of the symbols referred to in the result are available with assigned values. If a symbol is at the end of a pattern, then the ‘move’ types of actions defined below are available. If a symbol is the one nearest the end of a result, which is called ‘innermost’, then any action is available for that symbol. The actions of symbols in earlier positions in the pattern or result must be default, and a symbol may be both at the end of a pattern and innermost in a result at the same time.

The actions are below. For example, you specify Where 'mysymbol' action 'move unioning'; and make sure that the mysymbol occurs at the end of some pattern and innermost in some result.

Action nameWhat happens
‘none’Nothing – normally for testing or disabling a result or pattern
defaultSame as ‘copy’
‘move’,
‘move safely’,
‘move unioning’,
‘move differencing’
Like the similar other variants, but delete the matched pattern Items afterwards. For safety, this deletion happens in a final ‘move’ pass during execution.
‘copy’Clear the result, then copy pattern data to create the result Items. In other words, this is like ‘clear’ and then ‘union.
‘copy safely’Make sure the copying would not overwrite existing data, and throw an Exception right away if so. In other words, this is like ‘ensure clear’ then ‘union’
‘union’Insert the new result Items with the previously existing Items. Note that this is a set operation, not a list, so no duplicate output Items can arise. This only requires the main execution pass, and is fastest.
‘difference’Delete the new result Items from the previously existing Items. This only requires the main execution pass, and is fastest.
‘clear’Remove the Items that are extensions of the prefix of the innermost result symbol. A special ‘clear’ execution pass happens before the main pass, and after a successful ‘safety pass’.
‘ensure clear’Check whether there is already result data and throw an Exception right away if so. In other words, make sure there are no Items that are extensions of the prefix of the innermost result symbol. A special initial execution pass does the checking for safety.
‘union or difference’Special for PatternQueryItemSpaces. Each execute() invocation provides a parameter that tells whether to union or difference.

Symbol Kinds

Each symbol has a ‘kind’, optionally specified in the kind symbol attribute. Plain symbols, which have no specified kind, do not need a Where table entry unless non-default symbol attributes are needed. Some attributes can have values that are expressions, and in those cases, symbols like =mysymbol or literals like ‘mystring’ or 73 are also allowed.

Below are the kinds:

KindKind namesSymbol AttributesDescription
Plain(none)strict
type
equals
yields
from
from_exclusive
to
to_exclusive
first
last
increment
action
comment
on_error
values
min_values
max_values
enabled
disable_by_default
default
limit
logic
summary
reset
required
all_must_match
fanout
Plain symbols have the default kind, or are not present in the Where table. They simply retrieve data from the database (or from a root – see below). Only the length, depends_on, and id attribute are disallowed.
Reducerreducerdepends_on
type
strict
from
recursion
first
last
action
enabled
disable_by_default
default
comment
yields
summary
reset
fanout
Compute a sequence of states based on a self-reference to the current state and possibly other symbols. ‘from’ is for initialization and ‘recursion’ is for the expression determining the next value, which is usally self-referencing. See the Sum kinds relative to reset and summary. This is a ‘stateful’ symbol.
Timedate ,
time,
time seconds,
time milliseconds, time nanoseconds
strict
type
yields
depends_on
action
comment
enabled
disable_by_default
summary
reset
fanout
Snapshot the current date as a Date or time as a long or double in seconds, milliseconds, or nanoseconds. Setting the type to ‘string’ generates standard date text. See the Sum kinds relative to reset and summary. Resetting takes a snapshot of the time. This is a ‘stateful’ symbol.
Randomrandom,
secure random
strict
type
yields
from
to_exclusive
depends_on
action
comment
enabled
disable_by_default
summary
reset
fanout
Generate a random number as a long with a given bit length or range, or else a secure random byte array of the given length, defaulting to 32 bytes. See the Sum kinds relative to reset and summary. Resetting regenerates a new random. This is a ‘stateful’ symbol.
Countercounter
counter from zero
counter from one
list index counter
strict
type
yields
from
to
first
last
increment
depends_on
action
comment
enabled
disable_by_default
default
summary
reset
fanout
Starting from a given ‘from’ attribute defaulting to 0, add a given increment defaulting to 1 and provide the count as the symbol value. Type defaults to long but may be set to double or float or else it is the type of the from or increment. Each iteration increments the count. ‘counter from zero’ is identical to ‘counter’ , but ‘counter from one’ uses default 0, from 1. ‘List index counter’ is like ‘counter from zero’ but the values are of Index types. See the Sum kinds relative to reset and summary. This is a ‘stateful’ symbol.
Summax, min, count,
sum,
mean,
variance,
sigma,
sample variance,
sample sigma
strict
type
yields
first
last
depends_on
action
comment
enabled
disable_by_default
summary
reset
fanout
A simple reducer that works on the immediately outer component in the pattern. Changes to a depended-on symbol or an outer loop with the reset symbol attribute true cause a reset. Any outer loop with reset false defers the resetting to its own outer loop. Using summary false gets output of the running values instead of just one value such as a total. This is a ‘stateful’ symbol.
Seriesseriesstrict
type
yields
from
to
to_exclusive
increment
action
comment
enabled
disable_by_default
summary
reset
fanout
Generate a series of numbers N (of type double, float, or long). The yields is an optional self-referencing expression giving a function of N and possibly other symbols that determines the symbol value. Either the ‘to’ or ‘to_exclusive’ is required. A series is not dependent on database data . The complete series is generated on each iteration of the ‘outer’ part of the pattern to the immediate left. Only plain and series symbol kinds plus item scanner kinds ‘generate loop iterations’. (More are being added.)
Hashsha256 hash,
sha1 hash,
md5
strict
yields
first
last
depends_on
action
enabled
disable_by_default
comment
summary
reset
fanout
Hash the values of the immediately outer component in the pattern using a standard secure algorithm. This is a running hash dependent on all preceding iterations. The sha256 is 32 bytes, sha1 is 20 bytes, and md5 is 16 bytes. Each iteration includes as input the hash value from the preceding iteration and an encoding of the outer component. This is not the same as hashing a BLOB, which represents a long series of bytes: such a hash is forthcoming, and possibly CRC32. This is a ‘stateful’ symbol.
Input or Output query definition,
with data,
stream input,
request header,
request content,
response header,
response content,
request parameters, params url parameter, local query,
sort
commentPatterns can match data in alternative input sources, and results can output to alternative destinations . These ‘root’ symbols are outermost in patterns or results in order to specify where to input or output. A given query may input and output to multiple roots. (For Java programmers, the stream input is for PatternQueryItemSpace). Most of these are for REST Client Queries. ‘With data’ is anything under the WithData class in the query definition under the query attribute. (future: the ‘local query’ kind is an invocation of another query identified by the id symbol attribute within the Query class in the same database. ‘Sort’ is future.)
Filterfilterexpression
last
comment
enabled
disable_by_default
summary
reset
fanout
The inner pattern is conditionally iterated according to the boolean expression in the expression symbol attribute. When the expression evaluates to true, the received iteration from outer is passed on inwards.. Filter is very general but less efficient than other techniques. For example, the from, from_exclusive, to, to_exclusive, first, last, enabled, disable_on_default, and limit symbol attributes cause no wasted iterations when used on other symbol kinds.
Item Scanneritem scannerstrict
expression
yields
first
last
action
comment
default
enabled
summary
reset
all_must_match
fanout
The expression attribute is an expression that evaluates to a tuple that is scanned from start to end. (A tuple is zero or more primitives). Each step in the scan causes the inner patterns to be iterated once, and the singleton value of the scanner symbol is taken from within the tuple. Hence the effect is to ‘rotate’ the tuple from a ‘horizontal’ to a ‘vertical’ orientation. Then for example stateful symbols can be used to do computations ‘along the length’ of the tuple,, such as by a reducer. Don’t forget to set the types of the symbols in the expression to ‘tuple’, or else construct fresh tuples in the expression. Also consider providing a default value in case of an empty tuple. This constitutes a third kind of ‘loop’ aside from plain symbols and series symbols. Currently only tuples can be scanned, not items in spite of the name.

Reducers

A powerful technique in functional programming which is used in ‘big data’ databases is the ‘reducer’. In PatternQueries, there is a general reducer facility, plus simplified pre-defined reducers like count, sum, mean, variance, sample variance, sigma and sample sigma. A reducer or other ‘stateful’ symbol has a state which is advanced as the database is logically iterated through during query execution. There is an initial value expression and a recursion expression including a self-reference to get the previous value and to determine the next value. Here is an example totaller (normally a simple sum symbol kind would be used instead):

query {
    pattern =symbol1 =symbol2 =sum;
    result =symbol1 =symbol2 =sum;
    Where 'sum' {
        kind 'reducer';
        type 'number';
        from =symbol2;
        recursion (symbol2 + sum);
        default 0;
        comment 'note from starts at symbol2 not 0';
    }
}

In practice the =symbol1 and =symbol2 would be different, as appropriate to the task and might be part of a more complex pattern. The ‘kind’ attribute declares the ‘sum’ symbol a reducer. The ‘from’ attribute of the ‘sum’ symbol initializes the reducer before the first iteration to be the value of symbol2, then the ‘recursion’ attribute contains an expression that advances to the next iteration by adding the previous value given by the ‘sum’ symbol (a self-reference) to the current value of symbol2. Setting the reset attribute on the reducer or a preceding symbol false can control resetting by letting an outer symbol cause the reset, and the summary symbol attribute can be set false so that all of the iterations are visible – otherwise the reducer’s output is restricted to its last iteration. The symbols at the left in the pattern are in ‘outer’ loops and change more slowly than those to the right. The optional default attribute determines the reducer’s value in case it is referenced before it gets a chance to execute its first iteration. In practice, reducers are used for the most difficult tasks, and pre-defined ones are often used. See PatternQuery Examples for some simple and some elaborate reducers, such as a Fibonacci series printer, and a running weighted average of a series of values.

Reducer Performance

A note on reducer implementation is necessary, but this can be skipped. Reducers are not as slow or limited as experienced programmers might expect. While a reducer is a form of recursive function definition, reducers execute iteratively in constant time per iteration and with no limit on the number of iterations or database size. On each iteration, the reducer’s self-reference does not cause a recursive invocation of the interpreter and a push of the system stack. Instead, the self-reference is detected at compile time, and when executed, the previous reducer value is returned as the value of the self-referencing symbol. If the system stack approach were used in the traditional implementation of recursion, the performance would be unacceptable, there would be a practical limit on iteration count, and there would be the possibility of stack overflows to contend with.

Input and Output ‘Root’ Symbols

It is sometimes necessary to deal with Items in places other than the database. The pattern Items can begin with special ‘input symbols’ also known as ‘roots’ that designate various other sources, and the result Items can begin with ‘output symbols’ which are also called roots. for designating various other destinations. For example, static input data can be included with the query itself, as part of the definition of the query, called the ‘with data’:

{
     query { 
         pattern =wd =message;
         result =message;
         Where 'wd' kind 'with data';
         WithData 'hello world';
     }
}

This just outputs ‘hello world’. The =wd symbol goes at the start of the pattern Items that are to be affected. It is defined in the Where 'wd' kind 'with data' and can have any name (with Java identifier syntax as usual, i.e. a mixture of letters, digits, underscore and dollar sign, not starting with a digit). Often the symbol names ‘in’ and ‘out’ are used.

For Java programmers: If the query is running in standalone mode, i.e. invoked by application Java code, then the ‘with data’ can be set explicitly, and the inputs and outputs are called ‘rootMaps’:

        RootMaps rootMaps = patternQuery.getRootMaps();
        InfinityDBMap<Object, Object> withDataMap = 
                new InfinityDBMap<>(someItemSpace);
        rootMaps.setRootMapByType(RootMaps.Type.WITH_DATA, withDataMap);
        patternQuery.setRootMaps(rootMaps);

Note that the Java code can put the with data anywhere it wants. In fact, standalone Java code can completely control all of the rootMaps, and is able to discover the defined maps after compilation, see their symbols and names, and provide an actual InfinityDBMap for each definition. A possible use of this capability would be automated testing.

The Root Types

There are other kinds of inputs and outputs for different contexts:

Root Map Symbol kindUsage
‘query definition’The input is the definition of the query itself.
‘database’(Future feature) in the server, this identifies a database like ‘demo/read_only’. Standalone, this can be bound to any map by name, like a named parameter. The name in the id symbol attribute is a string constant.
‘request header’
‘request content’
‘response header’
‘response content’
These are used in a Pattern Query executing on behalf of a REST access, in which case it is a ‘Client Query’. The request and/or response content are the ‘input’ and ‘output’ of the Query, if either is used. The request and response headers are formatted as items like Header headerName value value. (These are all implemented as in-memory VolatileItemSpaces, so they should not have huge amounts of data – say 10MB or less.)
‘stream input’(Advanced – not explained here) in a PatternQueryItemSpace providing the intercepted retrieval and mutation Items passed down to the wrapped ItemSpace or map. There are limitations on the overall transformation because the stream input provides one Item at a time for matching. This is used with symbol action ‘union or difference’.
‘with data’Input data embedded in the query definition itself under the WithData class. This can be overridden by standalone Java code that invokes the query.
‘params url parameter’In REST access, the client provides a URL with an optional query string parameter called ‘params’ that is a JSON string. This JSON can be placed in the URL easily by the Python and other REST client code. The JSON is converted to and from items automatically.
‘request parameters’All of the parameters in the REST URL’s query string. These are formatted as Items like RequestParameter paramName value value This might rarely be needed, as the ‘params url parameter’ is more convenient, and can hold any JSON. The paramName and value are strings.
‘local query’(Future feature) The id symbol attribute is used to designate a query to be invoked. Such ids look like ‘interface name’ ‘method name’ and together comprise a ‘query id’. These ids are always kept under the Query class in the database, so all queries that can invoke each other are co-located. (Eventually the id will additionally encode a URL for remote invocation).

The input parameters to the invoked query are in the result of the local query root and are received in the invoked query in the ‘request content’, while the ‘response content’ of the invoked query is returned in the pattern of the local query root of the invoking query. Thus both the input and output parameters can be arbitrarily complex, although they must fit in memory.

The depends_on symbol attribute determines when the query is invoked: each time the depended-on outer symbol changes, another invocation occurs. The query invocations can nest, and they can be transitive. Without a depends_on symbol attribute, only one invocation occurs. The compiler detects and prevents self-reference loops so recursion is not
possible (is this too restrictive?). However, recursion would produce queries that cannot be guaranteed always to finish in reasonable time.
‘sort’(Future feature) a temporary workspace inside the query for ‘stepwise’ execution. The input to the sort is its results and the output is in its pattern, which is intuitively reversed but necessary. Multiple sort ‘spaces’ can be defined. A sort can for example restructure data such as by inversion. A temporary space is allocated (one time, via a ‘VolatileItemSpace’) so unlimited iterations are not possible, but a depends_on attribute can force the space to be reset, minimizing peak size.

Database Inputs and Outputs

(Future feature) In the Server, explicitly named databases can be used as inputs and outputs by means of the ‘database’ kind of symbol. For this, one simply provides the database name in the symbol’s id attribute as a literal. (Note that user access permissions are checked). The database name is not a general expression, but a constant, so it can be determined by the compiler before execution.

query { 
    pattern =my_db AnyInput =any_symbol;
    result AnyOutput =any_symbol;
    Where 'my_db' {
        kind 'database';
        id 'demo/read_only';
        comment 'in the server, this identifies a database, but when standalone,
             it is a "named root map" like a parameter';
     }
}

For a Java program, one adds a bit of code to the above to bind any desired InfinityDBMap to the query:

    // If you started out with an ItemSpace, wrap it using the map API
    InfinityDBMap<Object, Object> myMap = 
                new InfinityDBMap<>(myItemSpace);
    rootMaps.setRootMapByDataBaseName("my root map name", myMap);

Then, the query can input or output to this map. Effectively, the database kind symbol becomes a named parameter for the query. The standalone Java code can discover the set of named databases given the compiled query, or it can just assume that a given name is there, but if it is absent, false is returned by the setter method, and the query cannot be executed.

Logic ‘or’, ‘and’ and ‘not’

The ‘logic’ symbol attribute has default ‘or’, but in the ‘and’ and ‘not’ cases, the inner patterns are visited according to an intersection or negation. The logic uses two adjacent plain symbols, with the outer having the logic symbol attribute set. Let v1 be the value of the outer plain symbol, and v2 the value of the inner plain symbol, then the definitions are:

or for all v1: for all v2: visit the inner plain symbol.
andfor each v1: if for all v2: (v1, v2) exists then visit the inner plain symbol
notfor each v1: if for all v2: (v1, v2) does not exist then visit the inner plain symbol

The ‘not’ logic is apparently self-defeating and should never match, but in case the inner plain symbol is part of a join it is very useful. It is useful to do joins that combine ‘or’, ‘and’ and ‘not’ together, possibly with enable or disable_by_default. Effectively, each value of the outer plain symbol identifies a particular set of values of the inner plain symbol that are unioned, intersected, or excluded to determine the iteration of the inner plain symbol. One use of this is to provide externally-controllable queries that can select by rich data-dependent criteria, such as a user-defined query . For example the join here is on =image_id:

query {
    Where {
        'excluded_images' logic 'not';
        'required_images' logic 'and';
    }
    pattern {
        AllowedImages =allowed_images =image_id;
        ExcludedImages =excluded_images =image_id;
        RequiredImages =required_images =image_id;
    }
    result Image =image_id;
}

When executed against the following database, the AllowedImages, ExcludedImages, and RequiredImages constitute a data-determined definition of the desired set of images from the Images class.

{
    AllowedImages {
        'allowed_image_set_1' 'image2';
        'allowed_image_set_2' 'image1';
    }
    ExcludedImages 'excluded_image_set_1' 'image3';
    Images {
        'image1';
        'image2';
        'image3';
    }
    RequiredImages 'required_image_set_1' {
        'image1';
        'image2';
    }
}

The output is:

Image {
    'image1';
    'image2';
}

Crosses

A general characteristic of PatternQuery optimization is that in some cases, loops must become nested, while in the original pattern they were independent. Only three kinds of symbols represent loops, and can cause multiple iterations: ‘plain symbols’ i.e. symbols of the default kind which iterate over input, symbols of kind ‘series’, and symbols of kind ‘item scanner’. Any case of such increased nesting is called a ‘cross product’ or just a ‘cross’. When crosses occur, it is necessary to pay attention to the possibility of a large increase in execution time and possibly in space as well if the newly nested loops each execute a large number of times. Usually it is obvious to the author of a query whether crosses are significant, based on prior knowledge of the data characteristics or other factors. Unwanted crosses may signal the need to re-evaluate the purpose of the query. However, normally crosses are actually intended, and they are known to be harmless, such as when one of the loops is known to execute at most once or a small number of times. There are two kinds of crosses:

  • Join Crosses
  • Result Crosses

Join Crosses

Join crosses affect only the ‘crossed parts’ of the patterns, where the crossed parts are the unshared parts outer to the join symbol in each pattern. Again, a join symbol is one that occurs in the patterns more than once, and there may be multiple join symbols. The crossing does not relate to parts of the patterns inner to the join symbol occurrences, or to the common prefixes of the joined patterns. This also does not relate to ‘lengthwise’ joins, which are rarer, since they duplicate the symbol within a single pattern item. If a join is not ‘lengthwise’, then it is ‘crosswise’. Often the crossed parts contain no looping symbols – perhaps only literals or expressions – so there is no increase of nesting.

The crossed loops are forced to become nested so that all combinations of the input items can be found that match the patterns. (The fact that these loops are implied rather than explicit, as they are in other programming systems, means that when it is later discovered that a particular loop should actually iterate multiple times because the data is richer than anticipated, the query just does the right thing already. )

Below are two joined pattern Items with some crossed parts. The common prefix is not important, nor are the inner parts, but the crossed parts represent two plain symbols, hence each may loop, and they effectively nest so that all combinations of the values matched by either symbol can be found.

    pattern =common_prefix {
        =crossed_part_1 =join_symbol =inner_part_1;
        =crossed_part_2 =join_symbol =inner_part_2;
    }

Result Crosses

A result cross occurs when the result refers to different patterns in a crosswise way, meaning that the result symbols are not all contained within any individual pattern. In this case, the different patterns are matched with nested loops in order to collect all combinations of input items that can possibly match and generate the result. It is possible for there to be both join crosses and result crosses together. This case is interpreted per-result, meaning that it relates to each result independently. In fact, each result is independent from every other result in any case, and a query is always equivalent to a set of queries each having the same patterns but only one of the various results. The query compiler considers the results independent logically and it optimizes the results independently, but then it tries to execute them in a combined way as much as possible, sharing the loops to avoid re-iterating the input.

Limiting Loops

There are various ways to ensure speed if necessary when crosses occur causing deeper loop nesting or when loops are nested for other reasons. Note that only plain symbols, series symbol kinds and item scanner symbol kinds need to be considered when they are in the crossed part, since only they can loop. Often, one of the loops is known already to execute only a small number of times, typically zero or one time, possibly dependent on the data. Or, the data may actually not be structured as intended and may be possible to clean up. Here are some ways to control loops:

  • Setting the first or last symbol attribute true forces only lowest or highest value to be used. The matching is very efficient, doing a single direct access, so there is no wasted time scanning over unnecessary input.
  • The limit symbol attribute can be set to a numeric long expression, to limit the iterations. When that number of iterations have been done, further iterations are simply skipped. If it is necessary for external code to discover that the limit has taken effect, set the limit one higher, and consider it to have taken effect if it reaches that value.
  • The values, max_values, or min_values symbol attributes can be given a numeric long expression to force the iteration count to be within a range, or else to cause the query to fail with the message given in the on_error string symbol attribute. Using max_values 0 or values 0 can be used to ensure the lack of a match, i.e. that a particular input does not exist.
  • The from, from_exclusive, to, or to_exclusive symbol attributes can restrict the range of values to given expressions. This is very efficient.
  • An equals symbol attribute restricts matches to only the value of the equals expression. This is very efficient.
  • Expressions and literals can match only zero or one times per outer iteration dependent on the input. This is very efficient.
  • A filter kind of symbol matches zero or one times, dependent on the boolean value of its expression symbol attribute , which when false prevents the iteration of inner parts of the pattern. Filter is less efficient than other techniques because when the expression evaluates to false, the work of reaching the filter is not used.
  • The enable symbol attribute takes a boolean expression which controls whether the symbol effectively exists. If false, the matching behaves as though the symbol and all inner parts of the pattern were not present. The disable_by_default symbol attribute takes a boolean expression which when true takes effect like enable false if there are zero matches. Enable is as efficient as filter, but behaves differently for joins.

Of the above, some do ‘direct access’ for very high efficiency and speed. These are the from, from_exclusive, to, to_exclusive, equals, first, last, limit, expressions, and literals. (The database is implemented as a ‘B-Tree’ which is a kind of Index that does direct accesses in O(log(n)) time). They are efficient even when the ‘fanout’ of the symbol is high (see the fanout symbol attribute). On the other hand, the filter symbol kind and enable symbol attribute just disable inwards iteration, so when they ‘fail’, the work of reaching that symbol is not used and backtracking happens immediately.

Joins are an Essential Performance Tool

It may seem that the crosses caused by joins that happen to cause increased loop nesting are a problem, but in fact joins are an essential tool to increase performance. Queries can often be restructured to use additional joins to bring in other data for tremendous performance increases. Often the request content will contain parameters for the query that will have only a limited number of values, often zero or one, so joins on these will always be fast.

Permuted and Simplified Items

Here is a relatively philosophical discussion about the meaning of the data in query results.

The output of a PatternQuery can if desired reduce the amount of information in the individual input Items reaching the database by including in some result Items less than all of the symbols of corresponding pattern Items, but this affects deletion. Without such simplification of Items, the variable parts of Items are simply permuted while literals and expressions may be changed, producing Items with the same meaning, and having a one-to-one correspondence. The database is then maintained in a consistent state over any future mutations.

Collisions

On insertion, the simplified result Items will sometimes collide, but this may be fine and may in fact allow very high performance improvements when the simplified Items are used as later input. For example, the simplified Items can be joined to in other patterns. The simplified Items may take up relatively little space.

However, later deletions will generate the same simplified Items and ‘too much’ will be deleted. For this reason, symbols can be given actions that prevent loss. If the simplified results have an action of ‘union’, then deletions will not occur. Without deletions, the simplified Items will only accumulate, but in many cases this is acceptable, or even desirable. If necessary, the set of simplified Items can be manually deleted and an appropriate PatternQuery can reconstruct them from the source data, so they are refreshed and minimal. In many cases, the simplified Items will actually not collide based on the data characteristics, so there is no issue.

These considerations reflect no inherent limitation of PatternQuery or PatternQueryItemSpace but are logical consequences of storing simplified Items as replications that omit some data rather than permuting it.

Implementing REST with Client Queries

Also see REST APIs with PatternQueries.

A ‘Client Query’ is a PatternQuery that provides a part of a REST API. The client such as a Python program does a REST access, but instead of accessing database data directly, a PatternQuery is transparently executed to input from the request, do the database access, and then output the results. The query can be compiled so quickly that it is not necessary to keep it cached. Root symbols in the query direct its access to the REST request header, request content, response header, response content, and the usual database data. The request content or response content may be a blob with an associated content type for high speed transfer, or else JSON for flexibility.

A query is named with a two-part string tuple, and all queries are stored under the Query class in each database. To do a REST access, one hits a URL like https://infinitydb.com:37411/infinitydb/data/my/database/myinterface/mymethod?action=execute-query The my/database is the two-part database id. The myinterface and mymethod are strings (each double quoted and URL-escaped), and together they are considered the query name. The admin user can determine permissions of access to particular user groups and databases based on query interface names.

Here is an example client query that just retrieves the first entry in a class called ‘Label’.

Query "com.infinitydb.ai" "get_first_label" query {
    Where {
        'first_label' first true;
        'out' kind 'response content';
    }
    pattern Label =first_label;
    result =out first_label =first_label;
}

This query is stored under Query "com.infinitydb.ai" "get_first_label" so the interface name is com.infinitydb.ai and the method name is ‘get_first_label’. It has a symbol ‘out’ that has kind ‘response content’, which is a ‘root symbol’ that directs its output to the response for the REST request. The REST client sees some JSON like { "_first_label" : "my_label" }. (Note that the key “_first_label” includes an initial underscore to flag it as an attribute rather than a general string in JSON . If you really want a string starting with an underscore, add yet another initial underscore and so on).

The PatternQueries that together implement an API are kept in the database itself, so a database becomes a self-describing unit including not only data but also the set of operations that can safely, securely and efficiently be applied to it, according to its particular data structures. This produces a more flexible, future-proof and secure system overall. The internal data structures are hidden behind the API, and can therefore be extended or re-arranged with ‘upgrade’ queries transparently.

One example use case is for JavaScript code on a web page to do an AJAX invocation of the Client Query and put the results on the page. Also the JavaScript can determine the AJAX input parameters based on user input. Any other way to hit a REST URL will work, such as curl, Python code, or Java code, or even a web browser.

Permissions

Access to databases and queries can be restricted to particular users in various ways. In the admin control panel, the admin user can define users, databases, roles, and permissions. Each user can be in any number of roles, and each role can have any number of permissions granted to any number of databases. The permissions can be ‘read’, ‘write’ or ‘query’, and they are independent.

Interface Names and Permission Patterns

Queries have more structure called a ‘pattern’, and future additional kinds will have their own patterns.. A query permission can be like ‘query:interface=com.mydomain’ or ‘query:prefix=com.mybasedomain’. Query interface names are limited to be one or more components separated by dots, each component being one or more lower-case letters, digits, or dashes like ‘com.infinitydb.ai-public’. They must start with a lower case letter. These then are like internet domain names, except that upper case letters are not allowed – in the internet case is ignored anyway. However, the expectation is that the components are specified in the reverse to the way they are in the internet. The names of actual queries in the database are two strings, the first being such an interface, and the second being a ‘method name’, which is unlimited. If the name does not follow this pattern, it cannot be executed. All executable queries have Items like Query "interface" "method" query definition.... The colon may be replaced by a question mark, indicating that if there is a problem with the permission then it is to be ignored and the permission is not granted. This allows future permissions to be backwards compatible with older server software. A role can be given any number of query permissions with different patterns, but read or write only once. It is possible then to name queries such that they usually do not conflict with those written by other authors, so long as the names correspond to internet domains actually under the control of those authors. It is expected that authors will try to avoid changing interface names frequently, so that REST clients can avoid needing to be changed, where the interface names may sometimes be hard-coded. In the future, clients may be able to discover remotely which methods are available on a given interface.

Setters and Getters

In addition, queries can be given Options by the query author, which are like ‘Option setter true’ or ‘Option getter true’. The Option EntityClass is placed under the query’s query attribute. A query permission set by the admin user can look like ‘query:setter,interface=my.domain’, in which case the query can only be executed if it has the setter option true and its interface is exactly my.domain. The same rule applies to the getter option. A permission like query:setter,getter allows executing queries that have either option true. The query author may set either or both options on any query, but the purpose is to allow getter queries to be ‘immutable’, i.e. they are safe to execute without modifying relevant data. Setters generally are expected to possibly modify some relevant data. A getter query may actually modify the database, such as for logging, but from the outside the changes are expected to be invisible and execution may be repeated an arbitrary number of times. If the options are absent, the query is like a ‘private’ one, with respect to permissions that have either option true. Getters can be useful for limiting access to more ‘public’ roles.

The Python infinitydb.access Package

The Python infinitydb.access package includes utilities for working with JSON-formatted InfinityDB data in dicts, including classes for Attribute and EntityClass. Also direct low-level access Item-by-Item or with blobs is possible, bypassing the client query approach if necessary. Using Python you can hit a URL easily to fire a Client Query:

# InfinityDB Client Query example

import infinitydb.access as idb
LABEL_ATTRIBUTE = idb.Attribute('label')
server = idb.InfinityDBAccessor('https://example.com:37411/infinitydb/data',
                             db="ai/label", user='me', password='xxx')

# optional check for connected
try:
    head_result, head_reason = server.head()
    if head_result != 200:
        print('Cannot connect to server: ' + str(head_result) + ' ' + head_reason)
        exit()
except InfinityDBError as e:
    print(e)
    exit()

def read_first_label():
    success, content, response_content_type = server.execute_query(
        ["com.infinitydb.ai", "get_first_label"])
    return content[LABEL_ATTRIBUTE] if success else None

There are also the ‘execute_get_blob_query()’ and ‘execute_put_blob_query()’, which transfer contiguous bytes rather than JSON, and are very fast. They should be used up to tens or perhaps hundreds of MB of blob data. For execute_put_blob_query(), the request content cannot be used, since that is used for the blob itself, so the query’s ‘parameters’ if any are passed in the ‘params url parameter’ root, which uses a single URL parameter called ‘params’ that is escaped JSON. The database can also be accessed directly, bypassing queries entirely if the read or write permissions are granted, by means of get_blob() or put_blob(), but these are not generally as flexible and secure. The database browser uses get_blob for images and other blobs to displayed. In fact if a database URL is accessed directly, without any ‘action’ URL parameter, then the blob is returned from that prefix, or JSON for the suffixes of that prefix if there is no blob there. There is a blob on a given prefix if there is a com.infinitydb.blob Attribute having that prefix. See the infinitydb.accessor Python package itself for more documentation on this and other features. There is also example code elsewhere.