Using PatternQueries

PatternQueries are declarative rather than procedural, using sets of Items as examples of input and output called ‘pattern’ and ‘result’. A query definition is a set of Items, just like any other data in the database. So, query definitions can be operated on and stored by any tools or systems that can work on databases or any other ItemSpace, such as a VolatileItemSpace, which only exists in memory. See PatternQuery Implementation for a discussion of the internals.

Essential Concepts

Patterns and ResultsAre simple sets of Items containing Literals, Symbols, and Expressions. Logically, the database is traversed like a tree i.e. recursively corresponding to the pattern Items. A Symbol matches an Item or a variable part of an Item like a number, a string, or a tuple. If individual symbols occur in multiple pattern Items, joins are formed. The result is also a set of Items that define how the symbol values and expressions using symbol values are to be used to insert or delete newly created Items . If symbols in a result Item occur in multiple pattern Items, cross-products are formed.
Symbol attributesEach symbol can be defined to have a minimum, a maximum, a set of restricted data types, strong type checking, and runtime strict detection of undefined values and more. Symbols can also be ‘named expressions’. Some symbol attributes can be defined as expressions, hence are determined at runtime.
ExpressionsThese follow the standard expression syntax of Java, JavaScript, C, and C++. The data types are the 12 InfinityDB component types, plus an ‘item’ data type, being a series of zero or more components. A sub-type of item is the ‘tuple’, which is a sequence of zero or more of the 10 primitive types. Items and tuples resemble the Python tuple datatype but do not nest. Optional strong typing checks for many execution errors in advance. unary, binary, ternary, n-ary and item-ary operators are supported. There are many extension operators such as regex and text formatting, with more on the way. Expressions may occur in the pattern, in the result, and in symbol attributes.
Stateful SymbolsSome kinds of symbols have ‘statefulness’ and are affected by the logically recursive navigation. These include counters, timers, randoms, and reducers. A reducer allows any expression to be re-applied to its previous value, for extremely powerful iterated or recursive operations in a functional style.
Root SymbolsRoot symbols allow multiple sources and destinations for the pattern input and result output. Items can be matched from different databases, from a REST query input or output, from a session store, and more.
Filter SymbolsThe logically recursive navigation can be controlled by boolean expressions in the pattern to prevent visiting selected subtrees.


In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a Pattern Query effectively includes the decision about the use of such ‘indexes’ or ‘inversions’ to be encoded directly, rather than being hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed, given good Item structures. Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins.

Sample Query

A simple query can be specified using the ‘i text’ syntax. This is a text format that simply translates directly into Items that can then be executed. It is easier to read, and can be embedded into Java code easily. Here we have a query that examines Items beginning with an ‘Aircraft’ class, finding the id’s and models for each aircraft. The =id symbol represents a particular aircraft identifier, and the =model symbol represents the model of that aircraft. The ‘model’ component is of the ‘attribute’ data type. The input of the query is defined by the components after the ‘pattern’ attribute on the pattern line, and the output of the query is the part after the result attribute on the result line, in which the class becomes ‘AircraftModel’, and the =model and =id symbols are reversed. This is a simple ‘inversion’ that produces a re-sorting of all of the aircraft into an additional set of Items in the database. Such an inversion can later be used to find ids given a model with one database access, either by application Java code or other queries.

query {
    pattern Aircraft =id model =model;
    result AircraftModel =model id =id;
    comment 'do a simple inversion of Aircraft';

The contents of the database before the query might be:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300

And after the query executes, additional Items appear:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300
AircraftModel 100 id "747"
AircraftModel 200 id "747"
AircraftModel 300 id "747"

If your existing code does not use components of the EntityClass or Attribute data types like Aircraft or model but instead you use constants of some sort, you can just substitute those. (The EntityClass and Attribute data types are two of the 12 standard types, and formatted as text they look like Aircraft or model.) The ‘query’, ‘pattern’ and ‘result’ components are Attributes. If absent, the result defaults to being identical to the pattern. There is often a ‘Where’ EntityClass along with the pattern and result to describe the symbols further. The syntax of the above simply describes two Items for the query definition, which in the standard Item tokenized form look like:

query pattern Aircraft "=id" model "=model"
query result AircraftModel "=model" id "=id"

Application code can rewrite or construct a query in any way desired because the query is actually a set of Items. The text above is translated by iParser into these Items, and it can be re-created and made tidy by iFormatter. The ‘i text’ syntax is trivial. (It is not a real language, and things that are not real are imaginary, so it’s i). Here is how to execute some i text:

String queryIText = "the query i text";
// Create a VolatileItemSpace with the Items. Or, an existing ItemSpace can be used.
ItemSpace querySpace = new IParser(queryItext).parse();
PatternQuery patternQuery = new PatternQuery(querySpace);
// InputSpace and outputSpace are often your database but can be any ItemSpace
patternQuery.execute(inputSpace, outputSpace);

Here is a slightly longer query, where we put the result into ‘SelectedAircraft’, and we limit the range of the model to 200 to 300. The execution requires scanning the entire set of Aircraft and filtering by range on model. We will show a faster version below. This would rely on a similar execution plan compared to an RDBMS as well without indexes.

query {
    Where 'model' {
        from 200;
        to 300;
    pattern Aircraft =id model =model;
    result SelectedAircraft =id;

This query takes four Items , here in tokenized form, and sorted by Item as usual:

query Where "model" from 200
query Where "model" to 300
query pattern Aircraft "=id" model "=model"
query result SelectedAircraft "=id"

Any iText can also be written in a ‘Python style’, even mixed with the ‘Java style’, and both are interchangeably recognized by the compiler. The language is extremely simple:

    Where 'model':
        from 200
        to 300
    pattern Aircraft =id model =model
    result SelectedAircraft =id

The pattern and results can be multi-line like Where, and there can be nesting. To the left of each opening brace or colon is a series of tokens for components to prefix to all of the nested suffixes. The i text format is really just for:

  • Factoring out common prefixes
  • Changing single quotes to double (double quoted strings are allowed as well)
  • Rewriting =symbol to “=symbol” to create standard strings starting with ‘=’
  • Rewriting (expression) to “=expression

Note that a string component starting with ‘=’ is interpreted not as a literal to be matched in the data in the database but instead as a symbol or expression. This means that if a query is constructed or customized by program code at runtime, then those initial ‘=’ must be watched for and just disallowed as user input text for example. If necessary, the string “=’user input‘” can be substituted for user input, but the user input must be backslash quoted as well for the quotes. This only affects the parts of the PatternQuery where symbols and expressions are allowed, and only PatternQueries that are dynamically constructed or modified by Java code.

Here is how this query looks in the InfinityDB Server backend database browser. This tool can be used to create and use PatternQueries via a browser directly on the server for maximum performance. The display is based only on the set of Items designating the query, according to certain universal formatting rules. The browser is unaware that these Items represent anything other than regular data, until they are submitted for execution. It is simple to edit and execute these Items online. As a consequence of the formatting rules the suffixes of the Where class come out arranged in a nice table across the bottom, one row per symbol, and one column per symbol attribute. The pattern Items fill the stretchy box below the green ‘pattern’ attribute, and the result Items are below the ‘result’ attribute. There are no limits to the practical complexity of a PatternQuery in terms of executability or representation. The display of the PatternQuery in the browser expands and packs down automatically to handle any structure.


Now for almost instant performance, we rewrite the query to use the inversion we created above. This requires only changing the pattern line:

query {
    Where 'model' {
        from 200;
        to 300;
    pattern AircraftModel =model id =id;
    result SelectedAircraft =id;


The pattern part can be arbitrarily complex, and can do the equivalent of the relational ‘select’, ‘project’ and ‘Join’, or ‘SPJ’, which is the fundamental set of database operations. Here is an example of a join:

query {
    pattern {
        Aircraft =id model =m maker =mk;
        AircraftMaker =mk country =c;
    result Results maker =mk id =id model =m country =c;

This query joins on the =mk symbol to combine information from two places in the database. The Aircraft Class uses the =id symbol to identify particular kinds of aircraft, but the AircraftMaker Class factors out the common facts about the makers. The results are being inserted under the ‘Results’ Class. (The Results class may need to be cleared first manually, but see stepped queries below which can also do it). PatternQueries can also delete the result Items. The PatternQuery compiler rewrites the pattern Items internally to handle joins in the pattern and to handle a situation called a ‘cross product’ specified by the results.


There can be Java-style expressions within the pattern, result, and some parts of the Where. These expressions allow limiting the pattern matching, or computing on symbol values for inclusion in the results, or determining certain attributes of the symbols. An expression in i-text format is specified inside parentheses. An expression in the pattern is very efficient, causing a restriction of the matching to the value of the expression, even though the expression value may be determined by other symbols in the pattern. The access performance is similar to that of a literal, so there is no redundant iteration and throwing away of candidate matches.

Symbols in expressions use Java syntax, and correspond to the symbols defined in the pattern Items. Expressions are compiled directly into Java Objects where they are interpreted at runtime, without requiring the actual Java compiler. The compilation takes only milliseconds, and a compiled query can be re-used, although there is no way to persist the executable form. The execution speed is much faster than the database accesses for most operations. The expression syntax uses only unary, binary, ternary and n-ary operators. There are many extensions beyond Java. Here is an example result where we format an item using the n-ary format() operator:

    result Results 'Aircraft make %s in %s makes %s model %s'.format(mk, c, id, m);

What Expressions Cannot Do

The expression facility does not provide a general purpose programming language, but instead has ‘functional’ semantics. It is nevertheless very expressive, much more so than SQL for example. This means there are no:

  • variables (but symbols take on values)
  • while, for, if, else or other block statements
  • references or pointers
  • classes or interfaces
  • arrays (except items and tuples)
  • Exceptions (except the ‘undefined’ value from e.g. divide by 0)
  • source code files (other than optional i texts)

What Expressions Can Do

So what is left? A ‘program’ is purely declarative. Instead of having explicit loop statements, so all of the expressions logically execute each time a pattern matches, assigning values to all of the symbols from the data in the matched input Items that are necessary to fire related result Items. So for each match, the symbols have fixed i.e. immutable values, and all expressions perform fixed computations on those values. Effectively, looping happens during the nested pattern matching. For conditionals, instead of if and else, the ‘ternary operator’ is used like (x ? y : z). This has the value y if x is true, otherwise z. These can be nested or chained arbitrarily. Expressions can be given symbol names and referred to in other expressions. All expressions are by default checked using compile-time strong typing to avoid common mistakes. Runtime errors like divide by zero create temporary ‘undefined’ values that can be caught or can cause an Exception. Without strong type checking, runtime type checking and numeric promotion occurs.

Item Values

An additional datatype for the value of an expression, the ‘item’ is added to the basic 12 for the purposes of computing the expression. An item represents an immutable sequence of zero or more components, each being of the basic 12 types. (An item is manipulated in the expression interpreter as a simple array of Objects.) Literal items are parenthetical comma-separated lists. like (x,y,z) so an empty item is just () and a singleton item, i.e. an item of length one is (x). A singleton item value is interchangeable with a single component value for all operators.

Item values are not nestable, and when necessary, they are ‘flattened’, so they can be temporarily ‘unflattened’ while being computed. For example (1,(2,3),4,5) is legal and evaluates to (1,2,3,4,5), and the elements can be any expressions. Also they cannot contain ‘undefined’, or they will either throw an Exception in strict mode or will turn into ‘undefined’ themselves. In fact, the value ‘undefined’ cannot be stored in the database at all, and an item value or a regular encoded Item can not contain it. The ‘undefined’ value is transitory, existing only during expression evaluation.

There is a practical limit on item length, in that the encoded form must fit in the chars of a regular encoded Item when stored in the database, which is always up to 1665 chars long. While being evaluated in an expression, however, items are not limited in length.

The optional compile-type strong type checking takes into account whether any expression value may be an item. (The compile-time type of an expression is a set of 12 booleans, one per basic type, plus a boolean for ‘itemness’. Each of the booleans represents a type that can be the result of the evaluation.)


A ‘tuple’ is an item value or regular encoded Item that happens to contain only primitive elements (primitives exclude the class and attribute types). This is in keeping with the regular mathematical definition of a tuple. Tuple values are not flagged as such while being computed, but the strong compile-time type checking can distinguish them and find many errors early. It is possible to set the data type of a plain symbol to ‘tuple’, and only tuples will be matched by the symbol from the database. (A future feature will allow strong type checking and filtering of items during retrievals to enforce that they contain only a given predetermined subset of the 12 basic types. For example, items of only numbers could be enforced. Currently, a matched item symbol can contain any components and a matched tuple symbol can contain any primitives. )

Quoting Tokens with At Sign

Any value of the 12 basic data types – which are called components – can be written literally in a standard text form called a ‘token’. These conflict sometimes with the Java expression syntax, but they can be embedded in expressions by surrounding them with at signs like @MyEntityClass Bytes(25_A6) myattribute@ and so on. It is usually possible to hoist such tokens outside the expressions into the surrounding syntax where the at signs go away because everything is a token already.


There are some extensions for operators beyond those in Java. These produce ‘undefined’ values for error conditions, which may be caught or ignored or which may be designated to cause an Exception by casting any expression or subexpression to (unstrict) or (strict). Values that are items of length one are called ‘singletons’ and are treated identically to simple components. Any value can be tested in any boolean context without error, and the default values are considered false, as is ‘undefined’. The default values are false, 0, 0.0f, 0.0, Index(0) or zero-length values including “” or (). Boolean contexts are operands of logical operators such as ||, &&, or ?:. Catching ‘undefined’ can be done with (x instanceof undefined ? 0 : x + y) or ((isdefined)x ? x : x + y), or for some uses, (x ? x + y : 0) and so on. Note that plain symbols can be declared selective for a set of chosen types and will skip data of non-selected types found in the database. Constructing items merely requires placing the desired elements in parentheses with commas, so there is no concatenation operator needed, which would tend to be ambiguous. Such item expressions can include elements which are also items, which are flattened at compile time or at runtime. Compile-time i.e. ‘strong’ type checking can be disabled by setting strict false on any symbol, so type checking then relies on runtime tests or type adaptation such as numeric type promotion.

Operator Arities

Operators work differently depending on the number of operands, i.e. their ‘arity’. Some, like get() can have two or three operands. The standard Java operators are all present with the same precedence but are also defined to work over items as well as the basic types as in Java.

Operator ArityDescription
UnaryHave one parameter. These distribute over items, so -(4, 5) is (-4, -5) and (‘hello’, ‘world’).length() is (5, 5). Operators can look like parameterless method invocations, like expr.length(), or can be casts as in (length)expr. Some operators can be expressed optionally with either syntax.
BinaryHave two parameters. These distribute over items on both sides. So 3 * (4, 5) is (12, 15). Also (4, 5) * 3 is (12, 15). Furthermore, (3, 4) * (5, 6) is (15, 24).
TernaryDistribute on left only, so (‘hello’, ‘world’).substring(1,3) is (‘el’,’or’). There are two right parameters, which are non-items, i.e. components i.e. singletons.
N-AryDo not distribute on the left, with a right parameter that evaluates to a single component or item or a variadic parameter list. For example, ‘%s %d’.format(x, y). Also ‘%d %d’.format(x) where x is (1,2) will work. This is a bit like the Python formatter.
Item-Ary Operators like get(index), get(start, end), head(end), tail(start) or size() operate on items for the left parameter. A component as usual is considered to be a singleton item. The right parameters are non-items.

The operators are:

string.format(parameter_ list)N-AryfalseFormat according to the string, with the usual %s and other escapes. This is n-ary, so the parameters may optionally evaluate to a single item, which is used as the parameter list (like Python).
string.simple_date_format(date)BinaryfalseFormat the date according to the SimpleDateFormat string.
string ~= regexSpecialfalseMatch the string against the regular expression, yielding a boolean result. The regex is not surrounded by // as in JavaScript but instead is a normal string in quotes. The regex is a literal, not an expression, which is important to avoid denial of service attacks. As usual, be careful to write safe regexes (which is hard).
(totokens)exprItem-AryfalseCreate a string formatted according to component tokens. Expr may be a component or item. String components are quoted within the result.
(fromtokens)stringUnaryfalseParse a string to create an item based on the standard token formats of the basic types. The parsed string components are expected to be quoted within the given string. The string may be empty.
expr instanceof typeSpecialfalseReturn a boolean determined by expr being of a given type. The type names have no quotes. The types are the 12 basic types class, attribute, string, boolean, double, float, long, date, chars, bytes, bytestring and index. Also the general types number, component, primitive, and meta are recognized. The type ‘item’ is orthogonal to the basic types, returning true only for non-singletons, i.e true unless the length is 1.
(typeof)exprUnaryfalseReturn the single basic type of the expr. If it is a non-singleton, return ‘item’.
(isdefined)exprUnaryfalseReturn a true iff expr is not ‘undefined’.
(strict)expr or (unstrict)exprSpecialfalseSwitch on or off the runtime check for ‘undefined’. A strict expression or subexpression will throw an Exception with an explanatory message if it evaluates to ‘undefined’. All runtime errors cause ‘undefined’ expression values and do nothing else.
(type)exprUnarytrueAny expression may be cast to any type when meaningful.
(string)exprUnarytrueString casting works on all types. All types may be cast from strings, but produce the string ‘(undefined)’ on errors for meaningless text. Each data type and each possible value has a unique text representation.
(boolean)exprUnarytrueAny type may be cast to boolean or used in a boolean context without errors. ‘Undefined’, and default values such as false, 0, 0.0f, 0.0 Index(0) or zero-length values such as “” or () are treated as false. Items of length > 1 are true.
(head)expr or expr.head(endIndex)Item-AryfalseReturn the components of an item before endIndex. On out-of-bounds, return undefined. If endIndex is negative, it designates a position relative to the end of the item. The unary version means head(1)
(tail)expr or expr.tail(startIndex)Item-AryfalseReturn the part of an item at and after startIndex. On out-of-bounds return undefined. If startIndex is negative, it designates a position relative to the end of the item. The unary version means tail(1)
(length)exprUnarytrueThe length of a string, char array, byte array, or bytestring. Other data types return 0. Not like (size)
(size)ItemItem-AryfalseThe number of components in an item. If the left parameter is a component, it has length 1. The content of the item is ignored.
item.get(index)Item-AryfalseGet the component of an item at the given index. If the left parameter is a component, treat it as a singleton item. A negative index is relative to the end of the item.
get(startIndex, endIndex)Item-AryfalseGet, from an item, a sub-item from start to before end. If the left parameter is a component, treat it as a singleton item. The result may be an item of length one, which is a singleton and therefore equivalent to a component, or may be empty or any other length. A negative startIndex or endIndex is relative to the end of the item.
Java unary operators minus (-) and binary not (~)UnarytrueJava syntax. Distribute over the operand, if it is an item.
Arithmetic and bit-wise Java binary operators * / % >> >>> << | &BinarytrueJava syntax. Distribute on left and right over items. Left and right operands may have any length, but one operand must be of length 1 or else the lengths must be equal. The size combinations are thus 1:n, n:1 or n:n where n can be >= 0.
Java comparison i.e. equality or relational operators == != > < >= <=Item-AryfalseJava syntax. For example ‘==’ compares all elements of two items. The items may be of different lengths, but to be equal they must be the same length. A component is considered to be identical to a singleton, i.e. an item of length one. For ‘>’ and so on, first the initial components are compared, and if equal, then longer items are larger.
Java logical operators && || ?: !Item-AryfalseThese are just like Java, but do not distribute, and work on booleans as if the operands are cast to boolean however, a zero-length item is false, and an item longer than 1 is true.


A powerful technique in functional programming which is used in ‘big data’ databases is the ‘reducer’. In PatternQueries, there is a general reducer facility, plus simplified pre-defined reducers like ‘sum’. A reducer has a state which is advanced as the database is logically iterated through during query execution. There is an initial value expression and a recursion expression including a self-reference to get the previous value and to determine the next value. Here is an example totaller (normally a simple sum symbol kind would be used instead):

query {
    pattern =symbol1 =symbol2 =sum;
    result =symbol1 =symbol2 =sum;
    Where 'sum' {
        kind 'reducer';
        from =symbol2;
        equals (symbol2 + sum);
        depends_on 'symbol1';
        comment 'note from starts at symbol2 not 0';

In practice the =symbol1 and =symbol2 would be different, as appropriate to the task and might be part of a more complex pattern. The ‘kind’ attribute declares the ‘sum’ symbol a reducer. The ‘from’ attribute of the ‘sum’ symbol initializes the reducer before the first iteration to be the value of symbol2, then the ‘equals’ attribute contains an expression that advances to the next iteration by adding the previous value given by the ‘sum’ symbol (a self-reference) to the current value of symbol2. The optional ‘depends_on’ attribute causes the summer to be re-initialized each time symbol1 changes value during the database iteration. The symbols at the left in the pattern are in ‘outer’ loops and change more slowly than those to the right, so the depends_on references a symbol to the left. In practice, reducers are used for the most difficult tasks, and pre-defined ones are often used.

Reducer Performance

A note on reducer implementation is necessary, but this can be skipped. Reducers are not as slow or limited as experienced programmers might expect. While a reducer is a form of recursive function definition, reducers execute iteratively in constant time per iteration and with no limit on the number of iterations or database size. On each iteration, the reducer’s self-reference does not cause a recursive invocation of the interpreter and a push of the system stack. Instead, the self-reference is detected at runtime, and the previous reducer value is returned as the value of the self-referencing symbol. This does not use an Exception, but simply uses a boolean flag on the self-referencing symbol object. If the system stack approach were used in the traditional implementation of recursion, the performance would be unacceptable, there would be a practical limit on iteration count, and there would be stack overflows.

Fibonacci Reducer

Here is a reducer that creates the first Fibonacci numbers. We use a ‘series’ kind of symbol at the left or ‘outside loop’ of the pattern to create integers, but we don’t use those numbers in the reducer – they are just for providing a series of iterations to drive the reducer. In other cases, often the source of iterations for outer loops will be database data.

query {
    pattern =n =fib;
    result  =n ('fib(%s)=%s'.format(n,(head)fib));
    Where  {
        'n' {
            kind 'series';
            to 10;
        'fib' {
            from ( (0,1,1) );
            kind 'reducer';
            equals ( ((tail)fib, fib.get(1) + fib.get(2)) );

The state of the reducer is a three-element item, which is a tuple containing fib(n) fib(n+1) and fib(n+2). The ‘from’ attribute initializes this to the item (0,1,1). This produces:

0 "fib(0)=0"
1 "fib(1)=1"
2 "fib(2)=1"
3 "fib(3)=2"
4 "fib(4)=3"
5 "fib(5)=5"
6 "fib(6)=8"
7 "fib(7)=13"
8 "fib(8)=21"
9 "fib(9)=34"
10 "fib(10)=55"

Running Weighted Average Reducer

Here is a more complex reducer that prints the weighted average of the last n values of a series of generated random numbers. It shows some of the real power of reducers. The last n values are kept in an item, which ‘rolls forward’ by means of an item constructor ( ((tail)last_n, random) ). The weighting is done by multiplying two items, last_n and weights_normalized, which is element-by-element. Then the predefined (sum) cast totals that. The (weights_normalized * 0) just creates the initial Item of zeroes because a component times an item distributes the operation over the item elements. The ‘weights’ symbol constructs an item from a comma-separated list or ‘item constructor’, which can contain any expressions and is automatically flattened. The weights,, input and output could be elsewhere, such as in the database.

query {
    pattern =n =random =last_n;
    result  =n ('n=%d random=%2d avg=%.3f'.format(n, random, avg));
    Where  {
        'n' {
            kind 'series';
            to 10;
            comment 'generate series of numbers';
        'random' {
            kind 'random';
            to_exclusive 100;
            depends_on 'n';
            comment 'generate arbitrary input. random is long';
        'last_n' {
            from ( weights_normalized * 0 );
            kind 'reducer';
            equals ( ((tail)last_n, random) );
            comment 'the last n values as an item';
        'weights' equals ((1.0, 2.0, 3.0, 5.0, 6.0));
        'weights_normalized' equals (weights/(sum)weights);
        'avg' equals ((sum)(last_n * weights_normalized));
    description 'running weighted sum of last n values';

The output of one run was:

0 "n=0 random=47 avg=0.000"
1 "n=1 random= 8 avg=2.824"
2 "n=2 random=64 avg=24.941"
3 "n=3 random=17 avg=26.235"
4 "n=4 random=75 avg=43.706"
5 "n=5 random=51 avg=51.059"
6 "n=6 random=52 avg=52.353"
7 "n=7 random=44 avg=49.647"
8 "n=8 random=58 avg=53.000"
9 "n=9 random=63 avg=56.176"
10 "n=10 random= 1 avg=37.353"

Symbol Kinds

Reducers are only one of several kinds of symbols that can be designated with the ‘kind’ attribute in the Where table. Some are ‘stateful’, meaning that as the iteration of the database progresses, their values are updated in some way. Plain symbols, which have no specified kind, are not stateful, but only retrieve data from the database and do not depend on the order of iteration (they may in principle be computed in parallel therefore). They do not need a Where table entry unless non-default symbol attributes are needed.

There is a small set of possible symbol attributes that have similar meanings for the different symbol kinds:

Symbol attribute in WhereValue TypeDescription
kindstringDetermines the purpose of the symbol. If absent, it is a plain symbol and just retrieves data. Otherwise, the symbol can only occur once in the pattern because only plain symbols can join. Plain or other kinds can occur multiple times in expressions and the results.
depends_onstring The name of a symbol whose value changes are monitored to determine when a ‘reset’ happens . The ‘reset’ for some kinds causes the value of the ‘from’ attribute’s expression to be transferred into the symbol’s state, or else a new random or time to be taken.
from, from_exclusiveexpressionThe reset value or starting point or minimum. May also be a symbol or literal. Any data type is allowable depending on the kind and type attribute.
to, to_exclusiveexpressionThe ending point or maximum. May also be a symbol or literal. Any data type is allowable depending on the kind and type attribute.
typeset of stringsOptional set of the 12 data types, or ‘tuple’ to match a series of components of the 10 primitive types or ‘suffix’ to match any series of components to the end of the Item. Also ‘number’ matches ‘double’ ‘float’ or ‘long’ and ‘meta’ matches ‘class’ (i.e. ‘entityclass’) or ‘attribute’. Furthermore, ‘component’ matches any single component i.e. a singleton tuple, and ‘primitive’ is like ‘component’ but excludes ‘class’ or ‘attribute’ and is the default. For plain symbols, these restrict the set of values matched from the input data. Other kinds do not accept multiple types.
equalsexpressionA function to be applied to the current state or for other purposes. When appropriate, this can be a series of expressions, effectively designating a ‘tuple’. Wherever an expression is allowed, a symbol or literal is allowable too.
firstbooleanWhether to use only the initial value iterated. For plain symbols, only the lowest value is retrieved and it takes only one access.
lastboolean Whether to suppress output other than for the final value iterated. For plain symbols, only the highest value is retrieved and it takes only one access. For example, a sum kind outputs only the total.
incrementnumberFor series and counter, the amount to add per iteration, default 1. For plain symbols, the maximum number of inner iterations to ‘generate’ per outer iteration ‘received’, default unlimited.

Below are the kinds. The ‘Plain’ and ‘Root’ kinds are not stateful.

KindKind namesSymbol AttributesDescription
Plain symbols have the default kind, or are not present in the Where table. They simply retrieve data from the database (or from a root – see below). The types may be a set, allowing selective ‘hiding’ of other types. Or, the type can be just ‘item’ in order to match 0 or more of any basic type, or ‘tuple’ in order to match 0 or more primitives.
Compute a sequence of states based on current state and possibly other symbols. ‘from’ is for initialization, and ‘equals’ is for the recursion function, which can be self-referencing.
Timedate ,
time seconds,
time milliseconds, time nanoseconds
Snapshot the current date as a Date or time as a long or double in seconds, milliseconds, or nanoseconds. Setting the type to ‘string’ generates standard date text. A depends_on determines when to snapshot: otherwise the time is snapshotted only once.
secure random
Generate a random number as a long with a given bit length or range, or else a secure random byte array, defaulting to 32 bytes. A depends_on determines when to re-generate: otherwise only one random is generated.
Starting from a given ‘from’ attribute defaulting to 0, add a given increment defaulting to 1 and provide the count as the symbol value. Type defaults to long but may be set to double or float or else it is the type of the from or increment. Each iteration increments the count.
A simple reducer that adds up either the immediately outer component in the pattern, or the equals expression if present. A depended-on symbol causes a reset to the outer component or the from expression.
Generate a series of numbers N (of type double, float, or long). The equals is an optional self-referencing expression giving a function of N and possibly other symbols that determines the symbol value. Either the ‘to’ or ‘to_exclusive’ is required. A series is not dependent on database data except to be reset by a depends_on symbol. The complete series is generated on each iteration of the ‘outer’ part of the pattern to the immediate left.
Root query definition,
stream input,
request header,
request content,
response header,
response content,
Patterns can match data in alternative input sources, and results can output to alternative destinations . The root symbols are outermost in patterns or results in order to specify where to input or output. A given query may input and output to multiple roots. Stream input is for PatternQueryItemSpace, and others are for REST Client Queries.


It is possible to have a query do multiple passes or ‘steps’ over the data. This is just like running multiple queries one after the other. So, the output of earlier steps can be inputs to later steps and temporary work space can be used between them. The Where table is shared between the steps. For example:

query {
    Steps {
        1 {
            pattern Temporary HelloWorld;
            pattern_mode 'delete';
        2 {
            pattern Input HelloWorld =x;
            result Temporary HelloWorld =x;
        3 {
            pattern Temporary HelloWorld =y;
            result Output HelloWorld =y;
    Where 'x' type 'string';

For the step numbers any type is acceptable, and only the ordering is important. In step 1, the result defaults to be a copy of the pattern.

The ‘pattern_mode’ in step 1 causes the Items that match the pattern to be deleted along with their suffixes. This clears out the temporary area. Then step 2 uses that area as a work area. One could delete the work area at the end instead, or do both. By default, the patterns use mode ‘none’ and the results use mode ‘insert’. The pattern_mode can be default, ‘none’ or ‘delete’. The result_mode can be default, ‘none’, ‘insert’, ‘delete’, or ‘delete suffixes’. The PatternQuery execute() method provides parameters that determine the modes for the default cases. Many queries just default the modes, and the execution environment chooses to set the modes, which are usually ‘none’ for the pattern and ‘insert’, ‘delete’ or ‘delete suffixes’ for the results. The modes are available on non-stepped queries as well.

The temporary area can either be a dedicated area in the database or a root map (defined below) of kind ‘temporary’. (Later, a unique work area will be possible to be chosen by putting in a symbol for the time in nanoseconds, or a random number, but currently the time or random is not transferred between steps.

Using temporary work space allows very advanced execution plans that can be very efficient. There are many other uses for the modes as well, such as clearing out the destination of the results so that the results do not simply merge with the pre-existing results. The merging or ‘unioning’ behavior is desirable in many situations, though, and it is the inverse of the deleting mode, so it is symmetrical and a query’s effects may be reversed by executing in delete mode after insert mode. The symmetry is not perfect in case there are collisions in the unioning, and the colliding Items will be deleted while in the delete mode. The symmetry also works by following a delete with an insert in case each deleted Item already exists before the query is executed.

Root Map Symbols

It is sometimes necessary to deal with Items in places other than the database. The pattern Items can begin with special ‘root map symbols’ that designate various other sources, as can the result Items for designating various other destinations. For example, static input data can be included with the query itself, as part of the definition of the query:

     query { 
         pattern =qd DataAlongsideTheQuery =query_data;
         result =query_data;
         Where 'qd' kind 'query definition';
     DataAlongsideTheQuery 'hello world';

This just outputs ‘hello world’. The =qd symbol goes at the start of the pattern Items that are to be redirected. It is defined in the Where ‘qd’ kind ‘query definition’ and can have any name (with Java identifier syntax as usual). If the query is running in standalone mode, i.e. embedded in Java code, then there is some boilerplate code to set this up:

        RootMaps rootMaps = patternQuery.getRootMaps();
        InfinityDBMap<Object, Object> queryMap = 
                new InfinityDBMap<>(querySpace);
        rootMaps.setRootMapByType(RootMaps.Type.QUERY_DEFINITION, queryMap);

Note that the Java code can put this query definition input anywhere it wants.

There are other kinds of root maps for different contexts:

Root Map Symbol kindUsage
‘query definition’associated with the query definition as explained above. In standalone mode, this can be bound explicitly to a map but is usually the same map as the query definition. This data should not be modified dynamically. May contain data as well as the query.
‘database’in the server, this identifies a database like ‘demo/read_only’. Standalone, this can be bound to any map by name, like a named parameter. The name is in the equals symbol attribute as a literal.
‘request header’
‘request content’
‘response header’
‘response content’
‘session data’
in a Client Query executing on behalf of a REST access . These are all implemented as in-memory VolatileItemSpaces, so they should not have huge amounts of data.
‘temporary’is a VolatileItemSpace for temporary work, if referred to in the PatternQuery. This is in-memory. This must be provided explicitly by a standalone query, and there is none for a PatternQueryItemSpace.
‘stream input’in a PatternQueryItemSpace (explained below) providing the intercepted retrieval and mutation Items passed down to the wrapped ItemSpace or map. There are limitations on the overall transformation because the stream input provides one Item at a time for matching.

Database Root Map

In the Server, explicitly named databases can be used as root maps by means of the ‘database’ kind of symbol. For this, one simply provides the database name in the symbol’s equals attribute as a literal. (Note that user access permissions are checked):

query { 
    pattern =my_db AnyData =any_symbol;
    result =any_symbol;
    Where 'my_db' {
        kind 'database';
        equals 'demo/read_only';
        comment 'in the server, this identifies a database, but when standalone,
             it is a "named root map" like a parameter';

In standalone mode, one adds a bit of code to the above to bind any desired InfinityDBMap to the query:

    // If you started out with an ItemSpace, wrap it using the map API
    InfinityDBMap<Object, Object> myMap = 
                new InfinityDBMap<>(myItemSpace);
    rootMaps.setRootMapByDataBaseName("my root map name", myMap);

Then, the query can input or output to this map. Effectively, the database kind symbol becomes a named parameter for the query. The standalone code can discover the set of named databases given the compiled query, or it can just assume that a given name is there. There can be any number of root symbols.


One way to control a query’s execution is by modifying it before compiling. This is a trick that can work for small amounts of input data instead of using root maps. (Many years ago, some assembly language code would do self-modification for speed. This technique was often referred to as the definition of bad programming practice. Maybe you have a way to keep it clean though. Restricting it to testing is a candidate.)

static final Attribute query = new Attribute("query");
static final EntityClass Where = new EntityClass("Where");
static final Attribute from = new Attribute("from");

// Embed the query as iText as a String described above or
// create it as desired. This parsing creates a VolatileItemSpace in memory.
ItemSpace querySpace = new IParser(queryItext).parse();

// Note: if fromValue is a user input String, make sure it does not start with '='
long fromValue = ...;

querySpace.insert(query, Where, "model", from, fromValue);
PatternQuery patternQuery = new PatternQuery(querySpace);
// now compile and execute as described above

The above provides a single value to be added into the query, because the from attribute of a symbol is single-valued. The value may be a multi-component Item if desired, however, not just one component. The from symbol attribute allows expressions, so if it is a String and the String is user input, make sure it does not start with ‘=’ so that it is not interpreted as an expression. Any number or kind of modifications to the query are possible. The symbol names can be considered parameter names, and keeping the modifications limited to working with symbol attributes would be good practice. Be sure that the query is either re-parsed or else the previous from value is deleted before modifying and re-compiling to avoid a multi-valued from. Another good practice would be to wrap the query in a method that does the query modification locally based on the method’s strongly-typed parameters. It not good to modify the iText of the query as one does with dynamic SQL due to security and other considerations. The strong type checking of the compiler will often notice when a modification has produced a from of the wrong type.


In order to upgrade the application so that it continuously maintains the inversion we defined above for AircraftModel we can:

  • change the application’s Java Code, or
  • add or change the application’s PatternQueries by adding a result Item for the AircraftModel Items where the result already includes Aircraft Items, or
  • set up a PatternQueryItemSpace that intercepts the mutations to the database and transparently maintains the inverted forms of the Aircraft Items under AircraftModel. Any set of per-Item transformations such as permutations of Item data, even with joins is possible. The events can be either mutations of the Aircraft Items, the AircraftModel Items, or both.

An elegant way to handle the maintenance of the Item replications is to keep all the PatternQueries in the database itself. Then the application invokes them where they live, directly in the database, and they can be edited or copied from elsewhere in order to keep them in synch with the data structures. The database becomes a self-describing unit, with its structure and the operations it can perform against itself all in one place. The Java code can still contain some hard-coded parts, such as legacy code, as desired.

Alternatively, a PatternQueryItemSpace can transparently extend the hard-coded part or the other PatternQueries, by using a single per-database PatternQuery, whose definition is stored anywhere. The application constructs one PatternQueryItemSpace in order to wrap the database, and then does operations on it instead of on the database directly. In InfinityDB Client/Server, a special per-database PatternQuery is called the system query and is stored under ‘System Query’ (that’s the System EntityClass immediately followed by the Query EntityClass and then the PatternQuery definition). The PatternQuery sees an input stream of individual Items being inserted into or deleted from the wrapped database, and it rewrites them and can optionally join them with database Items to produce a set of output Items that actually reach the wrapped database. If the inputs are deletions, then the outputs are deletions as well.

Performance will not reach the high levels of direct InfinityDB access, of course, but it can be concurrent. There is a SimplePatternQueryItemSpace and an alternative ThreadedPatternQueryItemSpace. Both of them are implemented by wrapping a TriggerItemSpace that in turn wraps the wrapped ItemSpace. The PatternQuery definition affects the setup of the hidden TriggerItemSpace described below so that mostly only the necessary Items are captured and used, and the rest of the action is relatively fast.

For example, a system query could be:

System Query query {
    pattern =si {
        Aircraft =id1 model =model1;
        AircraftModel =model2 id =id2;
    result {
        Aircraft =id2 model =model2;
        AircraftModel =model1 id =id1;
    Where 'si' kind 'stream input';

In the above, when an Item arrives from the stream input matching the Aircraft pattern Item, the AircraftModel result Item fires because it has enough information to do so: the =id1 and =model1 symbols have been bound. Likewise, when the AircraftModel pattern Item matches a stream input Item then the =model2 and =id2 symbols have been bound, so the result Aircraft =id2 model =model2 Item fires. The stream input is the series of individual Items arriving at the PatternQueryItemSpace via the first(), next(), last(), previous(), insert(), delete(), update(), and deleteSubspace() methods. The PatternQueryItemSpace passes these on down to the wrapped ItemSpace and also inserts or deletes the newly generated Items from the fired results to the wrapped ItemSpace.

Simplified Items

The output of the PatternQueryItemSpace or any other PatternQuery can if desired reduce the amount of information in the individual input Items reaching the database by including in some result Items less than all of the symbols of corresponding pattern Items, but this affects deletion. Without such ‘simplification’ of Items, the variable parts of Items are simply ‘permuted’, producing Items with the same meaning.

As an example, consider taking Items that include a ‘country’ value and storing that country value in a separate place in the database without the entire original Items. Then, the set of ‘seen’ countries would accumulate but there would be no way to deduce the original Items from which they came.


On insertion, the ‘simplified’ result Items will sometimes harmlessly collide, but this may be fine, and in fact may allow very high performance improvements when the ‘simplified’ Items are used as later input. For example, the simplified Items can be joined to in other patterns. The simplified Items may take up relatively little space.

However, later deletions will generate the same simplified Items and ‘too much’ will be deleted. For this reason, Stepped queries may be used to separate insertions from deletions. Without deletions, the simplified Items will only accumulate, but in many cases this is acceptable, or even desirable. If necessary, the set of simplified Items can be manually deleted and an appropriate PatternQuery can reconstruct them from the source data, so they are refreshed and minimal. In many cases, the simplified Items will actually not collide based on the data characteristics, so there is no issue.

These considerations reflect no inherent limitation of PatternQuery but are logical consequences of storing different Items representing the same information but that omit some data rather than ‘permuting’ it.


A TriggerItemSpace is a a part of a PatternQueryItemSpace when used in the Server as the System Query, but it can also be used alone by wrapping any ItemSpace, to make the ItemSpace ‘active’ rather than ‘passive’. Any retrieval or mutation operation can be captured and custom Java code can be invoked. Pre-retrieval, post-retrieval, insert, and delete are independently handled, and if one of them is not active, it does not cause delays. A small set of fixed prefixes of various lengths can be detected to trigger different associated actions. The prefixes are independent, so some can be prefixes of others. Matching can be exact or by prefix.

Trigger Implementation

The TriggerItemSpace does a quick hash of a single fixed prefix length of the input Items to determine whether they are candidates to be intercepted, and if so then it compares the necessary associated variable-length prefixes. The length of the hashed prefix part is the minimum of the full prefixes, so if the minimum prefix is long, most operations only incur the hashing overhead, and short Items are not even hashed. The custom code can pass the triggering Item through to the underlying ItemSpace or it can block it and then it can do anything desired. Access is fully concurrent, with no locks or synchronizations. (Lock-free concurrency techniques are used). The trigger code can be replaced, and triggers can be added or removed at any time even with multiple threads active. About a few hundred to a few thousand triggers would be practical at one time. The triggered action code executes without locks, i.e. completely concurrently. Performance of the trigger detection should often be higher than the underlying InfinityDB ItemSpace, so little is lost.

Client Queries

An advanced feature under development is the ‘Client Query’ based on the REST API of the InfinityDB Client/Server version 6.0. The client does a REST access, but instead of accessing database data directly, a PatternQuery is transparently executed to input from the request, do the database access, and then output the results. Root symbols direct the access to the REST request header, request content, response header, response content, session data, temporary and the usual database data, either default or identified by a database kind of symbol.

If desired, the PatternQueries can be kept in the database itself, so a database becomes a self-describing unit including not only data but also the set of operations that can be safely and efficiently applied to it, according to its particular data structures. This should produce a more future-proof and secure system overall.

A typical use would be for JavaScript code on a web page to do an AJAX invocation of the Client Query and put the results on the page. Also the JavaScript can determine the AJAX input parameters based on user input. Any other way to hit a REST URL will work, such as curl, Python code, or Java code, or even a web browser.