Pattern Queries

InfinityDB has a new query feature called Pattern Queries that can be used by applications in place of Java code for rapid development, schema extension, testing, data transformation, database administration, consistency maintenance, ad-hoc user data retrieval and processing and much more. PatternQueries access and transform InfinityDB databases in rich and powerful ways based on simple query definitions. Queries are declarative rather than procedural, using sets of Items as examples of input and output called ‘pattern’ and ‘result’. These contain literals, symbols, and expressions and use a strictly functional execution model. It is only necessary to provide a few such Items instead of many more lines of Java code. This efficiency is vital as queries become more complex. Queries compile almost instantly.

For more on the PatternQuery feature, please contact support@boilerbay.com. Become a beta tester!

Pattern Queries are the basis for further upcoming features. The current release is for Rapid Application Development, but the following features are in progress:

Rapid Application DevelopmentGather, distribute, transform, edit, or delete small or large Item structures with minimal application code. This is like a third intra-JVM database access API, along with ItemSpace and the InfinityDBMap. This is next to be released.
Consistency MaintenanceApplication-transparent capturing and transformation of mutation operations keeps redundant data structures in synch. Applications are simplified and isolated from database structure alterations. A PatternQueryItemSpace wraps any other ItemSpace, including an entire InfinityDB database to provide consistent alternate views of data for the ultimate in performance and data model convenience and flexibility. (Future release)
Client QueriesREST access to predefined named queries stored in a database allows applications to effectively define ‘API’s for client code. A database acts like an object accessed by clients through PatternQueries that act like methods. The methods interact with the database in a custom way, hiding database structure and data, future-proofing the upgrade path, and increasing reliability and security. (InfinityDB Client/Server future release)
Ad-Hoc queryUsers can interactively create, edit, execute, save, retrieve, and share simple but powerful interactive queries to find and transform data. For example, developers can test and perfect queries for embedding in applications. (InfinityDB Client/Server future release). Also, client programs who have write access to a database can store a PatternQuery in it and execute it in order to perform complex operations local to the server at high speed. Permission to execute queries is separate from the read and write permissions.

Query Definitions

A query definition is a set of Items, just like any other data in the database. So, query definitions can be operated on and stored by any tools or systems that can work on databases or any other ItemSpace. An alternative text form called i text can be used by applications for convenience in embedding in Java code. Here are the essential features:

Patterns and ResultsAre simple sets of Items containing Literals, Symbols, and Expressions. Logically, the database is traversed like a tree i.e. recursively corresponding to the pattern Items. A Symbol matches an Item or a variable part of an Item like a number, a string, or a tuple. If individual symbols occur in multiple pattern Items, joins are formed. If symbols in a result Item occur in multiple pattern Items, cross-products are formed.
Symbol attributesEach symbol can be defined to have a minimum, a maximum, a set of restricted data types, strong type checking, and runtime strict detection of undefined values and more. Symbols can also be ‘named expressions’. Some symbol attributes can be defined as expressions, hence are determined at runtime.
ExpressionsThese follow the standard expression syntax of Java, JavaScript, C, and C++. The data types are the 12 InfinityDB component types, plus an ‘item’ data type, being zero or more components. A sub-type of item is the ‘tuple’, which is zero or more of the 10 primitive types. Items and tuples resemble the Python tuple datatype but do not nest. Optional strong typing checks for many execution errors in advance. unary, binary and ternary operators are supported. There are many extension operators such as regex and text formatting, with more on the way. Expressions may occur in the pattern, in the results, and more.
Stateful SymbolsSome kinds of symbols have ‘statefulness’ and are affected by the logically recursive navigation. These include counters, timers, randoms, and reducers. A reducer allows any expression to be re-applied to its previous value, for extremely powerful iterated or recursive operations in a functional style.
Root SymbolsRoot symbols allow multiple sources and destinations for the pattern input and result output. Items can be matched from different databases, from a REST query input or output, from a session store, and more.
Filter SymbolsThe logically recursive navigation can be controlled by boolean expressions in the pattern to prevent visiting selected subtrees.

Execution Performance

The logically recursive navigation of the pattern tree does not correspond necessarily to physical access, as there are re-writes of the pattern for semantics and speed. Each navigation recursion is efficient, not simply following the structure blindly, but instead doing direct efficient b-tree index accesses at each step. So iterating over an ‘outer’ symbol, i.e. data found nearer the tree root, does not require visiting all of the contents of a subtree, i.e. the ‘inner’ symbols, Item-by-Item. Instead, each match of each outer symbol requires only one database access, hence performance can sometimes be orders of magnitude higher than simple depth-first recursion, depending on the size of each inner tree, which may be very large. There is a further optimization for ‘tail matching’, in which a portion of a pattern Item at the end contains ‘wildcard’ symbols and literals. The initial series of literals and expressions in the patterns allow direct, fast access. Also, restricting the matching of a symbol to a minimum or maximum value has no performance impact, being as fast as access based on equality.

Joins

In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a Pattern Query effectively includes the decision about the use of such ‘indexes’ or ‘inversions’ to be encoded directly, rather than being hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed, given good Item structures. Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins.

Unlike SQL query compilers, the PatternQuery compiler does not need to try to guess which indexes and join orders or join techniques are best, so compiling always takes only milliseconds or less, producing an efficient plan each time. An RDBMS query compiler has far more to do, and as queries get more complex, the join count increases, compilation performance degrades tremendously, and optimal plans are seldom reached. In RDBMS, the proper set of indexes must be determined by guesswork or experimentation with all required queries, and indexes must be built and dropped from time to time in different situations.

Sample Query

A simple query can be specified using the ‘i text’ syntax. This is a text format that simply translates directly into Items that can then be executed. It is easier to read, and can be embedded into Java code easily. Here we have a query that examines Items beginning with an ‘Aircraft’ class, finding the id’s and models for each aircraft. The =id symbol represents a particular aircraft identifier, and the =model symbol represents the model of that aircraft. The ‘model’ component is of the ‘attribute’ data type. The input of the query is defined by the components after the ‘pattern’ attribute on the pattern line, and the output of the query is the part after the result attribute on the result line, in which the class becomes ‘AircraftModel’, and the =model and =id symbols are reversed. This is a simple ‘inversion’ that produces a re-sorting of all of the aircraft into an additional set of Items in the database. Such an inversion can later be used to find ids given a model with one database access, either by application Java code or other queries.

query {
    pattern Aircraft =id model =model;
    result AircraftModel =model id =id;
    comment 'do a simple inversion of Aircraft';
}

The contents of the database before the query might be:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300

And after the query executes, additional Items appear:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300
AircraftModel 100 id "747"
AircraftModel 200 id "747"
AircraftModel 300 id "747"

If your existing code does not use components of the EntityClass or Attribute data types like Aircraft or model but instead you use constants of some sort, you can just substitute those. (The EntityClass and Attribute data types are two of the 12 standard types, and formatted as text they look like Aircraft or model.) The ‘query’, ‘pattern’ and ‘result’ components are Attributes. If absent, the result defaults to being identical to the pattern. There is often a ‘Where’ EntityClass along with the pattern and result to describe the symbols further. The syntax of the above simply describes two Items for the query definition, which in the standard Item tokenized form look like:

query pattern Aircraft "=id" model "=model"
query result AircraftModel "=model" id "=id"

Application code can rewrite or construct a query in any way desired because the query is actually a set of Items. The text above is translated by iParser into these Items, and it can be re-created and made tidy by iFormatter. The ‘i text’ syntax is trivial. (It is not a real language, and things that are not real are imaginary, so it’s i). Here is how to execute some i text:

String queryIText = "the query i text";
// Create a VolatileItemSpace with the Items. Or, an existing ItemSpace can be used.
ItemSpace querySpace = new IParser(queryItext).parse();
PatternQuery patternQuery = new PatternQuery(querySpace);
patternQuery.compile();
// InputSpace and outputSpace are often your database but can be any ItemSpace
patternQuery.execute(inputSpace, outputSpace);

Here is a slightly longer query, where we put the result into ‘SelectedAircraft’, and we limit the range of the model to 200 to 300. The execution requires scanning the entire set of Aircraft and filtering by range on model. We will show a faster version below. This would be the same execution plan with an RDBMS as well without indexes.

query {
    Where 'model' {
        from 200;
        to 300;
    }
    pattern Aircraft =id model =model;
    result SelectedAircraft =id;
}

This query takes four Items , here in tokenized form, and sorted by Item as usual:

query Where "model" from 200
query Where "model" to 300
query pattern Aircraft "=id" model "=model"
query result SelectedAircraft "=id"

Any iText can also be written in a ‘Python style’, even mixed with the ‘Java style’, and both are interchangeably recognized by the compiler. The language is extremely simple:

query:
    Where 'model':
        from 200
        to 300
    pattern Aircraft =id model =model
    result SelectedAircraft =id

The pattern and results can be multi-line like Where, and there can be nesting. To the left of each opening brace or colon is a series of tokens for components to prefix to all of the nested suffixes. The i text format is really just for:

  • Factoring out common prefixes
  • Changing single quotes to double (double quoted strings are allowed as well)
  • Rewriting =symbol to “=symbol” to create standard strings starting with ‘=’
  • Rewriting (expression) to “=expression

Note that a string component starting with ‘=’ is interpreted not as a literal to be matched in the data in the database but instead as a symbol or expression. This means that if a query is constructed or customized by program code at runtime, then those initial ‘=’ must be watched for and just disallowed as user input text for example. If necessary, the string “=’user input‘” can be substituted for user input, but the user input must be backslash quoted as well for the quotes. This only affects the parts of the PatternQuery where symbols and expressions are allowed.

Here is how this query looks in the InfinityDB Server backend database browser. This tool can be used to create and use PatternQueries via a browser directly on the server for maximum performance. The display is based only on the set of Items designating the query, according to certain universal formatting rules. The browser is unaware that these Items represent anything other than regular data, until they are submitted for execution. It is simple to edit and execute these Items online. As a consequence of the formatting rules the suffixes of the Where class come out arranged in a nice table across the bottom, one row per symbol, and one column per symbol attribute. The pattern Items fill the stretchy box below the green ‘pattern’ attribute, and the result Items are below the ‘result’ attribute. There are no limits to the practical complexity of a PatternQuery in terms of executability or representation. The display of the PatternQuery in the browser expands and packs down automatically to handle any structure.

Query

Now for almost instant performance, we rewrite the query to use the inversion we created above. This requires only changing the pattern line:

query {
    Where 'model' {
        from 200;
        to 300;
    }
    pattern AircraftModel =model id =id;
    result SelectedAircraft =id;
}

Joins

The pattern part can be arbitrarily complex, and can do the equivalent of the relational ‘select’, ‘project’ and ‘Join’, or ‘SPJ’, which is the fundamental set of database operations. Here is an example of a join:

query {
    pattern {
        Aircraft =id model =m maker =mk;
        AircraftMaker =mk country =c;
    }
    result Results maker =mk id =id model =m country =c;
}

This query joins on the =mk symbol to combine information from two places in the database. The Aircraft Class uses the =id symbol to identify particular kinds of aircraft, but the AircraftMaker Class factors out the common facts about the makers. The results are being inserted under the ‘Results’ Class. (The Results class may need to be cleared first manually, but see stepped queries below which can also do it). PatternQueries can also delete the result Items. The PatternQuery compiler rewrites the pattern Items internally to handle joins in the pattern and to handle a situation called a ‘cross product’ specified by the results.

Expressions

There can be Java-style expressions within the pattern, result, and some parts of the Where. These expressions allow limiting the pattern matching, or computing on symbol values for inclusion in the results, or determining certain attributes of the symbols. An expression in i-text format is specified inside parentheses. An expression in the pattern is very efficient, causing a restriction of the matching to the value of the expression, even though the expression value may be determined by other symbols in the pattern. The access performance is similar to that of a literal, so there is no redundant iteration and throwing away of candidate matches.

Symbols in expressions use Java syntax, and correspond to the symbols defined in the pattern Items. Expressions are compiled directly into Java Objects where they are interpreted at runtime, without requiring the actual Java compiler. The compilation takes only milliseconds, and a compiled query can be re-used, although there is no way to persist the executable form. The execution speed is much faster than the database accesses for most operations. The expression syntax uses only unary, binary, ternary and n-ary operators. There are many extensions beyond Java. Here is an example result where we format an item using the n-ary format() operator:

    result Results 'Aircraft make %s in %s makes %s model %s'.format(mk, c, id, m);

What Expressions Cannot Do

The expression facility does not provide a general purpose programming language, but instead has ‘functional’ semantics. It is nevertheless very expressive, much more so than SQL for example. This means there are no:

  • variables (but symbols take on values)
  • while, for, if, else or other block statements
  • references or pointers
  • methods or functions
  • argument lists
  • classes or interfaces
  • arrays (except items and tuples)
  • Exceptions (except the ‘undefined’ value from e.g. divide by 0)
  • source code files (other than optional i texts)

What Expressions Can Do

So what is left? A ‘program’ is purely declarative. Instead of having explicit loop statements, so all of the expressions logically execute each time a pattern matches, assigning values to all of the symbols from the data in the matched input Items. So for each match, the symbols have fixed i.e. immutable values, and all expressions perform fixed computations on those values. Effectively, looping happens during the nested pattern matching. For conditionals, instead of if and else, the ‘ternary operator’ is used like (x ? y : z). This has the value y if x is true, otherwise z. These can be nested or chained arbitrarily. Expressions can be given symbol names and referred to in other expressions. All expressions are by default checked using compile-time strong typing to avoid common mistakes. Runtime errors like divide by zero create temporary ‘undefined’ values that can be caught or can cause an Exception. Without strong type checking, runtime type checking and numeric promotion occurs.

Item Values

An additional datatype for the value of an expression, the ‘item’ is added to the basic 12 for the purposes of computing the expression. An item represents an immutable sequence of zero or more components, each being of the basic 12 types. (An item is manipulated in the expression interpreter as a simple array of Objects.) Literal items are parenthetical comma-separated lists. like (x,y,z) so an empty item is just () and a singleton item, i.e. an item of length one is (x). A singleton item value is interchangeable with a single component value for all operators.

Item values are not nestable, and when necessary, they are ‘flattened’, so they can be temporarily ‘unflattened’ while being computed. For example (1,(2,3),4,5) is legal and evaluates to (1,2,3,4,5), and the elements can be any expressions. Also they cannot contain ‘undefined’, or they will either throw an Exception in strict mode or will turn into ‘undefined’ themselves. In fact, the value ‘undefined’ cannot be stored in the database at all, and an item value or a regular encoded Item can not contain it. The ‘undefined’ value is transitory, existing only during expression evaluation.

There is a practical limit on item length, in that the encoded form must fit in the chars of a regular encoded Item when stored in the database, which is always up to 1665 chars long. While being evaluated in an expression, however, items are not limited in length.

The optional compile-type strong type checking takes into account whether any expression value may be an item. (The compile-time type of an expression is a set of 12 booleans, one per basic type, plus a boolean for ‘itemness’. Each of the booleans represents a type that can be the result of the evaluation.)

Tuples

A ‘tuple’ is an item value or regular encoded Item that happens to contain only primitive elements (primitives exclude the class and attribute types). This is in keeping with the regular mathematical definition of a tuple. Tuple values are not flagged as such while being computed, but the strong compile-time type checking can distinguish them and find many errors early. It is possible to set the data type of a plain symbol to ‘tuple’, and only tuples will be matched by the symbol from the database. (A future feature will allow strong type checking and filtering of items during retrievals to enforce that they contain only a given predetermined subset of the 12 basic types. For example, items of only numbers could be enforced. Currently, a matched item symbol can contain any components and a matched tuple symbol can contain any primitives. )

Quoting Tokens with At Sign

Any value of the 12 basic data types – which are called components – can be written literally in a standard text form called a ‘token’. These conflict sometimes with the Java expression syntax, but they can be embedded in expressions by surrounding them with at signs like @MyEntityClass Bytes(25_A6) myattribute@ and so on. It is usually possible to hoist such tokens outside the expressions into the surrounding syntax where the at signs go away because everything is a token already.

Operators

There are some extensions for operators beyond those in Java. These produce ‘undefined’ values for error conditions, which may be caught or ignored or which may be designated to cause an Exception by casting any expression or subexpression to (unstrict) or (strict). Values that are items of length one are called ‘singletons’ and are treated identically to simple components. Any value can be tested in any boolean context without error, and the default values are considered false, as is ‘undefined’. The default values are false, 0, 0.0f, 0.0, Index(0) or zero-length values including “” or (). Boolean contexts are operands of logical operators such as ||, &&, or ?:. Catching ‘undefined’ can be done with (x instanceof undefined ? 0 : x + y) or ((isdefined)x ? x : x + y), or for some uses, (x ? x + y : 0) and so on. Note that plain symbols can be declared selective for a set of chosen types and will skip data of non-selected types found in the database. Constructing items merely requires placing the desired elements in parentheses with commas, so there is no concatenation operator needed, which would tend to be ambiguous. Such item expressions can include elements which are also items, which are flattened at compile time or at runtime. Compile-time i.e. ‘strong’ type checking can be disabled by setting strict false on any symbol, so type checking then relies on runtime tests or type adaptation such as numeric type promotion.

Operator Arities

Operators work differently depending on the number of operands, i.e. their ‘arity’. Some, like get() can have two or three operands. The standard Java operators are all present with the same precedence but are also defined to work over items as well as the basic types as in Java.

Operator ArityDescription
UnaryHave one parameter. These distribute over items, so -(4, 5) is (-4, -5) and (‘hello’, ‘world’).length() is (5, 5). Operators can look like parameterless method invocations, like expr.length(), or can be casts as in (length)expr. Some operators can be expressed optionally with either syntax.
BinaryHave two parameters. These distribute over items on both sides. So 3 * (4, 5) is (12, 15). Also (4, 5) * 3 is (12, 15). Furthermore, (3, 4) * (5, 6) is (15, 24).
TernaryDistribute on left only, so (‘hello’, ‘world’).substring(1,3) is (‘el’,’or’). There are two right parameters, which are non-items, i.e. components i.e. singletons.
N-AryDo not distribute on the left, with a right parameter that evaluates to a single component or item or a variadic parameter list. For example, ‘%s %d’.format(x, y). Also ‘%d %d’.format(x) where x is (1,2) will work. This is a bit like the Python formatter.
Item-Ary Operators like get(index), get(start, end), head(end), tail(start) or size() operate on items for the left parameter. A component as usual is considered to be a singleton item. The right parameters are non-items.

The operators are:

OperatorArityDistributiveOperation
string.format(parameter_ list)N-AryfalseFormat according to the string, with the usual %s and other escapes. This is n-ary, so the parameters may optionally evaluate to a single item, which is used as the parameter list (like Python).
string.simple_date_format(date)BinaryfalseFormat the date according to the SimpleDateFormat string.
string ~= regexSpecialfalseMatch the string against the regular expression, yielding a boolean result. The regex is not surrounded by // as in JavaScript but instead is a normal string in quotes. The regex is a literal, not an expression, which is important to avoid denial of service attacks. As usual, be careful to write safe regexes (which is hard).
(totokens)exprItem-AryfalseCreate a string formatted according to component tokens. Expr may be a component or item. String components are quoted within the result.
(fromtokens)stringUnaryfalseParse a string to create an item based on the standard token formats of the basic types. The parsed string components are expected to be quoted within the given string. The string may be empty.
expr instanceof typeSpecialfalseReturn a boolean determined by expr being of a given type. The type names have no quotes. The types are the 12 basic types class, attribute, string, boolean, double, float, long, date, chars, bytes, bytestring and index. Also the general types number, component, primitive, and meta are recognized. The type ‘item’ is orthogonal to the basic types, returning true only for non-singletons, i.e true unless the length is 1.
(typeof)exprUnaryfalseReturn the single basic type of the expr. If it is a non-singleton, return ‘item’.
(isdefined)exprUnaryfalseReturn a true iff expr is not ‘undefined’.
(strict)expr or (unstrict)exprSpecialfalseSwitch on or off the runtime check for ‘undefined’. A strict expression or subexpression will throw an Exception if it evaluates to ‘undefined’. All runtime errors cause ‘undefined’ expression values and do nothing else.
(type)exprUnarytrueAny expression may be cast to any type when meaningful.
(string)exprUnarytrueString casting works on all types. All types may be cast from strings, but produce the string ‘(undefined)’ on errors for meaningless text. Each data type and each possible value has a unique text representation.
(boolean)exprUnarytrueAny type may be cast to boolean or used in a boolean context without errors. ‘Undefined’, and default values such as false, 0, 0.0f, 0.0 Index(0) or zero-length values such as “” or () are treated as false.
(head)expr or expr.head(endIndex)Item-AryfalseReturn the components of an item before endIndex. On out-of-bounds, return undefined. If endIndex is negative, it designates a position relative to the end of the item. The unary version means head(1)
(tail)expr or expr.tail(startIndex)Item-AryfalseReturn the part of an item at and after startIndex. On out-of-bounds return undefined. If startIndex is negative, it designates a position relative to the end of the item. The unary version means tail(1)
(length)exprUnarytrueThe length of a string, char array, byte array, or bytestring. Other data types return 0. Not like (size)
(size)ItemItem-AryfalseThe number of components in an item. If the left parameter is a component, it has length 1. The content of the item is ignored.
item.get(index)Item-AryfalseGet the component of an item at the given index. If the left parameter is a component, treat it as a singleton item. A negative index is relative to the end of the item.
get(startIndex, endIndex)Item-AryfalseGet, from an item, a sub-item from start to before end. If the left parameter is a component, treat it as a singleton item. The result may be an item of length one, which is a singleton and therefore equivalent to a component, or may be empty or any other length. A negative startIndex or endIndex is relative to the end of the item.
Java unary operators like minus (-) binary not (~)UnarytrueJava syntax. Distribute over the operand, if it is an item.
Arithmetic Java binary operators like ‘*’ ‘/’ ‘%’ ‘>>’ ‘|’BinarytrueJava syntax. Distribute on left and right over items. Left and right operands may have any length, but one operand must be of length 1 or else the lengths must be equal. The size combinations are thus 1:n, n:1 or n:n where n can be >= 0.
Java comparison i.e. equality or relational operatorsItem-AryfalseJava syntax. For example ‘==’ compares all elements of two items. The items may be of different lengths, but to be equal they must be the same length. A component is considered to be identical to a singleton, i.e. an item of length one. For ‘>’ and so on, first the initial components are compared, and if equal, then longer items are larger.
Java logical operators like ‘&&’ ‘||’ or ?:. and logical not (!)Item-AryfalseThese are just like Java, but do not distribute, and work on booleans as if the operands are cast to boolean however, a zero-length item is false, and an item longer than 1 is true.

Reducers

A powerful technique in functional programming which is used in ‘big data’ databases is the ‘reducer’. In PatternQueries, there is a general reducer facility, plus simplified pre-defined reducers like ‘sum’. A reducer has a state which is advanced as the database is logically iterated through during query execution. There is an initial value expression and a recursion expression including a self-reference to get the previous value and to determine the next value. Here is an example totaller (normally a simple sum symbol kind would be used instead):

query {
    pattern =symbol1 =symbol2 =sum;
    result =symbol1 =symbol2 =sum;
    Where 'sum' {
        kind 'reducer';
        from =symbol2;
        equals (symbol2 + sum);
        depends_on 'symbol1';
        comment 'note from starts at symbol2 not 0';
    }
}

In practice the =symbol1 and =symbol2 would be different, as appropriate to the task and might be part of a more complex pattern. The ‘kind’ attribute declares the ‘sum’ symbol a reducer. The ‘from’ attribute of the ‘sum’ symbol initializes the reducer before the first iteration to be the value of symbol2, then the ‘equals’ attribute contains an expression that advances to the next iteration by adding the previous value given by the ‘sum’ symbol (a self-reference) to the current value of symbol2. The optional ‘depends_on’ attribute causes the summer to be re-initialized each time symbol1 changes value during the database iteration. The symbols at the left in the pattern are in ‘outer’ loops and change more slowly than those to the right, so the depends_on references a symbol to the left. In practice, reducers are used for the most difficult tasks, and pre-defined ones are often used.

Fibonacci Reducer

Here is a reducer that creates the first Fibonacci numbers. We use a ‘series’ kind of symbol at the left or ‘outside loop’ of the pattern to create integers, but we don’t use those numbers in the reducer – they are just for providing a series of iterations to drive the reducer. Often the source of iterations for the outer loops will be database data.

query {
    pattern =n =fib;
    result  =n ('fib(%s)=%s'.format(n,(head)fib));
    Where  {
        'n' {
            kind 'series';
            to 10;
        }
        'fib' {
            from ( (0,1,1) );
            kind 'reducer';
            equals ( ((tail)fib, fib.get(1) + fib.get(2)) );
        }
    }
}

The state of the reducer is a three-element item, which is a tuple containing fib(n) fib(n+1) and fib(n+2). The ‘from’ attribute initializes this to the item (0,1,1). This produces:

0 "fib(0)=0"
1 "fib(1)=1"
2 "fib(2)=1"
3 "fib(3)=2"
4 "fib(4)=3"
5 "fib(5)=5"
6 "fib(6)=8"
7 "fib(7)=13"
8 "fib(8)=21"
9 "fib(9)=34"
10 "fib(10)=55"

Running Weighted Average

Here is a more complex reducer that prints the weighted average of the last n input values. It shows some of the real power of reducers. The last n values are kept in an item, which ‘rolls forward’ by means of an item constructor ( ((tail)last_n, random) ). The weighting is done by multiplying two items, last_n and weights_normalized, which is element-by-element. Then the predefined (sum) cast totals that. The (weights_normalized * 0) just creates the initial Item of zeroes because a component times an item distributes the operation over the item elements. The ‘weights’ symbol constructs an item from a comma-separated list or ‘item constructor’, which can contain any expressions and is automatically flattened. The weights,, input and output could be elsewhere, such as in the database.

query {
    pattern =n =random =last_n;
    result  ('n=%d random=%2d avg=%.3f'.format(n, random, avg));
    Where  {
        'n' {
            kind 'series';
            to 10;
            comment 'generate series of numbers';
        }
        'random' {
            kind 'random';
            to_exclusive 100;
            depends_on 'n';
            comment 'generate arbitrary input. random is long';
        }
        'last_n' {
            from ( weights_normalized * 0 );
            kind 'reducer';
            equals ( ((tail)last_n, random) );
            comment 'the last n values as an item';
        }
        'weights' equals ((1.0, 2.0, 3.0, 5.0, 6.0));
        'weights_normalized' equals (weights/(sum)weights);
        'avg' equals ((sum)(last_n * weights_normalized));
    }
    description 'running weighted sum of last n values';
}

The output of one run was:

0 "n=0 random=47 avg=0.000"
1 "n=1 random= 8 avg=2.824"
2 "n=2 random=64 avg=24.941"
3 "n=3 random=17 avg=26.235"
4 "n=4 random=75 avg=43.706"
5 "n=5 random=51 avg=51.059"
6 "n=6 random=52 avg=52.353"
7 "n=7 random=44 avg=49.647"
8 "n=8 random=58 avg=53.000"
9 "n=9 random=63 avg=56.176"
10 "n=10 random= 1 avg=37.353"

Symbol Kinds

Reducers are only one of several kinds of symbols that can be designated with the ‘kind’ attribute in the Where table. Some are ‘stateful’, meaning that as the iteration of the database progresses, their values are updated in some way. Plain symbols, which have no specified kind, are not stateful, but only retrieve data from the database and do not depend on the order of iteration (they may in principle be computed in parallel therefore). They do not need a Where table entry unless non-default symbol attributes are needed.

There is a small set of possible symbol attributes that have similar meanings for the different symbol kinds:

Symbol attribute in WhereValue TypeDescription
kindstringDetermines the purpose of the symbol. If absent, it is a plain symbol and just retrieves data. Otherwise, the symbol can only occur once in the pattern because only plain symbols can join. Plain or other kinds can occur multiple times in expressions and the results.
depends_onstring The name of a symbol whose value changes are monitored to determine when a ‘reset’ happens . The ‘reset’ for some kinds causes the value of the ‘from’ attribute’s expression to be transferred into the symbol’s state, or else a new random or time to be taken.
from, from_exclusiveexpressionThe reset value or starting point or minimum. May also be a symbol or literal. Any data type is allowable depending on the kind and type attribute.
to, to_exclusiveexpressionThe ending point or maximum. May also be a symbol or literal. Any data type is allowable depending on the kind and type attribute.
typeset of stringsOptional set of the 12 data types, or ‘tuple’ to match a series of components of the 10 primitive types or ‘suffix’ to match any series of components to the end of the Item. Also ‘number’ matches ‘double’ ‘float’ or ‘long’ and ‘meta’ matches ‘class’ (i.e. ‘entityclass’) or ‘attribute’. Furthermore, ‘component’ matches any single component i.e. a singleton tuple, and ‘primitive’ is like ‘component’ but excludes ‘class’ or ‘attribute’ and is the default. For plain symbols, these restrict the set of values matched from the input data. Other kinds do not accept multiple types.
equalsexpressionA function to be applied to the current state or for other purposes. When appropriate, this can be a series of expressions, effectively designating a ‘tuple’. Wherever an expression is allowed, a symbol or literal is allowable too.
firstbooleanWhether to use only the initial value iterated. For plain symbols, only the lowest value is retrieved and it takes only one access.
lastboolean Whether to suppress output other than for the final value iterated. For plain symbols, only the highest value is retrieved and it takes only one access. For example, a sum kind outputs only the total.
incrementnumberFor series and counter, default 1.

Below are the kinds. The ‘Plain’ and ‘Root’ kinds are not stateful.

KindKind namesSymbol AttributesDescription
Plain(none)type
from
from_exclusive
to
to_exclusive
equals
first
last
Plain symbols have the default kind, or are not present in the Where table. They simply retrieve data from the database (or from a root – see below). The types may be a set, allowing selective ‘hiding’ of other types. Or, the type can be just ‘item’ in order to match 0 or more of any basic type, or ‘tuple’ in order to match 0 or more primitives.
Reducerreducerdepends_on
type
from
equals
first
last
Compute a sequence of states based on current state and possibly other symbols. ‘from’ is for initialization, and ‘equals’ is for the recursion function, which can be self-referencing.
Timedate ,
time,
time seconds,
time milliseconds, time nanoseconds
depends_on
type
Snapshot the current date as a Date or time as a long or double in seconds, milliseconds, or nanoseconds. Setting the type to ‘string’ generates standard date text. A depends_on determines when to snapshot: otherwise the time is snapshotted only once.
Randomrandom,
secure random
depends_on
length
from
to_exclusive
Generate a random number as a long with a given bit length or range, or else a secure random byte array, defaulting to 32 bytes. A depends_on determines when to re-generate: otherwise only one random is generated.
Countercounterdepends_on
type
from
increment
Starting from a given ‘from’ attribute defaulting to 0, add a given increment defaulting to 1 and provide the count as the symbol value. Type defaults to long but may be set to double or float or else it is the type of the from or increment. Each iteration increments the count.
Sumsumdepends_on
type
from
equals
first,
last
A simple reducer that adds up either the immediately outer component in the pattern, or the equals expression if present. A depended-on symbol causes a reset to the outer component or the from expression.
Seriesseriesdepends_on
type
from
to
to_exclusive
increment
equals
Generate a series of numbers N (of type double, float, or long). The equals is an optional self-referencing expression giving a function of N and possibly other symbols that determines the symbol value. Either the ‘to’ or ‘to_exclusive’ is required. A series is not dependent on database data except to be reset by a depends_on symbol. The complete series is generated on each iteration of the ‘outer’ part of the pattern to the immediate left.
Root query definition,
stream input,
request header,
request content,
response header,
response content,
session
Patterns can match data in alternative input sources, and results can output to alternative destinations . The root symbols are outermost in patterns or results in order to specify where to input or output. A given query may input and output to multiple roots. Stream input is for PatternQueryItemSpace, and others are for REST Client Queries.
FilterfilterequalsA boolean expression determines whether to continue the logical tree visiting inwards or to backtrack outwards. It is a kind of conditional ‘fence’.

Steps

It is possible to have a query do multiple passes or ‘steps’ over the data. This is just like running multiple queries one after the other. So, the output of earlier steps can be inputs to later steps and temporary work space can be used between them. The Where table is shared between the steps. This feature is in development. For example:

query {
    Steps {
        1 {
            pattern Temporary HelloWorld;
            pattern_mode 'delete';
        }
        2 {
            pattern Input HelloWorld =x;
            result Temporary HelloWorld =x;
        }
        3 {
            pattern Temporary HelloWorld =y;
            result Output HelloWorld =y;
        }
    }
    Where 'x' type 'string';
}

For the step numbers any type is acceptable, and only the ordering is important. In step 1, the result defaults to be a copy of the pattern.

The ‘pattern_mode’ in step 1 causes the Items that match the pattern to be deleted along with their suffixes. This clears out the temporary area. Then step 2 uses that area as a work area. One could delete the work area at the end instead, or do both. By default, the patterns use mode ‘none’ and the results use mode ‘insert’. The pattern_mode can be default, ‘none’ or ‘delete’. The result_mode can be default, ‘none’, ‘insert’, ‘delete’, or ‘delete suffixes’. The PatternQuery execute() method provides parameters that determine the modes for the default cases. Many queries just default the modes, and the execution environment chooses to set the modes, which are usually ‘none’ for the pattern and ‘insert’, ‘delete’ or ‘delete suffixes’ for the results. The modes are available on non-stepped queries as well.

(Later, a unique work area will be possible to be chosen by putting in a symbol for the time in nanoseconds, or a random number, but currently the time or random is not transferred between steps. In the Client Queries case, there is already unique temporary space although it is restricted to being in memory because it is a VolatileItemSpace. )

Using temporary work space allows very advanced execution plans that can be very efficient. There are many other uses for the modes as well, such as clearing out the destination of the results so that the results do not simply merge with the pre-existing results. The merging or ‘unioning’ behavior is desirable in many situations, though, and it is the inverse of the deleting mode, so it is symmetrical and a query’s effects may be reversed by executing in delete mode after insert mode. The symmetry is not perfect in case there are collisions in the unioning, and the colliding Items will be deleted while in the delete mode. The symmetry also works by following a delete with an insert.

PatternQueryItemSpace

In order to upgrade the application so that it continuously maintains the inversion we defined above for AircraftModel we can:

  • change the application’s Java Code, or
  • add or change the application’s PatternQueries by adding a result Item for the AircraftModel Items where the result already includes Aircraft Items, or
  • set up a PatternQueryItemSpace that intercepts the mutations to the database and transparently maintains the inverted forms of the Aircraft Items under AircraftModel. Any set of per-Item transformations such as permutations of Item data, even with joins is possible. The events can be either mutations of the Aircraft Items, the AircraftModel Items, or both. The result can be any combination as well.

An elegant way to handle the maintenance of the Item replications is to keep all the PatternQueries in the database itself. Then the application invokes them where they live, directly in the database, and they can be edited or copied from elsewhere in order to keep them in synch with the data structures. The database becomes a self-describing unit, with its structure and the operations it can perform against itself all in one place. The Java code can still contain some hard-coded parts, such as legacy code, as desired.

Alternatively, a PatternQueryItemSpace can transparently extend the hard-coded part or the other PatternQueries, by using a single per-database PatternQuery, whose definition is stored anywhere. The application constructs one PatternQueryItemSpace in order to wrap the database, and then does operations on it instead of on the database directly. In InfinityDB Client/Server, a special per-database PatternQuery is called the system query and is stored under ‘System Query’ (that’s the System EntityClass immediately followed by the Query EntityClass and then the pattern query definition). The PatternQuery sees an input stream of individual Items being inserted into or deleted from the wrapped database, and it rewrites them and can optionally join them with database Items to produce a set of output Items going back into the database. If the inputs are deletions, then the outputs are deletions as well. There is no other difference in the handling of insertions or deletions.

Performance will not reach the high levels of direct InfinityDB access, of course, but it can be concurrent. There is a SimplePatternQueryItemSpace and an alternative ThreadedPatternQueryItemSpace, both driven by a lower-level TriggerItemSpace. The PatternQuery definition affects the setup of the hidden TriggerItemSpace described below so that mostly only the necessary Items are captured and used, and the rest of the action is relatively fast.

For example, a system query could be (ignoring some Root symbol bookkeeping):

System Query query {
    pattern {
        Aircraft =id1 model =model1;
        AircraftModel =model2 id =id2;
    }
    result {
        Aircraft =id2 model =model2
        AircraftModel =model1 id =id1;
    }
}

Simplified Items

The output of the PatternQueryItemSpace or any other PatternQuery can if desired reduce the amount of information in the individual input Items reaching the database by including in some result Items less than all of the symbols of corresponding pattern Items, but this affects deletion. Without such ‘simplification’ of Items, the variable parts of Items are simply ‘permuted’, producing Items with the same meaning.

As an example, consider taking Items that include a ‘country’ value and storing that country value in a separate place in the database without the entire original Items. Then, the set of ‘seen’ countries would accumulate but there would be no way to deduce the original Items from which they came.

Collisions

On insertion, the ‘simplified’ result Items will sometimes harmlessly collide, but this may be fine, and in fact may allow very high performance improvements when the ‘simplified’ Items are used as later input. For example, the simplified Items can be joined to in other patterns. The simplified Items may take up relatively little space.

However, later deletions will generate the same simplified Items and ‘too much’ will be deleted. For this reason, Stepped queries may be used to separate insertions from deletions. Without deletions, the simplified Items will only accumulate, but in many cases this is acceptable, or even desirable. If necessary, the set of simplified Items can be manually deleted and an appropriate PatternQuery can reconstruct them from the source data, so they are refreshed and minimal. In many cases, the simplified Items will actually not collide based on the data characteristics, so there is no issue.

These considerations reflect no inherent limitation of PatternQuery but are logical consequences of storing different Items representing the same information but that omit some data rather than ‘permuting’ it.

TriggerItemSpace

A TriggerItemSpace is a a part of a PatternQueryItemSpace when used in the Server as the System Query, but it can also be used alone by wrapping any ItemSpace, to make the ItemSpace ‘active’ rather than ‘passive’. Any retrieval or mutation operation can be captured and custom Java code can be invoked. Pre-retrieval, post-retrieval, insert, and delete are independently handled, and if one of them is not active, it does not cause delays. A small set of fixed prefixes of various lengths can be detected to trigger different associated actions. The prefixes are independent, so some can be prefixes of others. Matching can be exact or by prefix.

Trigger Implementation

The TriggerItemSpace does a quick hash of a single fixed prefix length of the input Items to determine whether they are candidates to be intercepted, and if so then it compares the necessary associated variable-length prefixes. The length of the hashed prefix part is the minimum of the full prefixes, so if the minimum prefix is long, most operations only incur the hashing overhead, and short Items are not even hashed. The custom code can pass the triggering Item through to the underlying ItemSpace or it can block it and then it can do anything desired. Access is fully concurrent, with no locks or synchronizations. (Lock-free concurrency techniques are used). The trigger code can be replaced, and triggers can be added or removed at any time even with multiple threads active. About a few hundred to a few thousand triggers would be practical at one time. The triggered action code executes without locks, i.e. completely concurrently. Performance of the trigger detection should often be higher than the underlying InfinityDB ItemSpace, so little is lost.

Client Queries

An advanced feature under development is the ‘Client Query’ based on the REST API of the InfinityDB Client/Server version 6.0. The client does a REST access, but instead of accessing database data directly, a PatternQuery is transparently executed to input from the request, do the database access, and then output the results. Root symbols direct the access to the REST request header, request content, response header, response content, session data, and the usual database data.

If desired, the PatternQueries can be kept in the database itself, so a database becomes a self-describing unit including not only data but also the set of operations that can be safely and efficiently applied to it, according to its particular data structures. This should produce a more future-proof and secure system overall.

A typical use would be for JavaScript code on a web page to do an AJAX invocation of the Client Query and put the results on the page. Also the JavaScript can determine the AJAX input parameters based on user input. Any other way to hit a REST URL will work, such as curl, Python code, or Java code, or even a web browser.

Implementation Details

Like SQL, PatternQuery is not Turing complete, but all queries terminate eventually, so there are no infinite loops. A PatternQuery cannot run out of memory except for intentional use of temporary space, such as the session state in a REST client query . The execution engine uses only nested loops, without temporary space, but a stepped query can explicitly use temporary space or modify the database and then input the modifications. Unlike relational, the nested loops form a tree. Optimization is rule-based and almost immediate in all cases.

Queries can overlap, do not interfere, and are not limited in degree of concurrency. The multi-core concurrency of the underlying InfinityDB BTree is fully utilized. Each query executes single-Threaded (so far!). There are no inter-query locks, or in fact any locks at all within the PatternQuery code, except that if a query is executed in the context of an Optimistic transaction, then OptimisticLockConflictExceptions can be thrown and retries may be needed, as with any other code.

Common subexpressions are factored out and values are cached where helpful. A notification tree propagates value change events, so expressions and subexpressions are not unnecessarily re-executed when referred to. This can be important when doing regex matching and text formatting in relatively outer parts of the pattern.

In the future, parallel query execution may use stepped queries allow the parallel operations to be partially serialized.

Here is a printout of the execution plan of a particular test query. The symbols in the test definition are all named symbol(n) and the literals are numbers from 1 to 15, with no expressions. The nested loops are indicated by indentation. This plan is efficient because it is not just a diagonal but has many ‘outdents’ where backtracking can occur, showing the tree structure of the looping. The looping is dependent on the particular pattern used and is not always possible to flatten to this degree. Joins and cross-products in the result produce various ‘edits’ to the pattern not shown here. Terms in parentheses are from the pattern, and those with square brackets are from the result.

(root)
    (1:0)
        (=symbol1:1)
            (2:2)
                (=symbol2:3)
                    *:4t [11:0] [=symbol1:1] [12:2] [=symbol2:3]
            (3:2)
                (4:3)
                    (=symbol4:4)
                        *:5t [11:0] [=symbol1:1] [13:2] [14:3] [=symbol4:4]
                (5:3)
                    (=symbol3:4)
                        *:5t [11:0] [=symbol1:1] [13:2] [15:3] [=symbol3:4]

This contrasts with a relational system, in which the nesting of a nested loops plan is only diagonal, determining effectively a huge logical cross product that is filtered at appropriate points. Nested loops in a relational system are therefore not always very efficient, and various other join mechanisms are often used, but the selection of the join mechanisms is very complex and unpredictable in effect. Query optimization may take substantial time due to the difficulty of finding a reasonably efficient plan, and the efficiency is still often low, possibly extremely so. Plans other than nested loops require temporary space of some sort.