PatternQuery Usage

PatternQueries are declarative rather than procedural, using sets of Items as examples of input and output called ‘pattern’ and ‘result’. A query definition is a set of Items, just like any other data in the database. So, query definitions can be operated on and stored by any tools or systems that can work on databases or any other ItemSpace. See PatternQuery Examples for some sample code, and see PatternQuery Implementation for a discussion of the internals.

This document intersperses Java language concepts and code with the PatternQuery level concepts and code. However, no Java language understanding is necessary to make full use of PatternQueries. In the InfinityDB version 6 server, for example, PatternQueries may be worked with through the backend database browser UI without any Java code. Knowing basic Java is helpful, because PatternQuery uses concepts that are as similar to Java as possible. Also, the basic InfinityDB ItemSpace concepts are important, and although expressed with Java, the ideas are independent.

Essential Concepts

Patterns and ResultsAre simple sets of Items containing Literals, Symbols, and Expressions. Logically, the database is traversed like a tree i.e. recursively corresponding to the pattern Items. A Symbol matches an Item or a variable part of an Item like a number, a string, or a tuple. If individual symbols occur in multiple pattern Items, joins are formed. The result is also a set of Items that define how the symbol values and expressions using symbol values are to be used to insert or delete newly created Items . If symbols in a result Item occur in multiple pattern Items, cross-products are formed.
Symbol attributesEach symbol can be defined to have a minimum, a maximum, a set of restricted data types, strong type checking, and runtime strict detection of undefined values and more. Symbols can also be ‘named expressions’. Some symbol attributes can be defined as expressions, hence are determined at runtime. Symbols can have selectable ‘actions’. Symbol attributes are specified like Where 'symbol name' attribute 'value'.
ExpressionsThese follow the standard expression syntax of Java, JavaScript, C, and C++. The data types are the 12 InfinityDB component types, plus an ‘item’ data type, being a series of zero or more components. A sub-type of item is the ‘tuple’, which is a sequence of zero or more of the 10 primitive types. Items and tuples resemble the Python tuple datatype but do not nest. Optional strong typing checks for many execution errors in advance. There are many extension operators such as regex and text formatting, with more on the way. Expressions may occur in the pattern, in the result, and in certain symbol attributes.
Stateful SymbolsSome kinds of symbols have ‘statefulness’ and are affected by the logically recursive navigation. These include counters, timers, randoms, and reducers. A reducer allows any expression to be re-applied to its previous value, for extremely powerful iterated or recursive operations in a functional style.
RootMap SymbolsRootMap symbols allow multiple sources for the pattern input and multiple destinations for the result output. Items can be matched from different databases, from a REST query input or output, from a stream of input Items, and more.
Filter SymbolsThe logically recursive navigation can be controlled by boolean expressions in the pattern to prevent visiting selected subtrees.

Joins

In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a Pattern Query effectively includes the decision about the use of such ‘indexes’ or ‘inversions’ to be encoded directly, rather than being hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed, given good Item structures. Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins.

Sample Query

A simple query can be specified using the ‘i code’ syntax. This is a text format that simply translates directly into Items that can then be executed. It is easier to read, and can be embedded into Java code easily. The InfinityDB backend database browser allows direct editing, saving, and executing queries expressed in i code and other forms. Here we have a query that examines Items beginning with an ‘Aircraft’ class, finding the id’s and models for each aircraft. The =id symbol represents a particular aircraft identifier, and the =model symbol represents the model of that aircraft. The input of the query is defined by the components after the ‘pattern’ attribute on the pattern line, and the output of the query is the part after the result attribute on the result line, in which the class becomes ‘AircraftModel’, and the =model and =id symbols are reversed. This is a simple ‘inversion’ that produces a re-sorting of all of the aircraft into an additional set of Items in the database. Such an inversion can later be used to find ids given a model with one database access, either by application Java code or other queries.

query {
    pattern Aircraft =id model =model;
    result AircraftModel =model id =id;
    comment 'do a simple inversion of Aircraft';
}

The contents of the database before the query might be:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300

And after the query executes, additional Items appear:

Aircraft "747" model 100
Aircraft "747" model 200
Aircraft "757" model 300
AircraftModel 100 id "747"
AircraftModel 200 id "747"
AircraftModel 300 id "747"

If your existing code does not use components of the EntityClass or Attribute data types like Aircraft or model but instead you use constants of some sort, you can just substitute those. (The EntityClass and Attribute data types are two of the 12 standard types, and formatted as text they look like Aircraft or model.) The ‘query’, ‘pattern’ and ‘result’ components are Attributes. There is often a ‘Where’ class along with the pattern and result to describe the symbols further. The syntax of the above simply describes two Items for the query definition, which in the standard Item tokenized form look like:

query pattern Aircraft "=id" model "=model"
query result AircraftModel "=model" id "=id"

Application code can rewrite or construct a query in any way desired because the query is actually a set of Items. The text above is translated by iParser into these Items, and it can be re-created and made tidy by iFormatter. The ‘i code’ syntax is trivial. (It is not a real language, and things that are not real are imaginary, so it’s i). Here is how to execute some i code:

String queryICode = "the query i code";
// Create a VolatileItemSpace with the Items. Or, an existing ItemSpace can be used.
ItemSpace querySpace = new IParser(queryICode).parse();
PatternQuery patternQuery = new PatternQuery(querySpace);
patternQuery.compile();
// InputSpace and outputSpace are often your database but can be any ItemSpace
patternQuery.execute(inputSpace, outputSpace);

Above we used ItemSpaces, but if you work up at the level of the InfinityDBMap API, just substitute those.

Here is a slightly longer query, where we put the result into ‘SelectedAircraft’, and we limit the range of the model to 200 to 300. The execution requires scanning the entire set of Aircraft and filtering by range on model. We will show a faster version below. This would rely on a similar execution plan compared to an RDBMS as well without indexes.

query {
    Where 'model' {
        from 200;
        to 300;
    }
    pattern Aircraft =id model =model;
    result SelectedAircraft =id;
}

This query takes four Items , here in tokenized form, and sorted by Item as usual:

query Where "model" from 200
query Where "model" to 300
query pattern Aircraft "=id" model "=model"
query result SelectedAircraft "=id"

Any i code can also be written in a ‘Python style’, even mixed with the ‘Java style’, and both are interchangeably recognized by the compiler. The language is extremely simple:

query:
    Where 'model':
        from 200
        to 300
    pattern Aircraft =id model =model
    result SelectedAircraft =id

The pattern and results can be multi-line like Where, and there can be nesting. To the left of each opening brace or colon is a series of tokens for components to prefix to all of the nested suffixes. The i code format is really just for:

  • Factoring out common prefixes
  • Changing single quotes to double (double quoted strings are allowed as well)
  • Rewriting =symbol to “=symbol” to create standard strings starting with an equal sign
  • Rewriting (expression) to “=expression
  • Changing lists with [..,.. ] to the InfinityDB Index data types, which are similar to longs that serve as list indices.

Here is how this query looks in the InfinityDB Server backend database browser in table display mode. This tool can be used to create and use PatternQueries via a browser directly on the server for maximum performance. The table display is based only on the set of Items designating the query, according to certain universal formatting rules. The browser is unaware that these Items represent anything other than regular data, until they are submitted for execution. It is simple to edit and execute these Items online. As a consequence of the formatting rules the suffixes of the Where class come out arranged in a nice table at the bottom, one row per symbol, and one column per symbol attribute. The pattern Items fill the stretchy box below the green ‘pattern’ attribute, and the result Items are below the ‘result’ attribute. There are no limits to the practical complexity of a PatternQuery in terms of executability or representation. The display of the PatternQuery in the browser expands and packs down automatically to handle any structure.

Query

Now for almost instant performance, we rewrite the query to use the inversion we created above. This requires only changing the pattern line:

query {
    Where 'model' {
        from 200;
        to 300;
    }
    pattern AircraftModel =model id =id;
    result SelectedAircraft =id;
}

Joins Again

The pattern part can be arbitrarily complex, and can do the equivalent of the relational ‘select’, ‘project’ and ‘Join’, or ‘SPJ’, which is the universally fundamental set of database operations. A join happens when a particular symbol occurs in multiple patterns. There can be multiple joins at the same time on different symbols forming a ‘multi-way’ join, and there can be more than two repetitions of a symbol to form ‘higher order’ joins although these are rarer. A first order join is also known as an ‘equi-join’ or ‘full join’ or just ‘join’. Furthermore, a symbol can occur multiple times within the same pattern forming a ‘lengthwise’ join. ‘Crosswise’ joins connect the suffixes of different patterns, although the distinction between lengthwise and crosswise is only terminology. Patterns that participate in joins are closely related, and their pattern matching requires that all of the related patterns match at once. When such a set of patterns match, all of the symbols in any of the patterns are given values together.

Here is an example of a join:

query {
    pattern {
        Aircraft =id model =m maker =mk;
        AircraftMaker =mk country =c;
    }
    result Results maker =mk id =id model =m country =c;
}

This query joins on the =mk symbol to combine information from two places in the database. The Aircraft Class uses the =id symbol to identify particular kinds of aircraft, but the AircraftMaker Class factors out the common facts about the makers. The results are being inserted under the ‘Results’ Class. The PatternQuery compiler rewrites the pattern Items internally to handle joins in the pattern and to handle a situation called a ‘cross product’ specified by the results.

Joins are a vital and powerful part of any DBMS. Not only do they determine the semantics of the query, but they also provide ways to increase performance tremendously, when applied correctly.

Expressions

There can be Java-style expressions within the pattern, result, and some parts of the Where. These expressions allow limiting the pattern matching, or computing on symbol values for inclusion in the results, or determining certain attributes of the symbols. An expression in i-text format is specified inside parentheses. Symbols in expressions correspond to the symbols defined in the pattern Items. Symbol names use the Java syntax for identifiers, which are mixtures of letters, digits, underscore, and dollar sign, not starting with a digit.

An expression in the pattern is very efficient, causing a restriction of the matching to the value of the expression, even though the expression value may be determined by other symbols in the pattern. The access performance is similar to that of a literal. Expressions are compiled directly into Java Objects where they are interpreted at runtime, without requiring the actual Java compiler. The compilation takes only tens of microseconds, and a compiled query can be re-used without re-compiling, although there is no way to persist the executable form. The executable form is not thread-safe but each thread can simply compile a new re-usable one for itself. The execution speed is much faster than the database accesses for most operations, and the Just-In-Time compiler can optimize the expression evaluation code easily.

The expression syntax uses only unary, binary, ternary and n-ary operators. There are many extensions beyond Java. Here is an example result where we format an item using the n-ary format() operator:

    result Results 'Aircraft make %s in %s makes %s model %s'.format(mk, c, id, m);

What Expressions Cannot Do

The expression facility does not provide a general purpose programming language, but instead has ‘functional’ semantics. It is nevertheless very expressive, much more so than SQL for example. This means there are no:

  • variables (but symbols take on various values during execution)
  • while, for, if, else or other block statements
  • references or pointers
  • classes or interfaces
  • arrays (except items and tuples)
  • Exceptions (except the undefined value from e.g. divide by 0)
  • source code files (other than optional i code)

What Expressions Can Do

So what is left? A ‘program’ is purely declarative. Instead of having explicit loop statements, all of the expressions logically execute each time a pattern or set of joined patterns matches, assigning values to all of the symbols from the data in the matched input Items that are necessary to fire related result Items. So for each match, the symbols have fixed i.e. immutable values, and all expressions perform fixed computations on those values. Effectively, looping happens in a nested way during the nested pattern matching. For conditionals, instead of if and else, the ‘ternary operator’ is used like x ? y : z. This has the value y if x is true, otherwise z. These can be nested or chained arbitrarily. Expressions can be given symbol names and referred to in other expressions: these are called ‘expression symbols’, as inWhere 'my_symbol' equals (other_symbol * 2);. All expressions are by default checked using compile-time strong typing to avoid common mistakes. Runtime errors like divide by zero create temporary undefined values that can be caught or can cause an Exception, but which never reach any ItemSpace, since they cannot be stored there. There is also runtime type checking and automatic numeric promotion when necessary.

External Java Code

As a side note: Currently, arbitrary external Java code cannot be invoked from within an expression. Until the security implications of such integration can be worked out, this will remain the case. Also, PatternQueries, like SQL, are intended to be portable and not sensitive to the local execution environment. Furthermore, environments such as the InfinityDB version 6 server will not usually provide a way to add access to arbitrary class files, at least not without a restart. Possibly the IoT-as-server case will be an exception, where the server is under local control in a dedicated device. Some safe java.lang.. and java.util.. classes and methods are being integrated.

Item Values

An additional datatype for the value of an expression, the ‘item’ is added to the basic 12 for the purposes of computing the expression. An item represents an immutable sequence of zero or more components, each being of the basic 12 types. (An item is manipulated in the expression interpreter as a simple array of Objects.) Literal items are parenthetical comma-separated lists. like (x,y,z) so an empty item is just () and a singleton item, i.e. an item of length one is x alone or (x). A singleton item value is interchangeable with a single component value.

Item values are not nestable, and when necessary, they are ‘flattened’, so they can be temporarily ‘unflattened’ while being computed. For example (1,(2,3),4,5) is legal and evaluates to (1,2,3,4,5), and the elements can be any expressions. Also they cannot contain undefined, or they will either throw an Exception in strict mode or will turn into undefined themselves. In fact, the value undefined cannot be stored in the database at all, and an item value or a regular encoded Item can not contain it. The undefined value is transitory, existing only during expression evaluation.

There is a practical limit on item length, in that the encoded form must fit in the chars of a regular encoded Item when stored in the database, which is always up to 1665 chars long. While being evaluated in an expression, however, items are not limited in length.

The optional compile-time strong type checking takes into account whether any expression value may be an item. (The compile-time type of an expression is a set of 12 booleans, one per basic type, plus a boolean for ‘itemness’. Each of the booleans represents a type that can be the result of the evaluation.)

Tuples

A ‘tuple’ is an item value or regular encoded Item that happens to contain only primitive elements (primitives exclude the class and attribute i.e. meta types). This is in keeping with the regular mathematical definition of a tuple. Tuple values are not flagged as such while being computed, but the strong compile-time type checking can distinguish them and find many errors early. It is possible to set the data type of a plain symbol to ‘tuple’, and only tuples will be matched by the symbol from the database. (A future feature will allow strong type checking and filtering of items during retrievals to enforce that they contain only a given predetermined subset of the 12 basic types. For example, items of only numbers could be enforced. Currently, a matched item symbol can contain any components and a matched tuple symbol can contain any primitives. )

Quoting Tokens with At Sign

Any value of the 12 basic data types – which are called components – can be written literally in a standard text form called a ‘token’. These conflict sometimes with the Java expression syntax, but they can be embedded in expressions by surrounding them with at signs like @MyEntityClass Bytes(25_A6) myattribute@ and so on. It is usually possible to hoist such tokens outside the expressions into the surrounding syntax where the at signs go away because everything is a token already in i code.

Operators

There are some extensions for operators beyond those in Java. For example, some operators produce ‘undefined‘ values for error conditions, which may be caught or ignored or which may be designated to cause an Exception. Expressions or subexpressions can be cast to (unstrict) or (strict) to control whether undefined should cause an Exception. Values that are items of length one are called ‘singletons’ and are treated identically to simple components. Any value can be tested in any boolean context without error, and the default values are considered false, as is undefined. The default values are false, 0, 0.0f, 0.0, Index(0) or zero-length values including “” or (). Boolean contexts are operands of logical operators such as ||, &&, or ?:. Catching undefined can be done with (x instanceof undefined ? 0 : x + y) or ((isdefined)x ? x : x + y), or for some uses, (x ? x + y : 0) and so on. Note that plain symbols can be declared selective for a set of chosen types and will skip data of non-selected types found in the database. Constructing items or tuples merely requires placing the desired elements in parentheses with commas, so there is no concatenation operator needed. Such item expressions can include elements which are also items, which are flattened at compile time or at runtime. Compile-time i.e. ‘strong’ type checking can be disabled by setting the strict symbol attribute false on any symbol, so type checking then relies on runtime tests or type adaptation such as numeric type promotion.

Operator Arities

Operators work differently depending on the number of operands, i.e. their ‘arity’. Some, like get() can have two or three operands. The standard Java operators are all present with the same precedence but are also defined to work over items as well as the basic types as in Java.

Operator ArityDescription
UnaryHave one parameter. These distribute over items, so -(4, 5) is (-4, -5) and (‘hello’, ‘world’).length() is (5, 5). Operators can look like parameterless method invocations, like expr.length(), or can be casts as in (length)expr. Some operators can be expressed optionally with either syntax.
BinaryHave two parameters. These distribute over items on both sides. So 3 * (4, 5) is (12, 15). Also (4, 5) * 3 is (12, 15). Furthermore, (3, 4) * (5, 6) is (15, 24).
TernaryDistribute on left only, so (‘hello’, ‘world’).substring(1,3) is (‘el’,’or’). There are two right parameters, which are non-items, i.e. components i.e. singletons.
N-AryDo not distribute on the left, with a right parameter that evaluates to a single component or item or a variadic parameter list. For example, ‘%s %d’.format(x, y). Also ‘%d %d’.format(x) where x is (1,2) will work. This is a bit like the Python formatter.
Item-Ary Operators like get(index), get(start, end), head(end), tail(start) or size() operate on items for the left parameter. A component as usual is considered to be a singleton item. The right parameters are non-items.

The operators are:

z

OperatorArityDistributiveOperation
string.format(parameter_ list)N-AryfalseFormat according to the string, with the usual %s and other escapes. This is n-ary, so the parameters may optionally evaluate to a single item, which is used as the parameter list (like Python).
string.simple_date_format(format)
date.simple_date_format(format)
BinaryfalseFormat or parse the date according to Java’s SimpleDateFormat using a format like ‘MM/dd/yy HH:mm:ss ZZ’ .
string ~= regexSpecialfalseMatch the string against the regular expression, yielding a boolean result. The regex is not surrounded by // as in JavaScript but instead is a normal string in quotes, as in Java. Being a constant is important to avoid denial of service attacks. As usual, be careful to write safe regexes (which is hard).
(totokens)exprItem-AryfalseCreate a string formatted according to component tokens. Expr may be a component or item. String components are quoted within the result.
(fromtokens)stringUnaryfalseParse a string to create an item based on the standard token formats of the basic types. The parsed string components are expected to be quoted within the given string. The string may be empty.
expr instanceof typeSpecialfalseReturn a boolean determined by expr being of a given type. The type names have no quotes. The types are the 12 basic types class, attribute, string, boolean, double, float, long, date, chars, bytes, bytestring and index. Also the general types number, component, primitive, and meta are recognized. The type ‘item’ is orthogonal to the basic types, returning true only for non-singletons, i.e true unless the length is 1.
(typeof)exprUnaryfalseReturn the single basic type of the expr. If it is a non-singleton, return ‘item’.
(isdefined)exprUnaryfalseReturn a true iff expr is not undefined.
(strict)expr or (unstrict)exprSpecialfalseSwitch on or off the runtime check for undefined. A strict expression or subexpression will throw an Exception with an explanatory message if it evaluates to undefined All runtime errors cause undefined expression values and do nothing else.
(double)expr, (float)expr, (long)exprUnarytrueNumber, boolean, string, or date can be cast to any numeric type. Casts from date return milliseconds. Casts from boolean return 0 or 1, 0.0 or 1.0, or 0.0f or 1.0f.
(date)exprDouble, float, long, string, or date can be cast to date. Numbers are interpreted as milliseconds.
(string)exprUnarytrueString casting works on all types. All types may be cast from strings, but produce the string ‘(undefined)’ for meaningless text. Each possible value has a unique text representation that also encodes the type. For castable types, (type)(string)expr == expr. A special case is that a tuple becomes a string of tokens, so for example the item (‘x’, ‘y’,5) becomes the string"x" "y" 5 and () becomes the empty string.
(boolean)exprUnarytrueAny type may be cast to boolean except that (boolean)’false’ is false and (boolean)’true’ is true and other strings are undefined. This is required so that (boolean)(string)expr == expr where expr is boolean. For the logical operators && || ?: and ! only undefined, and default values such as false, 0, 0.0f, 0.0 Index(0) or “” are treated as false. Items of size > 1 are true.
(index)exprThe Index data type can be created from any positive numeric type.
(item)exprConvert expr to an internal Java Object[]. This is only for external Java code (some day).
(head)expr or expr.head(endIndex)Item-AryfalseReturn the components of an item before endIndex. On out-of-bounds, return undefined. If endIndex is negative, it designates a position relative to the end of the item. The unary version means head(1)
(tail)expr or expr.tail(startIndex)Item-AryfalseReturn the part of an item at and after startIndex. On out-of-bounds return undefined. If startIndex is negative, it designates a position relative to the end of the item. The unary version means tail(1)
(length)exprUnarytrueThe length of a string, char array, byte array, or bytestring. Other data types return 0. Not like (size)
(size)ItemItem-AryfalseThe number of components in an item. If the left parameter is a component, it has length 1. The content of the item is ignored.
item.get(index)Item-AryfalseGet the component of an item at the given index. If the left parameter is a component, treat it as a singleton item. A negative index is relative to the end of the item.
get(startIndex, endIndex)Item-AryfalseGet, from an item, a sub-item from start to before end. If the left parameter is a component, treat it as a singleton item. The result may be an item of length one, which is a singleton and therefore equivalent to a component, or may be empty or any other length. A negative startIndex or endIndex is relative to the end of the item.
Java unary operators minus (-) and binary not (~)UnarytrueJava syntax. Distribute over the operand, if it is an item.
Arithmetic and bit-wise Java binary operators * / % >> >>> << | &BinarytrueJava syntax. Distribute on left and right over items. Left and right operands may have any length, but one operand must be of length 1 or else the lengths must be equal. The size combinations are thus 1:n, n:1 or n:n where n can be >= 0.
Java comparison i.e. equality or relational operators == != > < >= <=Item-AryfalseJava syntax. For example ‘==’ compares all elements of two items. The items may be of different lengths, but to be equal they must be the same length. A component is considered to be identical to a singleton, i.e. an item of length one. For ‘>’ and so on, first the initial components are compared, and if equal, then longer items are larger. The compared values may have different data types in any combination.
Java logical operators && || ?: !Item-AryfalseThese are just like Java, but do not distribute. Only undefined, and default values such as false, 0, 0.0f, 0.0 Index(0) or “” or zero-length values are treated as false. Items of size > 1 are true. No errors are possible except from the expressions after ? or :.

Symbol Attributes

There is a small set of possible symbol attributes that have similar meanings for different kinds of symbols. Symbol attributes are specified under the Where class under the query attribute. For example, a depends_on symbol attribute could be defined with Where 'mysymbol' depends_on 'some_other_symbol';

Symbol attribute in WhereValue TypeDescription
kindstringDetermines the purpose of the symbol. If absent, it is a plain symbol and just retrieves data. Otherwise, the symbol can only occur once in the pattern because only plain symbols can join. Plain or other kinds can occur multiple times in expressions and the results.
depends_onstring The name of a symbol whose value changes are monitored to determine when a ‘reset’ happens . The ‘reset’ for some kinds causes the value of the ‘from’ attribute’s expression to be transferred into the symbol’s state, or else a new random to be generated or the time to be taken.
from, from_exclusiveexpressionThe reset value or starting point or minimum. The type of the expression depends on the kind and type attribute.
to, to_exclusiveexpressionThe ending point or maximum. The type of the expression depends on the kind and type attribute. .
typeset of stringsOptional set of the 12 data types, or ‘tuple’ to match a series of zero or more components of the 10 primitive types or ‘item’ to match a series of zero or more components to the end of the Item. Also ‘number’ matches ‘double’ ‘float’ or ‘long’ and ‘meta’ matches ‘class’ (i.e. ‘entityclass’) or ‘attribute’. Furthermore, ‘component’ matches any single component i.e. a singleton tuple, and ‘primitive’ is like ‘component’ but excludes ‘class’ or ‘attribute’ and is the default. For plain symbols, these restrict the set of values matched from the input data. Other kinds do not accept multiple types. The set does not require any special syntax, so it can be like Where 'mysymbol' type { 'double'; 'float'; }
equalsexpressionA function to be applied to the current state or for other purposes for each symbol kind. When appropriate, this can be a series of expressions, effectively designating a ‘tuple’.
firstbooleanWhether to use only the initial value iterated. For plain symbols, only the lowest value is retrieved and it takes only one access.
lastboolean Whether to suppress output other than for the final value iterated. For plain symbols, only the highest value is retrieved and it takes only one access. For example, a sum kind outputs only the total.
incrementexpressionFor series and counter, the amount to add per iteration, default 1. For plain symbols, the number of input values to skip from the initial or from value, with positive being forwards, and negative backwards.
actionstringWhat do do when a pattern matches and result fires.
lengthexpressionThe size in bits of a random, size in bytes of a secure random, or for a plain symbol, the maximum number of database values to iterate over before ‘exiting’ the loop.
strictbooleanWhether to do strict compile-time type checking on the symbol, default true. In very rare cases, disabling strict may be needed.
commentstringuninterpreted text for humans to read

Actions

Each symbol has an ‘action’ that determines what happens when patterns ‘match’ and results ‘fire’. An action is a symbol attribute, and can be omitted, to get default ‘copy’ behavior. When a pattern or set of joined patterns match, the symbols they contain have values assigned to them for the purpose of creating result Items. A result ‘fires’ when all of the symbols referred to in the result are available with assigned values. If a symbol is at the end of a pattern, then the ‘move’ types of actions defined below are available. If a symbol is the one nearest the end of a result, which is called ‘innermost’, then any action is available for that symbol. The actions of symbols in earlier positions in the pattern or result must be default, and a symbol may be both at the end of a pattern and innermost in a result at the same time.

The actions are below. For example, you specify Where 'mysymbol' action 'move unioning'; and make sure that the mysymbol occurs at the end of some pattern and innermost in some result.

Action nameWhat happens
‘none’Nothing – normally for testing or disabling a result or pattern
defaultSame as ‘copy’
‘move’,
‘move safely’,
‘move unioning’,
‘move differencing’
Like the similar other variants, but delete the matched pattern Items afterwards. For safety, this deletion happens in a final ‘move’ pass during execution.
‘copy’Clear the result, then copy pattern data to create the result Items. In other words, this is like ‘clear’ and then ‘union.
‘copy safely’Make sure the copying would not overwrite existing data, and throw an Exception right away if so. In other words, this is like ‘ensure clear’ then ‘union’
‘union’Insert the new result Items with the previously existing Items. Note that this is a set operation, not a list, so no duplicate output Items can arise. This only requires the main execution pass, and is fastest.
‘difference’Delete the new result Items from the previously existing Items. This only requires the main execution pass, and is fastest.
‘clear’Remove the Items that are extensions of the prefix of the innermost result symbol. A special ‘clear’ execution pass happens before the main pass, and after a successful ‘safety pass’.
‘ensure clear’Check whether there is already result data and throw an Exception right away if so. In other words, make sure there are no Items that are extensions of the prefix of the innermost result symbol. A special initial execution pass does the checking for safety.
‘union or difference’Special for PatternQueryItemSpaces. Each execute() invocation provides a parameter that tells whether to union or difference.

Symbol Kinds

Each symbol has a ‘kind’, optionally specified in the kind symbol attribute. Plain symbols, which have no specified kind, do not need a Where table entry unless non-default symbol attributes are needed. Some attributes can have values that are expressions, and in those cases, symbols like =mysymbol or literals like ‘mystring’ or 73 are also allowed.

Below are the kinds:

KindKind namesSymbol AttributesDescription
Plain(none)type
from
from_exclusive
to
to_exclusive
equals
first
last
action comment
Plain symbols have the default kind, or are not present in the Where table. They simply retrieve data from the database (or from a root – see below).
Reducerreducerdepends_on
type
from
equals
first
last
action comment
Compute a sequence of states based on a self-reference to the current state and possibly other symbols. ‘from’ is for initialization, or resetting from the depends_on symbol and ‘equals’ is for the recursion function, which can be self-referencing.
Timedate ,
time,
time seconds,
time milliseconds, time nanoseconds
depends_on
type
action comment
Snapshot the current date as a Date or time as a long or double in seconds, milliseconds, or nanoseconds. Setting the type to ‘string’ generates standard date text. A depends_on determines when to snapshot: otherwise the time is snapshotted only once.
Randomrandom,
secure random
depends_on
length
from
to_exclusive
action comment
Generate a random number as a long with a given bit length or range, or else a secure random byte array of the given length, defaulting to 32 bytes. A depends_on determines when to re-generate: otherwise only one random is generated.
Countercounter
depends_on
type
from
first
last
increment
action
comment
Starting from a given ‘from’ attribute defaulting to 0, add a given increment defaulting to 1 and provide the count as the symbol value. Type defaults to long but may be set to double or float or else it is the type of the from or increment. Each iteration increments the count.
Sumcount,
sum,
mean,
variance,
sigma
depends_on
type
first
last
action
comment
A simple reducer that works on the immediately outer component in the pattern. Changes to a depended-on symbol cause a reset. Using last true gets a summary, such as a total.
Seriesseriesdepends_on
type
from
to
to_exclusive
increment
last
equals
action
comment
Generate a series of numbers N (of type double, float, or long). The equals is an optional self-referencing expression giving a function of N and possibly other symbols that determines the symbol value. Either the ‘to’ or ‘to_exclusive’ is required. A series is not dependent on database data except to be reset by a depends_on symbol. The complete series is generated on each iteration of the ‘outer’ part of the pattern to the immediate left. Only plain and series symbol kinds ‘generate loop iterations’.
Root query definition,
with data,
stream input,
request header,
request content,
response header,
response content,
database,
temporary
commentPatterns can match data in alternative input sources, and results can output to alternative destinations . The root symbols are outermost in patterns or results in order to specify where to input or output. A given query may input and output to multiple roots. Stream input is for PatternQueryItemSpace, and others are for REST Client Queries. ‘With data’ is anything under the WithData class in the query definition. Database kinds provide a constant database name in the equals attribute.

Reducers

A powerful technique in functional programming which is used in ‘big data’ databases is the ‘reducer’. In PatternQueries, there is a general reducer facility, plus simplified pre-defined reducers like count, sum, mean, variance and sigma. A reducer or other ‘stateful’ symbol has a state which is advanced as the database is logically iterated through during query execution. There is an initial value expression and a recursion expression including a self-reference to get the previous value and to determine the next value. Here is an example totaller (normally a simple sum symbol kind would be used instead):

query {
    pattern =symbol1 =symbol2 =sum;
    result =symbol1 =symbol2 =sum;
    Where 'sum' {
        kind 'reducer';
        from =symbol2;
        equals (symbol2 + sum);
        depends_on 'symbol1';
        comment 'note from starts at symbol2 not 0';
    }
}

In practice the =symbol1 and =symbol2 would be different, as appropriate to the task and might be part of a more complex pattern. The ‘kind’ attribute declares the ‘sum’ symbol a reducer. The ‘from’ attribute of the ‘sum’ symbol initializes the reducer before the first iteration to be the value of symbol2, then the ‘equals’ attribute contains an expression that advances to the next iteration by adding the previous value given by the ‘sum’ symbol (a self-reference) to the current value of symbol2. The optional ‘depends_on’ attribute causes the summer to be re-initialized each time symbol1 changes value during the database iteration. The symbols at the left in the pattern are in ‘outer’ loops and change more slowly than those to the right, so the depends_on references a symbol to the left. In practice, reducers are used for the most difficult tasks, and pre-defined ones are often used. See PatternQuery Examples for some elaborate reducers, such as a Fibonacci series printer, and a running weighted average of a series of values.

Reducer Performance

A note on reducer implementation is necessary, but this can be skipped. Reducers are not as slow or limited as experienced programmers might expect. While a reducer is a form of recursive function definition, reducers execute iteratively in constant time per iteration and with no limit on the number of iterations or database size. On each iteration, the reducer’s self-reference does not cause a recursive invocation of the interpreter and a push of the system stack. Instead, the self-reference is detected at compile time, and when executed, the previous reducer value is returned as the value of the self-referencing symbol. If the system stack approach were used in the traditional implementation of recursion, the performance would be unacceptable, there would be a practical limit on iteration count, and there would be stack overflows.

Steps

It is possible to have a query do multiple passes or ‘steps’ over the data. This is just like running multiple queries one after the other. So, the output of earlier steps can be inputs to later steps and temporary work space can be used between them. The Where table is shared between the steps. For example:

query {
    Steps {
        1 {
            pattern Input =hello;
            result Temporary =hello;
            comment 'copy the data matched by the =hello symbol to somewhere else';
        }
        2 {
            pattern Temporary =hello_again;
            result Output =hello_again;;
            comment 'copy the data matched by the =hello_again symbol to the output';
        }
    }
}

For the step numbers any type is acceptable, and only the ordering is important.

The temporary area can either be a dedicated area in the database or a root map (defined below) of kind ‘temporary’. (Later, a unique per-query-execution work area will be possible to be chosen by putting in a symbol for the time in nanoseconds, or a random number, but currently the time or random is not transferred between steps. ) Above, the data comes from a class Input in the database, then is copied to be under class Temporary, and then is copied to Output. So, after the stepped query executes, the database looks like:

Input "hello world"
Temporary "hello world"
Output "hello world"

Stepped queries allow dynamic re-organization of data, allowing intermediate computations and re-sorting, for arbitarily complex operations and extreme speed.

Root Map Symbols

It is sometimes necessary to deal with Items in places other than the database. The pattern Items can begin with special ‘root map symbols’ that designate various other sources, as can the result Items for designating various other destinations. For example, static input data can be included with the query itself, as part of the definition of the query, called the ‘with data’:

{
     query { 
         pattern =wd =message;
         result =message;
         Where 'wd' kind 'with data';
         WithData 'hello world';
     }
}

This just outputs ‘hello world’. The =wd symbol goes at the start of the pattern Items that are to be redirected. It is defined in the Where 'wd' kind 'with data' and can have any name (with Java identifier syntax as usual). If the query is running in standalone mode, i.e. invoked by application Java code, then the ‘with data’ can be set explicitly:

        RootMaps rootMaps = patternQuery.getRootMaps();
        InfinityDBMap<Object, Object> withDataMap = 
                new InfinityDBMap<>(someItemSpace);
        rootMaps.setRootMapByType(RootMaps.Type.WITH_DATA, withDataMap);
        patternQuery.setRootMaps(rootMaps);

Note that the Java code can put the with data anywhere it wants. In fact, standalone Java code can completely control all of the rootMaps, and is able to discover the defined maps after compilation, see their symbols and names, and provide an actual InfinityDBMap for each definition. A possible use of this capability would be automated testing.

There are other kinds of root maps for different contexts:

Root Map Symbol kindUsage
‘query definition’The input is the definition of the query itself.
‘database’in the server, this identifies a database like ‘demo/read_only’. Standalone, this can be bound to any map by name, like a named parameter. The name is in the equals symbol attribute as a literal.
‘request header’
‘request content’
‘response header’
‘response content’
in a Client Query executing on behalf of a REST access . These are all implemented as in-memory VolatileItemSpaces, so they should not have huge amounts of data.
‘temporary’is for temporary work in a stepped query, if referred to in the PatternQuery. Often this is an in-memory VolatileItemSpace. This must be provided explicitly by a standalone query, and there is none for a PatternQueryItemSpace.
‘stream input’in a PatternQueryItemSpace (explained below) providing the intercepted retrieval and mutation Items passed down to the wrapped ItemSpace or map. There are limitations on the overall transformation because the stream input provides one Item at a time for matching. This is used with symbol action ‘union or difference’.
‘with data’Input data embedded in the query definition itself under the WithData class. This can be overridden by standalone Java code that invokes the query.

Database Root Map

In the Server, explicitly named databases can be used as root maps by means of the ‘database’ kind of symbol. For this, one simply provides the database name in the symbol’s equals attribute as a literal. (Note that user access permissions are checked). The database name is not a general expression, but a constant, so it can be determined by the compiler before execution.

query { 
    pattern =my_db AnyInput =any_symbol;
    result AnyOutput =any_symbol;
    Where 'my_db' {
        kind 'database';
        equals 'demo/read_only';
        comment 'in the server, this identifies a database, but when standalone,
             it is a "named root map" like a parameter';
     }
}

In standalone mode, one adds a bit of code to the above to bind any desired InfinityDBMap to the query:

    // If you started out with an ItemSpace, wrap it using the map API
    InfinityDBMap<Object, Object> myMap = 
                new InfinityDBMap<>(myItemSpace);
    rootMaps.setRootMapByDataBaseName("my root map name", myMap);

Then, the query can input or output to this map. Effectively, the database kind symbol becomes a named parameter for the query. The standalone code can discover the set of named databases given the compiled query, or it can just assume that a given name is there, but if it is absent, false is returned by the setter method, and the query cannot be executed. There can be any number of root symbols.

Crosses

A general characteristic of PatternQuery optimization is that in some cases, loops must become nested, while in the original pattern they were independent. Any case of such increased nesting is called a ‘cross product’ or just a ‘cross’. When crosses occur, it is necessary to pay attention to the possibility of a large increase in execution time and possibly in space as well if the newly nested loops each execute a large number of times. Usually it is obvious to the author of a query whether crosses are significant, based on prior knowledge of the data characteristics or other factors. In fact unwanted crosses may signal the need to re-evaluate the purpose of the query. Sometimes crosses are actually intended. There are three kinds of crosses:

  • Join Crosses
  • Result Crosses
  • Tail Crosses

Joins cause crosses of the unshared parts of the joined patterns outer to the join symbol. This does not apply to the join symbol itself, which is very efficient, and it does not apply to outer parts that the joined patterns have in common, i.e. their common prefixes. These cases relate to joins that are ‘crosswise’, not ‘lengthwise’. Result crosses occur when the result refers to symbols in different patterns in a crosswise way, meaning that the symbols are not all contained within any individual pattern. Also, some very rare cases of ‘tail matching’ can cause crosses: these occur when the innermost pattern symbol that occurs in a result is followed by branching ‘tail’ portions of the pattern containing multiple ‘wildcard’ symbols that are not in the result. (Such crosses may be possible to speed up by the compiler in the future.)

As a consequence of the fact that joins combine patterns into related sets that must match together, it is therefore possible for a result cross to combine these related sets. In that case, the related sets all combine together into an even larger set for the given result. This level of combining is per-result, meaning that it relates to each result independently. In fact, each result is independent from each other result in any case.

Limiting Crosses

There are various ways to ensure speed if necessary. Often, one of the loops is known already to execute only a small number of times, typically zero or one time, possibly dependent on the data. Or, the data may actually not be structured as intended and may be possible to clean up. The symbols creating the iterations can have a ‘first’ or ‘last’ symbol attribute to force only the first or last value to be used. Or, the length symbol attribute can be set small, to limit the iterations. Filter kinds of symbols can be used to make the loops conditional on expressions in the equals symbol attribute. Matching can be limited by using expressions in the pattern within the cross. Often, one of the crossed parts does not even include any symbols at all but only literals and expressions, so no extra looping can occur. Also, only plain symbols or series kinds of symbols can create iterations, so other kinds of symbols are irrelevant. If there are replicated copies of the same data in the database with different organizations, the best one should be used anyway. If not, creating such a replication should be considered, possibly only temporarily.

Joins are an Essential Performance Tool

It may seem that the crosses caused by joins are a problem, but in fact joins are an essential tool to increase performance. Queries can often be restructured to use additional joins to bring in other data for tremendous performance increases.

Finally, it is possible to use the temporary root map or some space in the database to temporarily calculate, sort and sift inside a stepped query for difficult cases. In fact, stepped queries can be made extremely efficient and effective by using temporary space appropriately. In InfinityDB, using temporary space in this way is explicit in the query, while in RDBMS, the use of temporary space is up to the compiler, and it complicates and slows the optimization greatly, while not ensuring good results.

PatternQueryItemSpace

In order to upgrade the application so that it continuously maintains the inversion we defined above for AircraftModel we can:

  • change the application’s Java Code, or
  • add or change the application’s PatternQueries by adding a result Item for the AircraftModel Items where the result already includes Aircraft Items, or
  • set up a PatternQueryItemSpace that intercepts the mutations to the database and transparently maintains the inverted forms of the Aircraft Items under AircraftModel. Any set of per-Item transformations such as permutations of Item data, even with joins is possible. The events can be either mutations of the Aircraft Items, the AircraftModel Items, or both.

An elegant way to handle the maintenance of the Item replications is to keep all the PatternQueries in the database itself. Then the application invokes them where they live, directly in the database, and they can be edited or copied from elsewhere in order to keep them in synch with the data structures. The database becomes a self-describing unit, with its structure and the operations it can perform against itself all in one place. The Java code can still contain some hard-coded parts, such as legacy code, as desired.

Alternatively, a PatternQueryItemSpace can transparently extend the hard-coded part or the other PatternQueries, by using a single per-database PatternQuery, whose definition is stored anywhere. The application constructs one PatternQueryItemSpace in order to wrap the database, and then does operations on it instead of on the database directly. In InfinityDB Client/Server, a special per-database PatternQuery is called the system query and is stored under ‘System Query’ (that’s the System EntityClass immediately followed by the Query EntityClass and then the PatternQuery definition). The PatternQuery sees an input stream of individual Items being inserted into or deleted from the wrapped database, and it rewrites them and can optionally join them with database Items to produce a set of output Items that actually reach the wrapped database. If the inputs are deletions, then the outputs are deletions as well.

Performance will not reach the high levels of direct InfinityDB access, of course, but it can be concurrent. There is a SimplePatternQueryItemSpace and an alternative ThreadedPatternQueryItemSpace. Both of them are implemented by wrapping a TriggerItemSpace that in turn wraps the wrapped ItemSpace. The PatternQuery definition affects the setup of the hidden TriggerItemSpace described below so that mostly only the necessary Items are captured and used, and the rest of the work is relatively fast.

For example, a system query could be:

System Query query {
    pattern =si {
        Aircraft =id1 model =model1;
        AircraftModel =model2 id =id2;
    }
    result {
        Aircraft =id2 model =model2;
        AircraftModel =model1 id =id1;
    }
    Where {
        'si' kind 'stream input';
        'id1' action 'union or difference';
        'model2' action 'union or difference;
    }
}

With the above System Query, an insertion or deletion into the PatternQueryItemSpace of Aircraft '787' model 400 will match the first pattern, and then an Item AircraftModel 400 id '787 will be correspondingly inserted or deleted.

Permuted and Simplified Items

The output of the PatternQueryItemSpace or any other PatternQuery can if desired reduce the amount of information in the individual input Items reaching the database by including in some result Items less than all of the symbols of corresponding pattern Items, but this affects deletion. Without such simplification of Items, the variable parts of Items are simply permuted while literals and expressions may be changed, producing Items with the same meaning, and having a one-to-one correspondence. In a PatternQueryItemSpace, the symbols for permuted Items will normally but not necessarily have action ‘union or difference’, and then the permuted Items will track the original Items, both with insertion and deletion. The database is then maintained in a consistent state over future mutations. Also, further information can be collected as the stream input causes matches and firings, possibly with symbols having other actions such as ‘union’.

Collisions

On insertion, the simplified result Items will sometimes collide, but this may be fine and may in fact allow very high performance improvements when the simplified Items are used as later input. For example, the simplified Items can be joined to in other patterns. The simplified Items may take up relatively little space.

However, later deletions will generate the same simplified Items and ‘too much’ will be deleted. For this reason, symbols can be given actions that prevent loss. If the simplified results have an action of ‘union’, then deletions will not occur. Without deletions, the simplified Items will only accumulate, but in many cases this is acceptable, or even desirable. If necessary, the set of simplified Items can be manually deleted and an appropriate PatternQuery can reconstruct them from the source data, so they are refreshed and minimal. In many cases, the simplified Items will actually not collide based on the data characteristics, so there is no issue.

These considerations reflect no inherent limitation of PatternQuery or PatternQueryItemSpace but are logical consequences of storing simplified Items that omit some data rather than permuting it.

TriggerItemSpace

A TriggerItemSpace is a a part of a PatternQueryItemSpace when used in the Server as the System Query, but it can also be used alone by wrapping any ItemSpace, to make the ItemSpace ‘active’ rather than ‘passive’. Any retrieval or mutation operation can be captured and custom Java code can be invoked. Pre-retrieval, post-retrieval, insert, and delete are independently handled, and if one of them is not active, it does not cause delays. A small set of fixed prefixes of various lengths can be detected to trigger different associated actions. The prefixes are independent, so some can be prefixes of others. Matching can be exact or by prefix.

Trigger Implementation

The TriggerItemSpace does a quick hash of a single fixed prefix length of the input Items to determine whether they are candidates to be intercepted, and if so then it compares the necessary associated variable-length prefixes. The length of the hashed prefix part is the minimum of the full prefixes, so if the minimum prefix is long, most operations only incur the hashing overhead, and short Items are not even hashed. The custom code can pass the triggering Item through to the underlying ItemSpace or it can block it and then it can do anything desired. Access is fully concurrent, with no locks or synchronizations. (Lock-free concurrency techniques are used). The trigger code can be replaced, and triggers can be added or removed at any time even with multiple threads active. About a few hundred to a few thousand triggers would be practical at one time. The triggered action code executes without locks, i.e. completely concurrently. Performance of the trigger detection should often be higher than the underlying InfinityDB ItemSpace, so little is lost.

Client Queries

An advanced feature under development is the ‘Client Query’ based on the REST API of the InfinityDB Client/Server version 6.0. The client does a REST access, but instead of accessing database data directly, a PatternQuery is transparently executed to input from the request, do the database access, and then output the results. Root symbols direct the access to the REST request header, request content, response header, response content, temporary and the usual database data, either default or identified by a database kind of symbol.

Normally, the PatternQueries can be kept in the database itself, so a database becomes a self-describing unit including not only data but also the set of operations that can be safely and efficiently applied to it, according to its particular data structures. This should produce a more flexible, future-proof and secure system overall. There will be user permissions for use of Client Queries that can be granted independently of whether a given user has database read or write permissions. The permissions can be changed per-user or per-group by the admin user.

A typical use would be for JavaScript code on a web page to do an AJAX invocation of the Client Query and put the results on the page. Also the JavaScript can determine the AJAX input parameters based on user input. Any other way to hit a REST URL will work, such as curl, Python code, or Java code, or even a web browser. For Python, the infinitydb.access package allows using a Client Query.