Pattern Queries

InfinityDB version 6, which is the Client/Server release, has a new feature called PatternQueries that can be used by users or applications in place of Java code for rapid development, data structure extension, testing, data transformation, database administration, consistency maintenance, ad-hoc user data retrieval and processing and much more. PatternQueries access and transform InfinityDB databases in rich and powerful ways based on simple query definitions. No programming is required although the powerful feature set rivals the generality of custom application code in some ways. Version 6 includes all of version 4 and 5’s features, and even non-server code such as existing applications can benefit from PatternQueries.

A PatternQuery is much simpler to write than an equivalent Java program. Also, the definition of a PatternQuery can be stored naturally in the database itself, or it can be embedded as text in an application . It is even possible to take advantage of PatternQueries in isolation, with no application code involved at all, by means of user creation, saving, sharing and execution of PatternQuery definitions as database Items, especially via the InfinityDB Server Back-end database browser. Applications can mix PatternQueries as a kind of API together with the other ItemSpace and Map APIs seamlessly, choosing whichever API is easiest for each task. There is no interference between the three APIs. All of them are transactional, concurrent, and fast. PatternQueries are actually implemented using the Map API.

Comparison to SQL

PatternQueries are not table-oriented as SQL, but can operate on the hierarchical InfinityDB schema model to do any kind of retrieval, update, or schema change dynamically. The input of a PatternQuery is defined by a ‘pattern’ which is a simple example of the tree structured subset of the database to be collected together. Symbols in the pattern define the unknowns in the input that are determined during the hierarchical scan of the database. The output is also an example called a result, which is just a declarative tree structure with embedded symbols. The input can operate on any subset of the database to produce and output a new subset of the database either for insertion or deletion.

The SQL equivalent of a single PatternQuery would be a set of many commands to create or drop tables, to add or remove columns, to do inserts or deletes, and to transform table data using complex expressions.

In SQL a select statement produces a single output table. While it is possible to create new tables using a create table command, or to add columns, it is not easy to modify a schema once the system is created, especially when tables or columns are to be dropped. In InfinityDB, schemas are more flexible, and can be manipulated by PatternQueries. It could be said that an InfinityDB schema is ‘late binding’, where SQL is ‘early binding’. Early binding requires a permanently fixed schema, or at least one where only extension is practical by adding tables and columns once application code is created.

Execution Performance

The logically recursive navigation of the pattern tree does not correspond necessarily to physical access, as there are re-writes of the pattern for semantics and speed. Each navigation recursion is efficient, not simply following the structure blindly, but instead doing direct efficient b-tree index accesses at each step. So iterating over an ‘outer’ symbol, i.e. data found nearer the tree root, does not require visiting all of the contents of a subtree, i.e. the ‘inner’ symbols, Item-by-Item. Instead, each match of each outer symbol requires only one database access, hence performance can sometimes be orders of magnitude higher than simple depth-first recursion, depending on the size of each inner tree, which may be very large. There is a further optimization for ‘tail matching’, in which a portion of a pattern Item at the end contains ‘wildcard’ symbols and literals. The initial series of literals and expressions in the patterns allow direct, fast access. Also, restricting the matching of a symbol to a minimum or maximum value has no performance impact, being as fast as access based on equality.

Joins

In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a Pattern Query effectively includes the decision about the use of such ‘indexes’ or ‘inversions’ to be encoded directly, rather than being hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed, given good Item structures. Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins.

Unlike SQL query compilers, the PatternQuery compiler does not need to try to guess which indexes and join orders or join techniques are best, so compiling always takes only milliseconds or less, producing an efficient plan each time. An RDBMS query compiler has far more to do, and as queries get more complex, the join count increases, compilation performance degrades tremendously, and optimal plans are seldom reached. In RDBMS, the proper set of indexes must be determined by guesswork or experimentation with all required queries, and indexes must be built and dropped from time to time in different situations.

Precomputed Joins

Because of the hierarchical structure of InfinityDB databases, a significant performance improvement is possible, since subtrees are logically equivalent to materialized or pre-computed joins in an RDBMS. Each nesting that occurs allows direct subtree access that bypasses indexes that propagate joins between tables. For example, an InfinityDB heirarchy where some subtrees reach three levels deep can avoid two degrees of joins depending on the query. In contrast to RDBMS materialized joins, the InfinityDB tree structures are completely dynamic, updatable directly with no re-computation. PatternQueries take advantage of these effectively pre-computed joins.

Sample Query

As an example of a very simple but useful PatternQuery, here we have one that computes an ‘inversion’, in which the variable parts of a set of database Items are reversed in the component sequence to create a new set of Items. The result is a duplication of the information in the input set into a new set that can be accessed more quickly. Then, this set can be left in the database to become permanent and maintained dynamically like an index, or it can be deleted right away. The resulting new Items are accessible very quickly via the model because the model occurs earlier in the new Items, being directly after a fixed known prefix. For more on this query, see Using PatternQueries.

query {
    pattern Aircraft =id model =model;
    result AircraftModel =model id =id;
    comment 'do a simple inversion of Aircraft';
}

PatternQueries can be very elaborate if necessary, and can be stored in the database for later re-use as Items rather than using the text format above, which would normally be used only embedded in applications. PatternQueries are hierarchical like the database itself, so when they are stored it is a natural fit, not a simple stored text BLOB.

For more see Using PatternQueries, PatternQuery Implementation, and PatternQuery Upcoming Features