Pattern Queries

InfinityDB version 6, which is the Client/Server release, has a new feature called PatternQueries that can be used by users or applications in place of Java code for rapid development, data structure extension, testing, data transformation, database administration, consistency maintenance, ad-hoc user data retrieval and processing and much more. Java, JavaScript, Python and other client programs may use a REST interface to invoke PatternQueries. REST is a simple request-response protocol based on HTTP that is in common use. A custom REST API can be easily defined and maintained over time with simple PatternQuery definitions residing in the database itself. No programming is required although the powerful feature set rivals the generality of custom application code in some ways.

A PatternQuery is much simpler to write than an equivalent program in Java, Python, or other language. If PatternQuery were to be compared with a traditional programming language, one could say that a useful query takes only a few ‘lines’ of code, but a query is actually just a hierarchical data structure that fits in with the InfinityDB data model itself. There is, however, an optional trivial text format called ‘i code’ that programmers may feel more comfortable with, and it is no more complex than SQL and is easily readable. The i code is translated into the simple data structure called ‘Items’ in an ‘ItemSpace’ before execution. The Server has a back-end web-based database browser that allows direct editing of PatternQueries in a convenient, graphical format, in the same was as any other data. All data, including PatternQueries, can be viewed and edited in a user-chosen format, including graphical, i code, JSON, CSV, texts, images and more. The database can store and display images, text files or blobs of any mime-type, intermixed with rich ‘ItemSpace’ data structures. The user-defined REST APIs communicate this data through PatternQueries.

PatternQueries are not table-oriented as SQL, but can operate on the hierarchical InfinityDB schema model to do any kind of retrieval, update, or schema change dynamically. The input of a PatternQuery is defined by a ‘pattern’ which is a simple example of the tree structured subset of the database to be collected together. Symbols in the pattern define the unknowns in the input that are determined during the hierarchical scan of the database. The output is also an example called a result, which is just a declarative tree structure with embedded symbols. The input can ‘match’ on any subset of the database to produce changes in the database by ‘firing’ results. The pattern and result can also include constants called ‘literals’ and rich expressions of a simple syntax (like Java, C, C++, and JavaScript). There are also different kinds of symbols and symbol attributes for special tasks including statistics, secure hashing, random, timing, a general-purpose facility called ‘reducers’ and much more.

Comparison to SQL

The SQL equivalent of a single PatternQuery would be a set of many commands to create or drop tables, to add or remove columns, to do inserts or deletes, to select and join multiple tables and create multiple output tables, and to transform table data using complex expressions.

In SQL a select statement produces a single output table. While it is possible to create new tables using a create table command, or to add columns, it is not easy to modify a schema once the system is created, especially when tables or columns are to be dropped. In InfinityDB, schemas are more flexible, and can be manipulated by PatternQueries. It could be said that an InfinityDB schema is ‘late binding’, where SQL is ‘early binding’. Relational databases become entangled with application code that embeds assumptions about the schema. PatternQueries provide a firewall or interface via well-defined REST calls to keep application code from making rigid assumptions or violating security constraints. This is follows the principles of Object-Oriented Programming, where interfaces hide internal private object or database implementation details. The implementation details such as schema structure and so on are therefore more fluid and adjustable at any time. PatternQueries make a database ‘active’.

Execution Performance

The logically recursive navigation of the pattern tree does not correspond necessarily to physical access, as there are re-writes of the pattern for semantics and speed. Each navigation recursion is efficient, not simply following the structure blindly, but instead doing direct efficient ‘B-tree’ index accesses at each step. So iterating over an ‘outer’ symbol, i.e. data values found nearer the pattern root, does not require visiting all of the contents of a subtree, i.e. the ‘inner’ symbols, Item-by-Item. Instead, each match of each outer symbol requires only one database access, hence performance can sometimes be orders of magnitude higher than simple depth-first recursion, depending on the size of each inner tree, which may be very large.

Joins

In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a Pattern Query effectively includes the decision about the use of such ‘indexes’ or ‘inversions’ to be encoded directly, rather than being hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed, given good Item structures. Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins.

Unlike SQL query compilers, the PatternQuery compiler does not need to try to guess which indexes and join orders or join techniques are best, so compiling always takes only a few milliseconds, producing an efficient plan each time. An RDBMS query compiler has far more to do, and as queries get more complex, the join count increases, compilation performance degrades tremendously (combinatorially), and optimal plans are seldom reached. In RDBMS, the proper set of indexes must be determined by guesswork or experimentation with all required queries, and indexes must be built and dropped from time to time in different situations.

Precomputed Joins

Because of the hierarchical structure of InfinityDB databases, a significant performance improvement is possible, since subtrees are logically equivalent to materialized or pre-computed joins in an RDBMS. Each nesting that occurs allows direct subtree access that bypasses indexes that propagate joins between tables. For example, an InfinityDB heirarchy where some subtrees reach three levels deep can avoid two degrees of joins depending on the query. In contrast to RDBMS materialized joins, the InfinityDB tree structures are completely dynamic, updatable directly with no re-computation. PatternQueries take advantage of these effectively pre-computed joins.

Sample Query

As an example of a very simple but useful PatternQuery, here we have one that computes an ‘inversion’, in which the variable parts of a set of database Items are reversed in the component sequence to create a new set of Items. We are using the ‘i code’ format so we can easily represent it as text. The result is a duplication of the information in the input set into a new set that can be accessed more quickly. Then, this set can be left in the database to become permanent and maintained dynamically like an index, or it can be deleted right away. The resulting new Items are accessible very quickly via the model because the model occurs earlier in the new Items, being directly after a fixed known prefix. For more on this query, see Using PatternQueries.

query {
    pattern Aircraft =id model =model;
    result AircraftModel =model id =id;
    comment 'do a simple inversion of Aircraft';
}

Here is the same query as viewed in the tabular view of the database browser:

For more see PatternQuery Reference, PatternQuery Implementation, REST APIs with PatternQueries