This is intermediate-level documentation. For a general tour and introduction, see InfinityDB Server; for a leisurely discussion, see PatternQuery Perspective; and for more detail, see PatternQuery Examples, PatternQuery Reference, or PatternQuery Implementation. See also REST Access with PatternQueries.
A minimal PatternQuery in the server that completely defines a REST interface to return an image based on its name. This is the ‘i code’ text format, but there are also JSON and CSV text formats and a graphical mode, all of them editable. There is a ‘join’ on the =name symbol.
Free REST Code
A PatternQuery is Data
A PatternQuery is much simpler to write than an equivalent program in Java, Python, or another language. If PatternQuery were compared with a traditional programming language, one could say that a useful query takes only a few ‘lines’ of code, but a query is actually just a hierarchical data structure that fits in with the InfinityDB data model itself. There is an optional, trivial text format called ‘i code’ that programmers may feel more comfortable with; it is no more complex than SQL and is easily readable. The i code is translated into a simple data structure of ‘Items’ in an ‘ItemSpace’ before execution.
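As a rough illustration of what it means for a hierarchical structure to become a set of Items, the Python sketch below flattens a nested dictionary into flat tuples, one per root-to-leaf path. The tuple representation, the `flatten` helper, and the sample names are assumptions made for this example only; they are not InfinityDB's API or its storage format.

```python
def flatten(tree, prefix=()):
    """Yield one tuple per root-to-leaf path of a nested dict.

    Each tuple stands in for an 'Item' (an assumption of this sketch).
    """
    if isinstance(tree, dict):
        for key, subtree in tree.items():
            yield from flatten(subtree, prefix + (key,))
    else:
        # A leaf value becomes the last component of the tuple.
        yield prefix + (tree,)


# Hypothetical nested data, not taken from a real database.
nested = {
    "Aircraft": {
        "N100": {"model": "747"},
        "N200": {"model": "A320"},
    }
}

# A sorted list of tuples plays the role of an 'ItemSpace' here.
items = sorted(flatten(nested))
for item in items:
    print(item)
```

Keeping the tuples sorted means that everything under a common prefix sits together, which is the property that the nesting described in this document relies on.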
Edit Queries as Data
The Server has a back-end web-based database browser that allows direct editing of PatternQueries in a convenient, graphical format, in the same way as any other data. All data, including PatternQueries, can be viewed and edited in a user-chosen format, including graphical, i code, JSON, CSV, texts, images and more. The database can store and display images, text files or blobs of any mime-type, intermixed with rich ‘ItemSpace’ data structures.
Pattern and Result
PatternQueries are not table-oriented like SQL, but operate on the semi-hierarchical ItemSpace data model to do any kind of retrieval, update, or schema change dynamically.
The input of a PatternQuery is defined by a ‘pattern’, which is a simple example of the tree-structured subset of the database or of the request content that is to be collected together. Symbols in the pattern define the unknowns in the input, which are determined during the logically hierarchical scan of the database.
The output is also defined by an example, called a ‘result’, which is just a declarative tree structure with embedded symbols. The input can ‘match’ on any subset of the database to produce changes in the database by ‘firing’ results. The output can go into the database, generate response content, or do other things.
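The match-and-fire idea can be sketched in a few lines of Python. This is a deliberately simplified model, not the PatternQuery engine: a pattern and a result are each a single flat tuple rather than a tree, symbols are strings starting with `=`, and the database is scanned linearly for clarity. The names `match`, `fire`, `run`, and the `Order`/`Response` data are all hypothetical.

```python
def is_symbol(component):
    """Symbols are written '=name' in this sketch."""
    return isinstance(component, str) and component.startswith("=")


def match(pattern, item):
    """Return the symbol bindings if item fits pattern, else None."""
    if len(pattern) != len(item):
        return None
    bindings = {}
    for p, value in zip(pattern, item):
        if is_symbol(p):
            # A symbol repeated within a pattern must bind to one value.
            if bindings.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return bindings


def fire(result, bindings):
    """Substitute bound values into the result template."""
    return tuple(bindings[c] if is_symbol(c) else c for c in result)


def run(pattern, result, items):
    """Fire the result once for every Item that matches the pattern."""
    output = set()
    for item in items:
        bindings = match(pattern, item)
        if bindings is not None:
            output.add(fire(result, bindings))
    return output


database = [
    ("Order", "o1", "customer", "ann"),
    ("Order", "o2", "customer", "bob"),
    ("Order", "o1", "total", "30"),
    ("Invoice", "i9", "customer", "ann"),
]
pattern = ("Order", "=id", "customer", "=customer")
result = ("Response", "=customer", "order", "=id")

for item in sorted(run(pattern, result, database)):
    print(item)
```

Only the two `Order ... customer ...` Items match, so two result Items are fired. The query itself contains no loops or conditionals: both the pattern and the result are plain data.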
Comparison to SQL
The SQL equivalent of a single PatternQuery would be a set of many commands to create or drop tables, to add or remove columns, to do inserts or deletes, to select and join multiple tables and create multiple output tables, and to transform table data using complex expressions. Thus the equivalent SQL operation would actually require many statements to create multiple retrieved tables, which must be tied together with traditional code embedded in a server-side program. Also, that code would never touch the schema while in operation, whereas PatternQueries dynamically and incrementally extend it. A single PatternQuery execution returns all of the result data at once in a custom, dynamically determined nested structure with no algorithmic glue code.
In SQL a SELECT statement produces a single output table. While it is possible to create new tables using a create table command, or to add columns, these commands are always executed by a Database Administrator User, not an application. Moreover, it is not easy to modify a schema once the system is created. Relational databases become entangled with application code that embeds assumptions about the schema, tying particular versions of the application to particular instances of the database having compatible structures. This complicates backup and restore also.
There is an early design process in tabular systems called ‘normalization’ during which permanent assumptions about the structure are nailed down. Any data that does not fit into this table breakdown is forbidden – in fact impossible. Mistakes in the normalization process create big problems later when it is discovered that the data is richer than assumed. The normalization process breaks down the data into a fixed set of tables, and each step produces more and more tables. Multiple tables must be reunited later explicitly with joins coded into each SQL SELECT query, and they cause additional complications for the query compiler during planning. Normalization is dependent partly on the one-to-many relationships present in the data, and these relationships must be determined and fixed early on. These issues do not come up in PatternQueries.
Generating New Data
Each table that needs to be related to others via one-to-many associations uses an additional primary key column and foreign key columns in the other tables. These primary and foreign keys are almost always increasing integers that must be generated uniquely during application execution and which constitute a complication of the overall system by ‘creating data’. These new entities require their own indexes for reasonable performance. The one-to-many relationships require effort to maintain consistently and to use, while InfinityDB avoids creating new data as much as possible primarily by using the nesting feature, thereby preserving consistency innately. InfinityDB follows this rule:
“Entities must not be multiplied beyond necessity.”
William of Ockham
Firewall and ‘Active’ Databases
PatternQueries provide a firewall or interface via flexible, extensible REST APIs to keep application code or other external clients from making rigid assumptions or violating security constraints. This follows the principles of Object-Oriented Programming, where interfaces hide internal private object or database implementation details. The implementation details, such as schema structure, consistency constraints and so on, are therefore more fluid and can be adjusted at any time by replacing the relevant queries. PatternQueries make a database ‘active’, and the PatternQueries stay attached to and embedded within the data they relate to.
Multiple named interfaces can be attached to an InfinityDB database, so it can be interacted with from various perspectives. Interfaces are securely protected with fine-grained permissions. For example, some roles might have execute permission on certain databases through certain interfaces or interface name prefixes, plus possibly ‘setter’ or ‘getter’ execute permissions on others. The setters and getters allow classifying individual queries based on whether they logically only read or logically write to the database. Queries that relate to any interface and their individual setter/getter access classifications can be added or removed at any time by query authors having write permission with no special effort. However, users, roles, databases, and all permissions are controlled by the admin, who can modify them at any time. For example, a ‘public’ role might be given access to certain databases through only ‘getter’ queries on selected interfaces. Getters are in fact able to update data, such as for logging accesses or for debugging as defined by the query.
The logically recursive navigation of the pattern tree does not correspond necessarily to physical access, as there are re-writes of the pattern for semantics and speed. Each navigation recursion is efficient, not simply following the structure blindly, but instead doing direct efficient ‘B-tree’ index accesses at each step. So iterating over an ‘outer’ symbol, i.e. data values found nearer the pattern root, does not require visiting all of the contents of a subtree, i.e. the ‘inner’ symbols, Item-by-Item. Instead, each match of each outer symbol requires only one database access, hence performance can sometimes be orders of magnitude higher than simple depth-first recursion, depending on the size of each inner tree, which may be very large.
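The difference between visiting every inner Item and seeking directly to the next outer value can be sketched with a sorted Python list and binary search. This is only an analogy for the B-tree access described above: the sorted list, the `TOP` sentinel, and the `outer_values` helper are assumptions of this example, not InfinityDB internals.

```python
import bisect


class _Top:
    """Sentinel that sorts after every other component."""

    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return True


TOP = _Top()


def outer_values(items, prefix):
    """List the distinct components that follow prefix in a sorted list.

    Returns (values, seeks). Each distinct value costs one binary-search
    seek, no matter how many Items lie beneath it.
    """
    n = len(prefix)
    values, seeks = [], 0
    pos = bisect.bisect_left(items, prefix)
    seeks += 1
    while pos < len(items) and items[pos][:n] == prefix:
        if len(items[pos]) == n:
            pos += 1  # the prefix itself is stored as an Item; step over it
            continue
        value = items[pos][n]
        values.append(value)
        # Jump past the whole subtree under prefix + (value,).
        pos = bisect.bisect_left(items, prefix + (value, TOP), pos)
        seeks += 1
    return values, seeks


# Two hypothetical outer values, each with 1000 inner Items.
items = sorted(
    ("Aircraft", aircraft_id, "leg", f"{i:04d}")
    for aircraft_id in ("N100", "N200")
    for i in range(1000)
)

values, seeks = outer_values(items, ("Aircraft",))
print(values, seeks, len(items))
```

Here 2000 Items are stored, yet the two outer values are found with three seeks (one initial seek plus one per value), which is the effect the paragraph above describes for large inner trees.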
In patterns, a symbol is allowed to occur in multiple places, forming a ‘join’. Using joins, it is possible to create queries that combine input from different areas in the database, using the most efficient sources. The result can be an enormous performance improvement, similar to using indexes in an RDBMS. However, a PatternQuery encodes the decision about the use of such ‘indexes’ or ‘inversions’ directly, rather than leaving it hidden inside a query optimizer. This characteristic means that PatternQueries can have extreme speed and responsiveness, given good Item structures.
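A join on a shared symbol can be pictured as follows: the value bound in one part of the data is used as part of a prefix to seek directly into another part. The Python sketch below assumes Items are tuples in a sorted list and uses hypothetical `Flight` and `Aircraft` data; the helper names are illustrative and do not come from InfinityDB.

```python
import bisect


def with_prefix(items, prefix):
    """Return the Items that start with prefix, located by binary search."""
    n = len(prefix)
    pos = bisect.bisect_left(items, prefix)
    found = []
    while pos < len(items) and items[pos][:n] == prefix:
        found.append(items[pos])
        pos += 1
    return found


def join_flights_to_models(items):
    """Join two areas of the data on the shared aircraft id."""
    joined = []
    # Each Flight Item binds the id (assumes 4-component Flight Items) ...
    for _, flight, _, aircraft_id in with_prefix(items, ("Flight",)):
        # ... and the same id is used to seek directly into Aircraft.
        for item in with_prefix(items, ("Aircraft", aircraft_id, "model")):
            joined.append((flight, aircraft_id, item[3]))
    return joined


items = sorted([
    ("Aircraft", "N100", "model", "747"),
    ("Aircraft", "N200", "model", "A320"),
    ("Flight", "UA1", "aircraft", "N100"),
    ("Flight", "UA2", "aircraft", "N200"),
    ("Flight", "UA3", "aircraft", "N100"),
])

for row in join_flights_to_models(items):
    print(row)
```

The choice of which area drives the loop and which is probed is written into the code, mirroring the point above that the query author, not an optimizer, decides which structures are used.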
Fast Item structures can be created simply by replicating certain kinds of data, such as by transposing or ‘permuting’ variable parts of Items and storing multiple equivalent Items. These replications replace indexes and are part of the database, navigable and mutable like any other data. The PatternQuery compiler rewrites the pattern items internally to handle joins when such replications are mentioned in the query. The replications further provide the foundation for additional structure.
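The replication idea can be sketched by storing each fact twice, once in its original component order and once permuted, so that a lookup by the other attribute becomes a direct prefix seek. The example borrows the Aircraft and AircraftModel class names used later in this document; the exact Item layout, the sorted-list storage, and the helper functions are assumptions of this sketch.

```python
import bisect


def add_item(space, item):
    """Insert a tuple into the sorted list, ignoring duplicates."""
    pos = bisect.bisect_left(space, item)
    if pos == len(space) or space[pos] != item:
        space.insert(pos, item)


def add_aircraft(space, aircraft_id, model):
    """Store the fact in both orders so that neither lookup needs a scan."""
    add_item(space, ("Aircraft", aircraft_id, "model", model))
    add_item(space, ("AircraftModel", model, "id", aircraft_id))


def ids_for_model(space, model):
    """Find aircraft ids by model using the permuted Items only."""
    prefix = ("AircraftModel", model, "id")
    pos = bisect.bisect_left(space, prefix)
    ids = []
    while pos < len(space) and space[pos][:3] == prefix:
        ids.append(space[pos][3])
        pos += 1
    return ids


space = []
add_aircraft(space, "N100", "747")
add_aircraft(space, "N200", "A320")
add_aircraft(space, "N300", "747")

print(ids_for_model(space, "747"))
```

The permuted Items are ordinary data in the same space, so they can be browsed, updated, or deleted like anything else, which is the sense in which they replace an index.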
Unlike SQL query compilers, the PatternQuery compiler does not need to try to guess which indexes, join orders, or join techniques are best, so compiling always takes only a few milliseconds, producing an efficient plan each time. An RDBMS query compiler has far more to do: as queries get more complex and the join count increases, compilation performance degrades tremendously (combinatorially), and optimal plans are seldom reached. In an RDBMS, the proper set of indexes must be determined by guesswork or experimentation with all required queries, and indexes must be built and dropped from time to time in different situations.
Because of the hierarchical structure of InfinityDB databases, a significant performance improvement is possible, since subtrees are logically equivalent to materialized or pre-computed joins in an RDBMS. Each nesting that occurs allows direct subtree access that bypasses indexes that propagate joins between tables. For example, an InfinityDB hierarchy where some subtrees reach three levels deep can avoid two degrees of joins, depending on the query. In contrast to RDBMS materialized joins, the InfinityDB tree structures are completely dynamic and can be updated directly with no re-computation. PatternQueries take advantage of these effectively pre-computed joins. A special ‘ZigZag’ algorithm can take advantage of this to dynamically compute joins, intersections, and negations even in the absence of indexes.
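This document does not spell out how the ‘ZigZag’ algorithm works. As a hedged illustration of the general idea of intersecting two sorted sequences by seeking rather than scanning, here is a generic seek-based intersection in Python. It is an assumption that this resembles the approach; it is not InfinityDB's implementation, and the function name is hypothetical.

```python
import bisect


def zigzag_intersect(a, b):
    """Intersect two sorted lists of distinct values.

    Whichever side is behind seeks forward (by binary search) to the
    other side's current value, so long runs of non-matching values
    are skipped instead of being visited one by one.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Seek a forward to the first value >= b[j].
            i = bisect.bisect_left(a, b[j], i + 1)
        else:
            # Seek b forward to the first value >= a[i].
            j = bisect.bisect_left(b, a[i], j + 1)
    return out


evens = list(range(0, 100, 2))
threes = list(range(0, 100, 3))
print(zigzag_intersect(evens, threes))
```

Neither input needs a separate index: being sorted is enough, which is the property that sorted, nested Items provide.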
As an example of a very simple but useful PatternQuery, here we have one that computes an ‘inversion’, in which the variable parts of a set of database Items are reversed in the component sequence to create a new set of Items. We are using the ‘i code’ format so we can easily represent it as text. The result is a duplication of the information in the input set into a new set that can be accessed more quickly. Then, this set can be left in the database to become permanent and maintained dynamically like an index, or it can be deleted right away. The resulting new Items are accessible very quickly via the model because the model occurs earlier in the new Items, being directly after a fixed known prefix.
A simple query to permute or rearrange the variable parts of a set of Items. The input is the Aircraft class, and the output is the AircraftModel class, while model and id are attributes, and =model and =id are symbols.
Here is the same query as viewed in the graphic view of the database browser:
For more about the basics and a tour, see The InfinityDB Server. For detailed lower-level aspects, see PatternQuery Examples, PatternQuery Reference, PatternQuery Implementation, and REST APIs with PatternQueries.