PatternQuery Perspective

Since time began for programming (just before the unix epoch 00:00:00 Jan 1 1970 GMT), programmers have talked about declarative programming as the ultimate goal, and how hard the current method is – procedural programming. Procedural means algorithms and the whole idea of a program having a ‘current state’ that moves along forwards or backwards as the execution proceeds, with variables, loops, if statements, and so on. The idea of declarative programming is that there is no such moving pointer into the code, just statements that describe what is wanted. There would be no reason to fret over the transitions in the state of the algorithm at various points during the execution, but instead only one single statement, executed all at once. For a DBMS, the input is the state of the database, the output is the desired information.

SQL is a declarative language for database access, and has been enormously successful as a result. SQL has no algorithms, no current execution point, no variables, no loops, no if statements and so on. SQL is therefor accessible to a vastly wider range of people than software engineers. Once compiled, an SQL statement always completes, at least in principle. The only debugging needed is to refine the query and re-execute.

PatternQueries are for declarative database access as well, but can do far more than SQL, taking declarative programming one step farther. It is possible to write simple queries just to retrieve data, like SQL, but also one can write much more complex queries that encroach on the generality of algorithms and procedures. An individual PatternQuery can gather, validate, transform, compute or distribute data with diverse sources and destinations from multiple intra-database and external locations. A PatternQuery can act like a complete application by itself for some tasks, with no procedural glue code, with simple, obvious structure, and with a very small efficient definition.

Simple Hierarchical Data

PatternQueries operate not only on tables, but also on a hierarchical generalization of tables called an ‘ItemSpace’, which is very flexible and simple, yet which can represent almost anything. A database structured in an ItemSpace can easily be extended at any time in small increments – unlike strict tables, which enforce a single rarely modified structure called a ‘schema’. When changed, a tabular schema can force large sections of applications to be rewritten, or even abandoned. This feature of the ItemSpace is called ‘late binding’. A late binding database can also do what an early binding database is forced to do, but the reverse is not true.

Simple Fast Execution

PatternQueries begin to run almost instantaneously and are efficient, because the ItemSpace can be configured for speed. Structures in an ItemSpace can effectively be ‘pre-computed’ portions of the query, which are analogous to the expensive SQL ‘materialized join’ but which are completely dynamic instead. Because of this capability, the PatternQuery has far less work to do, and can begin execution in tens of microseconds, not seconds to minutes, and some queries complete almost immediately. SQL has to solve a terrific riddle on each query, involving a combinatorially difficult process of choosing indexes, join methods, the execution order of the joins, temporary space allocation and more. PatternQuery instead just rewrites the query to handle joins and other issues immediately.

Patterns and Symbols

In a PatternQuery, one describes the desired information using a ‘pattern’, which is a template for matching and selecting database data. The pattern can have a simple linear structure but in general is capable of matching any kind of tree in the database. Each branch of the pattern tree is called a pattern, and it is comprised of a single series of literals, expressions, and symbols. The literals are constants like numbers, strings, and so on. The expressions are as general as those of any programming language – in fact they are identical to Java, C, C++ and JavaScript expressions, so for example one can filter on (price * quantity < 130.0 && quantity < 10 ). This expression grammar can be learned in minutes by anyone and represents a tiny fraction of the grammar of those languages, plus there are more data types and some extensions.. The symbols represent unknowns, and are written like =mysymbol. It is easy to alter these patterns to define, refine or speed up a query. It is possible to share symbol names between patterns, resulting in ‘joins’ which are vital in dealing with complex data in rich ways, and which can also speed queries by orders of magnitude, especially when combined with well-designed database configurations.

Results and Symbols

The output of the query is as simple as the input. One merely provides one or more ‘results’, each of which is a branch on the result tree. Again, there are literals, expressions, and symbols. Each result corresponds to one more ‘Item’ or ‘fact’ that is added to or removed from the database, and it can plug in or compute with the values that were assigned to the pattern symbols each time a match occurs. For example an expression embedded in a result could be (quantity * price + shipping). A result ‘fires’ when the matching of the patterns reaches a point where the symbol values necessary for the result are determined. Because there can be unlimited results, the database can be added to or updated in unlimited ways, whereas SQL can only have a single kind of effect on or return a single table at a time. Each result symbol has an ‘action’ which determines how it deals with its data: it can copy, move, union, difference, and more. So a PatternQuery is like a whole collection of SQL statements at once, and it is more like a small tidy ‘application’ than a simple query. When SQL is embedded in an application, multiple SQL statements are executed for any task, mixed in with procedural code that moves data into and out of the SQL queries and does some computations – an expensive, rigid mess.

Fancy Symbols

Symbols are much more than designators of unknowns in patterns and knowns in results. Each can be defined in a ‘Where table’ to have certain simple attributes that affect the meaning. For example, a symbol can be told to match only the first or last possibility, or at most 10 or a value greater than a given value. The dozen or so possible symbol attributes, such as the minimum or ‘from’ attribute, can have values that are also literals, expressions, or symbols. This allows fantastic expressive richness, producing a possible network of relationships between symbols. Some such networks are ‘uncomputable’ because they create circular definitions, but if not, then anything is allowed, as long as expressions do not violate the ‘strong typing’ rules for intermediate computations which are there to prevent certain errors in advance. These networks are declarative nevertheless, and one can read the query as a simple statement of the acceptable relationships and computations. The symbol networks define the query in a way that can maximize performance far beyond that provided by boolean filter expressions.

Reducers

As powerful as the symbol networks are, multi-step computations are still needed sometimes. A ‘reducer’ is a kind of symbol that can perform computations over series of pattern matches, accumulating results as the matching goes along. The reducer can refer to its state on the previous match, and update that state. Reducers are sequential like procedural programs, and can achieve many similar goals, but the definition of the reducer in the query is declarative, and there is no concept of a ‘current execution point’ or loops, if statements, variables and so on. The reducer (which is a ‘recursive function’ to be precise) simply describes in one place a transformation between two successive states, with input from any other part of the query to be combined with each transformation. So it participates in the network of symbol relationships, adding yet more richness without any non-declarative code. One reads the reducer as an ‘input’ and ‘output transformation between two states in combination with other symbol or expression values.

For simplicity, many predefined reducers are provided, such as count, sum, mean, variance, sigma, and so on. There are also timers and randomizers. Each of these is is a different ‘symbol kind’, and there are many kinds of symbol. Each kind can be told to reset itself dependent on the pattern matching of portions of the pattern tree, or to omit the output up to the last iteration, and other clearly understandable things dependent on a few general purpose symbol attributes.

Multiple Inputs and Outputs and Client Queries

A feature called ‘root maps’ allows the input and output of the query to be redirected. The patterns and results can individually be instructed to use data from diverse sources and destinations. An important example is that a PatternQuery can match source data that comes into the InfinityDB v6 Client/Server web server via client application programs, and then can create results to go back to the clients. Such an interchange is a ‘REST’ access, which is a request/response protocol compatible with many languages and systems, including Python and JavaScript. So a single PatternQuery acts like a complete application program, serving requests via a designated access URL. Each request causes the PatternQuery to be re-compiled and re-executed. This ‘client query’ can gather database data, process it, validate it, output database data, and deal with the request and response headers and content. Multi-step queries are possible for added generality. No special procedural code is necessary. Simply because of the flexibility of the ItemSpace, a client PatternQuery can choose to get and store data representing logs, configuration control, user permissions, and so on. The meaning of the data is determined by the definition of the PatternQuery, not by the PatternQuery execution system, so a log is not a concept that the execution system knows about. The InfinityDB Server also provides user access control to multiple databases.

Advanced Technology

The internals of PatternQuery are invisible to the PatternQuery author entirely, and there are no configuration files, installation procedures, tuning, administrator maintenance or other issues at all. Yet, the system uses many advanced techniques for high performance and features. It is based on the InfinityDB database engine, a tried and true, robust, fast hierarchical DBMS. A PatternQuery can be embedded into any existing Java program using InfinityDB with no dependencies on outside code. Special hidden algorithms include ‘zig-zag join’, pattern rewriting for optimization using topological sorting and transitive symbol resolution, expression pre-compiling, constant folding, common subexpression factoring, value change notification and value caching, predicate push-down, symbol re-ordering, range and literal access optimization and more. These features and more are invisible to a PatternQuery author except in reaching far higher performance in many situations.

See for Yourself

To get deeper, see PatternQueries, then see PatternQuery Usage for a full explanation of the details. A set of examples may be what you need to get a feel for it at PatternQuery Examples. If you are an SQL guru, there are some elementary explanations of the internals at PatternQuery Implementation. Contact us at support@boilerbay.com for more. We can provide you access to the sandbox at the infinitydb.com server to experiment with, and to your own database as well.