PatternQuery Examples

Here we show queries from very simple to advanced, and some REST accesses. Look in the demo/readonly or demo/writeable databases for these, such as on https://infinitydb.com or your own AWS instance.

Person Residence Database

Here are some Items in i-code Java style that we will query over (there is also ‘Python style’ i-code). Each row ending in a semicolon is parsed into a single Item. For example, the first Item is DemoData Person 'joe' residence '133 some street' 'soquel' 'ca' 'usa' end_date 2023-07-25T-07:00. We have omitted the overall prefix DemoData Person, and overall prefixes are implicit in many contexts. Joe has lived recently at two addresses. Each address is a tuple that uniquely identifies an address, rather than, say, a unique number.

{
    'joe' residence {
        '133 some street' 'soquel' 'ca' 'usa' {
            end_date 2023-07-25T-07:00;
            start_date 2023-07-9T-07:00;
        }
        '62 joes other address' 'capitola' 'ca' 'usa' {
            end_date 2023-07-8T-07:00;
            start_date 2022-10T-07:00;
        }
    }
    'sally' residence '3362 another street' 'santa barbara' 'ca' 'usa' {
        start_date 2023-01T23-08:00;
    }
}

Below we show how this database looks in the database browser’s table display mode. We are using ‘nested’ attributes for the residence and the dates. Without nesting, this would look more like a traditional table, but we want multiple values for the residence for each person, plus start and end related to each residence. The database can represent almost any structure desired, and we can nest any combination of classes or attributes. The table display mode has a small set of universal rules that govern the formatting of any data, with a ‘tabular’ format where possible. Classes are in blue and attributes in green.

Avoid Numeric Primary Keys

The residence is a tuple, so each residence is its own unique identifier, without extra synthetic numeric primary keys that have to be joined on each retrieval as in SQL. And this is just one ‘table’, so no joins are necessary for simple cases, while in SQL this would be two tables. We could have further nestings, while in SQL that would require more and more tables, each of which has to be manually created and manually specified with joins for each select statement, while the rows have to have unique numeric primary keys that must be generated consistently and kept in sync. Such SQL databases cannot be combined once their numeric primary keys are generated differently, while in InfinityDB two sets of Items can simply be unioned by inserting the Items of one database together with the other.
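As a small illustration of that last point, here is a hedged Python sketch (a model only, not InfinityDB's actual storage engine) treating Items as self-identifying tuples held in sets, so that combining two databases is a plain set union:

```python
# Sketch only: model Items as self-identifying tuples in Python sets.
# Because the residence tuple itself is the identifier, there are no
# synthetic numeric keys to collide, and combining databases is union.
db_a = {('joe', 'residence', '133 some street', 'soquel', 'ca', 'usa')}
db_b = {('sally', 'residence', '3362 another street', 'santa barbara', 'ca', 'usa'),
        ('joe', 'residence', '133 some street', 'soquel', 'ca', 'usa')}

combined = db_a | db_b   # shared Items merge automatically
print(len(combined))     # 2
```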

Select country

Let’s select out the country from the residence and show the people there. The ‘out’ symbol puts the output in the response content, which can be seen on the web page if the query is executed in a browser, or returned as the response to a REST request. The output is automatically sorted according to country and person:

query {
    Where 'out' kind 'response content';
    pattern DemoData Person =person residence =street =city =state =country;
    result =out PersonState =country =person;
}

The result has two Items:

PersonState 'usa' {
  'joe' ;
  'sally' ;
}

To restrict the output to just ‘usa’, one can simply replace =country with ‘usa’. Or to find data about joe, change =person to ‘joe’.

To provide the input or output in a different structure we can just change the pattern or result. Note that the result structure here assumes that each person has only one residence – that was why we structured the data as above for more generality or ‘normalization’. We can force the output to be the first residence and do other things to ensure that.

query {
    Where 'out' kind 'response content';
    pattern DemoData Person =person residence =street =city =state =country;
    result =out Person =person {
        street =street;
        city =city;
        state =state;
        country =country;
    }
}

Select Country from REST

From Python we can execute the above to get a dict of countries and persons. We can either install the infinitydb module, or use the basic REST interface from any language. To install the module do: python3 -m pip install --upgrade infinitydb. Then:

from infinitydb.access import *

infinitydb_url = 'https://infinitydb.com/infinitydb/data'
database_uri = 'demo/readonly'
user = 'testUser'
password = 'db'
infdb = InfinityDBAccessor(infinitydb_url, db=database_uri, user=user, password=password)
success, content, response_content_type = infdb.execute_query(
    ['examples.person','get residence state'])
print(content)

The above prints the following. It is represented as a dict, where the keys are tuples representing the individual Items. The query could also format this as a list or other structure. We are showing the dict with a prefix class, but an attribute could be used instead, or anything else. Using such a fixed prefix allows the data returned to be extended later easily without breaking client code. The module has convenience functions to transform data in many ways, such as by transforming the tuple keys into individual keys with deeper nesting. Also, the None at the end can be trimmed off. The PersonState prefix is a Python class PersonState. We have not shown a try..except for an InfinityDBError.

{PersonState: {('usa', 'joe'): None, ('usa', 'sally'): None}}

In JSON this is (note the underscore before _PersonState to declare it an InfinityDB class – if you really wanted a string starting with underscores, just add another):

{
    "_PersonState" : {
        "usa" : {
            "joe" : null,
            "sally" : null
        }
    }
}
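The transformation into deeper nesting mentioned above can be sketched in plain Python. This is only a sketch of the idea: the function name unfold is a stand-in, not the module's actual API, and the PersonState class object is modeled here as a plain string key.

```python
# Sketch: unfold tuple keys like ('usa', 'joe') into nested dicts,
# similar in spirit to the infinitydb module's convenience transforms.
def unfold(d):
    out = {}
    for key, value in d.items():
        node = out
        for component in key[:-1]:
            node = node.setdefault(component, {})
        node[key[-1]] = value
    return out

content = {'PersonState': {('usa', 'joe'): None, ('usa', 'sally'): None}}
nested = {cls: unfold(v) for cls, v in content.items()}
print(nested)  # {'PersonState': {'usa': {'joe': None, 'sally': None}}}
```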
          

Input Residence

We can also allow modification of the database in order to add a person with their address. We could also do both input and output at the same time and do computations using expressions. ‘For free’, this query can input multiple persons by just having multiple input Items. Any kind of structure can be input and distributed into various places in the database, such as replications of the Items with various permutations of the components for speed.

query {
    Where {
        'in' kind 'request content';
        'country' action 'union';
    }
    pattern =in Person =person residence =street =city =state =country;
    result DemoData Person =person residence =street =city =state =country;
}

Actions

We use Where 'country' action 'union' to cause the new Items to mix in with existing Items that are extensions: otherwise existing start_date and end_date would be deleted. In a result, the innermost i.e. rightmost symbol determines the effect based on its action symbol attribute, which defaults to copy. Copy is a clear followed by a union, which means the extensions of the prefix of the innermost result symbol are deleted, and then the new Items are inserted.

Actions are very powerful, and all are based on the prefix of the innermost result symbol. The actions are:

  • copy – delete all extensions of the prefix followed by union
  • clear – delete all extensions of the prefix
  • ensure_clear – stop with an error if there are pre-existing extensions of the prefix
  • move – a copy, then clear the source – this requires that the symbol be innermost in a pattern as well
  • move safely – ensure clear, then move
  • move nowhere – clear source
  • union – just insert the new Items without clearing
  • move unioning – move without clearing
  • difference – just delete the new Items without clearing
  • move differencing – difference, then clear the source
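To make these semantics concrete, here is a hedged Python sketch (a model of the behavior, not the engine itself) of a few of the actions over Items modeled as tuples in a set:

```python
# Model Items as tuples in a set; each action operates on the
# extensions of the innermost result symbol's prefix.
def extensions(db, prefix):
    return {item for item in db if item[:len(prefix)] == prefix}

def clear(db, prefix):
    return db - extensions(db, prefix)

def union(db, new_items):
    return db | new_items

def copy(db, prefix, new_items):
    # copy = clear followed by union
    return union(clear(db, prefix), new_items)

db = {('joe', 'residence', 'old street', 'start_date', 2022),
      ('joe', 'residence', 'old street', 'end_date', 2023),
      ('sally', 'residence', 'some street')}
db = copy(db, ('joe', 'residence'), {('joe', 'residence', 'new street')})
print(sorted(db))
# [('joe', 'residence', 'new street'), ('sally', 'residence', 'some street')]
```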

Input More Complex Data

Here is a query to input residence dates and postal_code at the same time or later on. Any kind of data can be added, not just the dates, and the database will be extended dynamically, such as with the extra postal_code attribute. Each attribute is optional and can be multiple, but we could be more constraining with Where 'start_date' values 1 or use the ‘max_values’ or ‘min_values’ symbol attributes to cause the query to stop with an error. Alternatively, a Where 'start_date' limit 1 will just drop any extra values in the source beyond the smallest. Pre-existing start_date, end_date, postal_code or their extensions are deleted unless the default copy action is changed.

query {
    Where 'in' kind 'request content';
    pattern =in Person =person residence =street =city =state =country {
        start_date =start_date;
        end_date =end_date;
        postal_code =postal_code;
   }
    result DemoData Person =person residence =street =city =state =country {
        start_date =start_date;
        end_date =end_date;
        postal_code =postal_code;
    }
}

The input was:

Person 'joe' residence '23 new address' 'soquel' 'ca' 'usa' {
    end_date 2023-05-10T-07:00;
    postal_code 95003;
    start_date 2023-04-4T-07:00;
}

And the resulting database is below. In the future, the query on country above will still work, and queries on postal_code will work too, but will omit people with no postal_code unless a default value is given.

{
    'joe' residence {
        '133 some street' 'soquel' 'ca' 'usa' {
            end_date 2023-07-25T-07:00;
            start_date 2023-07-9T-07:00;
        }
        '23 new address' 'soquel' 'ca' 'usa' {
            end_date 2023-05-10T-07:00;
            postal_code 95003;
            start_date 2023-04-4T-07:00;
        }
        '62 joes other address' 'capitola' 'ca' 'usa' {
            end_date 2023-07-8T-07:00;
            start_date 2022-10T-07:00;
        }
    }
   'sally' residence '3362 another street' 'santa barbara' 'ca' 'usa' start_date 2023-01T23-08:00;
}

Unrestricted Input

A simpler way to write the query can instead allow the input to specify entire subtrees, i.e. sets of suffixes at once, using just a Where 'suffixes' type 'item' symbol attribute, generalizing the query for more flexibility but less control. There is no restriction on the data input for the person, but all of it is stored under DemoData Person. It is possible to combine this general-purpose pattern with more specific pattern matching in the same query in order to handle part of the data in some special way. The default copy action causes all of the pre-existing Items related to each input person to be cleared first. ‘For free’, multiple persons can be input in one execution. You know you have a good ‘normalized schema’ when you can use multiple Items like that.

We have to explicitly match on =person so that =suffixes matches only the new data specific to the given person – otherwise all of the Person data would be deleted due to the default action ‘copy’ on the =suffixes symbol. (If you ever delete more than you want or have other trouble, you can roll back to the previous commit using the rollback button, but you can’t roll back further than that. Do commits at reasonable times.)

query {
    Where {
        'in' kind 'request content';
        'suffixes' type 'item';
    }
    pattern =in Person =person =suffixes;
    result DemoData Person =person =suffixes;
}

Maintaining Inversions

The above input query for a person residence can be extended, for example to store replicas of the data with a permutation of the components for later use in more convenient ways. Importantly, for large databases, the permuted Items may be accessed very quickly when the set of known components tends to be nearer the left, i.e. ‘outer’ components are known. It is not necessary to have a complete known prefix, but any mixture of knowns and unknowns are allowed. For example we can maintain a Residence inversion:

query {
    Where {
        'in' kind 'request content';
        'suffixes' type 'item';
    }
    pattern =in Person =person {
        =suffixes;
        residence =street =city =state =country;
    }
    result DemoData {
        Person =person =suffixes;
        Residence =country =state =city =street person =person;
    }
}

In this case, we match both on =suffixes, which captures all of the data because it is of type ‘item’, and at the same time we match on the residence tuple which picks out part of the data and stores it in a new class Residence, where we chose to reverse the sequence of components in the tuple. Any structure or number of permutations can be used. Because we did this multiple matching, each person must have a residence or else is ignored. To allow persons that have no residences, break =person into =person1 and =person2, one for each pattern and result. The single =person symbol links the patterns together.

Inversion Speed

Later, we can access through that ‘inversion’ with very high speed, such as by providing the country or country and state or country and state and city and so on, to drill down. Data to the ‘left’ i.e. ‘outer’ is a key for data to the right, which may be accessed as a unit, but there is no fixed boundary between key and data as there is in other DBMS. It is less efficient when only some of the data to the left is known, such as if we know the state but not the country, in which case we scan only the country, and then do direct access on state. For each state, we can then easily and quickly retrieve all cities in that state and so on. If there are few countries, almost no performance hit occurs.

This consideration of ‘key’ versus ‘data’ is only a performance consideration and does not affect the logical capabilities of queries or limit them in any way. An appropriate inversion can speed queries for large databases by orders of magnitude by changing linear scans to direct accesses, which take only milliseconds. The query compiler always knows which data is outer and will use the best path. It is safe and reasonable to provide country and city but not state or street or any other combination of knowns and unknowns. However, the query compiler only works with data specified in the pattern, so the inversions have to be specifically matched on. Adding a new inversion does not automatically rewrite queries to use it. Also, there is really no special treatment of Items that happen to correspond to each other by containing the same data, such as by being permutations. In fact sometimes the inverse Items are looked at as the ‘original’ and the ‘original’ can be considered the ‘inversion’ – it is just a particular point of view. The ‘inversion’ can even have further structure inner to it.

Get Residence Duration

Let’s get the time of residence of each person at each address. In the Where we declare the data types ‘date’ for the dates and ‘tuple’ for the residence address, for the compile-time strong type checking that catches many errors in advance. The dates default to the current date and time, so a missing date will not cause an ‘undefined’ symbol error in the expression, such as for sally. For convenience, type ‘tuple’ means =residence_address matches all of the ‘primitive’ components as a unit, i.e. the components between classes and attributes, thereby making the symbol independent of the number of address components actually found in each Item. This actually allows storing tuples with variable lengths. Note that queries are kept in the database under a prefix like Query "examples.person" "get residence duration" so there is no need to keep typing them in.

{
    query {
        Where {
            'end_date' {
                default (new Date());
                type 'date';
            }
            'out' kind 'response content';
            'residence_address' type 'tuple';
            'start_date' {
                default (new Date());
                type 'date';
            }
        }
        pattern DemoData Person =person_name residence =residence_address {
            end_date =end_date;
            start_date =start_date;
        }
        result =out TimeSpentAtResidence =person_name =residence_address duration (((long)end_date - (long)start_date)/1000/60/60/24/30) 'months';
    }
}

The result is sorted as usual.

TimeSpentAtResidence  {
    'joe' {
        '133 some street' 'soquel' 'ca' 'usa' duration 0 'months' ;
        '62 joes other address' 'capitola' 'ca' 'usa' duration 9 'months' ;
    }
    'sally' '3362 another street' 'santa barbara' 'ca' 'usa' duration 6 'months' ;
}
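The duration arithmetic can be spot-checked in Python. In this sketch the i-code date 2022-10T-07:00 is written out in full ISO form, and the chain of integer divisions mirrors the query's expression:

```python
from datetime import datetime

# Sketch of the query's duration expression: the difference of the two
# dates in milliseconds, integer-divided down to approximate 30-day months.
def months_between(start_iso, end_iso):
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    millis = int((end - start).total_seconds() * 1000)
    return millis // 1000 // 60 // 60 // 24 // 30

# joe at '62 joes other address': 2022-10T-07:00 to 2023-07-8T-07:00
print(months_between('2022-10-01T00:00:00-07:00',
                     '2023-07-08T00:00:00-07:00'))  # 9
```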

Stateful Symbols like Counters

In order to do calculations based on the sequence of matches, we use ‘stateful’ symbols including counters, sums, max, min, statistics, hashing, random, timestamps, and general-purpose ‘reducers’. Here is how to count the residences for each person. The Where 'address' summary true suppresses the running counts except the last. Where 'address' type 'tuple'; makes =address match the entire tuple, and it is what we count on. Where 'count' kind 'counter from one' means the default is 0 and counting starts at 1, while a ‘counter from zero’ has no default but starts at 0 on the first match. Regular ‘counter’ kinds can optionally have explicit default, from, and increment symbol attributes.

Multiple stateful symbols can be interspersed anywhere in a pattern and do not affect the matching but only accumulate values based on calculations of some kind. For example, putting a symbol of kind sum immediately after another ‘outer’ pattern symbol totals up the outer symbol’s values – other statefuls are similar.

When a plain symbol is given a reset false symbol attribute, the stateful symbols inner to it are not reset at the beginning of each iteration pass, and the scope of the accumulation widens to the nearest outer symbol that is not reset false. Likewise, if reset is set true, then all stateful symbols inner to that one do reset on each immediately outer iteration pass. Any mixture of reset settings is allowed. Setting summary true has similar scoping rules, and it just suppresses output for all but the last iteration of the stateful inner symbols.

query {
    Where {
        'address' {
            summary true;
            type 'tuple';
        }
        'count' kind 'counter from one';
        'out' kind 'response content';
    }
    pattern DemoData Person =person residence =address =count;
    result =out PersonResidenceCount =person residences =count;
}

The output is:

PersonResidenceCount  {
    'joe' residences 2;
    'sally' residences 1;
}
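The counter's behavior can be mimicked in Python. This sketch resets the count for each outer =person and, as summary true implies, keeps only the final count per person:

```python
# Sketch of a 'counter from one' with summary true over the demo data.
residences = {
    'joe': [('133 some street', 'soquel', 'ca', 'usa'),
            ('62 joes other address', 'capitola', 'ca', 'usa')],
    'sally': [('3362 another street', 'santa barbara', 'ca', 'usa')],
}
counts = {}
for person, addresses in residences.items():
    count = 0                 # default 0; resets on each outer =person
    for address in addresses:
        count += 1            # one increment per matched residence tuple
    counts[person] = count    # summary true: emit only the last value
print(counts)  # {'joe': 2, 'sally': 1}
```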

Blob Query

In order to query for a blob, you use an ‘item’ type symbol. The query below will retrieve a picture based on its name. For Python, you use a get_blob_query() or in the REST URL a parameter &action=execute_get_blob_query. The response content will be the blob, and the return will include success, content, response_content_type. You can also put blobs with put_blob_query() or &action=execute_put_blob_query.

    query {
        Where {
            'image' type 'item';
            'in' kind 'request content';
            'out' kind 'response content';
        }
        pattern {
            Pictures =name =image;
            =in name =name;
        }
        result =out =image;
    }

If you execute this in the browser query view, you get more than the blob itself: everything under that prefix. In order to see just the blob part as it will be returned via REST, do ‘show response as blob’.

Fish Farm

Here we will show the power of joins and expressions.

This is a database of fish farms and the fish prices over time. Notice the join on =fish_species to relate the farm to the prices over time. The =fish_species symbol occurs both in the fish_species attribute of the FishFarm class and in the species attribute of the FishMarket class. The result is just rows of data under the FishFarmProfit class that could be considered a series of tuples, such as in a CSV file, or spreadsheet. The result doesn’t have to be a single ‘CSV’ set of rows but can be arbitrarily complex, even creating multiple ‘outputs’ or structures. The =profit symbol takes data from the market price, the cost of the fish to each farm, and the number of fish of that species at the farm. This query might be kept available in the database itself for future re-use, as a set of Items just like any other data, say under Query "fish farm" "profits" query ....

This query matches a symbol we don’t need – location – but this allows us to change the result easily later without having to add or remove extra symbols, and without affecting the query’s performance. Unused symbols in the pattern are ‘wildcards’ and either have no effect or can be meaningful as placeholders.

query {
    Where 'profit' equals ((price - fish_cost) * fish_count);
    pattern {
        FishFarm =farm_name {
            fish_cost =fish_cost;
            fish_count =fish_count;
            fish_species =fish_species;
            location =location;
        }
        FishMarket date =market_date {
            price =price;
            species =fish_species;
        }
    }
    result FishFarmProfit =farm_name =market_date =profit;
}

Here is the data in the database (this might have been loaded via the CSV format for example, and then cleaned up by a special PatternQuery):

{
    FishFarm {
        'santa cruz fish' {
            fish_cost 2.18;
            fish_count 30000;
            fish_species 'tilapia';
            location 'soquel CA';
        }
        'soquel fish' {
            fish_cost 5.03;
            fish_count 5000;
            fish_species 'tilapia';
            location 'soquel CA';
        }
    }
    FishMarket date {
        2021-10T-07:00 {
            price 10.01;
            species 'tilapia';
        }
        2021-10-2T-07:00 {
            price 10.5;
            species 'tilapia';
        }
    }
}

The profitability of the various fish farms is below. Dates are by default always in the ISO date format, although the day is omitted when it is 1 and the time part is omitted when it is 00:00:00.

FishFarmProfit {
    'santa cruz fish' {
        2021-10T-07:00 234900.0;
        2021-10-2T-07:00 249600.0;
    }
    'soquel fish' {
        2021-10T-07:00 24899.999999999996;
        2021-10-2T-07:00 27350.0;
    }
}
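The join and the profit expression can be spot-checked in Python over the same data (a sketch of the semantics, not how the engine evaluates it):

```python
# Join FishFarm to FishMarket on the species, then apply
# profit = (price - fish_cost) * fish_count, as in the Where equals.
fish_farm = {
    'santa cruz fish': {'fish_cost': 2.18, 'fish_count': 30000, 'fish_species': 'tilapia'},
    'soquel fish': {'fish_cost': 5.03, 'fish_count': 5000, 'fish_species': 'tilapia'},
}
fish_market = {
    '2021-10-01': {'price': 10.01, 'species': 'tilapia'},
    '2021-10-02': {'price': 10.5, 'species': 'tilapia'},
}
profits = {}
for farm_name, farm in fish_farm.items():
    for market_date, market in fish_market.items():
        if farm['fish_species'] == market['species']:   # the join on species
            profits[(farm_name, market_date)] = (
                (market['price'] - farm['fish_cost']) * farm['fish_count'])
print(profits[('santa cruz fish', '2021-10-01')])  # matches the 234900.0 above
```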

Downloading Data

Here is the same result as Items, which looks a lot like CSV but is space delimited. In the server’s backend database browser, there is a ‘Show Items’ view mode that shows this, and the FishFarmProfit prefix can be factored out automatically by selecting it as the ‘current prefix’. Then you can copy and paste this, or part of it, into Excel and convert text to columns using a space delimiter.

You can apply the simpleDateFormat('MM/dd/yy HH:mm:ss ZZ') operator to a string to convert it to a date, or the reverse. Excel has a function to convert dates to ISO, which is =TEXT(A1,"yyyy-mm-ddThh:MM:ss"). There is also a =DATE(…) function for the reverse. Unfortunately, Excel dates do not have time zones and are only local, not points in time. InfinityDB dates are internally standard longs representing milliseconds since the epoch, January 1 1970 at 00:00:00 UTC. In order to have the query output dates for Excel, use the date formatting function with the appropriate format for your locale: just put (market_date.simpleDateFormat('MM/dd/yy HH:mm:ss ZZ')) in the result instead of =market_date, and you get the following (but you may want to delete the ZZ first):

"santa cruz fish" "10/01/21 00:00:00 -0700" 234900.0
"santa cruz fish" "10/02/21 00:00:00 -0700" 249600.0
"soquel fish" "10/01/21 00:00:00 -0700" 24899.999999999996
"soquel fish" "10/02/21 00:00:00 -0700" 27350.0

Adding Structure to Fish Farm

Data can be added in many ways without affecting the fish farm query. For example, attributes can be added to the FishFarm class, such as food_usage on all or some of the farms. The entire FishFarm class does not need to be altered to change the schema. It is easy to add data structures beneath existing data, such as a fish subspecies by plugging it in under fish_species – you just store fish_species 'tilapia' 'some subspecies';. The further data can be any kind of structure, without affecting the query, because =fish_species matches on isolated ‘primitive’ data types by default, which ignores nested data. A query to capture all nested data as well would use Where 'fish_species' type 'item'; where an item is zero or more components of any data types. Or, to capture just the species and subspecies, use two symbols together: FishFarm =farm_name fish_species =fish_species =fish_subspecies;. Another possibility is to allow the species to expand in the future as a tuple with Where 'fish_species' type 'tuple';. Then a fish species is a tuple, meaning any series of zero or more primitives delimited by any ‘meta’s inner to it. (A ‘meta’ is a component of class or attribute type, and the ‘primitives’ are the other 10 types.)

CSV mode

There is a ‘Show CSV’ view mode in the backend database browser which is compatible with Excel and other standard data sources. In this case, the lines are prefixed by increasing Index data types, so that one has a list. If you select Show CSV first, then paste in some CSV and store it, these Index numbers are generated automatically, and the result is something like a ‘document’ in which the line order is preserved. This can then be parsed by a custom query into anything needed, including doing the date reformatting. These documents are shown as lists in the other display modes or formats, such as JSON or i-code. Python or JavaScript working with data arriving or leaving through REST client queries, or via direct REST access, can modify and read the database in a clean, structured way, but currently cannot parse and format CSV.

Formatting

The CSV (i.e. ‘Comma-Separated Values’) format is widely used. CSV mode input tries to parse fields into strings, numbers, and ISO dates. If a field is not parseable as a number or date, it becomes a string. If you want a number or date to come out as a string component instead, precede it with an underscore. If you want to put in any other component type, put in a valid token preceded by an underscore, such as EntityClasses, Attributes, char[] as Chars(".."), byte[] as Bytes(10_20), byte strings as ByteString(10_20), Indexes as [10], or booleans as true or false, and so on; but if you put an underscore in front of a field that is not a valid isolated token, it will stay a string.

CSV mode preserves as much of the original text format as it can, so you can store a CSV file and then retrieve it in a format that is still compatible externally (spaces are deleted around numbers and dates for example). Line delimiters are flexible on input, but become the platform-dependent terminator on output. CSV quotes commas and double quotes within fields by surrounding the field with double quotes, but these double quotes are not considered part of the original data. An original double quote within a field is doubled. Therefore, double double quotes quote double quotes.
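Python's csv module follows the same quoting convention, which makes the rule easy to verify:

```python
import csv
import io

# A field containing a comma or a double quote is surrounded with
# double quotes, and an embedded double quote is doubled.
buf = io.StringIO()
csv.writer(buf).writerow(['plain', 'has,comma', 'has "quote"'])
line = buf.getvalue().strip()
print(line)  # plain,"has,comma","has ""quote"""
```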

Extraction, Transformation, and Loading

For uploading large amounts of data, the REST interface is very effective. You can write a loader query that accepts a reasonable amount of data – say 1MB – and adds that to the accumulating database. The query can do various kinds of cleansing and transformation. A python program can extract the data from somewhere else or generate the data on the fly, and do REST accesses.

Also consider breaking data into multiple CSV documents, such as by prefixing each document with some other component type, like a long or another Index or a string ‘name’ or file name. This extra breakdown can then be handled in queries by putting a symbol in the place of the extra breakdown component, followed by a symbol for the line number, which is an Index data type. A query can renumber the lines later into a huge CSV document if desired, or just use ‘wildcard symbols’ matching the breakdown and line number components to make them be ignored. The fields within the lines can be extracted, parsed, cleansed, transformed, and transferred anywhere else, or the reverse, with a suitable query, so that they become more understandable, flexible, and extensible, in a semantically correct schema. A single query can do this for any subset of the fields at once, and it can send the data to multiple destinations or retrieve it and assemble one or more CSV documents in elegant ways. The fields on input have some implied significance based on context, including the field position in each line, but the query can store them in ways that make their significance clear and standardized by adding meta components (class and attribute components) into the stored Items in some way.

CSV MetaData

With some extra custom structure, the file name and other ‘metadata’ of each CSV document can be associated with it. It may be necessary to manually delete the first line if it appears to be column headings, but then of course when reproducing the file for output, it will be absent. The column headings can be moved to some metadata associated with the CSV document, or just become implicit in the transformed structure, since often the CSV document will be deleted after use. The query can automatically omit the first line if desired, such as with a simple filter kind of symbol:

query {
    pattern MyCsv =csv_file_name =csv_line =not_first_line =first_column =second_column;
...
    Where {
        'not_first_line' {
            kind 'filter';
            expression ((long)csv_line > 0);
        }
       ...
    }
}
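The same first-line filter is easy to mimic in Python when preparing or checking such a document, with lines keyed by their Index number as in the stored form:

```python
# Sketch: drop line 0 (the apparent column headings), keeping the rest,
# like the filter expression ((long)csv_line > 0) in the query above.
csv_lines = {
    0: ['name', 'city'],
    1: ['joe', 'soquel'],
    2: ['sally', 'santa barbara'],
}
body = {i: fields for i, fields in csv_lines.items() if i > 0}
print(body)  # {1: ['joe', 'soquel'], 2: ['sally', 'santa barbara']}
```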

BinaryLongObject BLOB

You can store any raw stream data whatsoever with BLOBs, plus there is a way to associate a mime type with them in the REST access, which allows them to be displayed in the ‘Show Table’ mode or web browser as images, for example. These will work on any prefix, where the prefix is effectively the unique identifier of the BLOB. Internally, a BLOB associated with a mime type is stored under an attribute com.infinitydb.blob, and inner to that are two attributes com.infinitydb.blob.mimetype and com.infinitydb.blob.data. These attributes are rarely seen in practice as there are many higher-level ways to deal with blobs. Below is the general structure. (Note that this i-code is a text format, and is not related to the actual very efficient, fast storage in the database, which uses contiguous variable-length binary bytes, variable fine-grain common-prefix compression, ZLib compression i.e. Huffman and Lempel-Ziv, stored packed in variable-length blocks.)

Pictures 'pic0' {
    com.infinitydb.blob {
        com.infinitydb.blob.mimetype 'image/jpeg'
        com.infinitydb.blob.data [
            Bytes(AB_96_...),
            Bytes(67_43_...)
        ]
    }
}

There is a ‘blob upload’ button in the database browser to transfer any file into the database as an unnamed blob under the current prefix with its inferred mimetype (such as text/plain or image/jpeg). Be sure the ‘current prefix’ is where you want to be. There is also a ‘file upload’ button, which transfers a set of files as blobs all at once: you select the files in a file chooser and they are all uploaded as blobs, each under the current prefix plus the original file name. For downloading, you can do one BLOB at a time by clicking on the blob in the database browser to have it shown full-screen, where you can then do ‘save as…’. For a large number of blob uploads or downloads, use a Python program or other REST access.

Blobs can also be represented as i-code, even multiple blobs at once, for transfer as text, but it is only about 33% space efficient. You can do this by viewing as i-code and selecting ‘edit’ to show a text area where you can copy, cut, and paste i-code (or JSON or Item token formats).

There are currently no PatternQuery features for operating on the internal contents of blobs, so image processing, for example, is done in Python, e.g. using OpenCV. You can, however, cut and paste text instead using the ‘Show list of strings’, ‘Show list of Comma Separated Values’, or ‘Show Set of Comma Separated Values’ modes, which allow queries to ‘see inside’. Text/plain blobs can be edited directly using ‘Show Plain Text Blob’ mode.

Search Queries

Here we show an advanced query that provides a general-purpose input so a user or REST request can do queries in a flexible way. The input looks like Company { country 'us'; product { 'software'; 'support'; }} in order to find companies that are at least in the us and possibly other countries as well, and which produce at least one of software or support. Any set of attributes like country, product, employee, or postal_code can be provided, and the matched companies must have all of them, and each of the values under an attribute can be multiple, both in the database and in the query request content. To match, the values must have a non-empty intersection between the database and the input for each specified attribute. New, different attributes can be added to or removed from any company without changing this query or the query inputs or outputs.
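The matching rule can be sketched in plain Python (an illustrative model of the semantics with made-up data, not the PatternQuery engine):

```python
# A company matches when, for every attribute given in the request,
# the company's values intersect the requested values (hypothetical data).

companies = {
    'acme':   {'country': {'us'},       'product': {'software', 'support'}},
    'globex': {'country': {'us', 'ca'}, 'product': {'hardware'}},
}

request = {'country': {'us'}, 'product': {'software', 'support'}}

def matches(attrs, request):
    # Non-empty intersection required for each requested attribute;
    # extra attributes on the company are simply ignored.
    return all(attrs.get(a, set()) & wanted for a, wanted in request.items())

hits = [name for name, attrs in companies.items() if matches(attrs, request)]
print(hits)  # ['acme']
```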

We also avoid a join on =other_attributes and instead use ((other_attributes,)), which is a tricky way to get a referencing expression rather than a join (it is actually a singleton tuple expression, having a comma and parentheses). Without this, companies that are missing some of the request content attributes would match as well: we want the request content to take precedence (in a relational DBMS, this is an ‘outer join’).

It is easy to customize the output to have more data, joins with other data, or some kind of calculations in expressions and so on.

In the database browser, there is a ‘Show i-code Java Style’ choice, and it shows the query as below. This can be edited as raw i-code text as well. Nested images and other blobs show up rendered properly.

Search Query With Inversions

For postal code and employee we want to maintain some inversions for speed, and these are to be used transparently when possible. The inversions should not take effect if the request content does not mention them, so we set Where 'postal_code' disable_by_default true. When =postal_code does not match from the input, its inversion is effectively completely removed from the query.

The Where 'result_data' type 'item' causes the result data to match everything inner to the company name, so that is all of it. Instead we could select particular subparts of the data by being more specific with longer patterns. That added matching could still select all suffixes within those subparts, or all tuples, or all primitives, or all strings, or all numbers, and so on by setting the symbol type. Note that type is a multi-valued symbol attribute, so we could select every value whose type is in the set {string, number, date}. Also there are shorthands, where type ‘number’ means in {long, double, float}, ‘meta’ means in {class, attribute}, and ‘primitive’ means any component but meta. The type ‘tuple’ is like ‘item’ but excludes metas, so it is a series of adjacent primitives.

Also, we use the perhaps unexpected Where 'other_attributes' default nothing for the rare case when the input is totally empty, just to avoid an ‘undefined’ error.

query {
    Where {
        'employee' disable_by_default true;
        'in' kind 'request content';
        'other_attributes' {
            all_must_match true;
            default nothing;
            type 'attribute';
        }
        'other_values' type 'tuple';
        'out' kind 'response content';
        'postal_code' disable_by_default true;
        'result_data' type 'item';
    }
    pattern {
        DemoData {
            Company =company {
                ((other_attributes,)) =other_values;
                =result_data;
            }
            Employee =employee company =company;
            PostalCode =postal_code company =company;
        }
        =in Company {
            employee =employee;
            postal_code =postal_code;
            =other_attributes =other_values;
        }
    }
    result =out Company =company =result_data;
}

Search Query with Allow Require Exclude

This query provides additional flexibility for the input to specify subsets of data in the database.

The input can specify an ‘allow’, ‘require’, and ‘exclude’ set of parameters to control the selection. Each of them can specify the postal_code, country, product, or employee, or in fact any attribute that is or may become attached to a company. So, the request content below will match any companies in the us or canada that do not provide support but do have employees Alden or Roger and are in postal_code 95073. (The formatting of line breaks and indentation is flexible for parsing.) Note that not only can the parameters be multi-valued, but also the attributes on the companies themselves may be multi-valued, and it is the intersections that determine matching. This query can easily be adapted for data sources other than the Company class – even a generic class.

{
    allow country {'us'; 'canada';}
    require { employee { 'Alden'; 'Roger';} postal_code 95073; }
    exclude product 'support';
}

Logically, the allowed set is first unioned into one set, then all but the required elements are removed from that set, and then the excluded elements are removed from the set. This set computation is done on-the-fly, however, with no temporary space usage. Note that if a company has multiple values for a particular attribute, then its non-excluded values will not prevent a match.
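The allow/require/exclude semantics just described can be modeled in a few lines of plain Python (a sketch of the behavior under the stated rules, not the query engine; the data is hypothetical):

```python
# Model: each allow or require attribute needs a non-empty intersection
# with the company's values; any intersection with an exclude attribute
# prevents the match. Extra values on the company are otherwise ignored.

def company_matches(attrs, allow, require, exclude):
    """attrs and the three criteria are all {attribute: set(values)}."""
    for criteria in (allow, require):
        if any(not (attrs.get(a, set()) & vals) for a, vals in criteria.items()):
            return False
    return all(not (attrs.get(a, set()) & vals) for a, vals in exclude.items())

company = {'country': {'us'}, 'employee': {'Alden', 'Kim'},
           'postal_code': {95073}, 'product': {'software'}}

ok = company_matches(
    company,
    allow={'country': {'us', 'canada'}},
    require={'employee': {'Alden', 'Roger'}, 'postal_code': {95073}},
    exclude={'product': {'support'}})
print(ok)  # True
```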

It is easy to customize the output to show more attributes, or to show all of the data or to compute expressions on the data.

Note the use of the equals symbol attribute to create an ‘outer join’. We don’t just repeat a symbol in multiple places as in a normal join, but instead use an expression to refer to it. Without that, attributes or values that are in the criteria but not in the company would be ignored. Because of the outer join, we also need to set strict false in order to allow certain symbols to take on undefined values instead of halting the query. Normally, symbols may not have undefined values. We use all_must_match true and none_must_match true to handle the required and exclusive matching. The required true causes each of the allow and require sets to stop the matching if necessary. (When a required symbol fails to match or an excluded symbol matches, it causes ‘backtracking’ of the matching.) The disable_by_default true effectively removes the symbol and the rest of the pattern inner to it from the query if it matches no values, so any of the allow, require, or exclude criteria can be omitted.

This is a linear scan of the company data, not using inversions for speed (see the query above that shows such inversions, and the queries below which use the logic feature for speed). When each company is examined, there is a hidden ‘pre-pass’ that tests whether all of the all_must_match symbols are matched and none of the none_must_match symbols is matched overall. This pre-pass is additional work but is aided by the fact that it is not necessary to iterate all possible values: if one value fails to match, the all_must_match part of the pre-pass can return early, and if one value matches, the none_must_match part of the pre-pass can return early.

{
    Where {
        'company_allow_attribute' {
            equals =in_allow_attribute;
            required true;
            type 'attribute';
        }
        'company_allow_value' {
            equals =in_allow_value;
            required true;
        }
        'company_exclude_attribute' {
            equals =in_exclude_attribute;
            type 'attribute';
        }
        'company_exclude_value' {
            equals =in_exclude_value;
            none_must_match true;
        }
        'company_require_attribute' {
            equals =in_require_attribute;
            required true;
            type 'attribute';
        }
        'company_require_value' {
            equals =in_require_value;
            required true;
        }
        'in' kind 'request content';
        'in_allow_attribute' {
            disable_by_default true;
            strict false;
            type 'attribute';
        }
        'in_allow_value' {
            disable_by_default true;
            strict false;
        }
        'in_exclude_attribute' {
            disable_by_default true;
            strict false;
            type 'attribute';
        }
        'in_exclude_value' {
            disable_by_default true;
            strict false;
        }
        'in_require_attribute' {
            all_must_match true;
            disable_by_default true;
            strict false;
            type 'attribute';
        }
        'in_require_value' {
            disable_by_default true;
            strict false;
        }
        'out' kind 'response content';
    }
    pattern {
        DemoData Company =company {
            =company_allow_attribute =company_allow_value;
            =company_exclude_attribute =company_exclude_value;
            =company_require_attribute =company_require_value;
        }
        =in {
            allow =in_allow_attribute =in_allow_value;
            exclude =in_exclude_attribute =in_exclude_value;
            require =in_require_attribute =in_require_value;
        }
    }
    result =out Company =company;
}

Here is a screenshot of the above in the database browser in tabular mode (the browser display ‘does not know’ that this data happens to be a query):

Search Query Using the Logic Feature

Here is how to extend the above query for extreme speed using a ‘logic’ feature.

We are not using any inversions for speed here, but the logic feature is nevertheless very fast. The logic feature provides three kinds of filtering: the default logic ‘or’ mode, plus the explicit logic ‘and’ and logic ‘not’ symbol attributes, determine how the allow, require, and exclude inputs work. We use the required true symbol attribute to ‘and’ together the three by causing the matching to ‘backtrack’ when necessary rather than letting a result fire.

The zig-zag algorithm does not need a pre-pass like the all_must_match and none_must_match technique above, and it is extremely efficient when working with symbols having large fanouts. In this example the fanouts are small, so the performance enhancement is minimal. If the fanouts were, say, millions, then the performance improvement could be correspondingly large, bringing the time down from hours to milliseconds. Below this example is another that does show such a case.
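For intuition, here is a generic sketch of a zig-zag (sometimes called leapfrog) intersection of two sorted sequences, the general kind of algorithm behind such speedups: each side seeks forward to the other's current key rather than scanning every value. This is a textbook illustration, not InfinityDB's actual implementation.

```python
from bisect import bisect_left

def zigzag_intersect(a, b):
    """Intersect two sorted lists by alternately seeking each one forward."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i + 1)  # seek a forward to first key >= b[j]
        else:
            j = bisect_left(b, a[i], j + 1)  # seek b forward to first key >= a[i]
    return out

print(zigzag_intersect([1, 3, 5, 7, 9, 11], [2, 3, 4, 9, 10]))  # [3, 9]
```

With a tree index, each seek is a logarithmic-time operation, so with very large fanouts the cost is proportional to the size of the small side of the intersection rather than to the data scanned.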

Note the seemingly strange expression below that enables the company_employee, company_postal_code, company_product, and company_country symbols. This just keeps the attributes in the company from participating in the query unless needed – for example, if the criteria do not specify anything about employee, then employee testing is disabled. That is just a policy we have chosen for our design.

        enabled ((isdefined)allow_product || (isdefined)require_product || (isdefined)exclude_product);

We provide an optional fanout of 0 on some symbols to give clues that help the compiler optimize. Those symbols come directly from the criteria in the request content, which are normally quite small. The fanout symbol attribute is an estimate of the number of values of a symbol for a given value of its immediately outer symbol, i.e. for a given prefix. The default values of fanout range from 0.1 for literals up to 5 for unconstrained plain symbols, but any number can be given. A low fanout pushes the symbol outward for early computation. By giving true fanouts, an estimate of the total query execution work can be seen in the ‘check by compiling’ option in the query page of the browser (as well as a listing of the compiled ‘VisitorNodeList’ ready to be executed).

It can be confusing that in Where 'allow_country' equals country; the equals parameter is not a symbol but a literal – if we wanted a symbol we would say equals =country;. The result is that the symbol =allow_country acts just like a literal attribute, but because we have ‘hoisted’ it into a full-fledged symbol, we can customize it with non-default symbol attributes.

{
    Where {
        'allow_country' {
            disable_by_default true;
            equals country;
            fanout 0;
            required true;
            type 'attribute';
        }
        'allow_employee' {
            disable_by_default true;
            equals employee;
            fanout 0;
            required true;
            type 'attribute';
        }
        'allow_postal_code' {
            disable_by_default true;
            equals postal_code;
            fanout 0;
            required true;
            type 'attribute';
        }
        'allow_product' {
            disable_by_default true;
            equals product;
            fanout 0;
            required true;
            type 'attribute';
        }
        'company_country' {
            enabled ((isdefined)allow_country || (isdefined)require_country || (isdefined)exclude_country);
            equals country;
            type 'attribute';
        }
        'company_employee' {
            enabled ((isdefined)allow_employee || (isdefined)require_employee || (isdefined)exclude_employee);
            equals employee;
            type 'attribute';
        }
        'company_postal_code' {
            enabled ((isdefined)allow_postal_code || (isdefined)require_postal_code || (isdefined)exclude_postal_code);
            equals postal_code;
            type 'attribute';
        }
        'company_product' {
            enabled ((isdefined)allow_product || (isdefined)require_product || (isdefined)exclude_product);
            equals product;
            type 'attribute';
        }
        'country' required true;
        'employee' required true;
        'exclude_country' {
            disable_by_default true;
            equals country;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'exclude_employee' {
            disable_by_default true;
            equals employee;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'exclude_postal_code' {
            disable_by_default true;
            equals postal_code;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'exclude_product' {
            disable_by_default true;
            equals product;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'in' kind 'request content';
        'out' kind 'response content';
        'postal_code' required true;
        'product' required true;
        'require_country' {
            disable_by_default true;
            equals country;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'require_employee' {
            disable_by_default true;
            equals employee;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'require_postal_code' {
            disable_by_default true;
            equals postal_code;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'require_product' {
            disable_by_default true;
            equals product;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
    }
    pattern {
        DemoData Company =company {
            =company_country =country;
            =company_employee =employee;
            =company_postal_code =postal_code;
            =company_product =product;
        }
        =in {
            allow {
                =allow_country =country;
                =allow_employee =employee;
                =allow_postal_code =postal_code;
                =allow_product =product;
            }
            exclude {
                =exclude_country =country;
                =exclude_employee =employee;
                =exclude_postal_code =postal_code;
                =exclude_product =product;
            }
            require {
                =require_country =country;
                =require_employee =employee;
                =require_postal_code =postal_code;
                =require_product =product;
            }
        }
    }
    result =out Company =company;
}

Search Query With High Fanouts

Below is a re-coding of the query above that works over Items like CompanyAddress =country =state =city =address company =company. This structure only identifies company locations, not any other data, but the =company symbol can participate in a join to get the rest. In this case, the fanout from country to state might be 50, from state to city might be 10 to 100, from city to address might be 10,000. These fanouts are large enough for the zig-zag algorithm of the logic feature to provide a very large boost. In fact, if several cities are given in the criteria, the worst case for the city access portion is only a few accesses, taking milliseconds: a speedup of thousands of times. This effect occurs whether the cities are in the allow, require, or exclude criteria. The same is true of country and state, and the speedups combine by multiplying. If any of the country, state, city or address is not provided in the criteria, then the result will be correspondingly larger, but the performance improvement still applies. The biggest improvements happen when the knowns tend to be outer. A fast database structure will tend to have access paths with reasonably large fanouts and with commonly known components outer. (No temporary workspace is used, and access is fully concurrent.)

query {
    Where {
        'address' required true;
        'allow_address' {
            disable_by_default true;
            equals address;
            fanout 0;
            type 'attribute';
        }
        'allow_city' {
            disable_by_default true;
            equals city;
            fanout 0;
            type 'attribute';
        }
        'allow_country' {
            disable_by_default true;
            equals country;
            fanout 0;
            type 'attribute';
        }
        'allow_state' {
            disable_by_default true;
            equals state;
            fanout 0;
            type 'attribute';
        }
        'city' required true;
        'country' required true;
        'exclude_address' {
            disable_by_default true;
            equals address;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'exclude_city' {
            disable_by_default true;
            equals city;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'exclude_country' {
            disable_by_default true;
            equals country;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'exclude_state' {
            disable_by_default true;
            equals state;
            fanout 0;
            logic 'not';
            type 'attribute';
        }
        'in' kind 'request content';
        'out' kind 'response content';
        'require_address' {
            disable_by_default true;
            equals address;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'require_city' {
            disable_by_default true;
            equals city;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'require_country' {
            disable_by_default true;
            equals country;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'require_state' {
            disable_by_default true;
            equals state;
            fanout 0;
            logic 'and';
            type 'attribute';
        }
        'state' required true;
    }
    pattern {
        CompanyAddress =country =state =city =address company =company;
        =in {
            allow {
                =allow_address =address;
                =allow_city =city;
                =allow_country =country;
                =allow_state =state;
            }
            exclude {
                =exclude_address =address;
                =exclude_city =city;
                =exclude_country =country;
                =exclude_state =state;
            }
            require {
                =require_address =address;
                =require_city =city;
                =require_country =country;
                =require_state =state;
            }
        }
    }
    result =out Company =company;
}

Search Query with Joins, Logic and Date Ranges

Here we show how joins can be combined with allow/require/exclude to form a very fast, very flexible accessor to the database. (Use Ctrl-Shift + to magnify.) We show it in the graphical or ‘tabular’ mode, and you can see that the Where section forms a nice table where there are clear patterns of the attributes to verify correctness. This is from an AI image labeling system, where each image is identified by a tuple of =image_set =mac =image_date =image_suffix. The mac is a unique number for each host with a camera. The CameraName class associates these macs with the =image_camera, which is a name, so there is a join in there. We can also efficiently select by a set of zero or more date ranges (these are efficient and do not work by scanning and throwing out early or late data). For parts of the image id ‘tree’ where the fanouts are large, the query is extremely efficient and fast due to the zig-zag join, such as on image_set. The request content might be { require 'image set' 'my_image_set'; exclude 'label set' {'some labels'; 'other labels';} require 'camera' 'my camera'; } (You may have noticed in that request content a certain flexibility in the i-code parsing!)

Generate a Timeline Graph

Here is a query that takes two time series and merges them into a single time series for display on a line chart drawn by a Python or other REST client. The two time series are sets of (time1, y1) and (time2, y2) tuples, and the chart needs to see (sample_number, time, y1, y2). We use a series symbol kind to create the sample slots as a set of linearly increasing times in steps of 12:00 hours, and then we can get the y1 and y2 at the end (or start) of each sample slot to combine them. Here we have the data and other constants in the query itself, but these can all be brought in with expressions from the database or elsewhere. The WithData section of the query has some constant data as input for demonstration.

The pattern contains the series symbol, which generates iterations and represents a loop, followed by a counter, which just increments by one for each iteration of the series. Only plain symbols, series kinds, and item scanner kinds of symbol can generate loop iterations.

The =wd symbol is a ‘root map’ of kind ‘with data’, so it is at the root of the pattern tree, and the patterns inner to it match the Items under WithData. This root map could be changed to get data from anywhere; by default, i.e. without the =wd symbol, the data comes from the input database, or it could be the request content.

The symbols =sample_date_1 and =sample_date_2 have symbol attributes first true and from =date so they each take from the input data the first time from the series within a sample interval. Alternatively, we could use the last time in each sample interval.

We convert the times to long milliseconds and back so we can do arithmetic on them. (A date is the number of milliseconds since January 1, 1970 at midnight GMT.)
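The slot generation and merge logic can be modeled in plain Python. This sketch uses datetime objects directly for clarity, where the query does the equivalent arithmetic with epoch milliseconds; it is a model of the behavior, not the engine:

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def resample(samples, slot_times):
    """samples: sorted [(datetime, y)]; take the y at the first sample
    date >= each slot time, defaulting to 0 when the series runs out."""
    dates = [d for d, _ in samples]
    out = []
    for t in slot_times:
        i = bisect_left(dates, t)  # first sample date >= t
        out.append(samples[i][1] if i < len(samples) else 0)
    return out

# The fixed slots: start_date plus n * 12 hours, for n = 0..10, as in the query.
start = datetime(2021, 10, 8)
slots = [start + timedelta(hours=12) * n for n in range(11)]

samples1 = [(datetime(2021, 10, 8), 38), (datetime(2021, 10, 10), 42),
            (datetime(2021, 10, 12), 8)]
print(resample(samples1, slots))  # [38, 42, 42, 42, 42, 8, 8, 8, 8, 0, 0]
```

The printed list matches the y1 column of the query result shown later.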

{
    description 'Convert a pair of time series having inconsistent sample times into a series with consistent times for charting purposes. A series of 0 to 10 is used, and for each date in the fixed series, the nearest greater sample date from the sample series is used.';
    query {
        Where {
            'date' equals ((date)(series * (long)increment + (long)start_date));
            'increment' equals (12:00:00);
            'out' kind 'response content';
            'sample_date_1' {
                first true;
                from =date;
            }
            'sample_date_2' {
                first true;
                from =date;
            }
            'series' {
                kind 'series';
                to 10;
            }
            'start_date' equals 2021-10-8T-07:00;
            'wd' kind 'with data';
            'y1' {
                default 0;
                first true;
            }
            'y2' {
                default 0;
                first true;
            }
        }
        WithData {
            Samples1 {
                2021-10-8T-07:00 38;
                2021-10-10T-07:00 42;
                2021-10-12T-07:00 8;
            }
            Samples2 {
                2021-10-9T-07:00 77;
                2021-10-11T-07:00 98;
                2021-10-13T-07:00 102;
            }
        }
        pattern {
            =series;
            =wd {
                Samples1 =sample_date_1 =y1;
                Samples2 =sample_date_2 =y2;
            }
        }
        result =out Graph x =series {
            date =date;
            y1 =y1;
            y2 =y2;
        }
    }
}

The result shows increasing counts at the left, and then the value of y1, y2, and the sample time, which increases by 12 hours for each sample. This could be sent via a client query to a web page using AJAX, where the JavaScript uses a graphing library to draw a line chart. Or, the InfinityDB Python library can read it.

Graph x {
    0 {
        date 2021-10-8T-07:00;
        y1 38;
        y2 77;
    }
    1 {
        date 2021-10-8T12-07:00;
        y1 42;
        y2 77;
    }
    2 {
        date 2021-10-9T-07:00;
        y1 42;
        y2 77;
    }
    3 {
        date 2021-10-9T12-07:00;
        y1 42;
        y2 98;
    }
    4 {
        date 2021-10-10T-07:00;
        y1 42;
        y2 98;
    }
    5 {
        date 2021-10-10T12-07:00;
        y1 8;
        y2 98;
    }
    6 {
        date 2021-10-11T-07:00;
        y1 8;
        y2 98;
    }
    7 {
        date 2021-10-11T12-07:00;
        y1 8;
        y2 102;
    }
    8 {
        date 2021-10-12T-07:00;
        y1 8;
        y2 102;
    }
    9 {
        date 2021-10-12T12-07:00;
        y1 0;
        y2 102;
    }
    10 {
        date 2021-10-13T-07:00;
        y1 0;
        y2 102;
    }
}

Client Query to Find a Next Thing

Here is a simple query to access a remote InfinityDB through the REST interface and ask for the next thing in a sequence of things that are in the database. This is the definition of a simple API that can be accessed via the URL for that query. The input here is from the ‘request content’, which is identified in the pattern using the symbol =in. The output is to the ‘response content’, which is identified in the result with the symbol =out. The work is done in the from =current_thing; and the increment 1;. The increment can be negative as well. Only one thing is returned at a time, limited by the limit 1;, although it could be larger.

query {
    Where {
        'in' kind 'request content';
        'next_thing' {
            from =current_thing;
            increment 1;
            limit 1;
        }
        'out' kind 'response content';
    }
    pattern {
        Things =next_thing;
        =in current_thing =current_thing;
    }
    result {
        =out next_thing =next_thing;
    }
}

The database has this state, and this is where the ‘things’ are stored. We are asking for the thing after thing 2, and it returns ‘thing 3’.

Things "thing 1"
Things "thing 2"
Things "thing 3"
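The from/increment/limit behavior can be modeled with a sorted list in plain Python (a sketch of the semantics, not the engine):

```python
from bisect import bisect_right

def next_thing(things, current):
    """things must be sorted; return the first item after current, or None."""
    i = bisect_right(things, current)
    return things[i] if i < len(things) else None

things = ['thing 1', 'thing 2', 'thing 3']
print(next_thing(things, 'thing 2'))  # thing 3
print(next_thing(things, 'thing 3'))  # None
```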

The output will go over the wire as JSON and be decoded by the client. The Python client uses the infinitydb.access package, which returns a dict, and a JavaScript client uses AJAX and gets an Object. Notice the underscore before “_next_thing”. That indicates to the other end that this is an InfinityDB Attribute, not a literal string, which would not have the underscore. By putting in these underscores, i.e. ‘underscore quoting’, we can encode any InfinityDB data type into a JSON key, which has to be a double-quoted string. For example, sending a long as a key would look like “_5”. To send a literal string already starting with an underscore, the code transparently adds another underscore on encoding and removes it on decoding. Here is the JSON actually returned to the client:

{
    "_next_thing" : "thing 3"
}
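A simplified sketch of the underscore-quoting scheme in plain Python (for illustration only; the real encoder in the infinitydb.access package covers all of the data types):

```python
class Attribute:
    """Minimal stand-in for an InfinityDB Attribute component."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Attribute) and self.name == other.name
    def __hash__(self):
        return hash(('Attribute', self.name))

def encode_key(key):
    """Encode a component as a JSON object key (strings, attributes, longs)."""
    if isinstance(key, Attribute):
        return '_' + key.name          # attribute: one underscore prefix
    if isinstance(key, int):
        return '_' + str(key)          # the long 5 becomes "_5"
    if key.startswith('_'):
        return '_' + key               # escape a literal leading underscore
    return key                         # plain literal string

def decode_key(key):
    """Invert encode_key."""
    if not key.startswith('_'):
        return key                     # plain literal string
    body = key[1:]
    if body.startswith('_'):
        return body                    # was an escaped literal string
    if body.isdigit():
        return int(body)               # was a long
    return Attribute(body)             # was an attribute
```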

If the things were to be more complex, this will return only their first primitive components, which are probably something like their ‘ids’. The reason is that by default, a symbol matches a single primitive value (remember a primitive is any component but class or attribute). However, it is possible to return the entire thing by setting 'next_thing' type 'item', which means each match of next_thing includes all of the rest of each Item. Also, 'next_thing' type 'number' could be used to select only the numeric ids, and so on. The possible types include any of the 12 basic types, or ‘tuple’ or ‘item’. The basic types can be a set, not just a single possibility, so 'next_thing' type { 'double'; 'float'; } could be used. Also ‘number’, ‘component’, ‘primitive’, and ‘meta’ are shorthands.

The Python Side

To talk to the REST interface, you can optionally use the infinitydb.access library for convenience, but any kind of client can be used, even the unix ‘curl’ command. (Do python3 -m pip install --upgrade infinitydb, and keep pip up-to-date too, of course.)

import infinitydb.access as idb
server = idb.InfinityDBAccessor('https://infinitydb.com/infinitydb/data',
                             db='some/db', user='me', password='yesitsme')

# Optional early detection of connectivity
try:
    head_result, head_reason = server.head()
    if head_result != 200:
        print('Cannot connect to server: ' + str(head_result) + ' ' + head_reason)
        exit()
except idb.InfinityDBError as e:
    print('Cannot connect to server: ' + str(e))
    exit()

# get the next thing via REST to a pattern query on the server
# stored under prefix Query 'my interface' 'get next thing'.
# success just means the data returned was not zero-length. 
def read_next_thing(current_thing):
    try:
        success, content, response_content_type = server.execute_query(
            ['my interface', 'get next thing'],
            data={
                idb.Attribute('current_thing') : current_thing
            })
        return content.get(idb.Attribute('next_thing'), None) if success else None
    except idb.InfinityDBError as e:
        print('Error reading next thing: ' + str(e))
        return None

Fibonacci Reducer

Here is a reducer that creates the first Fibonacci numbers. We use a ‘series’ kind of symbol at the left or ‘outside loop’ of the pattern to create integers, but we don’t use those numbers in the reducer – they are just for providing a series of iterations to drive the reducer. A PatternQuery is analogous to a set of nested loops, where each plain symbol represents a loop surrounding the ‘inner’ or right part of the pattern, and each series symbol also represents a loop, generating increasing integers but without accessing the database. Only the plain symbol, series symbol and item scanner represent loops. The item scanner allows iterating over the components of any Item with reducers for example.

query {
    Where {
        'fib' {
            from ( (0,1,1) );
            kind 'reducer';
            recursion ( ((tail)fib, fib.get(1) + fib.get(2)) );
            type 'tuple';
        }
        'n' {
            kind 'series';
            to 10;
        }
        'out' kind 'response content';
    }
    pattern Fib =n =fib;
    result =out =n ((n,(head)fib).format('fib(%s)=%s'));
}

The state of the reducer here is a three-element Item, which is a tuple containing fib(n), fib(n+1), and fib(n+2). The from attribute initializes this to the tuple (0,1,1). The from attribute is an expression because it is in parentheses, and the tuple construction requires a further pair of parentheses. Tuples are indicated by having commas. In the recursion attribute of the ‘fib’ symbol, the definition of the next value involves a self-reference to the fib symbol, and the tuple for the next iteration is constructed. This produces:

{
    0 'fib(0)=0';
    1 'fib(1)=1';
    2 'fib(2)=1';
    3 'fib(3)=2';
    4 'fib(4)=3';
    5 'fib(5)=5';
    6 'fib(6)=8';
    7 'fib(7)=13';
    8 'fib(8)=21';
    9 'fib(9)=34';
    10 'fib(10)=55';
}
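
The reducer semantics can be sketched in ordinary Python. This is a hypothetical illustration of the state transitions, not the PatternQuery engine itself; the names mirror the query symbols.

```python
# Hypothetical Python sketch of the reducer above: the state 'fib' is a
# three-element tuple initialized by the from attribute, and each iteration
# applies the recursion ( ((tail)fib, fib.get(1) + fib.get(2)) ).
def fib_series(n_max):
    fib = (0, 1, 1)                                 # from ( (0,1,1) )
    results = []
    for n in range(n_max + 1):                      # the 'n' series symbol
        results.append('fib(%s)=%s' % (n, fib[0]))  # (head)fib
        fib = fib[1:] + (fib[1] + fib[2],)          # tail, then append the sum
    return results
```

Running fib_series(10) yields the same eleven lines as the query output above, from 'fib(0)=0' to 'fib(10)=55'.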
    

Running Weighted Average Reducer

Here is a more complex reducer that prints the weighted average of the last n values of a series of generated random numbers. It shows some of the real power of reducers. The last n values are kept in an item, which ‘rolls forward’ by means of an item constructor ( ((tail)last_n, random) ). The weighting is done by multiplying two items, last_n and weights_normalized, element-by-element. Then the predefined (sum) cast totals that. The (weights_normalized * 0) just creates the initial Item of zeroes, because a component times an item distributes the operation over the item’s elements. The ‘weights’ symbol constructs an item from a comma-separated list or ‘item constructor’, which can contain any expressions and is automatically flattened. The weights, input, and output could be elsewhere, such as in the database.

query {
    Where {
        'avg' equals ((sum)(last_n * weights_normalized));
        'last_n' {
            comment 'the last n values as a tuple';
            from (weights_normalized * 0);
            kind 'reducer';
            recursion ( ((tail)last_n, random) );
            type 'tuple';
        }
        'n' {
            comment 'generate series of numbers';
            kind 'series';
            to 10;
        }
        'out' kind 'response content';
        'random' {
            comment 'generate arbitrary input. random is long';
            kind 'random';
            to_exclusive 100;
        }
        'weights' equals ((1.0, 2.0, 3.0, 5.0, 6.0));
        'weights_normalized' equals (weights/(sum)weights);
    }
    pattern Input =n =random =last_n;
    result =out =n ((n, random, avg).format('n=%d random=%2d avg=%.3f'));
}

The output for one run:

{
    0 'n=0 random=5756244996844164189 avg=0.000';
    1 'n=1 random=5756244996844164189 avg=2031615881239116800.000';
    2 'n=2 random=5756244996844164189 avg=3724629115605047300.000';
    3 'n=3 random=5756244996844164189 avg=4740437056224606200.000';
    4 'n=4 random=5756244996844164189 avg=5417642349970977800.000';
    5 'n=5 random=5756244996844164189 avg=5756244996844164100.000';
    6 'n=6 random=5756244996844164189 avg=5756244996844164100.000';
    7 'n=7 random=5756244996844164189 avg=5756244996844164100.000';
    8 'n=8 random=5756244996844164189 avg=5756244996844164100.000';
    9 'n=9 random=5756244996844164189 avg=5756244996844164100.000';
    10 'n=10 random=5756244996844164189 avg=5756244996844164100.000';
}
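
The rolling window and weighting can be modeled in Python. This is a hedged sketch of the arithmetic only, with a caller-supplied input sequence standing in for the random symbol; the names mirror the query symbols.

```python
# Hypothetical Python model of the running weighted average above. The avg
# for each iteration is computed from the current last_n state, and then
# the state rolls forward via ( ((tail)last_n, random) ).
def weighted_running_avg(inputs):
    weights = (1.0, 2.0, 3.0, 5.0, 6.0)
    total = sum(weights)
    weights_normalized = tuple(w / total for w in weights)  # weights/(sum)weights
    last_n = tuple(w * 0 for w in weights_normalized)       # (weights_normalized * 0)
    out = []
    for x in inputs:
        # (sum)(last_n * weights_normalized): element-wise multiply, then total
        avg = sum(a * b for a, b in zip(last_n, weights_normalized))
        out.append(avg)
        last_n = last_n[1:] + (x,)                          # roll the window forward
    return out
```

With a constant input, the average starts at 0.0 and converges to the input value once the window is full, just as in the run shown above.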
    

Filtering

Filter out some loop iterations – here, the ones with counts not divisible by 3. The expression symbol attribute of the filter kind of symbol evaluates to a boolean that determines whether the inner parts of the pattern are visited. The series kind of symbol just iterates from 0 to 10. The parentheses on the expression symbol attribute indicate that it is an expression, but it could equally well be a single symbol like =symbol. We show summary false so we can see the running values – however, summary defaults to false anyway. Note that filtering is less efficient than other means of restricting matching, such as the from, from_exclusive, to, to_exclusive, or equals symbol attributes, expressions or literals in the pattern, and more. Filtering performs accesses that are ‘thrown away’ during the matching, so it is like an ‘if’ statement. On the other hand, an expression in the pattern does not do filtering, but instead very efficiently matches only values equal to the value of the expression – this can be many orders of magnitude faster.

{
    Where {
        'counter' {
            kind 'counter';
            summary false;
        }
        'filter' {
            expression (series % 3 == 0);
            kind 'filter';
            summary false;
        }
        'out' kind 'response content';
        'series' {
            kind 'series';
            summary false;
            to 10;
        }
    }
    pattern Filter =series =filter =counter;
    result {
        Filter =series =counter;
        =out =series =counter;
    }
}

The output is below. Note that the counter only increases when the filter matches:

{
    0 0;
    1 0;
    2 0;
    3 1;
    4 1;
    5 1;
    6 2;
    7 2;
    8 2;
    9 3;
    10 3;
}
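
One way to model this behavior in Python is the sketch below. It is a hypothetical illustration, assuming the counter symbol emits 0 on its first visit and increments on each later visit, and that it is visited only when the filter expression is true.

```python
# Hypothetical Python model of the filtered counter above: every series
# value reaches the result, but the counter advances only on a match.
def filtered_counts(to=10):
    counter = -1                     # emits 0 on first visit, then increments
    rows = []
    for series in range(to + 1):     # the 'series' symbol, 0..10
        if series % 3 == 0:          # the filter symbol's expression
            counter += 1             # counter visited only when filter matches
        rows.append((series, counter))
    return rows
```

The rows produced match the output above: (0, 0), (1, 0), (2, 0), (3, 1), and so on up to (10, 3).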
    

Generating Lists

Sometimes the output should be in the form of a list, such as when data is being generated for a client query and the result will be read by Python or JavaScript. This requires only casting an increasing long to the index type. Index is a basic datatype, really just a special kind of long. There is also an ‘index counter’ symbol kind. Here we generate a sequence of longs with a series kind of symbol, but in the result we have an expression that casts the count to an index.

Lists can also be specified in i-code in a literal form, like ['a', 'b'] or [=symbol1, =symbol2, expression] and so on. One common use for literal lists is manually editing i-code to create a ‘rich document’ that is a list of strings, tables, images, text blobs and so on, possibly nested – see the Documentation class in demo/readonly.

query {
    Where 'indexes' {
        kind 'series';
        to 10;
    }
    pattern =indexes;
    result ((index)indexes) 'hello' =indexes;
}

The output, in i-code form, is:

[
    'hello' 0,
    'hello' 1,
    'hello' 2,
    'hello' 3,
    'hello' 4,
    'hello' 5,
    'hello' 6,
    'hello' 7,
    'hello' 8,
    'hello' 9,
    'hello' 10
]

In tokenized form, you can see the index components there, which look like [n]. (To see tokenized form, select ‘Show Items’ in the database browser).

[0] "hello" 0
[1] "hello" 1
[2] "hello" 2
[3] "hello" 3
[4] "hello" 4
[5] "hello" 5
[6] "hello" 6
[7] "hello" 7
[8] "hello" 8
[9] "hello" 9
[10] "hello" 10

In JSON form, it is a list, but it is rather verbose. Nevertheless, this is how it goes over the wire in a REST access. Note that if the numbers had been doubles rather than longs, they would not look like “_0” but like 0.0. This preserves the InfinityDB data typing as the data passes into and out of JSON form.

[
    {
        "hello" : "_0"
    },
    {
        "hello" : "_1"
    },
    {
        "hello" : "_2"
    },
    {
        "hello" : "_3"
    },
    {
        "hello" : "_4"
    },
    {
        "hello" : "_5"
    },
    {
        "hello" : "_6"
    },
    {
        "hello" : "_7"
    },
    {
        "hello" : "_8"
    },
    {
        "hello" : "_9"
    },
    {
        "hello" : "_10"
    }
]
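
A client reading this JSON must map the underscore-quoted strings back to longs. The sketch below is only a hypothetical illustration of the quoting convention shown above; a real client library would normally perform this conversion itself.

```python
import json

# Hypothetical decoder for the underscore-quoted longs shown above:
# "_10" is a long, while a plain JSON number like 0.0 is a double.
def decode_value(v):
    if isinstance(v, str) and v.startswith('_') and v[1:].lstrip('-').isdigit():
        return int(v[1:])    # "_10" -> 10, preserving the long type
    return v                 # doubles and other values pass through

text = '[{"hello": "_0"}, {"hello": "_10"}]'
decoded = [{k: decode_value(v) for k, v in d.items()} for d in json.loads(text)]
```

Here decoded is [{'hello': 0}, {'hello': 10}], with the values as Python ints rather than strings.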

Expression Symbols

It is possible to have a symbol that contains nothing but an expression, creating a ‘named expression’ or expression symbol. This symbol can be referred to in other expressions, which is useful for factoring out common code.

query {
    Where 'expr' equals ((string)symbol2 + 3);
    pattern =symbol1 =symbol2 =expr;
    result =symbol1 =symbol2 =expr;
}

Catching Undefined

Symbols may have ‘undefined’ values in some cases, and expressions or subexpressions may have undefined values, such as when there is a divide-by-zero or when no match occurs. This causes no problems unless the symbol is referenced in an expression or symbol attribute, in which case the query halts with an error. This can often be avoided simply by providing a default symbol attribute, Where 'symbolname' default value, when that makes sense. Or, the symbol can have Where 'symbolname' disable_by_default true, and the symbol will effectively be ‘edited out’ of the query, as if it and its inner pattern were not present at all.

However, we can also prevent the halt by ‘catching’ the exception and doing something about it, such as substituting some value for the expression or subexpression to return, as shown in the example below. An expression or subexpression is by default in ‘unstrict’ mode and will not cause a halt if an undefined arises during execution. You can put a symbol in ‘unstrict’ mode by setting the symbol attribute strict false; this causes all references to the symbol to be unstrict and therefore not to halt. However, within an expression, a cast like (strict)expr will cause a halt immediately when expr evaluates to undefined. There is also an (unstrict)expr cast.

Undefined cannot be put into an item or tuple at any time, either temporarily during expression evaluation or in the database; unstrict mode just allows the entire erroneous item or tuple to turn into undefined during expression evaluation. Since a result is a kind of Item constructor, any undefined symbols or expressions it sees simply prevent the result from firing. Also, the equals symbol attribute does not halt but disables its symbol when undefined. The from, from_exclusive, to, and to_exclusive symbol attributes simply have no effect given an undefined value. You can use a (strict) cast to change this behavior and force the halt.

Here we detect undefined with expr instanceof undefined and return -1 instead. Or, we can use a boolean-returning cast like (isdefined)expr. There are two expressions in this code – dangerous_expression1 and dangerous_expression2 – that can produce undefined from a division by zero. The first defaults to strict everywhere it is referenced, but the second sets the strict symbol attribute to false, so wherever it is referenced it will always be unstrict. In safe_expression1 we have to cast the reference to dangerous_expression1 with (unstrict) to avoid the exception, but in safe_expression2 we do not need the cast.

{
    description 'Show how (strict) and the (unstrict) casts and strict symbol attribute work';
    query {
        Where {
            'dangerous_expression1' {
                comment 'this defaults to strict wherever referenced';
                equals (1 / series);
            }
            'dangerous_expression2' {
                comment 'defaults to unstrict wherever referenced';
                equals ((1 / series));
                strict false;
            }
            'out' kind 'response content';
            'safe_expression1' {
                comment 'allow dangerous_expression1 to be undefined';
                equals (
                 ((unstrict)dangerous_expression1) instanceof
                  undefined ? -1 : dangerous_expression1);
            }
            'safe_expression2' {
                comment 'dangerous_expression2 is already unstrict';
                equals (
                 dangerous_expression2 instanceof
                  undefined ? -1 : dangerous_expression2);
            }
            'series' {
                kind 'series';
                to 3;
            }
        }
        pattern Strict =series;
        result =out Strict =series 'first' ((safe_expression1,)) 'second' ((safe_expression2,));
    }
}

We had to use two-deep parentheses and a comma around ((safe_expression1,)) to force it to be an expression rather than come out as a symbol: it is actually a singleton item constructor expression (singletons are effectively identical to their contained element). A simple (safe_expression1) looks exactly like =safe_expression1 to the compiler, and that would prevent the error we are trying to show. This fact is obvious when the query is saved – just look at the stored version. The result is below. The first row shows the division by zero and the substitution of -1.

Strict {
    0 'first' -1 'second' -1;
    1 'first' 1 'second' 1;
    2 'first' 0 'second' 0;
    3 'first' 0 'second' 0;
}
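
The substitution can be modeled in Python, with None standing in for undefined. This is a hypothetical sketch of the logic only: division here is long division, so 1 / 2 truncates to 0 as in the result above.

```python
# Hypothetical Python model of the query above: long division by zero
# yields undefined (None here), and the ternary substitutes -1.
def safe_rows(to=3):
    rows = []
    for series in range(to + 1):
        # 1 / series on longs; undefined when series is 0
        dangerous = None if series == 0 else 1 // series
        # expr instanceof undefined ? -1 : expr
        safe = -1 if dangerous is None else dangerous
        rows.append((series, safe))
    return rows
```

The rows are (0, -1), (1, 1), (2, 0), (3, 0), matching the values shown in the result above.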