The i Language and Item Tokens

This section describes ‘i’, a simple and convenient language for representing sets of Items, followed by the Item Token Format. I-code is particularly nice for PatternQueries, which are really just sets of Items, but it is convenient for any data. I is based on the standard string ‘token’ representation of the 12 data types described below, which is fundamental. I-code is far easier to use than JSON and provides more data types, while remaining interchangeable with JSON by means of ‘underscore quoting’.

I is not a real language since it is just a way to format a set of Items in an ItemSpace for easy reading and writing by humans. Of course if it’s not real, it’s imaginary – hence the name. Formatting and Parsing are lossless inverses except that parsing and then formatting ‘cleans up’ the code.

Since the input to the PatternQuery compiler is the translated Items, it is not necessary to use i at all. For example, the back-end database browser in the server can be used to edit Items interactively directly in a database. Two of the selectable views in the backend browser are ‘i code Java’ and ‘i code Python’, to go along with the nice tabular view, enhanced JSON, underscore quoted JSON, List of CSV, Set of CSV, List of Strings, Text Blob, and tokenized Items.

A set of Items can be described with a simple Java-like syntax that compresses common prefixes and indents them. Also, Python-like syntax may be mixed in or used exclusively, and the parser adapts without knowing ahead of time which is used.

PatternQuery-Specific Features

To remove a layer of quoting when the parsed Items are to be used as a PatternQuery definition, two additional forms of tokens are provided: symbols and expressions. A symbol starts with ‘=’ and is followed by a Java identifier. An expression is a Java-like expression in parentheses. These are recognized by the PatternQuery compiler when it sees string components starting with ‘=’. The special syntax effectively changes =symbol to the equivalent regular token “=symbol”, and the syntax (expression) becomes “=expression”. Such expressions occur in PatternQuery because a PatternQuery definition is itself a set of Items, with string components that can define literals, symbols, or expressions in the pattern, the result, and the Where table attributes.
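The shorthand mapping can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the actual parser: the function name `to_regular_token` is hypothetical, the regular expression is an ASCII-only approximation of a Java identifier, and escaping of double quotes inside an expression is ignored.

```python
import re

# Approximation of a Java identifier (ASCII only); an assumption for this sketch.
JAVA_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def to_regular_token(shorthand):
    """Map the =symbol and (expression) shorthands to regular string tokens.

    Hypothetical helper: any other token is returned unchanged, and double
    quotes inside an expression are not escaped here.
    """
    if shorthand.startswith("=") and JAVA_IDENT.fullmatch(shorthand[1:]):
        # =symbol  ->  "=symbol"
        return '"' + shorthand + '"'
    if shorthand.startswith("(") and shorthand.endswith(")"):
        # (expression)  ->  "=expression", dropping the outer parentheses
        return '"=' + shorthand[1:-1] + '"'
    return shorthand

print(to_regular_token("=symbol1"))
print(to_regular_token("(symbol1 * (symbol2 + 3) + 'x')"))
```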

Java Style

Here is an example in the ‘Java style’ in the coloring format displayed in the database browser (however the coloring disappears while editing). The PatternQuery-specific parts are light blue, classes are dark blue, attributes are green, and primitive components are black. Embedded blobs show up rendered, but become text in a special format when editing.

This produces an ItemSpace that defines a PatternQuery, with three Items here in token form:

query pattern SomeClass "=symbol1" some_attribute "=symbol2"
query pattern OtherClass "=symbol3" other_attribute "=symbol4" "s"
query result "=symbol1" "=symbol1 * (symbol2 + 3) + 'x'"

Python Style

An alternate form is the ‘python’ format. The indentation is always 4 spaces and lines must be terminated. Items ending in ‘:’ define the prefix for the subsequent Items that are indented 4 more spaces. Outdenting anywhere reverts to the most recent appropriate prefix.
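The prefix rule can be illustrated with a small Python sketch. It is not the real parser: the function `expand_python_style` and the sample text are assumptions written from the description above, components are treated as opaque text, and a trailing colon is detected naively.

```python
# Sample Python-style i-code; an assumed rendering based on the rules above.
SAMPLE = """
query:
    pattern:
        SomeClass =symbol1 some_attribute =symbol2
        OtherClass =symbol3 other_attribute =symbol4 's'
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x')
"""

def expand_python_style(text):
    """Expand indented, colon-prefixed lines into one full Item per line."""
    items = []
    prefixes = []  # prefixes[d] is the full prefix for lines at depth d + 1
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if indent % 4:
            raise ValueError("indentation must be a multiple of 4 spaces")
        depth = indent // 4
        if depth > len(prefixes):
            raise ValueError("line is indented without a preceding prefix")
        del prefixes[depth:]  # outdenting reverts to an earlier prefix
        parent = prefixes[-1] if prefixes else ""
        body = stripped.rstrip()
        if body.endswith(":"):
            # A line ending in ':' becomes the prefix for deeper lines.
            prefixes.append((parent + " " + body[:-1].rstrip()).strip())
        else:
            items.append((parent + " " + body).strip())
    return items

for item in expand_python_style(SAMPLE):
    print(item)
```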

I code ‘Python style’:


The syntax says nothing about the values themselves. It is reasonable to put any token anywhere, so the i-code below is the equivalent of the i-code above. The parsed version is sorted and has duplicates removed according to the rules of an ItemSpace. This can be very helpful during editing, as the parsing and re-formatting that happens automatically will generate a canonical form each time you save, and you can look at the canonical form to be sure you got what you wanted. Just experiment, and the syntax will become obvious. In practice any set of Items formats to a single i-code text, although that uniqueness is not guaranteed.

It is OK to have a single Item without any braces or even a terminating semicolon. It is OK to have outermost braces without a prefix like the ‘query’ attribute below, or even internally. Braces only serve to factor out a common prefix for the contained Items, and that common prefix is allowed to be empty. It is OK to have zero or more semicolon-terminated lines within the braces. It is actually meaningful to have a line with just a semicolon as that designates an Item that is a prefix of another, but this is rare and generally unnecessary. White space is insignificant in the Java style. You do have to have a semicolon terminator before the closing brace though in the Java style (as in C, C++, Java, and JavaScript). Each Item corresponds to one semicolon in Java style, or each newline not after a colon in Python style.

query {
    { pattern OtherClass =symbol3 other_attribute =symbol4 's';
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x'); }
    pattern { SomeClass =symbol1 some_attribute =symbol2; }
}
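How braces factor out a common prefix can be sketched in Python as follows. This is a simplified illustration, not the actual parser: `expand_java_style` is a hypothetical name, components are split on white space only, and quoted strings containing braces, semicolons, or spaces are not handled. The sample is the example above with its outer brace closed.

```python
import re

EXAMPLE = """
query {
    { pattern OtherClass =symbol3 other_attribute =symbol4 's';
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x'); }
    pattern { SomeClass =symbol1 some_attribute =symbol2; }
}
"""

def expand_java_style(text):
    """Expand brace-factored i-code into one full Item string per semicolon."""
    # Braces and semicolons are single tokens; everything else is split on
    # white space (a simplification that ignores quoting).
    tokens = re.findall(r"[{};]|[^\s{};]+", text)
    items = []
    prefix_stack = [[]]  # prefix in effect at each brace nesting level
    current = []         # components seen since the last delimiter
    for tok in tokens:
        if tok == "{":
            prefix_stack.append(prefix_stack[-1] + current)
            current = []
        elif tok == ";":
            items.append(" ".join(prefix_stack[-1] + current))
            current = []
        elif tok == "}":
            if current:
                raise ValueError("missing ';' before '}'")
            prefix_stack.pop()
        else:
            current.append(tok)
    if current:  # a single Item with no terminating semicolon
        items.append(" ".join(prefix_stack[-1] + current))
    return items

for item in expand_java_style(EXAMPLE):
    print(item)
```

An ItemSpace would additionally sort the resulting Items and remove duplicates, which this sketch does not attempt.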


Don’t panic! The grammar is simple (real programming languages take pages, but i-code is an imaginary language). This is mainly for completeness and if you experiment it will be obvious. When not editing PatternQueries, it is even simpler, and you can just stick with Java or Python style.

I := Component+ | Items
Items := Component* (Structure | List | (Structure List))
Structure := JavaStructure | PythonStructure
JavaStructure := '{' (Component+ ';' | Items)* '}'
PythonStructure := ':' newline_then_indentation Items+
List := '[' (Items (',' Items)*)? ']'
Component := Literal | '=' Symbol | '(' Expression ')'
Literal := QuotedString | standard_InfinityDB_component_representation_except_string_and_index
Symbol := Java_identifier
Expression := ('(' Expression ')' | QuotedString | ExpressionChar)*
ExpressionChar := any_char_except_any_paren_or_any_quote
QuotedString := Java_string_but_with_single_quotes | Java_string_with_standard_double_quotes

The expression grammar matches any sequence of characters containing balanced parentheses mixed with any valid Java or JSON quoted strings except that they can use either single or double quotes.
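A check along these lines can be sketched in Python. It is a minimal illustration of the balanced-parentheses rule, assuming that ‘paren’ means only ( and ) and that backslash escapes inside quoted strings work as in Java; the function name `is_balanced_expression` is hypothetical and not part of IParser.

```python
def is_balanced_expression(text):
    """Return True if parentheses balance outside of quoted strings."""
    depth = 0
    i = 0
    while i < len(text):
        c = text[i]
        if c in "'\"":
            # Skip over a quoted string, honoring backslash escapes.
            quote = c
            i += 1
            while i < len(text) and text[i] != quote:
                if text[i] == "\\":
                    i += 1  # skip the escaped character
                i += 1
            if i >= len(text):
                return False  # unterminated quoted string
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False  # closing parenthesis with no opener
        i += 1
    return depth == 0

print(is_balanced_expression("symbol1 * (symbol2 + 3) + 'x'"))
print(is_balanced_expression("(a + b"))
```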

White space may occur between productions; it is any sequence of adjacent spaces, newlines, and returns (tabs are prohibited wherever possible, and so far that means everywhere).

(I strings are nice inside quoted strings in Java code because i strings tend to use single quotes and are formatted with single quotes, plus the =symbol and the (expression) forms remove a layer of quoting.)

In the IFormatter, any string component starting with ‘=’ but not being a valid expression comes out as ‘=non-expression’. Also, a string component starting with ‘==’ comes out as ‘==rest’. The expression syntax is only shorthand and is not required by IParser, so any string of characters can be represented in quotes even if it starts with ‘=’. IParser always succeeds in parsing text output by IFormatter. IFormatter always succeeds for any ItemSpace as input. Using the (expression) form provides simple syntax checking of embedded expressions during parsing, catching certain errors early.

Equals Stuffing for PatternQueries

One PatternQuery issue that is unrelated to i itself is that you might, on rare occasions, want a literal string that starts with ‘=’ but is not to be interpreted as a symbol or expression. To handle this case PatternQuery uses ‘equals stuffing’, in which a literal initial equals char is indicated by adding another equals char at the front. So when you write ‘==’ in i, it is put into the Item as “==” but PatternQuery considers it to be literally a single equals. PatternQuery does not allow “=”, which would designate an empty symbol name or empty expression.
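The stuffing rule can be sketched in Python. This illustrates only the convention described here and is not PatternQuery code; the function names `stuff` and `unstuff` and the returned labels are hypothetical.

```python
def stuff(literal):
    """Prepare a literal string for use as a PatternQuery component."""
    # A literal that begins with '=' gets one extra '=' in front.
    return "=" + literal if literal.startswith("=") else literal

def unstuff(component):
    """Interpret a string component the way the text describes."""
    if component.startswith("=="):
        # Stuffed literal: drop one leading '='.
        return ("literal", component[1:])
    if component.startswith("="):
        if component == "=":
            raise ValueError("empty symbol name or expression is not allowed")
        return ("symbol_or_expression", component[1:])
    return ("literal", component)

print(stuff("=abc"))
print(unstuff("=="))
print(unstuff("=symbol1"))
```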

Gaps in Lists

Note that there is no way to preserve or create numerical gaps in lists. If the Indexes in the IFormatter input Items jump from, say, 5 to 7, no element is output for index 6. Parsing lists always produces Items with sequential Indexes starting at 0. This can be considered an advantage if any gaps that arise are unwanted. One solution is a switch to force gaps to produce empty elements, but that does not preserve the sparseness when parsed back in. Another solution is a new syntax, such as something that represents a gap of a certain size.
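The loss of gaps on a round trip can be shown with a toy Python model. It is not the IFormatter or IParser: a list is modeled as a dict from Index to element text, the names `format_list` and `parse_list` are hypothetical, and elements are split naively on commas.

```python
def format_list(indexed):
    """Format {index: element_text} as a bracketed list, ignoring gaps."""
    return "[" + ", ".join(value for _, value in sorted(indexed.items())) + "]"

def parse_list(text):
    """Parse a bracketed list back into {index: element_text}, from 0."""
    inner = text.strip()[1:-1]
    elements = [e.strip() for e in inner.split(",")] if inner.strip() else []
    return dict(enumerate(elements))

sparse = {0: "'a'", 5: "'b'", 7: "'c'"}
text = format_list(sparse)
print(text)
print(parse_list(text))  # the Indexes are now sequential
```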

No Comments

There is intentionally no comment syntax. Comments can be put into the data itself, such as in the comment attribute of PatternQuery symbols, or elsewhere in the data and they will be preserved. If comments were included, they would necessarily disappear when being parsed and then formatted back again. That happens frequently, such as in the database browser.

Standard Item Token Format

An InfinityDB Item is a sequence of zero or more components, each of one of 12 data types. Each data type has a unique string representation called a ‘Token’. All of the data types that correspond to Java, JSON or JavaScript are represented the same way as in those languages. However, several data types are unique to InfinityDB:

Component type | Format | Description
Boolean | true or false | Like Java, JavaScript or JSON.
String | "abc def\n" | Like Java or JSON, but unlike JavaScript, single quotes are not allowed.
Long | 352 | Like Java, JavaScript, or JSON. 64-bit integers.
Double | 352.0 or -1.9e52 | Like Java, JavaScript or JSON. 64-bit reals.
Float | 352.0f or -1.9e52f | Like Java. 32-bit reals, ending in ‘f’.
Index | [n] | Represents the location of the suffix of the Item within a logical list. n is a long, i.e. a 64-bit integer. Used in BLOBs, CLOBs, character streams, and byte streams, or any other list. In i-code or JSON these indicate where to use brackets [.. , ..].
Short char array | Chars("…") | A 0 to 1024-char array. The chars are represented as a standard string. Used in Character Long Objects, i.e. CLOBs, and character streams, although these are rare.
Short byte array | Bytes(xx_xx…xx) | A 0 to 1024-byte array. The bytes are two capital hex chars and are separated by underscores. Used in Binary Long Objects, i.e. BLOBs, and byte streams.
Byte strings | ByteString(xx_xx…xx) | Like short byte array, but sort like strings instead of by length. These are rare.
Date/time | 2023-12-31T10:30:26-0800 | World-wide standard ISO date format. A semi-standard milliseconds integer may be included after a dot after the seconds.
Class | Class_example | Meta type. An upper case letter followed by digits, underscores, dots, and dashes. Also ClassExample would be good.
Attribute | attribute_example | Meta type. A lower case letter followed by digits, underscores, dots, and dashes. Also attributeExample would be good.
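As an illustration of one of the InfinityDB-specific tokens, the short byte array format can be sketched in Python. The function names are hypothetical, and the handling of an empty array as `Bytes()` is an assumption that the table does not confirm.

```python
MAX_SHORT_ARRAY = 1024  # upper bound taken from the table above

def format_bytes_token(data):
    """Format bytes as Bytes(xx_xx...): capital hex pairs joined by '_'."""
    if len(data) > MAX_SHORT_ARRAY:
        raise ValueError("a short byte array holds at most 1024 bytes")
    return "Bytes(" + "_".join(f"{b:02X}" for b in data) + ")"

def parse_bytes_token(token):
    """Parse a Bytes(...) token back into a bytes object."""
    if not (token.startswith("Bytes(") and token.endswith(")")):
        raise ValueError("not a Bytes token")
    inner = token[len("Bytes("):-1]
    # Assumption: an empty array is written with nothing between the parens.
    return bytes(int(pair, 16) for pair in inner.split("_")) if inner else b""

print(format_bytes_token(b"\x0a\xff"))
print(parse_bytes_token("Bytes(0A_FF)"))
```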