The i Language and Item Tokens

This is a description of the simple, convenient ‘i’ language for representing sets of Items. It is particularly nice for PatternQueries, which are really just sets of Items. The i language is based on the standard string token representation of the 12 data types described below.

The i language is not a real language, since it is just a way to format Items for easy reading and writing by humans. Of course, if it’s not real, it’s imaginary, hence the name. Formatting and parsing are lossless inverses, except that formatting ‘cleans up’ the code. For humans, i is far easier to read and write than JSON.

A set of Items can be described with a simple Java-like syntax that compresses common prefixes and indents them. Also, Python-like syntax may be mixed in or used exclusively, and the translator adapts without knowing ahead of time which is used.

To remove a layer of quoting when the parsed Items are to be used as a PatternQuery definition, two additional forms of token are provided: symbols and expressions. A symbol is a ‘=’ followed by a Java identifier. An expression is a Java-like expression in parentheses. This special syntax simply changes =symbol into the equivalent regular string token “=symbol”, and (expression) into “=expression”; the PatternQuery compiler then recognizes string components starting with ‘=’ as symbols or expressions. Such components occur in PatternQuery because a PatternQuery definition is itself a set of Items, whose string components can define literals, symbols, or expressions in the pattern, the result, and the Where table attributes.
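As a rough illustration of the two shorthand forms, the following Python sketch maps one already-isolated token to the string component it stands for. The function name `shorthand_to_string` is hypothetical, not part of InfinityDB. The sketch assumes the tokenizer has already cut out one complete =symbol or (expression) token whose outer parentheses match each other, and it approximates Java identifiers with ASCII letters, digits, `_` and `$`.

```python
import re

# ASCII-only approximation of a Java identifier (real Java also allows Unicode letters).
JAVA_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def shorthand_to_string(token: str) -> str:
    """Hypothetical sketch: return the string component an i shorthand token denotes.

    Assumes `token` is one complete token and that, for a parenthesized token,
    the first '(' is matched by the final ')'.
    """
    if token.startswith("=") and JAVA_IDENTIFIER.fullmatch(token[1:]):
        # =symbol is simply the string "=symbol" written without quotes.
        return token
    if len(token) >= 2 and token[0] == "(" and token[-1] == ")":
        # (expression) becomes "=expression": drop the outer parentheses, prepend '='.
        return "=" + token[1:-1]
    raise ValueError(f"not a =symbol or (expression) token: {token!r}")


print(shorthand_to_string("=symbol1"))
print(shorthand_to_string("(symbol1 * (symbol2 + 3) + 'x')"))
```

The second call yields the same text as the last component of the `query result` Item in the example that follows, without the surrounding double quotes.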

An example in the ‘Java style’:

query {
    pattern {
        SomeClass =symbol1 some_attribute =symbol2;
        OtherClass =symbol3 other_attribute =symbol4 's';
    }
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x');
}

This produces an ItemSpace that defines a PatternQuery:

query pattern SomeClass "=symbol1" some_attribute "=symbol2"
query pattern OtherClass "=symbol3" other_attribute "=symbol4" "s"
query result "=symbol1" "=symbol1 * (symbol2 + 3) + 'x'"
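To make the prefix compression concrete, here is a minimal Python sketch of a Java-style formatter. It is not IFormatter itself, and the function names are hypothetical. It assumes each Item is a tuple of components already rendered as i tokens, that the Items are distinct and none is a prefix of another, and it produces only the brace form (no lists).

```python
def common_prefix_len(items: list[tuple[str, ...]]) -> int:
    """Number of leading components shared by every Item in the group."""
    shortest = min(len(item) for item in items)
    k = 0
    while k < shortest and all(item[k] == items[0][k] for item in items):
        k += 1
    return k


def format_java_style(items: list[tuple[str, ...]], indent: int = 0) -> list[str]:
    """Hypothetical sketch: render Items as Java-style i, one output line per entry."""
    pad = "    " * indent
    groups: dict[str, list[tuple[str, ...]]] = {}
    for item in items:  # group by first component, keeping first-seen order
        groups.setdefault(item[0], []).append(item)
    lines = []
    for group in groups.values():
        if len(group) == 1:
            # A lone Item is written out in full and terminated by ';'.
            lines.append(pad + " ".join(group[0]) + ";")
        else:
            # Shared leading components become a prefix whose block holds the suffixes.
            k = common_prefix_len(group)
            lines.append(pad + " ".join(group[0][:k]) + " {")
            lines.extend(format_java_style([item[k:] for item in group], indent + 1))
            lines.append(pad + "}")
    return lines


items = [
    ("query", "pattern", "SomeClass", "=symbol1", "some_attribute", "=symbol2"),
    ("query", "pattern", "OtherClass", "=symbol3", "other_attribute", "=symbol4", "'s'"),
    ("query", "result", "=symbol1", "(symbol1 * (symbol2 + 3) + 'x')"),
]
print("\n".join(format_java_style(items)))
```

Run on the three Items above, it prints a `query { pattern { ... } result ...; }` nesting with one brace level per shared prefix. Grouping by first-seen order is a simplification; a real ItemSpace keeps its Items sorted.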

An alternate form is the ‘Python style’. The indentation is always 4 spaces, and each line must be terminated by a newline (no ‘;’ is needed). A line ending in ‘:’ defines the prefix for the subsequent lines that are indented 4 more spaces. Outdenting anywhere reverts to the most recent appropriate prefix.

The same query in i code, ‘Python style’:

query:
    pattern:
        SomeClass =symbol1 some_attribute =symbol2
        OtherClass =symbol3 other_attribute =symbol4 's'
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x')
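The indentation rules can be sketched as a small stack-based parser in Python. This is a hypothetical illustration, not IParser: `tokenize` and `parse_python_style` are made-up names, only the Python style is handled (no braces or lists), quoted strings are kept in token form rather than unquoted, and tabs or indentation that is not a multiple of 4 spaces are rejected.

```python
def tokenize(line: str) -> list[str]:
    """Split one line on spaces, keeping quoted strings and (expressions) whole."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        if line[i] == " ":
            i += 1
            continue
        start, depth, quote = i, 0, None
        while i < n:
            c = line[i]
            if quote:
                if c == "\\":
                    i += 1  # skip the escaped character
                elif c == quote:
                    quote = None
            elif c in "'\"":
                quote = c
            elif c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            elif c == " " and depth == 0:
                break
            i += 1
        tokens.append(line[start:i])
    return tokens


def parse_python_style(text: str) -> list[tuple[str, ...]]:
    """Turn Python-style i text into Items, one tuple of tokens per Item."""
    items = []
    stack = []  # (indentation, prefix) for each enclosing line that ended in ':'
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if "\t" in raw:
            raise ValueError("tabs are not allowed")
        indent = len(raw) - len(raw.lstrip(" "))
        if indent % 4:
            raise ValueError(f"indentation must be a multiple of 4: {raw!r}")
        line = raw.strip()
        # Outdenting pops every prefix opened at this indentation or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        prefix = stack[-1][1] if stack else ()
        if line.endswith(":"):
            stack.append((indent, prefix + tuple(tokenize(line[:-1]))))
        else:
            items.append(prefix + tuple(tokenize(line)))
    return items


example = """\
query:
    pattern:
        SomeClass =symbol1 some_attribute =symbol2
        OtherClass =symbol3 other_attribute =symbol4 's'
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x')
"""
for item in parse_python_style(example):
    print(" ".join(item))
```

The stack holds one prefix per open ‘:’ line, so outdenting to any earlier level simply pops back to the prefix that was active there.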

Since the input to the PatternQuery compiler is the translated Items, it is not necessary to use i at all. For example, the backend database browser in the server can be used to edit Items interactively, directly in a database. Two of the selectable views in the backend browser are ‘i code Java’ and ‘i code Python’, alongside the tabular view, enhanced JSON, underscore-quoted JSON, and tokenized Items.

The trivial grammar:

I := Component+ | Items
Items := Component* (Structure | List | (Structure List))
Structure := JavaStructure | PythonStructure
JavaStructure := '{' (Component+ ';' | Items)* '}'
PythonStructure := ':' newline_then_indention Items+
List := '[' (Items (',' Items)*)? ']'
Component := Literal | '=' Symbol | '(' Expression ')' 
Literal := QuotedString | standard_InfinityDB_component_representation_except_string_and_index 
Symbol :=  Java_identifier 
Expression := ('(' Expression ')' | QuotedString | ExpressionChar)*
ExpressionChar := any_char_except_any_paren_or_any_quote
QuotedString := Java_string_but_with_single_quotes | Java_string_with_standard_double_quotes

The expression grammar matches any sequence of characters with balanced parentheses, mixed with valid Java or JSON quoted strings, which may use either single or double quotes.
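A checker for this production only needs a parenthesis depth counter and a record of which quote, if any, is currently open. The following Python sketch uses a hypothetical function name; it assumes Java-style backslash escapes inside quoted strings and treats only round parentheses as grouping characters.

```python
def is_valid_expression(text: str) -> bool:
    """Hypothetical sketch: True if text has balanced parentheses and no open quote."""
    depth = 0
    quote = None      # the quote character of the string we are inside, if any
    escaped = False   # True right after a backslash inside a quoted string
    for c in text:
        if quote:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == quote:
                quote = None
        elif c in "'\"":
            quote = c
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' with no matching '('
    # Every '(' must be closed and no quoted string may be left open.
    return depth == 0 and quote is None


print(is_valid_expression("symbol1 * (symbol2 + 3) + 'x'"))
print(is_valid_expression("(a + b"))
```

Parentheses inside quoted strings are skipped, so an expression such as `f(')')` is accepted even though its quoted part contains an unmatched ‘)’.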

White space, meaning any sequence of adjacent spaces, newlines, and carriage returns, can occur between productions. Tabs are prohibited wherever possible, and so far that means everywhere.

i strings are convenient inside double-quoted string literals in Java code, because i code tends to use single quotes and is formatted with single quotes, so little escaping is needed. In addition, the =symbol and (expression) forms remove a layer of quoting.

In the IFormatter, any string component starting with ‘=’ that is not a valid expression comes out as a quoted string, ‘=non-expression’. Likewise, a string component starting with ‘==’ comes out as ‘==rest’. The expression syntax is only shorthand and is not required by IParser, so any string of characters can be represented in quotes, even one starting with ‘=’. The IFormatter’s special expression treatment can be disabled. IParser always succeeds in parsing text output by IFormatter, and IFormatter always succeeds for any ItemSpace as input. Using the (expression) form provides simple syntax checking of embedded expressions during parsing, catching certain errors early.
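The choice the formatter makes for each string component can be sketched as follows. This is a hypothetical Python illustration of the rules just described, not IFormatter’s code: it approximates Java identifiers with ASCII characters, checks expressions only for balanced round parentheses and closed quotes, and escapes only backslashes and single quotes when quoting.

```python
import re

JAVA_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def _is_expression(text: str) -> bool:
    """Balanced round parentheses, with quoted strings skipped over."""
    depth, quote, escaped = 0, None, False
    for c in text:
        if quote:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == quote:
                quote = None
        elif c in "'\"":
            quote = c
        elif c in "()":
            depth += 1 if c == "(" else -1
            if depth < 0:
                return False
    return depth == 0 and quote is None


def _quote(s: str) -> str:
    """Single-quote a string, escaping backslashes and single quotes only."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def format_string_component(s: str, shorthand: bool = True) -> str:
    """Hypothetical sketch: render one string component as an i token."""
    if shorthand and s.startswith("=") and not s.startswith("=="):
        body = s[1:]
        if JAVA_IDENTIFIER.fullmatch(body):
            return s                   # written as =symbol
        if body and _is_expression(body):
            return "(" + body + ")"    # written as (expression)
    # '==rest', '=non-expression', and all other strings are plain quoted strings.
    return _quote(s)


for component in ["=symbol1", "=symbol1 * (symbol2 + 3) + 'x'", "==rest", "=a)", "s"]:
    print(format_string_component(component))
```

Passing `shorthand=False` corresponds to disabling the special expression treatment: every string then comes out quoted, which the parser still reads back to the same component.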

An issue with PatternQueries that is unrelated to i itself is that you might, rarely, want a literal string that starts with ‘=’ but is not to be interpreted as a symbol or expression. To handle this case, PatternQuery uses ‘equals stuffing’: a literal initial equals char is indicated by adding another equals char in front of it. So when you write ‘==’ in i, it is put into the Item as “==”, but PatternQuery considers it to be a single literal equals. PatternQuery does not allow “=” alone, which would designate an empty symbol name or empty expression.
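How a PatternQuery might classify each string component under equals stuffing can be sketched in Python as below. The function name and the returned kind labels are hypothetical. The sketch assumes that, after a single leading ‘=’, a valid Java identifier (ASCII approximation) denotes a symbol and anything else denotes an expression.

```python
import re

JAVA_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def interpret_component(component: str) -> tuple[str, str]:
    """Hypothetical sketch: return (kind, text) for one PatternQuery string component."""
    if component.startswith("=="):
        # Equals stuffing: the first '=' only marks the next one as literal.
        return ("literal", component[1:])
    if component == "=":
        raise ValueError("'=' alone would be an empty symbol or expression")
    if component.startswith("="):
        body = component[1:]
        kind = "symbol" if JAVA_IDENTIFIER.fullmatch(body) else "expression"
        return (kind, body)
    return ("literal", component)


for c in ["==", "==x", "=symbol1", "=symbol1 * (symbol2 + 3) + 'x'", "s"]:
    print(repr(c), "->", interpret_component(c))
```

Stripping exactly one ‘=’ keeps the scheme reversible: a literal beginning with n equals chars is stored with n + 1.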

(Note that there is no way to preserve or create numerical gaps in lists. If the Indexes in the IFormatter input Items jump from, say, 5 to 7, no element is output for Index 6, and parsing lists always produces Items with sequential Indexes starting at 0. This can be considered an advantage if any gaps that arise are unwanted. One possible solution is a switch that forces gaps to produce empty elements, but that does not preserve the sparseness when parsed back in. Another is a new syntax, such as one that represents a gap of a certain size.)
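The loss of gaps is easy to see in a toy model of the round trip. In this hypothetical Python sketch, a list is a dict from Index to element; formatting writes the elements in Index order without the Indexes, and parsing numbers them again from 0.

```python
def format_list(elements: dict[int, str]) -> list[str]:
    """Formatting: emit elements in Index order; the Index values are not written."""
    return [elements[index] for index in sorted(elements)]


def parse_list(written: list[str]) -> dict[int, str]:
    """Parsing: assign sequential Indexes starting at 0."""
    return dict(enumerate(written))


sparse = {0: "a", 5: "b", 7: "c"}  # nothing at Indexes 1-4 or 6
print(parse_list(format_list(sparse)))  # {0: 'a', 1: 'b', 2: 'c'}
```

The element order survives, but Indexes 5 and 7 become 1 and 2.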

There is intentionally no comment syntax. Comments can be put into the data itself, such as in the comment attribute of PatternQuery symbols, or elsewhere in the data, where they will be preserved. If i had comments, they would necessarily disappear whenever text is parsed and then formatted back again, which happens frequently, for example in the database browser.

Standard Token Format

An InfinityDB Item is a sequence of zero or more components, each of one of 12 data types. Each data type has a unique string representation. All of the data types that correspond to Java, JSON, or JavaScript types are represented the same way as in those languages. However, several data types are unique to InfinityDB:

| Component type | Format | Description |
|---|---|---|
| Boolean | true or false | Like Java, JavaScript, or JSON. |
| String | "abc def\n" | Like Java or JSON; unlike JavaScript, single quotes are not allowed in this standard form (i also accepts single quotes). |
| Long | 352 | Like Java, JavaScript, or JSON. 64-bit integers. |
| Double | 352.0 or -1.9e52 | Like Java, JavaScript, or JSON. 64-bit reals. |
| Float | 352.0f or -1.9e25f | Like Java. 32-bit reals, ending in ‘f’. |
| Index | [n] | Represents the location of the suffix of the Item within a logical list. n is a long, i.e. a 64-bit integer. Used in BLOBs, CLOBs, character streams, and byte streams. |
| Short char array | Chars("…") | An array of 0 to 1024 chars, represented as a standard string. Used in Character Large Objects (CLOBs) and character streams. |
| Short byte array | Bytes(xx_xx…xx) | An array of 0 to 1024 bytes, each written as two hex chars, separated by underscores. Used in Binary Large Objects (BLOBs) and byte streams. |
| Byte string | ByteString(xx_xx…xx) | Like a short byte array, but sorts like a string instead of by length. |
| Date/time | 2023-12-31T10:30:26-0800 | The worldwide-standard ISO date format. A semi-standard milliseconds integer may be included at the end. |
| Class | Class_example | Meta type. An upper-case letter followed by letters, digits, underscores, dots, and dashes. |
| Attribute | attribute_example | Meta type. A lower-case letter followed by letters, digits, underscores, dots, and dashes. |
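To show how the formats in the table can be told apart, here is a hypothetical Python sketch that classifies a single standard token with regular expressions. The patterns are approximations inferred from the examples above, not InfinityDB’s actual rules: numbers are limited to plain decimal and exponent forms, Index values to non-negative digits, and the Date/time pattern covers only the form shown (no milliseconds). Boolean is tested before Attribute because `true` and `false` would also match the Attribute pattern.

```python
import re
from typing import Optional

HEX_BYTES = r"(?:[0-9A-Fa-f]{2}(?:_[0-9A-Fa-f]{2})*)?"   # xx_xx...xx, possibly empty
QUOTED = r'"(?:[^"\\]|\\.)*"'                            # double-quoted, backslash escapes

# Order matters: the first pattern that matches the whole token wins.
TOKEN_PATTERNS = [
    ("Boolean", r"true|false"),
    ("Long", r"-?\d+"),
    ("Float", r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?f"),
    ("Double", r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"),
    ("Index", r"\[\d+\]"),
    ("Short char array", r"Chars\(" + QUOTED + r"\)"),
    ("Short byte array", r"Bytes\(" + HEX_BYTES + r"\)"),
    ("Byte string", r"ByteString\(" + HEX_BYTES + r"\)"),
    ("Date/time", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[-+]\d{4}"),
    ("Class", r"[A-Z][A-Za-z0-9_.\-]*"),
    ("Attribute", r"[a-z][A-Za-z0-9_.\-]*"),
    ("String", QUOTED),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_PATTERNS]


def classify_token(token: str) -> Optional[str]:
    """Return the component type name for a standard token, or None if unrecognized."""
    for name, pattern in COMPILED:
        if pattern.fullmatch(token):
            return name
    return None


for t in ["true", "-1.9e52", "352.0f", "[5]", "Bytes(0a_ff)", "Class_example"]:
    print(t, "->", classify_token(t))
```

Anchoring every pattern with `fullmatch` is what keeps, for example, `352.0f` from being read as the Double `352.0`.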