The i Language and Item Tokens

This section describes the simple, convenient ‘i’ language for representing sets of Items, followed by the standard Item Token Format. The i language is particularly nice for PatternQueries, which are really just sets of Items, but it is convenient for any data. It is based on the fundamental standard string ‘token’ representation of the 12 data types described below. I-code is far easier to use than JSON and provides more data types, yet the two are interchangeable by means of ‘underscore quoting’.

No Programming Needed

A set of Items can be described with a simple syntax that compresses common prefixes and indents the suffixes. One mode resembles Java and another resembles Python, but there is no need to know any programming language at all. If you know Java, C, C++, JavaScript or Python, the format will be obvious. Either mode may be used exclusively, or the two may be mixed, and the parser adapts without needing to know ahead of time which is used.

I-code is not a real language, since it is just a way to format a set of Items in an ItemSpace for easy reading and writing by humans. Of course, if it’s not real, it’s imaginary, hence the name. Formatting and parsing are lossless inverses, except that parsing and then formatting ‘cleans up’ the code by sorting, re-indenting and de-duplicating the Items.

Alternatives

Since the input to the PatternQuery compiler is the translated Items, it is not necessary to use i at all. For example, the back-end database browser in the server can be used to edit Items interactively, directly in a database. Two of the selectable views in the back-end browser are ‘i code Java Style’ and ‘i code Python Style’. They join the nice tabular view, enhanced JSON, underscore-quoted JSON, Set of Comma Separated Values, and (Tokenized) Items, any of which are alternative views applicable to PatternQueries. There are some other, more special-purpose modes. With the text modes, you can click ‘Edit’ and then, for example, cut and paste queries or other data.

PatternQuery-Specific Features

To remove a layer of quoting when the parsed Items are to be used as a PatternQuery definition, two additional forms of tokens are provided: symbols and expressions. A symbol is an ‘=’ followed by a Java identifier, which is the usual mixture of letters, digits and underscores, not starting with a digit. An expression is a Java-like expression in parentheses. Java-like expressions read naturally for most operations, such as ordinary math (see the PatternQuery Reference for the operators, and PatternQuery Examples). They also look like C, C++, and JavaScript expressions, with extensions. I-code does not ‘understand’ expressions beyond checking their parentheses and quoted strings.

Both forms become ordinary string components starting with ‘=’, which is what the PatternQuery compiler recognizes. The special syntax effectively changes =symbol to the equivalent regular token "=symbol", and (expression) becomes "=expression". Such expressions occur in PatternQuery because a PatternQuery definition is itself a set of Items, in which the string components can define literals, symbols or expressions in the pattern, the result, and some Where table attributes.
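As a minimal sketch of that mapping, assuming only the grammar given later in this section, the hypothetical helper `shorthand_to_string` below converts one i component written in shorthand into the string value it stands for. It is an illustration, not IParser; `is_balanced` is likewise a made-up name for the parenthesis and quoted-string check.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # Java identifier, per the text

def is_balanced(expr):
    """True if parentheses balance and every '...' or "..." string is terminated."""
    depth, i = 0, 0
    while i < len(expr):
        c = expr[i]
        if c in "'\"":
            end = i + 1
            while end < len(expr) and expr[end] != c:
                end += 2 if expr[end] == '\\' else 1   # skip escaped characters
            if end >= len(expr):
                return False                            # unterminated quoted string
            i = end                                     # parentheses in strings are ignored
        elif c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0

def shorthand_to_string(component):
    """Hypothetical: map =symbol to "=symbol" and (expression) to "=expression"."""
    if component.startswith('=') and IDENTIFIER.fullmatch(component, 1):
        return component
    if component.startswith('(') and component.endswith(')') and is_balanced(component[1:-1]):
        return '=' + component[1:-1]
    raise ValueError(f"not a symbol or expression: {component!r}")
```

For example, `(symbol1 * (symbol2 + 3) + 'x')` becomes the string `=symbol1 * (symbol2 + 3) + 'x'`, while `(a)(b)` is rejected because its inside is not balanced.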

Java Style

Here is an example in the ‘Java style’, in the coloring format displayed in the database browser (the coloring disappears while editing). The query itself is meaningless. The PatternQuery-specific parts are light blue, classes are dark blue, attributes are green, and primitive components are black. Embedded blobs are shown rendered, but become text in a special format when editing.


This produces an ItemSpace that defines a PatternQuery, with three Items here in token form:

query pattern SomeClass "=symbol1" some_attribute "=symbol2"
query pattern OtherClass "=symbol3" other_attribute "=symbol4" "s"
query result "=symbol1" "=symbol1 * (symbol2 + 3) + 'x'"
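Going the other way, the prefix compression described under ‘No Programming Needed’ can be sketched by building a trie of components and emitting braces wherever Items share a prefix. The hypothetical `format_java_style` below assumes Items are tuples of token strings like the three above. It sorts by plain Python string order, which only approximates the ItemSpace order, and it keeps string tokens double-quoted instead of switching to =symbol, (expression) or single quotes, so it is not the IFormatter's actual algorithm.

```python
def build_trie(items):
    """Nest Items (tuples of tokens) into a dict trie; the key None marks where an Item ends."""
    trie = {}
    for item in items:
        node = trie
        for comp in item:
            node = node.setdefault(comp, {})
        node[None] = {}                      # duplicate Items land on the same node
    return trie

def format_node(node, indent):
    lines = []
    if None in node:                         # an Item that is a prefix of others: bare ';'
        lines.append(indent + ';')
    for comp in sorted(k for k in node if k is not None):
        chain, sub = [comp], node[comp]
        # Follow unshared suffixes so they stay on one line.
        while len(sub) == 1 and None not in sub:
            nxt = next(iter(sub))
            chain.append(nxt)
            sub = sub[nxt]
        text = indent + ' '.join(chain)
        if list(sub) == [None]:
            lines.append(text + ';')         # exactly one Item ends here
        else:                                # shared prefix: factor it out with braces
            lines.append(text + ' {')
            lines.extend(format_node(sub, indent + '    '))
            lines.append(indent + '}')
    return lines

def format_java_style(items):
    """Render Items as Java-style i-code with common prefixes factored into braces."""
    return '\n'.join(format_node(build_trie(items), ''))
```

Applied to the three Items above, this groups the two pattern Items under `query { pattern { ... } }` and leaves the result Item on its own line. Because string tokens stay double-quoted, the output is still valid i-code under the grammar's QuotedString rule.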

Python Style

An alternative form is the ‘Python style’. Indentation is always exactly 4 spaces per level, and each Item is terminated by the end of its line. A line ending in ‘:’ defines a prefix for the subsequent lines, which are indented 4 more spaces. Outdenting anywhere reverts to the most recent prefix at that indentation level.
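These indentation rules can be sketched as a stack of prefixes, one per indentation level. The function `parse_python_style` is hypothetical. It splits each line into components with a simple regular expression (a quoted string without escapes, otherwise a run of non-space characters), so it does not handle (expression) components that contain spaces, and it returns Items in document order rather than sorted.

```python
import re

# Hypothetical component splitter: a quoted string without escapes, or a run of non-space chars.
COMPONENT = re.compile(r"'[^']*'|\"[^\"]*\"|\S+")

def parse_python_style(text):
    """Flatten Python-style i-code into a list of Items (tuples of component texts)."""
    stack = [()]                      # stack[k] is the prefix in force at indentation level k
    items = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue                  # blank lines carry no Items
        stripped = line.lstrip(' ')
        indent = len(line) - len(stripped)
        if stripped.startswith('\t') or indent % 4:
            raise ValueError(f"line {lineno}: indent with multiples of 4 spaces, no tabs")
        level = indent // 4
        if level >= len(stack):
            raise ValueError(f"line {lineno}: unexpected indent")
        del stack[level + 1:]         # outdenting falls back to the enclosing prefix
        body = stripped.rstrip()
        opens_block = body.endswith(':')
        comps = tuple(COMPONENT.findall(body[:-1] if opens_block else body))
        if opens_block:
            stack.append(stack[level] + comps)   # prefix for the more-indented lines below
        else:
            items.append(stack[level] + comps)   # a newline not after ':' ends one Item
    return items
```

A line that opens a block pushes its components onto the stack; any later line at a shallower level simply truncates the stack, which is what makes outdenting revert to the enclosing prefix.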

I code ‘Python style’:

Flexibility

The syntax says nothing about the values themselves, and any token may appear anywhere. The same Items can also be grouped under braces in many different ways, so the i-code below is equivalent to the i-code above. The parsed version is sorted and duplicates are removed according to the rules of an ItemSpace. This is very helpful during editing: the parsing and re-formatting that happens automatically generates a canonical form each time you save, and you can check the canonical form to be sure you got what you wanted. Just experiment, and the syntax will become obvious. Currently any set of Items has a unique i-code text, although that is not guaranteed in the future.

It is OK to have a single Item without any braces or even a terminating semicolon. Braces need not follow a prefix such as the ‘query’ attribute below, whether at the outermost level or internally: braces only serve to factor out a common prefix for the contained Items, and that common prefix is allowed to be empty. It is OK to have zero or more semicolon-terminated lines within the braces. A line with just a semicolon is actually meaningful, since it designates an Item that is a prefix of another, but this is rare and generally unnecessary. White space is insignificant in the Java style, but, as in C, C++, Java, and JavaScript, there must be a semicolon terminator before each closing brace. In Java style, each Item corresponds to one semicolon; in Python style, each Item corresponds to a newline that does not follow a colon.

query {
    { pattern OtherClass =symbol3 other_attribute =symbol4 's';
    result =symbol1 (symbol1 * (symbol2 + 3) + 'x'); }
    pattern { SomeClass =symbol1 some_attribute =symbol2; }
}
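To make the brace and semicolon rules concrete, here is a minimal parser sketch for a subset of Java-style i-code. All names are hypothetical and this is not IParser's API. It handles braces, semicolons, single- or double-quoted strings, =symbol, (expression) and bare words, but not lists, Python style, or literals with parenthesized forms such as Bytes(...). Each Item comes back as a tuple of tokens with strings in double-quoted token form, so the example above flattens to the same three Items listed earlier in token form.

```python
import re

# Hypothetical sketch for a subset of Java-style i-code; not the IParser API.
QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
SYMBOL = re.compile(r"=[A-Za-z_][A-Za-z0-9_]*")
WORD = re.compile(r"[^\s{};()\[\],'\"=]+")    # classes, attributes, numbers, dates, true/false
ESCAPES = {'n': '\n', 't': '\t', 'r': '\r'}

def string_token(value):
    """Render a string component in standard token form, i.e. double-quoted."""
    for raw, esc in (('\\', '\\\\'), ('"', '\\"'), ('\n', '\\n'), ('\t', '\\t'), ('\r', '\\r')):
        value = value.replace(raw, esc)
    return '"' + value + '"'

def scan_expression(text, i):
    """Return (body, end) for the balanced '(' ... ')' that starts at text[i]."""
    depth, j = 0, i
    while j < len(text):
        if text[j] in '\'"':                     # parentheses inside strings do not count
            m = QUOTED.match(text, j)
            if not m:
                raise ValueError('unterminated quoted string in expression')
            j = m.end()
            continue
        if text[j] == '(':
            depth += 1
        elif text[j] == ')':
            depth -= 1
            if depth == 0:
                return text[i + 1:j], j + 1
        j += 1
    raise ValueError('unbalanced parentheses in expression')

def tokenize(text):
    """Yield '{', '}', ';' and component tokens; shorthand becomes "=..." strings."""
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in '{};':
            yield c
            i += 1
        elif c == '(':                           # (expression) -> "=expression"
            body, i = scan_expression(text, i)
            yield string_token('=' + body)
        elif c in '\'"':                         # 'single' or "double" quoted string
            m = QUOTED.match(text, i)
            if not m:
                raise ValueError('unterminated quoted string')
            inner = m.group()[1:-1]
            yield string_token(re.sub(r'\\(.)', lambda e: ESCAPES.get(e.group(1), e.group(1)), inner))
            i = m.end()
        elif c == '=':                           # =symbol -> "=symbol"
            m = SYMBOL.match(text, i)
            if not m:
                raise ValueError(f'bad symbol at offset {i}')
            yield string_token(m.group())
            i = m.end()
        else:
            m = WORD.match(text, i)
            if not m:
                raise ValueError(f'unexpected character {c!r} at offset {i}')
            yield m.group()
            i = m.end()

def parse_java_style(text):
    """Flatten Java-style i-code into sorted, de-duplicated Items (tuples of tokens)."""
    prefixes, current, items = [()], [], set()
    for tok in tokenize(text):
        if tok == ';':                           # each semicolon ends exactly one Item
            items.add(prefixes[-1] + tuple(current))
            current = []
        elif tok == '{':                         # factor out the components seen so far
            prefixes.append(prefixes[-1] + tuple(current))
            current = []
        elif tok == '}':
            if current or len(prefixes) == 1:
                raise ValueError("missing ';' before '}', or unmatched '}'")
            prefixes.pop()
        else:
            current.append(tok)
    if len(prefixes) != 1:
        raise ValueError("unclosed '{'")
    if current:                                  # a lone top-level Item may omit its ';'
        items.add(tuple(current))
    return sorted(items)
```

A `{` pushes the enclosing prefix plus the components collected so far, `;` emits one Item, and `}` pops. An empty prefix before `{` is legal and simply repeats the enclosing prefix, which is why the inner `{ pattern ... }` group in the example works.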

Grammar

Don’t panic! The grammar is simple (real programming languages take pages, but i-code is an imaginary language). This is mainly for completeness and if you experiment it will be obvious. When not editing PatternQueries, it is even simpler, and you can just stick with Java or Python style.

I := Component+ | Items
Items := Component* (Structure | List | (Structure List))
Structure := JavaStructure | PythonStructure
JavaStructure := '{' (Component+ ';' | Items)* '}'
PythonStructure := ':' newline_then_indentation Items+
List := '[' (Items (',' Items)*)? ']'
Component := Literal | '=' Symbol | '(' Expression ')'
Literal := QuotedString | standard_InfinityDB_component_representation_except_string_and_index
Symbol := Java_identifier
Expression := ('(' Expression ')' | QuotedString | ExpressionChar)*
ExpressionChar := any_char_except_any_paren_or_any_quote
QuotedString := Java_string_but_with_single_quotes | Java_string_with_standard_double_quotes
Java_identifier := letters_digits_underscores_not_starting_with_a_digit

The expression grammar matches any sequence of characters containing balanced parentheses mixed with any valid Java or JSON quoted strings except that they can use either single or double quotes.

White space, meaning any sequence of adjacent spaces, newlines, and returns, can occur between productions. Tabs are prohibited wherever possible, which so far means everywhere, although \t can appear in literal strings.

(i-code is convenient to embed in quoted strings in Java code, because i strings tend to use single quotes and are formatted with single quotes, and the =symbol and (expression) forms remove a layer of quoting.)

In the IFormatter, any string component starting with ‘=’ that is not a valid expression is output as a quoted string, ‘=non-expression’. Likewise, a string component starting with ‘==’ is output as ‘==rest’. The expression syntax is only a shorthand and is not required by IParser, so any string of characters can be written in quotes, even if it starts with ‘=’. IParser always succeeds in parsing text output by IFormatter into an ItemSpace, and IFormatter always succeeds for any ItemSpace as input. Using the (expression) form provides simple syntax checking of embedded expressions during parsing, catching certain errors early: unbalanced parentheses and invalid or unterminated quoted strings.
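The formatter-side rules for strings that start with ‘=’ can be sketched as follows. `format_string` is a hypothetical name, and ‘valid expression’ is assumed to mean balanced parentheses with terminated quoted strings, as in the expression grammar above; the real IFormatter may apply additional checks.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_balanced(expr):
    """True if parentheses balance and every '...' or "..." string is terminated."""
    depth, i = 0, 0
    while i < len(expr):
        c = expr[i]
        if c in "'\"":
            end = i + 1
            while end < len(expr) and expr[end] != c:
                end += 2 if expr[end] == '\\' else 1   # skip escaped characters
            if end >= len(expr):
                return False                            # unterminated quoted string
            i = end
        elif c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0

def format_string(value):
    """Hypothetical: render a string component as i text, using shorthand where valid."""
    if value.startswith('=') and not value.startswith('=='):
        rest = value[1:]
        if IDENTIFIER.fullmatch(rest):
            return value                                # "=name"  -> =name
        if rest and is_balanced(rest):
            return '(' + rest + ')'                     # "=a + 1" -> (a + 1)
    # Everything else, including "==rest" and "=non-expression", stays single-quoted.
    for raw, esc in (('\\', '\\\\'), ("'", "\\'"), ('\n', '\\n'), ('\r', '\\r'), ('\t', '\\t')):
        value = value.replace(raw, esc)
    return "'" + value + "'"
```

Quoting is the fallback for every string, which is one way to see why formatting can never fail: the shorthand is only chosen when it is guaranteed to parse back to the same string.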

Equals Stuffing for PatternQueries

A PatternQuery issue that is unrelated to i itself is that you might, rarely, want a literal string that starts with ‘=’ but is not to be interpreted as a symbol or expression. To handle this case, PatternQuery uses ‘equals stuffing’: a literal initial equals sign is indicated by adding another equals sign at the front. So when you write ‘==’ in i, it is put into the Item as “==”, but PatternQuery considers it to be a single literal equals sign. PatternQuery does not allow a string consisting of just “=”, which would designate an empty symbol name or an empty expression.
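Here is a minimal sketch of equals stuffing as described, with a writer-side `stuff` and a reader-side `interpret`. Both names are hypothetical, and treating a body that is a bare Java identifier as a symbol (anything else as an expression) is an assumption for illustration.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def stuff(literal):
    """Encode a literal string so a leading '=' is not mistaken for a symbol or expression."""
    return '=' + literal if literal.startswith('=') else literal

def interpret(value):
    """Classify a string component the way a PatternQuery might, undoing equals stuffing."""
    if value == '=':
        raise ValueError('"=" alone would name an empty symbol or expression')
    if value.startswith('=='):
        return ('literal', value[1:])        # stuffed: drop one leading '='
    if value.startswith('='):
        body = value[1:]
        return ('symbol' if IDENTIFIER.fullmatch(body) else 'expression', body)
    return ('literal', value)
```

So the Item string `==` reads back as the literal `=`, `=symbol1` as a symbol, and `=symbol1 + 1` as an expression.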

Gaps in Lists

Note that there is no way to preserve or create numerical gaps in lists. If the Indexes in the IFormatter input Items jump from, say, 5 to 7, no element is output for index 6. Parsing lists always produces Items with sequential Indexes starting at 0. This can be considered an advantage if gaps arise that are unwanted. One possible solution would be a switch that forces gaps to produce empty elements, but that would not preserve the sparseness when parsed back in. Another would be a new syntax, such as something that represents a gap of a certain size.
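The gap-closing effect of a format-then-parse round trip can be simulated in a few lines. This sketch assumes Index components appear as `[n]` tokens at a known position `pos` and that all the given Items belong to one list; `close_gaps` is a hypothetical name.

```python
import re

INDEX = re.compile(r"\[(\d+)\]")   # Index component token, e.g. [5]

def close_gaps(items, pos):
    """Simulate format-then-parse on one list whose Index token is at item[pos].

    Formatting writes the elements in Index order without their numbers, and parsing
    numbers them 0, 1, 2, ... again, so any gaps between Indexes disappear.
    """
    def index_of(item):
        return int(INDEX.fullmatch(item[pos]).group(1))

    old = sorted({index_of(item) for item in items})
    new = {n: k for k, n in enumerate(old)}          # order-preserving renumbering
    return [item[:pos] + (f"[{new[index_of(item)]}]",) + item[pos + 1:] for item in items]
```

Elements 5 and 7 come back as 0 and 1, and all suffixes of one element stay together under its new Index.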

No Comments

There is intentionally no comment syntax. Comments can be put into the data itself, such as in the comment attribute of PatternQuery symbols, or elsewhere in the data, and they will be preserved. Comments in the i text itself would necessarily disappear whenever it is parsed and then formatted back again, which happens frequently, for example in the database browser.

Standard Item Token Format

An Item is divided into a short series of ‘components’, typically from zero to about 30, each of which has one of 12 available data types. Every type is defined to be comparable with every other, so any possible Item can be compared to any other possible Item without error. If two Items have a common prefix, the first components of the remaining suffixes determine the ordering, whatever their types. All of the data types that correspond to Java, JSON or JavaScript types are represented the same way as in those languages. The types, in the order in which they sort, are:

| Type Name | Meaning | Examples in ‘Token’ Form |
|---|---|---|
| Class | A ‘meta’ type used as a kind of ‘punctuation’ to separate the non-meta types, which are the ‘primitives’. Think of it as delimiting ‘tables’, although it is much more flexible. A capital letter, then letters, digits, dots, dashes and underscores. It is ‘atomic’, so it is immutable. Another, rarely seen form is numeric, like MyClass(731). | MyClass, Documents, Log, Scheduled_meetings, Scheduled.meetings, Log_1997.7.11 |
| Attribute | The other ‘meta’ type. Think of it as delimiting column values in a table, although it is much more flexible. A lower-case letter, then letters, digits, dots, dashes and underscores. It is ‘atomic’, so it is immutable. Another, rarely seen form is numeric, like myAttribute(9328). | myAttribute, my_attribute, my.attribute, attribute_1997.7.11 |
| String | Character sequence of 0 to 1024 chars. Similar to Java, JavaScript, or JSON, but single quotes are not allowed as the outside delimiters. | "Hello World", "Special chars \r\n \t" |
| Boolean | true or false | true, false |
| Float | 32-bit real number. Like Double, it always contains a dot, but it prints with a trailing ‘f’. IEEE-754 binary format. | 5.3f, 1.2E9f |
| Double | 64-bit real number. There is always a dot, even if the fractional part is 0. IEEE-754 binary format. Formatted like Java, JavaScript, or JSON. | 5.3, 1.2E-99 |
| Long | 64-bit integer. Formatted like Java, JavaScript, or JSON. | 0, 1, 500 |
| Date | Date and time, with 1 ms resolution, counted from midnight GMT on 1/1/1970. Printed in ISO format, ending in a timezone offset such as -07:00 or Z. A semi-standard milliseconds integer may be included after a dot following the seconds. | 2018-03-02T16:00:09+00:00 |
| Bytes | A sequence of 0 to 1024 bytes, which sort by their length, then by the bytes. Printed in capital hex. Used in BLOBs to contain the data. | Bytes(A6_19_44) |
| ByteString | A sequence of 0 to 1024 bytes like Bytes, but sorting like strings, on the bytes themselves. Printed in capital hex. | ByteString(A6_19_44) |
| Chars | A sequence of 0 to 1024 16-bit chars, like String but sorting by length first. These are faster to work with than Strings. Used in CLOBs. | Chars("Hello World") |
| Index | Like a Long, but with the special meaning of delimiting the elements of a list, where the elements are the sets of suffixes with the same prefix. Lists may be very long, and the elements may be complex or large. Used for BLOBs and CLOBs. | [0], [1], [39155164] |
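The ordering rule can be sketched as a sort key that compares the type rank first and the value second. This sketch models each component as a hypothetical `(type_name, value)` pair, follows the type order and the length-first rule for Bytes and Chars from the table above, and relies on Python's list comparison, which assumes a proper prefix sorts before its extensions. String comparison uses Python code points, which may differ from InfinityDB for some characters.

```python
# Type order copied from the table above; components are modeled as (type_name, value) pairs.
TYPE_ORDER = ['Class', 'Attribute', 'String', 'Boolean', 'Float', 'Double', 'Long',
              'Date', 'Bytes', 'ByteString', 'Chars', 'Index']
RANK = {name: rank for rank, name in enumerate(TYPE_ORDER)}

def component_key(component):
    """Sort key for one component: the type decides first, the value only within a type."""
    type_name, value = component
    if type_name in ('Bytes', 'Chars'):
        return (RANK[type_name], len(value), value)   # length first, then contents
    return (RANK[type_name], value)

def item_key(item):
    """Sort key for a whole Item; a proper prefix sorts before its extensions (assumed)."""
    return [component_key(c) for c in item]
```

With this key a Double component sorts before a Long one regardless of their numeric values, and Bytes(FF) sorts before Bytes(00_00) because it is shorter, while the same two values as ByteString sort the other way.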