InfinityDB Client/Server

The latest addition to InfinityDB in version 5 is InfinityDB Client/Server. It organizes a set of InfinityDB Embedded files into a secure, remotely accessible, shareable database. Its features:

  • Remote shared access of a set of InfinityDB Embedded files in each server, which is a single JVM
  • REST access via Python or bash via curl (also Java soon) using JSON or BLOBs, or fast binary ‘ItemSpace’ Java access
  • Secure access with SSL/TLS everywhere, hashed and encrypted passwords, obfuscated data, encrypted metadata and keys/certs
  • Web access for admin, data browser/editor, pattern queries. SSH is not needed after installation.
  • Administration of users, roles, databases, permissions, passwords
  • Data browser and editor – graphical tabular, cut-and-paste JSON, tuples (Items), or images
  • Graphical display of ‘flexible’ self-extending schemas (with the optional ‘EntityClass’ and ‘Attribute’ data types)
  • ‘Pattern Queries’ for easy, powerful restructuring or querying of data – like select, project, join, order-by and more

Every Item has a URL

The most important new concept for those familiar with ‘Items’ and the ‘ItemSpace’: Every Item has a URL. RESTful access can read or modify the database securely, one or more variable-length Items at a time, via JSON or BLOB access from Python or the Unix ‘curl’ command (Java also soon), or via Java using the low-level 10-method ‘ItemSpace’ data model. Here is a link for the JSON of the Items describing some aircraft in the demo/readonly database: Aircraft (username ‘testUser’, password ‘db’). Here is a link for the Items for a picture: Apollo-Soyuz.
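As a rough illustration of what a RESTful read might look like from Python, here is a minimal sketch that only builds the request and does not send it. The host and guest credentials are the ones quoted on this page. The path layout (`/infinitydb/data/...`) and the use of HTTP Basic authentication are assumptions made for the example, not the documented URL scheme, and `build_item_request` is a hypothetical helper name.

```python
import base64
import urllib.request
from urllib.parse import quote

# Host and guest credentials come from the text above.
BASE_URL = "https://infinitydb.com:37411"
# ASSUMPTION: this path prefix is a placeholder, not the documented layout.
DATA_PATH = "/infinitydb/data"


def build_item_request(database, components, user="testUser", password="db"):
    """Build (but do not send) a GET request for the Items under a prefix."""
    # Each Item component becomes one URL path segment.
    path = "/".join(quote(c, safe="") for c in components)
    url = f"{BASE_URL}{DATA_PATH}/{database}/{path}"
    # ASSUMPTION: the server accepts HTTP Basic authentication over TLS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


request = build_item_request("demo/readonly", ["Aircraft"])
print(request.full_url)
# To fetch for real, pass the request to urllib.request.urlopen().
```

The point of the sketch is only that an Item prefix maps onto a URL path, so a whole subtree of Items can be addressed with one request.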

The Optional Flexible Data Model using ‘EntityClasses’ and ‘Attributes’

InfinityDB is a novel DBMS that has an optional, very flexible, self-adapting data model. It’s different because you don’t define schemas like tables or key/value stores ahead of time. Instead, they develop gradually and extend automatically as the need arises, as new data sources come online, and as client apps evolve. The things we show here are online, and you can experiment with them yourself.

The flexible format data model is so adaptable that with just a few concepts, you can create multi-user structures that represent tables, trees, documents, and blobs like pictures, all nestable and hierarchical. It’s perfect for highly dynamic environments like tech startups or anywhere you have evolving data to share between people, scripts, IoT devices, evolving applications and so on. For example, you can have what look like structured documents that are editable by multiple users – it’s like Google Docs but part of a database! It’s not for most end-users using Word, Excel, or dedicated constrained apps connected to a highly formalized nailed-down relational database, but it’s great for cooperating groups of power users or data sources and sinks in an evolving organization or system. It is possible to store any BLOB data, i.e. files, in the database as well, but the database does not ‘understand’ them. However, image BLOBs are displayed graphically in the data browser.

Try the Back End Web App

You can see the backend web app now at https://infinitydb.com:37411. The guest user is ‘testUser’ and the password is ‘db’. We appreciate your experimentation and exploration of it. To get a flavor of the system, select ‘Access Databases’ on the home page. You will see a table of the available InfinityDB Embedded database files for users in the guest role, which are demo/readonly and demo/writeable. You can select ‘edit’ on either one to see the main browsing and editing page with a list of the existing ‘EntityClasses’ (which are like nestable tables). Choose any of them, such as ‘Pictures’, ‘Samples’, ‘Aircraft’ or ‘Documentation’, to see the main tabular browsing page with the optional rich flexible data format. The ‘Documentation’ EntityClass there describes the system overall. You can try the Tabular, regular JSON, extended JSON, or Item views. If you want, you can add and modify your own data in demo/writeable for others to see, and then please comment to us. We will see your changes! If you contact us, we will create a free personal public or private trial database for you and provide the secure Python REST driver.

The Flexible Data Model Viewed by the Backend Web UI

There is some flexible-format demo data that is actually in the distribution ‘demo/readonly’ database. We’ll show various structures you can get with a simple set of ‘Items’ plus the two special ‘EntityClass’ and ‘Attribute’ data types.

 

Here’s a set of samples in ‘demo/readonly’ from a simulated IoT device, where pressure and temperature are measured very quickly. This could be embedded in another table or document or have additional internal structure. You can add new columns without changing a schema. For example, a new humidity sensor comes online, and it sends its data to the ‘SamplesIndexed’ table but using a new column name, and the old and new sensor data is merged as it arrives, with no changes to the old sensor or database by anyone. The data is not stored as text, but is ‘strongly typed’ with 10 basic data types plus the special ‘EntityClass’ and ‘Attribute’ data types that encode the semantics. The ‘sensors’ could be IoT devices, Java programs, Python scripts, ‘curl’ commands, other ‘RESTful’ data sources or sinks, or even users. The data is compatible with backed-up or distributed databases and so on. There is no rewriting of a global database schema to incorporate the data – the structure is in the data.

Here is a table in ‘demo/readonly’ that contains BLOBs or ‘binary large objects’, which happen to be images in this case. This is a ‘table’ called ‘Pictures’, with four ‘columns’ or attributes.

 

Below is some documentation about the system in the flexible format. It’s not a pretty word doc with fonts and so on, but it captures the logic of a rich document that can be concurrently edited. All of it down to individual paragraphs is independently and concurrently editable by multiple users, and the embedded tables can be added, removed, and edited too, down to one cell at a time, by multiple users. You can use hierarchical numerical or text-titled sections, deeply nested documents or tables and so on. It is real database data, mixed in with any other kind, accessible to remote programs.

 

It is actually a table that just looks like a rich document. The ‘keys’ at the left are the section titles, and the contents of the ‘description’ column are the text. The text has multiple embedded tables – the visible one is called ‘Display Components’, where the key is the name of a screen widget, and the single column it has is the description of the widget. The rest of the document is similar. The structure of this is determined solely by a set of ‘Items’ in the database with a very simple format. The back-end GUI web app interprets the Item patterns to generate the display resembling a document.

Now let’s get wild and see a really nested table. This is the definition of a ‘pattern query’ found in the public database ‘demo/readonly’, which is how you can transform structures of data in a database to change almost anything about its organization, or select data, or sort, and so on. The nesting is very deep, because the queries work on input and output patterns rather than a painful query language like SQL. First of all, there is an outer table called ‘Queries’. The keys are the query names, so you can re-use queries. Inside that are the ‘definition’ and ‘description’ columns, where the first is the query’s specification, and the second is an explanation, a little documentation about what it is for. You can see the value of nesting documents inside tables here. The description is not limited in size, and can be rich, although individual paragraphs are always limited to 1K characters. The definition has nested columns called ‘pattern’ and ‘result’, plus a nested table called ‘Where’. Don’t worry about the things deeply nested inside those right now. They just represent the input and output of data in the rest of the database. You can execute these queries yourself at https://infinitydb.com:37411 (user ‘testUser’, password ‘db’) or experiment, creating anything you want in the public ‘demo/writeable’. A ‘nested’ structure just adds some suffix data components to some ‘outer’ prefix data components.

This is just more structure represented by the almost trivial data model called the ‘ItemSpace’, with normal data types plus the two special flexible data types ‘EntityClass’ and ‘Attribute’ mixed in to some of the Items. The GUI formats it with certain simple fixed rules into a graphical display based on patterns in the data. The displays above follow these fixed rules, without any special ‘formatting’ instructions or external data structure. That means every structure above is nestable and can be combined with any other structure at any time. The ‘document’ shown above is not a file, but a data pattern. It could be read from and written to at any level of hierarchical detail in JSON or BLOB format. The ‘IndexedSamples’ table is not a file but a data pattern, as are the ‘Picture’ and ‘Pattern Query’ displays. All can be combined, with any degree of detail and richness, anywhere, and are just aspects of the same underlying flexible data model. Some operations on the data can ignore the patterns in it, so they work with raw data, JSON paths, documents, tables, pictures, and anything else, all nestable. The ‘Transfer Suffixes’ web data browser feature does this for example.

The Flexible ItemSpace Data Model

Now let’s describe this trivial flexible data model. Here is the full backend browser page looking at a ‘Trees’ table in the flexible format. The functions of the display components are described in the table ‘Documentation’ in database ‘demo/readonly’ using the flexible format. The user data could instead be represented in a simpler low-level ‘raw’ format resembling comma-separated files (i.e. raw variable-length tuples), or as raw logical JSON with 10 data types, just by not mixing in data of the two special data types. In raw format, you still use the special ‘EntityClass’ data type for the first data component to distinguish the unlimited named ‘tables’ from each other.

 

 

The ‘Current Prefix’ here is like your ‘current working directory’ in the shell. This prefix contains an ‘Item’ which is composed of strings, longs, doubles, floats, Booleans, dates, indexes, short byte arrays, short byte strings, and short char arrays, but in the flexible format it also can have any combination of two optional additional special ‘EntityClass’ and ‘Attribute’ type components that describe the schema of the Item internally. There are 12 data types in total.

There’s no schema structure defined anywhere but inside the flexible-format ‘Items’ themselves. The database is nothing more than an ordered set of these Items. When an Item is inserted, say with the insert button or by a secure Python or Java client, or by a curl command in the shell, the database schema is effectively extended at that moment. Deleting the Item reverses it, leaving behind the exact original structure. Any kind of structure can be created instantly and painlessly: more JSON trees, raw or flexible tables, rows, new attributes, values, nested structures, whatever.
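To make "an ordered set of Items" concrete, here is a toy sketch in Python. It is not the InfinityDB implementation: `TinyItemSpace` and `sort_key` are hypothetical names, the ordering rule is an assumption chosen so that mixed component types can be compared, and EntityClass and Attribute components are simplified to plain strings.

```python
import bisect


def sort_key(item):
    # ASSUMPTION: order components left to right, first by type name and
    # then by value. InfinityDB's real ordering may differ.
    return tuple((type(c).__name__, c) for c in item)


class TinyItemSpace:
    """A toy ordered set of Items, where each Item is a tuple of components."""

    def __init__(self):
        self._keys = []   # sort keys, kept in ascending order
        self._items = []  # Items, parallel to self._keys

    def insert(self, item):
        key = sort_key(item)
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return  # already present: a set holds each Item once
        self._keys.insert(pos, key)
        self._items.insert(pos, item)

    def delete(self, item):
        key = sort_key(item)
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            del self._keys[pos]
            del self._items[pos]

    def items(self):
        return list(self._items)


space = TinyItemSpace()
space.insert(("Trees", "red fir", "type", "conifer"))
before = space.items()

# Inserting an Item extends the "schema"; deleting it restores the old state.
space.insert(("Trees", "oak", "type", "deciduous"))
extended = space.items()
space.delete(("Trees", "oak", "type", "deciduous"))
after = space.items()
```

Nothing besides the Items themselves is stored, so `after` is identical to `before`: there is no separate schema object to clean up.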

This self-extending system allows us, for example, to add a brand new EntityClass – think of an EntityClass like a ‘table name’ for now. All raw or flexible data begins with an ‘EntityClass’. An EntityClass or Attribute contains a string to name it, with an EntityClass beginning with an upper case letter, an Attribute beginning with a lower case letter, and thereafter zero or more letters, digits, dot, dash, or underscore. (The .-_ can be used for Morse code when necessary.) Here is an Item with an EntityClass called ‘Trees’ and then an ‘entity’ data component for the tree type, “red fir” – which is like a key – then a new Attribute we are creating at the same time called “type” and then “conifer” at the end for the value of the Attribute:
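Written out as text in the style of the Item listings on this page, that Item is: Trees "red fir" type "conifer". The naming rule that tells ‘Trees’ and ‘type’ apart can be sketched with two regular expressions. This is a minimal sketch that assumes ‘letters’ means ASCII letters; `classify_name` is a hypothetical helper, not part of any InfinityDB API.

```python
import re

# ASSUMPTION: "letter" is taken to mean an ASCII letter.
ENTITY_CLASS = re.compile(r"[A-Z][A-Za-z0-9._-]*")
ATTRIBUTE = re.compile(r"[a-z][A-Za-z0-9._-]*")


def classify_name(name):
    """Return 'EntityClass', 'Attribute', or None if the name fits neither rule."""
    if ENTITY_CLASS.fullmatch(name):
        return "EntityClass"
    if ATTRIBUTE.fullmatch(name):
        return "Attribute"
    return None


trees_kind = classify_name("Trees")    # upper case first letter
type_kind = classify_name("type")      # lower case first letter
red_fir_kind = classify_name("red fir")  # contains a space, so neither
```

A name such as "red fir" is neither, which is why it has to be an ordinary string component in the Item.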

You can read this Item like a sentence: “There is a Tree called a red fir whose type is conifer”. Now I just insert it with the Insert button, and there is a new table:

If I delete that Item, the table vanishes, leaving nothing at all behind. Now I’ll edit the Item in the current prefix line to put in a new tree – “oak” as a “deciduous”.

When I insert it, there are now two rows. I can get back the Items by clicking on the table display – so the original Items re-appear in the current prefix. I can delete the entire table with the ‘delete suffixes’ button after clicking the ‘Trees’ EntityClass. I hit ‘commit’ from time to time when I like the current state, so if I goof, I can hit rollback. (This is the ‘global’ transaction feature, not the fine-grained ‘Optimistic’ ACID feature obtained by checking ‘Transactional’ explained elsewhere).

Adding a column happens when an Item with a brand new Attribute is inserted. Let’s add “hardwood” as an Attribute with the value ‘false’. (Oak is actually a hardwood, but the value will do for the example.) The insert widens the table by the new column. The cell for ‘red fir’ under hardwood is gray because it has no Item. Here I’ll insert false and it goes white. Each white cell has an Item in the database.
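The way columns and gray cells fall out of the Items can be sketched in a few lines of Python. This is only an illustration of the idea, not the GUI's actual algorithm: `table_view` is a hypothetical function, and it handles only the simplest Item shape of exactly four components (EntityClass, entity, Attribute, value), with the EntityClass and Attribute written as plain strings.

```python
def table_view(items, entity_class):
    """Derive (columns, grid) from Items shaped (EntityClass, entity, attribute, value)."""
    columns = []  # Attributes in order of first appearance
    rows = {}     # entity -> {attribute: [values]}
    for cls, entity, attribute, value in items:
        if cls != entity_class:
            continue
        if attribute not in columns:
            columns.append(attribute)  # a new Attribute widens the table
        rows.setdefault(entity, {}).setdefault(attribute, []).append(value)
    # None stands for a gray cell: no Item exists for that entity/attribute.
    grid = {entity: [cells.get(col) for col in columns] for entity, cells in rows.items()}
    return columns, grid


items = [
    ("Trees", "red fir", "type", "conifer"),
    ("Trees", "oak", "type", "deciduous"),
    ("Trees", "oak", "hardwood", False),
]
columns, grid = table_view(items, "Trees")

# Inserting one more Item fills the gray cell for 'red fir'.
items.append(("Trees", "red fir", "hardwood", False))
columns_after, grid_after = table_view(items, "Trees")
```

In the first view the ‘red fir’ row has `None` under hardwood; after the extra Item is inserted, every cell has a value.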

 

I can edit the data cells. Here’s the edit box containing “deciduous”: I can change the text and hit the checkmark to update it, hit plus after changing the text to get a new entry, or hit minus to delete the entry in the edit box. The edit box appears when you click on the already selected cell. In the edit box you can’t add structure, only ‘data’. For structure changes, you use the Current Prefix. Structure change basically means creating a brand-new EntityClass or Attribute. The table GUI finds the Attributes to display by looking forwards a bit in the database from the Current Prefix.

The tables we get this way are more flexible than regular tables. You can put structure inside the cells as well as data. Let’s say we add ‘larch’, and then discover that ‘conifer’ and ‘deciduous’ are not mutually exclusive! We can put in multiple values to fix it. This is not possible in standard tabular or relational DBMS. Hence standard relational tables require an initial analysis ‘pass’ by the developers in which they figure out the desired capabilities of an application once and for all, and then create the schema and write apps that assume that schema. InfinityDB does not do that, but is more ‘agile’.

Now let’s say we find that we want to store facts about red as well as white oak. We can edit the “oak” cell to add “red” and hit ‘+’.

 

And now we have:

Now there is a row starting with “oak” “red”, and we can put in the hardwood and type attributes for it. The original “oak” row can be changed to “oak” “white” by clicking and editing and clicking the checkmark. The result is a table with ‘composite keys’. The fact that there are multiple components in some keys is OK. Any white cell can have any combination of any number of components of the 10 ‘primitive’ data types, including string, number, Boolean, date and so on.

We call the white-cell components under ‘Trees’ ‘entities’ and those under ‘hardwood’ or ‘type’ we say are ‘values’. A sequence of zero or more primitive components in a white cell is a ‘tuple’. Each of the multiple values of an attribute can be a different tuple. So the “conifer” and “deciduous” of the “larch” form two values, not a tuple. If you move the pointer over a tuple, it goes yellow, including all of the primitive values in it, so you can distinguish multi-component tuples from multi-value attributes if they wrap. Each value can be a tuple, with no limit on the number of values. Tuples should stay relatively short, though.

Relational systems cannot have keys of varying numbers of columns – this is baked in, and impossible to change later. Also, relational systems limit the data types of the keys and values to a fixed type. This is good and bad, because the limitation keeps the data ‘clean’, but it precludes extension. Using varying-length tuples, we can do things like create hierarchies, where the entity tuples represent the paths to the substructure. Using varying data types, we can still expand when the data type assumptions turn out later to be too limiting. We can combine numbers and text, to form hierarchically numbered sections that can have titles too.

We’ll extend this even further in a minute, but as an aside, here’s a bit about the non-tabular modes.

The JSON and Item formats

If you need the JSON data format, here is our Trees table (click on ‘Trees’, select ‘show as Extended JSON’ and click load). You can edit, cut and paste and save it or email it. The initial ‘Trees’ EntityClass component is implied. Note that the keys are Attributes in some places – we can actually use any of the 12 data types for keys or values or list elements. To get rid of that behavior, see ‘Underscore quoted JSON’, in which the 12 data types are encoded as strings with an initial underscore to identify them for compatibility with standard JSON. Plain strings that happen to have an initial underscore have one more ‘stuffed in’ at the front to avoid being interpreted as non-string data types. There are data types for raw hex-displayed short byte arrays or byte ‘strings’ or short char arrays with which we can store BLOBs, even in JSON. A JSON list uses the ‘Index’ data type.

{
    "larch" : {
        type : {
            "conifer" : null,
            "deciduous" : null
        }
    },
    "oak" : {
        hardwood : false,
        type : "deciduous",
        "red" : null
    },
    "red fir" : {
        hardwood : false,
        type : "conifer"
    }
}
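The underscore ‘stuffing’ rule mentioned above for ‘Underscore quoted JSON’ can be illustrated with a small sketch. Only the rule for plain strings is stated on this page, so the sketch covers just that part; how the text after a single leading underscore encodes the other data types is not described here and is left unparsed. The function names are hypothetical.

```python
def stuff_plain_string(text):
    """Encode a plain string: add one underscore if it already starts with one."""
    return "_" + text if text.startswith("_") else text


def classify_token(token):
    """Decode one JSON string token into ('string', text) or ('typed', rest)."""
    if token.startswith("__"):
        return ("string", token[1:])  # drop the stuffed-in underscore
    if token.startswith("_"):
        # A non-string data type; the format of `rest` is not specified here.
        return ("typed", token[1:])
    return ("string", token)


encoded = stuff_plain_string("_private")
decoded = classify_token(encoded)
```

Here `encoded` is "__private", and decoding it gives back the plain string "_private", so ordinary strings survive the round trip unchanged.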

Here are the underlying Items for the Trees (click on ‘Trees’, select ‘Show as Items’ and hit load). The initial ‘Trees’ EntityClass component is implied. The text of each component uses extended JSON format.

"larch" type "conifer"
"larch" type "deciduous"
"oak" hardwood false
"oak" type "deciduous"
"oak" "red"
"red fir" hardwood false
"red fir" type "conifer"

The secure Python and other RESTful access uses JSON, while secure Java uses discrete Items or batches in their underlying binary form for extreme speed.
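The correspondence between the Item list and the extended JSON above can be sketched as a small Python program. The collapsing rule used here is inferred from that one example (a single terminal component becomes a value, several become keys mapped to null) and should be read as an assumption rather than the server's exact algorithm. `Attribute`, `build_tree` and `render` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Marks a component as an Attribute rather than a plain string."""
    name: str


def build_tree(items):
    """Merge Items that share prefixes into one nested dict (a trie)."""
    root = {}
    for item in items:
        node = root
        for component in item:
            node = node.setdefault(component, {})
    return root


def render(node):
    """Collapse the trie the way the example JSON above appears to."""
    if not node:
        return None  # nothing follows: the Item ends here
    if len(node) == 1:
        key, child = next(iter(node.items()))
        if not child:
            return key  # a single terminal component is shown as a value
    return {key: render(child) for key, child in node.items()}


type_, hardwood = Attribute("type"), Attribute("hardwood")
items = [
    ("larch", type_, "conifer"),
    ("larch", type_, "deciduous"),
    ("oak", hardwood, False),
    ("oak", type_, "deciduous"),
    ("oak", "red"),
    ("red fir", hardwood, False),
    ("red fir", type_, "conifer"),
]
nested = render(build_tree(items))
```

With the seven Trees Items as input, `nested` has the same shape as the extended JSON: ‘larch’ gets a two-key object under `type`, and ‘oak’ keeps "red" as a key mapped to `None`.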

Nested Tables

Back to tabular view again. Let’s add more info to the flexible Trees table because we realized we have nurseries that stock them for sale. (In a relational system, we would create a new Nurseries table and have a connection table with a fixed composite key of nursery id and species id including quantities, then set up relational integrity maintenance mechanisms.) We can transform and query our Trees table easily with the pattern query feature into a different form at any time, but we want other people and fixed data sources like IoTs and older databases like backups or distributed databases to remain compatible with the existing structure.  (And, we don’t need three tables and annoying joins every time we access it as in relational systems.) So let’s add a subtable to each tree that lists the nurseries and on-hand stock. We click on the “oak” “red” to get back the Trees “oak” “red” Item, then add to it:

 

This can be read “There is a tree called ‘oak’ and it is a ‘red’ subtype, which is in a nursery in the location aptos in quantity 2”. We start ‘nursery’ with lower case to make it an Attribute, and ‘Location’ with upper case so it is an ‘EntityClass’. Now we have a nested table:

This kind of extension is almost limitless. Any attribute can have any number of values, distinct nested tables, distinct nested attributes, lists, pictures or BLOBs all at once. The structure grows as data flows in. It is determined entirely by the placement of components in the Items that flow in and out. Any possible set of Items has a unique corresponding representation, either in the table display, JSON, or a text Item list. A particular sensor or script producing data will often keep sending in Items of the same structure, but new data sources come along all the time, with new structure to merge in. If the structure becomes limiting in some way, it can be transformed almost limitlessly using the ‘Pattern Query’ feature.

The semantics of the Items containing EntityClasses and Attributes depends only on how pairs of them occur in the Item. There are four ways to pair them at any position in the Item:

Pairing                                            Meaning
EntityClass then data then Attribute then data     a ‘table’
EntityClass then data then EntityClass then data   a ‘sub-table’
Attribute then data then Attribute then data       a ‘sub-attribute’
Attribute then data then EntityClass then data     a ‘nested table’

The data parts are any adjacent sequence of zero or more of the 10 primitive data types. The adjacent primitives are a ‘tuple’. The tuple after an EntityClass is an ‘entity’, and the tuple after an Attribute is a ‘value’.
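The four pairings can be checked mechanically. The sketch below splits an Item into groups, each an EntityClass or Attribute followed by its tuple, and names every adjacent pair according to the table above. The class and function names are hypothetical, and the sample Item is our reading of the nursery example sentence (the attribute name ‘quantity’ in particular is an assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityClass:
    name: str


@dataclass(frozen=True)
class Attribute:
    name: str


PAIRINGS = {
    (EntityClass, Attribute): "table",
    (EntityClass, EntityClass): "sub-table",
    (Attribute, Attribute): "sub-attribute",
    (Attribute, EntityClass): "nested table",
}


def split_item(item):
    """Split an Item into (EntityClass-or-Attribute, tuple of primitives) groups."""
    groups = []
    for component in item:
        if isinstance(component, (EntityClass, Attribute)):
            groups.append((component, []))
        elif groups:
            groups[-1][1].append(component)
        else:
            raise ValueError("this sketch expects a leading EntityClass or Attribute")
    return [(meta, tuple(data)) for meta, data in groups]


def pairings(item):
    """Name each adjacent pair of groups according to the table above."""
    groups = split_item(item)
    return [PAIRINGS[(type(a), type(b))] for (a, _), (b, _) in zip(groups, groups[1:])]


# ASSUMPTION: reconstructed from the nursery sentence; 'quantity' is a guess.
item = [
    EntityClass("Trees"), "oak", "red",
    Attribute("nursery"),
    EntityClass("Location"), "aptos",
    Attribute("quantity"), 2,
]
groups = split_item(item)
kinds = pairings(item)
```

For this Item the entity tuple is ("oak", "red"), the tuple after ‘nursery’ is empty, and `kinds` is a ‘table’, then a ‘nested table’, then another ‘table’.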

Examples of the rich GUI displays you will see for different simple and complex Item patterns are shown in the database ‘demo/readonly’ and the EntityClass ‘Documentation’ at https://infinitydb.com:37411 (user name ‘testUser’, password ‘db’).

The Index Data Type

There is also an ‘index’ primitive data type that allows lists of Items to be described, as shown in the ‘Documentation’ EntityClass above, where the paragraphs are numbered by ‘[n]’ components. These can be appended to easily, and they form lists in the JSON form, which are easily editable to handle renumbering if you insert new paragraphs in the middle.

Dialing it Back with Access Permissions

Of course this is so flexible that it can get out of hand if everyone is just adding data any time they want. So in an organization, there will be various means of coordinating the changes. This is done currently by having multiple databases with different access permissions for each role, and users can be given multiple roles. Then, a user can own a database and use it for loosely organized things, or a more formal database’s structure can be maintained by agreement at routine meetings in various groups, or a single person or two might be the controllers of the structure of a ‘curated’ database, which can be made read-only to many other users and data sinks via role permissions. There is a single ‘admin’ user who creates databases, users, roles, and permissions.

In the future, ‘Metadata’ will be provided, which allows a database to be restricted to certain patterns, EntityClasses, Attributes, tuple structure, data types and so on, so that multiple users can edit it in a controlled way. The Metadata can be enhanced as needed to allow more structure, some of it strict, some loose. The existing Pattern Query feature can help in the process of creating more formal databases out of less formal, or extracting the formalized parts out of a database containing a mixture of data of various levels of formality or quality.

Pattern Queries

A very powerful feature is the ‘Pattern Query’, which can transform an ItemSpace in a wide variety of ways based on only an input ‘pattern’ and an output ‘result’ plus a ‘Where’ table. These query definition elements are stored as normal Items, so they can be viewed in tabular form in the web-based data browser and editor or used in any other way as data themselves. They can be named and individually documented for re-use. The definition pattern and result can resemble tables and other ‘flexible’ structure given by EntityClass and Attribute components that may exist in the Items. Pattern and result Items contain constants and ‘symbols’ which connect them together, specifying the transfer of information from the input Items to the output Items, and the re-arrangement of symbols in the output.

The queries can do these things with simple sets of a few definition Items and no SQL or other language:

  • Pattern matching of input Items:
    • match multiple correlated input Items – these ‘join’
    • match constants by given data component, tuple, EntityClass or Attribute – these ‘select’
    • match any given data type, any EntityClass, any Attribute, any tuple, or any suffix – these create symbols
    • ignore any unmatched Items – these ‘project’
  • Result output creating new Items:
    • output symbols matched in the input as individual components, tuples, or suffixes at any position
    • place constant components: a given data component, EntityClass, or Attribute, at any position
    • create multiple output Items

These capabilities are as powerful as the relational ‘select’, ‘project’, ‘join’, and ‘order by’ and more, but far simpler. To get ‘order by’, one moves matched symbols towards the front of the output Items.

The optional ‘Where table’ specifies facts about the symbols, which are named by normal strings:

  • Symbols may be given fixed literal values with the ‘equals’ Attribute for input, output, or both
  • Symbols may be given ‘type’ Attributes to match in input, such as matching any data component, any given data type, any EntityClass, any Attribute, any tuple, or any suffix
  • (Future: Symbols will eventually be given by math expressions for predicates and results)

The input pattern can be pointed at a given input prefix, and the results can be stored back into the database or merged or differenced with existing Items under a given result prefix. Therefore, data can be easily moved, copied, trimmed, annotated, filtered, simplified, canonicalized, sorted, restructured, re-nested, and more, with no complex syntax like SQL. When a matched component or tuple or suffix ends up being moved closer to the front of a set of output Items, it re-orders them, providing a sort. Setting a symbol to a constant in the ‘Where’ and then executing the query effectively parameterizes it.
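To give a feel for how a pattern, a result and symbols interact, here is a deliberately tiny Python sketch. It is not the InfinityDB query engine or its notation: `Sym`, `match` and `run_query` are hypothetical, only a single input pattern is supported (so there is no join and no ‘Where’ table), and every pattern component must line up with exactly one Item component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """A named symbol in a pattern or result (hypothetical notation)."""
    name: str


def match(pattern, item):
    """Return symbol bindings if the Item matches the pattern, else None."""
    if len(pattern) != len(item):
        return None
    bindings = {}
    for p, c in zip(pattern, item):
        if isinstance(p, Sym):
            if bindings.setdefault(p.name, c) != c:
                return None  # the same symbol must bind to the same value
        elif p != c:
            return None      # a constant in the pattern 'selects'
    return bindings


def run_query(items, pattern, result):
    """Produce output Items; unmatched Items are ignored, which 'projects'."""
    out = set()
    for item in items:
        bindings = match(pattern, item)
        if bindings is not None:
            out.add(tuple(bindings[r.name] if isinstance(r, Sym) else r for r in result))
    return sorted(out)  # Items are ordered, so output order follows the components


items = [
    ("Trees", "red fir", "type", "conifer"),
    ("Trees", "oak", "type", "deciduous"),
    ("Trees", "larch", "type", "conifer"),
    ("Trees", "oak", "hardwood", False),
]
pattern = ("Trees", Sym("tree"), "type", Sym("kind"))
result = ("TreesByType", Sym("kind"), Sym("tree"))
by_type = run_query(items, pattern, result)
```

Because the `kind` symbol is moved in front of `tree` in the result, the output Items come out grouped and sorted by type, which is the ‘order by’ effect described above.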

(The internal execution engine is like a relational ‘nested loops’, where the performance can be extremely high when the input Items have known prefixes or match a small number of prefixes.)

Transfer Suffixes

Another back-end feature is the ability to move data quickly based on a pair of prefixes. Any Item can be a source or destination prefix, and data can be moved within or between databases, for backup, database copying or structure re-organization. The operands are the two sets of suffixes. The operation can copy, move, union, difference, or intersect the two sets of suffixes. Data ‘aliasing’ is handled – for example, the error that occurs in Unix when a directory is moved ‘inside itself’, or ‘outside itself’ does not occur. The operation does not depend on the pattern of components in the Items.
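The idea of operating on two sets of suffixes can be sketched as follows. This is an illustration under stated assumptions, not the actual feature: the function names are hypothetical, the exact meaning of each operation is assumed (copy and move replace the destination, difference removes the source suffixes from the destination), and Items are plain tuples. Taking a snapshot of both suffix sets before writing is one simple way to make overlapping prefixes safe.

```python
def suffixes(items, prefix):
    """Return the set of suffixes of all Items that start with prefix."""
    n = len(prefix)
    return {item[n:] for item in items if item[:n] == prefix}


def transfer(items, source, dest, op):
    """Return a new Item set after applying op to the source and dest suffixes."""
    items = set(items)
    src = suffixes(items, source)  # snapshot both operands before any change
    dst = suffixes(items, dest)
    if op in ("copy", "move"):
        new_dst = src
    elif op == "union":
        new_dst = dst | src
    elif op == "difference":
        new_dst = dst - src
    elif op == "intersect":
        new_dst = dst & src
    else:
        raise ValueError(f"unknown operation: {op}")
    if op == "move":
        items = {i for i in items if i[:len(source)] != source}
    items = {i for i in items if i[:len(dest)] != dest}
    return items | {dest + s for s in new_dst}


trees = {
    ("Trees", "oak", "type", "deciduous"),
    ("Trees", "red fir", "type", "conifer"),
}
copied = transfer(trees, ("Trees",), ("Backup", "Trees"), "copy")
moved_inside = transfer(trees, ("Trees",), ("Trees", "old"), "move")
```

In `moved_inside` the destination prefix lies inside the source prefix, yet the result is simply the two Items re-rooted under ("Trees", "old"), because the suffixes were captured before anything was removed.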

Custom Structures

The types of structures we have discussed above do not limit the uses. Applications sometimes create specific structures, such as text indexes or time-series databases. These will show up in the browser/editor as ‘raw’ Items of the 10 primitive data types without the EntityClass or Attribute data types.

Differences with the Embedded Version

Many applications are light-weight, using only one or a few database files, and they use only InfinityDB Embedded. No back-end server is necessary for many applications, but the database is then used in a single JVM process. Security and administration of an InfinityDB Embedded file is much simpler. InfinityDB Client/Server requires a directory structure to store web pages, data files, SSL keys and certificates, and encrypted metadata. InfinityDB Client/Server requires an ‘admin’ user – at least at first – to manage users, roles, permissions, and databases. The need for these arises as soon as data is exposed on a port; a simplistic remote socket connection protocol, on the other hand, cannot easily add security and other features.

Licensing

We provide our clients with free online trials and limited personal trial data storage at https://infinitydb.com:37411, and we license the InfinityDB Client/Server software for private on-premises use, such as behind a firewall or on clients’ secure public servers. We license it for inclusion in clients’ products as well. We are working on providing servers within IoT devices, so that a device can collect data on its own, and then provide it to applications or users on demand.

Contact us at support@boilerbay.com.