InfinityDB Embedded

InfinityDB Embedded is a Java NoSQL DBMS component with flexibility far beyond that of document databases, without sacrificing the features of an RDBMS. InfinityDB Server is now available as well, and it is based on this code. InfinityDB Encrypted is an enhanced version that includes the option to encrypt 100% of the data 100% of the time.
InfinityDB Embedded and InfinityDB Encrypted are Java components for use in Java applications, each licensable separately and directly from Boiler Bay. They offer superior performance, multi-core concurrency, compression, transaction support, safety, and near indestructibility. With InfinityDB Encrypted, you can authenticate and encrypt 100% of the data 100% of the time with AES256, HMAC-SHA256, X509 signatures, and key rotation. The API and data model are very simple. Each database is a single file, so administration is easy. Many custom uses are possible.
InfinityDB Embedded is easy to use:
  • Accessed by a single JVM at a time – hence it is ‘embedded’
  • Simple 10-method ‘ItemSpace’ API (insert, delete, deleteSubspace, update, first, next, last, previous, commit, rollback)
  • 12 data types (string, double, float, long, boolean, date, index, short byte array, short byte string, short char array, EntityClass, Attribute)
  • Character Long OBjects, Binary Long OBjects, long texts of paragraphs and nested structures
  • Optional extended standard Java ConcurrentNavigableMap adapter access with composite keys and values, multi-maps, nestable
  • Can represent nested key/value, trees, graphs, documents, huge sparse arrays, tables, JSON, based on application usage
  • 1M ops/sec in memory, multi-threaded
  • Multi-core Concurrency – Threads do not block each other, which is vital for servers. Patented.
  • Compression – ZLib, UTF-8, variable data, variable blocks, shared prefixes
  • Transactions – optimistic ACID for Threads, or global ACD for bulk operations
  • Zero administration – one file per database, no install or upgrade scripts, no logs, no configuration
  • Instant recovery – a safe disk write pattern prevents corruption or data loss
  • Flexible data representation using ‘EntityClass’ and ‘Attribute’ metadata data types if desired
  • Dynamic views of data for limited instant queries – set logic views, delta views, ranges, more
  • Remote InfinityDB Server access using ItemPacket protocol for virtual local DB

Zero Administration

InfinityDB Embedded applications can run indefinitely with no DBA attention for installation, management, application upgrade, or schema definitions like create table scripts.

A Database is a Single File

InfinityDB Embedded uses a single file for all purposes. The combination of this feature and the instant guaranteed recovery on abrupt application termination helps make InfinityDB Embedded administrator-free. No logs need to be archived or re-applied. There are no configuration files, temporary files, or text logs. No junk files are left behind after any kind of termination, so there is never any cleanup.

Reliability and Safety

InfinityDB Embedded uses a rugged internal storage update protocol, both for persistence on demand and for spilling the cache to disk when there is a large amount of data. The protocol maintains system-wide data integrity and survives abrupt application termination or other problems. The single data file remains up-to-date, safe, correct, and usable through any event. There is no log-based recovery, hence restart and recovery are immediate in all cases. No unexpected Exceptions are ever thrown, not even due to any kind of deadlock or internal resource limits (although optional optimistic locking throws expected Exceptions on conflict). No dangerous off-heap storage or native libraries are used.

Efficient Storage

Continuous Space Reclamation

Space allocation for individual and aggregated data is fully dynamic: no space is used until structures are created or after they are deleted. During growing or shrinking, structure storage is always minimal and efficient. The single data file is 100% efficient with compressed data on initial loading, and stays at least 50% efficient in the worst case after very large global transactions, which may include any amount of data. Normally, free space is about 10%. The single file never shrinks. Applications can run forever without gradual space loss. There are no temporary peaks in space usage, or temporary external files. There is no need for occasional reorganization or packing, and there is no garbage collector thread. All freed space is recycled on commit or rollback. Deletions or updates do not leave sparse structures behind – all freed space is reclaimed completely for immediate reuse without rebuilding indexes or running offline reorganizers.

High Compression on Disk and In Memory

InfinityDB Embedded uses continuous, dynamic ZLib and UTF-8 data compression to pack data into variable-length blocks, avoiding almost all of the wasted space that internal fragmentation would normally cause. I/O bandwidth is reduced accordingly. Variable-length binary-encoded primitives, variable-length concatenations of primitives or ‘Items’, and prefix and branch-cell suffix compression are used on disk and in the memory cache as well. Data compression means that the branching factor is kept high for fast access, and the OS file cache is better used. For compressible data, 10x is often achieved. There is no pre-allocation or waste in ‘extents’, ‘segments’, ‘clusters’, or fixed-size blocks. No gradual space leaks can occur because free space management is transactional. Any size database benefits from the compression, from 10KB to 100GB and beyond.
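To make the idea of shared-prefix compression concrete, here is a minimal sketch of generic front coding over sorted keys. It is only an illustration of the general technique: the class name FrontCodingSketch and the "length:suffix" text encoding are assumptions made for this example, and the actual InfinityDB block format is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of shared-prefix (front) compression on sorted keys.
// This is NOT InfinityDB's block format; the encoding is invented for the example.
class FrontCodingSketch {
    // Encode each key as "<sharedLength>:<suffix>" relative to the previous key.
    static List<String> encode(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(shared + ":" + key.substring(shared));
            prev = key;
        }
        return out;
    }

    // Rebuild the original keys by reusing the shared prefix of the previous key.
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : encoded) {
            int colon = entry.indexOf(':');
            int shared = Integer.parseInt(entry.substring(0, colon));
            String key = prev.substring(0, shared) + entry.substring(colon + 1);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "applet", "apply", "banana");
        List<String> packed = encode(keys);
        System.out.println(packed);                       // [0:apple, 5:t, 4:y, 0:banana]
        System.out.println(decode(packed).equals(keys));  // true
    }
}
```

Because neighbouring keys in a sorted store tend to share long prefixes, only the differing tails need to be stored, which is one reason sorted tuple stores compress well.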

Simple API Can Model a Document DB

There are two APIs: one is the versatile, fast low-level proprietary ‘ItemSpace’ and the other one is an extended nested Map view. These can be converted to an extended JSON and back, but that is not normally necessary. A database is logically a single document, but can scale to terabytes with fast access at any level of detail.

Nested Map-based Access

The nested Map view is a wrapper around the basic ItemSpace API. It implements and extends java.util.concurrent.ConcurrentNavigableMap, thereby providing the capability of a ConcurrentHashMap or ConcurrentSkipListMap. InfinityDBMaps may contain other InfinityDBMaps or InfinityDBSets, which are extended ConcurrentSets. The InfinityDBMap is a light-weight Object that can be constructed dynamically without itself being persisted: the Map mutator methods actually store data in the database. Extensions to the NavigableMap API include:
  • composite keys – any mixture of data types with a variable component count per key
  • composite values or set elements
  • multi-map – efficient unlimited values per key
  • tuple views – tuples are Object[] in the interface, but not in storage
  • nestable Maps and Sets
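Since the Map view implements the standard ConcurrentNavigableMap interface, code written against that interface should carry over. The sketch below uses only java.util classes, with ConcurrentSkipListMap as an in-memory stand-in for a database-backed map. How an InfinityDBMap is constructed is not shown because it is not covered in this text, and the multi-map is emulated here with a map of sorted sets rather than the native extension listed above. The class name NavigableMapSketch and the sample data are invented for illustration.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Standard-library stand-in: shows the ConcurrentNavigableMap operations
// that any implementation of the interface provides.
class NavigableMapSketch {
    static ConcurrentNavigableMap<String, ConcurrentSkipListSet<String>> sample() {
        ConcurrentNavigableMap<String, ConcurrentSkipListSet<String>> phones =
                new ConcurrentSkipListMap<>();
        // Emulated multi-map: each key maps to a sorted set of values.
        phones.computeIfAbsent("alice", k -> new ConcurrentSkipListSet<>()).add("555-0100");
        phones.computeIfAbsent("alice", k -> new ConcurrentSkipListSet<>()).add("555-0199");
        phones.computeIfAbsent("bob", k -> new ConcurrentSkipListSet<>()).add("555-0123");
        phones.computeIfAbsent("carol", k -> new ConcurrentSkipListSet<>()).add("555-0150");
        return phones;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, ConcurrentSkipListSet<String>> phones = sample();
        System.out.println(phones.firstKey());                 // alice
        System.out.println(phones.higherKey("alice"));         // bob
        System.out.println(phones.headMap("carol").keySet());  // [alice, bob]
        System.out.println(phones.get("alice"));               // [555-0100, 555-0199]
    }
}
```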

Direct NoSQL ‘ItemSpace’ API

For the ultimate speed and extreme flexibility, the simple lower-level ‘ItemSpace’ API gives you access to the same data as the Map-based view. There are only 10 essential storage and retrieval methods that operate on the ItemSpace: insert, delete, deleteSubspace, update, first, next, last, previous, commit, and rollback. You gain low-level access by momentarily allocating a ‘Cu’ cursor, using it for the API method invocations, and then disposing of it. There are helper utilities for things like text indexes, hierarchical sorting, inversions, and more. Applications can define rich creative models on top of the ItemSpace.
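The flavor of these operations can be approximated with a toy in-memory model: a sorted set of tuples with insert, delete, deleteSubspace, first, and next restricted to a prefix. This is not the InfinityDB API. The class ItemSpaceSketch, its method signatures, and the String-only components are all assumptions made for illustration, and there is no Cu cursor, commit, or rollback in the model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Toy model only, NOT the InfinityDB API: Items are String tuples kept in sorted order.
class ItemSpaceSketch {
    // Compare tuples component by component; a tuple sorts before its own extensions.
    static final Comparator<List<String>> ITEM_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    private final ConcurrentSkipListSet<List<String>> items =
            new ConcurrentSkipListSet<>(ITEM_ORDER);

    void insert(String... components) { items.add(Arrays.asList(components)); }

    void delete(String... components) { items.remove(Arrays.asList(components)); }

    // Remove every Item that starts with the given prefix.
    void deleteSubspace(String... prefix) {
        List<String> p = Arrays.asList(prefix);
        items.removeIf(item -> startsWith(item, p));
    }

    // Smallest Item that has the prefix, or null if there is none.
    List<String> first(List<String> prefix) {
        List<String> candidate = items.ceiling(prefix);
        return candidate != null && startsWith(candidate, prefix) ? candidate : null;
    }

    // Smallest Item greater than 'current' that still has the prefix, or null.
    List<String> next(List<String> current, List<String> prefix) {
        List<String> candidate = items.higher(current);
        return candidate != null && startsWith(candidate, prefix) ? candidate : null;
    }

    // Walk all Items under a prefix using first/next, the way a cursor loop would.
    List<List<String>> scan(String... prefix) {
        List<String> p = Arrays.asList(prefix);
        List<List<String>> out = new ArrayList<>();
        for (List<String> item = first(p); item != null; item = next(item, p)) {
            out.add(item);
        }
        return out;
    }

    static boolean startsWith(List<String> item, List<String> prefix) {
        return item.size() >= prefix.size()
                && item.subList(0, prefix.size()).equals(prefix);
    }

    public static void main(String[] args) {
        ItemSpaceSketch space = new ItemSpaceSketch();
        space.insert("Person", "alice", "age", "30");
        space.insert("Person", "alice", "email", "a@example.com");
        space.insert("Person", "bob", "age", "41");
        space.insert("Pet", "rex", "owner", "alice");
        System.out.println(space.scan("Person", "alice"));
        space.deleteSubspace("Person", "alice");
        System.out.println(space.scan("Person"));
    }
}
```

The point of the model is that one sorted set of tuples, navigated by prefix, is enough to express nested structures: everything under the prefix (Person, alice) behaves like a small record.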

Fast Multi-Core Design

InfinityDB Embedded was already incredibly fast, but then we redesigned it to make use of all cores at the same time, each operating safely on a different thread. Now, InfinityDB Embedded runs at 1 million ops per second on 8 cores with good scaling. You can take advantage of this speed immediately on a server, or you can use multiple threads in your application. Cores are multiplying at Moore’s-law speed, and applications are adding more and more threads. Without the multi-core technology in InfinityDB Embedded to avoid inter-thread interference, bottlenecks called ‘convoys’ can occur when threads contend for data. Performance can drop dramatically, even far below single-thread speed. The concurrency algorithm is patented.

Transactionality

Two kinds of transactions are available:

Global Transactions

A global transaction persists all current changes to disk, providing Atomic, Consistent, and Durable semantics. It does not use any kind of lock, so it does not provide inter-thread Isolation. However, all threads keep concurrent access during the commit. Effectively, there is a single ‘global transaction’ in effect at all times. Optimistic locking commits also cause global commits.

Optimistic Transactions

Fine-grained multi-thread transactions use optimistic locking and support complete ACID (atomic, consistent, isolated, and durable) semantics. Locks do not follow the usual rules of other DBMSs, but have capability equivalent to table locks, row locks, index locks, or even single-column value locks and single set element locks. This diversity of lock types is not actually a complex spectrum of details – it follows trivially from the basic data model and is automatic and almost invisible to the programmer. If desired, the programmer can easily control the lock order for maximum concurrency simply by accessing appropriate data early in the transaction. The locks are actually just set on prefixes of tuples, i.e. prefixes of Items, and are maintained transparently. The set of locked prefixes is kept in memory per database globally and is also associated with each thread. Lock conflicts throw an OptimisticLockConflictException and are optionally retried by the application code. Concurrent optimistic transactions can reach hundreds of commits per second on disk, and thousands per second on flash.
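The retry-on-conflict pattern mentioned above might look roughly like the following sketch. It is self-contained and does not use InfinityDB classes: ConflictException is a hypothetical stand-in for the OptimisticLockConflictException named in the text, and withRetry, the Supplier-based transaction body, and the attempt limit are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Generic optimistic-retry loop; all names here are hypothetical.
class OptimisticRetrySketch {
    // Stand-in for the conflict exception a real optimistic commit would throw.
    static class ConflictException extends RuntimeException {}

    // Run the transaction body; on conflict, run it again, up to maxAttempts times.
    static <T> T withRetry(int maxAttempts, Supplier<T> body) {
        for (int attempt = 1; ; attempt++) {
            try {
                return body.get();
            } catch (ConflictException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        String result = withRetry(5, () -> {
            // Simulate two conflicting commits before one succeeds.
            if (attempts.incrementAndGet() < 3) {
                throw new ConflictException();
            }
            return "committed";
        });
        System.out.println(result + " after " + attempts.get() + " attempts");
        // prints "committed after 3 attempts"
    }
}
```

In a real application the body would re-read its data on each attempt, since the conflict means another thread changed something the transaction depended on.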

Data Structures

Applications do not need to invent binary encodings or convert primitives to binary or text. Data is not stored as formatted text or as custom raw binary, but as an intermediate form, with standard pre-defined binary encodings of the individual Java primitives in a consistent way that allows extremely high speed. InfinityDB Embedded supports all primitive Java data types and more:
  • long (stored as compressed bits to handle byte, short, char, with no more space)
  • float
  • double
  • boolean
  • String (stored as zlib compressed UTF-8)
  • Date/time
  • index (for ‘huge sparse arrays’, lists in JSON, and BLOBs/CLOBs, texts)
  • short byte and char arrays (sort by length first, used for BLOBs and CLOBs)
  • short byte strings (sort like strings but with bytes instead of 2-byte chars)
  • ‘EntityClasses’ and ‘Attributes’ which are optional metadata for rich ‘flexible’ structures.

Application-Specific Data Models

InfinityDB provides a rich data representation space for structured, semi-structured, or unstructured data. The basic data model is simple but flexible enough to be used by the application to define and represent any mixture of trees, graphs, key/value maps, documents, text indexes, huge sparse arrays, tables with an unlimited number of columns of an unlimited number of values per column, nested multi-maps, inverted Entity-Attribute-Value triples, or creative custom structures.

Items and the ItemSpace

Also see The ItemSpace Data Model for a simple higher-level description. All structures in the entire database are represented as a magnitude-ordered set of ‘Items’, each of which is a short variable-length composition of one or more arbitrary strongly-typed variable-length binary-encoded ‘components’. An Item can be thought of as a variable-length tuple, but is at base a logical array of 0 to 1665 chars – this internal binary format allows great speed and compression. These ordered Items represent the entire state of the database. All other conceptual upper-level structures are composed of Items with an application-defined meaning. Prefixes of Items are often used to logically nest Items into arbitrary recursive sub-spaces, i.e. sets of suffixes. All basic access to the database uses a temporary ‘Cu’ cursor containing one Item and no other state. The binary encoding of each component in an Item is transparent to the application, which uses only Java primitives indirectly to build and examine Items in a cursor. The internal binary encoding is done by InfinityDB Embedded in a fixed permanent way. Variable-length Items can represent multiple sets of fixed-length tuples, the equivalent of multiple CSV files, or can represent paths to JSON type data. JSON can be parsed into Items and formatted from them. The JSON is not stored literally: the entire database can be accessed at any level of hierarchical detail, because there is no fixed predefined division between keys and JSON documents.
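As a rough illustration of how Items can represent paths to JSON-like data, the sketch below flattens a nested structure of Maps and Lists into one tuple per leaf value, with list positions appearing as integer components in the spirit of the ‘index’ data type. The class JsonPathSketch and its flatten method are hypothetical and are not part of InfinityDB. The real encoding is binary and typed, while this model just uses Java Objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: one tuple (path plus leaf value) per leaf of a nested document.
class JsonPathSketch {
    static List<List<Object>> flatten(Object document) {
        List<List<Object>> out = new ArrayList<>();
        walk(document, new ArrayList<>(), out);
        return out;
    }

    private static void walk(Object node, List<Object> path, List<List<Object>> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                path.add(e.getKey());
                walk(e.getValue(), path, out);
                path.remove(path.size() - 1);
            }
        } else if (node instanceof List) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                path.add(i); // list position becomes an index component
                walk(list.get(i), path, out);
                path.remove(path.size() - 1);
            }
        } else {
            // Leaf: emit the path followed by the value as one tuple.
            List<Object> item = new ArrayList<>(path);
            item.add(node);
            out.add(item);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new TreeMap<>();
        doc.put("name", "alice");
        doc.put("tags", List.of("a", "b"));
        System.out.println(flatten(doc));
        // [[name, alice], [tags, 0, a], [tags, 1, b]]
    }
}
```

Because every leaf carries its full path, any sub-document is simply the set of tuples sharing a prefix, which is why no fixed boundary between keys and documents is needed.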

Flexible Extensible Data Structures with ‘EntityClass’ and ‘Attribute’ Data Types

If the special EntityClass and Attribute data types are mixed in with the other ‘primitive’ data types in the Items, flexible, ‘incrementally self extending’ structures can be represented. See the InfinityDB Client/Server page for a graphical view of some examples of the flexible structures. An initial EntityClass component is normally used to separate data for unlimited independent uses even without the flexible structuring in a single InfinityDB Embedded file. An EntityClass is encoded as binary but contains a string with the format [A-Z][A-Za-z0-9._-]*. An Attribute matches [a-z][a-zA-Z0-9._-]*. When used to represent a flexible tabular structure, keys can be:
  • ‘tuples’, where a tuple is any concatenation of zero or more primitives of any type,
  • heterogeneous – different keys can have different primitive types or tuple types,
  • variadic – different keys can be tuples of different lengths,
  • nestable sparse arrays or lists of unlimited size of any key type, using the ‘index’ data type
Flexible table column values can be the same as keys plus:
  • multi-valued, with no limit on number, and where an absence of any value takes no storage,
  • CharacterLongObjects or BinaryLongObjects of unlimited size.
Furthermore, any such flexible structures can be nested by concatenating their Items onto the ends of other Items. A particular set of suffixes can contain any kind of nested structure. The ‘EntityClass’ and ‘Attribute’ components can represent four patterns depending on their pairings:
  • EntityClass then data then Attribute then data: a ‘table’
  • EntityClass then data then EntityClass then data: a ‘sub-table’
  • Attribute then data then Attribute then data: a ‘sub-attribute’
  • Attribute then data then EntityClass then data: a ‘nested table’
The data immediately following an EntityClass is called an ‘entity’, and the data immediately after an Attribute is a ‘value’. The GUI display of such flexible structures is very rich – see it in action in InfinityDB Server. The displays look like nestable ‘documents’, tables, lists, trees, and so on. The flexible structures are very extensible at runtime, such as by adding nestings, changing the data types or lengths of entities or values, and more.
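The two name formats and the four pairings described above can be checked mechanically. The following sketch uses java.util.regex with the patterns exactly as given in the text. The class MetaNameSketch and its methods are invented for illustration and are not InfinityDB classes.

```java
import java.util.regex.Pattern;

// Validates names against the formats given in the text and classifies pairings.
class MetaNameSketch {
    static final Pattern ENTITY_CLASS = Pattern.compile("[A-Z][A-Za-z0-9._-]*");
    static final Pattern ATTRIBUTE = Pattern.compile("[a-z][a-zA-Z0-9._-]*");

    static boolean isEntityClass(String name) {
        return ENTITY_CLASS.matcher(name).matches();
    }

    static boolean isAttribute(String name) {
        return ATTRIBUTE.matcher(name).matches();
    }

    // Name the pattern formed by two consecutive metadata components.
    static String pairing(String first, String second) {
        boolean firstIsClass = isEntityClass(first);
        boolean secondIsClass = isEntityClass(second);
        if (!(firstIsClass || isAttribute(first)) || !(secondIsClass || isAttribute(second))) {
            throw new IllegalArgumentException("not a valid EntityClass or Attribute name");
        }
        if (firstIsClass) {
            return secondIsClass ? "sub-table" : "table";
        }
        return secondIsClass ? "nested table" : "sub-attribute";
    }

    public static void main(String[] args) {
        System.out.println(pairing("Person", "age"));      // table
        System.out.println(pairing("Person", "Address"));  // sub-table
        System.out.println(pairing("phone", "type"));      // sub-attribute
        System.out.println(pairing("order", "LineItem"));  // nested table
        System.out.println(isAttribute("first_name"));     // false: '_' is not in the pattern
    }
}
```

Note that the first character alone distinguishes the two kinds: upper case for an EntityClass, lower case for an Attribute.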

Forwards and Backwards Schema Compatibility

The ItemSpace model is inherently extensible, but with the flexible ‘EntityClass’ and ‘Attribute’ metadata embedded in the Items, databases become ‘self-describing’ and can be extended in ways that avoid incompatibilities with earlier or later database backups, old or new application versions, or changing or extending data sources and sinks like scripts, IoT devices, or distributed databases.

Virtual View ItemSpaces

InfinityDB Embedded provides many utilities for dynamically viewing one or more underlying ItemSpaces as a virtual ItemSpace. All underlying ItemSpace changes reflect immediately in the virtual view ItemSpace. A view is a true ItemSpace itself:
  • ItemSubspace virtually hides and restricts by a fixed prefix of an ItemSpace;
  • DeltaItemSpace is a mutable view of a fixed underlying ItemSpace with its own commit and rollback;
  • AndSpace views a logically intersected set of underlying ItemSpaces;
  • OrSpace views a logically unioned set of underlying ItemSpaces;
  • RangeItemSpace views a limited range of Items;
  • VolatileItemSpace stores Items in memory non-persistently;
  • IncrementalMergingItemSpace views a special kind of index that can be incrementally built and optimized efficiently at any size while being accessed concurrently. Concurrent deletions are allowed. Text indexing is one use.
Views can be nested. An arbitrarily deep nesting of AndSpace and OrSpace can be flattened automatically for best speed. These capabilities provide a type of instant dynamic query capability without indexes, query compilation, execution, or temporary space usage. The virtual ItemSpaces are light-weight Objects. Any number of views can exist at once. The views can underlie the Map-based wrappers. They work with the flexible data representation using EntityClass and Attribute data types as well.
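One way to picture how an intersection or union view can answer navigation calls without materializing anything is the sketch below, which computes "the next element at or after a given point" directly from two sorted sets. It uses java.util.NavigableSet as a stand-in for an ItemSpace. ViewSketch and its two methods are hypothetical, and the zig-zag search shown is a common general technique, not necessarily what AndSpace or OrSpace do internally.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Live set-logic "views" over sorted sets: nothing is copied or indexed.
class ViewSketch {
    // Smallest element >= from present in BOTH sets, or null (zig-zag search).
    static <T extends Comparable<T>> T ceilingInBoth(
            NavigableSet<T> a, NavigableSet<T> b, T from) {
        T x = a.ceiling(from);
        while (x != null) {
            T y = b.ceiling(x);
            if (y == null) return null;
            if (y.compareTo(x) == 0) return x;
            x = a.ceiling(y); // skip ahead in 'a' to where 'b' is
        }
        return null;
    }

    // Smallest element >= from present in EITHER set, or null.
    static <T extends Comparable<T>> T ceilingInEither(
            NavigableSet<T> a, NavigableSet<T> b, T from) {
        T x = a.ceiling(from);
        T y = b.ceiling(from);
        if (x == null) return y;
        if (y == null) return x;
        return x.compareTo(y) <= 0 ? x : y;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> a = new TreeSet<>(List.of(1, 3, 5, 7));
        NavigableSet<Integer> b = new TreeSet<>(List.of(3, 4, 7));
        System.out.println(ceilingInBoth(a, b, 1));   // 3
        System.out.println(ceilingInBoth(a, b, 4));   // 7
        a.add(4); // a change to an underlying set is visible immediately
        System.out.println(ceilingInBoth(a, b, 4));   // 4
        System.out.println(ceilingInEither(a, b, 6)); // 7
    }
}
```

Since each call reads the underlying sets at the moment it runs, the view needs no storage of its own and always reflects the current data.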

More Information

See the old original Manual for detailed information. See the original Documents from 2002 on the internal structure or the principles for constructing any higher-order data model from the trivial underlying ‘ItemSpace’ data model. For a Free Trial Download see the shop. Here is the InfinityDB Embedded Trial License.

The new InfinityDB Server

Please see the new InfinityDB Server for secure shared remote access to a set of InfinityDB Embedded files. The flexible data representation described above is shown there in graphical form. InfinityDB Server has web-based administration and data browsing and editing, SSL security, REST access via Python or curl, ItemSpace access via local or remote Java, and ‘Pattern Queries’ for NoSQL restructuring, filtering, queries and more. For licensing, email support@boilerbay.com. The InfinityDB Server is available on AWS Marketplace or separately for on-premises use.