InfinityDB

We created the fastest NoSQL embedded Java database engine. Here are its features:

  • Nestable Multi-value (can represent trees, graphs, K/V, documents, huge sparse arrays, tables)
  • Strong Typing (avoids the text/binary trap; also supports CLOBs, BLOBs, and short byte and char arrays)
  • Runtime schema evolution (for forwards/backwards compatibility)
  • 1M ops/sec in memory, multi-threaded (8 cores)
  • Multi-core Concurrency – a true, scalable performance boost
  • Extreme Compression – ZLib, UTF-8, variable data, variable blocks, shared prefixes
  • Transactions – optimistic ACID for threads, or global ACD for bulk operations
  • Self administration (one file, no DBA, no upgrade scripts, no logs, no configuration)
  • Instant recovery – a unique disk write pattern prevents corruption or data loss
  • Simple API – instant developer productivity
  • Dynamic views of data for queries – set logic views, delta views, ranges

Reliability and Safety

InfinityDB uses a rugged internal storage update protocol, both for persistence on demand and for spilling the cache to disk when there is a large amount of data. The protocol maintains system-wide data integrity and survives abrupt application termination, file system bugs, and kernel panics. The single data file remains up-to-date, safe, correct, and usable through any event. There is no log-based recovery, so restart and recovery are immediate in all cases. (The one exception is power failure, a fundamental weakness of any software, which is explained in the Operating Guidelines.) No unexpected Exceptions are ever thrown, not even because of deadlock or internal resource limits; the only Exceptions to expect are those thrown on conflict by the optional optimistic locking. No dangerous off-heap storage or native libraries are used.

Single-File Design

InfinityDB is designed to use a single file. This feature, combined with instant recovery, helps make the product administrator-free: no logs need to be archived or re-applied. There are no configuration files, temporary files, or text logs. No junk files are left behind after any kind of termination, so there is never any cleanup.

Fast Multi-Core Design

Our product was already incredibly fast, but we redesigned it to use all cores at the same time, each operating safely on a different thread. InfinityDB now runs at 1 million ops per second on 8 cores, with good scaling. You can take advantage of this speed immediately on a server, or by using multiple threads in your application. Core counts keep rising, and applications are adding more and more threads.

Without the multi-core technology supplied by InfinityDB to avoid inter-thread interference, bottlenecks called ‘convoys’ can occur when threads contend for data. Performance can drop dramatically, even far below single-thread speed.

We are here to assist you in multi-threaded programming. Write to us at support@boilerbay.com

Mixed Relational and Application-Specific Data Models

InfinityDB provides a rich data representation space for structured, semi-structured, or unstructured data. The application uses the simple basic data model to define and represent any mixture of trees, graphs, key/value maps, documents, text indexes, huge sparse arrays, tables with unlimited columns and unlimited values per row and column, nested multi-maps, inverted Entity-Attribute-Value triples, or creative custom structures.

Nested Map-based Access

There are two APIs, both simple: the versatile, fast, low-level proprietary ‘ItemSpace’ API, and a nested Map view.

The nested Map view is a wrapper around the ItemSpace. It implements (and extends) java.util.concurrent.ConcurrentNavigableMap, thereby providing the capabilities of a ConcurrentHashMap or ConcurrentSkipListMap. InfinityDBMaps may contain other InfinityDBMaps, or InfinityDBSets, which are standard concurrent Sets. An InfinityDBMap is a light-weight Object that can be constructed dynamically without itself being persisted: only the Map mutator methods store data in the ItemSpace.
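To give a feel for code written in the nested-Map style, here is a minimal sketch in which the JDK’s own ConcurrentSkipListMap stands in for InfinityDBMap. It does not use InfinityDB and persists nothing; the class name NestedMapSketch and the helper inner() are hypothetical. It only shows that code written against the ConcurrentNavigableMap interface looks the same whichever implementation sits underneath.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch only: ConcurrentSkipListMap stands in for InfinityDBMap, so nothing
// here is persisted. Class and method names are hypothetical.
public class NestedMapSketch {

    // Returns the inner map stored under 'key', creating it on first use.
    static ConcurrentNavigableMap<String, Long> inner(
            ConcurrentNavigableMap<String, ConcurrentNavigableMap<String, Long>> outer,
            String key) {
        return outer.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>());
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, ConcurrentNavigableMap<String, Long>> customers =
                new ConcurrentSkipListMap<>();

        inner(customers, "alice").put("orders", 3L);
        inner(customers, "alice").put("visits", 12L);
        inner(customers, "bob").put("orders", 1L);

        // Navigation methods are available at every level of nesting.
        System.out.println(customers.firstKey());                // alice
        System.out.println(customers.get("alice").headMap("p")); // {orders=3}
    }
}
```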

Direct No-SQL ‘ItemSpace’ API

For ultimate speed and extreme flexibility, our trivial lower-level ‘ItemSpace’ API gives you access to the same data as the Map-based view. Only a few storage and retrieval operations act on the NoSQL InfinityDB ItemSpace. You gain low-level access by momentarily allocating a cursor and then using it to insert, delete, update, locate, or scan data in the ItemSpace, which is nothing but an ordered set of variable-length ‘Items’. An Item can be thought of as an encoded tuple.

The database is defined entirely by the set of Items it contains – there is no other state. The Items are all kept and accessed sorted in the database, and can be accessed in sequence with prefix matching. All other structure is higher-level and is defined dynamically by the application via a few simple access operations. There is also a wide set of helper utilities.
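The idea of a database that is nothing but a sorted set of tuples can be sketched with plain JDK collections. In this hypothetical model (not InfinityDB’s API or binary encoding), an Item is a List of Long and String elements, Items sort element by element with a proper prefix sorting first, and a prefix scan starts at the prefix and stops at the first Item that does not match. Ordering Long before String across types is an arbitrary assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch only: an "ItemSpace" modelled as a sorted set of tuples ("Items").
// Not InfinityDB's API or encoding. Elements must be Long or String, and
// ordering Long before String is an arbitrary assumption.
public class ItemSpaceSketch {

    private static int typeRank(Object o) {
        return (o instanceof Long) ? 0 : 1;
    }

    // Element-by-element order; a proper prefix sorts before its extensions.
    static final Comparator<List<Object>> ITEM_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            Object x = a.get(i), y = b.get(i);
            int c = Integer.compare(typeRank(x), typeRank(y));
            if (c == 0) {
                c = (x instanceof Long)
                        ? ((Long) x).compareTo((Long) y)
                        : ((String) x).compareTo((String) y);
            }
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    private final NavigableSet<List<Object>> items = new ConcurrentSkipListSet<>(ITEM_ORDER);

    void insert(Object... item) {
        items.add(Arrays.asList(item));
    }

    // Returns every Item that starts with 'prefix', in sorted order. Matching
    // Items are contiguous, so the scan stops at the first non-match.
    List<List<Object>> scan(Object... prefix) {
        List<Object> p = Arrays.asList(prefix);
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> item : items.tailSet(p, true)) {
            if (item.size() < p.size() || !item.subList(0, p.size()).equals(p)) break;
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        ItemSpaceSketch db = new ItemSpaceSketch();
        db.insert("Person", 2L, "name", "Bob");
        db.insert("Person", 1L, "name", "Alice");
        db.insert("Person", 1L, "age", 30L);
        db.insert("Order", 7L, "total", 99L);

        // prints [[Person, 1, age, 30], [Person, 1, name, Alice]]
        System.out.println(db.scan("Person", 1L));
    }
}
```

Because all Items sharing a prefix are contiguous in sort order, the scan touches only the matching range, which is what makes prefix-based nesting cheap.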

Appropriate for Small to Large Installations

You can use InfinityDB as an In-Memory-Only DBMS keeping all data in the cache, or let it grow smoothly to hundreds of GB with no code changes. Access to data in the memory cache is fully multi-core, while infrequently used data is paged to disk transparently.

Data is operated on efficiently with fine granularity for small or large data structures regardless of memory capacity. Fine granularity accesses transition smoothly to coarse block-oriented granularity when and where needed.

Extensibility and Forwards and Backwards Compatibility

No application-defined data structure has a physical or practical size limit; each can expand efficiently from zero to any size. Empty data structures take no space, so adding a structure requires no reorganization: every data structure effectively already exists virtually, with zero size.

Applications that anticipate extensions can often provide some forwards compatibility, and can provide vital full backwards compatibility. No upgrade or downgrade scripts are needed – in fact there are no scripts at all, only runtime-created structures.

For example, when data is structured and viewed as tables, a new table or column can be added at any time, because all column values are nullable and come into existence on the first use, taking no space until then. There is no physical or practical limit on number of tables, columns per table, or the number of values per column. A single-valued column that is to be converted to multi-value is already in the proper format, and more values can simply be inserted along with it. Column values can also become aggregate structures at runtime.
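A table can be laid out as (table, row, column, value) tuples in one sorted set. The minimal sketch below is hypothetical (string-only tuples, made-up names, plain JDK collections rather than InfinityDB), but it shows why a new column costs nothing until it is used, why an absent value reads as null, and why a column becomes multi-valued just by inserting a second value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch only: a table stored as (table, row, column, value) tuples in one
// sorted set. String-only tuples and all names are hypothetical.
public class TableAsItemsSketch {

    // Element-by-element order, shorter tuple first on a tie.
    static final Comparator<List<String>> ORDER = (a, b) -> {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    private final NavigableSet<List<String>> items = new ConcurrentSkipListSet<>(ORDER);

    void put(String table, String row, String column, String value) {
        items.add(List.of(table, row, column, value));
    }

    // All values of one column in one row; an empty list plays the role of null.
    List<String> get(String table, String row, String column) {
        List<String> prefix = List.of(table, row, column);
        List<String> out = new ArrayList<>();
        for (List<String> item : items.tailSet(prefix, true)) {
            if (!item.subList(0, 3).equals(prefix)) break;
            out.add(item.get(3));
        }
        return out;
    }

    public static void main(String[] args) {
        TableAsItemsSketch db = new TableAsItemsSketch();
        db.put("Customer", "c1", "name", "Alice");
        // A new column needs no schema change: the first value creates it.
        db.put("Customer", "c1", "email", "alice@example.com");
        // The column becomes multi-valued simply by inserting another value.
        db.put("Customer", "c1", "email", "a@work.example");

        System.out.println(db.get("Customer", "c1", "email")); // [a@work.example, alice@example.com]
        System.out.println(db.get("Customer", "c1", "phone")); // []
    }
}
```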

Continuous Space Reclamation

Space allocation for individual and aggregated data is fully dynamic: no space is used until structures are created or after they are deleted. During growing or shrinking, structure storage is always minimal and efficient. The single data file is 100% efficient with compressed data on initial loading, and stays at least 50% efficient in the worst case after very large global transactions, which may include any amount of data. Normally, free space is about 10%. The file never shrinks. Applications can run forever without gradual space loss. There are no temporary peaks in space usage, or temporary external files. There is no need for occasional reorganization or packing, and there is no garbage collector thread. Freed space is recycled immediately.

High Compression on Disk and In Memory

InfinityDB’s continuous, dynamic ZLib and UTF-8 data compression packs data into variable-length blocks, avoiding almost all of the space that would normally be lost to internal fragmentation. I/O bandwidth is reduced accordingly. Variable-length binary-encoded primitives, variable-length concatenations of primitives (‘Items’), and prefix and branch-cell suffix compression are used both on disk and in the memory cache. Compression keeps the branching factor high for fast access, and makes better use of the OS file cache.
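Shared-prefix compression of sorted keys can be illustrated with generic front coding: each key is stored as the number of leading characters it shares with the previous key, plus the remaining suffix. This is a textbook sketch for illustration only, not InfinityDB’s block format, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of generic "front coding" for sorted keys. Illustrative only; this
// is not InfinityDB's block format. Requires Java 16+ for records.
public class PrefixCompressionSketch {

    // One compressed key: length of prefix shared with the previous key,
    // followed by the characters that differ.
    record Entry(int shared, String suffix) {}

    static List<Entry> compress(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int n = 0;
            int max = Math.min(prev.length(), key.length());
            while (n < max && prev.charAt(n) == key.charAt(n)) n++;
            out.add(new Entry(n, key.substring(n)));
            prev = key;
        }
        return out;
    }

    static List<String> decompress(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            prev = prev.substring(0, e.shared()) + e.suffix();
            out.add(prev);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "customer/1001/name", "customer/1001/phone", "customer/1002/name");
        List<Entry> packed = compress(keys);
        for (Entry e : packed) System.out.println(e.shared() + " | " + e.suffix());
        // 0 | customer/1001/name
        // 14 | phone
        // 12 | 2/name
        System.out.println(decompress(packed).equals(keys)); // true
    }
}
```

Because Items are kept sorted, neighbours tend to share long prefixes (the same table, row, or column), which is where this kind of compression pays off.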

Transactionality

Two kinds of transactions are available:
  • Global. This persists all current changes to disk, providing Atomic, Consistent, and Durable semantics. It does not use any kind of lock, so it does not provide inter-thread Isolation; however, all threads can keep accessing the database concurrently during the commit. Effectively, a single ‘global transaction’ is in effect at all times. Optimistic Locking commits also cause global commits.
  • Optimistic. Fine-grained multi-thread transactions use optimistic locking and support full ACID (atomic, consistent, isolated, and durable) semantics. Locks do not follow the usual rules of other DBMSs, but are equivalent in capability to table locks, row locks, index locks, or even single-column-value locks and single-set-element locks. This diversity of lock types is not actually a complex spectrum of details: it follows trivially from the basic data model and is automatic and almost invisible to the programmer. If desired, the programmer can easily control the lock order for maximum concurrency simply by accessing appropriate data early in the transaction. The locks are simply set on prefixes of tuples, i.e. prefixes of Items, and are maintained transparently. The set of locked prefixes is kept in memory globally for each database, and is also associated with each thread. Lock conflicts throw an OptimisticLockConflictException, which application code can optionally catch and retry. Concurrent optimistic transactions can reach hundreds of commits per second on disk, and thousands per second on flash.
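The idea that locks are just Item prefixes can be sketched as a tiny optimistic validator. Everything here is a hypothetical model, not InfinityDB’s implementation: a transaction records the prefixes it touches (reads and writes are not distinguished, for brevity); commit fails if any of them overlaps a prefix committed after the transaction began, where two prefixes overlap when one is a prefix of the other; and ConflictException merely stands in for OptimisticLockConflictException. The runWithRetry method shows the usual application-side retry loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: optimistic concurrency where every "lock" is an Item prefix.
// All names are hypothetical; ConflictException stands in for InfinityDB's
// OptimisticLockConflictException. Reads and writes are not distinguished,
// and the commit log is never pruned, to keep the model small.
public class PrefixLockSketch {

    static class ConflictException extends RuntimeException {}

    private record Committed(long commitNo, List<String> prefix) {}

    // Two prefixes conflict when one of them is a prefix of the other.
    static boolean overlaps(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        return a.subList(0, n).equals(b.subList(0, n));
    }

    private final List<Committed> log = new ArrayList<>();
    private long lastCommitNo = 0;

    class Txn {
        final long startedAfter;
        final List<List<String>> touched = new ArrayList<>();

        Txn(long startedAfter) { this.startedAfter = startedAfter; }

        // Records that this transaction reads or writes under 'prefix'.
        void touch(String... prefix) { touched.add(List.of(prefix)); }
    }

    synchronized Txn begin() {
        return new Txn(lastCommitNo);
    }

    // Fails if another transaction committed an overlapping prefix after this
    // one began; otherwise records this transaction's prefixes.
    synchronized void commit(Txn t) {
        for (Committed c : log) {
            if (c.commitNo() <= t.startedAfter) continue;
            for (List<String> p : t.touched) {
                if (overlaps(p, c.prefix())) throw new ConflictException();
            }
        }
        lastCommitNo++;
        for (List<String> p : t.touched) log.add(new Committed(lastCommitNo, p));
    }

    // The usual application-side loop: rerun the body after a conflict.
    void runWithRetry(Consumer<Txn> body, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Txn t = begin();
            body.accept(t);
            try {
                commit(t);
                return;
            } catch (ConflictException e) {
                if (attempt >= maxAttempts) throw e;
            }
        }
    }

    public static void main(String[] args) {
        PrefixLockSketch db = new PrefixLockSketch();
        Txn a = db.begin();
        Txn b = db.begin();
        a.touch("Customer", "c1"); // behaves like a row lock
        b.touch("Customer");       // behaves like a table lock
        db.commit(a);
        try {
            db.commit(b);
        } catch (ConflictException e) {
            System.out.println("b conflicts with a; the application may retry b");
        }
    }
}
```

In this model a short prefix such as ("Customer") acts like a table lock and a longer one such as ("Customer", "c1") like a row lock, with no separate lock types.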

Sensible Data Representation

Data is not stored as formatted text or as custom raw binary, but as an intermediate form, with standard pre-defined binary encodings of the individual Java primitives in a consistent way that allows extremely high speed combined with clarity of representation. Applications do not need to invent binary encodings or convert primitives to binary or text.

InfinityDB natively supports all common primitive Java data types, and more:

  • long (stored as compressed bits so byte, short, and int take no more space)
  • float
  • double
  • boolean
  • String (stored as UTF-8)
  • Date/time
  • short byte and char arrays (sort by length first)
  • short byte strings (sort like strings but with bytes instead of 2-byte chars)
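To illustrate how a single long type can make small values cheap, here is one generic order-preserving variable-length encoding. It is an assumption for illustration only; InfinityDB’s actual byte format is not described in this document. A header byte records the sign and the number of payload bytes, so small magnitudes take one or two bytes, any int fits in at most five, and comparing encodings byte by byte (unsigned) gives the same order as comparing the numbers, which a sorted store needs.

```java
import java.util.Arrays;

// Sketch of one order-preserving variable-length encoding for long values.
// A generic illustration only, NOT InfinityDB's actual byte format.
// Layout: a header byte, then n payload bytes (the value's low n bytes,
// most significant first):
//   v <  0: n = bytes needed for ~v, header = 8 - n  (headers 0..8)
//   v >= 0: n = bytes needed for v,  header = 9 + n  (headers 9..17)
public class VarLongSketch {

    // Bytes needed to hold a non-negative value; zero needs none.
    static int significantBytes(long nonNegative) {
        return (64 - Long.numberOfLeadingZeros(nonNegative) + 7) / 8;
    }

    static byte[] encode(long v) {
        int n = significantBytes(v < 0 ? ~v : v);
        byte[] out = new byte[1 + n];
        out[0] = (byte) (v < 0 ? 8 - n : 9 + n);
        for (int i = 0; i < n; i++) {
            out[n - i] = (byte) (v >>> (8 * i)); // lowest byte is written last
        }
        return out;
    }

    static long decode(byte[] b) {
        int header = b[0];
        boolean negative = header <= 8;
        int n = negative ? 8 - header : header - 9;
        long v = negative ? -1L : 0L; // bytes not stored are all ones or all zeros
        for (int i = 1; i <= n; i++) {
            v = (v << 8) | (b[i] & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -70_000L, -1L, 0L, 5L, 300L, Long.MAX_VALUE};
        for (long v : samples) {
            byte[] e = encode(v);
            System.out.println(v + " -> " + e.length + " bytes, decodes to " + decode(e));
        }
        // Unsigned byte-wise order matches numeric order: prints true
        System.out.println(Arrays.compareUnsigned(encode(-1L), encode(0L)) < 0);
    }
}
```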

When used to represent a relational structure, InfinityDB column values can be either:

  • sets of an arbitrary mixture of:
    • composites, i.e. heterogeneous concatenations of one or more primitive data types, or
    • values of any type identified by a heterogeneous composite of primitive data types
  • sparse arrays of unlimited size of any other value type, or
  • CharacterLongObjects or BinaryLongObjects of unlimited size.

In such an extended relational structure, keys can be limited-length, composite, and heterogeneous. Heterogeneous means having mixed but identifiable primitive types.

All structures in the entire database are represented as an ordered set of ‘Items’ which are each a short limited-length composition of one or more arbitrary strongly typed binary-encoded primitives. An Item can be thought of as a tuple. These ordered Items represent the entire state of the database. All other conceptual upper-level structures are composed of Items with an application-defined meaning. Prefixes of Items are used to logically nest Items into arbitrary recursive sub-spaces. All basic access to the database uses a single cursor containing one Item and no other state. The binary encoding is transparent to the application, which uses only Java primitives to build and examine Items in a cursor.
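Because a cursor holds one Item and no other state, moving to the next Item can be expressed as "find the smallest Item greater than this one". The hypothetical sketch below models that with ConcurrentSkipListSet.higher() over String Items. It is not InfinityDB’s cursor class, but it shows why such a cursor keeps working when other threads insert or delete Items between steps, even when its own Item is deleted.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch only: a cursor whose whole state is its current Item (a String here).
// Stepping means "smallest Item strictly greater than mine", so iteration
// survives concurrent inserts and deletes, even deletion of the current Item.
// Names are hypothetical, not InfinityDB's cursor API.
public class StatelessCursorSketch {

    static final class Cursor {
        String item; // null means "before the first Item"

        // Moves to the next Item; returns false when there is none.
        boolean next(ConcurrentSkipListSet<String> space) {
            String n = (item == null) ? space.ceiling("") : space.higher(item);
            if (n == null) return false;
            item = n;
            return true;
        }
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<String> space = new ConcurrentSkipListSet<>();
        space.add("apple");
        space.add("cherry");

        Cursor c = new Cursor();
        c.next(space);          // cursor now holds "apple"
        space.add("banana");    // another thread could do this between steps
        space.remove("apple");  // even the cursor's own Item may disappear
        while (c.next(space)) {
            System.out.println(c.item); // prints banana, then cherry
        }
    }
}
```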

Avoid the Text/Binary Trap

Other NoSQL databases store either text or custom binary, and both are ‘traps’.

JSON or XML representations require slow formatting and parsing, target only document granularity within practical size limits, cannot sort by key or value, use only strings as keys, cannot compose keys, cannot natively or efficiently represent binary or character streams (‘LOBs’), especially long ones, cannot have multi-values, and can be space-inefficient. Each random access requires constructing an entire JavaScript object or XML DOM. They do provide extensibility, however.

Java Object serialization is also targeted at chunk-at-a-time access within practical size limits, and does not provide key-based or other access to the chunks or to their internal structures. Programmers must carefully architect the storage structure, and that structure becomes bound to the class structure instead of the data semantics. Serialization has extensibility, versioning, security, upgrade, Object integrity, documentation, reliability, coding complexity, space efficiency, and other issues. Similar problems afflict POJO persistence.

Object/Relational mapping has the familiar, classic ‘impedance mismatch’ problem. The systems are complex and high maintenance. Database structure is determined by both the class structure and the relation structure, which must be versioned in sync and require both upgrade scripts and class code rewrites. Hence runtime extensibility is impossible. Objects end up with either embedded dynamic SQL or else heavy mapping frameworks to separate out the SQL.

Virtual View ItemSpaces

InfinityDB provides many utilities for dynamically viewing one or more underlying ItemSpaces as a virtual ItemSpace. All changes to the underlying ItemSpaces are reflected immediately in the virtual view. A view is a true ItemSpace itself:

  • SubSpace restricts an ItemSpace to the Items under a fixed prefix and virtually hides that prefix;
  • DeltaItemSpace is a mutable view of a fixed ItemSpace with commit/rollback;
  • AndSpace views a logically intersected set of underlying ItemSpaces;
  • OrSpace views a logically unioned set of underlying ItemSpaces;
  • RangeItemSpace views a limited range of Items;
  • VolatileItemSpace stores Items in memory non-persistently;
  • IncrementalMergingItemSpace views a special kind of index that can be incrementally built and optimized efficiently at any size while being accessed concurrently. Concurrent deletions are allowed. Text indexing is one use.

Views can be nested. A nesting of set operation views can be flattened automatically for best speed. These capabilities provide a type of instant dynamic query capability without indexes, query compilation, execution, or temporary space usage. The virtual ItemSpaces are light-weight Objects. Any number of views can exist at once. The views can use the Map-based wrappers.
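Two of these views can be sketched lazily over JDK sorted sets: a prefix-restricted view that strips the prefix (in the spirit of SubSpace) and a sorted, duplicate-free union (in the spirit of OrSpace). The names, the ‘/’-separated String Items, and the use of Character.MAX_VALUE as an upper bound are all assumptions of this sketch rather than InfinityDB’s API. Neither view copies data: the first wraps a live subSet(), and the second keeps only the last Item it returned and the next one it found.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch only: two lazy views over sorted sets of String Items, loosely in
// the spirit of SubSpace and OrSpace. Names are hypothetical; Items are
// '/'-separated Strings assumed never to contain Character.MAX_VALUE.
public class ViewSketch {

    // Items under 'prefix' with the prefix stripped. subSet() is a live view,
    // so later changes to 'base' show through.
    static Iterable<String> subSpace(NavigableSet<String> base, String prefix) {
        NavigableSet<String> range =
                base.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
        return () -> range.stream().map(s -> s.substring(prefix.length())).iterator();
    }

    // Sorted union without duplicates, computed one Item at a time.
    static Iterator<String> orSpace(NavigableSet<String> a, NavigableSet<String> b) {
        return new Iterator<String>() {
            private String last = null;        // last Item returned
            private String pending = advance(); // next Item to return, or null

            private String advance() {
                String x = (last == null) ? a.ceiling("") : a.higher(last);
                String y = (last == null) ? b.ceiling("") : b.higher(last);
                if (x == null) return y;
                if (y == null) return x;
                return x.compareTo(y) <= 0 ? x : y;
            }

            @Override public boolean hasNext() { return pending != null; }

            @Override public String next() {
                if (pending == null) throw new NoSuchElementException();
                last = pending;
                pending = advance();
                return last;
            }
        };
    }

    public static void main(String[] args) {
        NavigableSet<String> live = new ConcurrentSkipListSet<>(
                List.of("Person/1/name/Alice", "Person/2/name/Bob", "Order/7/total/99"));
        NavigableSet<String> archive = new ConcurrentSkipListSet<>(
                List.of("Person/2/name/Bob", "Person/3/name/Cy"));

        Iterable<String> persons = subSpace(live, "Person/");
        live.add("Person/4/name/Dee"); // visible through the existing view
        // prints 1/name/Alice, 2/name/Bob, 4/name/Dee (one per line)
        persons.forEach(System.out::println);

        // Sorted union of live and archived Items, with Bob listed once.
        orSpace(live, archive).forEachRemaining(System.out::println);
    }
}
```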

More Information

See the Manual for detailed information. See Documents on the internal structure, or on the principles for constructing any higher-order data model from the trivial underlying ‘ItemSpace’ data model. For a Free Trial Download, see the shop.

For licensing, email support@boilerbay.com.

 
