The ItemSpace Data Model

InfinityDB is based on an ‘ItemSpace’ data model, with a unique minimal API that is elegantly simple and which is the basis for a wide range of ‘superstructures’. Complete applications of diverse types are often built on top of the ItemSpace alone. Here we discuss the model as briefly as possible. Also see the basic ItemSpace example code.

The Definition of ‘ItemSpace’

An ItemSpace represents a sorted set of ‘Items’, each a Item being a fixed-length sequence of 0 to 1665 chars. There is no other state. ItemSpace is an abstract class descending from the abstract ItemOutput (early versions combine these). The initial chars are most significant, and shorter Items come first. Above this level, the char sequence in the Items has a particular ‘component’ structure described below, but in many situations the meaning of the chars is ignored. For example, InfinityDB exposes its contents as an ItemSpace.

There are four essential abstract mutation methods in ItemOutput:

  • insert(insertedItem) – add an Item
  • delete(deletedItem) – remove an Item
  • deleteSubspace(prefixItem, boolean isDeletePrefix) – remove all Items starting with prefixItem, and optionally the prefixItem itself.
  • update(insertedItem, int prefixLength, boolean isDeletePrefix) is an atomic:
    • deleteSubspace(insertedItemUpToPrefixLength, isDeletePrefix), followed by
    • insert(insertedItem).

There are four essential abstract retrieval methods in ItemSpace. Each returns the Item in the ItemSpace nearest to a given Item. They return true iff a matching Item was found. A matching Item has the same chars as the given searchItem up to the given prefix length.

  • first(searchItem, int prefixLength) – get the nearestItem >= searchItem
  • next(searchItem, int prefixLength) – get the nearestItem > searchItem
  • last(searchItem, int prefixLength) – get the nearestItem <= searchItem
  • previous(searchItem, int prefixLength) – get the nearestItem < searchItem

Many convenience methods are provided by ItemOutput and ItemSpace based on the above. In many ItemSpaces, these operations are thread-safe, and often highly concurrent. For speed, batches of Items can be read with nextItems(item, int prefixLength, byte[] buf, int off, int len) and written with writeItems(byte[] buf, int off, int len).  The ItemSpace convenience methods may be overridden for speed or other reasons.

ItemSpace Implementations

ItemSpace implementations can ‘wrap’ or delegate to others, or provide special functionality. Some provide ‘views’ of others according to various rules. Many problems can be solved by simply composing ItemSpaces of various kinds. Another route is custom subclassing of ItemSpace, often in order to wrap instances of one or more other ItemSpaces.

These ItemSpaces are provided with InfinityDB:

  • InfinityDB database exposes its data as an ItemSpace, adding transactionality and much more
  • AndItemSpace views the logical intersection of multiple given ItemSpaces
  • OrSpace views the logical union of multiple given ItemSpaces.
  • AndOrItemSpace is an optimized nested AndItemSpace and OrItemSpace with no nesting limit
  • IndirectItemSpace delegates to a given ItemSpace
  • ItemSubspace – views the suffixes of a given Item of a given ItemSpace
  • RangeItemSpace – views a given ItemSpace over a fixed range of Items
  • VolatileItemSpace – is a simple in-memory ItemSpace
  • DeltaItemSpace – views a given static ItemSpace as if it were mutable, optionally applying mutations later
  • RemoteClientItemSpace – is a transparently remote ItemSpace using the ‘ItemPacket’ protocol
  • TransactionalItemSpace – handles transactions for InfinityDB
  • BufferedItemSpace – speeds mostly-sequential operations by batching, such as for remote access
  • RangeBufferedItemSpace – speeds mostly-local retrievals by batching, such as for remote access
  • ReadOnlyItemSpace – throws Exception on mutations
  • EmptyItemSpace – always contains nothing

The provided ItemOutputs are:

  • DiskBasedMergeSorterItemOutput – does time-efficient sorting of large datasets
  • ItemSpaceBasedMergeSorterItemOutput – does space-efficient sorting of large datasets
  • OutputStreamItemOutput – writes mutations to an OutputStream in an ‘ItemPacket’ format

There is also the independent ‘ItemInput’ class:

  • InputStreamItemInput – reads Items or batches of Items from an InputStream using an ‘ItemPacket’ format
  • ItemSpaceScannerItemInput – produces Items in sequence from a given ItemSpace under a given prefix

An example of composition is a fast dynamic boolean-logic query view of any underlying wrapped ItemSpace. This requires only the appropriate compositions of the provided AndSpace and OrSpaces. The AndSpace uses an efficient ‘zig-zag’ algorithm, and it handles ‘negative’ inputs. Deep nestings can optimize and flatten themselves. The query needs no execution phase, but just ‘views’ the underlying ItemSpace.

The Component Structure

When it is necessary to add semantics to an ItemSpace for applications, the Items are formatted into ‘components’. Each component represents a single data element in one of 12 types. Each component contains a type code, followed by its value in a variable-length format, and components are packed in sequence at the front of the Item. The format of components is permanently fixed, and has never changed. Applications do not need to know the format. The components are extracted by scanning from the front of the Item, or they can be appended to an Item. The types are:

  • Long  – stores all integral values, with smaller numbers taking less space
  • String – stored with terminating 0 and quoted internal 0’s.
  • Float
  • Double
  • Boolean (requires only one char – a ‘true’ and a ‘false’ type code)
  • Date – a millisecond time
  • char[] – a short array which must fit in an Item, with its length at the front
  • byte[] – a short array which must fit in an Item, with its length at the front
  • ByteString – like a string but binary for sorting purposes, with no initial length char
  • Index – a special kind of long that is used for ‘vertical’ array-like structures
  • EntityClass – a ‘non-primitive’ ‘metadata’ component that describes the semantics
  • Attribute – a ‘non-primitive’ ‘metadata’ component that describes the semantics

Multi-Item Structures

Sets of Items are often used to represent arbitrarily large structures. Any such structure can be nested inside any other set of Items by prepending a constant prefix to identify it. The most common are:

  • BLOB’s or CLOB’s: ‘Binary Long OBject’ or ‘Character Long OBject’ are structured as an Index component determining a particular block, followed by a block in a byte[] or char[] component of up to 1024 elements. The utility classes BinaryLongObjectOutputStream, BinaryLongObjectInputStream, CharacterLongObjectOutputStream and CharacterLongObjectInputStream provide this.
  • Texts: an Index component designating a line number, followed by a String
  • Huge Sparse Arrays: an Index component indicating the position of the following components in a ‘vertical’ logical array, with arbitrary gaps
  • Tables: the components following a given prefix can be thought of as a ‘tuple’. The prefix components are the table id and key.
  • Trees: each Item is a path to a leaf, with a variable number of components

Item Representations

Sets of Items can be formatted, stored and communicated in various ways:

  • In InfinityDB: Items are prefix-compressed, packed, UTF-8 and ZLib-compressed, in variable-length blocks
  • As JSON: any set of Items can be formatted and parsed as JSON under a prefix Item. There are extensions to JSON to handle all of the 12 data types, plus an ‘underscore-quoted’ format to adapt to standard JSON syntax. Index components designate JSON arrays.
  • As ‘ItemPackets’:  a fast, checksummed binary stream of Items and mutation operations for communication, sorting, or external file storage
  • Tokenized: each component type has a default text format based on JSON or Java
  • As URL-quoted tokens: for RESTful access via Python, curl, or browser. See this Picture (user ‘testUser’ and password  ‘db’) or this JSON
  • In a Cu ‘cursor’ in a private char[] with a private current length.

The Cu Class

A Cu ‘cursor’ instance holds one Item, for temporary manipulation, and it has no other state. The contained Item can be appended-to, parsed, truncated, and more. A Cu is allocated very quickly with Cu.alloc() and returned to the pool by Cu.dispose(), often for only a few operations. The ItemSpace methods pass Items in and out via Cu references. A Cu can be thought of as a char[] of length 1665 with a current length.

Appending to a Cu

Various objects can be appended with Cu.append(…):

  • null, which does nothing
  • A primitive like long, int, short, byte, double, float, boolean
  • A primitive wrapper like Long, Double, Float, Boolean, String, or Date
  • A short array like byte[] or char[], which becomes a corresponding component of array type
  • An array of Objects, which is appended by appending its elements according to these rules
  • A CuAppendable implementer, which provides a custom appendTo(Cu)
  • A Cu, either completely or after an offset
  • Instances of ByteString, Index, EntityClass, or Attribute
  • char, which is special – it appends a single ‘raw’ char without a type code (it is not a component)

Or, for special cases: appendXXX(…)

  • appendTime(long milliseconds), to append a Date
  • appendIndex(long index) to append an Index
  • appendDirect(…) for special purposes.

Parsing a Cu

Given an offset into a Cu, components can be extracted or skipped:

  • Components of known expected primitive type with Cu.typeAt(int off), or  Cu.skipType(int off).
  • Components of known expected primitive Wrapper type with Cu.TypeAt(int off)
  • A generic component with Cu.componentAt(int off), or Cu.skipComponent(int off).
  • An Object[], with Cu.compositeAt(int off), or Cu.skipComposite(int off) (does not include EntityClasses or Attributes)
  • An Object[], with Cu.toComponentArray(int off) any components (includes EntityClass and Attribute)

EntityClass and Attribute Components

The optional EntityClass and Attribute ‘non-primitive’ components provide an elegant way to create very rich ‘flexible’ structures as explained in InfinityDB Client/Server Java NoSQL Database. There, you will find a description and graphical views of the structures as can be seen in the server back-end database browser. These two special components allow schema metadata to be embedded in the Item rather than maintained externally. This allows a schema to be effectively extended at the moment an Item with a new structure enters or becomes visible in an ItemSpace, or likewise to have the schema effectively restored when an Item leaves or becomes invisible. An EntityClass or Attribute has a special type code, followed by an ID long (in newer versions, the ID may also be a String, with a letter followed by letters, digits, and ‘morse code’, meaning dot, dash, and underscore. EntityClasses start with upper case, Attributes start with lower-case.)

For more on the ItemSpace, see the Basic ItemSpace Example Code, see the Manual and Download a Trial. Email support@boilerbay.com.

The Map Model Wrappers

All ItemSpace operations can be indirectly invoked via an optional Map model, based on an extended java.util.concurrent.ConcurrentNavigableMap. Just construct an InfinityDBMap or InfinityDBSet around any ItemSpace. There are many extensions. See InfinityDBMap Access and InfinityDBMap Example code.