commit() is invoked are guaranteed to be saved atomically, and all updates made after commit() returns are not guaranteed to be saved. A new method, setOverlappingCommits(boolean), makes commits completely atomic and also speeds them up by allowing multiple Threads to be inside the commit() method at the same time rather than serializing the commits. The performance improvement of overlapping commits is roughly proportional to the number of Threads waiting inside the commit() invocation. There is a global InfinityDB.rollback() that can be invoked at any time; however, the cache is emptied as a side effect, so it is not fast. It throws away all updates that happened after the most recent commit() returned, but it waits for any overlapping commits to finish. Any amount of change can be rolled back. This rollback does not combine well with the new ACID transactionality of InfinityDB 2.0, however.
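The throughput gain from overlapping commits comes from batching: one physical sync can make the pending updates of every waiting Thread durable at once. The following deterministic sketch models only that batching arithmetic; it is not InfinityDB's implementation, and the class and method names are made up for illustration:

```java
// Sketch of why overlapping commits scale with Thread count: every Thread
// waiting inside commit() is covered by the same physical sync, so the
// number of syncs needed falls as the number of concurrent committers rises.
public class OverlappingCommitModel {

    // One sync makes all currently waiting commits durable at once.
    static long syncsNeeded(int threads, int commitsPerThread) {
        long outstanding = (long) threads * commitsPerThread;
        long syncs = 0;
        while (outstanding > 0) {
            long batch = Math.min(threads, outstanding); // all waiters share one sync
            outstanding -= batch;
            syncs++;
        }
        return syncs;
    }

    public static void main(String[] args) {
        // Serialized commits: one sync per commit.
        System.out.println(syncsNeeded(1, 100));  // prints 100 (100 commits)
        // Ten overlapping committers: ten commits per sync.
        System.out.println(syncsNeeded(10, 100)); // prints 100 (1000 commits)
    }
}
```

With ten committers the same number of syncs covers ten times the commits, which is the roughly-proportional improvement described above.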
InfinityDB commit performance is not far from the maximum measured file-sync speed. The InfinityDB commit protocol involves flushing and syncing all dirty blocks from the memory block cache to disk, followed by writing and syncing a special block in the file header to lock in all changes. The maximum measured speed for a file sync operation in one test on a modern system in 2006 was about 55/sec, which implies that two random I/O's are being performed per sync: this is in agreement with the requirement that a directory entry must also be updated to reflect the update time.
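The two-I/O inference can be checked directly from the figures in this document: random disk block I/O runs at about 110/sec, so a measured sync rate of about 55/sec implies roughly two random I/O's per sync. A trivial check:

```java
// Arithmetic behind the two-I/O inference: a random disk I/O rate of about
// 110/sec and a measured sync rate of about 55/sec imply ~2 I/O's per sync.
public class SyncCost {
    static long ioPerSync(long randomIoPerSec, long syncsPerSec) {
        return Math.round((double) randomIoPerSec / syncsPerSec);
    }

    public static void main(String[] args) {
        System.out.println(ioPerSync(110, 55)); // prints 2
    }
}
```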
The InfinityDB double-sync reduces the maximum performance achievable;
however, the limitation will only be a bottleneck in
certain situations. For example, if transactions do more
than a few operations, such as updating multiple inversions
and therefore doing more than a few random disk block I/O's
(which run about 110/sec) then the commit speed limit is not
so noticeable. A large number of Threads, a large number of
cache misses between commits, or a large number of
update operations between commits will mitigate the expense
of the double-sync. This table shows the worst-case limits
for a 2.5GHz X86 when in overlapping-commit mode.
As can be seen, the worst case occurs with one Thread and one operation per commit, where we see 13 commits/sec, substantially below the tested limit of 55/sec. The best case is, however, very fast. Note the very positive effect of multi-Threading, and note the very high throughput with more operations per commit.
Comparing this performance to a log-based system seems to favor the log. However, arguing that appending a log record is fast begs the question, as it is still necessary to write the updated data itself back to disk, so there is still a penalty of two writes. Furthermore, the log has to be flushed, and since it is a separate file, its directory entry (actually its inode) may need to be updated as well to reflect the lengthening. In InfinityDB, the directory metadata is ignored and not necessarily updated. For a 'write-ahead log', there will be two sync's in a fixed order.
commit(). With this feature, Threads can continue quickly while being given only the assurance that data will become durable soon. The feature is used simply by invoking commit(boolean isWaitingForDurable). When the commit does occur, it is still atomic and durable. The actual delay between invocation of commit() and durability - the latency - is normally a few seconds at most, but usually much less than a second. Even with only one Thread, no-sync performance is very high.
A mixture of no-wait and normal commits can be used, but no-wait commits can overwhelm normal commits if there is an extreme imbalance favoring the no-wait commits, in which case the interval between sync's can grow to the order of several seconds. This only occurs when there are many Threads and only a few are doing normal commits. There is still 'sharing' of the sync's between committers, so the normal commit throughput is not actually as low as the above might imply. The effect shows up as increased latency for normal committers in heavily loaded, highly Threaded environments. One way to fix this is to throttle the no-wait committers by using Thread.sleep(), for example.
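The suggested Thread.sleep() throttle can be implemented as a fixed-rate pacing loop around the no-wait committer. Below is a generic sketch: the commit itself is passed in as a Runnable (in InfinityDB terms it would invoke the no-wait form of commit()), and the maxPerSecond cap is an assumed tuning parameter, not an InfinityDB setting:

```java
// Sketch of throttling no-wait committers with Thread.sleep() so they
// cannot starve the Threads doing normal, waiting commits.
public class CommitThrottle {
    static void runThrottled(Runnable noWaitCommit, int maxPerSecond, int count)
            throws InterruptedException {
        long sleepMs = 1000L / maxPerSecond;
        for (int i = 0; i < count; i++) {
            noWaitCommit.run();      // e.g. a commit that does not wait for the sync
            Thread.sleep(sleepMs);   // cap the no-wait commit rate
        }
    }

    public static void main(String[] args) throws InterruptedException {
        java.util.concurrent.atomic.AtomicInteger commits =
                new java.util.concurrent.atomic.AtomicInteger();
        runThrottled(commits::incrementAndGet, 100, 10); // ~10ms apart
        System.out.println(commits.get()); // prints 10
    }
}
```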
When waiting commits are far apart, experiments have shown that the disk block writes actually become more efficient, and throughput is increased. This may be due to the disk and driver using an 'elevator' algorithm to reduce head motion, or to writing multiple blocks within one spindle rotation.
delete() update operations other than as related to commit speed. If commits are very rare or no-wait commits are used, then throughput becomes the primary performance consideration. The basic in-cache operation performance of InfinityDB on JDK 1.6 is approximately:
| Operation | Ops/sec at 2.5GHz | Ops/sec/GHz |
VolatileItemSpace is several times faster, but data is not durable (no file is used).
InfinityDB was designed to maximize throughput by optimizing in-cache operations as well as disk I/O. Blocks that are updated or read repeatedly while in the memory cache do not incur I/O on each update; instead the updates are batched by an I/O Thread, and blocks are written to disk in the background as space is needed in the cache. Each block contains only Items in sort order, and Items with common prefixes are generally kept together in one block as well. Threads that require block I/O do not interfere with Threads that need data already in memory.
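The benefit of batching repeated updates in the cache can be seen in a minimal write-back model: many updates to the same block dirty it once, and a later flush (standing in for the background I/O Thread) writes it a single time. This is a generic sketch, not InfinityDB's cache:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Minimal write-back cache model: updates only mark blocks dirty, and a
// flush (standing in for the background I/O Thread) writes each dirty
// block once, however many times it was updated in memory.
public class WriteBackModel {
    final Map<Integer, byte[]> cache = new HashMap<>();
    final Set<Integer> dirty = new LinkedHashSet<>();
    long diskWrites = 0;

    void update(int blockId, byte[] data) {
        cache.put(blockId, data);  // in-memory only; no I/O here
        dirty.add(blockId);        // repeated updates coalesce into one entry
    }

    void flush() {                 // one write per dirty block
        diskWrites += dirty.size();
        dirty.clear();
    }

    public static void main(String[] args) {
        WriteBackModel m = new WriteBackModel();
        for (int i = 0; i < 100; i++) m.update(7, new byte[] { (byte) i });
        m.flush();
        System.out.println(m.diskWrites); // prints 1: 100 updates, one write
    }
}
```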
Blocks in InfinityDB are about 10KB in memory with
75% utilization (25% free space inside the block),
and since Items are often about 30 bytes (very roughly, as
Items can be structured in many ways) there should be
nominally 250 Items per block. However, prefix compression
may double the Items per block, while using
CharacterLongObjects and BinaryLongObjects
may reduce it to only a few. (Note that blocks
on disk are much smaller than 10KB due to compression during block write.)
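The block-capacity arithmetic above is simple division: 10KB at 75% utilization divided by a 30-byte Item gives the nominal 250, roughly doubling with prefix compression. A check of those figures (10KB taken as 10,000 bytes, matching the nominal 250):

```java
// Items-per-block arithmetic from the figures above: 10KB blocks at 75%
// utilization and ~30-byte Items give ~250 Items nominally, and prefix
// compression can roughly double that.
public class BlockCapacity {
    static long itemsPerBlock(int blockBytes, double utilization, int itemBytes) {
        return Math.round(blockBytes * utilization / itemBytes);
    }

    public static void main(String[] args) {
        long nominal = itemsPerBlock(10_000, 0.75, 30);
        System.out.println(nominal);     // prints 250
        System.out.println(nominal * 2); // ~500 with prefix compression
    }
}
```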
The in-cache insert speed on a 2.5 GHz CPU is about 200K operations/sec.
Thus blocks with 250 Items can be created by repeated invocations of
ItemSpace.insert() at a rate of about
800 per second, since 200,000/250 = 800. This far exceeds the random-access write rate
of blocks to disk (about 150/sec), so it is not
possible for continuous random insert() or
delete() operations to be limited by
the in-cache operation speed rather than by disk I/O.
The above calculations have implications for the
choice of data model. The
Entity-Attribute-Value data model can be used without creating a performance bottleneck.
In EAV, a single 'record' is stored as a set of Items having
a common prefix like <ENTITY_CLASS, entity>
rather than, for example, as a single Item containing
all of the record's fields concatenated
(see Record Retrieval).
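The EAV layout can be sketched with any ordered set: each field is a separate entry sharing the <ENTITY_CLASS, entity> prefix, so a whole 'record' is retrieved as one contiguous range scan. Here Items are modeled as plain strings in a TreeSet; InfinityDB's real Item encoding is binary and typed, and the "Customer" class and its fields are made-up examples:

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// EAV sketch: one 'record' is a set of sorted entries sharing the prefix
// <ENTITY_CLASS, entity>, modeled as strings in a TreeSet. Reading the
// record is a contiguous range scan over that prefix.
public class EavSketch {
    public static void main(String[] args) {
        NavigableSet<String> items = new TreeSet<>();
        items.add("Customer 42 city Paris");
        items.add("Customer 42 name Alice");
        items.add("Customer 43 name Bob"); // a different entity

        // All Items of entity 42 are adjacent in sort order:
        SortedSet<String> record = items.subSet("Customer 42 ", "Customer 42~");
        System.out.println(record.size()); // prints 2
    }
}
```

Because the Items of one entity cluster in sort order, they usually land in the same block, which is what makes whole-record operations cheap.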
Given the update speed of InfinityDB, the modification, creation, or deletion of one or more EAV 'records' - or of any other structure inside a single block - cannot be the bottleneck when the sets of operations involve random-access block I/O: the disk, not the in-cache work, sets the limit.