commit() is invoked are saved
atomically, and all updates after commit()
returns are not saved.
A new setOverlappingCommits(boolean) method
makes commits completely atomic and also speeds up
commits by allowing multiple Threads to be inside the
commit() method at the same time rather than
serializing the commits. The performance improvement of
overlapping commits is roughly proportional to the
number of Threads waiting inside the commit() invocation.
A new beta feature also allows 'best-effort' or 'no-sync'
commits for even greater speed, still preserving atomicity.
There is a global InfinityDB.rollback() that
can be invoked at any time; however,
the cache is emptied as a side effect, so it is not
fast. It throws away all updates that happened after
the most recent commit() returned, but it
waits for any overlapping commits to finish. Any amount
of changes can be rolled back.
InfinityDB performance is not far from the maximum measured commit speed. The InfinityDB commit protocol involves flushing and syncing all dirty blocks from the memory block cache to disk, followed by writing and syncing a special block in the file header to lock in all changes. The maximum measured speed for a file sync operation in one test on a modern system in 2006 was about 55/sec, which implies that two random I/O's are being performed: this is in agreement with the requirement that a directory entry must also be updated to reflect the update time.
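The two-step protocol described above (sync the dirty blocks, then write and sync a header block) can be illustrated with plain `java.nio`. This is only a minimal sketch under stated assumptions, not InfinityDB's actual implementation: the class name, the 4096-byte block size, the use of block 0 as the header, and the commit counter stored in it are all hypothetical choices made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative double-sync commit; a generic sketch, not InfinityDB code. */
final class DoubleSyncCommitSketch {
    static final int BLOCK_SIZE = 4096;   // hypothetical block size for this sketch

    /** Writes all dirty blocks, syncs, then writes the header block and syncs again. */
    static void commit(FileChannel file, Map<Integer, byte[]> dirtyBlocks, long commitNumber)
            throws IOException {
        // Phase 1: flush every dirty block from the (simulated) cache to its slot.
        for (Map.Entry<Integer, byte[]> e : dirtyBlocks.entrySet()) {
            ByteBuffer buf = ByteBuffer.wrap(e.getValue());
            long pos = (long) e.getKey() * BLOCK_SIZE;
            while (buf.hasRemaining()) {
                pos += file.write(buf, pos);
            }
        }
        file.force(true);                 // first sync: the data blocks are on disk

        // Phase 2: write the header block (block 0 here) that locks in the changes.
        ByteBuffer header = ByteBuffer.allocate(BLOCK_SIZE);
        header.putLong(0, commitNumber);  // absolute put; buffer position stays at 0
        long pos = 0;
        while (header.hasRemaining()) {
            pos += file.write(header, pos);
        }
        file.force(true);                 // second sync: the commit itself is on disk
        dirtyBlocks.clear();
    }

    /** Reads the commit counter back from the header block. */
    static long readCommitNumber(FileChannel file) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(Long.BYTES);
        long pos = 0;
        while (header.hasRemaining()) {
            int n = file.read(header, pos);
            if (n < 0) throw new IOException("header truncated");
            pos += n;
        }
        return header.getLong(0);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("double-sync", ".db");
        try (FileChannel file = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            Map<Integer, byte[]> dirty = new TreeMap<>();
            dirty.put(1, new byte[BLOCK_SIZE]);
            dirty.put(2, new byte[BLOCK_SIZE]);
            commit(file, dirty, 1L);
            System.out.println("committed #" + readCommitNumber(file));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Each call to `commit` costs two `force` calls, which is why a commit can never run faster than half the measured sync rate in a single-Threaded, one-commit-at-a-time setting.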
The InfinityDB double-sync reduces the maximum performance achievable;
however, the limitation will only be a bottleneck in
certain situations. For example, if transactions do more
than a few operations, such as updating multiple inversions
and therefore doing more than a few random disk block I/O's
(which run about 110/sec) then the commit speed limit is not
so noticeable. A large number of Threads, a large number of
cache misses between commits, or a large number of
update operations between commits will mitigate the expense
of the double-sync. This table shows the worst-case limits
for a 2.5GHz X86 when in overlapping-commit mode (ItemSpace.setOverlappingCommits(true)):
| Threads | Ops/Commit | Commits/sec | Ops/sec |
|---|---|---|---|
| 1 | 1 | 13 | 13 |
| 1K | 1 | 834 | 834 |
| 1 | 1K | 2.9 | 2900 |
| 1K | 1K | 50 | 50K |
| 1 | >>1K | | approaches 130K |
| 1K | >>1K | | approaches 130K |
As can be seen, the worst case occurs with one Thread and one operation per commit, where we see 13 commits/sec, substantially below the 55/sec tested outer limit. However, the 55/sec tested limit was obtained with only one block being written, and only at the end of a file rather than at random positions. Any change to this test setup, such as writing multiple blocks, reduced performance considerably. Note the very positive effect of multi-Threading, and note the high throughput with more operations per commit.
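The Ops/sec column in the table is simply Commits/sec multiplied by Ops/Commit. The small sketch below recomputes it from the measured rows; the class and method names are hypothetical, and "1K" is assumed to mean 1000.

```java
/** Recomputes the Ops/sec column of the commit table; names are illustrative only. */
final class CommitThroughput {
    /** Operations per second = commits per second x operations per commit. */
    static long opsPerSecond(double commitsPerSecond, long opsPerCommit) {
        return Math.round(commitsPerSecond * opsPerCommit);
    }

    public static void main(String[] args) {
        // Measured rows from the table: {threads, ops/commit, commits/sec}.
        double[][] rows = {
            {1, 1, 13}, {1000, 1, 834}, {1, 1000, 2.9}, {1000, 1000, 50}
        };
        for (double[] r : rows) {
            System.out.printf("threads=%d ops/commit=%d -> %d ops/sec%n",
                    (long) r[0], (long) r[1], opsPerSecond(r[2], (long) r[1]));
        }
    }
}
```

Going from 1 to 1000 operations per commit with one Thread lowers the commit rate from 13 to 2.9 per second but raises throughput from 13 to 2900 operations per second, which is the batching effect the paragraph above describes.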
commit(). With this feature,
Threads can continue quickly while being given only a
'best-effort' assurance that data will become durable.
In this mode, commit performance is almost removed as a
performance consideration. The feature is used simply by
invoking commit(boolean isWaitingForSync).
When commit does occur, it is still atomic and durable.
The actual delay between invocation of commit()
and durability (the latency) is at most a few
seconds under normal conditions, and usually much less than a second. Even
with only one Thread, no-sync performance is very high.
A mixture of no-sync and normal commits can be used, but no-sync commits can overwhelm normal commits if there is an extreme imbalance favoring the no-sync commits, in which case the interval between syncs can stretch to the order of several seconds. This only occurs when there are many Threads and only a few are doing normal commits. There is still 'sharing' of the syncs between committers, so the normal commit throughput is not actually as low as the above might imply. The effect will show up as an increased latency for normal committers in heavily loaded, highly Threaded environments. One way to fix this is to throttle the no-sync committers by using Thread.sleep(), for example.
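The Thread.sleep() throttling idea can be sketched as follows. The `Committable` interface is a hypothetical stand-in that models only the commit(boolean isWaitingForSync) call named in the text; it is not the real InfinityDB type, and passing `false` to request a no-sync commit is an assumption based on the parameter name. The pause length is something to tune for the workload.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Throttles a no-sync committer with Thread.sleep(); illustrative names only. */
final class ThrottledNoSyncCommitter {
    /** Hypothetical stand-in for the database handle; only commit is modeled. */
    interface Committable {
        void commit(boolean isWaitingForSync) throws Exception;
    }

    /**
     * Runs the given work the requested number of times, issuing a no-sync
     * commit after each run and then pausing so normal committers get a turn.
     */
    static void runThrottled(Committable db, Runnable work, int batches, long pauseMillis)
            throws Exception {
        for (int i = 0; i < batches; i++) {
            work.run();
            db.commit(false);            // assumed: false requests a no-sync commit
            Thread.sleep(pauseMillis);   // throttle before the next batch
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger commits = new AtomicInteger();
        Committable fake = waitForSync -> commits.incrementAndGet(); // counting fake
        runThrottled(fake, () -> { }, 5, 10);
        System.out.println("no-sync commits issued: " + commits.get());
    }
}
```

A fixed sleep is the simplest form of throttling; it trades some no-sync throughput for lower latency on the Threads that wait for a sync.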
When syncs are far apart, experiments have shown that the disk block writes actually become more efficient, and throughput is increased. This may be due to the disk and driver using an 'elevator' algorithm to reduce head motion, or to writing multiple blocks within one spindle rotation.
insert() and delete()
update operations other than as related to commit speed. If
commits are very rare or no-sync commits are used,
then the throughput becomes the primary performance
consideration. The basic in-cache operation
performance of InfinityDB is approximately:
| Operation | Ops/sec at 2.5GHz | Ops/sec/GHz |
|---|---|---|
| retrievals | 250K | 100K |
| updates | 130K | 52K |
VolatileItemSpace is several times faster, but its data is not durable because no file is used. Run com.infinitydb.examples.InfinityDBPerformanceTest for more precise numbers on your system.
InfinityDB was designed to maximize throughput by optimizing in-cache operations as well as disk I/O. Blocks that are updated or read repeatedly while in the memory cache do not incur I/O on each update; instead the updates are batched by an I/O Thread, and blocks are written to disk in background as space is needed in the cache. Each block contains only Items in sort order, and Items with common prefixes are generally kept together in one block as well. Threads that require block I/O do not interfere with Threads that need data already in memory.
Blocks in InfinityDB are about 10KB in memory with
75% utilization (25% free space inside the block),
and since Items are often about 30 bytes (very roughly, as
Items can be structured in many ways) there should be
nominally 250 Items per block. However, prefix compression
may double the Items per block, while using
CharacterLongObjects and BinaryLongObjects
may reduce it to only a few. (Note that blocks
on disk are much smaller than 10KB due to compression during block write.)
The in-cache insert speed on a 2.5 GHz CPU is about 130K operations/sec.
Thus blocks with 250 Items can be created by repeated invocations of
ItemSpace.insert() at a rate of about
520 per second. This far exceeds the random-access write rate
of blocks to disk (about 110/sec), so it is not
possible for continuous random insert()
or delete() operations to be limited by
CPU performance.
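The arithmetic behind these figures can be checked with a few lines of code. This is only a worked calculation using the nominal numbers quoted above; the class and method names are hypothetical, and 10KB is taken as 10,000 bytes so that the result matches the nominal 250 Items per block.

```java
/** Worked calculation of block fill rate versus disk write rate; names are illustrative. */
final class BlockFillRate {
    /** Items that fit in the used part of a block. */
    static long itemsPerBlock(long blockBytes, double utilization, long itemBytes) {
        return Math.round(blockBytes * utilization / itemBytes);
    }

    /** Blocks per second that in-cache inserts can fill. */
    static double blocksPerSecond(double insertsPerSecond, long itemsPerBlock) {
        return insertsPerSecond / itemsPerBlock;
    }

    public static void main(String[] args) {
        long items = itemsPerBlock(10_000, 0.75, 30);         // 10KB block, 75% full, 30-byte Items
        double cpuBlocks = blocksPerSecond(130_000, items);   // 130K in-cache inserts/sec
        double diskBlocks = 110;                              // random block writes/sec
        System.out.println("Items per block: " + items);
        System.out.println("Blocks filled per second by the CPU: " + cpuBlocks);
        System.out.println("CPU-to-disk ratio: " + cpuBlocks / diskBlocks);
    }
}
```

With these inputs the CPU can fill 520 blocks per second, between four and five times the roughly 110 random block writes per second that the disk sustains, so the disk is the limiting resource.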
The above calculations have implications for the
choice of data model. The
Entity-Attribute-Value data model can be
considered 'update()-intensive'.
In EAV, a single 'record' is stored as a set of Items having
a common prefix like <ENTITY_CLASS, entity>
rather than, for example, as a single Item containing
all of the record's fields concatenated
(see Record Retrieval).
Given the update speed of InfinityDB, it is not possible for
the modification, creation or deletion of one
or more EAV 'records' or any other structure inside
a single block to be a bottleneck
when the sets of operations are associated with
random-access block I/O.
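To make the EAV layout concrete, the sketch below models Items as sorted tuples in a standard `TreeSet`, so that all Items of one 'record' share a prefix and sit next to each other in sort order. It does not use the InfinityDB API: the class name, the comparator, and the <ENTITY_CLASS, entity, attribute, value> component layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Models EAV Items as sorted tuples; a generic illustration, not InfinityDB code. */
final class EavSketch {
    /** Orders Items component by component, so Items sharing a prefix sort together. */
    static final Comparator<List<String>> ITEM_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    /** Returns every Item that starts with the given prefix, in sort order. */
    static List<List<String>> withPrefix(TreeSet<List<String>> space, List<String> prefix) {
        List<List<String>> result = new ArrayList<>();
        // Items with the prefix form one contiguous run starting at the prefix itself.
        for (List<String> item : space.tailSet(prefix, true)) {
            if (item.size() < prefix.size()
                    || !item.subList(0, prefix.size()).equals(prefix)) {
                break;
            }
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<List<String>> space = new TreeSet<>(ITEM_ORDER);
        // Each attribute of a 'record' is its own Item under <ENTITY_CLASS, entity>.
        space.add(List.of("Person", "alice", "email", "alice@example.com"));
        space.add(List.of("Person", "bob", "email", "bob@example.com"));
        space.add(List.of("Person", "alice", "city", "Santa Cruz"));
        for (List<String> item : withPrefix(space, List.of("Person", "alice"))) {
            System.out.println(item);
        }
    }
}
```

Because one record's Items are adjacent in sort order, updating several attributes of the same entity usually touches a single block, which is why the per-Item update cost is small next to the block I/O.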