ACD Transactions


The Global Transaction Model of InfinityDB 1.0 and 2.0

Both old and new applications need to use the original 'ACD' mode of transactionality first provided in InfinityDB 1.0. New applications will often use the new InfinityDB 2.0 'ACID' level of transactionality, sometimes mixed with the 'ACD' mode of InfinityDB 1.0. There are situations where the old API and behavior are needed, and backwards compatibility has been provided for all cases. The file format is also unchanged in InfinityDB 2.0. Here again are the relevant ACID features, but we discuss them relative to the 1.0 'ACD' capability. Note that in InfinityDB 2.0.0 the old 'ACD' and the new 'ACID' modes do not conflict and can be used simultaneously.

Atomicity means that the commit operation either completes with all changes within the transaction being applied to the db or with none being applied. InfinityDB versions prior to 1.0.48 are atomic except that modifications made by other Threads during the execution of commit() itself are not guaranteed to make it into the db on that commit cycle. All updates made before commit() is invoked are guaranteed to be saved atomically, and updates made after commit() returns are not guaranteed to be saved. A new setOverlappingCommits(boolean) setting makes commits completely atomic and also speeds commits by allowing multiple Threads to be inside the commit() method at the same time rather than serializing the commits. The performance improvement of overlapping commits is roughly proportional to the number of Threads waiting inside the commit() invocation. There is a global InfinityDB.rollback() that can be invoked at any time; however, the cache is emptied as a side effect, so it is not fast. It throws away all updates that happened after the most recent commit() returned, but it waits for any overlapping commits to finish. Any number of changes can be rolled back. This rollback is not friendly to the new ACID transactionality of InfinityDB 2.0, however.
Consistency means that the database changes from one consistent state directly to another. This is actually a property provided mostly by the application, since only the application can determine whether the db is in a state corresponding to its invariants (except for referential integrity constraints, which are sometimes provided by a DBMS but which are often disabled due to performance limitations).
Isolation means that multiple overlapping transactions must not interfere with each other by reading and/or writing data in conflicting ways. For the global transactionality model of InfinityDB 1.0 (which is preserved in 2.0) this isolation must be either unimportant or client-provided. The new InfinityDB 2.0 introduces locking, which allows multiple Threads to be automatically isolated.
Durability means that when a transaction finishes committing, the changes it has made to the db must never be lost. In most DBMS, this is handled by providing 'redo logs', which are files that constantly accumulate transaction data. InfinityDB uses a single-file model, and does not use logs at all, simplifying application installation, configuration, operation, space management, and more. A special on-disk file update protocol guarantees data retention, eliminating the usual 'recovery' process normally needed after hard shut-down. An InfinityDB database cannot lose internal consistency or lose committed data, barring media failure. There is a special high-speed commit mode that sacrifices durability protection for a short time window. It is invoked as commit(boolean isWaitingForDurable); see No-Wait-for-Durable Commit.
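
The global commit and rollback behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class GlobalTxModel and its methods are hypothetical and in-memory, and they are not the InfinityDB API. It shows only that a commit captures every update made so far by any Thread, and that a rollback discards every update made since the most recent commit.

```java
import java.util.TreeSet;

// Toy in-memory model of a global commit/rollback. Hypothetical; NOT the InfinityDB API.
final class GlobalTxModel {
    private TreeSet<String> committed = new TreeSet<>(); // state as of the last commit
    private TreeSet<String> working = new TreeSet<>();   // state shared by all Threads

    synchronized void insert(String item) { working.add(item); }

    synchronized void delete(String item) { working.remove(item); }

    synchronized boolean contains(String item) { return working.contains(item); }

    // Atomically make every update since the last commit part of the saved state.
    synchronized void commit() { committed = new TreeSet<>(working); }

    // Discard every update made since the last commit, regardless of which Thread made it.
    synchronized void rollback() { working = new TreeSet<>(committed); }

    public static void main(String[] args) {
        GlobalTxModel db = new GlobalTxModel();
        db.insert("a");
        db.commit();
        db.insert("b");   // not yet committed
        db.rollback();    // throws away "b", keeps "a"
        System.out.println(db.contains("a") + " " + db.contains("b")); // prints "true false"
    }
}
```

Because there is one shared working state, the model offers no isolation between Threads, which matches the 'ACD' description: isolation has to be unimportant or provided by the client.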

Commit Speed

The speed of transaction processing applications in a DBMS is primarily limited by the speed of the I/O system writing one or more blocks on disk at the end of the redo log and then doing a file sync() on the log to flush data permanently to disk. When multiple Threads are available, all trying to commit at once, it is possible to write these final blocks containing transaction data for multiple transactions all at the same time, hence the upper limit on the commit rate becomes ThreadsInCommit / (BlockWriteTime + SyncTime).
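
As a rough illustration of the rate limit above, the following sketch evaluates ThreadsInCommit / (BlockWriteTime + SyncTime) for a few Thread counts. The class name and the timing values are assumptions chosen only for illustration; they are not measurements from this document.

```java
// Evaluates the commit-rate bound for a shared write + sync cycle.
final class CommitRateModel {
    // Upper bound on commits/sec when all waiting Threads share one block write and one sync.
    static double maxCommitsPerSec(int threadsInCommit, double blockWriteSec, double syncSec) {
        return threadsInCommit / (blockWriteSec + syncSec);
    }

    public static void main(String[] args) {
        double blockWriteSec = 0.010; // assumed block write time, illustrative only
        double syncSec = 0.015;       // assumed sync time, illustrative only
        for (int threads : new int[] {1, 10, 100}) {
            System.out.printf("%d Threads in commit: up to %.0f commits/sec%n",
                    threads, maxCommitsPerSec(threads, blockWriteSec, syncSec));
        }
    }
}
```

The bound grows linearly with the number of Threads waiting in commit, which is why shared syncs matter more than the raw sync time once many Threads commit together.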

InfinityDB performance is not far from the maximum measured commit speed. The InfinityDB commit protocol involves flushing and syncing all dirty blocks from the memory block cache to disk, followed by writing and syncing a special block in the file header to lock in all changes. The maximum measured speed for a file sync operation in one test on a modern system in 2006 was about 55/sec, which implies that two random I/O's are being performed: this is in agreement with the requirement that a directory entry must also be updated to reflect the update time.

The InfinityDB double-sync reduces the maximum performance achievable; however, the limitation will only be a bottleneck in certain situations. For example, if transactions do more than a few operations, such as updating multiple inversions and therefore doing more than a few random disk block I/O's (which run about 110/sec) then the commit speed limit is not so noticeable. A large number of Threads, a large number of cache misses between commits, or a large number of update operations between commits will mitigate the expense of the double-sync. This table shows the worst-case limits for a 2.5GHz X86 when in overlapping-commit mode (ItemSpace.setOverlappingCommits(true)):

Threads    Operations/commit    Operations/sec
1          >>1K                 approaches 130K
1K         >>1K                 approaches 130K

As can be seen, the worst case occurs with one Thread and one operation per commit, where we see 13 commits/sec, substantially below the 55/sec tested outer limit. The best-case is, however, very fast. Note the very positive effect of multi-Threading, and note the very high throughput with more operations per commit.

Comparing this performance to a log-based system seems to favor the log. However, arguing that appending a log record is fast begs the question, as it is still necessary to write the updated data itself back to disk, so there is still a penalty of two writes. Furthermore, the log has to be flushed, and since it is a separate file, its directory entry (actually its inode) may need to be updated as well to reflect the lengthening. In InfinityDB, the directory metadata is ignored and not necessarily updated. For a 'write ahead log', there will be two sync's in a fixed order.

No-Wait-for-Durable Commit

It is possible to request a commit without requiring durability immediately on return from commit(). With this feature, Threads can continue quickly while being given only the assurance that data will become durable soon. The feature is used simply by invoking commit(boolean isWaitingForDurable). When commit does occur, it is still atomic and durable. The actual delay between invocation of commit() and durability - the latency - is normally a maximum of a few seconds, but usually much less than a second. Even with only one Thread, no-sync performance is very high.

A mixture of no-wait and normal commits can be used, but no-wait commits can overwhelm normal commits if there is an extreme imbalance favoring the no-wait commits, in which case the interval between sync's can stretch to the order of several seconds. This only occurs when there are many Threads and only a few are doing normal commits. There is still 'sharing' of the sync's between committers, so the normal commit throughput is not actually as low as the above might imply. The effect will show up as an increased latency for normal committers in heavily loaded, highly Threaded environments. One way to fix this is to throttle the no-wait committers by using Thread.sleep(), for example.
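
One possible shape for the Thread.sleep() throttle mentioned above is sketched here. The Committer interface is a hypothetical stand-in for the database handle; only the commit(boolean isWaitingForDurable) signature comes from the text, and passing false to request a no-wait commit is an assumption based on the parameter name. The minimum interval is an arbitrary tuning value.

```java
// Spaces out no-wait commits so that they cannot crowd out normal (waiting) commits.
final class ThrottledNoWaitCommitter {
    // Hypothetical stand-in for the database; not the actual InfinityDB type.
    interface Committer {
        void commit(boolean isWaitingForDurable);
    }

    private final Committer db;
    private final long minIntervalMillis;
    private long lastCommitMillis; // 0 until the first commit

    ThrottledNoWaitCommitter(Committer db, long minIntervalMillis) {
        this.db = db;
        this.minIntervalMillis = minIntervalMillis;
    }

    // Issues a no-wait commit, sleeping first if the previous one was too recent.
    synchronized void commitNoWait() {
        long wait = lastCommitMillis + minIntervalMillis - System.currentTimeMillis();
        if (wait > 0) {
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
        }
        db.commit(false); // assumed: false means do not wait for durability
        lastCommitMillis = System.currentTimeMillis();
    }

    public static void main(String[] args) {
        ThrottledNoWaitCommitter throttled = new ThrottledNoWaitCommitter(
                waiting -> System.out.println("commit(" + waiting + ")"), 50);
        for (int i = 0; i < 3; i++) {
            throttled.commitNoWait();
        }
    }
}
```

Sharing one throttle instance among all no-wait Threads limits their combined commit rate; normal committers would call commit(true) directly and bypass the throttle.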

When waiting commits are far apart, experiments have shown that the disk block writes actually become more efficient, and throughput is increased. This may be due to the disk and driver using an 'elevator' algorithm to reduce head motion, or to writing multiple blocks within one spindle rotation.


In this section, we discuss the performance limitations of the basic insert() and delete() update operations other than as related to commit speed. If commits are very rare or no-wait commits are used, then the throughput becomes the primary performance consideration. The basic in-cache operation performance of InfinityDB on JDK 1.6 is approximately:

Operation    Ops/sec at 2.5GHz    Ops/sec/GHz

VolatileItemSpace is several times faster, but data is not durable (no file is used.)

InfinityDB was designed to maximize throughput by optimizing in-cache operations as well as disk I/O. Blocks that are updated or read repeatedly while in the memory cache do not incur I/O on each update; instead the updates are batched by an I/O Thread, and blocks are written to disk in background as space is needed in the cache. Each block contains only Items in sort order, and Items with common prefixes are generally kept together in one block as well. Threads that require block I/O do not interfere with Threads that need data already in memory.

Blocks in InfinityDB are about 10KB in memory with 75% utilization (25% free space inside the block), and since Items are often about 30 bytes (very roughly, as Items can be structured in many ways) there should be nominally 250 Items per block. However, prefix compression may double the Items per block, while using CharacterLongObjects and BinaryLongObjects may reduce it to only a few. (Note that blocks on disk are much smaller than 10KB due to compression during block write.) The in-cache insert speed on a 2.5 GHz CPU is about 200K operations/sec. Thus blocks with 250 Items can be created by repeated invocations of ItemSpace.insert() at a rate of about 800 per second (200K / 250). This far exceeds the random-access write rate of blocks to disk (about 150/sec), so continuous random insert() or delete() operations are limited by disk I/O rather than by CPU performance.
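
The arithmetic in this paragraph can be checked with a few lines. The inputs are the approximate figures quoted above (10KB taken as 10,000 bytes, 75% utilization, 30-byte Items, 200K inserts/sec, 150 random block writes/sec); the class and method names are made up for this sketch.

```java
// Back-of-envelope check of Items per block and in-cache block fill rate.
final class BlockFillEstimate {
    // Items that fit in the used portion of a block.
    static long itemsPerBlock(int blockBytes, double utilization, int itemBytes) {
        return Math.round(blockBytes * utilization / itemBytes);
    }

    // Blocks filled per second by in-cache inserts alone.
    static double blocksPerSec(double insertsPerSec, long itemsPerBlock) {
        return insertsPerSec / itemsPerBlock;
    }

    public static void main(String[] args) {
        long items = itemsPerBlock(10_000, 0.75, 30);
        double cpuBlocksPerSec = blocksPerSec(200_000, items);
        double diskBlocksPerSec = 150; // random-access block write rate quoted in the text
        System.out.println("Items per block: " + items);
        System.out.println("Blocks/sec fillable by CPU: " + cpuBlocksPerSec);
        System.out.println("Disk-bound: " + (cpuBlocksPerSec > diskBlocksPerSec));
    }
}
```

The conclusion holds for any inputs that keep the CPU fill rate well above the disk write rate, so modest changes in Item size or insert speed do not alter it.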

The above calculations have implications for the choice of data model. The Entity-Attribute-Value data model can be considered 'update()-intensive'. In EAV, a single 'record' is stored as a set of Items having a common prefix like <ENTITY_CLASS, entity> rather than, for example, as a single Item containing all of the record's fields concatenated (see Record Retrieval). Given the update speed of InfinityDB, it is not possible for the modification, creation or deletion of one or more EAV 'records' or any other structure inside a single block to be a bottleneck when the sets of operations are associated with random-access block I/O.


Copyright © 1997-2006 Boiler Bay.