Here are the meanings of these features:
Atomic: A transaction either commits entirely at a single 'moment' or not at all. Often, each transaction commit will be considered to have moved the system forwards by one 'SCN' or 'System Change Number' or similar. InfinityDB does not use version numbering.
Consistent: The data that a transaction commits is self-consistent, not partial or corrupted. This is largely the application's own issue.
Isolated: Transactions can execute concurrently but not interfere with each other (except for delays or retries caused by lock conflicts or deadlocks). Transactions can begin and optionally commit or discard changes they make in any order and in any possibly overlapping pattern. There can be no violation of read/write or write/write ordering dependencies between the transactions. Other transactions never see the 'dirty' data or work of another transaction before the other transaction commits.
Durable: Once committed, a transaction will never disappear, even if there is some kind of application or system crash. It is not acceptable for there to be a report to the application of a completed commit with a later loss of the transaction.
Essentially all transaction processing systems that provide isolation use some form of locking. This is a serious burden on the system and its clients, but in practice it has never been possible to eliminate. Sometimes locking is carried out in a strict predetermined order, or some knowledge can be introduced to reduce the need for locking or to simplify it. However, in general-purpose systems, there is typically no advance knowledge about the locking pattern. The InfinityDB 2.0 ACID transaction facility has no inherent restrictions on locking patterns and requires no client consideration other than the handling of rare OptimisticLockConflictExceptions as discussed below. It is possible for clients to improve performance by ordering accesses, however.
commit() captures all modifications up to that point, with guaranteed durability due to a special copy-on-write protocol. There is no inter-thread isolation. There is also a no-wait-for-durability type of commit for speed.
InfinityDB 2.0 adds transparent multi-Threaded transactionality. Each Thread can view and modify the database using the previous global ACD protocol, even with unlimited concurrent Threads, or it can enter the 'ACID' domain by creating and using a TransactionalItemSpace. The ACD level is still available for backwards compatibility, as in InfinityDB 1.0, and it can be used by Threads coexisting with ACID user Threads. The ACD mode is now normally used only for bulk loading, large queries, or other situations where ACID-level transactionality and consistency are not important. The previous ACD level is fully Thread-safe, but only globally transactional, while the ACID level provides non-thread-safe TransactionalItemSpaces, normally one per thread. The client threads do not need to know about the locking going on, and can simply proceed with their normal accesses and modifications as desired, with locks set as needed by the system.
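For illustration, here is a minimal sketch of the two domains side by side, using the creation, insert(), and commit() calls shown in the full example near the end of this page (the file path is just a placeholder). In a real program the ACD accesses and the ACID transaction would normally run in separate Threads, and the ACID side would also handle OptimisticLockConflictException as described below.

import java.io.IOException;

import com.infinitydb.InfinityDB;
import com.infinitydb.TransactionalItemSpace;

public class AcdAndAcidSketch {
    public static void main(String[] args) throws IOException {
        InfinityDB db = InfinityDB.create("c:/temp/junk", true);

        // ACD domain: fully Thread-safe but only globally transactional.
        db.insert("bulk loaded item"); // immediately visible to every Thread
        db.commit();                   // the global ACD commit

        // ACID domain: a non-thread-safe TransactionalItemSpace, normally
        // one per Thread.
        TransactionalItemSpace tdb = db.getNewTransactionalItemSpace();
        tdb.begin();
        tdb.insert("isolated item");   // invisible to other Threads until commit
        tdb.commit();
        tdb.dispose();                 // release transaction resources
    }
}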
The InfinityDB 2.0 ACID implementation uses 'optimistic' locking: data to be worked on by a transaction in a particular Thread is assumed to be generally available for use, and lock conflicts are usually rare. In practice, the system is also useful in situations with high conflict rates, due to considerations described below (e.g. fine lock granularity and scheduler characteristics).
The optimistic system differs from pessimistic systems mainly in that there is no case in which data to be locked is waited for while locks are held. When locks are both held and waited on by a transaction, there is always the possibility of a cyclic wait-for graph - in other words, a deadlock between two or more threads that each hold data or other resources desired by the others. InfinityDB does not use pessimistic locking, so there is no possibility of deadlocks. When InfinityDB's commit() is invoked, it never fails due to locking issues.
The InfinityDB optimistic protocol is two-phase. While locks are being acquired for data, it is always possible that a transaction executing in a particular Thread will encounter a lock already held by another transaction in another Thread. If so, the existing locks of the requester are all immediately released, all changes are rolled back, the transaction is reset to the not-in-progress state, and an OptimisticLockConflictException is thrown. The Thread desiring the data must start over by invoking begin() again and repeating the transaction's work, with its view of the database brought up-to-date.
Releasing all locks on conflict may seem draconian, but the alternative in a pessimistic locking system is equally draconian: a pessimistic system must have some kind of deadlock detector, usually running slowly (on the order of seconds), that finds deadlocks and throws deadlock Exceptions into waiting Threads. It is possible to detect deadlocks as the locks are set, but this is expensive both in theoretical complexity and in practice. The result, in any case, is that pessimistic and optimistic transactions alike must respond to failure cases by handling failure returns or Exceptions. The optimistic system fails immediately, however, rather than waiting for the deadlock detector. The ostensible advantage of a pessimistic system is an increase in scheduling fairness in highly contentious situations, but in practice such situations simply produce 'deadlock storms'. Highly contentious situations are always difficult and normally just require reducing the Thread count, but see Fairness below.
In practice, the handling of the OptimisticLockConflictException can retry the transaction by looping back to begin(), report the failure, or simply skip the transaction's work, as sketched below.
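For example, the looping style (used again in the full example at the end of this page) can be sketched as follows; the retry limit is only an illustrative addition.

// tdb is a TransactionalItemSpace; the enclosing method is assumed to
// declare throws IOException.
boolean isSuccess = false;
for (int retries = 0; !isSuccess && retries < 100; retries++) {
    tdb.begin();
    try {
        // Normal accesses and modifications; locks are set invisibly.
        tdb.insert("some item");
        isSuccess = true;
    } catch (OptimisticLockConflictException e) {
        // All of this transaction's locks have been released and its changes
        // rolled back, so we simply start over from begin() with a fresh,
        // up-to-date view of the database.
        isSuccess = false;
    }
}
if (isSuccess) {
    tdb.commit();
} else {
    System.out.println("giving up after repeated conflicts"); // or report/skip
}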
The simplicity of the data model of InfinityDB - the ItemSpace - allows for an equally simple locking system. Each lock corresponds to a single Item that locks all Items in the database of which it is a prefix. It also locks itself. When a lock conflict occurs, a subclass of OptimisticLockConflictException is thrown.
When a Transaction desires to set a lock, it does not need to know that it is doing so: it simply proceeds with normal data accesses and modifications, and the locks are automatically set as needed. For example, next(Cu cu, int protectedPrefixLength) is one of the basic retrieval operations, and it just sets a lock for the prefix that corresponds to the characters of the Item currently in the cursor cu, shortened to the length given by protectedPrefixLength. The next(Cu cu, int protectedPrefixLength) operation simply changes the characters of the Item currently in cu to be the next stored Item in sorted sequence greater than that in cu, while preventing any change to the characters before the protectedPrefixLength. Therefore, next(Cu cu, int protectedPrefixLength) cannot possibly need to see Items other than those having the given prefix, and the lock can be requested on just the protected portion of the characters in cu. This locking is transparent to the client, and very fast.
For insert(Cu cu) and delete(Cu cu), the entire sequence of chars in the cu Item is used as the lock prefix.
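To make this concrete, here is a minimal sketch of a prefix-protected scan inside a transaction. The 'userTable' prefix is hypothetical, Cu.length() is assumed here as a way to obtain the char length of the prefix, and next() is assumed to return false when no further Items share the protected prefix.

// tdb is a TransactionalItemSpace; the enclosing method is assumed to
// declare throws IOException.
Cu cu = Cu.alloc("userTable");           // a one-component prefix Item
int protectedPrefixLength = cu.length(); // ASSUMED: char length of the prefix
while (tdb.next(cu, protectedPrefixLength)) {
    // cu now holds the next stored Item in sorted order that still begins
    // with the 'userTable' prefix. The lock that was set covers only that
    // prefix, so Items elsewhere in the database are never locked.
    System.out.println(cu);
}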
Other optimistic systems usually have a 'version number' column in each table that is used to detect 'in-flight collisions', whereas InfinityDB automatically locks all data without such a column. Instead, every piece of data that is read or written is automatically locked, and conflicts are reported immediately, not on commit. Hence commit() or rollback() can never cause a lock conflict.
It is possible for a transaction to obtain a group of locks and then request and obtain a lock prefix that 'contains' those locks. In other words, the new lock prefix obtained is a prefix of some set of existing locks. The request will throw an OptimisticPromotionLockException if any of the existing locks are owned by other transactions, but if they are not, all of the contained locks will be replaced by a 'promoted' lock having the given prefix. This is expensive to the extent that a large number of existing locks may need to be dealt with. OptimisticPromotionLockException is a subclass of OptimisticLockConflictException.
There is no limit to the number of locks, either per-Thread, or per-transaction, or per-database or in total for the JVM, other than total heap memory, within which the prefixes are kept efficiently packed together. The lock prefixes are not Java Objects, but only data packed efficiently into internal B-trees, saving a considerable amount of space, especially on 64-bit platforms. Individual lock prefix space usage will typically be a few bytes to tens of bytes, or more depending on the application.
The 'deltas', which are the pending per-transaction changes to the database, are also kept packed efficiently in in-memory BTrees until commit. As for locks, only JVM memory limits the total delta usage for all transactions. Note that while the sum of lock and delta storage is limited by memory, the typical alternative is to lock entire blocks in memory, which is hundreds of times less efficient and can easily lead to memory pressure failures. commit() and rollback() release most of the resources associated with a TransactionalItemSpace, but eventually TransactionalItemSpace.dispose() should be invoked to avoid leaks, rather than simply allowing the TransactionalItemSpace to be garbage collected.
A TransactionalItemSpace requires space starting in the 10's of KB. There is at least one 10KB BTree block for the locks, one for the insertions, and one for the deletions, and, if necessary, more blocks as transactions grow. On commit, the blocks beyond the initial three, plus some overhead, may be reclaimed by the system. This space usage description is only an approximation and only applies to v2.0.x.
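Because dispose() should run whether or not the transaction succeeds, a try/finally arrangement is a natural way to follow this advice; the sketch below assumes the db and item values from the example at the end of this page.

// The enclosing method is assumed to declare throws IOException.
TransactionalItemSpace tdb = db.getNewTransactionalItemSpace();
try {
    tdb.begin();
    tdb.insert("hello");
    tdb.commit();
} finally {
    // Release the lock and delta resources promptly instead of waiting
    // for the TransactionalItemSpace to be garbage collected.
    tdb.dispose();
}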
The lock performance, either for testing or setting
a lock, goes as O(log(nLocks)).
Ignoring the commit time, during a transaction
the deltas provide insert performance
as O(log(insertedItemCount)). The deletes are similar,
performing as O(log(deletedItemCount)). These operations are very fast
compared to the modifications of the InfinityDB itself,
so the bottleneck often occurs at commit instead, which incurs
the final actual updates to the database in the block cache
as well as the final I/O's and two sync()'s. The I/O's
and sync()'s are shared between concurrently committing
threads: this is a 'group commit' and it
can have a dramatic effect on performance, so it is
good to have many threads. One to a thousand threads is
common. Another dramatic performance improvement
can be had by using the commit(boolean isWaitingForDurable)
method, with the parameter false. This causes
all but one thread to return immediately on commit():
the one 'unlucky' thread makes sure the I/O's and sync do
happen soon.
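The no-wait form is invoked exactly like the ordinary commit(); here is a minimal sketch, assuming a tdb created as in the example at the end of this page.

// The enclosing method is assumed to declare throws IOException.
tdb.begin();
tdb.insert("low-latency item");
// Return immediately rather than waiting for durability; one of the
// concurrently committing threads soon performs the shared I/O and sync()'s.
tdb.commit(false); // isWaitingForDurable = false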
Retrievals during a transaction are slightly slower than modifications, because they must examine both the deltas and the base InfinityDB in order to provide the thread with an up-to-date view of the modifications it has made to the ItemSpace. However, for speed, the thread can opt to bypass the consultation of the deltas during retrievals via TransactionalItemSpace.setPassThroughRetrievals(boolean isPassThroughRetrievals).
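Here is a minimal sketch of pass-through retrievals, again assuming a tdb created as in the example at the end of this page; note that while pass-through is enabled, the transaction's own uncommitted changes are not visible to its reads.

// The enclosing method is assumed to declare throws IOException.
tdb.begin();
tdb.insert("pending item");               // held in the in-memory deltas
tdb.setPassThroughRetrievals(true);       // retrievals now skip the deltas
boolean isVisible = tdb.exists(Cu.alloc("pending item"));
// isVisible is false here if the Item was not already in the base database.
tdb.setPassThroughRetrievals(false);      // back to the full up-to-date view
tdb.commit();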
One way to simplify and speed up a transaction is to access all of the
necessary Item prefixes at the beginning of the
transaction: after that, no more lock conflicts can occur.
Accessing the necessary prefixes early usually only
requires arranging code properly or doing one or a few
extra accesses. For example, a loop
that modifies a large number of Items can be moved after an access
that sets a lock that covers all of the inserts, avoiding a
lock promotion. Note again that the ACID rules are automatically enforced no matter what lock ordering strategy is used, and any access ordering can be used without any consideration of locking if desired. To see a printout of the lock prefixes that are in effect, use ItemSpaceTransactionManager.dumpLocks(PrintStream) or TransactionalItemSpace.dumpLocks(PrintStream).
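Here is a sketch of that early-covering-lock technique. The 'userTable' prefix is hypothetical, Cu.append() is assumed as a way to build a multi-component Item, and the assumption is made that a retrieval at the short prefix sets a lock covering every Item that begins with it, per the prefix-locking rule above.

// The enclosing method is assumed to declare throws IOException.
tdb.begin();
// One extra access at the covering prefix, done first, so that the lock
// it sets is a prefix of every Item touched in the loop below.
tdb.exists(Cu.alloc("userTable"));
for (long i = 0; i < 1000; i++) {
    Cu cu = Cu.alloc("userTable");
    cu.append(i);   // ASSUMED Cu.append() for adding a second component
    tdb.insert(cu); // already covered by the lock above; no promotion needed
}
tdb.commit();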
The optimistic model has a perceived flaw in that the retries after lock conflicts - that is, after the throwing of an OptimisticLockConflictException - could loop forever. This does not happen in practice.
Fairness is provided by the Thread scheduler, which tries to keep each Thread busy and tends to cause retries to work fairly. Below is the output of a tester that shows locks being set and released fairly. An 'S' is a successful commit, with the following digit identifying the Thread, an 'i' is an insertion attempt, a '_' is a prefix lock conflict, and a ' ' is a promotion conflict. The prefix lengths are randomly generated, so some cover more ground than others, but they all get their locks in a short time. Some of the shorter generated prefixes correspond to table locks, some longer prefixes correspond to row locks, some to value locks within a row, some to index entry locks, and so on, but these correspondences, if they are imposed by the client at all, are not enforced by or even visible to the locking system. Each transaction in this test locks 10 prefixes by doing 10 random inserts. There are 10 threads. There are no sleeps. OptimisticLockConflictExceptions are handled by looping. This is typical in testing, and the test has been run for a range of 1 to 1K Threads. The test database is only 500MB, so conflicts should not be rare. As can be seen, the optimistic lock protocol is working well: long runs of ' ' or '_' are not common in the data, only short runs, and all Threads, identified by their thread numbers, get access.
commits/s total=68.82312456985547 count=400 t=5.812 dt=1.656 this commits/s=60.38647342995169 iiiiiiiii_iiiiiiiiiiS4iiiiiiiiiiS9iiiiiiiiii S6iiiiiiiiiiS3iiiiiiiiiiS4iiiiiiiiiiS5i_ii_iiiiiiiiiiS8iiiiiiiiiiS7iiiiiiiiiiS9i iiiiiiiiiS2iiiiiiiiiiS1iiiiiiiiiiS0iiiiiiiiiiS6iiiiiiiiiiS4iiiiiiiiiiS2i_iiiiiii iiiS9iiiiiiiiiiS0iiiiiiiiiiS7iiiiiiiiiiS5iiiiiiiiiiS3iiiiiiiiiiS8iiiiiiii i i i i i iiiiiiiiiiS1iiiiiiiiiiS4iiiiiiiiiiS7iiiiiiiiiiS1iiiiiiiiiiS9iiiffiiiiiiiS6iiii iiiiiiS3iiiiiiiiii_iiiiiiiiiiS8iiiiiiiiiiS5iiiiiiiiiiS0iiii_iiiiiii_iiiiiiiiiiS2 iiiiii_iiiiiiiiiiS2iiiiiiiiiiS9iiiiiiiiiiS1iiiiiiiiiiS0iiiiiiiiiiS4iiiiiiiiiiS3i iiiiiiiiiS5iiiiiiiii_iiiiiiiiiiS6iiiiiiiiiiS7iiiiiiiiiiS8iiiiiiiiiiS6iiiiiiiiiiS2 iiiiiiiiiiS9iiiiiiiiiiS3iiiiiiiiiiS7iiiiiiiiiiS1iiiiiiiiiiS4iiiii_iiiiiiiiiiS8ii iiiiiiiiS0iiiiiiiiiiS5iiiiiiiiiiS8iiiiiiiiiiS4iiiiiiiiiiS5iiiiii_iiiiii i iiiiii iiiiS3iii_iiiiiiiiiiS7iiiiiiiiiiS2iiiiiiiiiiS6iiiiiiiiiiS0iiiiiiiiiiS1iiiiiiiiii
For performance reasons, other ACID systems must always make concessions with respect to the consistency of the data, even though the 'C' stands for consistency. InfinityDB does not make concessions. There are four 'standard' basic transaction isolation levels: READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, and SERIALIZABLE. InfinityDB stays with only the best: SERIALIZABLE. Whenever a transaction-aware programmer hears this, the natural assumption is that there must be locks that limit access to the entire database in a strict sequence, with no transaction overlap and terrible performance. InfinityDB avoids the SERIALIZABLE performance problem by using extremely fine-grained locks. Using the SERIALIZABLE level means there are no possible reads of data that change on subsequent reads, and no possible 'phantom' reads. Most other databases provide the READ_COMMITTED level by default for performance, but this and the other non-SERIALIZABLE transaction isolation levels allow inconsistencies to appear in certain cases, and these inconsistencies are normally simply ignored. InfinityDB does provide READ_COMMITTED as well, but it is not generally necessary (it differs from SERIALIZABLE in that it sets exclusive write locks but no read locks). In the examples directory you can find 'ConcurrencyTest.java', which demonstrates this and allows performance testing and experimentation.
The locking granularity is as fine as possible for every transaction. Although InfinityDB does not innately use the table/column/value terminology or relational concepts but instead the ItemSpace model, we can still compare the effective lock granularity assuming a typical application-imposed organization of the data. There are then effectively locks possible on tables, rows, column values, and index entries or even index key prefixes, with the finest granularity usable for each lock that is set. In fact every possible InfinityDB client-imposed data structure that can be devised on top of the ItemSpace data model is automatically locked at the finest possible granularity.
InfinityDB 2.0, like its predecessor 1.0, requires no recovery log. The transactionality is hidden inside the single InfinityDB data file, without any other new files required. Existing InfinityDB files will continue to work, with no changes to the format and no automatic upgrades to the files. Also, the new API is backwards compatible.
Almost all (possibly all) other transactional DBMS require a transaction log, which is used for transaction rollback and transaction durable commit. The committed data for a transaction is kept in this log and migrated into the actual database as quickly as possible. On crash recovery, the log is scanned again and the transactions are redone or undone as necessary. This log can grow very large if there are long-running update transactions, and it must be kept in a safe place, associated with its related database in case recovery is needed.
InfinityDB instead uses a 'Copy on Write' style system inside the database file to ensure global atomicity, rollback capability, and durability. This original InfinityDB feature is built on by the 2.0 ACID transactionality. Hence there is no time-consuming recovery phase on system bring-up: InfinityDB databases recover instantly.
The API for InfinityDB transactions is the standard Java Transaction API, or JTA, v1.0.1 (there is also a JTA 1.1, but it is not currently implemented in InfinityDB). However, most of the methods there are not implemented or necessary. In order to create a transaction, one starts by asking the InfinityDB to create a new TransactionalItemSpace, and thereafter the begin(), commit(), commit(boolean isWaitingForDurable), and rollback() methods are available on that TransactionalItemSpace. A TransactionalItemSpace is non-thread-safe, so normally one (or, rarely, more than one) will be created for the use of each client thread.
In some circumstances, such as in legacy code, the TransactionalItemSpace will be visible only as its superclass ItemSpace, and commit() and so on are not visible. In these situations, the new ItemSpace.getTransactionManager() can return an ItemSpaceTransactionManager, which can be used instead for begin(), commit(), and so on.
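Here is a minimal sketch of the manager-based style for legacy code, assuming no-argument begin() and commit() forms on the manager and a db opened as in the example below.

// The enclosing method is assumed to declare throws IOException.
ItemSpace space = db.getNewTransactionalItemSpace(); // visible only as an ItemSpace
ItemSpaceTransactionManager manager = space.getTransactionManager();
manager.begin();
space.insert("hello from legacy code");
manager.commit();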
There is no need for Objects of the Transaction class in the current implementation. Without a Transaction class, there is no suspend() or resume(Transaction). These methods are rarely needed in normal JTA, where they switch the thread-local transaction context between different transactions: in InfinityDB, each TransactionalItemSpace contains a transaction context, so there is no need to switch back and forth.
Here is an example of transactional code, using a single insert() to get a lock. We then read the stored Item back from outside the transaction. There are no threads, so this is rather contrived.
package com.infinitydb.manual;

import java.io.IOException;

import com.infinitydb.Cu;
import com.infinitydb.InfinityDB;
import com.infinitydb.OptimisticLockConflictException;
import com.infinitydb.TransactionalItemSpace;

public class ACIDTest {
    /**
     * Do the simplest possible thing using InfinityDB 2.0 ACID
     * Transactionality.
     */
    public static void main(String[] args) {
        try { // catch IOException from InfinityDB.open() or db.insert()
            InfinityDB db = InfinityDB.create("c:/temp/junk", true);
            /*
             * Create a TransactionalItemSpace. This is only for
             * demonstration - it is necessary only with multiple
             * competing threads.
             */
            TransactionalItemSpace tdb = db.getNewTransactionalItemSpace();
            /*
             * We use the looping technique to force success. Instead we could
             * just report the error or skip the transaction.
             */
            for (boolean isSuccess = false; !isSuccess;) {
                tdb.begin();
                try {
                    /*
                     * Do as many operations as desired. They will automatically
                     * be locked invisibly. If there is a collision, the
                     * Exception is thrown and we choose to retry. We could have
                     * a retry counter too.
                     */
                    tdb.insert("hello");
                    isSuccess = true;
                } catch (OptimisticLockConflictException e) {
                    /*
                     * The lock has been removed, but will be attempted again on
                     * the next tdb.insert() above.
                     */
                    isSuccess = false;
                }
            }
            tdb.commit(); // Always do either commit or rollback
            /*
             * Find the Item we inserted. The Item has one component: a string.
             */
            Cu cu = Cu.alloc("hello");
            if (tdb.exists(cu)) {
                System.out.println("success!");
            }
            /*
             * Release resources.
             */
            tdb.dispose();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Look in the JavaDoc for ItemSpaceTransactionManager and TransactionalItemSpace for full details.
The examples directory contains ConcurrencyTest.java, which will
fully demonstrate the system. The ConcurrencyTest can show the
global consistency maintenance property of the SERIALIZABLE isolation level,
and can show performance characteristics, lock conflict
frequency, and much more.