March 9, 2010

New ACID Transactionality

Version 2.0 of InfinityDB is here, bringing in full ACID transactionality. This is an improvement over the already successful 'ACD' transactionality of v1.0, which provided 'Atomic', 'Consistent', and 'Durable' transactionality, but the 'I' for 'Isolation' was missing. Isolation requires some form of locking: InfinityDB 2.0 uses optimistic prefix locking. There is almost no API - just begin()/commit()/rollback() are necessary, as well as obtaining a TransactionalItemSpace from the InfinityDB that is already available. Locks are set automatically as applications proceed doing retrievals and modifications. Try out the ConcurrencyTest.java example program to test the performance and correctness - even a repetitive process kill test is there to show that the ACID rules are maintained. See the new unpacker in the downloads page.

InfinityDB 1.0 is still available, as it has so many years of in-the-field use that it is necessary to continue to be able to deploy it in critical little-attended or unattended environments. The v1.0 unpacker is also on the downloads page.

November 15, 2009

new Merger, ItemInput and ItemOutput streaming

There is a new high-speed sorted data stream
merger in InfinityDB as of version 1.1.0 beta,
which is available at infinitydb.com/beta.
This class merges multiple streams of sorted Items into one stream
with performance approximately equal to the performance of
any one of the inputs. This new
'ItemInputMergerItemInput' functionality and the
existing OrSpace are similar in that the both
represent unions of Items, but they differ in these ways:

* OrSpace wraps a set of base ItemSpaces so that
all of the Items visible in any of the base spaces
is visible in the wrapped space. The result is a fully
random-access, dynamic view, that can be plugged into
any other place an ItemSpace is expected. The
performance is approximately the sum of the
base space access times.
* ItemInputMergerItemInput is not random-access nor
dynamic, but is more efficient in that the performance
is approximately the same as each input independently.
Reading all of the Items in all of the streams takes
only a little more than the time to read all of each
stream. Thousands of inputs can be combined with no
substantial loss of performance. This class gets its input not from an ItemSpace,
but from the new ItemInput functionality, and it presents
its output as an ItemInput as well.

Reading from an ItemInput is simple:

class ItemInput {
public abstract boolean readItem(Cu cu) throws IOException;
public int readItems(byte[] buff, int offset, int length) throws IOException;
//..
}

The second method returns the number of bytes read. The
readItems() may look familiar: it has the same signature
as InputStream.read(buf, offset, length), and it works basically
the same way, except that the data returned is formatted
as a sequence of serialized Items. There is also a new way
to write such buffers to an 'ItemOutput' that looks just
like the OutputStream function, and there is also a
single-Item-at-a-time method:

class ItemOutput {
public abstract void insert(Cu cu) throws IOException;
public abstract void delete(Cu cu) throws IOException;
public void writeItems(byte[] buff, final int offset, int length)
throws IOException;
//..
}

It is easy to obtain an ItemOutput, because now ItemSpace
is descended from ItemOuput. Hence any writeable
ItemSpace can accept an Item stream, especially in the
serialized buffer format for extreme speed. This is
possible because an ItemOutput is 'stateless'.

An ItemInput is not 'stateless'. Hence an ItemSpace cannot
be used as an ItemInput to get serialized buffers full of
Items. Being stateless, an ItemInput cannot be used by
multiple threads - unlike any ItemSpace - without getting
results dependent on thread access patterns. Multi-thread
reading is still possible but the client must synchronize
the accesses.

There are several classes to create ItemInputs,
for example, by reading Items from an InputStream. Hence
you can simply wrap an InputStream with InputStreamItemInput,
and read Items from a file. Files can be created by
similarly using the OutputStreamItemOutput.

To extract Items quickly from an ItemSpace, use the
wrapper ItemSpaceScannerItemInput. It uses an internal
cursor to maintain is position in the ItemSpace as the
Items are streamed out. The wrapper can use a
protected prefix length and a controllable starting
point, which is the familiar pattern for normal enumeration
of an ItemSpace by iterating over a Cu cursor. Actually,
ItemSpaceScannerItemInput uses a new method on ItemSpace
that simplifies and speeds the access:

class ItemSpace {
public int nextItems(Cu cu, int pl, byte buff[], int offset, int length)
throws IOException..
//..
}

nextItems moves the cu forwards the right distance to match
the Items streamed into buff.

For the definitive information on all of these, see the new manual
page on ItemInputMergerItemInput and the Javadoc on it
as well as the Javadoc on ItemInput and ItemOutput.

For definitive information on the format of Items in a
stream, look at the javadoc on ItemPacket.

May 13, 2007

New Dynamic Bulk Loader

InfinityDB now comes with a 'dynamic bulk loader' class called ncrementalMergingItemSpace that can keep a database up-to-date with high speed while still allowing concurrent retrievals. Inserts flow in at 10 to 100 times the speed of non-local random insertions into a normal InfinityDB, and cause a minimum of seeking. The cost is that retrievals and deletes slow down - from 1 to 30 times or more. The slowdown is most noticeable with non-local access, because multiple logical ItemSpaces must be searched due to the fact that data is kept in a hierarchy of 'batches' of various sizes. For localized access, retrieval performance is much better, since once all of the blocks for each of the batches is in the cache, they may
all be sequentally or locally read without extra I/O.

For example, a stock brokerage may need to have a continuous flow of data that is immediately up-to-date, and may need to do simple queries such as access on a given customer account or transaction id. The IncrementalMergingItemSpace will reflect the Items being inserted immediately, while the queries will still run at reasonable speed, since the required 20 or so block reads will still take only a fraction of a second.

The alternative to a dynamic system like IncrementalMergingItemSpace is to load data on a schedule, such as once every day or every few hours. The data then must always be stale. Performance of the InfinityDB bulk loaders is higher than that of IncrementalMergingItemSpace, but they must build entire databases on each run.

See docs/manual/dynamic-bulk-loading.html.

January 28, 2007

New Sort Engines can be used as Bulk Loaders

InfinityDB Now comes with two high-speed Item sorters for bulk sorting of large sets of Items. The API is extremely simple, consisting of only a few basic methods and classes for use within an embedded system that also contains an InfinityDB database instance. Bulk loading data into an InfinityDB database can be accelerated dramatically by providing the Items in sorted order, so that they all go at the end of the BTree as it builds up, rather than at random positions in the database. Random access incurs slow disk seeks, but sorted input appends data directly to the end of the database file. You can see the performance improvement for yourself by running the new example programs provided with InfinityDB.

The sorters use an efficient sort/merge algorithm that requires minimal, controllable temporary space on disk. One sorter provides maximum disk space efficiency, while the other provides maximum speed. There is an 'ItemSpaceBased' sorter that keeps all temporary data in InfinityDB instances. This sorter gains from the high compression efficiency of InfinityDB itself, which is always good and can be as high as 10-fold. The 'DiskBased' sorter keeps temporary data in a very simple uncompressed form for faster, sequential transfers.

The API requires only instantiating a sorter, inserting Items into it, getting an Item reader object from the sorter, and reading back the sorted Items. (It is easy to use the sorters for virtually any kind of efficient sorting job, because an Item can contain virtually any kind of data - see the docs/manual/basic-operations.html).

For a complete description of these sorters, see the manual in docs/manual bulk-loading-and-sorting.html, as well as the Javadoc, which has extensive explanations of the sorter usage. These docs are available in the trial and deployent downloads. The classes are called ItemSpaceBasedMergeSorterItemOutput and DiskBasedMergeSorterItemOutput.

June 1, 2006

New betas: VolatileItemSpace, DeltaItemSpace, and ItemSpace set operations

Many users have asked for an 'in-memory' ItemSpace to handle fast, flexible data storage but without the need for an associated file and another background 'IndexUpdate' thread. The new VolatileItemSpace does just that. It is somewhat faster than InfinityDB in general, and much faster for small db's - in the tens or hundreds of Items. VolatileItemSpace uses the technology in the 'meta access method' inside InfinityDB for speed while exposing exactly the same client view as all other ItemSpaces, making it useful as a drop-in component in a wide variety of situations with no client code change at all.


Another new ItemSpace is DeltaItemSpace, which allows a virtual modification of a base ItemSpace, which is not modified, followed by an optional rollback of the changes or an application of the changes to the base ItemSpace. The applications for this ItemSpace are endless: for example, a transaction can use it for simple atomicity (see the new manual for more), or a GUI can provide a CANCEL button that rolls back the deltas and leaves the db unchanged.


ItemSpace now supports set operations like union() and intersection(), as well as deleteSubspace() and others. There are currently only default implementations of these, but smarter ItemSpaces can override them and increase speed for some operations. For example, InfinityDB may be able eventually to delete quickly all Items with a given prefix by overriding deleteSubspace(). The intersection() implementation is very smart, using the 'zigzag' algorithm described in docs/manual/andorspaces.html.


There is also a new manual in docs/manual/index.html. Even if you are an experienced InfinityDB user, we would appreciate feedback on it.


Prefix compression - It affects your world view

David X wrote:

> Dear Support,
>  
> Suppose I enter a number of people in the database as below:
>  
> // persons[]:  array of person names to add to the 
< database (n == numberOfPersonsInArray)
> for (i=0; i<n; i++) {
>     cu = cu.clear().append(ENTITY_TYPE).append("person").append(person[i]);
>     db.insert();
> }
>  
> Keeping in mind that .append(ENTITY_TYPE).append("person") are always 
> constant, I would expect the tree to look like the following:
>  
> //                     "ENTITY_TYPE"
> //                           |
> //                         "person"
> //                           |
> //              +------------+--------------+
> //              |            |              |
> //           person[0]    person[1]  ... person[n-1]
>  
> However, "ENTITY_TYPE" and "person" are also stored in the database 
> "n" times, not once like I expected.
>  
> I would like to know how to think about "ENTITY_TYPE" and 
"person", which I look at as singular entries, but are not.  
I assume this has to do with each append() putting a new pointer into the database.
>  
> I am not having any problems with this, because it all works fine; 
but I am wanting to build a hierarchy from this information, and 
the multiple entries for each "ENTITY_TYPE" and "person" surprised 
me (the hierarchy can still be built by ignoring duplicate entries).
>  
>  
> David
>  

Yes, thought about as a completely independent series of Items, it does look
like there is duplication of the EntityClass and Entity when  you  use next() to
scan over a subtree or 'subspace'.

I think you have to remember the way the next() and other operations work
with a protected prefix length. That protected prefix length will always cover the part of
the Items that are to be considered 'higher' in the tree the way you drew it,
so that higher part of the data does not change in the Cu. When you use next(), you are
always only looking at the changes in the Cu beyond that prefix length, and you
ignore the common prefix. So the fact that the prefix is still sitting in there unchanged
really doesn't matter practically.

>From the perspective of any ItemSpace, Items are actually only simple variable-length
char sequences kept in order, but they are basically
always stored in the database with prefixes shared between adjacent Items
compressed out at the character level. This compression is transparent, though, so
there is no way any user of an ItemSpace can know that prefix
compression is going on. However, it would just be so ineffiicient to store
the common prefixes that you get used to assuming that that
prefix compression is going on, and the apparent duplication of the prefixes
when you just list out the Items in sequence doesn't bother you after a while.
The ItemEditor shows the apparent duplications as well, and you just have to look
past that.

So there actually aren't any special internal pointers or other tricky stuff
going on. An ItemSpace is just an ordered sequence of Items, each Item  is
a variable-length sequence of up to 1600 chars [which is not a significant
limit due to the ability always to structure data in ways that avoids 
the problem - ed]. The higher-level meaning
of Items as a series of components is actually not something any ItemSpace knows
about. That's why it's so easy to invent new ItemSpaces, like the DeltaItemSpace
or the VolatileItemSpace in the latest release. The concept of components is
actually totally restricted to the append(), skipXX(), and other operations on Cu.
(In fact you aren't even restricted to use those component-oriented
operations: you could invent your own meaning of the chars in the
Item by using raw operations like Cu.appendChar(c) and so on.)

One way you would be concerned about the apparent duplication of initial prefixes would be
if you wrote a simple database dumper that scanned all the Items in the db using
db.next(cu,0) and then wrote each Item out as a raw sequence of chars. There
would be no compression of shared adjacent prefixes, and the output would be
rather inefficient. It would be easy to fix that dumper to output only the
suffixes that actually change from Item to Item. That would be as efficient as
the InfinityDB internal compression.

I agree with you now that I think about it that that apparent duplication of
prefixes is a bit disconcerting at first. But the whole Idea of an EAV-structured
database, or even most of the other ItemSpace-based structures you might use, depend on
the internal compression inside the ItemSpace implementation for efficiency.

Good point.

Roger

May 19, 2006

Nio File Channel Locking is Now the Default

On the latest version of InfinityDB, as of 5/18/06, the
Java nio channel locking feature has become the default.
From now on, if InfinityDB is running on a JVM 1.4 or
later, files will be locked using the native locks as
described in the previous blog.

February 19, 2006

New File Locking

The latest version of InfinityDB includes an option for InfinitydDB.open() and InfinityDB.create() that provides Java 1.4 nio native file locking. These options are recommended for all future development in order to prevent sharing of InfinityDB files by multiple processes.

The nio native file locks solve a long-standing problem for any application that needs to prevent inter-process concurrent access to file data. Simple solutions that have been tried in the past, such as using a separate lock file, have virtually all failed in some circumstances, although it has been reported that a certain technique using File.rename() actually does work. If you are using a custom scheme, please try to switch to the new system.

There is no change to the existing behavior of any InfinityDB API. If you are using InfinityDB in a J2ME environment, or anywhere else that does not allow Java 1.4, such as in memory-constrained devices, the nio locking is automatically bypassed.

January 22, 2006

Overlapping Commit in IDB 1.0.40

The new online trial download of InfinityDB is v1.0.40, and it contains a new performance enhancement. If you activate it using InfinityDB.setOverlappingCommits(true), you will see a large performance improvement in situations where multiple Threads continuously do multiple inserts or deletes followed quickly by commits. In this case, commit passes can overlap, thereby executing the commits for most or all of the threads at the same time. Previous versions of InfinityDB always serialized commits.


November 5, 2005

Compare Java Edition with InfinityDB

The trial download now includes testers that enable direct comparisons between two embeddable database engines: Sleepy Cat Java Edition and InfinityDB. Using compressible data, the tests show that InfinityDB is 13 to 22 times faster and from 37 to 100 times more space efficient because of its unique dynamic data compression. Our clients typically subject the Infinity Database Engine B*Tree to performance, stress, and scalability tests and prove for themselves that it outperforms the Java edition. They choose InfinityDB for performance, incorruptibility, compression, and simplicity, whether it is to be embedded in server applications; throughout a distributed system for storage of configuration data; or within a PDA application.

We appreciate our customers who do this work for themselves, but now we offer our own testers, to speed the initial evaluation. These tests support our claims, while revealing how to code with our API. Access the download link from our home page.

InfinityDB is cost effective whether you are comparing it with the Java Edition or to rolling your own in-house data store. Its simple API and single file architecture simplifies the design and implementation phase; its incorruptibility eliminates support issues; its high performance enables the most powerful server applications; its extreme compressibility saves hardware; and its small footprint makes it embeddable in PDA's.

We are happy to recommend the best strategy for employing our flexible data model for your particular circumstances -- we even provide up to three free hours of support.

Check out our white papers and the other blog entries.

We look forward to hearing from you.
Jennifer Douglas
Licensing and Support Director