Example launch scripts are in installation-root/examples, and the source is in installation-root/src/com/infinitydb/examples. Start with HelloWorld.java, then see DemoSnippets.java or PatientExample.java. This document is independent of the examples. "ItemSpace Data Structures.pdf" is also worth reading, as it explains some possible mappings between the base API and upper layers.

    import com.infinitydb.InfinityDB;
    ..
    // create a db
    String fileName = "c:/temp/testdb.idb";
    InfinityDB db = InfinityDB.create(fileName, true/*overwrite if exists*/);
    ...
    // or, if a db file already exists:
    ...
    InfinityDB db = InfinityDB.open(fileName, true/*allow updates*/);

If desired, the default cache size of 2.5MB can be overridden in the create() or open() call as well. The cache contains in-memory copies of disk blocks from the database file that are in frequent use or which have just been created. The cache grows as needed but will not exceed the specified size. A bigger cache improves performance at the expense of memory space.

Later, the code will just refer to db, meaning a database opened or created this way.
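
The bounded-cache behaviour described above can be pictured with a small standalone sketch. This is only an analogy built on java.util.LinkedHashMap: the class name BlockCache, the block-number keys, and the least-recently-used eviction policy are all assumptions made for illustration, and nothing here is taken from the InfinityDB implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded block cache (illustration only, not InfinityDB code).
// Keys stand for block numbers, values for in-memory copies of disk blocks.
class BlockCache extends LinkedHashMap<Long, byte[]> {
    private final int maxBlocks;

    BlockCache(int maxBlocks) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        // The cache grows as needed but never beyond the configured limit.
        return size() > maxBlocks;
    }
}
```

With a limit of two blocks, touching block 1 and then adding block 3 evicts block 2, the least recently used entry in this model.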

    Cu cu = Cu.alloc();
    try {
        // use cu in various ways to do database updates and retrievals
    } finally {
        cu.dispose(); // this is optional, for speed
    }

It is not necessary to dispose a Cu, but it is about 3KB in size, so disposing is recommended if garbage collection is to be minimized. The try .. finally block is therefore also optional. Java 1.4 JVMs can GC a Cu in about 15K cycles, which is comparable to the basic InfinityDB operations, while the alloc()/dispose() pair takes about 1600 cycles. There is a pool of Cu's, whose size can be adjusted with Cu.setPoolSize(int newPoolSize), the default being 20 Cu's. The pool size need not be larger than the maximum instantaneous number of allocations. The pool starts empty and grows as needed to the limit, so setting a high limit is not necessarily wasteful.
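
To make the pooling behaviour concrete, here is a minimal standalone sketch of a pool that starts empty and grows only up to a limit. BufferPool and its methods are hypothetical names chosen for this illustration, and StringBuilder merely stands in for a Cu; how InfinityDB actually implements its Cu pool is not shown in this document.

```java
import java.util.ArrayDeque;

// Hypothetical pool of reusable buffers (an analogy for Cu.alloc()/dispose()).
class BufferPool {
    private final ArrayDeque<StringBuilder> free = new ArrayDeque<>();
    private int poolSize = 20; // mirrors the default mentioned in the text

    synchronized StringBuilder alloc() {
        StringBuilder b = free.pollFirst(); // reuse a pooled buffer if one exists
        return b != null ? b : new StringBuilder();
    }

    synchronized void dispose(StringBuilder b) {
        b.setLength(0); // hand back an empty buffer
        if (free.size() < poolSize) {
            free.addFirst(b); // the pool grows only here, and only up to the limit
        }
        // Otherwise the buffer is dropped and left to the garbage collector.
    }

    synchronized void setPoolSize(int newPoolSize) {
        poolSize = newPoolSize;
    }

    synchronized int pooled() {
        return free.size();
    }
}
```

Because the pool holds only buffers that have actually been disposed, a large limit costs nothing until that many buffers have been in use at once.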
A Cu can only be used by one Thread at a time, although any number of Threads can be using the database with different Cu's at a time. (To detect whether an application is accidentally sharing Cu's, use InfinityDB.setCheckForConcurrentCuModification(true).) A single Thread will often allocate multiple Cu's at a time, but they are almost always only used for a short time before being disposed.

    cu.clear().append("hello world"); // Temporarily use Cu.
    db.insert(cu); // Now db contains one more 'Item'

Delete works identically: just substitute delete(cu), and the data in cu will be removed from the database.

The insert() and delete() operations do not create records with fields as in most databases. The semantics of an InfinityDB are more flexible and simple, providing a base for the equivalent of record-oriented access, as well as for creative structures of many kinds. The lowest-level InfinityDB data model is the ItemSpace. An ItemSpace is a container for a set of Items. An Item is a simple variable-length char sequence, which is similar to a key in a record-oriented model, but an Item combines the key and the data into one unit. An Item can be up to 1.6K chars long, but this length limitation does not create actual limits on the size of anything to be stored, because in practice data can always be broken down into a set of Items. An Item is not an Object, only a conceptual sequence of chars. Two Items with the same sequence of chars are identical.
The Items in an ItemSpace are kept in ascending order; for comparison purposes, the initial chars of an Item are most significant, and an Item that is a prefix of another sorts before it. There is no inherent limit to the number of Items in an ItemSpace. There is no other information in an ItemSpace beyond the Items themselves: there is no separate 'data' or 'record' attached to an Item. There is no flat file or other separate data store. The Items in the ItemSpace, which resides in one InfinityDB file, contain all of the information. An ItemSpace can also be empty.
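
The ordering and set semantics just described match those of a sorted set of Strings, so a java.util.TreeSet gives a quick way to experiment with them. This is only a stand-in model: ItemOrderDemo is a hypothetical name, and real Items hold encoded components rather than plain text.

```java
import java.util.TreeSet;

// Stand-in model: plain Strings play the role of Items.
class ItemOrderDemo {
    static TreeSet<String> sample() {
        TreeSet<String> items = new TreeSet<>();
        items.add("b");
        items.add("abc");
        items.add("ab");
        items.add("a");
        items.add("ab"); // same chars as an existing Item, so nothing new is stored
        return items;
    }

    public static void main(String[] args) {
        // Initial chars are most significant, and a prefix sorts before its extensions.
        System.out.println(sample()); // prints: [a, ab, abc, b]
    }
}
```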
A Cu cursor contains no information other than a single Item and is not connected to a particular ItemSpace in any way. Since an Item is not an Object, but just a conceptual sequence of chars, it can live equally well either in an ItemSpace or in a Cu or anywhere else, such as in a char[] or, less often, a byte[]. The Item in a Cu does not need to be in the ItemSpace.
The Items in an ItemSpace can be inserted and deleted randomly and can be retrieved randomly or sequentially in both directions by using a Cu. At the ItemSpace level, there are no semantics attached to the Items, and the fundamental operations are very simple. The simplicity of this lowest-level data view allows the InfinityDB engine to concentrate on performance, while semantics are applied by upper layers.
In order for this basic data model to be useful, the sorted sequence of Items in the ItemSpace needs to be organized. This is partially accomplished by keeping related Items together by sharing a prefix of some kind: this is a very general concept and it shows up in many higher-level structures. Another organizational idea is the component which is described next.

    cu.clear().append("a string").append(5).append(2.6).append(true);
    System.out.println(cu);
    // prints:
    // "a string" 5 2.6 true
    // This is not what StringBuffer would print!

The Cu is cleared to make sure it is empty, then the variable-length component representations of various primitives are concatenated together inside the Cu to create a single Item. Most Cu methods return this so they can be chained. Each primitive takes a variable number of chars in the Item, so a long component, for example, will take only one char to represent a small value such as 0 or 9, but will take more chars for larger numbers. If a component to be appended would not fit, CursorLengthException is thrown and the Cu is not changed. CursorLengthExceptions are extremely rare in practice.
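
One way a number can occupy fewer chars when it is small is a continuation-bit scheme, sketched below. This is an assumed encoding for illustration only: VarLong is a hypothetical class, it handles non-negative values only, and the actual InfinityDB component encoding is documented in the Cu javadoc, not here.

```java
// Hypothetical variable-length encoding of a non-negative long into chars.
// Each char carries 15 payload bits; the top bit marks "more chars follow".
class VarLong {
    static char[] encode(long v) {
        if (v < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        int n = 1; // number of 15-bit groups needed
        for (long t = v >>> 15; t != 0; t >>>= 15) {
            n++;
        }
        char[] out = new char[n];
        for (int i = n - 1; i >= 0; i--) {
            int group = (int) (v & 0x7FFF);
            // Only the last char has the continuation bit clear.
            out[i] = (char) (i == n - 1 ? group : group | 0x8000);
            v >>>= 15;
        }
        return out;
    }

    static long decode(char[] in, int offset) {
        long v = 0;
        int i = offset;
        while (true) {
            char c = in[i++];
            v = (v << 15) | (c & 0x7FFF);
            if ((c & 0x8000) == 0) {
                return v; // continuation bit clear: this was the last char
            }
        }
    }
}
```

In this scheme values up to 32767 fit in a single char, and each further 15 bits of magnitude costs one more char.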
Now we can parse back these primitives:

    int offset = 0;
    String s = cu.stringAt(offset); // get the initial string component
    offset = cu.skipComponent(offset); // parse over the string
    long n = cu.longAt(offset); // get back the long
    offset = cu.skipComponent(offset); // skip the long
    double d = cu.doubleAt(offset);
    offset = cu.skipComponent(offset);
    boolean b = cu.booleanAt(offset);

The append() method may appear similar to that of java.lang.StringBuffer, but the chars inside the Cu cursor do not comprise a single printable String but a sequence of self-describing, self-delimiting binary-encoded elements.
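
The idea of self-describing, self-delimiting elements can be sketched with a tiny tagged format. Everything below is assumed for illustration: TaggedItem is a hypothetical class, the 'S' and 'L' tags and the fixed four-char long are invented, and the method names merely echo the Cu methods used above without reproducing their real encoding.

```java
// Hypothetical tagged component sequence (illustration only).
// 'S' + length char + chars encodes a String; 'L' + 4 chars encodes a long.
class TaggedItem {
    private final StringBuilder chars = new StringBuilder();

    TaggedItem append(String s) {
        chars.append('S').append((char) s.length()).append(s);
        return this; // returning this allows chained appends
    }

    TaggedItem append(long v) {
        chars.append('L');
        for (int shift = 48; shift >= 0; shift -= 16) {
            chars.append((char) (v >>> shift)); // 16 bits per char, high bits first
        }
        return this;
    }

    String stringAt(int offset) {
        expect(offset, 'S');
        int len = chars.charAt(offset + 1);
        return chars.substring(offset + 2, offset + 2 + len);
    }

    long longAt(int offset) {
        expect(offset, 'L');
        long v = 0;
        for (int i = 1; i <= 4; i++) {
            v = (v << 16) | chars.charAt(offset + i);
        }
        return v;
    }

    // The tag alone tells us where the next component starts.
    int skipComponent(int offset) {
        char tag = chars.charAt(offset);
        if (tag == 'S') return offset + 2 + chars.charAt(offset + 1);
        if (tag == 'L') return offset + 5;
        throw new IllegalStateException("unknown tag at offset " + offset);
    }

    int length() {
        return chars.length();
    }

    private void expect(int offset, char tag) {
        if (chars.charAt(offset) != tag) {
            throw new IllegalStateException("expected " + tag + " at offset " + offset);
        }
    }
}
```

Because each element announces its own type and extent, a reader can walk the sequence with skipComponent() without any external schema.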
Given the Java primitives, a wide range of applications of the ItemSpace model is possible, but there are other component types that help in creating further semantic levels, which will be described in further sections. For a complete description of each component type, see the javadoc for Cu. Also note the Component Type Printable Representations section.

For now, let's investigate getting back information from a set of component-structured Items in the ItemSpace db.

    cu.clear().append("customers").append("mary");
    db.insert(cu);
    cu.clear().append("customers").append("john");
    db.insert(cu);
    // and so on. Now retrieve the set of customer names
    cu.clear().append("customers"); // the prefix for all customers
    int prefixLength = cu.length(); // the size of the Item in cu
    while (db.next(cu, prefixLength)) {
        System.out.println(cu.stringAt(prefixLength));
    }

The next(Cu, int) method scans forwards over Items in the db having the prefix "customers". The Cu parameter is modified by next(), but the prefixLength parameter indicates how many chars at the start of the Item in the Cu are to be protected from alteration by the operation. When the next Item in sequence in the ItemSpace has a different prefix, next() returns false without modifying the Item in cu. This technique can be used to enumerate anything inside a subspace or set of Items having any common prefix.
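
The protected-prefix behaviour of next() can be tried out without a database using a sorted set of Strings. This is a stand-in model under stated assumptions: PrefixScan and suffixesUnder are hypothetical names, Items are plain Strings with a "/" separator instead of encoded components, and the cursor is a StringBuilder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Stand-in model: a TreeSet plays the ItemSpace, a StringBuilder plays the Cu.
class PrefixScan {
    static boolean next(TreeSet<String> items, StringBuilder cursor, int prefixLength) {
        String prefix = cursor.substring(0, prefixLength);
        // higher() is strictly greater, so the Item already in the cursor is skipped.
        String candidate = items.higher(cursor.toString());
        if (candidate == null || !candidate.startsWith(prefix)) {
            return false; // different prefix or end of set: cursor left unmodified
        }
        cursor.setLength(0);
        cursor.append(candidate); // only the part after the prefix actually changes
        return true;
    }

    static List<String> suffixesUnder(TreeSet<String> items, String prefix) {
        List<String> found = new ArrayList<>();
        StringBuilder cursor = new StringBuilder(prefix);
        while (next(items, cursor, prefix.length())) {
            found.add(cursor.substring(prefix.length()));
        }
        return found;
    }
}
```

With the Items "customers/john", "customers/mary" and "orders/17", scanning under the prefix "customers/" yields john and then mary, and stops when the scan reaches "orders/17".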
It is important to note here that it is very efficient in any normal ItemSpace to store Items with shared prefixes: the shared prefixes are compressed out (with character granularity). This is only an implementation detail, but it nevertheless affects how we think about organizing the entire database. We often store huge numbers of Items with sometimes long common prefixes.
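
A common way to compress shared prefixes in a sorted sequence is front coding, where each entry stores only how many leading chars it shares with its predecessor plus the remaining suffix. The sketch below illustrates that general technique; FrontCoding is a hypothetical class, the "count|suffix" text form is invented for readability, and the document does not say which scheme InfinityDB uses internally.

```java
import java.util.ArrayList;
import java.util.List;

// Front coding of a sorted list of Strings (general technique, illustration only).
class FrontCoding {
    static List<String> compress(List<String> sorted) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String s : sorted) {
            int shared = 0;
            int max = Math.min(prev.length(), s.length());
            while (shared < max && prev.charAt(shared) == s.charAt(shared)) {
                shared++; // count chars in common with the previous entry
            }
            out.add(shared + "|" + s.substring(shared));
            prev = s;
        }
        return out;
    }

    static List<String> expand(List<String> coded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : coded) {
            int bar = entry.indexOf('|');
            int shared = Integer.parseInt(entry.substring(0, bar));
            String full = prev.substring(0, shared) + entry.substring(bar + 1);
            out.add(full);
            prev = full;
        }
        return out;
    }
}
```

For "customers/john" followed by "customers/mary", the second entry shrinks to "10|mary", which is why long common prefixes cost little in a scheme of this kind.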
There are also operations for scanning backwards, and for scanning without skipping the given Item if present in the db. All of these operations interpret the two parameters the same way, as described above. The operations are:
| method | direction | given Item |
|---|---|---|
| boolean next(Cu cu, int protectedPrefixLength) throws IOException | forwards | skipped if present |
| boolean first(Cu cu, int protectedPrefixLength) throws IOException | forwards | not skipped if present |
| boolean previous(Cu cu, int protectedPrefixLength) throws IOException | backwards | skipped if present |
| boolean last(Cu cu, int protectedPrefixLength) throws IOException | backwards | not skipped if present |
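
The four operations line up with the four neighbour lookups of java.util.NavigableSet, which makes the skip/no-skip distinction easy to see. The sketch below is an analogy only: FourMoves is a hypothetical class, Items are plain Strings, and each method returns the Item found (or null) instead of modifying a cursor and returning a boolean.

```java
import java.util.NavigableSet;

// Analogy between the four scan operations and NavigableSet lookups.
class FourMoves {
    // Accept the neighbour only if it keeps the protected prefix.
    private static String guard(String found, String item, int prefixLength) {
        String prefix = item.substring(0, prefixLength);
        return found != null && found.startsWith(prefix) ? found : null;
    }

    // forwards, given Item skipped if present
    static String next(NavigableSet<String> s, String item, int prefixLength) {
        return guard(s.higher(item), item, prefixLength);
    }

    // forwards, given Item not skipped if present
    static String first(NavigableSet<String> s, String item, int prefixLength) {
        return guard(s.ceiling(item), item, prefixLength);
    }

    // backwards, given Item skipped if present
    static String previous(NavigableSet<String> s, String item, int prefixLength) {
        return guard(s.lower(item), item, prefixLength);
    }

    // backwards, given Item not skipped if present
    static String last(NavigableSet<String> s, String item, int prefixLength) {
        return guard(s.floor(item), item, prefixLength);
    }
}
```

In this model, first() on an Item that is present returns that same Item, while next() on it moves to the following one.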

    // We have a customerId of any primitive type. Create and save a record for it.
    long customerId = 1000;
    long zipCode = 91633;
    String spouseName = "Cynthia";
    double balance = 918.31;
    ..
    // clear first so that nothing left over in cu becomes part of the Item
    cu.clear().append("customer table").append(customerId).append(zipCode).append(spouseName).append(balance);
    db.insert(cu); // create a single 'record' in the "customer table"
    // we have ignored deleting a previous customer record if necessary

Now the db contains a customer row. Let's get it back:

    void printCustomer(long customerId) throws IOException {
        Cu cu = Cu.alloc().append("customer table").append(customerId); // the prefix
        int protectedPrefixLength = cu.length(); // how many chars to preserve at start of cu
        if (db.next(cu, protectedPrefixLength)) {
            int offset = protectedPrefixLength; // offset moves rightwards through the record
            long zipCode = cu.longAt(offset); // get long component's value at offset in the record
            offset = cu.skipLong(offset); // move over zipCode to spouseName
            String spouseName = cu.stringAt(offset); // get string component's value at offset in the record
            offset = cu.skipString(offset); // move over spouseName to balance
            double balance = cu.doubleAt(offset);
            // do something with zipCode, spouseName, balance, etc.
        } else ..
            // record not present. Cu not modified
    }

Now suppose we want to access by zipCode. We need to insert Items with a different sequence of components that will sort by zipCode. Here is how to insert them:

    // Create and save a zipCode index entry on customers.
    // Do this when the customer row is inserted.
    cu.clear().append("customers by zipcode index").append(zipCode).append(customerId);
    db.insert(cu);

Now let's retrieve all customers having a given zipCode. Similar code has already appeared above in enumerating customers, but here it is used with an index:

    // loop over customers in the index given zipCode
    cu.clear().append("customers by zipcode index").append(zipCode);
    int prefixLength = cu.length(); // the size of the Item in cu
    while (db.next(cu, prefixLength)) {
        long customerId = cu.longAt(prefixLength);
        printCustomer(customerId); // method defined above
    }

Most of the work done above can be factored out into helper methods or encapsulated. This is a simple system that will work very effectively and efficiently, representing records directly. There are a great many possible data representations that can be imposed on the ItemSpace model. Another more flexible model, a superset of the relational data model, is called 'Entity-Attribute-Value'.
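
The index pattern above, Items ordered by zipCode and then customerId, can be modelled in memory to see why a prefix scan returns exactly the customers in one zip code. This is a sketch under stated assumptions: ZipIndexSketch and its methods are hypothetical, each index Item is represented as a two-element long array, and a java.util.TreeSet stands in for the ItemSpace.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// In-memory model of a "customers by zipcode" index (illustration only).
class ZipIndexSketch {
    // Each entry is {zipCode, customerId}, compared component by component.
    private final TreeSet<long[]> byZip = new TreeSet<>(
            Comparator.<long[]>comparingLong(a -> a[0]).thenComparingLong(a -> a[1]));

    void insert(long zipCode, long customerId) {
        byZip.add(new long[] {zipCode, customerId});
    }

    // Call this when a customer row is deleted or its zipCode changes.
    void delete(long zipCode, long customerId) {
        byZip.remove(new long[] {zipCode, customerId});
    }

    List<Long> customersIn(long zipCode) {
        List<Long> ids = new ArrayList<>();
        long[] from = {zipCode, Long.MIN_VALUE};
        long[] to = {zipCode, Long.MAX_VALUE};
        // All entries sharing the zipCode "prefix" are contiguous in sort order.
        for (long[] entry : byZip.subSet(from, true, to, true)) {
            ids.add(entry[1]);
        }
        return ids;
    }
}
```

Keeping the index correct means inserting and deleting its entries together with the corresponding customer rows, which is the bookkeeping the snippets above leave to the caller.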