Basic InfinityDB Operations

  1. Introduction
  2. Creating and opening a database
  3. Allocating and disposing a Cu cursor
  4. Updates: insert() and delete()
  5. The ItemSpace data model
  6. The contents of an Item: components
  7. The retrieval operations

Introduction

This section describes the lowest-level InfinityDB Database Engine operations. The API is extremely simple, with only a few basic operations, but complete, complex applications are easy to build by wrapping these operations in higher-layer code, as we will show later. Because the API is so simple, it can be learned very quickly. API simplicity also keeps application code independent of the Engine implementation, and little development investment is needed to experiment with the Engine and confirm its reliability and performance. Coding is quicker, safer, and easier. Debugging is easier because the internal state of the database is well-defined and limited in complexity, and there is no mysterious internal logic. There are no significant capability limitations such as a maximum number of fields, record size, number of indexes, database size, or block cache size.

Example code launch scripts are in installation-root/examples, and the source is in installation-root/src/com/infinitydb/examples. Start with HelloWorld.java, then see DemoSnippets.java or PatientExample.java. This document is independent of the examples. "ItemSpace Data Structures.pdf" is also worth reading, as it explains some possible mappings between the base API and upper layers.

Creating and opening a database

An InfinityDB database resides in a single file. To create or open a database, only one method call is needed:
import com.infinitydb.InfinityDB;
    ...
    // create a new db
    String fileName = "c:/temp/testdb.idb";
    InfinityDB db = InfinityDB.create(fileName, true/*overwrite if exists*/);
    // ... or, if the db file already exists, open it instead:
    // InfinityDB db = InfinityDB.open(fileName, true/*allow updates*/);
If desired, the default cache size of 2.5MB can also be overridden in the create() or open() call. The cache holds in-memory copies of disk blocks from the database file that are in frequent use or have just been created. It grows as needed but never exceeds the specified size. A bigger cache improves performance at the expense of memory.

From here on, db refers to a database created or opened with InfinityDB.create() or InfinityDB.open() as shown above.
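The size bound described above is a property of any bounded cache. The sketch below illustrates it with a least-recently-used block cache. The LRU policy, the 4KB block size, and the names BlockCacheSketch and blockFor are assumptions for illustration. This is not a description of InfinityDB's internal cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded block cache; NOT the InfinityDB implementation.
// Assumes LRU eviction and a fixed block size, neither of which this document specifies.
class BlockCacheSketch {
    static final int BLOCK_SIZE = 4096; // assumed block size in bytes

    private final int maxBlocks;
    private final LinkedHashMap<Long, byte[]> blocks;
    private int diskReads = 0; // counts simulated reads from the database file

    BlockCacheSketch(long cacheSizeBytes) {
        this.maxBlocks = (int) Math.max(1, cacheSizeBytes / BLOCK_SIZE);
        // accessOrder = true: iteration order runs from least to most recently used
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks; // evict as soon as the bound is exceeded
            }
        };
    }

    // Returns the cached block, "reading" it from disk only on a miss.
    byte[] blockFor(long blockNumber) {
        byte[] block = blocks.get(blockNumber);
        if (block == null) {
            block = new byte[BLOCK_SIZE]; // stand-in for a real file read
            diskReads++;
            blocks.put(blockNumber, block);
        }
        return block;
    }

    int cachedBlocks() { return blocks.size(); }
    int diskReads() { return diskReads; }
}
```

The cache starts empty and fills only as blocks are touched, so a large bound costs nothing until it is actually used.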

Allocating and disposing of a Cu cursor

All database update and retrieval operations require a Cu to be allocated. The typical code, which will not be repeated elsewhere, looks like:
    Cu cu = Cu.alloc();
    try {
        // use cu in various ways to do database updates and retrievals
    } finally {
        cu.dispose(); // this is optional, for speed
    }
It is not necessary to dispose a Cu. However, a Cu is about 3KB in size, so disposing it is recommended if garbage collection is to be minimized. The try .. finally block is therefore also optional. Java 1.4 JVMs can GC a Cu in about 15K cycles, which is comparable to the cost of the basic InfinityDB operations, while alloc()/dispose() takes about 1600 cycles. There is a pool of Cu's whose size can be adjusted with Cu.setPoolSize(int newPoolSize); the default is 20 Cu's. The pool size need not be larger than the maximum number of Cu's allocated at any one instant. The pool starts empty and grows as needed up to the limit, so setting a high limit is not necessarily wasteful.
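The pool behaviour described above is the classic bounded object pool. The pool starts empty, grows on demand, is capped at the pool size, and leaves any overflow to the garbage collector. The following is a minimal sketch of that technique with hypothetical names. It is not InfinityDB's code, and a real Cu pool would presumably also reset the cursor's contents on dispose.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical bounded object pool mirroring the behaviour described for Cu;
// PoolSketch, alloc, dispose and setPoolSize here are illustrative, NOT InfinityDB's code.
class PoolSketch<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int maxSize;
    private int created = 0; // how many objects the factory has built so far

    PoolSketch(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    void setPoolSize(int newPoolSize) {
        maxSize = newPoolSize;
        while (idle.size() > maxSize) idle.pop(); // drop extras; the GC reclaims them
    }

    T alloc() {
        T t = idle.poll(); // reuse an idle object if one is available
        if (t == null) {
            t = factory.get(); // pool empty: build a new one
            created++;
        }
        return t;
    }

    void dispose(T t) {
        if (idle.size() < maxSize) idle.push(t); // otherwise leave it to the garbage collector
    }

    int idleCount() { return idle.size(); }
    int createdCount() { return created; }
}
```

Because idle objects are only kept up to maxSize, a generous limit costs memory only when that many objects were actually in use at once.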

A Cu can only be used by one Thread at a time, although any number of Threads can be using the database with different Cu's at a time. (To detect whether an application is accidentally sharing Cu's, use InfinityDB.setCheckForConcurrentCuModification(true).) A single Thread will often allocate multiple Cu's at a time, but they are almost always only used for a short time before being disposed.  

Updates: insert() and delete()

The database update operations are very simple. A Cu is initialized with data and is inserted into the database in two operations:
    cu.clear().append("hello world"); // Temporarily use Cu.
    db.insert(cu); // Now db contains one more 'Item'
Delete works identically: substitute db.delete(cu), and the Item in cu will be removed from the database.

The ItemSpace data model

The insert() and delete() operations do not create records with fields as in most databases. The semantics of an InfinityDB are more flexible and simple, providing a base for the equivalent of record-oriented access, as well as for creative structures of many kinds. The lowest-level InfinityDB data model is the ItemSpace. An ItemSpace is a container for a set of Items. An Item is a simple variable-length char sequence, which is similar to a key in a record-oriented model, but an Item combines the key and the data into one unit. An Item can be up to 1.6K chars long, but this length limitation does not create actual limits on the size of anything to be stored, because in practice data can always be broken down into a set of Items. An Item is not an Object, only a conceptual sequence of chars. Two Items with the same sequence of chars are identical.

The Items in an ItemSpace are kept in ascending order. For comparison purposes, the initial chars of an Item are the most significant, and an Item that is a prefix of another Item sorts before it. There is no inherent limit to the number of Items in an ItemSpace. An ItemSpace holds no information beyond the Items themselves: there is no separate 'data' or 'record' attached to an Item, and no flat file or other separate data store. The Items in the ItemSpace, which resides in one InfinityDB file, contain all of the information. An ItemSpace can also be empty.
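For experimenting with these ordering rules, a java.util.TreeSet<String> behaves much like a tiny in-memory ItemSpace. String.compareTo compares chars from the start, and a String that is a prefix of another sorts first. This is a minimal sketch in which plain text stands in for component-encoded Items. ItemSpaceSketch is a hypothetical name, and none of this is the InfinityDB API.

```java
import java.util.TreeSet;

// Hypothetical in-memory stand-in for an ItemSpace: a sorted set of char sequences.
// NOT the InfinityDB engine; it only mirrors the data model described above.
class ItemSpaceSketch {
    // String.compareTo orders by char value, most significant char first,
    // and a String that is a prefix of another sorts before it.
    private final TreeSet<String> items = new TreeSet<>();

    boolean insert(String item) { return items.add(item); }    // false if already present
    boolean delete(String item) { return items.remove(item); } // false if absent
    TreeSet<String> snapshot() { return new TreeSet<>(items); }

    public static void main(String[] args) {
        ItemSpaceSketch space = new ItemSpaceSketch();
        space.insert("hello world");
        space.insert("hello");
        space.insert("help");
        space.insert("hello"); // identical Items are the same Item: no duplicate is stored
        System.out.println(space.snapshot()); // [hello, hello world, help]
        space.delete("hello world");
        System.out.println(space.snapshot()); // [hello, help]
    }
}
```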

A Cu cursor contains no information other than a single Item and is not connected to a particular ItemSpace in any way. Since an Item is not an Object, but just a conceptual sequence of chars, it can live equally well either in an ItemSpace or in a Cu or anywhere else, such as in a char[] or, less often, a byte[]. The Item in a Cu does not need to be in the ItemSpace.

The Items in an ItemSpace can be inserted and deleted randomly and can be retrieved randomly or sequentially in both directions by using a Cu. At the ItemSpace level, there are no semantics attached to the Items, and the fundamental operations are very simple. The simplicity of this lowest-level data view allows the InfinityDB engine to concentrate on performance, while semantics are applied by upper layers.

In order for this basic data model to be useful, the sorted sequence of Items in the ItemSpace needs to be organized. This is partially accomplished by keeping related Items together by giving them a shared prefix of some kind. This is a very general concept, and it shows up in many higher-level structures. Another organizational idea is the component, which is described next.

The contents of an Item: Components

To help application code recognize the meaning of a given Item, the application, with the help of methods in the Cu class, interprets the sequence of chars in an Item as a sequence of components. The components of an Item are self-delimiting, variable-length representations of the Java primitives and similar basic classes. (Actually, only the widest promotions are used, so byte, short, char, and int are stored efficiently as long.) Hence an Item can be considered a sequence of Java primitives (plus more component types described later). The type of each component can be determined at runtime, and the offsets of the components in the Item can be found by parsing them from the start of the Item. Here is the code to create an Item in a Cu containing some Java primitive components:
    cu.clear().append("a string").append(5).append(2.6).append(true);
    System.out.println(cu);
    // prints:
    // "a string" 5 2.6 true
    // This is not what StringBuffer would print!
The Cu is cleared to make sure it is empty. Then the variable-length representations of the various primitives are concatenated inside the Cu to form a single Item. Most Cu methods return the Cu itself (this), so calls can be chained. Each primitive takes a variable number of chars in the Item. A long component, for example, takes only one char to represent a small value such as 0 or 9, but more chars for larger numbers. If a component to be appended would not fit, CursorLengthException is thrown and the Cu is left unchanged. In practice, CursorLengthExceptions are extremely rare.

Now we can parse back these primitives:

    int offset = 0;
    String s = cu.stringAt(offset); // get the initial string component
    offset = cu.skipComponent(offset); // parse over the string
    long n = cu.longAt(offset); // get back the long
    offset = cu.skipComponent(offset); // skip the long
    double d = cu.doubleAt(offset);
    offset = cu.skipComponent(offset);
    boolean b = cu.booleanAt(offset);
The append() method may look like that of java.lang.StringBuffer, but the chars inside the Cu cursor do not form a single printable String. Instead, they form a sequence of self-describing, self-delimiting, binary-encoded components.
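Here is a toy encoding in the same spirit as the Cu methods above. It makes "self-delimiting" and "parse by offset" concrete. It is a minimal sketch under stated assumptions: the tag values, the length-prefixed long format, the string terminator, and the class name ComponentSketch are all invented for illustration. The real Cu encoding is documented in the Cu javadoc and is not reproduced here. The long format was chosen so that, for non-negative values, comparing the encoded chars gives numeric order.

```java
// Toy self-delimiting component encoding inside a char sequence.
// NOT InfinityDB's actual format: tags, layout and the no-'\0'-in-strings rule are assumptions.
class ComponentSketch {
    static final char LONG_TAG = 1, STRING_TAG = 2, STRING_END = 0;

    // Appends a non-negative long: tag, char count n (0..4), then n chars, most significant first.
    static StringBuilder appendLong(StringBuilder item, long v) {
        if (v < 0) throw new IllegalArgumentException("sketch handles non-negative longs only");
        int n = 0;
        for (long t = v; t != 0; t >>>= 16) n++; // number of 16-bit chars needed
        item.append(LONG_TAG).append((char) n);
        for (int i = n - 1; i >= 0; i--) item.append((char) (v >>> (16 * i)));
        return item;
    }

    // Appends a string: tag, its chars, then a terminator char 0 (so strings may not contain it).
    static StringBuilder appendString(StringBuilder item, String s) {
        if (s.indexOf(STRING_END) >= 0) throw new IllegalArgumentException("char 0 not supported");
        return item.append(STRING_TAG).append(s).append(STRING_END);
    }

    static long longAt(CharSequence item, int offset) {
        if (item.charAt(offset) != LONG_TAG) throw new IllegalStateException("not a long");
        int n = item.charAt(offset + 1);
        long v = 0;
        for (int i = 0; i < n; i++) v = (v << 16) | item.charAt(offset + 2 + i);
        return v;
    }

    static String stringAt(CharSequence item, int offset) {
        if (item.charAt(offset) != STRING_TAG) throw new IllegalStateException("not a string");
        int end = offset + 1;
        while (item.charAt(end) != STRING_END) end++;
        return item.subSequence(offset + 1, end).toString();
    }

    // Returns the offset just past the component at offset, using only that component's own chars.
    static int skipComponent(CharSequence item, int offset) {
        char tag = item.charAt(offset);
        if (tag == LONG_TAG) return offset + 2 + item.charAt(offset + 1);
        if (tag == STRING_TAG) {
            int end = offset + 1;
            while (item.charAt(end) != STRING_END) end++;
            return end + 1;
        }
        throw new IllegalStateException("unknown tag " + (int) tag);
    }
}
```

Parsing follows the same pattern as the Cu code above. Read the component at the current offset, then call skipComponent() to advance to the next one.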

Given the Java primitives, a wide range of applications of the ItemSpace model is possible. Other component types help in creating further semantic levels; these are described in later sections. For a complete description of each component type, see the javadoc for Cu, and also the Component Type Printable Representations section. For now, let's investigate getting information back from a set of component-structured Items in the ItemSpace db.

The simple retrieval operations

Retrieval operations all operate on a subset of the Items in the ItemSpace that have a common prefix. Below we insert a set of Items with a common prefix, then scan them back.
    cu.clear().append("customers").append("mary");
    db.insert(cu);
    cu.clear().append("customers").append("john");
    db.insert(cu);
    
    // and so on. Now retrieve the set of customer names
    
    cu.clear().append("customers"); // the prefix for all customers
    int prefixLength = cu.length(); // the size of the Item in cu
    while (db.next(cu, prefixLength)) {
        System.out.println(cu.stringAt(prefixLength));
    }
The next(Cu, int) method scans forwards over the Items in db that have the prefix "customers". next() modifies the Cu parameter, but the prefixLength parameter says how many chars at the start of the Item in cu are protected from alteration. When the next Item in sequence in the ItemSpace has a different prefix, next() returns false without modifying the Item in cu. This technique can enumerate the Items in any subspace, that is, any set of Items sharing a common prefix.
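The behaviour of next(Cu, int) described above can be modelled over a TreeSet<String>. In this model, a StringBuilder stands in for the Cu, and a '/' separator stands in for component boundaries. This is a hypothetical sketch of the semantics only, not the engine. It relies on the fact that Items sharing a prefix are contiguous in sorted order, so the first Item after the cursor either has the prefix or no remaining Item does.

```java
import java.util.TreeSet;

// Hypothetical model of next(cu, protectedPrefixLength) over a sorted set of Items.
// A StringBuilder plays the role of the Cu; this is NOT the InfinityDB implementation.
class NextSketch {
    // Moves cu to the next Item after it that keeps cu's first prefixLength chars.
    // Returns false and leaves cu untouched if there is no such Item.
    static boolean next(TreeSet<String> space, StringBuilder cu, int prefixLength) {
        String prefix = cu.substring(0, prefixLength);
        String following = space.higher(cu.toString()); // strictly greater: the given Item is skipped
        if (following == null || !following.startsWith(prefix)) return false;
        cu.setLength(0);
        cu.append(following);
        return true;
    }

    public static void main(String[] args) {
        TreeSet<String> space = new TreeSet<>();
        space.add("customers/john");
        space.add("customers/mary");
        space.add("products/lamp"); // different prefix: never visited
        StringBuilder cu = new StringBuilder("customers/");
        int prefixLength = cu.length();
        while (next(space, cu, prefixLength)) {
            System.out.println(cu.substring(prefixLength)); // john, then mary
        }
    }
}
```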

It is important to note here that it is very efficient in any normal ItemSpace to store Items with shared prefixes: the shared prefixes are compressed out (with character granularity). This is only an implementation detail, but it nevertheless affects how we think about organizing the entire database. We often store huge numbers of Items with sometimes long common prefixes.
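Prefix compression of a sorted sequence is commonly done with front coding. Each entry records how many leading chars it shares with its predecessor and stores only the remainder. The sketch below shows that generic technique. InfinityDB's actual block format is not described in this document, so treat the sketch as an illustration of why long shared prefixes are cheap, not as the engine's encoding. FrontCodingSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Generic front coding ("prefix compression") of a sorted list of Items.
// Illustrative only: NOT InfinityDB's on-disk format.
class FrontCodingSketch {
    // One compressed entry: chars shared with the previous Item, plus the remaining suffix.
    static final class Entry {
        final int sharedLength;
        final String suffix;
        Entry(int sharedLength, String suffix) { this.sharedLength = sharedLength; this.suffix = suffix; }
    }

    static List<Entry> compress(List<String> sortedItems) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String item : sortedItems) {
            int shared = 0;
            int max = Math.min(prev.length(), item.length());
            while (shared < max && prev.charAt(shared) == item.charAt(shared)) shared++;
            out.add(new Entry(shared, item.substring(shared)));
            prev = item;
        }
        return out;
    }

    static List<String> decompress(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String item = prev.substring(0, e.sharedLength) + e.suffix; // rebuild from predecessor
            out.add(item);
            prev = item;
        }
        return out;
    }
}
```

With neighbours such as "customers/john", "customers/mary", and "customers/maryann", only the short suffixes after the first Item need to be stored.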

There are also operations for scanning backwards, and for scanning without skipping the given Item if present in the db. All of these operations interpret the two parameters the same way, as described above. The operations are:

Method (each throws IOException)                    Direction   Given Item, if present in db
boolean next(Cu cu, int protectedPrefixLength)      forwards    skipped
boolean first(Cu cu, int protectedPrefixLength)     forwards    not skipped
boolean previous(Cu cu, int protectedPrefixLength)  backwards   skipped
boolean last(Cu cu, int protectedPrefixLength)      backwards   not skipped
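The four rows of the table map naturally onto the navigation methods of java.util.TreeSet. next uses higher, first uses ceiling, previous uses lower, and last uses floor. In each case, the result is accepted only if it keeps the protected prefix. This is a hypothetical model with a StringBuilder standing in for the Cu, not the InfinityDB engine. It follows the table literally, so backward operations search downward from the Item currently in the cursor. How the real engine positions a backward scan when the cursor holds only a bare prefix is not covered by the table, and the sketch does not model it.

```java
import java.util.TreeSet;

// Hypothetical model of next/first/previous/last over a TreeSet of Items.
// A StringBuilder stands in for the Cu; this is NOT the InfinityDB engine.
class RetrievalSketch {
    // Accept found only if it keeps the protected prefix; otherwise leave cu unchanged.
    private static boolean moveTo(String found, StringBuilder cu, int prefixLength) {
        if (found == null || !found.startsWith(cu.substring(0, prefixLength))) return false;
        cu.setLength(0);
        cu.append(found);
        return true;
    }

    // forwards, given Item skipped
    static boolean next(TreeSet<String> s, StringBuilder cu, int p) { return moveTo(s.higher(cu.toString()), cu, p); }
    // forwards, given Item not skipped
    static boolean first(TreeSet<String> s, StringBuilder cu, int p) { return moveTo(s.ceiling(cu.toString()), cu, p); }
    // backwards, given Item skipped
    static boolean previous(TreeSet<String> s, StringBuilder cu, int p) { return moveTo(s.lower(cu.toString()), cu, p); }
    // backwards, given Item not skipped
    static boolean last(TreeSet<String> s, StringBuilder cu, int p) { return moveTo(s.floor(cu.toString()), cu, p); }
}
```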

Getting back a Record

Now that we can enumerate a set of customers, how do we store and retrieve the customer 'records' and the field values in them? Below is one fast and simple way. We concatenate the record's fields onto the customer table prefix and the customer id to form a single Item, and store it in one operation. A primary-key retrieval also takes one operation. We use the first component of the Item as a 'table name', the second as the 'key', and the rest as the 'fields' of the record.
	// We have a customerId of any primitive type. Create and save a record for it.
	long customerId = 1000;
	long zipCode = 91633;
	String spouseName = "Cynthia";
	double balance = 918.31;
	cu.clear().append("customer table").append(customerId)
		.append(zipCode).append(spouseName).append(balance);
	db.insert(cu); // create a single 'record' in the "customer table"
	// deleting any previous record for this customer is omitted here
Now the db contains a customer row. Let's get it back:
	void printCustomer(long customerId) throws IOException {
		Cu cu = Cu.alloc();
		try {
			cu.clear().append("customer table").append(customerId); // the prefix
			int protectedPrefixLength = cu.length(); // how many chars to preserve at start of cu
			if (db.next(cu, protectedPrefixLength)) {
				int offset = protectedPrefixLength; // offset moves rightwards through the record
				long zipCode = cu.longAt(offset); // get long component's value at offset in the record
				offset = cu.skipComponent(offset); // move over zipCode to spouseName
				String spouseName = cu.stringAt(offset); // get string component's value at offset
				offset = cu.skipComponent(offset); // move over spouseName to balance
				double balance = cu.doubleAt(offset);
				System.out.println(customerId + " " + zipCode + " " + spouseName + " " + balance);
			} else {
				// record not present; cu was not modified
			}
		} finally {
			cu.dispose();
		}
	}
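The record layout above can be tried out without a database. Each Item is modelled as a list of components, and the ItemSpace as a sorted set of those lists, compared component by component. This is a minimal sketch that assumes components at the same position always share a type. The class and method names are hypothetical, and this is not the InfinityDB API. The lookup mirrors printCustomer(): it finds the first Item at or after the prefix and accepts it only if the prefix matches.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model: an Item as a list of components, an ItemSpace as a sorted set of such lists.
// Components at the same position are assumed to have the same type (String, Long, Double).
// This mirrors the "record as one Item" layout above; it is NOT the InfinityDB API.
class RecordSketch {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static final Comparator<List<Object>> COMPONENT_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = ((Comparable) a.get(i)).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size()); // a prefix sorts first
    };

    final TreeSet<List<Object>> db = new TreeSet<>(COMPONENT_ORDER);

    void insertCustomer(long customerId, long zipCode, String spouseName, double balance) {
        db.add(Arrays.asList("customer table", customerId, zipCode, spouseName, balance));
    }

    // Primary-key lookup: the first Item at or after the prefix, if it still has the prefix.
    List<Object> findCustomer(long customerId) {
        List<Object> prefix = Arrays.asList("customer table", customerId);
        List<Object> item = db.ceiling(prefix);
        if (item == null || item.size() < prefix.size()
                || !item.subList(0, prefix.size()).equals(prefix)) return null;
        return item.subList(prefix.size(), item.size()); // the "fields": zipCode, spouseName, balance
    }
}
```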

Now suppose we want to access by zipCode. We need to insert Items with a different sequence of components that will sort by zipCode. Here is how to insert them:

	// Create and save a zipCode index entry on customers.
	// Do this when the customer row is inserted.
	cu.clear().append("customers by zipcode index").append(zipCode).append(customerId);
	db.insert(cu);
Now let's retrieve all customers having a given zipCode. Similar code has already appeared above in enumerating customers, but here it is used with an index:
	// loop over customers in the index given zipCode
	cu.clear().append("customers by zipcode index").append(zipCode);
	int prefixLength = cu.length(); // the size of the Item in cu
	while (db.next(cu, prefixLength)) {
		long customerId = cu.longAt(prefixLength);
		printCustomer(customerId); // method defined above
	}
Most of the work done above can be factored out into helper methods or otherwise encapsulated. This simple system, which represents records directly, works effectively and efficiently. Many other data representations can be imposed on the ItemSpace model. Another, more flexible model, a superset of the relational data model, is called Entity-Attribute-Value.
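The zipCode index technique from this section can also be modelled in memory with the same list-of-components idea. The record Item and its index Item are inserted together. A query walks the contiguous run of index Items that share the ("customers by zipcode index", zipCode) prefix, just as the next() loop does. This is a hypothetical sketch assuming same-typed components per position. IndexSketch and its methods are illustrative names, not InfinityDB API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of a record table plus a secondary index; NOT the InfinityDB API.
class IndexSketch {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static final Comparator<List<Object>> COMPONENT_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = ((Comparable) a.get(i)).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size()); // a prefix sorts first
    };

    final TreeSet<List<Object>> db = new TreeSet<>(COMPONENT_ORDER);

    // Insert the record Item and its index Item together.
    void insertCustomer(long customerId, long zipCode, String spouseName) {
        db.add(Arrays.asList("customer table", customerId, zipCode, spouseName));
        db.add(Arrays.asList("customers by zipcode index", zipCode, customerId));
    }

    // Equivalent of the next() loop: walk Items after the prefix until the prefix changes.
    List<Long> customersInZip(long zipCode) {
        List<Object> prefix = Arrays.asList("customers by zipcode index", zipCode);
        List<Long> ids = new ArrayList<>();
        for (List<Object> item : db.tailSet(prefix, false)) {
            if (item.size() < prefix.size() || !item.subList(0, prefix.size()).equals(prefix)) break;
            ids.add((Long) item.get(prefix.size())); // the customerId component
        }
        return ids;
    }
}
```

Because index Items sort by zipCode and then by customerId, the ids for one zipCode come back in ascending order without any extra sorting.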



Copyright © 1997-2006 Boiler Bay.