Object Persistence

Use of BinaryLongObjects

It is easy to persist Objects in an InfinityDB by using the existing Java serialization system. The Object can simply be written to a BinaryLongObject through a BinaryLongObjectOutputStream:
	// db is an existing ItemSpace, such as the InfinityDB itself, and cu is an existing Cu
	cu.clear().append("any prefix that identifies the serialized object");
	BinaryLongObjectOutputStream bos = new BinaryLongObjectOutputStream(db, cu);
	ObjectOutputStream oos = new ObjectOutputStream(bos);
	oos.writeObject(myObject);
	oos.close();
Because this is standard Java serialization, it handles nested and recursive Object structures, including looping (cyclic) references.
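As a rough illustration of how serialization copes with looping structures, the following self-contained sketch round-trips a two-node cycle through standard Java serialization. A ByteArrayOutputStream stands in for the BinaryLongObjectOutputStream so that the code runs without a database; the GraphNode and CyclicSerializationDemo names are invented for this example and are not part of InfinityDB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical example class: a node that can point back into the graph.
class GraphNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    GraphNode next;

    GraphNode(String name) {
        this.name = name;
    }
}

class CyclicSerializationDemo {
    // Stand-in for writing to a BinaryLongObjectOutputStream: the bytes go to memory.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GraphNode a = new GraphNode("a");
        GraphNode b = new GraphNode("b");
        a.next = b;
        b.next = a; // close the loop
        GraphNode copy = (GraphNode) deserialize(serialize(a));
        // The copy is a new graph whose cycle leads back to the same copied node.
        System.out.println(copy.next.next == copy); // prints "true"
    }
}
```

The serialization stream records back-references to Objects it has already written, which is why the loop terminates and the cycle survives the round trip.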

Re-use of BinaryLongObject Streams

The BinaryLongObjectOutputStream contains a Cu, which is not something we want to garbage collect frequently (it is about 3KB), so we want to re-use it. We can call setKeepResourcesOnClose(true) on the BinaryLongObjectOutputStream, and then change its ItemSpace and prefix before writing the next Object:

	bos.setKeepResourcesOnClose(true);
	...
	oos.close();
	...
	cu.clear().append(objectPrefix2);
	bos.setSpaceAndPrefix(db, cu);
	oos = new ObjectOutputStream(bos); // oos was closed above, so create a new ObjectOutputStream
	oos.writeObject(myObject2);
	oos.close();
The BLOB method is efficient because each serialized Object takes only the space it really needs (see BLOB efficiency). However, there is no way to do navigation or modification in-place, or to store very large graphs using the Java serialization mechanism.

Abstract Datatypes

To encapsulate the persistence of small Objects, or to persist Objects that represent composites such as composite keys or short records, we can create an abstract datatype by implementing the ItemHolder interface. The 'serialized' form of such an Object must fit in a single Item, so it cannot be more than 1.6KChars long (note that the representations of the primitives in an Item are variable-length). The appendTo(Cu) method gets the serialized form out of an ItemHolder, and setItem(Cu cu, int off) parses a serialized Item into an ItemHolder, returning the offset just past the components it consumed. There is also a setItem(Object o), which accepts another compatible Object and sets the internal state to match. An example abstract datatype class follows.
class PersonId implements ItemHolder {
    String firstName;
    String lastName;
    long birthDate;

    public PersonId() {}

    public PersonId(String firstName, String lastName, long birthDate) {
        this.firstName  = firstName;
        this.lastName   = lastName;
        this.birthDate  = birthDate;
    }

    // Implement ItemHolder
    // This parses the components out of a Cu into member variables

    public int setItem(Cu cu, int off) {
        firstName = cu.stringAt(off);
        off = cu.skipString(off);
        lastName = cu.stringAt(off);
        off = cu.skipString(off);
        // timeAt() not longAt() so Cu.toString() shows it as a date etc
        birthDate = cu.timeAt(off);
        off = cu.skipTime(off);
        return off;
    }

    // Implement ItemHolder

    public void setItem(Object o) {
        PersonId p  = (PersonId) o;
        firstName   = p.firstName;
        lastName    = p.lastName;
        birthDate   = p.birthDate;
    }

    // Implement CuAppendable, the superinterface of ItemHolder

    public void appendTo(Cu cu) {
        // Note the appendTime() not append().
        cu.append(firstName).append(lastName).appendTime(birthDate);
    }
}
The above code is completely surprise-free (in the future it may be automatically generated). The three ItemHolder methods can easily be added to an existing class. Now we can append a PersonId to a Cu anywhere, as if it were a Java primitive, such as in this code that persists it and creates an inversion using it:
    void insertCardId(ItemSpace db, PersonId personId, long cardId) throws IOException {
        // Append the person to any prefix, like this EAV prefix
        Cu cu = Cu.alloc().append(CREDIT_CARD).append(cardId)
            .append(CARD_HOLDER).append(personId);
        db.insert(cu); // insert into db in one operation
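        // and we can use a PersonId as an Entity too in an inverse Item.
        // Clear the Cu first so the inverse Item does not extend the previous one.
        cu.clear().append(PERSON).append(personId).append(HOLDS_CARD).append(cardId);
        db.insert(cu);
        cu.dispose();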
    }
And to get it back we use some typical code:
    PersonId getPersonIdForCardId(ItemSpace db, long cardId) throws IOException {
        Cu cu = Cu.alloc().append(CREDIT_CARD).append(cardId)
                          .append(CARD_HOLDER);
        int pl = cu.length();
        PersonId personId = null; // we handle a missing personId Item
        if (db.next(cu, pl)) {
            personId = new PersonId();
            personId.setItem(cu, pl); // parse the Item in cu into personId
        }
        cu.dispose();
        return personId;
    }

Extensible Abstract Datatypes

It is possible for an abstract datatype implementing ItemHolder to be 'extensible', in that there can be optional components at the end of the serialized form in an Item. In setItem(Cu cu, int off), the Object can parse itself out of a Cu conditionally, depending on the components it sees in the input Item, and in appendTo(Cu) it can conditionally append components to the serialized Item. For example, the application may be extended to allow a Person optionally to include a unique identifier (a social security number or SSN in the US) rather than relying on the name and birthdate. For this we just add some code to the ItemHolder implementation methods:

/**
 * A PersonId with an optional SSN extension component. This code just shows the
 * new parts beyond PersonId. The sort order is unchanged.
 */
class PersonIdWithSsn extends PersonId {
    // ..
    long ssn; // The optional extension to firstName, lastName, and birthDate

    // ..
    public PersonIdWithSsn(String firstName, String lastName, long birthDate, long ssn) {
        // ..
        this.ssn = ssn;
    }

    // Implement ItemHolder

    public int setItem(Cu cu, int off) {
        // ..
        ssn = 0;
        if (cu.length() > off && cu.typeAt(off) == Cu.LONG_TYPE) {
            ssn = cu.longAt(off);
            off = cu.skipLong(off);
        }
        return off;
    }

    // Implement ItemHolder

    public void setItem(Object o) {
        PersonId p = (PersonId) o;
        // ..
        if (p instanceof PersonIdWithSsn) {
            ssn = ((PersonIdWithSsn) p).ssn;
        }
    }

    // Implement CuAppendable, the superinterface of ItemHolder

    public void appendTo(Cu cu) {
        // appendTime() not append(), to match timeAt() in PersonId.setItem()
        cu.append(firstName).append(lastName).appendTime(birthDate);
        if (ssn != 0)
            cu.append(ssn);
    }
}

This abstract datatype is backwards-compatible with older databases, and it dynamically upgrades the databases it finds, but only as necessary, when the SSN is not 0. Older applications can continue to work with upgraded databases, but if they write to an upgraded database, the SSNs of the Persons they write will be lost. If this is not acceptable, the older version of the application can be limited to read access, and it will then work properly.

For forwards compatibility as well, the original implementation of PersonId can deal generically with whatever components it finds following the components it expects. Generic access is easy using Object Cu.componentAt(int off), int Cu.typeAt(int off), and Cu.append(Object o). Thus the original PersonId could be forwards and backwards compatible if it anticipated some kind of extension. PersonId could even allocate and keep a special internal Cu in which to hold unexpected trailing components, or it could just have a few extension Object member variables, one per extension component.
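The idea of holding on to unexpected trailing components can be sketched without a database. In the following minimal sketch a List<Object> stands in for the components of an Item in a Cu, so none of the calls are the real Cu API; the class name ForwardCompatiblePersonId and its extensions field are invented for the illustration. A real implementation would use Cu.componentAt(), Cu.typeAt() and Cu.append(Object), and would have to stop at whatever component delimits the PersonId.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: a List<Object> stands in for the components of an Item in a Cu.
class ForwardCompatiblePersonId {
    String firstName;
    String lastName;
    long birthDate;
    // Components written by a newer version that this version does not understand.
    final List<Object> extensions = new ArrayList<>();

    // Analogous to setItem(Cu cu, int off): parse the known components, keep the rest.
    int setItem(List<Object> item, int off) {
        firstName = (String) item.get(off++);
        lastName = (String) item.get(off++);
        birthDate = (Long) item.get(off++);
        extensions.clear();
        while (off < item.size()) {
            extensions.add(item.get(off++));
        }
        return off;
    }

    // Analogous to appendTo(Cu cu): write the known components, then the preserved ones.
    void appendTo(List<Object> item) {
        item.add(firstName);
        item.add(lastName);
        item.add(birthDate);
        item.addAll(extensions);
    }

    public static void main(String[] args) {
        // An Item written by a newer version with one extra trailing component.
        List<Object> stored = Arrays.<Object>asList("Ada", "Lovelace", 100L, 123456789L);
        ForwardCompatiblePersonId id = new ForwardCompatiblePersonId();
        id.setItem(stored, 0);
        List<Object> rewritten = new ArrayList<>();
        id.appendTo(rewritten);
        System.out.println(rewritten.equals(stored)); // prints "true"
    }
}
```

Because the unknown components are written back unchanged, an older version of the class can update an Item without discarding what a newer version stored in it.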

Heterogeneous Abstract Datatypes

It is also possible to have an abstract datatype use mixtures of completely different or 'heterogeneous' patterns of components, beyond only adding components at the end.

An example heterogeneous abstract datatype is a StreetAddress. The first component identifies a country. After the country come components that depend on the address structure of that country. There can be more or fewer address lines, and different countries will have different types of components for postal codes: in the US the postal code is the 'zipcode', which is numeric, while in other countries the postal code may be a string or even a composite.
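A country-dependent layout of this kind might look roughly like the sketch below. As an assumption made only for this illustration, the postal code directly follows the country (a Long for "US", a String otherwise) and the address lines are the String components after it. A List<Object> stands in for the components of an Item in a Cu, and StreetAddressSketch is an invented name, not the StreetAddress class used elsewhere in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: a List<Object> stands in for the components of an Item in a Cu.
class StreetAddressSketch {
    String country;
    Object postalCode; // a Long for "US", a String for other countries (assumed layout)
    List<String> lines = new ArrayList<>();

    // Analogous to setItem(Cu cu, int off): the country decides how to read the rest.
    int setItem(List<Object> item, int off) {
        country = (String) item.get(off++);
        Object code = item.get(off++);
        if ("US".equals(country)) {
            postalCode = (Long) code;   // numeric zipcode
        } else {
            postalCode = (String) code; // free-form postal code
        }
        lines = new ArrayList<>();
        // Address lines are the run of String components that follows.
        while (off < item.size() && item.get(off) instanceof String) {
            lines.add((String) item.get(off++));
        }
        return off;
    }

    // Analogous to appendTo(Cu cu).
    void appendTo(List<Object> item) {
        item.add(country);
        item.add(postalCode);
        item.addAll(lines);
    }

    public static void main(String[] args) {
        StreetAddressSketch us = new StreetAddressSketch();
        us.setItem(Arrays.<Object>asList("US", 94103L, "1 Main St", "Suite 2"), 0);
        System.out.println(us.postalCode + " " + us.lines);

        StreetAddressSketch gb = new StreetAddressSketch();
        gb.setItem(Arrays.<Object>asList("GB", "SW1A 1AA", "10 Downing St"), 0);
        System.out.println(gb.postalCode + " " + gb.lines);
    }
}
```

The point of the sketch is only that the first component selects the parsing rule for the remainder; a real datatype would choose its own component order and types.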

If a database is upgraded to have a more complex heterogeneous abstract datatype, forwards compatibility can be provided as described above. Databases can be extended dynamically as needed.

The extensibility described above is not possible with typical persistence systems based on fixed bytecode rewriting.

Encapsulating Database Access

We can also encapsulate the basic database access in a class to provide type safety and to hide the shape of the persisted data. Here is a Person class with a parent/child relationship and Iterators over the entire set of people as well as over the parents or children of a specific person. We also have a set() and a get() that persist or retrieve all of the single-valued fields. The client can set up the Person using getters and setters for the individual single-valued fields, then use set() to persist all of them. The get() sets all of the single-valued fields from the stored Person having the given PersonId.

class Person {
    static final EntityClass PERSON = new EntityClass(3);
    // Inverse of P_PARENTS
    static final Attribute P_CHILDREN = new Attribute(33);
    // Inverse of P_CHILDREN
    static final Attribute P_PARENTS = new Attribute(34);
    static final Attribute P_AGE_AND_STREET = new Attribute(35);
    /*
     * We omit setters/getters for simplicity and just make the fields public
     */
    public ItemSpace db; // database holding Persons
    public PersonId id; // Entity, i.e. Key or ID
    public long age;
    public StreetAddress streetAddress;

    Person(ItemSpace db) {
        this.db = db;
    }

    Person(ItemSpace db, PersonId id) throws IOException {
        this.db = db;
        this.id = id;
    }

    /*
     * A convenient multi-parameter constructor.
     */
    Person(ItemSpace db, PersonId id, long age, StreetAddress streetAddress) {
        this.db = db;
        this.id = id;
        this.age = age;
        this.streetAddress = streetAddress;
    }

    /*
     * Retrieve all single-valued fields. We store all fields in one Item for
     * speed.
     */
    Person get() throws IOException {
        Cu cu = Cu.alloc().append(PERSON).append(id).append(P_AGE_AND_STREET);
        try {
            int off = cu.length();
            if (db.next(cu, off)) {
                age = cu.longAt(off); // Parse components out of Item in cu
                off = cu.skipLong(off);
                streetAddress = new StreetAddress();
                streetAddress.setItem(cu, off); // returns off after address
            }
        } finally {
            cu.dispose();
        }
        return this;
    }

    /*
     * Persist all single-valued fields.
     */
    Person set() throws IOException {
        Cu cu = Cu.alloc().append(PERSON).append(id).append(P_AGE_AND_STREET);
        int off = cu.length();
        cu.append(age).append(streetAddress);
        try {
            db.update(cu, off); // deletes previous Item, inserts new
        } finally {
            cu.dispose();
        }
        return this;
    }

    /*
     * We hide the internal inversion that allows iteration by either parents or
     * children. There is one Item inserted per direction.
     */
    public void insertChild(PersonId child) throws IOException {
        Cu cu = Cu.alloc().append(PERSON).append(id).append(P_CHILDREN).append(child);
        db.insert(cu);
        // Invert
        cu.clear().append(PERSON).append(child).append(P_PARENTS).append(id);
        db.insert(cu);
        cu.dispose();
    }

    /*
     * Iterate over all PersonIds
     */
    ItemSpaceIterator iterator() {
        return new ItemSpaceIterator(db, PERSON);
    }

    ItemSpaceIterator getChildren() {
        return new ItemSpaceIterator(db, PERSON, id, P_CHILDREN);
    }

    ItemSpaceIterator getParents() {
        return new ItemSpaceIterator(db, PERSON, id, P_PARENTS);
    }
}
The above encapsulated code will evolve as the database evolves, adding inversions, single and multi-value fields, CharacterLongObjects, BinaryLongObjects, new and extended abstract datatypes and so on. The stored Person may grow into multiple Items or continue by extending the single Item for speed. The code can be backwards compatible with older databases by making the new components at the end optional. Any new conditional components in the PersonId or the StreetAddress abstract datatypes will be encapsulated in those classes and will not affect Person.

Separator Attributes

Note, however, that if we append a new component like 'income' after an extended StreetAddress that adds an optional zipcode, there will be an ambiguity: the income will be 'absorbed' by the preceding StreetAddress and will be considered a zipcode. To avoid this, we can define a new Attribute P_INCOME and place it before the new income component to delimit it from the preceding StreetAddress. This is an example of one purpose of the Attribute class component: it is a delimiter. Attribute components also come after the entity, e.g. the PersonId, so the PersonId is already delimited in this way and can always be extended with optional components.
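The ambiguity and its fix can be shown in a small stand-alone sketch. Here a List<Object> stands in for the components of an Item in a Cu, AttributeMarker stands in for the Attribute class, and the address is reduced to a street String plus an optional Long zipcode; all of these names, and the id 36, are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: stand-in for an InfinityDB Attribute component.
final class AttributeMarker {
    final int id;

    AttributeMarker(int id) {
        this.id = id;
    }
}

class SeparatorSketch {
    static final AttributeMarker P_INCOME = new AttributeMarker(36); // hypothetical id

    // Skips a street (String) and an optional zipcode (Long) starting at off,
    // returning the offset just past the address.
    static int skipAddress(List<Object> item, int off) {
        off++; // street
        if (off < item.size() && item.get(off) instanceof Long) {
            off++; // optional zipcode
        }
        return off;
    }

    // Returns the income that follows the address, or -1 if none is present.
    static long incomeWithSeparator(List<Object> item) {
        int off = skipAddress(item, 0);
        if (off + 1 < item.size() && item.get(off) == P_INCOME) {
            return (Long) item.get(off + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        // No separator: an address without a zipcode, followed directly by an income.
        List<Object> ambiguous = Arrays.<Object>asList("1 Main St", 50000L);
        // The address parser consumes both components, taking the income as a zipcode.
        System.out.println(skipAddress(ambiguous, 0)); // prints "2"

        // With the separator, the address parser stops at the Attribute.
        List<Object> delimited = Arrays.<Object>asList("1 Main St", P_INCOME, 50000L);
        System.out.println(skipAddress(delimited, 0));      // prints "1"
        System.out.println(incomeWithSeparator(delimited)); // prints "50000"
    }
}
```

The optional-component test in the address parser looks only at component types, so a component of a distinct type placed in between is enough to stop it.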

Using ItemSpaceIterator

When we are iterating the multi-value children attribute, Person.getChildren() will return an ItemSpaceIterator that encapsulates the database access. However, we cannot use the basic Iterator.next() method because it can only return simple values like Long, String, small char arrays and all of the other 'primitives' or other simple components. So, we just use ItemSpaceIterator.next(ItemHolder) instead, with a PersonId as the ItemHolder:
    void printChildren(Person parent) throws IOException {
        PersonId child = new PersonId(); // this receives each child in order
        ItemSpaceIterator children = parent.getChildren();
        while (children.hasNext()) {
            children.next(child); // sets child
            // use child..
        }
        children.dispose();
    }
The special ItemSpaceIterator.next(ItemHolder) avoids casting the return value of a normal Iterator's next(), and it avoids the construction and GC of that return value, so the while loop can be very fast. Note that the dispose() of an ItemSpaceIterator, as in printChildren(), can run at about 3.3M per second, while a GC of the iterator happens at about 500K per second. Since the database access will run at about 250K Items/second, only performance-critical code needs to be concerned with ItemSpaceIterator.dispose().


Copyright © 1997-2006 Boiler Bay.