Index Components for Storing Large Arrays

Previous Next

Semantics

Index Components are useful in storing arrays of any kind of thing as a set of Items sharing a prefix. These are 'vertical arrays' having no size constraints or constraints on the number of dimensions. By 'vertical', we mean that if their Items are displayed in sorted order vertically, as they are in the ItemEditor, the elements of the also come out separated vertically. They can be sparse and ragged, without using extra space. The element types are unlimited, so they can be anything representable as a set of Item suffixes.

An index component has nothing to do with a relational index. The functionality of relational indexes is handled by storing sets of Items with the components in a different order, so that the prefixes of one set of items are the suffixes of the other set and vice-versa. See Inversion.

If the EAV model is used, Values can be any set of Item suffixes whatsoever, so it is possible to use the structures described here as EAV Values.

A 'vertical array' is a set of Items each having a common prefix of any kind, followed by an Index component, and ending with any other sequence of components desired. An Index component contains a long numeric value that identifies an 'offset' within the vertical array.

There is no functional difference between an Index component and a normal long primitive component; Index components are primarily 'documentation'. When an Index component appears in an Item, it is clear what it means. Application code can also distinguish at runtime between a set of Items that are to be grouped together into a vertical array and a set of Items that simply occur together with some common prefix.  

Example

Suppose we have a String[][] that we want to persist. Many of the String elements are null, and we don't want to waste space for them. Each Item will look like:
	"my prefix" [900] [126] "my element"
Where the components in brackets are the Cu.toString() representations of Index components. The common prefix of the array is "my prefix", and that prefix identifies this array instance. The elements are, in this case, simple string components.

Here is how to store the array:

	// cu contains the prefix of the array
	// array[][] is the possibly sparse, possibly ragged array
	int offset = cu.length();
	for (int i = 0; i < array.length; i++) {
		for (int j = 0; j < array[i].length; j++) {
			cu.setLength(offset).appendIndex(i).appendIndex(j)
				.append(array[i][j]);
			db.insert(cu);
		}
	}
Here is how get it back:
	// cu contains the prefix of the array
	// array[][] is the pre-allocated array of the right 
	// shape with the String references all null.
	// If simple enumeration is needed, the shape does not need
	// to be known in advance
	Cu cu = Cu.alloc();
	int offsetOfOuterIndex = cu.length();
	while (db.next(cu, offsetOfOuterIndex)) {
		long i = cu.indexValueAt(offsetOfOuterIndex);
		int offsetOfInnerIndex = cu.skipIndex(offsetOfOuterIndex);
		long j = cu.indexValueAt(offsetOfInnerIndex);
		int offsetOfString = cu.skipIndex(offsetOfInnerIndex);
		String s = cu.stringAt(offsetOfString);
		System.out.println("s[" + i + "][" + j + "]=" + s);
		array[(int)i][(int)j] = s;
	}
Random access over a persisted array is straightforward as well, so it is not always necessary to retrieve a persisted array as a Java array. The following code indexes into a sparse two-dimensional array of Strings of any size (even bigger than Java arrays):
	// prefix identifies the array - it could be "my array" etc
	String get(ItemSpace db, Object prefix, long i, long j) throws IOException {
		Cu cu = Cu.alloc().append(prefix).append(i).append(j);
		try {
			int length = cu.length();
			return db.next(cu, length) ? cu.stringAt(length) : null;
		} finally { 
			cu.dispose(); // just for speed
		}
	}
Storing with random access is simple - look at insertIntoArray() in the next example.  

Retrieving Sparse Arrays of Unknown Size as Java Arrays

Sometimes it is enough simply to enumerate all of the elements or to randomly access the elements of a persisted array that uses index components, even if the array is sparse - this is easy for any number of dimensions. To retrieve sparse arrays of unknown size as actual arrays, a 'sparse array' class can be defined to collect the elements as they are enumerated. Below we define and use a SparseStringArray to allow a simple retrieval of a one-dimensional array stored with index components. (These 'sparse array' classes do not actually squeeze out space for missing elements, and are only used temporarily.)
package com.infinitydb.manualexamples;

import java.io.IOException;

import com.infinitydb.Cu;
import com.infinitydb.ItemSpace;
import com.infinitydb.VolatileItemSpace;

public class PersistingSparseArrays {
	// This prints:
	// s[0]=element[0]
	// s[1]=null
	// s[2]=null
	// s[3]=element[3]

	public static void main(String[] args) throws IOException {
		ItemSpace db = new VolatileItemSpace();
		insertIntoArray(db, "prefix", 0, "element[0]");
		insertIntoArray(db, "prefix", 3, "element[3]");
		String[] s = readSparseStringArray(db, "prefix");
		for (int i = 0; i < s.length; i++) {
			System.out.println("s[" + i + "]=" + s[i]);
		}
	}
	// Does not replace previous array element if it exists.
	// If previous element exists, a multi-value array element is created.
	static void insertIntoArray(ItemSpace db, Object prefix, 
			int i, Object o) throws IOException {
		Cu cu = Cu.alloc().append(prefix)
						.appendIndex(i).append(o);
		db.insert(cu);
		cu.dispose();
	}
	static String[] readSparseStringArray(ItemSpace db, 
			Object prefix) throws IOException {
		Cu cu = Cu.alloc(prefix);
		int offsetOfIndex = cu.length();
		SparseStringArray ssa = new SparseStringArray();
		while (db.next(cu, offsetOfIndex)) {
			long i = cu.indexValueAt(offsetOfIndex);
			int offsetOfString = cu.skipIndex(offsetOfIndex);
			String sElement = cu.stringAt(offsetOfString);
			ssa.set((int)i, sElement);
		}
		cu.dispose();
		return ssa.toStringArray();
	}
}
The helper class SparseStringArray just makes an ArrayList able to handle set()/get() beyond the current size, and provides strong typing:
package com.infinitydb.manualexamples;

import java.util.ArrayList;

public class SparseStringArray {
	ArrayList list = new ArrayList();
	
	public void set(int index, String s) {
		while (index >= list.size()) list.add(null);
		list.set(index, s);
	}
	public String get(int index) {
		if (index >= list.size()) { // prevent IndexOutOfBoundsException
			return null;
		}
		return (String)list.get(index);
	}
	public String[] toStringArray() {
		String[] s = new String[list.size()];
		for (int i = 0; i < list.size(); i++) {
			s[i] = (String)list.get(i);
		}
		return s;
	}
}
 

Retrieving Ragged Arrays as Java Arrays

Multidimensional arrays in Java can be 'ragged', meaning that the subarrays do not all have to be the same size. A two-dimensional ragged array of Strings can be persisted and enumerated very easily, but it is more work to get it back as an actual Java array. The code is almost identical to the code above for single- dimensional arrays. For two dimensions, we use another helper, in this case a SparseStringArrayArray, which provides strong typing. Here is the code to persist and retrieve the ragged array:
package com.infinitydb.manualexamples;

import java.io.IOException;

import com.infinitydb.Cu;
import com.infinitydb.ItemSpace;
import com.infinitydb.VolatileItemSpace;

public class PersistingRaggedArrays {
	// Prints:
	//	s[0][0]=null
	//	s[0][1]=element[0][1]
	//	s[0][2]=null
	//	s[0][3]=element[0][3]
	//	s[1][0]=element[1][0]
	//	s[1][1]=null
	//	s[1][2]=element[1][2]
	public static void main(String[] args) throws IOException {
		ItemSpace db = new VolatileItemSpace();
		insertIntoArray(db, "prefix", 0, 1, "element[0][1]");
		insertIntoArray(db, "prefix", 0, 3, "element[0][3]");
		insertIntoArray(db, "prefix", 1, 0, "element[1][0]");
		insertIntoArray(db, "prefix", 1, 2, "element[1][2]");
		String[][] s = readRaggedStringArrayArray(db, "prefix");
		for (int i = 0; i < s.length; i++) {
			for (int j = 0; j < s[i].length; j++) {
				System.out.println("s[" + i + "][" + j + "]=" + s[i][j]);
			}
		}
	}
	// Does not replace previous array element if it exists.
	// If previous element exists, a multi-value array element is created.
	static void insertIntoArray(ItemSpace db, Object prefix, int i, int j, Object o) throws IOException {
		Cu cu = Cu.alloc().append(prefix)
			.appendIndex(i).appendIndex(j).append(o);
		db.insert(cu);
		cu.dispose();
	}
	static String[][] readRaggedStringArrayArray(ItemSpace db, Object prefix) throws IOException {
		Cu cu = Cu.alloc(prefix);
		int offsetOfOuterIndex = cu.length();
		SparseStringArrayArray ssaa = 
			new SparseStringArrayArray();
		while (db.next(cu, offsetOfOuterIndex)) {
			long i = cu.indexValueAt(offsetOfOuterIndex);
			int offsetOfInnerIndex = cu.skipIndex(offsetOfOuterIndex);
			long j = cu.indexValueAt(offsetOfInnerIndex);
			int offsetOfString = cu.skipIndex(offsetOfInnerIndex);
			String sElement = cu.stringAt(offsetOfString);
			ssaa.set((int)i, (int)j, sElement);
		}
		cu.dispose();
		return ssaa.toStringArrayArray();
	}
}
Here is the helper class using strong typing for an array of arrays of Strings:
package com.infinitydb.manualexamples;

import java.util.ArrayList;

class SparseStringArrayArray {
	ArrayList list = new ArrayList();
	
	SparseStringArray get(int outerIndex) {
		return outerIndex >= list.size() ? null :
			(SparseStringArray)list.get(outerIndex);
	}
	public void set(int outerIndex, SparseStringArray s) {
		while (outerIndex >= list.size()) {
			list.add(null);
		}
		list.set(outerIndex, s);
	}
	public String get(int outerIndex, int innerIndex) {
		if (outerIndex >= list.size()) {
			return null;
		}
		SparseStringArray inner = get(outerIndex);
		return inner == null ? null : inner.get(innerIndex);
	}
	public void set(int outerIndex, int innerIndex, String s) {
		SparseStringArray inner = get(outerIndex);
		if (inner == null) {
			inner = new SparseStringArray();
			set(outerIndex, inner);
		}
		inner.set(innerIndex, s);
	}
	public String[][] toStringArrayArray() {
		String[][] s = new String[list.size()][];
		for (int i = 0; i < s.length; i++) {
			SparseStringArray inner = 
				(SparseStringArray)list.get(i);
			// we choose to convert null inner arrays to String[0];
			s[i] = inner == null ? new String[0] :
				inner.toStringArray();
		}
		return s;
	}
}
 

Further Uses

At runtime, it is possible to look at a set of Items with a common prefix and identify the original Java primitive types or arrays of primitives that were stored. Of course the elements of the array can be more complex than Java primitives (see Complex Values), so it is not always possible to retrieve any set of Item suffixes into Java primitives or arrays. If it is necessary to store Java Object structures with complete generality including references to other Java Objects, InfinityDB's Object persistence can be used.

Another important application of the Index component is within the CharacterLongObject and BinaryLongObject structures. These represent char or byte streams that have been stored in Infinity under a given prefix. InfinityDB Object serialization simply uses standard Java serialization with an InfinityDB BinaryLongObject.

Previous Next


Copyright © 1997-2006 Boiler Bay.