November 15, 2009

new Merger, ItemInput and ItemOutput streaming

There is a new high-speed sorted data stream
merger in InfinityDB as of version 1.1.0 beta,
which is available at infinitydb.com/beta.
This class merges multiple streams of sorted Items into one stream
with performance approximately equal to that of
any one of the inputs. This new
'ItemInputMergerItemInput' functionality and the
existing OrSpace are similar in that they both
represent unions of Items, but they differ in these ways:

* OrSpace wraps a set of base ItemSpaces so that
all of the Items visible in any of the base spaces
are visible in the wrapped space. The result is a fully
random-access, dynamic view that can be plugged in
anywhere an ItemSpace is expected. The
access time is approximately the sum of the
base space access times.
* ItemInputMergerItemInput is neither random-access nor
dynamic, but it is more efficient: its throughput
is approximately that of each input read independently.
Reading all of the Items in all of the streams takes
only a little more than the total time to read each
stream separately. Thousands of inputs can be combined with no
substantial loss of performance. This class gets its input not from an ItemSpace,
but from the new ItemInput functionality, and it presents
its output as an ItemInput as well.
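
To make the idea of merging sorted streams into a union concrete, here is a minimal sketch of a k-way merge. It is not the InfinityDB implementation: plain Strings stand in for Items, the class name MergeSketch and its helper Head are hypothetical, and the use of a priority queue and the dropping of duplicates are assumptions chosen only to illustrate a union of sorted inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Stand-in sketch: plain Strings play the role of sorted Items.
// None of these names are InfinityDB classes.
class MergeSketch {

    // One heap entry: the current head of an input plus the rest of that input.
    private static final class Head implements Comparable<Head> {
        final String item;
        final Iterator<String> rest;

        Head(String item, Iterator<String> rest) {
            this.item = item;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return item.compareTo(other.item);
        }
    }

    // Merges already-sorted inputs into one sorted list, dropping duplicates
    // so that the result behaves like a union (an assumption of this sketch).
    static List<String> merge(List<List<String>> sortedInputs) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (List<String> input : sortedInputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            // Skip an item equal to the one just emitted.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(head.item)) {
                out.add(head.item);
            }
            // Refill the heap from the input that supplied this item.
            if (head.rest.hasNext()) {
                heap.add(new Head(head.rest.next(), head.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = Arrays.asList(
                Arrays.asList("apple", "cherry", "plum"),
                Arrays.asList("banana", "cherry"),
                Arrays.asList("apricot", "zucchini"));
        // Prints [apple, apricot, banana, cherry, plum, zucchini]
        System.out.println(merge(inputs));
    }
}
```

Each input is only ever read forwards, which is why a merge like this cannot offer random access, and the work per Item grows only with the logarithm of the number of inputs in this particular sketch.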

Reading from an ItemInput is simple:

class ItemInput {
public abstract boolean readItem(Cu cu) throws IOException;
public int readItems(byte[] buff, int offset, int length) throws IOException;
//..
}

The second method returns the number of bytes read.
readItems() may look familiar: it has the same signature
as InputStream.read(buf, offset, length), and it works basically
the same way, except that the data returned is formatted
as a sequence of serialized Items. There is also a new way
to write such buffers to an 'ItemOutput' that looks just
like the OutputStream function, and there is also a
single-Item-at-a-time method:

class ItemOutput {
public abstract void insert(Cu cu) throws IOException;
public abstract void delete(Cu cu) throws IOException;
public void writeItems(byte[] buff, final int offset, int length)
throws IOException;
//..
}
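
Since readItems() and writeItems() are described as mirroring the InputStream and OutputStream methods, the familiar buffer copy loop is a reasonable mental model. The sketch below uses only java.io streams as stand-ins; the class name BufferCopySketch is hypothetical. Whether readItems() signals the end of the stream with -1, as InputStream.read does, is an assumption to check against the Javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Stand-in sketch using only java.io. With InfinityDB, in.read(...) would
// presumably become readItems(...) and out.write(...) would become
// writeItems(...); that mapping is an assumption, not confirmed behavior.
class BufferCopySketch {

    // Copies everything from in to out through one reusable buffer and
    // returns the total number of bytes moved.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buff = new byte[bufferSize];
        long total = 0;
        int n;
        // InputStream.read returns -1 at end of stream.
        while ((n = in.read(buff, 0, buff.length)) != -1) {
            out.write(buff, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "pretend these bytes are serialized Items"
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 8);
        System.out.println(copied + " bytes copied");
    }
}
```

The point of the buffer form is that many Items move per call, so the per-call overhead is spread over the whole buffer rather than paid once per Item.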

It is easy to obtain an ItemOutput, because ItemSpace
is now descended from ItemOutput. Hence any writeable
ItemSpace can accept an Item stream, especially in the
serialized buffer format for extreme speed. This is
possible because an ItemOutput is 'stateless'.

An ItemInput is not 'stateless'. Hence an ItemSpace cannot
be used as an ItemInput to get serialized buffers full of
Items. Because it carries state, an ItemInput cannot be used by
multiple threads - unlike any ItemSpace - without getting
results dependent on thread access patterns. Multi-thread
reading is still possible but the client must synchronize
the accesses.
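
Here is a minimal sketch of what client-side synchronization of a shared, position-keeping input might look like. StatefulInput is a hypothetical stand-in, not an InfinityDB class, and Strings stand in for Items; the only point is that every read goes through one lock, so each Item is delivered to exactly one thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class SharedInputSketch {

    // Hypothetical stand-in for an input that remembers its position.
    static final class StatefulInput {
        private final Iterator<String> it;

        StatefulInput(List<String> items) {
            this.it = items.iterator();
        }

        // Returns the next item, or null at the end. Not thread-safe by itself.
        String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Drains one shared input with several threads and returns every item
    // that was read, in whatever order the threads happened to read them.
    static List<String> drainWithThreads(List<String> items, int threadCount)
            throws InterruptedException {
        StatefulInput input = new StatefulInput(items);
        Object lock = new Object();
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                while (true) {
                    String item;
                    // The client, not the input, serializes the reads.
                    synchronized (lock) {
                        item = input.next();
                    }
                    if (item == null) {
                        return;
                    }
                    seen.add(item);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add(String.format("item%04d", i));
        }
        List<String> seen = drainWithThreads(items, 4);
        System.out.println(seen.size() + " items read");
    }
}
```

Which thread receives which Item still depends on scheduling; the lock only guarantees that no Item is skipped or read twice.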

There are several classes to create ItemInputs,
for example, by reading Items from an InputStream. Hence
you can simply wrap an InputStream with InputStreamItemInput,
and read Items from a file. Files can be created
similarly with OutputStreamItemOutput.

To extract Items quickly from an ItemSpace, use the
wrapper ItemSpaceScannerItemInput. It uses an internal
cursor to maintain its position in the ItemSpace as the
Items are streamed out. The wrapper can use a
protected prefix length and a controllable starting
point, which is the familiar pattern for normal enumeration
of an ItemSpace by iterating over a Cu cursor. Actually,
ItemSpaceScannerItemInput uses a new method on ItemSpace
that simplifies and speeds the access:

class ItemSpace {
public int nextItems(Cu cu, int pl, byte[] buff, int offset, int length)
throws IOException;
//..
}

nextItems moves cu forward just far enough to match
the Items streamed into buff.

For the definitive information on all of these, see the new manual
page on ItemInputMergerItemInput and the Javadoc on it
as well as the Javadoc on ItemInput and ItemOutput.

For definitive information on the format of Items in a
stream, look at the Javadoc on ItemPacket.
