ItemInput and ItemOutput
Fast Item Transport
In some cases, it is necessary to transport Items without incurring
the overhead of the multi-threaded, stateless,
random access capabilities of an ItemSpace. This is the
purpose of the ItemInput and ItemOutput
classes. These classes are simple to use: you can fetch
Items in an unsorted sequence from
an ItemInput using the
boolean readItem(Cu cu) and
int readItems(byte[] buff, int offset, int length)
methods. Any sequence of unsorted Items can be sent to an
ItemOutput using the insert(..) methods
or the writeItems(buf, offset, length) method.
ItemInput
The ItemInput.readItem(Cu) method modifies the
Cu so that it contains the next Item and returns true, or else
returns false when no Items remain. The
readItems(byte[] buf, int offset, int length) method
works with a buffer, somewhat like the
read(byte[]buf, int off, int len) function of
InputStream. Items are placed into the buffer
at position offset and are limited by the
length parameter. The method returns the total number of
bytes transferred, or -1 if there are no
more Items. The Items in the buffer are formatted as
ItemPackets. For most uses, it is not necessary
to understand or parse the format of the Items
in the buffer. The buffer must be at least
ItemPacket.MAX_PACKET_LENGTH_BYTES long, which is
enough to contain one Item plus some overhead.
ItemOutput
The ItemOutput is very simple: you can
transmit Items into an ItemOutput using
void insert(Cu cu), insert(Object object),
or void writeItems(byte[] buff, final int offset, int length).
For example, you can transfer Items quickly using the
idiom:
byte[] buff = new byte[ItemPacket.MAX_PACKET_LENGTH_BYTES * 10];
for (int bytesRead = 0; (bytesRead = itemInput.readItems(buff, 0, buff.length)) != -1;) {
    itemOutput.writeItems(buff, 0, bytesRead);
}
In fact, ItemSpace is a subclass of ItemOutput,
so the output methods described above will work without
special effort. ItemOutput is stateless, hence
it does not complicate the multi-threading capability of ItemSpace.
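The buffer contract behind the idiom above can be tried out without InfinityDB. The following is only a minimal sketch: QueueInput and CollectingOutput are hypothetical stand-ins, not InfinityDB classes, and they assume that a read returns whole packets only and returns -1 once nothing is left.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins, NOT the InfinityDB classes: they only mimic the
// readItems/writeItems buffer contract described in the text.
class BufferCopySketch {

    /** Hands out whole packets; returns -1 once every packet is consumed. */
    static final class QueueInput {
        private final Deque<byte[]> packets = new ArrayDeque<>();

        QueueInput(List<byte[]> source) {
            packets.addAll(source);
        }

        int readItems(byte[] buf, int offset, int length) {
            if (packets.isEmpty()) {
                return -1; // no more Items
            }
            int total = 0;
            // Copy as many whole packets as fit; a packet is never split.
            while (!packets.isEmpty() && total + packets.peek().length <= length) {
                byte[] packet = packets.poll();
                System.arraycopy(packet, 0, buf, offset + total, packet.length);
                total += packet.length;
            }
            if (total == 0) {
                throw new IllegalArgumentException("buffer too small for one packet");
            }
            return total;
        }
    }

    /** Collects everything written to it so the result can be inspected. */
    static final class CollectingOutput {
        final ByteArrayOutputStream received = new ByteArrayOutputStream();

        void writeItems(byte[] buf, int offset, int length) {
            received.write(buf, offset, length);
        }
    }

    /** The copy loop from the text; returns the number of bytes moved. */
    static int copyAll(QueueInput in, CollectingOutput out, int bufferSize) {
        byte[] buf = new byte[bufferSize];
        int total = 0;
        int bytesRead;
        while ((bytesRead = in.readItems(buf, 0, buf.length)) != -1) {
            out.writeItems(buf, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> packets = Arrays.asList(
                new byte[] {1, 2}, new byte[] {3, 4, 5, 6}, new byte[] {7, 8});
        CollectingOutput out = new CollectingOutput();
        int copied = copyAll(new QueueInput(packets), out, 6);
        System.out.println(copied + " bytes copied");
    }
}
```

Because the stand-in input never splits a packet, the buffer has to hold at least the largest packet, which is the role ItemPacket.MAX_PACKET_LENGTH_BYTES plays in the real idiom.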
Streaming Out an ItemSpace using ItemSpaceScannerItemInput
In contrast to ItemOutput,
ItemInput is stateful,
because it has a current position. One way to create
an ItemInput is to use the adapter class
ItemSpaceScannerItemInput, which can
wrap any typical ItemSpace:
ItemInput itemInput = new ItemSpaceScannerItemInput(scannedSpace, cuStart, pl);
In the above code, cuStart is the Item in the scannedSpace where
the scan is to begin, and pl is the 'protected length', i.e.
the number of initial chars in the Item that are guaranteed
not to change during the scan. This class is basically
an 'iterator' over a selected part of an ItemSpace.
The ItemSpaceScannerItemInput actually just
holds a Cu containing the current position as it moves
through the ItemSpace. To help the scanner, ItemSpace
provides a general-purpose helper method that uses an
external Cu as the state: boolean next(Cu cu, int pl).
The actual code in ItemSpaceScannerItemInput that
uses it is:
public boolean readItem(Cu cu) throws IOException {
    if (scannedSpace.next(cuScanner, pl)) {
        cu.copyFrom(cuScanner);
        return true;
    }
    return false;
}
Various ItemSpace, ItemInput, and ItemOutput classes
often override the methods mentioned above in order to
provide more speed. For example, VolatileItemSpace
overrides the readItems(buf, offset, length) method
to increase speed dramatically (as of 11/1/2009).
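The idea of a stateless space advanced through an external cursor, as with next(Cu cu, int pl), can be illustrated without InfinityDB. This is only a sketch under assumptions: SortedSpace is a hypothetical stand-in built on a TreeSet of Strings, a StringBuilder plays the role of the Cu, and next is assumed to move to the next strictly greater key that keeps the first pl chars unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in, NOT the InfinityDB API: a sorted set of Strings
// plays the ItemSpace and a StringBuilder plays the Cu.
class ExternalCursorSketch {

    /** Holds only data; the scan position lives entirely in the caller's cursor. */
    static final class SortedSpace {
        private final TreeSet<String> keys = new TreeSet<>();

        void insert(String key) {
            keys.add(key);
        }

        /**
         * Assumed semantics: move the cursor to the next greater key that still
         * starts with the first protectedLength chars of the cursor, or return
         * false and leave the cursor unchanged.
         */
        boolean next(StringBuilder cursor, int protectedLength) {
            String current = cursor.toString();
            String candidate = keys.higher(current);
            if (candidate == null
                    || !candidate.startsWith(current.substring(0, protectedLength))) {
                return false;
            }
            cursor.setLength(0);
            cursor.append(candidate);
            return true;
        }
    }

    /** Collects every key reachable from start without changing the protected prefix. */
    static List<String> scan(SortedSpace space, String start, int protectedLength) {
        StringBuilder cursor = new StringBuilder(start);
        List<String> seen = new ArrayList<>();
        while (space.next(cursor, protectedLength)) {
            seen.add(cursor.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        SortedSpace space = new SortedSpace();
        for (String key : new String[] {"a/1", "b/1", "b/2", "c/1"}) {
            space.insert(key);
        }
        System.out.println(scan(space, "b/", 2)); // prints [b/1, b/2]
    }
}
```

Since the space itself holds no position, any number of scans can run over it at once, each with its own cursor.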
Merging sorted Item streams using ItemInputMergerItemInput
One valuable use of the ItemInput facility is in merging
multiple series of sorted Items into a longer sorted
series. This is useful in merge sorting large numbers of Items,
such as internally in the InfinityDBBasedMergeSorterItemOutput.
The following sample code shows the efficient merging of a large number
of input Item streams:
static final int COUNT = 1*100;
static final int ITEM_SPACES = 10*1000;
static void testItemSpaceMergerItemInput() {
    try {
        ItemSpace db = InfinityDB.create("c:/temp/junk.infdb", true, 100*1000*1000);
        ItemSpace[] spaces = new ItemSpace[ITEM_SPACES];
        ItemInput[] inputs = new ItemInput[ITEM_SPACES];
        for (int i = 0; i < ITEM_SPACES; i++) {
            spaces[i] = new ItemSubspace(db, Long.valueOf(i));
            inputs[i] = new ItemSpaceScannerItemInput(spaces[i]); // make an ordered iterator over spaces[i]
            for (int j = 0; j < COUNT; j++) {
                spaces[i].insert(Long.valueOf(j * ITEM_SPACES + i));
            }
        }
        ItemInput result = new ItemInputMergerItemInput(inputs); // iterate them together sorted
        Cu cu = Cu.alloc();
        while (result.readItem(cu)) { // very fast
            // System.out.println(cu);
        }
        cu.dispose();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
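For readers unfamiliar with the technique, the general idea of merging many sorted streams can be sketched with a standard heap-based k-way merge. This illustrates the concept only and is not a description of how ItemInputMergerItemInput is implemented; plain lists of Long values stand in for the Item streams, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// A generic k-way merge over sorted lists; NOT the InfinityDB implementation.
class KWayMergeSketch {

    /** One heap entry: the current head of a stream plus the rest of that stream. */
    private static final class Head {
        final long value;
        final Iterator<Long> rest;

        Head(long value, Iterator<Long> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges already-sorted inputs into one sorted list. */
    static List<Long> merge(List<List<Long>> sortedInputs) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        for (List<Long> input : sortedInputs) {
            Iterator<Long> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The smallest head across all streams is the next output value.
            Head smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Same value layout as the sample: stream i holds j * streams + i.
        int streams = 3;
        int count = 4;
        List<List<Long>> inputs = new ArrayList<>();
        for (int i = 0; i < streams; i++) {
            List<Long> stream = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                stream.add((long) (j * streams + i));
            }
            inputs.add(stream);
        }
        System.out.println(merge(inputs)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    }
}
```

With a heap, each output value costs a number of comparisons proportional to the logarithm of the number of streams, which is why this style of merge remains practical for thousands of inputs.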
This merging technique can be very valuable in general query
processing. If a small desired subset of the values of
the initial components of the Items in
a given ItemSpace is known, each value can be given its own
ItemSubspace, each ItemSubspace can be wrapped in an
ItemSpaceScannerItemInput, and the resulting inputs can be merged and
streamed out as shown above. The same effect can be achieved using an OrSpace
wrapping the same set of ItemSubspaces, but that is inefficient
for more than a few inputs. The OrSpace has the advantage, though,
that it constitutes a real ItemSpace, with all of its accessibility
and stateless multi-threading capability. The code above will work for up to
tens of thousands of inputs. The limit on the number of inputs
is approximately the InfinityDB cache size in bytes divided by 10000
(the block size). The limit affects performance, not correctness,
and there can be situations where the actual performance
limit is much higher due to data locality patterns. Note that
the number of inputs is not limited by the actual number
of stored values for the initial components.
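As a rough worked example of the limit quoted above, the arithmetic can be written out directly. The cache size used here is a hypothetical figure chosen for illustration, and the method name is not part of InfinityDB.

```java
// Worked arithmetic for the approximate input limit; names are hypothetical.
class MergeInputLimitSketch {

    // Divisor quoted in the text as the block size.
    static final long BLOCK_SIZE_BYTES = 10_000;

    /** Approximate number of merge inputs before performance starts to suffer. */
    static long approximateMaxInputs(long cacheSizeBytes) {
        return cacheSizeBytes / BLOCK_SIZE_BYTES;
    }

    public static void main(String[] args) {
        long cacheSizeBytes = 100_000_000L; // hypothetical cache size
        System.out.println(approximateMaxInputs(cacheSizeBytes)); // prints 10000
    }
}
```

A cache of that size therefore gives a performance limit of roughly ten thousand inputs, in line with the "tens of thousands" order of magnitude mentioned above.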
ItemPackets
The actual format of the data returned by readItems()
and passed to writeItems() is in general not important to
the InfinityDB client, but it is quite simple. Each packet starts with two
bytes giving the total length, followed by an 'opcode' and then the
big-endian bytes of the chars of an Item. The length is always
even. The opcodes can be ItemPacket.INSERT_OPCODE,
ItemPacket.DELETE_OPCODE,
or ItemPacket.META_OPCODE. There is an extension mechanism, as
documented in the Javadoc for ItemPacket. Client programs
can generate and parse a stream of Item packets themselves
for maximum speed. For example, a socket-based remote system could
stream Items from place to place efficiently.
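The layout described above can be sketched as a small encoder and parser. This is an illustration under several assumptions that the Javadoc for ItemPacket should be checked against: the two length bytes are taken to be big-endian and to count the whole packet including themselves, the opcode is taken to occupy two bytes (which keeps the length even), and the opcode values are placeholders rather than the real ItemPacket constants.

```java
import java.nio.ByteBuffer;

// Illustrative only, NOT the InfinityDB ItemPacket class. Field widths and
// opcode values are assumptions; see the lead-in paragraph.
class ItemPacketSketch {

    static final char PLACEHOLDER_INSERT_OPCODE = 1; // not the real constant
    static final char PLACEHOLDER_DELETE_OPCODE = 2; // not the real constant
    static final int HEADER_BYTES = 4;               // assumed: 2 length + 2 opcode

    /** Builds one packet: length, opcode, then each char as two big-endian bytes. */
    static byte[] encode(char opcode, char[] itemChars) {
        int total = HEADER_BYTES + 2 * itemChars.length;
        if (total > 0xFFFF) {
            throw new IllegalArgumentException("Item too long for a two-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(total); // big-endian by default
        buf.putShort((short) total);
        buf.putChar(opcode);
        for (char c : itemChars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    /** Reads the unsigned two-byte total length of the packet at offset. */
    static int packetLength(byte[] buf, int offset) {
        return ((buf[offset] & 0xFF) << 8) | (buf[offset + 1] & 0xFF);
    }

    /** Reads the assumed two-byte opcode of the packet at offset. */
    static char opcode(byte[] buf, int offset) {
        return (char) (((buf[offset + 2] & 0xFF) << 8) | (buf[offset + 3] & 0xFF));
    }

    /** Extracts the chars of the Item carried by the packet at offset. */
    static char[] itemChars(byte[] buf, int offset) {
        int payloadBytes = packetLength(buf, offset) - HEADER_BYTES;
        char[] chars = new char[payloadBytes / 2];
        ByteBuffer.wrap(buf, offset + HEADER_BYTES, payloadBytes).asCharBuffer().get(chars);
        return chars;
    }

    /** Walks a buffer of back-to-back packets, as a buffered read might return. */
    static int countPackets(byte[] buf, int offset, int length) {
        int count = 0;
        int pos = offset;
        int end = offset + length;
        while (pos < end) {
            if (end - pos < HEADER_BYTES) {
                throw new IllegalArgumentException("truncated packet at " + pos);
            }
            int len = packetLength(buf, pos);
            if (len < HEADER_BYTES || pos + len > end) {
                throw new IllegalArgumentException("malformed packet at " + pos);
            }
            pos += len;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] packet = encode(PLACEHOLDER_INSERT_OPCODE, "ab".toCharArray());
        System.out.println(packetLength(packet, 0));          // prints 8
        System.out.println(new String(itemChars(packet, 0))); // prints ab
    }
}
```

A parser of this shape only needs the length field to step from packet to packet, so unknown opcodes can be skipped without understanding their payload.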
Copyright © 1997-2009
Boiler Bay.