ItemInput and ItemOutput


Fast Item Transport

In some cases, it is necessary to transport Items without incurring the overhead of the multi-threaded, stateless, random access capabilities of an ItemSpace. This is the purpose of the ItemInput and ItemOutput classes. These classes are simple to use: you can fetch a sequence of Items, not necessarily sorted, from an ItemInput using the boolean readItem(Cu cu) and int readItems(byte[] buf, int offset, int length) methods. Any sequence of Items, sorted or not, can be sent to an ItemOutput using the insert(..) methods and the writeItems(buf, offset, length) method.

ItemInput

The ItemInput.readItem(Cu) method modifies the Cu so that it contains the next Item and returns true, or returns false if there are no more Items. The readItems(byte[] buf, int offset, int length) method works with a buffer, somewhat like the read(byte[] b, int off, int len) method of InputStream. Items are placed into the buffer starting at position offset, and at most length bytes are transferred. The return value is the total number of bytes transferred, or -1 if there are no more Items. The Items in the buffer are formatted as ItemPackets. For most uses, it is not necessary to understand or parse this format. The buffer must be long enough to contain one Item plus some overhead: this size is ItemPacket.MAX_PACKET_LENGTH_BYTES.

ItemOutput

The ItemOutput is very simple: you can transmit Items into an ItemOutput using void insert(Cu cu), insert(Object object), or void writeItems(byte[] buf, int offset, int length). For example, you can transfer Items quickly using the idiom:
byte[] buff = new byte[ItemPacket.MAX_PACKET_LENGTH_BYTES * 10];
for (int bytesRead = 0; (bytesRead = itemInput.readItems(buff, 0, buff.length)) != -1;) {
	itemOutput.writeItems(buff, 0, bytesRead);
}
In fact, ItemSpace is a subclass of ItemOutput, so the output methods described above will work without special effort. ItemOutput is stateless, hence it does not complicate the multi-threading capability of ItemSpace.

Streaming Out an ItemSpace using ItemSpaceScannerItemInput

In contrast to ItemOutput, ItemInput is stateful, because it has a current position. An example of creating an ItemInput is to use the adapter class ItemSpaceScannerItemInput, which can wrap any typical ItemSpace:
    ItemInput itemInput = new ItemSpaceScannerItemInput(scannedSpace, cuStart, pl);
In the above code, cuStart is the Item in the scannedSpace where the scan is to begin, and pl is the 'protected length', i.e. the number of initial chars in the Item that are guaranteed not to change during the scan. This class is basically an 'iterator' over a selected part of an ItemSpace. The ItemSpaceScannerItemInput just contains a Cu holding the current position, which it advances as it moves through the ItemSpace. To help the scanner, ItemSpace contains a general-purpose helper method that can use an external Cu as the state: boolean next(Cu cu, int pl). The actual code in ItemSpaceScannerItemInput which uses it is:
    public boolean readItem(Cu cu) throws IOException {
        if (scannedSpace.next(cuScanner, pl)) {
            cu.copyFrom(cuScanner);
            return true;
        }
        return false;
    }
Various ItemSpace, ItemInput, and ItemOutput classes will often override the methods mentioned above in order to provide more speed. For example, VolatileItemSpace overrides the readItems(buf, offset, length) method to increase speed dramatically (as of 11/1/2009).
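The split between a stateless store and a stateful scanner described above can be sketched without InfinityDB at all. The following is only an analogy using java.util.TreeSet: the names SortedStore and StoreScanner are hypothetical, a String stands in for a Cu, the protected length is left out, and next() is assumed to move to the item strictly after the cursor.

```java
import java.util.TreeSet;

/** Analogy only: a stateless sorted store plus a stateful scanner over it. */
class ScannerSketch {

    /** Hypothetical stand-in for an ItemSpace: it keeps no iteration state. */
    static final class SortedStore {
        private final TreeSet<String> items = new TreeSet<>();

        void insert(String item) {
            items.add(item);
        }

        /**
         * Advances the caller-owned cursor (a one-element array standing in
         * for a Cu) to the item strictly after it. Returns false at the end.
         */
        boolean next(String[] cursor) {
            String following = items.higher(cursor[0]);
            if (following == null) {
                return false;
            }
            cursor[0] = following;
            return true;
        }
    }

    /** Hypothetical stand-in for the scanner: its only state is the cursor. */
    static final class StoreScanner {
        private final SortedStore store;
        private final String[] cursor;

        StoreScanner(SortedStore store, String start) {
            this.store = store;
            this.cursor = new String[] { start };
        }

        /** Returns the next item, or null when the scan is finished. */
        String readItem() {
            return store.next(cursor) ? cursor[0] : null;
        }
    }

    public static void main(String[] args) {
        SortedStore store = new SortedStore();
        store.insert("pear");
        store.insert("apple");
        store.insert("fig");
        StoreScanner scanner = new StoreScanner(store, "");
        for (String item; (item = scanner.readItem()) != null;) {
            System.out.println(item); // apple, fig, pear in that order
        }
    }
}
```

Because the store never remembers a position, any number of scanners can run over it at once, each holding its own cursor.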

Merging sorted Item streams using ItemInputMergerItemInput

One valuable use of the ItemInput facility is in merging multiple series of sorted Items into a longer sorted series. This is the core step in merge sorting large numbers of Items, and it is used internally in the InfinityDBBasedMergeSorterItemOutput. Here is sample code that shows the efficient merging of a large number of input Item streams:
	static final int COUNT = 1*100;
	static final int ITEM_SPACES = 10*1000;
	static void testItemSpaceMergerItemInput() {
		try {
			ItemSpace db = InfinityDB.create("c:/temp/junk.infdb",true, 100*1000*1000);
			ItemSpace[] spaces = new ItemSpace[ITEM_SPACES];
			ItemInput[] inputs = new ItemInput[ITEM_SPACES];
			for (int i = 0; i < ITEM_SPACES; i++) {
				spaces[i] = new ItemSubspace(db, new Long(i));
				inputs[i] = new ItemSpaceScannerItemInput(spaces[i]); // make an ordered iterator over spaces[i]
				for (int j = 0; j < COUNT; j++) {
					spaces[i].insert(new Long(j * ITEM_SPACES + i));
				}
			}
			ItemInput result = new ItemInputMergerItemInput(inputs); // iterate them together sorted
			Cu cu = Cu.alloc();
			while (result.readItem(cu)) { // very fast
//				System.out.println(cu);
			}
			cu.dispose();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
This merging technique can be very valuable in general query processing. If a small set of desired values for the initial components of the Items in a given ItemSpace is known, each value can be set up in its own ItemSubspace, each ItemSubspace wrapped in an ItemSpaceScannerItemInput, and the results streamed out as shown above. The same effect can be achieved using an OrSpace wrapping the same set of ItemSubspaces, but that is inefficient for more than a few inputs. The OrSpace has the advantage, though, that it constitutes a real ItemSpace, with all of its accessibility and stateless multi-threading capability. The code above will work for up to tens of thousands of inputs. The limit on the number of inputs is approximately the InfinityDB cache size in bytes divided by 10000 (the block size); for example, a cache of 100,000,000 bytes allows roughly 10,000 inputs. The limit affects performance, not correctness, and there can be situations where the actual performance limit is much higher due to data locality patterns. Note that the number of inputs is not limited by the actual number of stored values for the initial components.
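For readers who want to see the general idea of merging many sorted streams, here is a minimal k-way merge sketch over plain Java lists using java.util.PriorityQueue. It is not the ItemInputMergerItemInput implementation, whose internals are not described here; the class name KWayMergeSketch and its Head helper are hypothetical, and Long values stand in for Items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Generic k-way merge sketch; an illustration, not InfinityDB code. */
class KWayMergeSketch {

    /** One heap entry: the current head of an input and the input it came from. */
    private static final class Head {
        final long value;
        final Iterator<Long> source;

        Head(long value, Iterator<Long> source) {
            this.value = value;
            this.source = source;
        }
    }

    /** Merges inputs that are each already sorted into one sorted list. */
    static List<Long> merge(List<List<Long>> sortedInputs) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        // Seed the heap with the first element of every non-empty input.
        for (List<Long> input : sortedInputs) {
            Iterator<Long> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Long> merged = new ArrayList<>();
        // Repeatedly emit the smallest head and refill from the same input.
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.source.hasNext()) {
                heap.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Input i holds the values j * 3 + i, mirroring the sample above.
        List<List<Long>> inputs = Arrays.asList(
                Arrays.asList(0L, 3L, 6L),
                Arrays.asList(1L, 4L, 7L),
                Arrays.asList(2L, 5L, 8L));
        System.out.println(merge(inputs)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

In this sketch only one pending element per input is held at a time, so memory grows with the number of inputs rather than with the total number of elements.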

ItemPackets

The actual format of the data returned by readItems() and input to writeItems() is in general not important to the InfinityDB client, but it is quite simple. There are two bytes of total length, followed by an 'opcode', and then the big-endian bytes of the chars of an Item. The length is always even. The opcodes can be ItemPacket.INSERT_OPCODE, ItemPacket.DELETE_OPCODE, or ItemPacket.META_OPCODE. There is an extension mechanism, as documented in the Javadoc for ItemPacket. It is possible for client programs to generate and parse a stream of ItemPackets for maximum speed. For example, a socket-based remote system could stream Items from place to place efficiently.
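The framing described above can be illustrated with a small sketch built on java.nio.ByteBuffer. Several details here are assumptions rather than facts about ItemPacket: that the length field counts the whole packet including itself, that the opcode occupies two bytes, and the opcode value used. The class PacketFramingSketch and the constant HYPOTHETICAL_INSERT_OPCODE are invented for illustration; the Javadoc for ItemPacket is the authority on the real layout.

```java
import java.nio.ByteBuffer;

/** Illustration of length-prefixed framing; NOT the real ItemPacket layout. */
class PacketFramingSketch {

    /** Hypothetical value; the real ItemPacket.INSERT_OPCODE may differ. */
    static final char HYPOTHETICAL_INSERT_OPCODE = 1;

    /** Builds: 2-byte total length, 2-byte opcode (assumed), big-endian chars. */
    static byte[] encode(char opcode, char[] itemChars) {
        // Assumption: the length covers the whole packet, length field included.
        int totalLength = 2 + 2 + 2 * itemChars.length;
        ByteBuffer buf = ByteBuffer.allocate(totalLength); // big-endian by default
        buf.putShort((short) totalLength);
        buf.putChar(opcode);
        for (char c : itemChars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    /** Reads back the chars of the packet that starts at the given offset. */
    static char[] decodeItemChars(byte[] packet, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(packet, offset, packet.length - offset);
        int totalLength = buf.getShort() & 0xFFFF; // treat the length as unsigned
        buf.getChar(); // skip the opcode
        char[] chars = new char[(totalLength - 4) / 2];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = buf.getChar();
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] packet = encode(HYPOTHETICAL_INSERT_OPCODE, "ab".toCharArray());
        System.out.println(packet.length); // prints 8
        System.out.println(new String(decodeItemChars(packet, 0))); // prints ab
    }
}
```

Under these assumptions every packet has an even length, since each field is a multiple of two bytes, and a reader can skip from one packet to the next using only the length field.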



Copyright © 1997-2009 Boiler Bay.