InfinityDB is well suited to time-series data collection and processing systems, such as a historian in an industrial monitoring system. It has been in use for years as a historian for oil and gas wellhead monitoring by Rockwell in Kuwait. InfinityDB is not a historian itself but an underlying technology for implementing a historian or other time-series database, and Boiler Bay is interested in integrating it into new or existing systems. InfinityDB is a Java component with a simple interface.
What is a historian?
A historian is a component in a sensor system that works alongside a relational DBMS to efficiently store and retrieve time-series measurements from sensors of all kinds. A historian collects sequences of sensor measurement events, each including at least a timestamp, a tag, and a sensor value, where the tag identifies a particular sensor in the network. The sensor data may be very large, growing to terabytes or petabytes. Time-series data is a poor fit for a relational system because of the need for extreme data compression, partitioning across multiple files, real-time collection at high speed, and the particular kinds of queries required. The relational DBMS, in turn, holds the other, non-time-series data.
What is SCADA?
SCADA stands for ‘Supervisory Control and Data Acquisition’, and it is used in industrial systems like factories, chip fabrication lines, vehicle monitoring, and countless other environments. SCADA generates massive amounts of time-series data – even petabytes – that must be ingested in real time, stored, manipulated, filtered, and analyzed or displayed via various query methods.
Queries over time-series data are typically by time range or point-in-time, and return a set of series of numeric values filtered by various criteria. Because sensors are not necessarily synchronized and may have different sampling intervals, the time series must be correlated and interleaved to align their measurement points in time, which may require interpolation. The results can be displayed on screen, for example as a time-based chart, or stored for later use. The returned time sequences may contain the raw values from selected sensors, value limits, or mathematical functions of one or more sensor values.
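The interpolation step mentioned above can be illustrated with a minimal Java sketch. It uses only the standard java.util.NavigableMap, not the InfinityDB API, and assumes simple linear interpolation between the two nearest samples. The class name SeriesInterpolator and the method valueAt are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper, not part of InfinityDB: estimates a sensor value at any instant. */
final class SeriesInterpolator {
    private SeriesInterpolator() {}

    /**
     * Linearly interpolates between the nearest samples around timeMillis.
     * Returns NaN when the series has no sample on one side of the requested time.
     */
    static double valueAt(NavigableMap<Long, Double> series, long timeMillis) {
        Map.Entry<Long, Double> before = series.floorEntry(timeMillis);
        Map.Entry<Long, Double> after = series.ceilingEntry(timeMillis);
        if (before == null || after == null) {
            return Double.NaN; // outside the sampled range: do not extrapolate
        }
        if (before.getKey().equals(after.getKey())) {
            return before.getValue(); // a sample exists exactly at this time
        }
        double fraction = (double) (timeMillis - before.getKey())
                / (after.getKey() - before.getKey());
        return before.getValue() + fraction * (after.getValue() - before.getValue());
    }

    public static void main(String[] args) {
        NavigableMap<Long, Double> pressure = new TreeMap<>();
        pressure.put(1000L, 10.0);
        pressure.put(2000L, 20.0);
        System.out.println(valueAt(pressure, 1500L)); // prints 15.0
    }
}
```

Calling valueAt for the same instant on several series gives values aligned to a common time grid, which is one simple way to correlate sensors that sample at different intervals.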
Time Series Files
A time-series database in InfinityDB may be structured as a large set of files, each normally covering a certain set of sensors or derived data and a given time range. There are normally many more sensors available than are desired in the query results, so the time series files must be easily filterable to select a subset of sensors, and to interpolate the results to create an estimate of real-time sensor values. It may be necessary to ‘back fill’ a particular file with more data or to remove data or to generate a new file. It may be necessary to add new data types and structures into the data files beyond simple time series, and InfinityDB can do this: its data model is extensible to cover almost anything needed. Data can be indexed and sorted by any dimension desired – such as by sensor – and stored co-located with matching time-series data in a single file.
InfinityDB uses a hierarchy of compression techniques to reduce both in-memory and on-disk space. Sample events are stored as timestamp/tag/value triples or other intermixed data, in contiguous byte sequences called ‘Items’. Each Item component is encoded efficiently, with variable length and no padding, and Items themselves are variable-length. The Items are kept prefix-compressed in memory at all times, so that timestamps are effectively stored as deltas, and sensor ids behave like deltas when samples share timestamps.
When data reaches disk, it is further compressed with zlib, which removes repeated values that occur close together and packs the bytes into bits using Huffman encoding. Adjacent samples are then written into the file in variable-sized blocks with no padding. Data is initially packed at the end of each file, but the variable-length blocks within the file are replaced as needed if data is added or removed later, with automatic selection of the best existing blocks to replace. This keeps space efficiency high even as a file changes.
Any Items can be added or removed at any later time, and old and new Items are stored compressed in exactly the same way. New files can be generated from sets of old files to maximize packing. A proprietary update system makes all modifications to the contents of a file transactional, so they cannot produce corrupt or incorrect results or internal inconsistency.
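The two compression stages described above, prefix compression followed by zlib, can be illustrated with a minimal Java sketch. The 16-byte item layout, the one-byte length fields, and the names PrefixCompressionSketch, encodeItem, prefixCompress, and deflate are assumptions made for this example. InfinityDB's actual encoding is not documented here and will differ. The second stage uses the standard java.util.zip.Deflater.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

/** Illustrative only: the byte layout here is assumed, not InfinityDB's real format. */
final class PrefixCompressionSketch {
    private PrefixCompressionSketch() {}

    /** Encodes one sample as big-endian timestamp, tag id, and value (assumed layout). */
    static byte[] encodeItem(long timestampMillis, int tagId, float value) {
        return ByteBuffer.allocate(16)
                .putLong(timestampMillis).putInt(tagId).putFloat(value).array();
    }

    /**
     * Replaces the leading bytes each item shares with its predecessor by a count.
     * Items must be sorted and at most 255 bytes long so each length fits in one byte.
     */
    static byte[] prefixCompress(List<byte[]> sortedItems) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] previous = new byte[0];
        for (byte[] item : sortedItems) {
            int shared = 0;
            int limit = Math.min(previous.length, item.length);
            while (shared < limit && previous[shared] == item[shared]) {
                shared++;
            }
            out.write(shared);                     // length of the shared prefix
            out.write(item.length - shared);       // length of the suffix that follows
            out.write(item, shared, item.length - shared);
            previous = item;
        }
        return out.toByteArray();
    }

    /** Second stage: DEFLATE (LZ77 matching plus Huffman coding) via java.util.zip. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<byte[]> items = new ArrayList<>();
        // 1000 timestamps one second apart, four sensors sampled at each timestamp.
        for (int t = 0; t < 1000; t++) {
            for (int tag = 0; tag < 4; tag++) {
                items.add(encodeItem(1_700_000_000_000L + t * 1000L, tag, 20.0f + tag));
            }
        }
        byte[] prefixed = prefixCompress(items);
        System.out.println("raw bytes:     " + items.size() * 16);
        System.out.println("prefix-coded:  " + prefixed.length);
        System.out.println("after DEFLATE: " + deflate(prefixed).length);
    }
}
```

Because the items are sorted by timestamp and then by tag, consecutive items share most of their leading bytes, which is why storing only the differing suffix behaves like delta encoding of timestamps and sensor ids.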
Instant Random Access
Each InfinityDB file is actually an index, not just a series of events. This makes access by any point in time or range of times almost instantaneous, followed by direct high-speed sequential output. A file may be of unlimited length, and any portion of it can be kept in memory as a set of cached blocks. The optimum file length may thus be chosen freely, and the sets of sensors or time ranges may be partitioned between files, or even duplicated in various files. Data can be ‘backfilled’ into any given file by combining time sequences of sensors from other existing files, or derived time sequences of values, with filtering. Alternatively, new files can be quickly generated from other time sequence ranges.
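The idea of a file that is itself a sorted index can be approximated in memory with the standard java.util.TreeMap. The sketch below is only an analogy under that assumption and does not use the InfinityDB API. The class TimeIndexSketch and its put and query methods are hypothetical. It shows a time-range lookup that also filters to a wanted subset of sensor tags.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a sorted time index; not the InfinityDB API. */
final class TimeIndexSketch {
    // Outer key: timestamp in milliseconds. Inner key: sensor tag.
    private final NavigableMap<Long, NavigableMap<String, Double>> byTime = new TreeMap<>();

    void put(long timestampMillis, String tag, double value) {
        byTime.computeIfAbsent(timestampMillis, t -> new TreeMap<>()).put(tag, value);
    }

    /** Returns "timestamp tag value" rows with fromMillis <= timestamp < toMillis. */
    List<String> query(long fromMillis, long toMillis, Set<String> wantedTags) {
        List<String> rows = new ArrayList<>();
        // subMap seeks directly to the start of the range instead of scanning from the beginning.
        for (Map.Entry<Long, NavigableMap<String, Double>> slot
                : byTime.subMap(fromMillis, true, toMillis, false).entrySet()) {
            for (Map.Entry<String, Double> sample : slot.getValue().entrySet()) {
                if (wantedTags.contains(sample.getKey())) {
                    rows.add(slot.getKey() + " " + sample.getKey() + " " + sample.getValue());
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TimeIndexSketch index = new TimeIndexSketch();
        index.put(1000L, "pump1.pressure", 1.0);
        index.put(1000L, "pump1.temp", 2.0);
        index.put(2000L, "pump1.pressure", 3.0);
        index.put(3000L, "pump1.pressure", 4.0);
        // Prints the two pressure rows at 1000 and 2000; the sample at 3000 is excluded.
        for (String row : index.query(1000L, 3000L, Collections.singleton("pump1.pressure"))) {
            System.out.println(row);
        }
    }
}
```

Backfilling in this analogy is just another put call: late-arriving samples land in their sorted position without rewriting the rest of the structure.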
Non-Time Series Data Intermixed
Any kind of data may be intermixed with time-series data in an InfinityDB file, or stored without any time-series data at all. The basic structure is completely data-type agnostic, supporting all Java primitive data types plus large objects such as complete embedded files. This gives it data representation capabilities that match or exceed those of a relational system.
InfinityDB is extremely fast, reaching millions of operations per second. This speed is necessary because of the volume of data. Data from multiple sources can be added to any file in real time, sorted and interleaved with existing data, even while the file is being queried. Access is concurrent, using all available CPU cores.
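The pattern of several sources writing into one sorted structure at the same time can be sketched with the standard java.util.concurrent.ConcurrentSkipListMap. This is an analogy only and says nothing about how InfinityDB implements concurrency. The class ConcurrentIngestSketch and the method ingest are hypothetical names for this example.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of concurrent, sorted ingest using only the JDK. */
final class ConcurrentIngestSketch {
    private ConcurrentIngestSketch() {}

    /** Starts one writer thread per source and returns the merged, sorted series. */
    static ConcurrentNavigableMap<Long, Double> ingest(int sources, int samplesPerSource) {
        ConcurrentNavigableMap<Long, Double> series = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(sources);
        for (int s = 0; s < sources; s++) {
            final int source = s;
            pool.execute(() -> {
                for (int i = 0; i < samplesPerSource; i++) {
                    // Timestamps from different sources interleave: s, s + sources, ...
                    long timestamp = (long) i * sources + source;
                    series.put(timestamp, (double) source);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("ingest timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("ingest interrupted", e);
        }
        return series;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<Long, Double> series = ingest(4, 1000);
        System.out.println("samples stored: " + series.size());
        System.out.println("first eight:    " + series.headMap(8L));
    }
}
```

Readers may call headMap or subMap on the map while the writers are still running. The views are weakly consistent, so a reader sees a sorted snapshot that may or may not include the newest inserts.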
Please Contact Us
Boiler Bay is interested in applying InfinityDB in new or existing time-based database systems. We can do the development ourselves, work with you as a partner, or assist your team. Please contact us:
408 314 7353