July 18, 2005

Entity Attribute Value and other general topics

Hi X.

We have been using the EAV model here for
many years, and it has simplified and sped up
the applications we and others write considerably. As you
may have realized from reading
the docs, the EAV model and the other custom
structures available are very descriptive,
providing a superset of the relational model.
This makes them ideal for representing data
that you may have trouble fitting into a pure
relational model. 

You can also intermix
data in customized structures that you need in
order to optimize for speed or space, and the
results can be dramatic. The performance and
efficiency improvements you can get depend on
structuring the data properly in the engine.
Just by following a few simple guidelines, like
creating the proper 'inversions' (roughly 'indexes' in
relational terms), you will automatically get
the maximum speed available in any system, such as direct
access through a B-Tree for random access.
But in InfinityDB, you can also be creative and get
even better performance.

The RDBMS model gives you the advantage of isolation
between the performance characteristics and the
representation in that the DBA can add
or remove indexes after the fact. However, this
is to some extent an illusion. Every application
is designed with many expectations about the performance
characteristics of the underlying persistence. Thus the
DBA is actually constrained to provide at least
a minimum set of indexes.

In the EAV model, we just identify these indexes
as individual entity classes in advance. Each
entity class looks like a relation in an RDBMS, but
unlike a relation, it also provides a fast
access path to the data in other entity classes through
the appropriate attributes (which are often multi-valued).
Thus instead of the RDBMS's table/index combination, we
have only entity classes. Searching an EAV model can
be done either using the proper entity class as an
entry point to access the data, which is the equivalent
of an RDBMS index access, or of course can be done
using the equivalent of an RDBMS full table scan.
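
To make the entity-class idea concrete, here is a minimal Java sketch. It is
not the InfinityDB API: the class EavSketch, its methods, and the string
encoding of tuples are all assumptions made for illustration, with a plain
java.util.TreeSet standing in for the B-Tree. Each fact is stored twice, once
under its entity class and once under an inverse entity class named after the
attribute, so a lookup by value becomes a short prefix scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of entity classes in one sorted tuple space. NOT the InfinityDB API.
class EavSketch {
    // Component separator; it sorts below every printable character.
    private static final char SEP = '\u0001';
    private final NavigableSet<String> items = new TreeSet<>();

    private static String key(String... parts) {
        return String.join(String.valueOf(SEP), parts);
    }

    // Store the forward tuple plus its inversion (the 'index' in relational terms).
    void put(String entityClass, String entity, String attribute, String value) {
        items.add(key(entityClass, entity, attribute, value));
        items.add(key(attribute, value, entityClass, entity));
    }

    // All tuples that start with the given components, with the prefix stripped.
    List<String> withPrefix(String... prefix) {
        String lo = key(prefix) + SEP;
        String hi = key(prefix) + (char) (SEP + 1);
        List<String> out = new ArrayList<>();
        for (String item : items.subSet(lo, hi)) {
            out.add(item.substring(lo.length()).replace(SEP, '/'));
        }
        return out;
    }

    // Equivalent of an index access: enter through the inverse entity class.
    List<String> lookup(String entityClass, String attribute, String value) {
        return withPrefix(attribute, value, entityClass);
    }

    // Equivalent of a full table scan: visit every tuple of the entity class.
    List<String> scanFor(String entityClass, String attribute, String value) {
        String suffix = "/" + attribute + "/" + value;
        List<String> out = new ArrayList<>();
        for (String rest : withPrefix(entityClass)) {
            if (rest.endsWith(suffix)) {
                out.add(rest.substring(0, rest.length() - suffix.length()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        EavSketch db = new EavSketch();
        db.put("Student", "alice", "mathGrade", "085");
        db.put("Student", "bob", "mathGrade", "072");
        db.put("Student", "carol", "mathGrade", "085");
        System.out.println(db.lookup("Student", "mathGrade", "085"));
        System.out.println(db.scanFor("Student", "mathGrade", "085"));
    }
}
```

Both calls in main find alice and carol. The difference is that lookup()
touches only the tuples under the mathGrade/085 prefix, while scanFor()
walks every Student tuple.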

However, we also have other options for customized access.
For example, we can store 'sets' in the form of Item 'subspaces',
i.e. sets of Items with some common prefix.
A set can, for example, be a dynamically maintained
precomputed inquiry or predicate. We have used this in
BoilerBase and other applications to support a
collection of dynamically maintained 'inquiries' so that
users can see the results of a query immediately without
requiring a full entity class ('table') scan. The user can also combine
these sets ad-hoc to get complex logic, or can define more
deeply nested precomputed inquiries that refer to or include
one another. This has been very effective. In BoilerBase,
an inquiry can precompute expensive operations such as
matching the text of an email, so that the difficult
part of an inquiry can be avoided during ad-hoc queries.
For your purposes, with completely ad-hoc inquiries, you
could use dynamically maintained sets to precompute
difficult subparts of the inquiry that you think will often occur.
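
The idea of a dynamically maintained inquiry can be sketched in a few lines
of Java. This is only an illustration under assumed names: InquirySketch and
its methods are hypothetical and are not taken from InfinityDB or BoilerBase.
Each named inquiry is a predicate whose result set is updated on every
insert and delete, so reading the result never requires a scan.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

// Toy model of dynamically maintained inquiries. NOT the InfinityDB or BoilerBase API.
class InquirySketch {
    private final Map<Integer, String> messages = new TreeMap<>();
    private final Map<String, Predicate<String>> inquiries = new TreeMap<>();
    private final Map<String, NavigableSet<Integer>> matches = new TreeMap<>();

    // Register an inquiry and evaluate it once against the existing messages.
    void define(String name, Predicate<String> test) {
        NavigableSet<Integer> set = new TreeSet<>();
        for (Map.Entry<Integer, String> e : messages.entrySet()) {
            if (test.test(e.getValue())) {
                set.add(e.getKey());
            }
        }
        inquiries.put(name, test);
        matches.put(name, set);
    }

    // Each insert pays the matching cost once, so later reads need no scan.
    void insert(int id, String text) {
        delete(id); // replacing a message first withdraws its old matches
        messages.put(id, text);
        for (Map.Entry<String, Predicate<String>> e : inquiries.entrySet()) {
            if (e.getValue().test(text)) {
                matches.get(e.getKey()).add(id);
            }
        }
    }

    void delete(int id) {
        if (messages.remove(id) != null) {
            for (NavigableSet<Integer> set : matches.values()) {
                set.remove(id);
            }
        }
    }

    // Read the precomputed result; no message text is examined here.
    NavigableSet<Integer> results(String name) {
        NavigableSet<Integer> set = matches.get(name);
        return set == null
                ? Collections.<Integer>emptyNavigableSet()
                : Collections.unmodifiableNavigableSet(set);
    }

    public static void main(String[] args) {
        InquirySketch q = new InquirySketch();
        q.define("invoices", text -> text.toLowerCase().contains("invoice"));
        q.insert(1, "Your invoice is attached");
        q.insert(2, "Lunch on Friday?");
        q.insert(3, "Invoice overdue");
        System.out.println(q.results("invoices"));
    }
}
```

The expensive part, here a substring match, runs once per update rather
than once per query, which is the trade the paragraph above describes.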

Another feature you can use is the library of virtual
ItemSpaces including the AndSpace and OrSpace classes.
These classes allow you to dynamically view a collection of
sets (a multi-valued attribute can be viewed as a set) using
boolean logic. Because these classes are virtual ItemSpaces,
you can directly view the results rather than waiting for a
complex SQL computation. The OrSpace directly enumerates
the values in the combined set in sorted order. It corresponds
to the merge operation in a sort/merge. More importantly, the
AndSpace does the same thing but while doing an intersection
instead of a union. AndSpaces and OrSpaces can be arbitrarily
nested. This is just what you need for many kinds
of inquiry. Again, you have to make sure the appropriate sets of
Items exist either in the form of the precomputed sets I mentioned
above or in the form of multi-valued attributes or some
other form. Finally, to display the data, you use the elements of
the viewed set to access the entities in the entity class and
get back the associated attribute values.
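
The merge behavior described above can be sketched in plain Java. This is
not the AndSpace or OrSpace implementation, whose internals are not shown
here: SetLogicSketch and its two methods are hypothetical, and they build
result lists eagerly, whereas the real classes are described as virtual
views. The inputs are assumed to be ascending lists of entity ids without
duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sorted-set logic. NOT the InfinityDB AndSpace/OrSpace implementation.
class SetLogicSketch {
    // Union of two ascending, duplicate-free lists: the merge step of a sort/merge.
    static List<Integer> or(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c < 0) {
                out.add(a.get(i++));
            } else if (c > 0) {
                out.add(b.get(j++));
            } else {
                out.add(a.get(i)); // present in both: emit once
                i++;
                j++;
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Intersection: the same walk, but emit only when both sides agree.
    static List<Integer> and(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c < 0) {
                i++;
            } else if (c > 0) {
                j++;
            } else {
                out.add(a.get(i));
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> over13 = Arrays.asList(2, 3, 5, 8);      // made-up student ids
        List<Integer> mathAbove80 = Arrays.asList(3, 4, 8, 9); // made-up student ids
        List<Integer> honorRoll = Arrays.asList(1, 8);         // made-up student ids
        // Nesting: (over13 AND mathAbove80) OR honorRoll
        System.out.println(or(and(over13, mathAbove80), honorRoll)); // prints [1, 3, 8]
    }
}
```

Because every result is itself an ascending list, results can be fed back
in as inputs, which is what makes arbitrary nesting possible.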

I hope the rather long explanation above gives you a flavor for
the kinds of query support we can provide. There are
other possible structures and techniques too. Now on
to other questions.

The current InfinityDB has no support for clustering. It is just a
Java component running in one JVM with no sharing of
data in the DB with other instances running in other JVMs. It does
support complete multi-threading with a high degree of I/O
concurrency. It will speed up given two CPUs, but more CPUs
do not help much. We concentrated on increasing disk performance by
eliminating locks on disk blocks so that a maximum number of
Threads can have outstanding I/O requests and throughput is
maximized. Multiple disks would thus probably be helpful,
although we have not run InfinityDB in this way. InfinityDB should
scale well with an increasing number of disks.

Of course you could structure your application in a
client-server way so that the low-level
operations - insert(), delete(), next() - are redirected to
a central server JVM. I have been interested in a structure
like this for some time, but have not had client requests for it.
If you like, we could explore it. I think it would also be
valuable in an embedded environment for doing remote
debugging. Some applications expect the low-level operations
to be very fast (~70K cachedReads/sec/GHz) so the bottleneck
might be network I/O speed.

There is no internal support for backup. One way you could
easily provide mirroring would be simply to send each insert() and
delete() operation to a pair of InfinityDB databases, each in
a separate file on a different disk. A failure of one disk
could be detected by the return of a file corruption
exception from one of the update operations. Or, the insert()
and delete() operations going to one DB file could be captured and
serialized and sent to a remote InfinityDB instance. I think
this could be done with no serious performance hit. The
remote instance idea would require a serialization protocol, but
it would be unidirectional, hence fast. Are you interested
in working together on such a thing?
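
The mirroring idea in the paragraph above might look roughly like this in
Java. Everything here is an assumption for illustration: the ItemStore
interface, MemoryStore, and MirroredStore are hypothetical names, an
in-memory set stands in for each database file, and a plain IOException
stands in for the file corruption exception mentioned above.

```java
import java.io.IOException;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical update interface; NOT the InfinityDB API.
interface ItemStore {
    void insert(String item) throws IOException;
    void delete(String item) throws IOException;
}

// In-memory stand-in for one database file on one disk.
class MemoryStore implements ItemStore {
    final SortedSet<String> items = new TreeSet<>();
    boolean broken; // set to true to simulate a disk failure

    public void insert(String item) throws IOException {
        check();
        items.add(item);
    }

    public void delete(String item) throws IOException {
        check();
        items.remove(item);
    }

    private void check() throws IOException {
        if (broken) {
            throw new IOException("simulated disk failure");
        }
    }
}

// Sends every update to both copies and keeps going if one of them fails.
class MirroredStore implements ItemStore {
    private final ItemStore primary;
    private final ItemStore mirror;
    private boolean primaryOk = true;
    private boolean mirrorOk = true;

    MirroredStore(ItemStore primary, ItemStore mirror) {
        this.primary = primary;
        this.mirror = mirror;
    }

    public void insert(String item) throws IOException {
        update(item, true);
    }

    public void delete(String item) throws IOException {
        update(item, false);
    }

    // True once either copy has reported a failure.
    boolean isDegraded() {
        return !(primaryOk && mirrorOk);
    }

    private void update(String item, boolean isInsert) throws IOException {
        if (primaryOk) {
            try {
                apply(primary, item, isInsert);
            } catch (IOException e) {
                primaryOk = false; // failure detected through the update exception
            }
        }
        if (mirrorOk) {
            try {
                apply(mirror, item, isInsert);
            } catch (IOException e) {
                mirrorOk = false;
            }
        }
        if (!primaryOk && !mirrorOk) {
            throw new IOException("both copies have failed");
        }
    }

    private static void apply(ItemStore store, String item, boolean isInsert)
            throws IOException {
        if (isInsert) {
            store.insert(item);
        } else {
            store.delete(item);
        }
    }
}
```

A remote variant would replace the second MemoryStore with an ItemStore
that serializes each operation and sends it over the network. That part is
not sketched here.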

Poor schema evolution is one of my main gripes with SQL systems.
I have seen companies heavily weighted down by the proliferation
of SQL upgrade scripts that multiply with a square law
as the number of schema versions grows. With InfinityDB, the
application can detect older versions and create new entity
classes or sets or attributes as needed. There is no user
interaction at all. The application simply begins storing
data using the new entity class, attribute, or set, or any
other structure. The only exception to this is that if a
new entity class is added, the application must 'invert',
i.e. index, any of the attributes that are defined as
inverses between two entity classes. For attributes, the
older database can be viewed as having missing
values. I use the word 'missing' rather than 'null'
values because 'null' is the SQL idea of a column with
no value; in the SQL scenario, a column must already
exist in order to have a null value. You cannot use
new version SQL statements with an older schema
because you will get an error since the table or column does not
exist. InfinityDB simply does not have this problem. This is
true also for entire entity classes or sets, or any other
structure, which may simply be considered to be empty, so
there are no errors in a newer app version when an old DB
schema is used. BoilerBase has gone through many versions
with no schema change. It is even possible to have some
'forwards' compatibility.
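
The difference between 'missing' and 'null' can be shown with a small Java
sketch. It is not the InfinityDB API: SchemaEvolutionSketch and its methods
are hypothetical, and a java.util.TreeMap stands in for the database. The
point is that there is no schema to upgrade, so reading an attribute or an
entity class that was never written yields an empty set, not an error.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy schema-less store. NOT the InfinityDB API.
class SchemaEvolutionSketch {
    // Maps "entityClass/entity/attribute" to its sorted values; nothing is declared up front.
    private final Map<String, SortedSet<String>> values = new TreeMap<>();

    private static String key(String entityClass, String entity, String attribute) {
        return entityClass + "/" + entity + "/" + attribute;
    }

    // Writing a value is all it takes to bring an attribute or entity class into use.
    void insert(String entityClass, String entity, String attribute, String value) {
        values.computeIfAbsent(key(entityClass, entity, attribute), k -> new TreeSet<>())
              .add(value);
    }

    // Anything never written is simply missing: an empty set, never an exception.
    SortedSet<String> get(String entityClass, String entity, String attribute) {
        return Collections.unmodifiableSortedSet(
                values.getOrDefault(key(entityClass, entity, attribute),
                        Collections.<String>emptySortedSet()));
    }

    public static void main(String[] args) {
        SchemaEvolutionSketch db = new SchemaEvolutionSketch();

        // Data written by an older application version.
        db.insert("Customer", "c1", "name", "Ann");

        // A newer version reads an attribute the old version never stored.
        System.out.println(db.get("Customer", "c1", "email").isEmpty()); // prints true

        // The newer version just starts storing it; no upgrade script runs.
        db.insert("Customer", "c1", "email", "ann@example.com");
        System.out.println(db.get("Customer", "c1", "email"));
    }
}
```

The sketch leaves out the one exception noted above: attributes defined as
inverses between entity classes would still have to be inverted by the
application when a new entity class is added.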

Hope this all helps. There are other important features
to talk about; please feel free to ask about anything.
For some source code, you can look in the download in
the examples directory. There is code at various levels of
complexity in there up to a full-text file indexer.

Would you mind if we provide this email thread to
others on our web site? I think your questions and
these answers may be important generally.

Thanks for your interest.

Roger Deran

X wrote:

> Jennifer,
>
> Thanks for the quick reply. I really appreciate your offer of 3 hours of free advice and I definitely want to take advantage of it. I would like to make the process as efficient as possible. If you could, please forward the following questions I have to your technical experts:
>
> My project needs to store different kinds of business data in a single persistence store and provide fast searching on dynamic criteria. Here are my questions regarding the feasibility of using the Infinity DB Entity-Attribute-Value model:
> 1) Can Infinity Entity-Attribute-Value model provide fast searching on dynamic criteria without using extensive CPU?
> 2) Is index required?
> 3) Can Infinity be deployed in client server model instead of embedded model?
> 4) Does Infinity support clustering?
> 5) What kind of backup process does Infinity offer?
> 6) How does Infinity Entity-Attribute-Value cope with schema evolution?
> If possible, please attach sample source code.
>
> Once again, I thank you for your quick response.
>
> Regards,
> X
>
> On Tuesday, December 23, 2003, at 03:04  PM, Jennifer M. Douglas wrote:
>
>> X,
>>
>> Thanks very much for your inquiry. It sounds like your requirements are a
>> good match for InfinityDB's capabilities. Indeed you can perform searches of
>> your InfinityDB file, from the application layer. Searching is a key feature
>> of the BoilerBase Email Indexer and Categorizer, which we built on Infinity.
>>
>> You can download a free trial of BoilerBase if you would like to see an
>> example of the kinds of searches we do there. Essentially, the application
>> downloads, indexes and categorizes email. Perhaps its strongest feature is
>> its ability to categorize messages according to user specifications. We
>> refer to this as the Inquiry feature. The user chooses from a number of
>> types of characteristics (text strings, whole words, addresses, date ranges,
>> etc. ) and specifies particulars for these choices. Then, the user runs the
>> inquiry, causing each message in the database to be evaluated against the
>> specified criteria. The matches are tagged with the inquiry name, and can be
>> viewed in the Inquiry view. I use this capability on a daily basis over
>> 65,000 messages! Of course, this is just one way in which searching can be
>> implemented on InfinityDB.
>>
>> Perhaps equally or more important, are the special data structures you can
>> deploy to eliminate or greatly speed up searching. I can put you in touch
>> with a technical expert who can help you decide on which approach would be
>> the right one for you. We offer limited free advice (approximately 3 hours
>> of email writing and/or phone discussion time) and then charge a minimal
>> amount after that. You can begin using your free advice time immediately,
>> even before you purchase a license. We want to make this process
>> inexpensive, highly reliable, and fast! We believe you will be amazed at how
>> quickly you can come up to speed and begin writing useful code.
>>
>> Let me know if you wish to be put in contact with a developer. You will be
>> given extremely useful advice, up to and including sample code.
>>
>> I hope I have given you a useful preliminary answer to your question. Let me
>> know if you have any further questions, or would like to contact a
>> developer.
>>
>> Sincerely,
>>
>> Jennifer Douglas
>> Marketing Director
>> Infinity Database Engine B-Tree
>> BoilerBase Email Indexer and Categorizer
>> Boiler Bay
>> http://www.boilerbay.com
>> http://infinitydb.com
>>
>> ----- Original Message -----
>> To: 
>> Sent: Tuesday, December 23, 2003 9:53 AM
>> Subject: Infinity DB search capability
>>
>>
>>> I have read through the white paper on the ItemSpace data structure and I found
>>> it very useful for the project I am working on. My project requires a
>>> database engine to handle different types of business data efficiently. Using
>>> an RDBMS requires very complicated data management because it will end up
>>> creating millions of database tables to host different business data.
>>> ItemSpace with Entity-Attribute-Value seems to be able to resolve this
>>> issue. However, the white paper never mentions searching. My project
>>> requires a fast searching capability on business data with the given
>>> criteria (i.e. search for students who are over 13 years old and have a math
>>> grade above 80%). Can Infinity DB handle this? If so, where can I find
>>> more information about it?
>>>
>>> Regards,
>>> X
>>>
>>
>>
>

Posted 5 years, 7 months ago on July 18, 2005
