June 1, 2006
Prefix compression - It affects your world view
David X wrote:
> Dear Support,
>
> Suppose I enter a number of people in the database as below:
>
> // persons[]: array of person names to add to the
< database (n == numberOfPersonsInArray)
> for (i=0; i<n; i++) {
> cu = cu.clear().append(ENTITY_TYPE).append("person").append(person[i]);
> db.insert();
> }
>
> Keeping in mind that .append(ENTITY_TYPE).append("person") are always
> constant, I would expect the tree to look like the following:
>
> // "ENTITY_TYPE"
> // |
> // "person"
> // |
> // +------------+--------------+
> // | | |
> // person[0] person[1] ... person[n-1]
>
> However, "ENTITY_TYPE" and "person" are also stored in the database
> "n" times, not once like I expected.
>
> I would like to know how to think about "ENTITY_TYPE" and
"person", which I look at as singular entries, but are not.
I assume this has to do with each append() putting a new pointer into the database.
>
> I am not having any problems with this, because it all works fine;
but I am wanting to build a hierarchy from this information, and
the multiple entries for each "ENTITY_TYPE" and "person" surprised
me (the hierarchy can still be built by ignoring duplicate entries).
>
>
> David
>
Yes, thought about as a completely independent series of Items, it does look
like there is duplication of the EntityClass and Entity when you use next() to
scan over a subtree or 'subspace'.
I think you have to remember the way the next() and other operations work
with a protected prefix length. That protected prefix length will always cover the part of
the Items that are to be considered 'higher' in the tree the way you drew it,
so that higher part of the data does not change in the Cu. When you use next(), you are
always only looking at the changes in the Cu beyond that prefix length, and you
ignore the common prefix. So the fact that the prefix is still sitting in there unchanged
really doesn't matter practically.
>From the perspective of any ItemSpace, Items are actually only simple variable-length
char sequences kept in order, but they are basically
always stored in the database with prefixes shared between adjacent Items
compressed out at the character level. This compression is transparent, though, so
there is no way any user of an ItemSpace can know that prefix
compression is going on. However, it would just be so ineffiicient to store
the common prefixes that you get used to assuming that that
prefix compression is going on, and the apparent duplication of the prefixes
when you just list out the Items in sequence doesn't bother you after a while.
The ItemEditor shows the apparent duplications as well, and you just have to look
past that.
So there actually aren't any special internal pointers or other tricky stuff
going on. An ItemSpace is just an ordered sequence of Items, each Item is
a variable-length sequence of up to 1600 chars [which is not a significant
limit due to the ability always to structure data in ways that avoids
the problem - ed]. The higher-level meaning
of Items as a series of components is actually not something any ItemSpace knows
about. That's why it's so easy to invent new ItemSpaces, like the DeltaItemSpace
or the VolatileItemSpace in the latest release. The concept of components is
actually totally restricted to the append(), skipXX(), and other operations on Cu.
(In fact you aren't even restricted to use those component-oriented
operations: you could invent your own meaning of the chars in the
Item by using raw operations like Cu.appendChar(c) and so on.)
One way you would be concerned about the apparent duplication of initial prefixes would be
if you wrote a simple database dumper that scanned all the Items in the db using
db.next(cu,0) and then wrote each Item out as a raw sequence of chars. There
would be no compression of shared adjacent prefixes, and the output would be
rather inefficient. It would be easy to fix that dumper to output only the
suffixes that actually change from Item to Item. That would be as efficient as
the InfinityDB internal compression.
I agree with you now that I think about it that that apparent duplication of
prefixes is a bit disconcerting at first. But the whole Idea of an EAV-structured
database, or even most of the other ItemSpace-based structures you might use, depend on
the internal compression inside the ItemSpace implementation for efficiency.
Good point.
Roger
Posted 4 years, 7 months ago on June 1, 2006
The trackback url for this post is http://boilerbay.com/infinitydb/forum/bblog/trackback.php/37/
The trackback url for this post is http://boilerbay.com/infinitydb/forum/bblog/trackback.php/37/
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
...
Comment pending moderation
Comments have now been turned off for this post