[Z3-zemantic] Scalability, part 2

Michel Pelletier michel at dialnetwork.com
Wed Mar 9 16:10:52 MET 2005


On Wednesday 09 March 2005 02:50 am, Paul Everitt wrote:
> Sorry for following up to my own note, but this is an interesting and
> academically rigorous article on the mathematical techniques in RDF
> indexing and querying.

Zemantic is implemented just as they describe here: it uses a
forward/reverse index that maps nodes to integers, and then stores those
integer mappings in a four-dimensional BTree index.  Essentially (and
coincidentally), this is the same design the paper describes.
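The forward/reverse node index and the integer-keyed four-dimensional index can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zemantic's code: the four dimensions are assumed to be subject, predicate, object and context, a sorted Python list searched with `bisect` stands in for a persistent BTree, and the class names `NodeIndex` and `QuadIndex` are hypothetical.

```python
import bisect
from itertools import count
from typing import Dict, Hashable, List, Optional

class NodeIndex:
    """Forward (node -> int) and reverse (int -> node) mappings."""

    def __init__(self) -> None:
        self._forward: Dict[Hashable, int] = {}
        self._reverse: Dict[int, Hashable] = {}
        self._ids = count(1)

    def intern(self, node: Hashable) -> int:
        # Assign a new integer id the first time a node is seen.
        nid = self._forward.get(node)
        if nid is None:
            nid = next(self._ids)
            self._forward[node] = nid
            self._reverse[nid] = node
        return nid

    def lookup(self, node: Hashable) -> Optional[int]:
        return self._forward.get(node)

    def node(self, nid: int) -> Hashable:
        return self._reverse[nid]

class QuadIndex:
    """Sorted (s, p, o, c) integer keys; a stand-in for a 4-D BTree index."""

    def __init__(self) -> None:
        self.nodes = NodeIndex()
        self._keys: List[tuple] = []

    def add(self, s, p, o, c) -> None:
        key = tuple(self.nodes.intern(n) for n in (s, p, o, c))
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)  # keep keys sorted, no duplicates

    def match(self, s=None, p=None, o=None, c=None) -> List[tuple]:
        """Return quads matching the pattern; None acts as a wildcard."""
        ids: List[Optional[int]] = []
        for n in (s, p, o, c):
            if n is None:
                ids.append(None)
                continue
            nid = self.nodes.lookup(n)
            if nid is None:
                return []  # an unknown node cannot match anything
            ids.append(nid)
        # The bound leading positions form a prefix we can range-scan for.
        prefix: List[int] = []
        for nid in ids:
            if nid is None:
                break
            prefix.append(nid)
        pre = tuple(prefix)
        start = bisect.bisect_left(self._keys, pre)
        results = []
        for key in self._keys[start:]:
            if key[:len(pre)] != pre:
                break  # past the end of the prefix range
            if all(want is None or want == got for want, got in zip(ids, key)):
                results.append(tuple(self.nodes.node(x) for x in key))
        return results

if __name__ == "__main__":
    q = QuadIndex()
    q.add("ex:alice", "foaf:knows", "ex:bob", "ctx:1")
    q.add("ex:alice", "foaf:name", '"Alice"', "ctx:1")
    q.add("ex:bob", "foaf:knows", "ex:alice", "ctx:1")
    print(q.match(s="ex:alice"))
    print(q.match(p="foaf:knows"))
```

Because keys are sorted in (s, p, o, c) order, a query that binds a leading prefix becomes a range scan, while a query that leaves the subject unbound, such as `match(p=...)`, falls back to scanning every key. Stores that need fast lookups on any position typically keep several orderings of the same tuples, at the cost of extra storage.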

Note that Zemantic has no "query optimizers", though the paper isn't very
specific about what that means.  There is some simple algebraic reduction
that can be applied to improve query times, but I haven't implemented any of
that.
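The post does not say which algebraic reductions are meant. As one hypothetical example, a conjunctive query (an AND of triple patterns) can be simplified using idempotence (P AND P = P, so duplicate patterns are dropped) and commutativity (AND is order-independent, so the most selective pattern can be evaluated first). The sketch below assumes patterns are (subject, predicate, object) tuples with variables written as `?name` strings; `reduce_conjunction` and `count_variables` are illustrative names, not Zemantic APIs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Tuple[str, str, str]  # variables are strings starting with "?"

def count_variables(pattern: Pattern) -> int:
    """Crude selectivity estimate: fewer unbound positions, fewer matches."""
    return sum(1 for term in pattern if term.startswith("?"))

def reduce_conjunction(patterns: Sequence[Pattern],
                       estimate: Callable[[Pattern], int] = count_variables
                       ) -> List[Pattern]:
    """Hypothetical rewrite of an AND of triple patterns.

    Idempotence: duplicate patterns are dropped (first occurrence kept).
    Commutativity: the rest are ordered by ascending estimated result size,
    so later joins start from the smallest set of bindings.
    """
    unique = list(dict.fromkeys(patterns))
    return sorted(unique, key=estimate)

if __name__ == "__main__":
    query = [
        ("?x", "rdf:type", "foaf:Person"),
        ("?x", "foaf:name", '"Alice"'),
        ("?x", "rdf:type", "foaf:Person"),
    ]
    # Hypothetical per-pattern counts, e.g. taken from index statistics.
    counts: Dict[Pattern, int] = {query[0]: 10000, query[1]: 1}
    print(reduce_conjunction(query, counts.__getitem__))
```

The estimator is passed in because real cardinalities would come from the index itself; counting variables is only a fallback when no statistics are available.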

> The benchmarks, showing index time on 3 million triples, are worth
> looking at.  Certainly the query time is worth looking at too.
>
> Basic conclusion: knowing a lot about the theory can give a big impact
> on design.

Definitely.  One thing to note: their system has text indexing on literals,
but they do it in a broken way.  They "tokenize" the literals and index them,
but tokenization is a language-specific concept, and they don't explain how
to handle languages that cannot be tokenized on simple whitespace.  My guess
is they copped out of that problem (i.e., the problem Vocabularies handle for
ZCatalog).
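The tokenization problem can be made concrete with a small inverted index whose word splitter is chosen per literal language tag, so that whitespace splitting is only one plug-in among several. This is a sketch under assumptions, not what the paper or Zemantic does: `LiteralTextIndex`, `register`, and both splitters are hypothetical names, and overlapping character bigrams are swapped in as one common fallback for scripts such as Japanese or Chinese that do not put spaces between words.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set

Splitter = Callable[[str], List[str]]

def whitespace_splitter(text: str) -> List[str]:
    # Adequate for many European languages; useless for unsegmented scripts.
    return text.lower().split()

def bigram_splitter(text: str) -> List[str]:
    # Overlapping character bigrams: a common fallback for scripts
    # that do not separate words with spaces.
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

class LiteralTextIndex:
    """Inverted index over literal text, splitter chosen per language tag."""

    def __init__(self, default: Splitter = whitespace_splitter) -> None:
        self._default = default
        self._splitters: Dict[str, Splitter] = {}
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def register(self, lang: str, splitter: Splitter) -> None:
        self._splitters[lang] = splitter

    def _split(self, text: str, lang: Optional[str]) -> List[str]:
        return self._splitters.get(lang, self._default)(text)

    def index(self, literal_id: int, text: str,
              lang: Optional[str] = None) -> None:
        for token in self._split(text, lang):
            self._postings[token].add(literal_id)

    def search(self, query: str, lang: Optional[str] = None) -> Set[int]:
        # AND semantics: a literal must contain every query token.
        tokens = self._split(query, lang)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result

if __name__ == "__main__":
    plain = LiteralTextIndex()
    plain.index(2, "意味論的ウェブ", "ja")
    print(plain.search("ウェブ", "ja"))   # whitespace only: no match

    idx = LiteralTextIndex()
    idx.register("ja", bigram_splitter)
    idx.index(1, "Semantic Web indexing", "en")
    idx.index(2, "意味論的ウェブ", "ja")
    print(idx.search("web", "en"))
    print(idx.search("ウェブ", "ja"))
```

With only the whitespace splitter, the Japanese literal is indexed as a single token, so a search for a word inside it finds nothing; registering a different splitter for that language tag fixes this without touching the index itself.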

-Michel

