[Z3-zemantic] Scalability, part 2
Michel Pelletier
michel at dialnetwork.com
Wed Mar 9 16:10:52 MET 2005
On Wednesday 09 March 2005 02:50 am, Paul Everitt wrote:
> Sorry for following up to my own note, but this is an interesting and
> academically-rigorous article on the mathematical techniques in RDF
> indexing and querying.
Zemantic is implemented just as they describe here: it uses a
forward/reverse index that maps nodes to integers, and then stores those
integer mappings in a four-dimensional btree index. Essentially (and
coincidentally), that is exactly the approach described in this paper.
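
As a rough illustration of that layout (not Zemantic's actual code), here
is a minimal Python sketch. It assumes the four dimensions are context,
subject, predicate and object, which is a guess on my part, and it uses a
sorted list of integer tuples as a stand-in for a real btree. The names
NodeTable and QuadIndex are hypothetical.

```python
from bisect import bisect_left


class NodeTable:
    """Forward/reverse mapping between nodes and small integer ids."""

    def __init__(self):
        self.to_id = {}    # forward: node -> integer id
        self.to_node = []  # reverse: integer id -> node

    def intern(self, node):
        """Return the id for node, assigning the next free id if it is new."""
        nid = self.to_id.get(node)
        if nid is None:
            nid = len(self.to_node)
            self.to_id[node] = nid
            self.to_node.append(node)
        return nid


class QuadIndex:
    """Stores (context, subject, predicate, object) as sorted integer 4-tuples.

    The sorted list stands in for a btree: both keep keys ordered, so a
    lookup on a leading prefix of the key becomes a range scan.
    """

    def __init__(self):
        self.nodes = NodeTable()
        self.keys = []

    def add(self, context, subject, predicate, obj):
        key = tuple(self.nodes.intern(n)
                    for n in (context, subject, predicate, obj))
        pos = bisect_left(self.keys, key)
        # Skip duplicates so each statement is stored once.
        if pos == len(self.keys) or self.keys[pos] != key:
            self.keys.insert(pos, key)

    def match(self, *prefix):
        """Return all statements whose leading components equal prefix."""
        ids = []
        for node in prefix:
            nid = self.nodes.to_id.get(node)
            if nid is None:
                return []  # unknown node: nothing can match
            ids.append(nid)
        ids = tuple(ids)
        results = []
        pos = bisect_left(self.keys, ids)
        while pos < len(self.keys) and self.keys[pos][:len(ids)] == ids:
            results.append(tuple(self.nodes.to_node[i]
                                 for i in self.keys[pos]))
            pos += 1
        return results
```

A single key order only answers queries on a leading prefix; a real store
would typically keep several permutations of the key (or several btrees)
so that other access patterns are also range scans.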
Note that zemantic has no "query optimizers", but they aren't very
specific about what that means. There is some simple algebraic reduction
that can be applied to improve query times, but I haven't implemented any of
that.
> The benchmarks, showing index time on 3 million triples, are worth
> looking at. Certainly the query time is worth looking at too.
>
> Basic conclusion: knowing a lot about the theory can give a big impact
> on design.
Definitely. One thing to note: their system has text indexing on literals,
but they do it in a broken way. They "tokenize" the literals and index the
tokens, but tokenization is a language-specific concept, and they don't
explain how to handle languages that cannot be tokenized on simple spaces.
My guess is they copped out of that problem (i.e., the one that Vocabularies
handle for ZCatalog).
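
To make the tokenization point concrete, here is a small Python sketch of
picking a tokenizer per language instead of always splitting on spaces. It
is only an illustration: the function names, the TOKENIZERS table and the
choice of character bigrams as the fallback are all my assumptions, not
anything from the paper, Zemantic or ZCatalog. It relies on the fact that
RDF literals can carry a language tag.

```python
def whitespace_tokenize(text):
    """Split on whitespace; adequate only for space-delimited languages."""
    return text.lower().split()


def bigram_tokenize(text):
    """Overlapping character bigrams, a crude fallback for text that has
    no spaces between words."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


# Hypothetical table keyed by the literal's language tag.
TOKENIZERS = {
    "en": whitespace_tokenize,
    "zh": bigram_tokenize,
    "ja": bigram_tokenize,
}


def index_literal(index, literal_id, text, lang):
    """Add literal_id to the posting set of every token in text."""
    tokenize = TOKENIZERS.get(lang, whitespace_tokenize)
    for token in tokenize(text):
        index.setdefault(token, set()).add(literal_id)


index = {}
index_literal(index, 1, "Hello world", "en")
index_literal(index, 2, "你好世界", "zh")
```

With whitespace splitting alone, the second literal would be indexed as
one opaque token and could only be found by an exact match on the whole
string.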
-Michel