[Z3-zemantic] thoughts about ZODB backend storage

Tres Seaver tseaver at zope.com
Tue Mar 29 17:08:14 MEST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tarek Ziadé wrote:

> I indexed my mailbox today with zemantic (about 30 000 mails) in
> the webmail and I had performance issues.
> 
> Since the add code is not linear, it gets slower and slower as I index
> all messages. At around 5000 mails indexed, its speed is
> around 1 mail per second on my laptop. Around 10 000, the speed is
> closer to 1 minute per mail, so I had to stop the process.

Your problem sounds RAM / swap related;  I would suggest adding a
"sub-commit" of your transaction ('commit(1)'), after every "batch" of
mails (the number could be tuned, but try 100 to start).
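
A minimal sketch of that batching pattern, for illustration only. The
names index_in_batches, index_one and checkpoint are hypothetical and do
not come from Zemantic; checkpoint stands for whatever performs the
sub-commit, e.g. the 'commit(1)' call mentioned above (in later ZODB
releases, transaction.savepoint(optimistic=True) plays a similar role).

```python
def index_in_batches(messages, index_one, checkpoint, batch_size=100):
    """Index each message, calling checkpoint() after every full batch.

    index_one and checkpoint are caller-supplied callables (hypothetical
    names); checkpoint is where the sub-commit would happen.
    """
    count = 0
    for message in messages:
        index_one(message)
        count += 1
        if count % batch_size == 0:
            # Let the storage flush modified objects out of RAM.
            checkpoint()
    # Flush a trailing partial batch, if there is one.
    if count % batch_size != 0:
        checkpoint()
    return count
```

The batch size is only a starting point; it would need tuning against
the memory actually available.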

> the reason is that, besides other triples, I have this big triple:
> 
> Message / body is / Message body
> 
> this generates *big* triple statements, besides the text indexing

The "body-is" triple doesn't "feel right" to me.  I would rather use a
separate text index for the bodies (perhaps actually a "SearchableText"
style aggregate) and then work out how to intersect the results of an
RDF-based Zemantic query with results from that index.
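
As a rough sketch of the intersection step, assuming both the RDF query
and the text index can return message ids: the toy inverted index below
is a stand-in for a real text index, and every name in it
(build_text_index, search_text, intersect_results) is hypothetical.

```python
def build_text_index(bodies):
    """Map each lowercased word to the set of message ids containing it."""
    index = {}
    for msg_id, body in bodies.items():
        for word in set(body.lower().split()):
            index.setdefault(word, set()).add(msg_id)
    return index


def search_text(index, word):
    """Return the ids of messages whose body contains the word."""
    return index.get(word.lower(), set())


def intersect_results(rdf_hits, text_hits):
    """Keep only the ids present in both result sets, in sorted order."""
    return sorted(set(rdf_hits) & set(text_hits))
```

With this split, message bodies never enter the triple store; only the
ids returned by the two searches are combined.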

> This is more likely to be a conceptual problem in the webmail program,
> but thinking about how it could go faster can't be a bad thing, as the
> problem might arise in big zemantic storages in classical use cases.
> 
> idea :
> 
> I just had this feeling (it could be totally wrong, as I don't know
> anything about BTrees and I don't know what is actually stored in an
> OIBTree (is the full Literal stored?)):

Yes.

> the reverse OIBTree could be skipped if the ids that are generated for
> the forward IOBTree were md5 hash keys calculated from subject,
> predicate and object; then any search that is currently made with, for
> example, "r.has_key(object)" could be replaced by
> "f.has_key(object_md5_key)"

That would be a "saner" predicate.
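
The hashed-key idea quoted above could be sketched roughly as follows.
This is an illustration under assumptions, not Zemantic code: a plain
dict stands in for the forward tree, the keys are md5 hex strings rather
than integers, and triple_key, add_triple and has_triple are
hypothetical names.

```python
import hashlib


def triple_key(subject, predicate, obj):
    """Return an md5 hex digest identifying one (subject, predicate, object)."""
    digest = hashlib.md5()
    for part in (subject, predicate, obj):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c", x) and ("a", "bc", x) differ.
        digest.update(str(len(data)).encode("ascii") + b":" + data)
    return digest.hexdigest()


forward = {}  # stand-in for the forward index (assumption: string keys)


def add_triple(subject, predicate, obj):
    """Store the triple under its digest and return that digest."""
    key = triple_key(subject, predicate, obj)
    forward[key] = (subject, predicate, obj)
    return key


def has_triple(subject, predicate, obj):
    """Membership test that needs no reverse index."""
    return triple_key(subject, predicate, obj) in forward
```

Note that a digest computed over the whole triple only answers "is this
exact triple stored?"; looking up by object alone would still need a
separate key or index.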

Tres.
- --
===============================================================
Tres Seaver                                tseaver at zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCSW9cGqWXf00rNCgRAiIpAKCgI6dMUrkHlfyObOHAORf4AFs/lACgjv7C
PsFdE0JiSzDmAYo9dEOegdU=
=4oTq
-----END PGP SIGNATURE-----

