[Z3-zemantic] Re: big zemantic storage / some changes

David Pratt fairwinds at eastlink.ca
Wed May 18 01:33:30 CEST 2005


Hi Michel

On Tuesday, May 17, 2005, at 02:41 PM, Michel Pelletier wrote:

> On Sun, 2005-05-15 at 19:37 -0300, David Pratt wrote:
>> Hi Michel.
>>
>> Thank you for your replies to the thread.  This is very helpful.  Is
>> there interest in a SQL backend for Zemantic?
>
> Absolutely.
>
>> The comments about large storage are a real concern with the ZODB-only
>> option, in my view, when you think how much metadata can be collected
>> per record.
>
> Keep in mind that all our ZODB scale experiments have been just that,
> experiments.  Tarek is using a very experimental version of Zemantic
> that includes a text index that indexes every literal value, so that
> obviously adds a lot of weight and skews the impression of how big a
> "true" triple store should get.  In the next version of Zemantic (of
> which I'm working on a first sketch today) the Zope 3 catalog will be
> responsible for text indexing and other interpretive indexing.  Zemantic
> will go back to being a straight RDF store.

This is great.

>
>> My feeling is that it could lead to large RAM requirements for the
>> application (this is unacceptable to me, since not everyone can afford
>> beefy servers to make this work).  Second, I believe the repository
>> ought to be usable more generally instead of being tied specifically
>> to the Zope application, so that other applications can potentially
>> interact with it.
>
> I wouldn't gamble that ZODB uses much more memory than other databases,
> but your second point is very good: a SQL backend would be perfect for
> applications from different frameworks that need to share a store.

My concern is to keep the RAM footprint low at the application level,
because memory use is more acceptable for SQL, which is typically served
by a dedicated machine.  So I have generally been using techniques like
filesystem and SQL storage to keep the footprint in Zope as small as
possible.  On the filesystem side, there are a couple of things out now
that are beautiful, since you still have data that behaves as an object
in Zope but whose footprint in the ZODB is virtually nothing.

>
>> My hope is to use it as a container for other RDF data you wish to
>> store and query, coming from sources other than just the Zope
>> application.
>
> Zemantic works that way now; the RDF does not need to come from
> "inside" Zope.

Yes, and I would like to see some methods for import and export and
some serialization possibilities.  I am looking at serializing RDF to
files on the filesystem, and there are also interesting transformation
possibilities.
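As a rough illustration of the file-based export and import mentioned here, the sketch below writes triples to a simplified N-Triples-style text file and reads them back, using only the Python standard library. The `IRI` class and the `dump` and `load` functions are hypothetical names for this example, not Zemantic or rdflib API, and only IRIs and plain string literals are handled (no blank nodes, language tags or datatypes; only backslash, double quote and newline are escaped).

```python
import io
import re
from typing import Iterable, List, TextIO, Tuple, Union


class IRI(str):
    """Marks a string as an IRI; a plain str is treated as a literal."""


Term = Union[IRI, str]
Triple = Tuple[IRI, IRI, Term]

# Subject and predicate are <IRI>s; the object is an <IRI> or a "literal".
_LINE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)


def _escape(text: str) -> str:
    # Backslashes first, so the escapes added afterwards are not doubled.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _unescape(text: str) -> str:
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), text)


def dump(triples: Iterable[Triple], out: TextIO) -> None:
    """Write one triple per line, N-Triples style."""
    for s, p, o in triples:
        obj = "<%s>" % o if isinstance(o, IRI) else '"%s"' % _escape(o)
        out.write("<%s> <%s> %s .\n" % (s, p, obj))


def load(src: TextIO) -> List[Triple]:
    """Parse lines written by dump(); blank lines and # comments are skipped."""
    triples: List[Triple] = []
    for lineno, line in enumerate(src, 1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError("line %d is not a supported triple: %r" % (lineno, line))
        s, p, o_iri, o_lit = m.groups()
        obj: Term = IRI(o_iri) if o_iri is not None else _unescape(o_lit)
        triples.append((IRI(s), IRI(p), obj))
    return triples


if __name__ == "__main__":
    doc = IRI("http://example.org/doc1")
    data = [
        (doc, IRI("http://purl.org/dc/elements/1.1/title"), 'A "quoted"\ntitle'),
        (doc, IRI("http://purl.org/dc/elements/1.1/creator"), IRI("http://example.org/david")),
    ]
    buf = io.StringIO()
    dump(data, buf)
    print(buf.getvalue())
    buf.seek(0)
    assert load(buf) == data
```

A real implementation would more likely lean on rdflib's own parsers and serializers; the point here is only that a line-per-triple text format makes export, import and filesystem storage straightforward.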

>
>> Archetypes IDs are not important for me, since I work with CMF rather
>> than Plone because of the GPL, but I can see their potential role in
>> Plone.  Let me know when you are starting this rewrite, since I have a
>> strong interest in it and may also have tried some SQL storage ideas
>> by then.
>
> Great, Dan and I have talked a bit about a SQL backend.  A plain SQL
> backend for rdflib would be pretty easy; it would look something like
> the existing Sleepycat backend, just with SQL commands instead of
> Berkeley DB calls.  The tricky part is getting a SQL backend (and the
> Sleepycat one too) to work with Zope's (well, ZODB's) transaction
> management.
>
> Dan and I figured that backends that want to "play well" with Zope's
> transactions need to implement some kind of interface that actually
> does the database-level commit and rollback, and then a "wrapper" class
> that interacts with Zope's TM.  It would look something like
>
> from rdflib import Graph
> from mySQLBackend import SQLBackend
> from zemantic import TransactionalBackend
>
> g = Graph(TransactionalBackend(SQLBackend(param1=..., param2=...)))
>
> TransactionalBackend would expect the SQL backend to implement methods
> like rollback() and commit() (see the code in Zope for things like
> database adapters and sessions to see how to interact with ZODB
> transaction boundaries).
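A minimal sketch of the wrapper idea above, assuming a backend that exposes add(), remove(), triples(), commit() and rollback(). The hook names abort, tpc_begin, commit, tpc_vote, tpc_finish, tpc_abort and sortKey follow the data manager interface of the `transaction` package used by ZODB; everything else, including `InMemoryBackend` (a stand-in for a SQL connection) and the `join` parameter, is hypothetical, and the method signatures are simplified rather than rdflib's actual store interface.

```python
from typing import Callable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class InMemoryBackend:
    """Stand-in for a SQL backend: writes are visible to this connection
    at once but only become durable on commit(); rollback() discards them."""

    def __init__(self) -> None:
        self.committed: Set[Triple] = set()
        self.working: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self.working.add(triple)

    def remove(self, triple: Triple) -> None:
        self.working.discard(triple)

    def triples(self, pattern: Pattern) -> Iterator[Triple]:
        # None in a pattern position matches anything.
        for t in self.working:
            if all(want is None or want == got for want, got in zip(pattern, t)):
                yield t

    def commit(self) -> None:
        self.committed = set(self.working)

    def rollback(self) -> None:
        self.working = set(self.committed)


class TransactionalBackend:
    """Hypothetical wrapper tying a commit()/rollback() backend to a
    two-phase-commit transaction manager."""

    transaction_manager = None  # part of the data manager interface; unused here

    def __init__(self, backend, join: Callable[[object], None]) -> None:
        self.backend = backend
        # In Zope this could be: lambda dm: transaction.get().join(dm)
        self._join = join
        self._joined = False

    def _ensure_joined(self) -> None:
        # Register with the current transaction on the first write only.
        if not self._joined:
            self._join(self)
            self._joined = True

    def add(self, triple: Triple) -> None:
        self._ensure_joined()
        self.backend.add(triple)

    def remove(self, triple: Triple) -> None:
        self._ensure_joined()
        self.backend.remove(triple)

    def triples(self, pattern: Pattern) -> Iterator[Triple]:
        return self.backend.triples(pattern)  # reads do not join the transaction

    # Hooks called by the transaction manager.
    def abort(self, txn) -> None:
        self.backend.rollback()
        self._joined = False

    def tpc_begin(self, txn) -> None:
        pass

    def commit(self, txn) -> None:
        pass  # writes were already sent to the backend

    def tpc_vote(self, txn) -> None:
        pass  # a SQL backend could raise here to veto the commit

    def tpc_finish(self, txn) -> None:
        self.backend.commit()
        self._joined = False

    def tpc_abort(self, txn) -> None:
        self.backend.rollback()
        self._joined = False

    def sortKey(self) -> str:
        # Used by the transaction manager to order resources during commit.
        return "zemantic-sql:%d" % id(self)
```

The design choice here is that writes go to the backend immediately and only the final commit or rollback is deferred to the transaction manager, so the SQL data and the ZODB data succeed or fail together.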

Yes, exactly.  An example of this is vuedastore; look for it on
SourceForge.  Basically it is a 3store-like implementation written in
Python with SPARQL querying built in.  Unfortunately, it was written
against rdflib 2.0.4, but there are some interesting things here.
I have been corresponding with Phil Dawes, who wrote the code, and have
asked him for a licensing change so that, in the event some of this code
finds its way into a Zope product, ZPL 2.1 could be maintained.  I have
done this out of my interest in Zemantic and its potential for a Zope 2
implementation.  I was asking for a move from GPL to BSD, since any
derived work would be ZPL 2.1, but he had gone to MIT, which I feel is
still too GPL-like, since I believe any derived work would also fall
under the MIT license.  In any case, he understands the issue I am
raising, and if derived work is MIT, I believe he would be willing to
bring the data store under BSD.  He should be getting back to me shortly.

Phil has done some benchmarking, and there is a good data structure to
work from.  There are also some import methods.  You certainly still
need RAM with SQL to query effectively once you get into millions of
triples, but I feel this is more acceptable than adding it on top of the
ZODB.  The only issue for me is that I would like to see the
implementation on Postgres, because it has much more going for it
overall than MySQL (although MySQL can be a bit faster).
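To make the "plain SQL backend" idea from earlier in the thread concrete, here is a minimal sketch of a triple table with pattern queries and explicit commit and rollback, the methods a transactional wrapper would call. It uses Python's built-in sqlite3 module as a stand-in for Postgres; the schema, the index choices and the `SQLTripleBackend` name are assumptions for illustration, not vuedastore's data structure or rdflib's store API.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

_COLUMNS = ("subject", "predicate", "object")


class SQLTripleBackend:
    """Hypothetical minimal SQL triple store. sqlite3 stands in for
    Postgres; the SQL is simple enough to port, but placeholders differ
    between drivers (sqlite3 uses ?, psycopg2 uses %s)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # One row per triple; the primary key makes the table a set and
        # also serves lookups that start from the subject.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            " subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,"
            " PRIMARY KEY (subject, predicate, object))"
        )
        # Extra indexes for patterns whose subject is unbound.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS triples_po ON triples (predicate, object)")
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS triples_os ON triples (object, subject)")
        self.conn.commit()

    @staticmethod
    def _where(pattern: Pattern) -> Tuple[str, List[str]]:
        # None in a pattern position is a wildcard; column names come from a
        # fixed tuple, and values are always passed as bound parameters.
        clauses = ["%s = ?" % col for col, val in zip(_COLUMNS, pattern) if val is not None]
        params = [val for val in pattern if val is not None]
        return (" WHERE " + " AND ".join(clauses) if clauses else ""), params

    def add(self, triple: Triple) -> None:
        # SQLite syntax; Postgres 9.5+ would use INSERT ... ON CONFLICT DO NOTHING.
        self.conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triple)

    def remove(self, pattern: Pattern) -> None:
        where, params = self._where(pattern)
        self.conn.execute("DELETE FROM triples" + where, params)

    def triples(self, pattern: Pattern) -> Iterator[Triple]:
        where, params = self._where(pattern)
        sql = "SELECT subject, predicate, object FROM triples" + where
        return iter(self.conn.execute(sql, params).fetchall())

    def commit(self) -> None:
        self.conn.commit()

    def rollback(self) -> None:
        self.conn.rollback()
```

Because sqlite3 opens a transaction implicitly before the first INSERT or DELETE, uncommitted writes are visible to this connection but disappear on rollback(), which is the behaviour a transaction-aware wrapper relies on.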

In any case, my only problem is that I don't know Zope 3, but perhaps
there are common classes that could be used to provide an implementation
for both Zope 2 and Zope 3.  I am interested in a data store that will
work with Zope 2, because I will continue to work in that space for some
time; it will take me a while to get up to speed with Zope 3.

Regards,
David

