[Z3-zemantic] Re: big zemantic storage / some changes
David Pratt
fairwinds at eastlink.ca
Wed May 18 01:33:30 CEST 2005
Hi Michel
On Tuesday, May 17, 2005, at 02:41 PM, Michel Pelletier wrote:
> On Sun, 2005-05-15 at 19:37 -0300, David Pratt wrote:
>> Hi Michel.
>>
>> Thank you for your replies to the thread. This is very helpful. Is
>> there interest in a sql backend for Zemantic?
>
> Absolutely.
>
>> The comments about large storage are a real concern with the zodb-only
>> option, in my view, when you think about how much metadata can be
>> collected per record.
>
> Keep in mind that all our ZODB scale experiments have been just that,
> experiments. Tarek is using a very experimental version of Zemantic
> that includes a text index that indexes every literal value, so that
> obviously adds a lot of weight that skews the impression of what a
> "true" triple store should get. In the next version of Zemantic (which
> I'm working on a first sketch today) the Zope 3 catalog will be responsible
> for text indexing and other interpretive indexing. Zemantic will go
> back to being a straight RDF store.
This is great.
>
>> My feeling is it could lead to big RAM being needed for the
>> application (this is unacceptable to me since not everyone can afford
>> beefy servers to make this work). Second, the repository for me is
>> something that I believe ought to be usable more generally
>> instead of being tied specifically to the zope application, so that
>> it is possible for other applications to interact with it.
>
> I wouldn't gamble that ZODB uses much more memory than other databases,
> but your second point is very good, a sql backend would be perfect for
> applications from different frameworks that you want to share a store.
My concern is to keep the RAM footprint low at the application level.
A larger footprint tends to be more acceptable for sql, since it is
usually served by a dedicated machine. So I have generally been using
methods like filesystem and sql storage to keep the footprint in zope
as small as possible. On the filesystem side, there are a couple of
things out now that are beautiful, since you still have data that
behaves as an object in zope but its footprint in the zodb is
virtually nothing.
>
>> My
>> hope is to use it as a container for other rdf data you wish to store
>> and query as well, from sources other than just the zope application.
>
> Zemantic works that way now, the RDF does not need to come from
> "inside"
> Zope.
Yes, and I would like to see some methods for import and export and
some serialization possibilities. I am looking at serializing rdf for
files on the filesystem, and there are also interesting transformation
possibilities.
>
>> The archetypes ids for me are not important since I don't work with
>> Plone but CMF due to GPL but I can see its potential role in Plone.
>> Let me know when you are starting this rewrite since I have a strong
>> interest in this and may have also tried some sql storage ideas by
>> then.
>
> Great, Dan and I have talked a bit about a SQL backend. A plain SQL
> backend for rdflib would be pretty easy, it would look something like
> the existing sleepycat backend just with SQL commands instead of
> berkeley. The tricky part is when you want to get a sql backend (and
> the sleepycat one too) to work with Zope's (well, ZODB's) transaction
> management.
>
> Dan and I figured that backends that want to "play well" with Zope's
> transactions need to implement some kind of interface that actually
> does
> the database level commit and rollback and then a "wrapper" class that
> interacts with Zope's TM. It would look something like
>
> from rdflib import Graph
> from mySQLBackend import SQLBackend
> from zemantic import TransactionalBackend
>
> g = Graph(TransactionalBackend(SQLBackend(param1=..., param2=...)))
>
> TransactionalBackend would expect the SQL backend to implement methods
> like rollback() and commit() (see code in Zope for things like database
> adapters and sessions to see how to interact with ZODB transactional
> boundaries).
Yes - exactly. An example of this is vuedastore, which you can find on
sourceforge. Basically it is a 3store-like implementation, written in
python, with sparql querying built in. Unfortunately, it was written
against rdflib 2.0.4, but there are some interesting things in it.
I have been conversing with Phil Dawes, who wrote the code, and have
asked him for a licensing change so that zpl 2.1 could be maintained
in the event that some of this code finds its way into a zope product.
I have done this out of my interest in Zemantic and the potential for
a Zope2 type implementation. I was looking for a move from GPL to BSD,
since any derived work would be zpl 2.1, but he has gone to MIT, which
I feel is still too GPLish since I believe any derived work would also
fall under the MIT license. In any case, he understands the issue I am
raising, and if derived work is MIT, I believe he would be willing to
bring the data store under BSD. He should be getting back to me shortly.
Phil has done some benchmarking and there is a good data structure to
work from. There are also some importing methods. You certainly still
need RAM with sql once you get into millions of triples and want to
query effectively, but I feel this is more acceptable than adding it
on top of zodb. The only issue for me is that I would like to see the
implementation on Postgres, because it has much more going for it
overall than MySQL (although MySQL can be a bit faster).
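As a rough sketch of the wrapper idea Michel describes above, and not
taken from Zemantic or rdflib code: SQLBackend and TransactionalBackend
below are hypothetical stand-ins, the single "triples" table is an
assumed layout, and sqlite3 is used only so the example runs without a
server (the same shape should carry over to Postgres). The hook names
follow the data manager protocol of the transaction package. Joining
the wrapper to Zope's transaction manager is left out.

```python
import sqlite3


class SQLBackend:
    """Hypothetical triple store kept in one SQL table (assumed layout)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, "
            "UNIQUE (s, p, o))"
        )
        self.conn.commit()

    def add(self, s, p, o):
        # The write stays pending until commit() or rollback() is called.
        self.conn.execute(
            "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o)
        )

    def triples(self, s=None, p=None, o=None):
        # None acts as a wildcard for that position of the pattern.
        clauses, params = [], []
        for column, value in (("s", s), ("p", p), ("o", o)):
            if value is not None:
                clauses.append(column + " = ?")
                params.append(value)
        sql = "SELECT s, p, o FROM triples"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return self.conn.execute(sql, params).fetchall()

    def commit(self):
        self.conn.commit()

    def rollback(self):
        self.conn.rollback()


class TransactionalBackend:
    """Hypothetical wrapper mapping data manager hooks to commit/rollback."""

    def __init__(self, backend):
        self.backend = backend

    def add(self, s, p, o):
        self.backend.add(s, p, o)

    def triples(self, s=None, p=None, o=None):
        return self.backend.triples(s, p, o)

    # Two-phase commit hooks; the txn argument is unused in this sketch.
    def tpc_begin(self, txn):
        pass

    def commit(self, txn):
        pass  # the SQL statements were already sent in add()

    def tpc_vote(self, txn):
        pass  # a real backend would raise here if it could not commit

    def tpc_finish(self, txn):
        self.backend.commit()

    def tpc_abort(self, txn):
        self.backend.rollback()

    def abort(self, txn):
        self.backend.rollback()

    def sortKey(self):
        return "sqlbackend:%d" % id(self)


store = TransactionalBackend(SQLBackend())
store.add("urn:doc1", "dc:title", "Zemantic notes")
store.tpc_finish(None)  # transaction committed: the triple is kept
store.add("urn:doc2", "dc:title", "Discarded draft")
store.abort(None)       # transaction aborted: the triple is dropped
```

The point of the split is that the backend only knows about database
level commit and rollback, so the same backend class could in principle
be wrapped for zope 2 or zope 3, or used with no wrapper at all.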
In any case, my only problem is that I don't know zope3, but perhaps
there are common classes that can be used to provide an implementation
in both 3 and 2. I am interested in a data store that will work with
zope 2 because I will continue to be in this space for some time, as it
will take time to bring myself up to speed in zope3.
Regards,
David