[Z3-zemantic] Re: Zemantic 0.5 released

Michel Pelletier michel at dialnetwork.com
Mon Jul 18 22:51:51 CEST 2005


On Mon, 2005-07-18 at 19:52 +0200, Olivier Grisel wrote:
> Michel Pelletier wrote:
> 
> >   - Integrating Zemantic with natural language processors like NLTK, so that 
> > RDF data can be extracted from unstructured textual documents.
> > 
> 
> I am extremely interested in working on this subject (in my spare time
> only, however). What have you guys done so far, or what plans do you
> have in mind?

Nothing on this particular subject yet.  More recently I've been working
on RDFS entailment in rdflib, which is a priority right now, but I can
help you as much as possible.

First, I would propose that you think about this feature generally in
rdflib, not just in Zemantic.

> Do you plan to use such a semantic indexer against some kind of
> predefined vocabulary/ontology? For instance:
> 
>                        My doc (raw text) + My ontology
> 
>                                       ||
>                                [Zemantic/NLTK]
>                                       ||
>                                       \/
> 
>                            Semantic model of my doc
>                               stored in a graph
> 
> If so, what kind of ontology do you plan to use? Existing ones on some
> specific topic? Or general / common-sense ones?

I hadn't thought about that particular pattern, but it makes sense to me.
There will certainly be good language grammar ontologies out there, and
I imagine that they would be useful.

What I had thought of was to take NLTK's default output (which tags words
with their grammatical labels) and derive subject-predicate-object
statements from it.  So, a document that talks a lot about French
history after WWII would automatically identify subjects like 'Paris'
and 'Charles de Gaulle', and automatically create triples like 'Charles de
Gaulle' => 'prime minister of' => 'France', along with other information
like that.
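A rough sketch of that idea follows. It assumes the input is a list of (word, tag) pairs with Penn Treebank tags, the shape that current NLTK's nltk.pos_tag returns. The tags are written by hand so the sketch runs without NLTK (tagging "de" as NNP is an assumption; a real tagger may label it differently), and chunk_proper_nouns and extract_triples are hypothetical helpers, not part of NLTK or rdflib. The heuristic treats runs of proper nouns as subjects/objects and the words between them as the predicate:

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]
Triple = Tuple[str, str, str]

PROPER = ("NNP", "NNPS")   # Penn Treebank proper-noun tags
SKIP = (".", ",", "DT")    # tokens left out of the predicate text


def chunk_proper_nouns(tagged: Tagged) -> Tagged:
    """Merge runs of consecutive proper-noun tokens into one token tagged 'NP'."""
    out: Tagged = []
    run: List[str] = []
    for word, tag in tagged:
        if tag in PROPER:
            run.append(word)
            continue
        if run:
            out.append((" ".join(run), "NP"))
            run = []
        out.append((word, tag))
    if run:
        out.append((" ".join(run), "NP"))
    return out


def extract_triples(tagged: Tagged) -> List[Triple]:
    """Naive heuristic: NP <words> NP becomes (subject, predicate text, object)."""
    triples: List[Triple] = []
    subject: Optional[str] = None
    between: List[str] = []
    for word, tag in chunk_proper_nouns(tagged):
        if tag == "NP":
            if subject is not None and between:
                triples.append((subject, " ".join(between), word))
            # The object of one statement can be the subject of the next.
            subject, between = word, []
        elif subject is not None and tag not in SKIP:
            between.append(word)
    return triples


if __name__ == "__main__":
    # Hand-tagged stand-in for tagger output (hypothetical example sentence).
    sentence = [("Charles", "NNP"), ("de", "NNP"), ("Gaulle", "NNP"),
                ("was", "VBD"), ("prime", "JJ"), ("minister", "NN"),
                ("of", "IN"), ("France", "NNP"), (".", ".")]
    print(extract_triples(sentence))
```

Run as a script, this prints [('Charles de Gaulle', 'was prime minister of', 'France')]. A real extractor would need proper chunking and relation patterns; this only shows the shape of the output.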

> 
> One could imagine reusing semi-structured vocabulary from free online
> dictionaries such as http://en.wiktionary.org to build such a general
> ontology. If so, which syntax should be used: OWL DL?

Good question; I haven't thought about it too hard.  I think a very
practical approach would be to make a very, very simple ontology
that maps easily onto NLTK's default output, so that we at least
have some input and output to query and experiment with, and then
make it more complex from there.
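As an illustration of how small such a starting ontology could be, here is a sketch that maps a few Penn Treebank tag prefixes onto a handful of classes and emits one rdf:type statement per distinct word. The namespaces and class names are hypothetical placeholders, not an existing vocabulary, and classify and tagged_to_triples are made-up helpers. The triples are plain string tuples so the sketch has no dependencies, though they could be turned into rdflib URIRef objects and added to an rdflib Graph:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical namespaces for a toy ontology; neither is an existing vocabulary.
NS = "http://example.org/toy-grammar#"   # classes
DOC = "http://example.org/doc#"          # terms found in the document

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Map Penn Treebank tag prefixes onto a handful of classes.
TAG_TO_CLASS: Dict[str, str] = {
    "NNP": NS + "ProperNoun",
    "NN": NS + "Noun",
    "VB": NS + "Verb",
    "JJ": NS + "Adjective",
}


def classify(tag: str) -> Optional[str]:
    """Return the class URI for a tag, or None if the toy ontology has no match."""
    # Try longer prefixes first so "NNPS" maps to ProperNoun, not Noun.
    for prefix in sorted(TAG_TO_CLASS, key=len, reverse=True):
        if tag.startswith(prefix):
            return TAG_TO_CLASS[prefix]
    return None


def tagged_to_triples(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Emit one (term, rdf:type, class) triple per distinct classified word."""
    triples: List[Tuple[str, str, str]] = []
    seen = set()
    for word, tag in tagged:
        cls = classify(tag)
        if cls is None:
            continue  # punctuation, determiners, etc. are ignored
        # Replacing spaces is not real URI escaping; good enough for a toy.
        triple = (DOC + word.replace(" ", "_"), RDF_TYPE, cls)
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


if __name__ == "__main__":
    sample = [("Paris", "NNP"), ("is", "VBZ"), ("beautiful", "JJ"), (".", ".")]
    for t in tagged_to_triples(sample):
        print(t)
```

In this sketch, making it "more complex from there" would mostly mean extending TAG_TO_CLASS and adding properties between the classes, while the extraction loop stays the same.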

-Michel
