[Z3-zemantic] Re: Zemantic 0.5 released
Michel Pelletier
michel at dialnetwork.com
Mon Jul 18 22:51:51 CEST 2005
On Mon, 2005-07-18 at 19:52 +0200, Olivier Grisel wrote:
> Michel Pelletier wrote:
>
> > - Integrating Zemantic with natural language processors like NLTK, so that
> > RDF data can be extracted from unstructured textual documents.
> >
>
> I am extremely interested in working on this subject (in my spare time
> only, however). What have you guys done so far, or what plans do you
> have in mind?
Nothing on this particular subject yet. More recently I've been working
on RDFS entailment in rdflib, which is a priority right now, but I can
help you as much as possible.
First, I would propose that you think about this feature generally, as
something for rdflib and not just for Zemantic.
> Do you plan to use such a semantic indexer against some kind of
> predefined vocabulary/ontology? For instance:
>
> My doc (raw text) + My ontology
>
> ||
> [Zemantic/NLTK]
> ||
> \/
>
> Semantic model of my doc
> stored in a graph
>
> If so, what kind of ontology do you plan to use? Existing ones on some
> specific topic? Or general / common sense ones?
Didn't think about that particular pattern, but it makes sense to me.
There will certainly be nice language grammar ontologies out there, and
I imagine that they are useful.
What I had thought of was to take NLTK's default output (words tagged
with their grammatical labels) and derive subject-predicate-object
statements from it. So, given a document that talks a lot about French
history after WWII, the indexer would automatically identify subjects
like 'Paris' and 'Charles de Gaulle' and create triples like 'Charles de
Gaulle' => 'prime minister of' => 'France' and other information like
that.
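A minimal sketch of that idea, under loud assumptions: the tagged tokens
below are written by hand in the Penn Treebank style (NNP, VBD, ...)
rather than produced by NLTK, and the helper names (kind_of, chunks,
triples) are hypothetical. It only catches the simplest
noun-verb-noun pattern, so a phrase like 'prime minister of' would need
richer rules than this.

```python
# Hand-tagged tokens standing in for a part-of-speech tagger's output.
# Assumption: Penn Treebank style tags (NNP = proper noun, VBD = past verb).
TAGGED = [
    ("Charles", "NNP"), ("de", "NNP"), ("Gaulle", "NNP"),
    ("governed", "VBD"),
    ("France", "NNP"),
    (".", "."),
]


def kind_of(tag):
    """Map a tag to a coarse kind: 'noun', 'verb', or None for anything else."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    return None


def chunks(tagged):
    """Merge runs of same-kind tokens, so three NNP tokens become one noun chunk."""
    result = []
    for word, tag in tagged:
        kind = kind_of(tag)
        if kind is None:
            # Keep other tokens as separators so a triple never spans them.
            result.append((None, word))
        elif result and result[-1][0] == kind:
            result[-1] = (kind, result[-1][1] + " " + word)
        else:
            result.append((kind, word))
    return result


def triples(tagged):
    """Return (subject, predicate, object) for each noun-verb-noun sequence."""
    seq = chunks(tagged)
    found = []
    for i in range(len(seq) - 2):
        (k1, s), (k2, p), (k3, o) = seq[i:i + 3]
        if (k1, k2, k3) == ("noun", "verb", "noun"):
            found.append((s, p, o))
    return found


print(triples(TAGGED))
```

The resulting tuples could then be added to an rdflib graph once the
strings are turned into URIs or literals; that step is left out here.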
>
> One could imagine reusing semi-structured vocabulary from free online
> dictionaries such as http://en.wiktionary.org to build such a general
> ontology. If so, which syntax should be used: OWL DL?
Good question; I haven't thought about it too hard. I think a very
practical approach would be to try to make a very, very simple ontology
that maps easily onto NLTK's default output, so that we at least have
some input/output to query and experiment with, and then make it more
complex from there.
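As a rough illustration of how small such a starting ontology could be,
here is a sketch with three classes keyed by tag prefix, emitting
N-Triples lines as plain strings. The example.org namespaces, the class
names, and the helpers (classify, to_ntriples) are all made up for
illustration, and the tags are again assumed to be Penn Treebank style
rather than NLTK's actual default output. Only the rdf:type URI is a
real one.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GRAMMAR = "http://example.org/grammar#"  # hypothetical ontology namespace
DOC = "http://example.org/doc/"          # hypothetical namespace for words

# The whole "ontology": three classes, selected by tag prefix.
TAG_CLASSES = {"NN": "Noun", "VB": "Verb", "JJ": "Adjective"}


def classify(tag):
    """Return the class URI for a tag, or None if the ontology has no class for it."""
    for prefix, cls in TAG_CLASSES.items():
        if tag.startswith(prefix):
            return GRAMMAR + cls
    return None


def to_ntriples(tagged):
    """Emit one '<word> rdf:type <class> .' line per token the ontology covers."""
    lines = []
    for word, tag in tagged:
        cls = classify(tag)
        if cls is not None:
            lines.append("<%s%s> <%s> <%s> ." % (DOC, quote(word), RDF_TYPE, cls))
    return lines


sample = [("Paris", "NNP"), ("is", "VBZ"), ("old", "JJ"), (".", ".")]
for line in to_ntriples(sample):
    print(line)
```

Lines in this form can be loaded into a triple store and queried by
class, which gives something to experiment with before the ontology
grows.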
-Michel