[lxml-dev] Beginner question
Stefan Behnel
stefan_ml at behnel.de
Thu Oct 11 13:49:59 CEST 2007
Andreas Tille wrote:
> I'm sorry to start with this beginner question.
Everyone's a beginner right from the start. :)
> With the code that I adopted from the tutorial
>
>     for event, elem in etree.iterparse(infile, events=("start")):
>         if event == "start":
>             print "start:", etree.tostring(elem, pretty_print=True)
>             print "--->", elem.tag
The "start" event only guarantees that the Element itself (its tag and
attributes) is available; its text and children may or may not be parsed yet.
Use the "end" event if you need to access the children.
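A minimal sketch of the "end" variant, in case it helps. The sample document and the helper name child_tags_at_end() are made up for illustration, and the import falls back to the standard library's ElementTree (which offers the same iterparse() interface for this purpose) so the snippet also runs where lxml is not installed:

```python
import io

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample document, not taken from the original data.
XML = b"<root><source><name>NRZ Berlin</name></source></root>"


def child_tags_at_end(data, tag):
    """Collect the child tags of every <tag> element once it is fully parsed."""
    result = []
    # Only "end" events are requested, so each element arrives complete.
    for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            result.append([child.tag for child in elem])
    return result


print(child_tags_at_end(XML, "source"))
```

Note the trailing comma in ("end",): without it the parentheses do not make a tuple.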
BTW, testing for event == "start" if you already restricted the events to
("start",) is redundant.
> idea how to finally access the values like 'idSource="NRZ Berlin" '
That would be an attribute. Read the tutorial on this.
http://codespeak.net/lxml/tutorial.html#elements-carry-attributes
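Roughly, attribute access looks like the sketch below. The sample document, the second attribute value and the helper name source_ids() are invented for the example; only the attribute name idSource comes from the question. Attributes are already available on the "start" event, so that event is sufficient here:

```python
import io

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample document; only the attribute name is from the question.
XML = b'<root><source idSource="NRZ Berlin"/><source idSource="Other Lab"/></root>'


def source_ids(data):
    """Return the idSource attribute of every element that carries one."""
    ids = []
    for event, elem in etree.iterparse(io.BytesIO(data), events=("start",)):
        value = elem.get("idSource")  # None if the attribute is absent
        if value is not None:
            ids.append(value)
    return ids


print(source_ids(XML))
```

elem.attrib gives the same data as a dict-like mapping if you want all attributes at once.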
> nor do I have an idea how to get rid of the default name space that
> is prepended before the tags. I would rather like to access the tag
> called "source" (without the default name space)
But there *is* a namespace, so how would you distinguish it from a plain
"source" tag without namespace?
If it's just for brevity, you can always use string constants.
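Something along these lines, where the namespace URI, the sample document and the function name find_sources() are placeholders I made up (use the URI your files actually declare):

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical namespace URI; replace it with the one from your data files.
NS = "{http://example.org/data}"
SOURCE = NS + "source"  # qualified tag name, written once

XML = (
    b'<root xmlns="http://example.org/data">'
    b'<source idSource="NRZ Berlin"/>'
    b"</root>"
)


def find_sources(data):
    """Return the idSource values of all namespaced <source> elements."""
    root = etree.fromstring(data)
    return [elem.get("idSource") for elem in root.iter(SOURCE)]


print(find_sources(XML))
```

The rest of the code then only ever mentions SOURCE, so the long "{...}source" spelling appears in one place.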
> or "ct:software" with the shortcut of the name space.
Who guarantees that the namespace prefix ("ct") is used in all data files?
Your code would stop working if it wasn't...
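To illustrate, here is a small sketch with two made-up documents that bind different prefixes to the same (hypothetical) namespace URI. Matching on the qualified "{uri}tag" name finds the element in both, which a match on the literal "ct:software" string could not do:

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical namespace URI and documents, invented for this example.
CT_NS = "http://example.org/ct"
SOFTWARE = "{%s}software" % CT_NS

DOC_A = b'<ct:root xmlns:ct="http://example.org/ct"><ct:software>a</ct:software></ct:root>'
DOC_B = b'<x:root xmlns:x="http://example.org/ct"><x:software>b</x:software></x:root>'


def software_text(data):
    """Return the text of the first namespaced <software> child, or None."""
    root = etree.fromstring(data)
    elem = root.find(SOFTWARE)
    return None if elem is None else elem.text


print(software_text(DOC_A), software_text(DOC_B))
```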
> I also found the very interesting objectify method at
> http://codespeak.net/lxml/objectify.html
> but I finally have no idea how to use that in the parser because
> the page just describes creating objects (or did I miss something?)
http://codespeak.net/lxml/objectify.html#setting-up-lxml-objectify
iterparse() also returns a (special) parser, so the setup of the lookup scheme
should work alike. I never tried it, but this should work:
    from lxml import etree, objectify

    parser = etree.iterparse(source_file, remove_blank_text=True)
    lookup = objectify.ObjectifyElementClassLookup()
    # spelled set_element_class_lookup() since lxml 2.0
    parser.setElementClassLookup(lookup)
    for event, element in parser:
        ...
Stefan