[lxml-dev] Beginner question
Andreas Tille
tillea at rki.de
Thu Oct 11 14:35:18 CEST 2007
On Thu, 11 Oct 2007, Stefan Behnel wrote:
>> for event, elem in etree.iterparse(infile, events=("start",)):
>>     if event == "start":
>>         print "start:", etree.tostring(elem, pretty_print=True)
>>         print "--->", elem.tag
>
> The "start" event only guarantees that the Element itself is complete, but its
> children may or may not be parsed yet. Use the "end" event if you need to
> access the children.
Does this mean the usage of
    etree.iterparse(infile, events=("end",))
would be what I really want?
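A minimal sketch of what the "end" event buys you, assuming a small in-memory document in place of infile. The sample data and the helper name child_tags_at_end are made up for illustration, and the fallback to the standard library's ElementTree is only an assumption so the snippet also runs without lxml installed. Each element is reported after its closing tag has been read, so its children are already there:

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module offers a compatible iterparse().
    import xml.etree.ElementTree as etree

# Hypothetical sample data standing in for the real input file.
xml_data = (
    b'<root><source idSource="NRZ Berlin">'
    b'<software name="psql2xml"/></source></root>'
)

def child_tags_at_end(data):
    """Map each tag to the tags of its children as seen at its "end" event."""
    seen = {}
    for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
        # At "end" the element is fully parsed, including all children.
        seen[elem.tag] = [child.tag for child in elem]
    return seen

result = child_tags_at_end(xml_data)
print(result["source"])
```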
> BTW, testing for event == "start" if you already restricted the events to
> ("start",) is redundant.
Right. The condition was a leftover from some other tests.
>> idea how to finally access the values like 'idSource="NRZ Berlin" '
>
> That would be an attribute. Read the tutorial on this.
>
> http://codespeak.net/lxml/tutorial.html#elements-carry-attributes
Ahhh, elem.get(attribute) did the trick. Thanks.
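For the archive, roughly what this looks like, as a sketch only: the sample data and the helper name collect_attribute are hypothetical, and the import fallback is an assumption for machines without lxml. elem.get(name) returns None when the attribute is absent, which makes it easy to skip elements that do not carry it:

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module offers a compatible iterparse().
    import xml.etree.ElementTree as etree

# Hypothetical sample data modelled on the attributes mentioned in this thread.
xml_data = b'<root><source idSource="NRZ Berlin"/><target idTarget="RKI"/></root>'

def collect_attribute(data, name):
    """Return the values of attribute `name` from every element that has it."""
    values = []
    for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
        value = elem.get(name)  # None if the attribute is missing
        if value is not None:
            values.append(value)
    return values

sources = collect_attribute(xml_data, "idSource")
print(sources)
```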
> But there *is* a namespace, so how would you distinguish it from a plain
> "source" tag without namespace?
>
> If it's just for brevity, you can always use string constants.
I decided on
    if elem.tag.endswith('}source'):
        source = elem.get("idSource")
because for practical reasons I can be sure that I'm in the default
namespace.
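A sketch of the string-constant alternative, to show where the two approaches differ. The namespace URIs, the sample data, and the names NS_DEFAULT, SOURCE_TAG, exact_ids and suffix_ids are all hypothetical, since the real URIs are not given in this thread; the import fallback is likewise an assumption. The suffix test also accepts a "source" element from any other namespace, while the full constant does not:

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module offers a compatible iterparse().
    import xml.etree.ElementTree as etree

NS_DEFAULT = "http://example.org/default"  # hypothetical namespace URI
SOURCE_TAG = "{%s}source" % NS_DEFAULT     # full tag name as reported by elem.tag

# Hypothetical data: one "source" in the default namespace, one in another.
xml_data = (
    b'<root xmlns="http://example.org/default"'
    b' xmlns:other="http://example.org/other">'
    b'<source idSource="NRZ Berlin"/>'
    b'<other:source idSource="elsewhere"/>'
    b'</root>'
)

def exact_ids(data):
    """Match only elements whose full tag equals the constant."""
    return [elem.get("idSource")
            for event, elem in etree.iterparse(io.BytesIO(data), events=("end",))
            if elem.tag == SOURCE_TAG]

def suffix_ids(data):
    """Match any element whose tag ends with '}source', whatever the namespace."""
    return [elem.get("idSource")
            for event, elem in etree.iterparse(io.BytesIO(data), events=("end",))
            if elem.tag.endswith("}source")]

print(exact_ids(xml_data))
print(suffix_ids(xml_data))
```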
>> or "ct:software" with the shortcut of the name space.
>
> Who guarantees that the namespace prefix ("ct") is used in all data files?
> Your code would stop working if it wasn't...
The file would not validate in the first place if the ct prefix were
missing where it is used here. But I see your point and can live with it.
I just thought I might have missed something in the API that would let me
use shortcuts.
> http://codespeak.net/lxml/objectify.html#setting-up-lxml-objectify
>
> iterparse() also returns a (special) parser, so the setup of the lookup scheme
> should work alike. I never tried it, but this should work:
>
> parser = etree.iterparse(source_file, remove_blank_text=True)
>
> lookup = objectify.ObjectifyElementClassLookup()
> parser.setElementClassLookup(lookup)
>
> for event, element in parser:
> ...
Well, using the code
    for event, element in parser:
        print "element: ", etree.tostring(element, pretty_print=True)
gives, for instance:
element: <ct:software name="psql2xml" version="0.1"/>
element: <source idSource="NRZ Berlin">
<ct:software name="psql2xml" version="0.1"/>
</source>
element: <target idTarget="RKI"/>
Here, too, I wonder how to obtain the attribute idSource from the source
tag, for instance.
Many thanks for the hint at the beginning, which brought me quite a step forward.
Andreas.
--
http://fam-tille.de