[lxml-dev] Beginner question
Stefan Behnel
stefan_ml at behnel.de
Thu Oct 11 16:04:35 CEST 2007
Andreas Tille wrote:
> On Thu, 11 Oct 2007, Stefan Behnel wrote:
>
>>> for event, elem in etree.iterparse(infile, events=("start")):
>>>     if event == "start":
>>>         print "start:", etree.tostring(elem, pretty_print=True)
>>>         print "--->", elem.tag
>> The "start" event only guarantees that the Element itself is complete, but its
>> children may or may not be parsed yet. Use the "end" event if you need to
>> access the children.
>
> Does this mean the usage of
> etree.iterparse(infile, events=("end"))
> would be what I really want?
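That depends on what you want, but most likely yes. Note that ("end") is just
the string "end", whereas ("end",) with the trailing comma is a one-element
tuple, and ("end",) is the default anyway.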
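
As a minimal sketch of the difference (not taken from the original code): with
"end" events, each element is handed over only after its children have been
parsed. The sample document and the function name are made up for
illustration, and the fallback import of xml.etree.ElementTree is only an
assumption so that the sketch also runs where lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib API, which matches here
    import xml.etree.ElementTree as etree

# Hypothetical sample data, loosely modelled on the output quoted in this thread.
XML = b'<root><source idSource="NRZ Berlin"><software name="psql2xml"/></source></root>'


def children_seen_at_end(data):
    """Map each tag to the child tags visible when its "end" event fires."""
    seen = {}
    # ("end",) is a one-element tuple; ("end") would be a plain string.
    for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
        seen[elem.tag] = [child.tag for child in elem]
    return seen


result = children_seen_at_end(XML)
print(result)
```

The innermost element is reported first, and by the time "source" is reported
its "software" child is already attached.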
>>> or "ct:software" with the shortcut of the name space.
>> Who guarantees that the namespace prefix ("ct") is used in all data files?
>> Your code would stop working if it wasn't...
>
> It would already fail validation if the ct prefix were missing in the place
> where it is used here.
Why not? You could use "humptydumpty:software" as long as you associated
"humptydumpty" with the right namespace. And your XML document could define
1000 prefixes for the same namespace and then use a different prefix for each
tag. And it would validate just fine, as the namespace would be correct.
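
A small sketch of what that means in practice: matching on the namespace-
qualified tag name ("{uri}software") works regardless of the prefix a document
happens to use. The namespace URI, the two sample documents and the function
name are hypothetical; the real URI of the "ct" namespace is not given in this
thread. The fallback import is only there so the sketch runs without lxml.

```python
import io

try:
    from lxml import etree
except ImportError:  # assumption: the stdlib API behaves the same for this case
    import xml.etree.ElementTree as etree

NS = "http://example.org/ct"  # hypothetical namespace URI

# The same namespace, bound to two different prefixes.
DOC_A = (b'<ct:root xmlns:ct="http://example.org/ct">'
         b'<ct:software name="psql2xml"/></ct:root>')
DOC_B = (b'<humptydumpty:root xmlns:humptydumpty="http://example.org/ct">'
         b'<humptydumpty:software name="psql2xml"/></humptydumpty:root>')


def software_names(data):
    """Collect the name attribute of every software element in namespace NS."""
    wanted = "{%s}software" % NS  # prefix-independent tag name
    return [elem.get("name")
            for _, elem in etree.iterparse(io.BytesIO(data))
            if elem.tag == wanted]


names_a = software_names(DOC_A)
names_b = software_names(DOC_B)
print(names_a, names_b)
```

Code that compares against the literal string "ct:software" would find nothing
in the second document, although both documents are equivalent.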
>> http://codespeak.net/lxml/objectify.html#setting-up-lxml-objectify
>>
>> iterparse() also returns a (special) parser, so the setup of the lookup scheme
>> should work alike. I never tried it, but this should work:
>>
>> parser = etree.iterparse(source_file, remove_blank_text=True)
>>
>> lookup = objectify.ObjectifyElementClassLookup()
>> parser.setElementClassLookup(lookup)
>>
>> for event, element in parser:
>>     ...
>
> Well, using the code
>
> for event, element in parser:
>     print "element: ", etree.tostring(element, pretty_print=True)
>
> gives, for instance:
>
> element: <ct:software name="psql2xml" version="0.1"/>
> element: <source idSource="NRZ Berlin">
>   <ct:software name="psql2xml" version="0.1"/>
> </source>
> element: <target idTarget="RKI"/>
>
> I also wonder how to obtain the attribute idSource from the source tag,
> for instance.
Attribute access is the same as before, e.g. element.get("idSource"). Only the
child access API is different, as described in the objectify docs.
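
For the attribute part, a minimal sketch using the plain etree API (the
objectify child access is not shown here). The wrapping root element, the
namespace URI and the function name are assumptions, since the full input file
is not part of this thread.

```python
import io

try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical document built around the elements quoted above.
XML = (b'<root xmlns:ct="http://example.org/ct">'
       b'<source idSource="NRZ Berlin">'
       b'<ct:software name="psql2xml" version="0.1"/>'
       b'</source>'
       b'<target idTarget="RKI"/>'
       b'</root>')


def collect_ids(data):
    """Read idSource and idTarget while iterating over "end" events."""
    ids = {}
    for _, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "source":
            ids["source"] = elem.get("idSource")     # returns None if missing
        elif elem.tag == "target":
            ids["target"] = elem.attrib["idTarget"]  # raises KeyError if missing
    return ids


ids = collect_ids(XML)
print(ids)
```

Whether get() or the attrib mapping is the better fit depends on whether a
missing attribute should be tolerated or treated as an error.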
Stefan