[lxml-dev] Beginner question

Stefan Behnel stefan_ml at behnel.de
Thu Oct 11 16:04:35 CEST 2007


Andreas Tille wrote:
> On Thu, 11 Oct 2007, Stefan Behnel wrote:
> 
>>> for event, elem in etree.iterparse(infile, events=("start")):
>>>      if event == "start":
>>>          print "start:", etree.tostring(elem, pretty_print=True)
>>>          print "--->", elem.tag
>> The "start" event only guarantees that the Element itself is complete, but its
>> children may or may not be parsed yet. Use the "end" event if you need to
>> access the children.
> 
> Does this mean the usage of
>     etree.iterparse(infile, events=("end"))
> would be what I really want?

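Depends on what you want, but likely yes. Note that ("end",) is the default
anyway, so you can simply leave the argument out. Also mind the trailing
comma: ("end") is just the string "end", not a one-element tuple.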
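
To make the difference between the two events concrete, here is a minimal
sketch. The sample document and the helper name children_at() are made up for
illustration, and the import falls back to the standard library's ElementTree,
which offers the same iterparse() interface for this usage.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # the standard library offers the same iterparse() interface
    import xml.etree.ElementTree as etree

# Hypothetical sample document, loosely modelled on the output quoted in this thread.
XML = b'<root><source idSource="NRZ Berlin"><software name="psql2xml"/></source></root>'


def children_at(event_name):
    """Return (tag, number of children attached so far) for each reported event."""
    # Note the trailing comma: events must be a tuple of event names.
    events = (event_name,)
    return [(elem.tag, len(elem))
            for _, elem in etree.iterparse(BytesIO(XML), events=events)]


# At "end", every element is reported after all of its children.
print(children_at("end"))
# At "start", the counts depend on how far the parser has already read.
print(children_at("start"))
```

The "end" results are reliable: each element shows up only once its subtree is
complete. The child counts seen at "start" should not be relied upon.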


>>> or "ct:software" with the shortcut of the name space.
>> Who guarantees that the namespace prefix ("ct") is used in all data files?
>> Your code would stop working if it wasn't...
> 
> It would not validate before if the ct would be missing in the place where
> it is used here.

Why not? You could use "humptydumpty:software" as long as you associated
"humptydumpty" with the right namespace. And your XML document could define
1000 prefixes for the same namespace and then use a different prefix for each
tag. And it would validate just fine, as the namespace would be correct.
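
A small sketch of why matching on the prefix is fragile. The namespace URI
below is a placeholder (the real one is not shown in this thread), and
software_names() is a hypothetical helper. Both documents use a different
prefix for the same namespace, and comparing against the fully qualified
"{namespace}tag" name finds the element in either one.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # ElementTree uses the same "{namespace}tag" naming
    import xml.etree.ElementTree as etree

# Placeholder namespace URI, assumed for this example only.
NS = "http://example.org/ct"

DOC_A = b'<ct:software xmlns:ct="http://example.org/ct" name="psql2xml"/>'
DOC_B = (b'<humptydumpty:software xmlns:humptydumpty="http://example.org/ct"'
         b' name="psql2xml"/>')


def software_names(xml_bytes):
    """Collect the name attribute of every software element in the namespace."""
    wanted = "{%s}software" % NS  # qualified tag name, independent of any prefix
    return [elem.get("name")
            for _, elem in etree.iterparse(BytesIO(xml_bytes))
            if elem.tag == wanted]


print(software_names(DOC_A))
print(software_names(DOC_B))
```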


>> http://codespeak.net/lxml/objectify.html#setting-up-lxml-objectify
>>
>> iterparse() also returns a (special) parser, so the setup of the lookup scheme
>> should work alike. I never tried it, but this should work:
>>
>>  parser = etree.iterparse(source_file, remove_blank_text=True)
>>
>>  lookup = objectify.ObjectifyElementClassLookup()
>>  parser.setElementClassLookup(lookup)
>>
>>  for event, element in parser:
>>     ...
> 
> Well, when using the code:
> 
> for event, element in parser:
>      print "element: ", etree.tostring(element, pretty_print=True)
> 
> gives for instance:
> 
> element:  <ct:software name="psql2xml" version="0.1"/>
> element:  <source idSource="NRZ Berlin">
>    <ct:software name="psql2xml" version="0.1"/>
> </source>
> element:  <target idTarget="RKI"/>
> 
> I here also wonder how to obtain the attribute idSource from the source tag
> for instance.

Attribute access is the same as before, e.g. element.get("idSource"). Only the
child access API is different, as described in the objectify docs.
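
For the attribute part, a minimal sketch using get(), which objectify elements
share with plain etree elements. The sample document is an assumption based on
the output quoted above (with the namespaced software element simplified), and
source_ids() is a hypothetical helper.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # get() is part of the shared ElementTree API
    import xml.etree.ElementTree as etree

# Assumed sample data; the namespace of the software element is left out here.
XML = (b'<root>'
       b'<source idSource="NRZ Berlin">'
       b'<software name="psql2xml" version="0.1"/>'
       b'</source>'
       b'<target idTarget="RKI"/>'
       b'</root>')


def source_ids(xml_bytes):
    """Return the idSource attribute of every source element."""
    ids = []
    for _, elem in etree.iterparse(BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "source":
            # get() returns None if the attribute is missing.
            ids.append(elem.get("idSource"))
    return ids


print(source_ids(XML))
```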

Stefan

