[lxml-dev] Beginner question

Andreas Tille tillea at rki.de
Thu Oct 11 14:35:18 CEST 2007


On Thu, 11 Oct 2007, Stefan Behnel wrote:

>> for event, elem in etree.iterparse(infile, events=("start",)):
>>      if event == "start":
>>          print "start:", etree.tostring(elem, pretty_print=True)
>>          print "--->", elem.tag
>
> The "start" event only guarantees that the Element itself is complete, but its
> children may or may not be parsed yet. Use the "end" event if you need to
> access the children.

Does this mean that
    etree.iterparse(infile, events=("end",))
is what I really want?
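
A minimal sketch of the "end" variant, to make the point above concrete. The sample document and its namespace URIs are invented for illustration (the real data file is not shown in this thread), and the import falls back to the standard library's ElementTree, which offers the same iterparse() interface, if lxml is not installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same iterparse() interface

# Hypothetical sample data; both namespace URIs are made up for this sketch.
SAMPLE = b"""<root xmlns="http://example.org/default" xmlns:ct="http://example.org/ct">
  <source idSource="NRZ Berlin">
    <ct:software name="psql2xml" version="0.1"/>
  </source>
  <target idTarget="RKI"/>
</root>"""


def children_at_end(data):
    """Map each tag to the number of child elements seen at its "end" event."""
    counts = {}
    # Note the trailing comma: ("end",) is a tuple, ("end") is just a string.
    for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
        # At "end" the element is closed, so all of its children are present.
        counts[elem.tag] = len(elem)
    return counts


counts = children_at_end(SAMPLE)
```

With "end" events the child count for each element is reliable, whereas with "start" events it depends on how far the parser has already read.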

> BTW, testing for event == "start" if you already restricted the events to
> ("start",) is redundant.

Right.  The condition was a leftover from some other tests ...

>> idea how to finally access the values like 'idSource="NRZ Berlin" '
>
> That would be an attribute. Read the tutorial on this.
>
> http://codespeak.net/lxml/tutorial.html#elements-carry-attributes

Ahhh, elem.get(attribute) did the trick.  Thanks.
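
For the archive, a small sketch of what that looks like. The XML snippet is a simplified, namespace-free version of the output quoted further down, and the attribute name "noSuchAttribute" is made up to show the fallback behaviour.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same Element API

# Simplified sample without namespaces, for illustration only.
SAMPLE = b'<source idSource="NRZ Berlin"><software name="psql2xml" version="0.1"/></source>'

root = etree.fromstring(SAMPLE)

# get() returns the attribute value as a string ...
id_source = root.get("idSource")

# ... or None (or a given default) if the attribute does not exist.
missing = root.get("noSuchAttribute")
fallback = root.get("noSuchAttribute", "n/a")

# All attributes of an element are available through the dict-like .attrib.
software_attrs = dict(root[0].attrib)
```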

> But there *is* a namespace, so how would you distinguish it from a plain
> "source" tag without namespace?
>
> If it's just for brevity, you can always use string constants.

I decided on

     if elem.tag.endswith('}source'):
         source = elem.get("idSource")

because for practical reasons I can be sure that I'm in the default
namespace.
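
To illustrate the trade-off, here is a sketch comparing the suffix test with the string-constant approach suggested above. The namespace URIs and the second "source" element are hypothetical; they only serve to show where the two tests differ.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same iterparse() interface

# Hypothetical namespace URI; the real one is not shown in this thread.
DEFAULT_NS = "http://example.org/default"
SOURCE_TAG = "{%s}source" % DEFAULT_NS  # fully qualified tag as a string constant

# Invented sample with a second "source" element in a different namespace.
SAMPLE = b"""<root xmlns="http://example.org/default" xmlns:other="http://example.org/other">
  <source idSource="NRZ Berlin"/>
  <other:source idSource="somewhere else"/>
</root>"""

exact = []   # matches by full tag name, namespace included
suffix = []  # matches by local name only

for event, elem in etree.iterparse(io.BytesIO(SAMPLE), events=("end",)):
    if elem.tag == SOURCE_TAG:
        exact.append(elem.get("idSource"))
    if elem.tag.endswith("}source"):
        suffix.append(elem.get("idSource"))
```

The suffix test also picks up the element from the other namespace, so it is only safe as long as no such element can occur in the data.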

>> or "ct:software" with the shortcut of the name space.
>
> Who guarantees that the namespace prefix ("ct") is used in all data files?
> Your code would stop working if it wasn't...

The file would not validate in the first place if the ct prefix were
missing where it is used here.  But I see your argument and can live with
it.  I just thought I might have missed something in the API that would
let me use such shortcuts.
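
One kind of shortcut that seems to fit here, sketched under some assumptions: in current versions of both lxml and the standard library's ElementTree, find() and findall() accept a prefix-to-URI mapping, so the prefixes are chosen by the code rather than by the data file. The namespace URIs, the "d" prefix and the sample document are all invented for this example; the sample deliberately uses "sw" instead of "ct" as its prefix.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same find() interface

# Hypothetical document: it binds the prefix "sw", not "ct", to the namespace.
SAMPLE = b"""<root xmlns="http://example.org/default" xmlns:sw="http://example.org/ct">
  <source idSource="NRZ Berlin">
    <sw:software name="psql2xml" version="0.1"/>
  </source>
</root>"""

# Prefixes used by *this code* only; they need not match those in the file.
NSMAP = {
    "d": "http://example.org/default",  # made-up URI for the default namespace
    "ct": "http://example.org/ct",      # made-up URI for the "ct" namespace
}

root = etree.fromstring(SAMPLE)

# The path is resolved through NSMAP, so it matches regardless of the
# prefix the document itself happens to use.
software = root.find("d:source/ct:software", namespaces=NSMAP)
software_name = software.get("name")
```

Because the lookup goes through the namespace URI, the code keeps working even if a data file binds a different prefix to the same namespace.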

> http://codespeak.net/lxml/objectify.html#setting-up-lxml-objectify
>
> iterparse() also returns a (special) parser, so the setup of the lookup scheme
> should work alike. I never tried it, but this should work:
>
>  parser = etree.iterparse(source_file, remove_blank_text=True)
>
>  lookup = objectify.ObjectifyElementClassLookup()
>  parser.setElementClassLookup(lookup)
>
>  for event, element in parser:
>     ...

Well, using the code:

for event, element in parser:
     print "element: ", etree.tostring(element, pretty_print=True)

gives for instance:


element:  <ct:software name="psql2xml" version="0.1"/>
element:  <source idSource="NRZ Berlin">
   <ct:software name="psql2xml" version="0.1"/>
</source>
element:  <target idTarget="RKI"/>


Here, too, I wonder how to obtain an attribute, for instance idSource from
the source tag.

Many thanks for the hint at the beginning, which brought me quite a step forward.

       Andreas.

-- 
http://fam-tille.de

