[lxml-dev] lxml 1.3 annotate() behaviour for empty string data elements

jholg at gmx.de
Mon Jun 18 16:09:06 CEST 2007


Hi,

I just noticed that annotate() does not add type information to elements holding an empty string when the document is parsed:

>>> from lxml import etree, objectify
>>> root = etree.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
...         xmlns:py="http://codespeak.net/lxml/objectify/pytype"
...         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...   <s1>foobar</s1>
...   <s2></s2>
... </root>
... """)
>>> objectify.annotate(root)
>>> 
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <s1 py:pytype="str">foobar</s1>
  <s2/>
</root>
>>> 

Type annotation does happen, however, when the child elements are set manually:

>>> root = objectify.Element("root")
>>> root.s1 = "foobar"
>>> root.s2 = ""
>>> objectify.annotate(root)
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <s1 py:pytype="str">foobar</s1>
  <s2 py:pytype="str"></s2>
</root>
>>> 

I know this happens because the .text of the node is None in the first case and '' in the second (lxml/ElementTree/libxml2 behaviour that bites me time and again).
Still, I'd prefer annotate() to provide all data elements with type information; after all, the element in question is treated as a StringElement (the default empty_data_class) anyway.
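
A minimal sketch of the None-versus-'' difference, and of one possible way to work around it in user code. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies; lxml.etree reports .text the same way for a parsed empty element. The helper annotate_empty_leaves() is hypothetical and not part of lxml. It only illustrates the behaviour asked for above, under the assumption that every text-less leaf should be tagged as "str".

```python
import xml.etree.ElementTree as ET

# Attribute name in Clark notation; the namespace URI is the pytype
# namespace that appears in the examples above.
PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"


def annotate_empty_leaves(root):
    """Hypothetical helper: tag text-less leaf elements as 'str'.

    Returns the number of elements that were annotated.
    """
    count = 0
    for el in root.iter():
        is_leaf = len(el) == 0
        # A parsed <s2></s2> has .text None, not ''; treat both as empty.
        if is_leaf and not el.text and PYTYPE_ATTR not in el.attrib:
            el.set(PYTYPE_ATTR, "str")
            count += 1
    return count


parsed = ET.fromstring("<root><s1>foobar</s1><s2></s2></root>")
s2 = parsed.find("s2")
text_after_parse = s2.text      # None: the parser created no text node

built = ET.Element("s2")
built.text = ""
text_after_assign = built.text  # '': the empty string is kept as assigned

annotated = annotate_empty_leaves(parsed)  # tags only <s2>
```

Whether annotate() itself should do something along these lines is exactly the open question; the sketch only shows that the two cases can be told apart and handled outside the library.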

Objections?

Holger
