[lxml-dev] xml:space and xml:lang problem

Stefan Behnel behnel_ml at gkec.informatik.tu-darmstadt.de
Wed Feb 22 08:39:42 CET 2006


Scott Haeger wrote:
> It seems that lxml is not serializing properly.  The following code will
> demonstrate:
> 
> from lxml import etree
> import sys
> 
> # parse xml from file
> intree = etree.parse("test.xml")
> 
> # create root for new tree
> outroot = etree.Element("root")
> 
> # go through original and append to new tree
> doc = intree.getiterator()
> for el in doc:
>     newel = el
>     outroot.append(newel)
>   
> # create new tree and output to screen
> outtree = etree.ElementTree(outroot)
> outtree.write(sys.stdout)
> 
> 
> Test.xml is the following:
> 
> <?xml version="1.0"?>
> <svg
>     xmlns:xml=" http://www.w3.org/1998/XML">
> <a id="first" xml:space="default"></a>
> </svg>
> 
> Although the demo does not show the random characters (garbage) as seen
> in my full application, it does demonstrate a failure to serialize
> properly.

Copying the elements between trees most likely has no effect on the result,
so I cut the example down to the following:

----------
>>> from lxml import etree

>>> intree = etree.XML("""<?xml version="1.0"?>
... <svg xmlns:xml=" http://www.w3.org/1998/XML">
...   <a id="first" xml:space="default"></a>
... </svg>
... """)

>>> etree.tostring(intree)
'<svg>\n<a id="first" xml:space="default"/>\n</svg>'
----------

So the serialized output is indeed missing the xmlns:xml namespace
declaration. BUT, according to the spec, that is not a problem.

"""
The prefix xml is by definition bound to the namespace name
http://www.w3.org/XML/1998/namespace.
"""

Source: http://www.w3.org/TR/REC-xml-names/

Is that what you meant when you said it demonstrates a failure?


> The problem occurs with and without the namespace declaration.

Because, according to the spec, both are the same: the xml prefix is bound to
that namespace name whether or not the document declares it.
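
To illustrate, here is a minimal sketch (not taken from your application) that
parses the same document with and without an xmlns:xml declaration and
compares the attributes. Note the assumptions: it declares the prefix with the
namespace name quoted from the spec above, not the URI used in test.xml, and
it falls back to the standard library's ElementTree module if lxml is not
installed. The names XML_NS, with_decl and without_decl are made up for this
example.

```python
# Prefer lxml; fall back to the API-compatible standard library module
# (the fallback is an assumption so that the sketch runs without lxml).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Namespace name that the "xml" prefix is bound to by definition.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The same document, once with an explicit declaration and once without.
with_decl = etree.XML(
    '<svg xmlns:xml="%s"><a id="first" xml:space="default"/></svg>' % XML_NS
)
without_decl = etree.XML('<svg><a id="first" xml:space="default"/></svg>')

# Attribute names are stored in {namespace}localname form, so the two
# variants can be compared directly.
attrs_with = dict(with_decl[0].attrib)
attrs_without = dict(without_decl[0].attrib)

print(attrs_with == attrs_without)
print(without_decl[0].get("{%s}space" % XML_NS))
```

I would expect this to print True followed by default, i.e. xml:space ends up
in the same namespace in both cases.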


> Also, removing the xml:space attribute corrects the problem. 

Probably, since the document then only references explicitly declared
namespaces. Still, could you try to come up with an example that actually
shows the unreadable characters on serialization?

Thanks,
Stefan

