[lxml-dev] An lxml tree inside a lxml tree.
Stefan Behnel
stefan_ml at behnel.de
Wed Mar 26 12:07:09 CET 2008
Hi,
Albert Brandl wrote:
> With lxml 1.3.6, pretty-printing still is broken. If I append a subtree to
> an element,
>
>>>> elem1 = fromstring("""
> ... <a>
> ... <b/>
> ... <c/>
> ... </a>""")
>>>> elem2 = Element("e")
>>>> elem2.append(elem1)
>
> pretty-printing does not do what I'd expect:
>
>>>> print tostring(elem2, pretty_print = True)
> <e>
> <a>
> <b/>
> <c/>
> </a>
> </e>
I added an answer here:
https://answers.launchpad.net/lxml/+question/28032
The so-called "pretty printing" of XML essentially means adding
whitespace at places where it looks natural and where it is unlikely to
scramble the content. Mind the word "unlikely". The notion of "ignorable
whitespace" in XML is underdefined and purely a matter for the parser.
You can help the serialiser figure out which whitespace is "ignorable"
in one of two ways: a) let the parser remove ignorable whitespace for you
by giving it a DTD and the "remove_blank_text" option, or b) remove it
yourself, e.g. by deleting whitespace-only tail text and whitespace-only
text before child elements.
After all, you know best what is ignorable and what isn't.
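As a rough sketch of option a), the following uses the parser's remove_blank_text option on the sample document from the question. It is written in Python 3 syntax and, as an assumption made for brevity, passes no DTD, so the parser has to rely on its own heuristic to decide which whitespace-only text is ignorable.

```python
from lxml import etree

# Ask the parser to drop whitespace-only text nodes while parsing.
# No DTD is supplied here (an assumption of this sketch), so the
# parser falls back on a heuristic to tell data from noise.
parser = etree.XMLParser(remove_blank_text=True)
elem1 = etree.fromstring("<a>\n<b/>\n<c/>\n</a>", parser)

elem2 = etree.Element("e")
elem2.append(elem1)

# With the blank text gone, the serialiser is free to indent the subtree.
pretty = etree.tostring(elem2, pretty_print=True).decode()
print(pretty)
```

This only helps for trees that come out of such a parser; subtrees parsed elsewhere and appended later keep whatever whitespace they already carry.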
Example:
    def remove_ignorable_whitespace(root):
        for el in root.iter():
            # whitespace-only text before the first child element
            if len(el) and el.text and not el.text.strip():
                el.text = None
            # whitespace-only text following the element
            if el.tail and not el.tail.strip():
                el.tail = None
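For illustration, here is how the function might be applied to the tree from the question. This is a self-contained sketch in Python 3 syntax; the function is repeated so the snippet runs on its own, and the sample document is the one quoted above.

```python
from lxml import etree

def remove_ignorable_whitespace(root):
    for el in root.iter():
        # whitespace-only text before the first child element
        if len(el) and el.text and not el.text.strip():
            el.text = None
        # whitespace-only text following the element
        if el.tail and not el.tail.strip():
            el.tail = None

# The parsed subtree still carries the newlines from its source text.
elem1 = etree.fromstring("<a>\n<b/>\n<c/>\n</a>")
elem2 = etree.Element("e")
elem2.append(elem1)

# Strip the whitespace-only text, then let the serialiser indent freely.
remove_ignorable_whitespace(elem2)
pretty = etree.tostring(elem2, pretty_print=True).decode()
print(pretty)
```

Note that the function treats all whitespace-only text as ignorable, so it should not be used on documents where such text is significant.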
Stefan