[lxml-dev] An lxml tree inside a lxml tree.

Stefan Behnel stefan_ml at behnel.de
Wed Mar 26 12:07:09 CET 2008


Hi,

Albert Brandl wrote:
> With lxml 1.3.6, pretty-printing still is broken. If I append a subtree to
> an element,
>
> >>> elem1 = fromstring("""
> ... <a>
> ...   <b/>
> ...   <c/>
> ... </a>""")
> >>> elem2 = Element("e")
> >>> elem2.append(elem1)
>
> pretty-printing does not do what I'd expect:
>
> >>> print tostring(elem2, pretty_print = True)
> <e>
>   <a>
>   <b/>
>   <c/>
> </a>
> </e>

I added an answer here:

https://answers.launchpad.net/lxml/+question/28032

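The so-called "pretty printing" of XML essentially means adding
whitespace in places where it looks natural and where it is unlikely to
change the content. Mind the word "unlikely". The notion of "ignorable
whitespace" in XML is underdefined and purely a parser-level concept. In
your example, the parsed <a> element still carries the whitespace-only
text from the input string, so the serialiser leaves its content alone.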

You can help the serialiser figure out which whitespace is "ignorable"
either a) by letting the parser remove it for you, giving it a DTD and
the "remove_blank_text" option, or b) by removing it yourself, e.g. by
deleting whitespace-only tail text and whitespace-only text before child
elements. After all, you know best what is ignorable and what isn't.

Example:

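    def remove_ignorable_whitespace(root):
        """Drop whitespace-only text so the serialiser can re-indent."""
        for el in root.iter():
            # Whitespace-only text before the first child element.
            if len(el) and el.text and not el.text.strip():
                el.text = None
            # Whitespace-only text after the element's end tag.
            if el.tail and not el.tail.strip():
                el.tail = None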
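
For option a), here is a minimal sketch applied to the example from the
question. It assumes Python 3 and a reasonably recent lxml, and it assumes
that no DTD is available, so the parser has to rely on its own heuristics
when deciding which blank text to drop:

```python
from lxml import etree

# Assumption: no DTD is supplied, so "remove_blank_text" makes the parser
# decide by itself which whitespace-only text nodes can be dropped.
parser = etree.XMLParser(remove_blank_text=True)

elem1 = etree.fromstring("""
<a>
  <b/>
  <c/>
</a>""", parser)

elem2 = etree.Element("e")
elem2.append(elem1)

# tostring() returns bytes in Python 3, so decode before printing.
result = etree.tostring(elem2, pretty_print=True).decode("ascii")
print(result)
```

Since the subtree no longer contains whitespace-only text, the serialiser
is free to indent <a>, <b/> and <c/> according to their nesting depth.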

Stefan


