[lxml-dev] Behaviour change in findtext

Stefan Behnel stefan_ml at behnel.de
Sat Feb 21 19:09:24 CET 2009


Hi Fredrik,

thanks for the clarification.

Fredrik Lundh wrote:
> Not sure - that you can get None back from findtext when the element
> is there looks like an accidental change when the ElementPath engine
> was rewritten.  I think I'll consider that a bug in findtext.

I thought so, too.


> As for distinguishing between <element/> and <element></element>

That's not what I meant, although that is actually what you get when you
serialise an element with or without an empty-string text value. A parsed
empty element will always have its .text set to None in lxml.etree,
regardless of which form the parser saw. What I meant was the difference
between users setting

    el.text = None

and

    el.text = ''

in the code. In the second case, lxml.etree creates a text node containing
an empty string in the underlying libxml2 tree, so that reading el.text
later returns '' instead of None. This is actually compatible with ET,
which (obviously) also remembers the value the user set. You can think of
it as an emulation of the ET behaviour, but also as a way to spare users
the surprise of seeing

    el.text = ''
    for _ in range(10):
        el.text += 'xyz'

fail mysteriously.
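A minimal sketch of why remembering the empty string matters. It uses the
standard library's xml.etree.ElementTree as a stand-in for lxml.etree (an
assumption for illustration: both let you assign '' to .text and read it
back). Starting from '' the loop builds the string; starting from None the
first += raises TypeError.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

el = ET.Element("element")
el.text = ""  # explicitly store an empty string, not None
for _ in range(10):
    el.text += "xyz"  # works because el.text reads back as ''
print(el.text)

other = ET.Element("element")  # fresh element: .text defaults to None
error = None
try:
    other.text += "xyz"  # None + str is not allowed
except TypeError as exc:
    error = exc
print("TypeError:", error)
```

If an implementation silently turned the stored '' back into None, the
first loop would behave like the second case.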


> the ET specification allows an implementation to use either
> None or an empty string for the text and tail attributes in either
> case to simplify the tree building.  However, an application shouldn't
> abuse this - an XML producer should be free to use either form to
> indicate an empty element, and application code should use "truth
> testing" when necessary, when inspecting the text/tail attributes of a
> given element.
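The "truth testing" advice can be sketched as follows. The helper names
has_text and text_of are hypothetical, and the standard library's
ElementTree is used only for illustration. The point is that `bool(el.text)`
and `el.text or ""` treat None and '' alike, whichever form a given
implementation or XML producer happened to use.

```python
import xml.etree.ElementTree as ET

def has_text(el):
    """Hypothetical helper: True only for non-empty text, whether 'empty' is None or ''."""
    return bool(el.text)

def text_of(el):
    """Hypothetical helper: normalise a missing text value to ''."""
    return el.text or ""

root = ET.fromstring("<root><a/><b></b><c>x</c></root>")
a, b, c = root  # iterating an Element yields its children
manual = ET.Element("d")
manual.text = ""  # an application may also set '' explicitly

for el in (a, b, c, manual):
    print(el.tag, has_text(el), repr(text_of(el)))
```

None of these checks care whether the empty case was stored as None or as
'', which is exactly the freedom the specification leaves to implementations.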

I fully agree.


> And I think findtext should be reverted to the 1.2
> behaviour - just add an <or ""> to the suitable place in ElementPath,
> and leave the rest as is.

That's what I did for lxml 2.2. It just makes findtext() simpler to use.
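A minimal sketch of the 1.2-style findtext() semantics described above:
return the default only when no element matches, otherwise return the
element's text with an `or ""` fallback. findtext_sketch is a hypothetical
name, not lxml's actual implementation; the tree is built with the standard
library's ElementTree.

```python
import xml.etree.ElementTree as ET

def findtext_sketch(elem, path, default=None):
    """Hypothetical findtext(): default only if nothing matches, else text or ''."""
    found = elem.find(path)
    if found is None:
        return default  # no matching element at all
    return found.text or ""  # element present: never None

root = ET.fromstring("<root><empty/><full>hi</full></root>")
print(repr(findtext_sketch(root, "empty")))    # ''
print(repr(findtext_sketch(root, "full")))     # 'hi'
print(repr(findtext_sketch(root, "missing")))  # None
```

With this rule a result from an existing element can be used directly in
string operations, while None (or the given default) still signals that the
element is absent.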

Stefan


