[lxml-dev] removing nested element from html wipes text

Stefan Behnel stefan_ml at behnel.de
Tue Jan 29 08:07:59 CET 2008


Hi,

iShTa dAsaH wrote:
> Text goes missing after I remove the last element.
> I've tried the 2.0 beta, but still no luck.
> 
> from lxml import etree
> 
> txt="""
> <p>
> startPara
>                   <strong>str0ng</strong>
> MiddleP
>                    <u>UUUUUUUUU</u>
>                    <b>Bold</b>
> EndPara
> </p>"""
> 
> ee=etree.HTML(txt)
> pp=ee.xpath('//p')[0]
> 
> >>> pp.tail
> 
> 
> >>> pp.xpath('./text()')
> ['startPara', 'MiddleP', 'EndPara']
> 
> 
> >>> bold=pp.xpath('./b')[0]

Try this:

    >>> print(bold.tail)
    EndPara

So the tail text ("EndPara") actually belongs to the <b> Element, not to its parent <p>.
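To make the text/tail split concrete, here is a small sketch. It uses the standard library's xml.etree.ElementTree (an assumption of this example, so it runs without lxml installed); lxml.etree follows the same text/tail model, so these attribute accesses should behave the same there. The markup is a whitespace-free version of the <p> from the question.

```python
import xml.etree.ElementTree as ET

# Whitespace-free, well-formed version of the <p> from the question
# (assumption: the original newlines/indentation are dropped for brevity).
p = ET.fromstring(
    "<p>startPara<strong>str0ng</strong>MiddleP"
    "<u>UUUUUUUUU</u><b>Bold</b>EndPara</p>"
)
bold = p.find("b")

print(repr(p.text))     # 'startPara': text before the first child belongs to <p>
print(repr(bold.text))  # 'Bold': text inside <b>...</b>
print(repr(bold.tail))  # 'EndPara': text after </b> is stored on <b> itself

# The text nodes directly under <p> are p.text plus the tail of each child;
# <u> has no tail in this markup, so empty/None entries are skipped.
direct_text = [t for t in [p.text] + [child.tail for child in p] if t]
print(direct_text)      # ['startPara', 'MiddleP', 'EndPara']
```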


> >>> bold.clear()

And clear() resets the whole Element: it removes its children and attributes and sets both its text and its tail to None.

    >>> print(bold.tail)
    None
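Here is a sketch of what clear() does to the surrounding text, and one way to empty an element while keeping the text that follows it. It again uses xml.etree.ElementTree, whose clear() also resets the tail; clear_keep_tail() is a hypothetical helper, not part of either library. Newer lxml releases (4.4 and later, if I remember the changelog correctly) accept clear(keep_tail=True) for the same purpose.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<p>startPara<u>UUUUUUUUU</u><b>Bold</b>EndPara</p>"

# clear() resets children, attributes, text *and* tail.
p = ET.fromstring(SNIPPET)
p.find("b").clear()
print(ET.tostring(p, encoding="unicode"))
# <p>startPara<u>UUUUUUUUU</u><b /></p>   ('EndPara' is gone)


def clear_keep_tail(elem):
    """Hypothetical helper: empty `elem` but keep the text that follows it."""
    tail = elem.tail   # save the tail before clear() discards it
    elem.clear()
    elem.tail = tail   # put it back


p2 = ET.fromstring(SNIPPET)
clear_keep_tail(p2.find("b"))
print(ET.tostring(p2, encoding="unicode"))
# <p>startPara<u>UUUUUUUUU</u><b />EndPara</p>
```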

Try "help(bold.clear)" or read this:

http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree._ElementInterface.clear-method


> >>> pp.xpath('./text()')
> ['startPara', 'MiddleP']

So this is the expected result: "EndPara" was stored as the tail of <b>, and clear() discarded it.

http://codespeak.net/lxml/dev/tutorial.html#elements-contain-text
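If the goal is to remove the <b> element but keep the text after it, the tail has to be moved onto the previous sibling's tail (or onto the parent's text when the element is the first child) before the element is removed. A minimal sketch with xml.etree.ElementTree follows; remove_keep_tail() is a hypothetical helper. Elements parsed with lxml.html also have a drop_tree() method which, per its documentation, joins the tail text to the previous element or parent for you.

```python
import xml.etree.ElementTree as ET


def remove_keep_tail(parent, elem):
    """Hypothetical helper: remove `elem` (and its content) from `parent`,
    moving elem.tail onto the preceding sibling's tail, or onto parent.text
    if `elem` is the first child."""
    tail = elem.tail
    index = list(parent).index(elem)
    if tail:
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
    parent.remove(elem)


p = ET.fromstring(
    "<p>startPara<strong>str0ng</strong>MiddleP"
    "<u>UUUUUUUUU</u><b>Bold</b>EndPara</p>"
)
remove_keep_tail(p, p.find("b"))
print(ET.tostring(p, encoding="unicode"))
# <p>startPara<strong>str0ng</strong>MiddleP<u>UUUUUUUUU</u>EndPara</p>
```

With lxml.etree the call would be remove_keep_tail(bold.getparent(), bold). Note that this drops the element's own text ("Bold") along with it, which matches what the original clear() call was trying to do.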

Stefan

