[lxml-dev] removing nested element from html wipes text
Stefan Behnel
stefan_ml at behnel.de
Tue Jan 29 08:07:59 CET 2008
Hi,
iShTa dAsaH wrote:
> Some text goes missing after I remove the last element.
> I've tried the 2.0 beta, but still no luck.
>
> from lxml import etree
>
> txt = """
> <p>
> startPara
> <strong>str0ng</strong>
> MiddleP
> <u>UUUUUUUUU</u>
> <b>Bold</b>
> EndPara
> </p>"""
>
> ee = etree.HTML(txt)
> pp = ee.xpath('//p')[0]
>
> >>> pp.tail
>
>
> >>> pp.xpath('./text()')
> ['startPara', 'MiddleP', 'EndPara']
>
>
> >>> bold=pp.xpath('./b')[0]
Try this:
>>> print(bold.tail)
EndPara
So the tail text ("EndPara", i.e. everything after </b>) is actually part of the <b> Element.
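To see where each piece of text lives, here is a small sketch that walks the paragraph and prints every child's text and tail. It assumes the markup is squeezed onto one line (so there are no whitespace-only tails) and parsed with etree.fromstring() rather than etree.HTML(). The text/tail model is the same in the standard library's xml.etree.ElementTree, so the sketch falls back to it if lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # stdlib ElementTree uses the same text/tail model
    import xml.etree.ElementTree as etree

# Assumption: the asker's markup on a single line, no whitespace between tags.
SRC = "<p>startPara<strong>str0ng</strong>MiddleP<u>UUUUUUUUU</u><b>Bold</b>EndPara</p>"
p = etree.fromstring(SRC)

# Text before the first child belongs to <p> itself.
print("p.text ->", repr(p.text))

# Text after a child's closing tag belongs to that child, as its tail.
for child in p:
    print(f"<{child.tag}> text={child.text!r} tail={child.tail!r}")

# Output:
# p.text -> 'startPara'
# <strong> text='str0ng' tail='MiddleP'
# <u> text='UUUUUUUUU' tail=None
# <b> text='Bold' tail='EndPara'
```

So "EndPara" is stored on <b>, not on <p>, which is why anything that throws <b> away can take "EndPara" with it.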
> >>> bold.clear()
And this clears your Element, including its tail text:
>>> bold.tail
None
Try "help(bold.clear)" or read this:
http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree._ElementInterface.clear-method
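If the goal is to empty the <b> element but keep the text that follows it, one option is to remember the tail and put it back after clear(). The helper name clear_keep_tail() below is hypothetical, not part of lxml. Recent lxml releases (4.4 and later) also accept clear(keep_tail=True) for the same effect; the manual version below works with older versions and with the standard library's xml.etree.ElementTree too, which the sketch falls back to if lxml is missing.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # stdlib ElementTree: clear() also resets the tail
    import xml.etree.ElementTree as etree

# Assumption: the asker's markup on a single line.
SRC = "<p>startPara<strong>str0ng</strong>MiddleP<u>UUUUUUUUU</u><b>Bold</b>EndPara</p>"


def clear_keep_tail(el):
    """Hypothetical helper: clear el like el.clear(), then restore its tail."""
    tail = el.tail
    el.clear()  # removes children, attributes, text *and* tail
    el.tail = tail


# Plain clear(): the text after </b> disappears too.
p = etree.fromstring(SRC)
bold = p.find("b")
bold.clear()
print(bold.tail)  # None

# Helper: <b> is emptied, but "EndPara" stays in the paragraph.
p2 = etree.fromstring(SRC)
bold2 = p2.find("b")
clear_keep_tail(bold2)
print(bold2.text, bold2.tail)  # None EndPara
```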
> >>> pp.xpath('./text()')
> ['startPara', 'MiddleP']
So this is the expected result.
http://codespeak.net/lxml/dev/tutorial.html#elements-contain-text
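To remove an element without losing the text after it, the tail has to be moved somewhere else first: onto the tail of the previous sibling, or onto the parent's own text if the element is the first child. Below is a minimal sketch of that recipe, assuming the markup is on one line. remove_keep_tail() and direct_text() are hypothetical helper names, and direct_text() only mimics xpath('./text()') for a paragraph whose children are elements. It falls back to xml.etree.ElementTree if lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # stdlib ElementTree uses the same text/tail model
    import xml.etree.ElementTree as etree

SRC = "<p>startPara<strong>str0ng</strong>MiddleP<u>UUUUUUUUU</u><b>Bold</b>EndPara</p>"


def remove_keep_tail(parent, el):
    """Hypothetical helper: remove el from parent but keep the text after it."""
    tail = el.tail
    if tail:
        children = list(parent)
        i = children.index(el)
        if i > 0:
            # Glue the tail onto the previous sibling's tail.
            prev = children[i - 1]
            prev.tail = (prev.tail or "") + tail
        else:
            # el is the first child: its tail continues the parent's text.
            parent.text = (parent.text or "") + tail
    parent.remove(el)  # on its own, this would drop the tail along with el


def direct_text(el):
    """Hypothetical helper: el's direct text pieces, like xpath('./text()')."""
    return [t for t in [el.text] + [child.tail for child in el] if t]


p = etree.fromstring(SRC)
remove_keep_tail(p, p.find("b"))
print(direct_text(p))  # ['startPara', 'MiddleP', 'EndPara']
```

Calling parent.remove(el) by itself is what loses "EndPara", because in the ElementTree model the tail travels with the element it follows.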
Stefan