[lxml-dev] Text obscured by subelement
Richard Baron Penman
richardbp+lxml at gmail.com
Sun Aug 24 13:19:18 CEST 2008
Hello,
I have a document with a format like this:
<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>
I want to extract 'text1text3text5' from <doc> but the text attribute
returns just 'text1'. Here is an example:
from lxml import html
doc = html.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')
print(doc.text)  # 'text1'
print(doc.tail)  # None (nothing follows </doc>)
print(doc.text_content())  # 'text1text2text3text4text5'
# Drop each subelement; drop_tree() merges the child's tail text into the parent
for child in list(doc):
    child.drop_tree()
print(doc.text)  # 'text1text3text5'
From the example you can see that I can get what I want by first dropping the
subelements.
Is there a better way to access this text?
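For comparison, here is a minimal sketch of one non-destructive approach. It assumes the text after each child's closing tag is stored on that child's tail attribute, so joining the parent's text with the children's tails should give the same result without modifying the tree. The helper name direct_text is hypothetical, and the standard-library fallback is an assumption added only so the sketch runs without lxml installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree shares the same text/tail model
    import xml.etree.ElementTree as etree


def direct_text(element):
    """Return the text directly inside `element`, skipping child content.

    `direct_text` is a hypothetical helper name, not part of lxml.
    """
    parts = [element.text or '']
    for child in element:
        # Text following a child's closing tag is stored on the child's tail
        parts.append(child.tail or '')
    return ''.join(parts)


doc = etree.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')
result = direct_text(doc)
print(result)
```

With lxml specifically, ''.join(doc.xpath('text()')) is another option, since the XPath expression text() selects only the text nodes that are direct children of the element.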
regards,
Richard