<div dir="ltr">hello,<br><br>I have a document with a format like this:<br><doc>text1<b>text2</b>text3<b>text4</b>text5</doc><br><br>I want to extract 'text1text3text5' from <doc>, i.e. only the text that sits directly inside it and not the text of its child elements, but the .text attribute gives only 'text1'. Here is an example:<br>
<br>from lxml import html<br>doc = html.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')<br>print(doc.text)  # 'text1'<br>print(doc.tail)  # None<br>print(doc.text_content())  # 'text1text2text3text4text5'<br>
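<br>For context: lxml follows the ElementTree model for mixed content, where .text holds only the text before an element's first child and each child's .tail holds the text that follows its closing tag. That is why doc.text stops at 'text1'. A small sketch (same input as above, assuming Python 3) that prints where each piece of text ends up:<br>

```python
from lxml import html

doc = html.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')

# .text holds only the text before the first child element
print(repr(doc.text))  # 'text1'

for child in doc:
    # child.text is the text inside the child; child.tail is the text after its end tag
    print(child.tag, repr(child.text), repr(child.tail))
# b 'text2' 'text3'
# b 'text4' 'text5'
```
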
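<br>for child in list(doc):  # iterate over a copy, since the loop modifies doc<br>    child.drop_tree()  # removes the child element but keeps its tail text<br>print(doc.text)  # 'text1text3text5'<br><br>From the example you can see that I can get what I want by first dropping the subelements, but that modifies the document. <br>Is there a better way to access this text?<br>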
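<br>A sketch of two ways to read only the direct text without modifying the tree, assuming Python 3 and the same input as above. The first uses lxml's xpath() method with the XPath text() node test, which selects only the text nodes that are direct children of the context element; the second uses nothing but the .text and .tail attributes:<br>

```python
from lxml import html

doc = html.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')

# Option 1: XPath text() returns the direct text-node children: ['text1', 'text3', 'text5']
direct_text = ''.join(doc.xpath('text()'))

# Option 2: text before the first child, plus the tail that follows each child
# ("or ''" guards against None when a piece of text is absent)
direct_text_2 = (doc.text or '') + ''.join(child.tail or '' for child in doc)

print(direct_text)    # text1text3text5
print(direct_text_2)  # text1text3text5

# The child elements are untouched, so the full text is still available
print(doc.text_content())  # text1text2text3text4text5
```
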
<br>regards,<br>Richard<br></div>