[lxml-dev] Possible bug in DOM tree iteration?

Stefan Behnel stefan_ml at behnel.de
Mon Jun 30 09:44:27 CEST 2008


Hi,

Viksit Gaur wrote:
> I'm running some tests on a page's DOM tree by assigning each element a 
> unique identifier and then doing some analysis using this. I use code 
> similar to,
> 
> root = bs.fromstring(txtcontent)
> self.pagetree = etree.iterwalk(root, events=("start",))
> for event, element in self.pagetree:
>     element.attrib['uid'] = str(cnt)
>     cnt = cnt + 1

I guess it's really only similar to the above, as this code works just fine
for the HTML snippet you present below.
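
For reference, here is a minimal self-contained sketch of the same numbering loop. The markup is a made-up fragment (the original page is not shown in this thread), and it uses Element.iter() rather than iterwalk; for a tree made up only of elements, as here, that visits them in the same document order in which iterwalk delivers "start" events. The import fallback is only there so the sketch also runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback assumption: the standard library's ElementTree offers
    # the same fromstring()/iter()/set()/get() calls used below.
    import xml.etree.ElementTree as etree

# Hypothetical, well-formed fragment standing in for the real page.
root = etree.fromstring("<div><p>One <b>two</b> three</p><p>Four</p></div>")

# Number every element in document order.
for cnt, element in enumerate(root.iter()):
    element.set("uid", str(cnt))

# Collect (tag, uid) pairs to check that no element was skipped.
uids = [(el.tag, el.get("uid")) for el in root.iter()]
print(uids)
```

Every element, including the nested b, receives a uid, so the iteration itself does not skip any tags.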


> I'm not sure how to access the rest of the text under the P tag. When
> iterating through the tree, shouldn't the other tags be included too,
> and shouldn't the text of the P element contain ALL the text in there,
> including the b tags?

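Please read the section "Elements contain text" in the lxml tutorial. An
element's .text only holds the text up to its first child element; text that
follows a child is stored in that child's .tail.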
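
As a rough illustration of how text is split across .text and .tail, here is a small sketch. The P fragment is invented for the example, not the markup from the original report, and the import fallback is only there so it runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback assumption: the standard library's ElementTree exposes
    # the same .text/.tail/itertext() API used below.
    import xml.etree.ElementTree as etree

# Hypothetical fragment: a P element with mixed text and inline tags.
root = etree.fromstring("<p>Some <b>bold</b> and <i>italic</i> text.</p>")
b, i = root[0], root[1]

leading = root.text   # text before the first child: "Some "
b_text = b.text       # text inside <b>: "bold"
b_tail = b.tail       # text after </b>, up to the next tag: " and "
i_tail = i.tail       # text after </i>: " text."

# itertext() walks .text and .tail of all descendants in document order,
# so joining the pieces gives the complete text content of the P element.
full_text = "".join(root.itertext())
print(repr(full_text))
```

So the text of the P element is not lost; it is distributed over the P's own .text and the .text/.tail of its children, and itertext() collects all of it.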

Stefan



