[lxml-dev] Possible bug in DOM tree iteration?
Stefan Behnel
stefan_ml at behnel.de
Mon Jun 30 09:44:27 CEST 2008
Hi,
Viksit Gaur wrote:
> I'm running some tests on a page's DOM tree by assigning each element a
> unique identifier and then doing some analysis using this. I use code
> similar to,
>
> root = bs.fromstring(txtcontent)
> self.pagetree = etree.iterwalk(root, events=("start",))
> for event, element in self.pagetree:
>     element.attrib['uid'] = str(cnt)
>     cnt = cnt + 1
I guess it's really only similar to the above, as this code works just fine
for the HTML snippet you present below.
> I'm not sure how to access the rest of the text under the
> P tag. When iterating through the tree, shouldn't the other tags be
> included too, and shouldn't the text for the P element contain ALL
> the text in there, including the b tags?
Please read the tutorial on Elements containing text.
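In short, an Element's .text holds only the text up to its first child; text
that follows a child's closing tag is stored in that child's .tail. Below is a
minimal sketch of this, not the code from the original report. The HTML
snippet and the variable names are made up for illustration, and the fallback
import assumes that the standard library's ElementTree follows the same
text/tail model, in case lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree uses the same .text/.tail model
    import xml.etree.ElementTree as etree

# Hypothetical snippet, not the page from the original report
root = etree.fromstring("<p>Hello <b>bold</b> world</p>")
b = root[0]  # the <b> child of <p>

p_text = root.text  # only the text before the first child: 'Hello '
b_text = b.text     # the text inside <b>: 'bold'
b_tail = b.tail     # the text after </b>, stored on <b>: ' world'

# All text under <p>, in document order, including the children's tails
full_text = "".join(root.itertext())

print(repr(p_text), repr(b_text), repr(b_tail), repr(full_text))
```

So to get "ALL the text" under the P element, collect it with itertext() (or
walk the children and read their .text and .tail) rather than reading
p.text alone.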
Stefan