[lxml-dev] DOM tree intersection/comparison?

Viksit Gaur vik.list.nutch at gmail.com
Fri May 23 13:00:44 CEST 2008


Hi Stefan,

Stefan Behnel wrote:
> 
> Have a look at lxml.html.diff, might come close to what you want.

Thanks for the prompt pointer, but I don't think it meets my requirements.
What I'm looking for is something that takes two trees and gives me their
intersection, i.e. the subtree they have in common.
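A minimal sketch of such an intersection, assuming a deliberately simple matching rule (not anything lxml provides): two elements match when their tags are equal, children are paired by position, and text and attributes are ignored. The helper names `element_children` and `intersect` are hypothetical. It uses only the ElementTree-compatible API, so it runs on lxml or, as a fallback, on the standard library.

```python
# Prefer lxml; fall back to the stdlib ElementTree, which has the same calls used here.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def element_children(el):
    """Child elements only; comments and PIs have non-string tags and are skipped."""
    return [c for c in el if isinstance(c.tag, str)]


def intersect(a, b):
    """Return a new tree with the structure common to a and b, or None.

    Hypothetical rule: elements match on tag name; children are paired
    position by position (i-th child with i-th child). Text/attributes ignored.
    """
    if a.tag != b.tag:
        return None
    common = etree.Element(a.tag)
    for ca, cb in zip(element_children(a), element_children(b)):
        sub = intersect(ca, cb)
        if sub is not None:  # keep only the branches both trees share
            common.append(sub)
    return common


t1 = etree.fromstring("<html><body><div><p/><a/></div><hr/></body></html>")
t2 = etree.fromstring("<html><body><div><p/><b/></div><br/></body></html>")
print([el.tag for el in intersect(t1, t2).iter()])
```

Positional pairing is the simplest choice; a real comparison would probably need to align children that were inserted or removed (e.g. with a sequence-matching step), which this sketch does not attempt.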

I have a couple of further questions, actually. I see there's a depth-first
(DFS) iterator for elements, but is there a way to iterate through the tree
breadth-first?
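For reference, `element.iter()` walks the tree in document order, which is depth-first. A breadth-first walk can be written in plain Python on top of the normal element API with a FIFO queue; the following is a minimal sketch, and the generator name `iter_breadth_first` is hypothetical, not part of lxml.

```python
from collections import deque

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def iter_breadth_first(root):
    """Yield (depth, element) pairs level by level, root at depth 0.

    A FIFO queue guarantees every element at depth d is yielded
    before any element at depth d + 1.
    """
    queue = deque([(0, root)])
    while queue:
        depth, el = queue.popleft()
        yield depth, el
        for child in el:
            if isinstance(child.tag, str):  # skip comments / processing instructions
                queue.append((depth + 1, child))


root = etree.fromstring(
    "<html><hr/><a/><div><a/><hr/></div><div><b/><br/></div></html>"
)
for depth, el in iter_breadth_first(root):
    print(depth, el.tag)
```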

I thought I could compare the elements at each level (e.g. html -> hr, a,
div1, div2, and div1 -> a, hr, b, br), i.e. cluster the elements by the depth
at which they appear in the tree. Looking at the source, it appears that this
would be handled by the C library (libxml2) that lxml wraps, which would mean
any modifications to the base code have to be made in C?

Or maybe there's an easier method to do this?
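Clustering by level does not require touching the C code: it can be done in Python with a breadth-first walk that records each element's depth. A minimal sketch, assuming "same level" means the same depth from the root and that only tag names matter; `tags_by_level` and `common_tags_by_level` are hypothetical helpers.

```python
from collections import Counter, defaultdict, deque

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def tags_by_level(root):
    """Map depth -> Counter of tag names, gathered with a breadth-first walk."""
    levels = defaultdict(Counter)
    queue = deque([(0, root)])
    while queue:
        depth, el = queue.popleft()
        levels[depth][el.tag] += 1
        # Enqueue element children only (comments/PIs have non-string tags).
        queue.extend((depth + 1, c) for c in el if isinstance(c.tag, str))
    return levels


def common_tags_by_level(a, b):
    """Per depth present in both trees, the tags they share.

    Counter '&' keeps the minimum count of each tag, i.e. a multiset intersection.
    """
    la, lb = tags_by_level(a), tags_by_level(b)
    return {d: la[d] & lb[d] for d in sorted(set(la) & set(lb))}


a = etree.fromstring("<html><hr/><a/><div/><div/></html>")
b = etree.fromstring("<html><a/><div/><br/></html>")
print(common_tags_by_level(a, b))
```

This only compares levels as unordered bags of tags; it says nothing about which parent a shared tag sits under, so it is a coarse similarity signal rather than a true subtree intersection.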

Cheers
Viksit

> 
> Stefan
> 
> 

