[lxml-dev] Efficient methods to build a tree out of HTML structure?

Viksit Gaur vik.list.nutch at gmail.com
Fri May 16 11:28:39 CEST 2008


Hi,

Stefan Behnel wrote:
> Hi,
> 
> Viksit Gaur wrote:
>> 2) Things like iterwalk do return "start" and "end" actions - but 
>> instead of first doing an iterwalk and then parsing the results, is 
>> there a better way to construct the tree when iterwalk itself is running?
> 
> I don't understand what you mean here. Are you modifying the tree during the
> iteration? Or do you think of some kind of pipelining?

Hmm. The problem I am facing is how to assign a unique ID to each 
element on the page.

Let's say I construct an iterwalk object. During this phase, I would 
like not only to build the tree, but also to add some of my own 
information to each node (such as a unique ID for each element). I'm not 
sure how to do this without extending the etree.so file in which 
iterwalk is implemented.
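
To make the goal concrete, here is a rough sketch of one possible way to 
tag elements while the tree is still being built, using iterparse rather 
than iterwalk. This is an assumption on my part, not a confirmed answer. 
The function name parse_with_ids and the attribute name "uid" are made up 
for illustration, the input is assumed to be well-formed markup, and the 
fallback import assumes the stdlib ElementTree API is close enough to 
lxml's for this purpose.

```python
import io
from itertools import count

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree API is compatible enough here.
    import xml.etree.ElementTree as etree


def parse_with_ids(source, attr="uid"):
    """Build a tree from a file-like object and tag every element.

    IDs are reserved in document order on "start" events and written
    on the matching "end" event, once the element is fully parsed.
    `attr` is a hypothetical attribute name chosen for this sketch.
    """
    counter = count(1)
    pending = []  # IDs reserved at "start", applied at "end"
    root = None
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first "start" event is the root element
            pending.append(next(counter))
        else:
            elem.set(attr, str(pending.pop()))
    return root


doc = io.BytesIO(b"<html><body><p>one</p><p>two</p></body></html>")
root = parse_with_ids(doc)
ids = [(el.tag, el.get("uid")) for el in root.iter()]
```

The attribute is only written on "end" events, so no element is modified 
before it has been parsed completely; the stack keeps the numbering in 
document order anyway.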

Cheers,
Viksit

> 
> Stefan
> 

