[lxml-dev] Efficient methods to build a tree out of HTML structure?
Viksit Gaur
vik.list.nutch at gmail.com
Fri May 16 11:28:39 CEST 2008
Hi,
Stefan Behnel wrote:
> Hi,
>
> Viksit Gaur wrote:
>> 2) Things like iterwalk do return "start" and "end" actions - but
>> instead of first doing an iterwalk and then parsing the results, is
>> there a better way to construct the tree when iterwalk itself is running?
>
> I don't understand what you mean here. Are you modifying the tree during the
> iteration? Or do you think of some kind of pipelining?
Hmm. The problem I faced was finding a way to assign a unique ID to each
element on the page.
Let's say I construct an iterwalk object. During this phase, I would
like to not only build the tree, but also add some of my own information
to each node (such as a unique ID for each element). I'm not sure how to
do this without extending the etree.so file in which iterwalk is
implemented.
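To make the goal concrete, a rough sketch of this kind of tagging could
look like the following. It assumes that iterwalk yields the live tree
elements, so an attribute set inside the loop stays on the tree. The
helper name assign_ids and the attribute name "uid" are made up for
illustration, and the sample markup is well-formed so that it can be
parsed with etree.fromstring; a tree built by lxml.html should walk the
same way.

```python
from itertools import count

from lxml import etree


def assign_ids(root, attr="uid"):
    """Walk the tree once and stamp each element with a sequential ID.

    `assign_ids` and the default attribute name "uid" are hypothetical
    names chosen for this sketch, not part of lxml.
    """
    counter = count(1)
    # Request only "start" events so IDs are handed out in document order.
    for _event, element in etree.iterwalk(root, events=("start",)):
        element.set(attr, str(next(counter)))
    return root


# Small well-formed sample standing in for a parsed HTML page.
page = etree.fromstring(
    "<html><body><div><p>one</p><p>two</p></div></body></html>"
)
assign_ids(page)

# Collect (tag, id) pairs in document order to inspect the result.
ids = [(el.tag, el.get("uid")) for el in page.iter()]
```

Nothing here touches etree.so: the IDs are stored as ordinary
attributes, so they also show up if the tree is serialized again.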
Cheers,
Viksit
>
> Stefan
>