[lxml-dev] Efficient methods to build a tree out of HTML structure?

Viksit Gaur vik.list.nutch at gmail.com
Fri May 16 04:58:41 CEST 2008


Hi all,

I was wondering - what would be the most efficient method to access all 
the elements in the DOM tree, in some order, using lxml.etree?

The approaches I currently see in the docs are classes like 
ElementDepthFirstIterator and iterwalk, and each has an issue:

1) The first yields the elements as a flat sequence, so I lose the 
child/parent structure.

2) iterwalk does return "start" and "end" events, but instead of first 
running iterwalk and then post-processing its results, is there a better 
way to construct my own tree while iterwalk itself is running?

Or perhaps there is some method I've missed completely?
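
To make point 2 concrete, here is a rough sketch of what I mean by building 
the structure while iterwalk is running. It assumes a stack of currently 
open elements is enough to recover each parent; the helper name 
edges_from_iterwalk and the sample markup are made up for illustration, and 
I am not claiming this is the recommended way.

```python
from lxml import etree


def edges_from_iterwalk(root):
    """Collect (parent, child) element pairs while iterwalk is running."""
    stack = []   # currently open elements; stack[-1] is the current parent
    edges = []
    for event, element in etree.iterwalk(root, events=("start", "end")):
        if event == "start":
            if stack:
                edges.append((stack[-1], element))
            stack.append(element)
        else:
            # "end": the element is closed, so step back up one level
            stack.pop()
    return edges


# Hypothetical sample page, parsed with lxml's HTML parser
root = etree.HTML("<html><body><div><p>a</p><p>b</p></div></body></html>")
edges = edges_from_iterwalk(root)
print([(parent.tag, child.tag) for parent, child in edges])
```

Each edge is recorded at the "start" event, so the parent/child pairs are 
available as soon as the walk reaches them, with no second pass.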

Quick note on what I'm trying to do: graphically represent the DOM 
structure of a page using a library like NetworkX.
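
As a sketch of the kind of output I am after, the following walks the tree 
by iterating each element's children and produces integer node ids plus an 
edge list. The function name dom_edges, the id scheme and the sample markup 
are my own assumptions. The loop only relies on iterating an element's 
children, which the standard library's ElementTree also supports, so the 
import falls back to it when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib so the sketch still runs
    import xml.etree.ElementTree as etree


def dom_edges(root):
    """Return (labels, edges): labels maps node id -> tag name,
    edges is a list of (parent_id, child_id) pairs in document order."""
    labels = {}
    edges = []
    todo = [(root, None)]   # (element, id of its parent)
    while todo:
        element, parent_id = todo.pop()
        node_id = len(labels)          # ids are assigned in document order
        labels[node_id] = element.tag
        if parent_id is not None:
            edges.append((parent_id, node_id))
        # Push children reversed so they are popped in document order;
        # comments and processing instructions have a non-string tag.
        todo.extend(
            (child, node_id)
            for child in reversed(list(element))
            if isinstance(child.tag, str)
        )
    return labels, edges


# Hypothetical, well-formed sample; real-world HTML would need lxml's HTML parser
root = etree.fromstring("<html><body><div><p>a</p><p>b</p></div></body></html>")
labels, edges = dom_edges(root)
print(labels)
print(edges)
```

The edge list should be usable with NetworkX, for example through 
DiGraph.add_edges_from(edges), with labels supplying the tag name shown for 
each node. I have not tried that step here.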

Cheers,
Viksit