[lxml-dev] Efficient methods to build a tree out of HTML structure?
Viksit Gaur
vik.list.nutch at gmail.com
Fri May 16 04:58:41 CEST 2008
Hi all,
I was wondering: what would be the most efficient way to visit every
element of the DOM tree, in some order, using lxml.etree?
The methods I currently see in the docs return iterators such as
ElementDepthFirstIterator or iterwalk, which have two issues:
1) The first yields a flat sequence of elements, so I lose the
parent/child structure.
2) iterwalk does yield "start" and "end" events, but rather than
running iterwalk first and then post-processing the results, is there
a better way to construct the tree while iterwalk itself is running?
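One way to approach point 2, as a minimal sketch rather than a definitive answer: keep a stack of the elements that are currently open and record a parent/child pair on every "start" event. The sketch uses iterparse, which exists in both lxml.etree and the standard library's ElementTree; lxml's iterwalk yields the same (event, element) pairs over an already parsed tree. The sample markup and the name parent_child_edges are made up for illustration.

```python
import io

try:
    from lxml import etree  # preferred, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same iterparse API

# Hypothetical, well-formed sample page (not from the original post).
SAMPLE = b"<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>"


def parent_child_edges(source):
    """Collect (parent, child) element pairs while the event stream runs."""
    edges = []
    stack = []  # elements that have started but not yet ended
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            if stack:
                # The innermost open element is the parent of the new one.
                edges.append((stack[-1], elem))
            stack.append(elem)
        else:  # "end": the element is closed, so drop it from the stack
            stack.pop()
    return edges


edges = parent_child_edges(io.BytesIO(SAMPLE))
edge_tags = [(parent.tag, child.tag) for parent, child in edges]
print(edge_tags)
```

Because the stack only ever holds the open ancestors, the structure is recovered in a single pass and nothing needs to be re-parsed afterwards.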
Or perhaps there is some method I've missed completely?
Quick note on what I'm trying to do: graphically represent the DOM
structure of a page using a library like NetworkX.
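For the graph itself, a rough sketch of one possible approach: walk the parsed tree once and emit uniquely labelled nodes plus parent/child edges in a form a graph library can consume. The "index:tag" label scheme, the sample markup, and the name dom_graph are assumptions made for illustration; tags alone would not work as node names because they repeat.

```python
try:
    from lxml import etree  # preferred, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback

# Hypothetical, well-formed sample page (not from the original post).
SAMPLE = "<html><body><div><p>a</p><p>b</p></div></body></html>"


def dom_graph(root):
    """Return (nodes, edges) with unique "index:tag" labels in document order."""
    nodes, edges = [], []
    pending = [(None, root)]  # (parent label, element) pairs still to visit
    while pending:
        parent_label, elem = pending.pop()
        label = f"{len(nodes)}:{elem.tag}"
        nodes.append(label)
        if parent_label is not None:
            edges.append((parent_label, label))
        # Push children reversed so they are popped in document order.
        pending.extend((label, child) for child in reversed(list(elem)))
    return nodes, edges


root = etree.fromstring(SAMPLE)
nodes, edges = dom_graph(root)
print(nodes)
print(edges)
```

With NetworkX installed, the result can be loaded with G = networkx.DiGraph(), G.add_nodes_from(nodes) and G.add_edges_from(edges), and then drawn or analysed from there.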
Cheers,
Viksit