[lxml-dev] DOM tree intersection/comparison?
Mike Meyer
mwm-keyword-lxml.9112b8 at mired.org
Fri May 23 16:46:39 CEST 2008
On Fri, 23 May 2008 04:00:44 -0700
Viksit Gaur <vik.list.nutch at gmail.com> wrote:
> Thanks for the prompt pointer - I don't think this meets my requirements
> however. I was looking for something which would basically give me an
> intersection of 2 trees that was a subtree..
I've written some code to diff two XML trees. The real issue is that
"the differences between two trees" isn't really well defined. For
example, does the order of children matter? Not for attribute nodes,
and maybe not for other nodes, depending on the application. What about
whitespace? Same answer: some of it matters, and some of it depends on
the application. Look at a modern diff's different options for
whitespace handling, then fold in XML's newline handling to see how
nasty that can get.
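
To make those choices concrete, here is a minimal sketch (not the code
mentioned above) of a comparison where child order and whitespace are
explicit options. It assumes Python's standard xml.etree.ElementTree,
whose element API lxml.etree also follows; the name signature() and its
ignore_ws/ordered parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def signature(elem, ignore_ws=True, ordered=True):
    """Build a nested tuple that is equal for trees considered 'the same'."""
    # Whitespace policy: either strip text and tails, or keep them verbatim.
    norm = (lambda s: (s or "").strip()) if ignore_ws else (lambda s: s or "")
    # Each child contributes its own signature plus the text that follows it.
    kids = [signature(c, ignore_ws, ordered) + (norm(c.tail),) for c in elem]
    if not ordered:
        kids.sort()  # child order is declared irrelevant
    # Attributes are always sorted, since their order never matters in XML.
    attrs = tuple(sorted(elem.attrib.items()))
    return (elem.tag, attrs, norm(elem.text), tuple(kids))


a = ET.fromstring('<r x="1" y="2"><p/>\n  <q/></r>')
b = ET.fromstring('<r y="2" x="1"><q/><p/></r>')
print(signature(a) == signature(b))
print(signature(a, ordered=False) == signature(b, ordered=False))
```

Whether the two documents count as equal depends entirely on which
options are passed, which is the point: the answer is a policy decision,
not a property of the trees.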
FWIW, I'm not sure you get a "subtree" - more like a forest. Or maybe it
depends on exactly what you mean by "differences". For example, if an
attribute changed value and that was the only difference, I wanted
that attribute pulled out. I could see where you might define things
so that the difference was the largest common subtree, or some such.
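
One way to read "pull out just the changed attribute" is as a list of
small, located changes rather than a single subtree. The sketch below
is an illustration under stated assumptions, not the code described
above: it uses the standard xml.etree.ElementTree, matches children by
position only, and the name changes() and the path notation are
hypothetical.

```python
import xml.etree.ElementTree as ET


def changes(old, new, here=None):
    """Yield (path, kind, old_value, new_value) for each local difference."""
    here = here or "/" + old.tag
    if old.tag != new.tag:
        yield (here, "tag", old.tag, new.tag)
        return  # different elements: do not compare their contents
    for name in sorted(set(old.attrib) | set(new.attrib)):
        before, after = old.get(name), new.get(name)
        if before != after:
            yield (f"{here}/@{name}", "attribute", before, after)
    # Assumption: leading/trailing whitespace in text is not significant.
    if (old.text or "").strip() != (new.text or "").strip():
        yield (here, "text", old.text, new.text)
    if len(old) != len(new):
        yield (here, "child-count", len(old), len(new))
    # Assumption: children correspond by position; extras are not descended.
    for index, (c_old, c_new) in enumerate(zip(old, new), start=1):
        yield from changes(c_old, c_new, f"{here}/*[{index}]")


old = ET.fromstring('<r><p id="1">x</p><q/></r>')
new = ET.fromstring('<r><p id="2">x</p><q/></r>')
for change in changes(old, new):
    print(change)
```

Each reported change stands alone, so several unrelated edits come out
as several entries, which is the "forest" shape rather than one subtree.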
> Or maybe there's an easier method to do this?
I dealt with my issues by deciding on a canonical character string
representation that gave me lots of lines, then feeding that
representation to a string differ. The standard canonical forms don't
quite work, because they (correctly) assume that the order of
attributes doesn't matter, but it will when you diff the result as a
string.
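
A minimal sketch of that approach, assuming Python's standard
xml.etree.ElementTree and difflib. The line format and the names
canonical_lines() and diff_xml() are hypothetical, not the
representation actually used above: one line per tag, per attribute
(sorted by name), and per non-blank text chunk.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(elem, depth=0):
    """Serialize an element as many short lines with attributes sorted."""
    pad = "  " * depth
    lines = [f"{pad}<{elem.tag}>"]
    # Sorting makes attribute order irrelevant to the line-based diff.
    for name in sorted(elem.attrib):
        lines.append(f"{pad}  @{name}={elem.attrib[name]!r}")
    # Assumption: whitespace-only text is not significant and is dropped.
    text = (elem.text or "").strip()
    if text:
        lines.append(f"{pad}  text={text!r}")
    for child in elem:
        lines.extend(canonical_lines(child, depth + 1))
        tail = (child.tail or "").strip()
        if tail:
            lines.append(f"{pad}  text={tail!r}")
    lines.append(f"{pad}</{elem.tag}>")
    return lines


def diff_xml(old_xml, new_xml):
    """Return unified-diff lines between two XML strings."""
    old = canonical_lines(ET.fromstring(old_xml))
    new = canonical_lines(ET.fromstring(new_xml))
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


print("\n".join(diff_xml('<a x="1" y="2"><b>hi</b></a>',
                         '<a y="2" x="3"><b>hi</b></a>')))
```

With this format, reordering attributes alone produces an empty diff,
while a changed attribute value shows up as one removed and one added
line.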
<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org