[lxml-dev] DOM tree intersection/comparison?
Viksit Gaur
vik.list.nutch at gmail.com
Sat May 24 12:36:36 CEST 2008
Hey Mike,
Mike Meyer wrote:
>
> I've written some code to diff two xml trees. The real issue is that
> "the differences between two trees" isn't really well defined. I.e. -
> does order of children matter? Not for attribute nodes, and maybe not
> for other nodes, depending on the application. What about whitespace?
> Same answer - some of it yes, some of it depends on the
> application. Look at a modern diff's different options for whitespace
> handling, then fold in XML's newline handling to see how nasty that
> can get.
Thanks for pointing out some interesting questions - I had thought of a
couple, but I was counting on the others not being too relevant to what I
was doing. Whitespace diffs are actually really bad - and I guess
Unicode is not going to sit well in the mix if I ever have to move
to multilingual support.
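
To make the order and whitespace questions concrete, here is a minimal sketch of a comparison where both are explicit options. It is not code from this thread. It uses the standard library's xml.etree.ElementTree (lxml.etree follows the same Element API), and the names normalize, signature, trees_equal and the two flags are made up for illustration.

```python
import xml.etree.ElementTree as ET

def normalize(text, ignore_whitespace):
    # Treat missing text as empty; optionally collapse runs of whitespace.
    text = text or ""
    return " ".join(text.split()) if ignore_whitespace else text

def signature(elem, ignore_whitespace=True, ordered=True):
    """Return a hashable summary of elem and its whole subtree (hypothetical helper)."""
    children = [signature(c, ignore_whitespace, ordered) for c in elem]
    if not ordered:
        # Child order is declared irrelevant, so put children in a canonical order.
        children.sort(key=repr)
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),  # attribute order never matters
        normalize(elem.text, ignore_whitespace),
        normalize(elem.tail, ignore_whitespace),
        tuple(children),
    )

def trees_equal(a, b, **options):
    return signature(a, **options) == signature(b, **options)

a = ET.fromstring('<r x="1" y="2"><a/> <b>hi</b></r>')
b = ET.fromstring('<r y="2" x="1">\n  <b> hi </b>\n  <a/>\n</r>')

print(trees_equal(a, b))                 # child order is significant here
print(trees_equal(a, b, ordered=False))  # child order ignored
```

Whether the two documents count as "the same" depends entirely on the flags, which is the point being made above: the comparison has to be told what the application considers a difference.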
>
> FWIW, I'm not sure you get a "subtree" - more like forest. Or maybe it
> depends on exactly what you mean by "differences". I.e. - if an
> attribute changed value and that was the only difference, I wanted
> that attribute pulled out. I could see where you might define things
> so that the difference was the largest common subtree, or some such.
The latter was what I was aiming for. Mostly, I'm not so much trying to
compute an intersection between two trees as to construct a compressed
representation of them.
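
One possible reading of "compressed representation", sketched here purely as an assumption and not as the approach actually used, is to store each distinct subtree once and let both trees refer to it by id. The function intern_tree and the table layout are hypothetical. The sketch treats child order as significant, strips surrounding whitespace from text, and ignores tail text.

```python
import xml.etree.ElementTree as ET

def intern_tree(elem, table):
    """Return an integer id for elem's subtree, reusing the id of any identical subtree."""
    key = (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        # Children are interned first, so a parent is described by its children's ids.
        tuple(intern_tree(child, table) for child in elem),
    )
    # A new key gets the next free id; a known key returns its existing id.
    return table.setdefault(key, len(table))

table = {}
left = ET.fromstring("<doc><p>same</p><p>left only</p></doc>")
right = ET.fromstring("<doc><p>same</p><p>right only</p></doc>")

left_id = intern_tree(left, table)
right_id = intern_tree(right, table)

# Six elements were visited, but the shared <p>same</p> is stored only once.
print(len(table))
```

The subtrees common to both documents end up as table entries referenced from both roots, so the shared part falls out of the representation rather than being computed as a separate intersection step.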
Cheers,
Viksit