[lxml-dev] c14n, pretty printing and diffing

Olivier Collioud Olivier.Collioud at wipo.int
Tue Feb 12 07:26:20 CET 2008


Thanks Stephan.

I prefer visual diff tools, such as the ones provided by Eclipse, TkDiff
or WinMerge.

I did not find any documentation or usage example for lxml.usedoctest.
Could you please give me some pointers?

Let me share my solution based on iterparse. It is simple because I do
not use any namespaces, PIs, comments...:

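    # Assumed import; lxml.etree offers the same iterparse() interface.
    from xml.etree import ElementTree

    # inputFile is a path; outputFile is a file opened for writing text
    # (for example with encoding='utf-8').
    depth = 0
    sourceTree = ElementTree.iterparse(inputFile, events=("start", "end"))
    for event, elem in sourceTree:

        if event == "start":
            # NOTE: elem.text and the children are only guaranteed to be
            # complete at "end"; reading them here relies on read-ahead.
            i = "\n" + depth * "  "
            depth += 1
            outputFile.write('%s<%s' % (i, elem.tag))
            # Sorted attributes give a stable order; 'size' is skipped.
            attrs = sorted(a for a in elem.items() if a[0] != 'size')
            if attrs:
                outputFile.write(' ')
                outputFile.write(' '.join('%s="%s"' % a for a in attrs))
            if elem.text and elem.text.strip():
                outputFile.write('>%s' % elem.text.strip('\n'))
            elif len(elem):
                outputFile.write('>')

        if event == "end":
            depth -= 1
            if (elem.text and elem.text.strip()) or len(elem):
                # Recompute the indent: the value left over from the last
                # "start" event belongs to a nested element.
                i = "\n" + depth * "  "
                outputFile.write('%s</%s>' % (i, elem.tag))
            else:
                outputFile.write('/>')
            if elem.tail and elem.tail.strip():
                outputFile.write(elem.tail.strip('\n'))
            elem.clear()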
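
For documents that fit in memory, the same normalisation can be done on a
fully parsed tree, which avoids reading elem.text during "start" events
(iterparse only guarantees that an element is complete at "end"). The
following is only a sketch under assumptions: Python 3 with the standard
library's xml.etree.ElementTree, the hypothetical names write_normalized
and normalize, and the same handling as the loop above (sorted attributes,
'size' skipped, no namespaces, PIs or comments). Unlike the loop above, it
escapes text and attribute values and keeps the closing tag of a text-only
element on the same line.

```python
import io
from xml.etree import ElementTree
from xml.sax.saxutils import escape, quoteattr


def write_normalized(elem, out, depth=0, skip=("size",)):
    """Write elem and its children, one element per line, attributes sorted."""
    indent = "\n" + depth * "  "
    out.write("%s<%s" % (indent, elem.tag))
    # Sorting makes the attribute order independent of the input file.
    for name, value in sorted(elem.items()):
        if name not in skip:
            out.write(" %s=%s" % (name, quoteattr(value)))
    text = (elem.text or "").strip()
    if text or len(elem):
        out.write(">" + escape(text))
        for child in elem:
            write_normalized(child, out, depth + 1, skip)
        # The closing tag gets its own line only when there are children.
        out.write("%s</%s>" % (indent if len(elem) else "", elem.tag))
    else:
        out.write("/>")
    tail = (elem.tail or "").strip()
    if tail:
        out.write(escape(tail))


def normalize(xml_text):
    """Return the normalised, indented form of an XML string."""
    out = io.StringIO()
    write_normalized(ElementTree.fromstring(xml_text), out)
    return out.getvalue().lstrip("\n") + "\n"
```

Two inputs that differ only in attribute order or indentation should then
produce identical text, so a text diff tool shows only the real changes.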

Olivier.

>>> Stefan Behnel <stefan_ml at behnel.de> 11/02/08 7:56 pm >>>
Hi,

Olivier Collioud wrote:
> I would like to use my favourite text diffing tool to compare XML
> files.

Which is not lxml.html.diff, I assume? (I'm not sure how HTML specific that
is, BTW). Also, for doctests, there is lxml.usedoctest that you can import
(the lxml web pages use it for doctests).


> Is there a way to produce a pretty printed canonical version of my XML
> files using lxml?

Not using the c14n interface (libxml2 doesn't support it). Serialising by
hand is not too hard, though. You can look at ElementTree._write() for an
example:

http://svn.effbot.org/public/elementtree/elementtree/ElementTree.py 
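
For reference, and not something that was available when this thread was
written: the standard library's xml.etree.ElementTree (not lxml) gained
canonicalize() in Python 3.8 and indent() in Python 3.9. A minimal sketch,
assuming Python 3.9 or later and a hypothetical helper name
canonical_pretty, combines the two. The result is indented and has sorted
attributes, but it is no longer strictly canonical because of the added
whitespace and the short form of empty tags.

```python
import xml.etree.ElementTree as ET


def canonical_pretty(xml_text):
    """Canonicalise an XML string, then indent it for text diffing."""
    # C14N 2.0 output: attributes sorted; strip_text=True drops
    # whitespace around text content (an option chosen for this sketch).
    canon = ET.canonicalize(xml_text, strip_text=True)
    root = ET.fromstring(canon)
    ET.indent(root, space="  ")  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(canonical_pretty('<r b="2" a="1"><c>hi</c><d/></r>'))
```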

Stefan
