[lxml-dev] lxml.sax.saxify breaks on comments; `make test` failure on MacPython 2.5.1

Stefan Behnel stefan_ml at behnel.de
Fri May 4 19:26:15 CEST 2007


Hi,

Erik Swanson wrote:
> There appears to be a bug with lxml.sax's handling of comments, as the
> following code causes lxml.sax.saxify to fail:
> 
> """
> import lxml.etree , lxml.sax, xml.sax.handler
> from cStringIO import StringIO
> 
> p = lxml.etree.HTMLParser(remove_blank_text=True)
> h = xml.sax.handler.ContentHandler()
> f = StringIO("<body><!-- foo --><p>bar</p></body>")
> t = lxml.etree.parse(f, p)
> lxml.sax.saxify(t, h)
> """

Ah, yes, thanks for the report. This is due to the way ElementTree handles
Element.tag for comments and processing instructions: instead of a string,
.tag returns the factory function that created the node (Comment or
ProcessingInstruction), and lxml.etree follows that for compatibility.
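
To make the .tag behaviour concrete, here is a small sketch using the standard
library's xml.etree.ElementTree (written for a current Python 3, not the 2.5
from the report). The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

para = ET.Element("p")
comment = ET.Comment(" foo ")
pi = ET.ProcessingInstruction("target", "data")

# For an ordinary element, .tag is the tag name as a string.
para_tag_is_str = isinstance(para.tag, str)

# For comments and processing instructions, .tag is the factory function.
comment_tag_is_factory = comment.tag is ET.Comment
pi_tag_is_factory = pi.tag is ET.ProcessingInstruction

print(para_tag_is_str, comment_tag_is_factory, pi_tag_is_factory)
```

Any code that assumes .tag is always a string will trip over such nodes.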

But the real problem is obviously in lxml.sax. It should handle comments
correctly. I'll fix it.
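
Just to sketch the idea (this is not the lxml.sax code, and not necessarily
what the fix will look like): a tree walker can check whether .tag is a string
before emitting element events. The example below uses the standard library's
ElementTree and a made-up RecordingHandler and emit_sax_events() helper,
ignores namespaces, and simply skips comments and PIs, which is only one
possible way of handling them.

```python
import xml.etree.ElementTree as ET
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class RecordingHandler(ContentHandler):
    """Hypothetical handler that records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def emit_sax_events(element, handler):
    """Walk the tree, emitting element events only for string tags."""
    if isinstance(element.tag, str):
        handler.startElement(element.tag, AttributesImpl(dict(element.attrib)))
        if element.text:
            handler.characters(element.text)
        for child in element:
            emit_sax_events(child, handler)
        handler.endElement(element.tag)
    # Tail text belongs to the parent, so report it even for skipped nodes.
    if element.tail:
        handler.characters(element.tail)


# Same shape as the document in the report: a comment followed by a <p>.
body = ET.Element("body")
body.append(ET.Comment(" foo "))
p = ET.SubElement(body, "p")
p.text = "bar"

handler = RecordingHandler()
emit_sax_events(body, handler)
print(handler.events)
```

With the guard in place the comment node produces no startElement call, so
the handler never sees a function where it expects a tag name.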


> Also, and I doubt this is related, but `make test` fails for me on OS X
> 10.4.9 with MacPython 2.5.1 (python.org <http://python.org> binary):
> 
> """
> python test.py -p -v 
> 
> TESTED VERSION:
>     Python:            (2, 5, 1, 'final', 0)
>     lxml.etree :        (1, 3, -1, 42667)
>     libxml used:       (2, 6, 28)
>     libxml compiled:   (2, 6, 28)
>     libxslt used:      (1, 1, 20)
>     libxslt compiled:  (1, 1, 20)
> 
>  733/733 (100.0%): Doctest: xpathxslt.txt                              
>                                         
> ======================================================================
> FAIL: test_module_HTML_unicode (
> lxml.tests.test_htmlparser.HtmlParserTestCaseBase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py",
> line 260, in run
>     testMethod()
>   File "/Users/erik/Projects/lxml/src/lxml/tests/test_htmlparser.py",
> line 33, in test_module_HTML_unicode
>     self.uhtml_str)
>   File
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py",
> line 334, in failUnlessEqual
>     (msg or '%r != %r' % (first, second))
> AssertionError: u'<html><head><title>test
> \xc3\x83\xc2\xa1\xef\xa3\x92</title></head><body><h1>page
> \xc3\x83\xc2\xa1\xef\xa3\x92 title</h1></body></html>' !=
> u'<html><head><title>test \xc3\xa1\uf8d2</title></head><body><h1>page
> \xc3\xa1\uf8d2 title</h1></body></html>'
> 
> ----------------------------------------------------------------------
> Ran 733 tests in 1.380s
> 
> FAILED (failures=1)
> """

Good to know. Not a big problem, but an annoying one, as it breaks the test
suite. I'll look into that, too.
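
One observation on the assertion output, as a description of the symptom only
and not of where it comes from: the non-ASCII part of the left-hand string is
exactly what you get when the UTF-8 bytes of the right-hand string are decoded
as Latin-1. A quick check (Python 3 syntax; the two fragments are copied from
the traceback, the variable names are mine):

```python
# Non-ASCII fragment from the right-hand (expected) side of the assertion.
expected_fragment = "\xc3\xa1\uf8d2"

# Corresponding fragment from the left-hand (actual) side.
actual_fragment = "\xc3\x83\xc2\xa1\xef\xa3\x92"

# Encode the expected text as UTF-8, then misread those bytes as Latin-1.
misdecoded = expected_fragment.encode("utf-8").decode("latin-1")

print(misdecoded == actual_fragment)
```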

Thanks for the reports,
Stefan



