[lxml-dev] Failing lxml.html tests
Stefan Behnel
stefan_ml at behnel.de
Tue Oct 30 22:26:43 CET 2007
Ian Bicking wrote:
> I made a new checkout, did python setup.py develop, and retested, and
> the errors seem even weirder now. Many are for method, but there's a
> bunch of others too (though still most pass).
>
> I attached the test output.
Hmm, there really must be something wrong with your setup. You have Cython
0.9.6.7 installed, I assume? I only get three errors, all in the HTML tests.
The first one occurs because one of the entries in _tag_link_attrs is a list,
which then gets passed to el.get() as the attribute name. I'm not sure about
the others.
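To illustrate the list-valued entry problem, here is a minimal sketch of a
lookup that accepts either a single attribute name or a list of names. It is
not the actual clean.py code: the table contents and the helper name
link_values are assumptions made up for illustration, and it uses the standard
library's ElementTree and Python 3 syntax so that it runs without lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the cleaner's tag -> link attribute table.
# Most entries are a single attribute name; one entry is a list.
TAG_LINK_ATTRS = {
    "script": "src",
    "embed": "src",
    "applet": ["code", "object"],  # assumed list-valued entry
}


def link_values(el, table=TAG_LINK_ATTRS):
    """Return the values of all link attributes that are set on el."""
    attrs = table.get(el.tag, ())
    # Normalise a single name to a one-item list, so that el.get()
    # is only ever called with a string, never with a list.
    if isinstance(attrs, str):
        attrs = [attrs]
    return [el.get(name) for name in attrs if el.get(name) is not None]


applet = ET.fromstring('<applet code="Foo.class" object="foo.ser"/>')
script = ET.fromstring('<script src="http://www.youtube.com/example.js"/>')

print(link_values(applet))
print(link_values(script))
```

Passing the table entry straight to el.get(), as line 371 in the traceback
below does, only works for the single-name entries.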
Anyway, you can run the HTML tests by calling "test.py -vv html", that should
get you over the failing tests for now. I'll see how far I get with a clean
checkout myself. Have you tried importing etree by hand and checking whether
the failing methods work there?
Stefan
======================================================================
ERROR: /home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/feedparser-data/entry_content_applet.data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_feedparser_data.py", line 60, in runTest
    transformed = Cleaner(**kw).clean_html(self.input)
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 445, in clean_html
    self(doc)
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 317, in __call__
    if self.allow_element(el):
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 371, in allow_element
    url = el.get(self._tag_link_attrs[el.tag])
  File "lxml.etree.pyx", line 965, in lxml.etree._Element.get
  File "apihelpers.pxi", line 248, in lxml.etree._getAttributeValue
  File "apihelpers.pxi", line 1024, in lxml.etree._getNsTag
  File "apihelpers.pxi", line 971, in lxml.etree._utf8
TypeError: Argument must be string or unicode.
======================================================================
FAIL: Doctest: test_clean.txt
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "doctest.py", line 2112, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for test_clean.txt
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 0
----------------------------------------------------------------------
File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 127, in test_clean.txt
Failed example:
print tostring(fromstring(doc_embed))
Expected:
<html>
<body>
<div>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"></embed>
<embed src="http://anothersite.com/v/another"></embed>
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</div>
</body>
</html>
Got:
<html>
<body>
<div>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash">
<embed src="http://anothersite.com/v/another">
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</embed>
</embed>
</div>
</body>
</html>
Diff:
<html>
<body>
<div>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash">
-<embed src="http://anothersite.com/v/another">
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</embed>
</embed>
+<embed src="http://anothersite.com/v/another"></embed>
+<script src="http://www.youtube.com/example.js"></script>
+<script src="/something-else.js"></script>
</div>
</body>
</html>
----------------------------------------------------------------------
File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 141, in test_clean.txt
Failed example:
print Cleaner(host_whitelist=['www.youtube.com'],
whitelist_tags=None).clean_html(doc_embed)
Expected:
<html>
<body>
<div>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"></embed>
<script src="http://www.youtube.com/example.js"></script>
</div>
</body>
</html>
Got:
<html>
<body>
<div>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash">
<script src="http://www.youtube.com/example.js"></script>
</embed>
</div>
</body>
</html>
Diff:
<html>
<body>
<div>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash">
-<script src="http://www.youtube.com/example.js"></script>
</embed>
+<script src="http://www.youtube.com/example.js"></script>
</div>
</body>
</html>