[lxml-dev] Failing lxml.html tests

Stefan Behnel stefan_ml at behnel.de
Tue Oct 30 22:26:43 CET 2007


Ian Bicking wrote:
> I made a new checkout, did python setup.py develop, and retested, and
> the errors seem even weirder now.  Many are for method, but there's a
> bunch of others too (though still most pass).
> 
> I attached the test output.

Hmm, there really must be something wrong with your setup. You have Cython
0.9.6.7 installed, I assume? I only get three errors, all in the HTML tests.
The first one occurs because one of the entries in _tag_link_attrs is a list
rather than a string; I'm not sure about the others.
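
To illustrate the first error: el.get() expects a single attribute name, so
passing it a list entry from the mapping fails with the TypeError shown in the
traceback. Below is a minimal sketch of one way to tolerate both kinds of
entries. It is not the actual clean.py code: TAG_LINK_ATTRS and link_values()
are hypothetical names, the mapping contents are assumed for illustration, and
the stdlib ElementTree stands in for lxml.etree so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for _tag_link_attrs. The contents are an assumption:
# each value is either one attribute name or a list of attribute names.
TAG_LINK_ATTRS = {
    "embed": "src",
    "applet": ["code", "object"],
}


def link_values(el):
    """Return the values of all link attributes that are set on el."""
    names = TAG_LINK_ATTRS.get(el.tag, [])
    if isinstance(names, str):
        # Normalise a single attribute name to a one-element list.
        names = [names]
    values = []
    for name in names:
        value = el.get(name)  # always called with a string, never a list
        if value is not None:
            values.append(value)
    return values


applet = ET.fromstring('<applet code="Example.class"></applet>')
embed = ET.fromstring('<embed src="http://www.youtube.com/v/183tVH1CZpA"></embed>')
div = ET.fromstring('<div></div>')
```

Calling link_values(applet) then looks up "code" and "object" one at a time
instead of handing the whole list to el.get().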

Anyway, you can run just the HTML tests by calling "test.py -vv html"; that
should get you past the failing tests for now. I'll see how far I get with a
clean checkout myself. Have you tried importing etree by hand and checking
whether the failing methods work there?

Stefan

======================================================================
ERROR: /home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/feedparser-data/entry_content_applet.data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_feedparser_data.py", line 60, in runTest
    transformed = Cleaner(**kw).clean_html(self.input)
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 445, in clean_html
    self(doc)
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 317, in __call__
    if self.allow_element(el):
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 371, in allow_element
    url = el.get(self._tag_link_attrs[el.tag])
  File "lxml.etree.pyx", line 965, in lxml.etree._Element.get
  File "apihelpers.pxi", line 248, in lxml.etree._getAttributeValue
  File "apihelpers.pxi", line 1024, in lxml.etree._getNsTag
  File "apihelpers.pxi", line 971, in lxml.etree._utf8
TypeError: Argument must be string or unicode.

======================================================================
FAIL: Doctest: test_clean.txt
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "doctest.py", line 2112, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for test_clean.txt
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 0

----------------------------------------------------------------------
File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 127, in test_clean.txt
Failed example:
    print tostring(fromstring(doc_embed))
Expected:
  <html>
    <body>
      <div>
        <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"></embed>
        <embed src="http://anothersite.com/v/another"></embed>
        <script src="http://www.youtube.com/example.js"></script>
        <script src="/something-else.js"></script>
      </div>
    </body>
  </html>

Got:
  <html>
    <body>
      <div>
        <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
          <embed src="http://anothersite.com/v/another">
            <script src="http://www.youtube.com/example.js"></script>
            <script src="/something-else.js"></script>
          </embed>
        </embed>
      </div>
    </body>
  </html>

Diff:
  <html>
    <body>
      <div>
        <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
          -<embed src="http://anothersite.com/v/another">
            <script src="http://www.youtube.com/example.js"></script>
            <script src="/something-else.js"></script>
          </embed>
        </embed>
        +<embed src="http://anothersite.com/v/another"></embed>
        +<script src="http://www.youtube.com/example.js"></script>
        +<script src="/something-else.js"></script>
      </div>
    </body>
  </html>
----------------------------------------------------------------------
File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 141, in test_clean.txt
Failed example:
    print Cleaner(host_whitelist=['www.youtube.com'], whitelist_tags=None).clean_html(doc_embed)
Expected:
  <html>
    <body>
      <div>
        <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"></embed>
        <script src="http://www.youtube.com/example.js"></script>
      </div>
    </body>
  </html>

Got:
  <html>
    <body>
      <div>
        <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
          <script src="http://www.youtube.com/example.js"></script>
        </embed>
      </div>
    </body>
  </html>

Diff:
  <html>
    <body>
      <div>
        <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
          -<script src="http://www.youtube.com/example.js"></script>
        </embed>
        +<script src="http://www.youtube.com/example.js"></script>
      </div>
    </body>
  </html>

