[lxml-dev] Weird bug

David Turner novalis at openplans.org
Thu Apr 12 23:20:06 CEST 2007


I'm writing some code that uses lxml, and I'm running into a weird
memory error.

Unfortunately, I can't seem to create a small test case, so this bug
report probably won't be very useful.

How to reproduce:

Check out the following code:

http://codespeak.net/svn/z3/deliverance/branches/parallel

python setup.py develop

python deliverance/test_wsgi.py

This will sometimes run just fine (that is, produce no output).
Sometimes it will give the following error, which doesn't really seem
to matter, since it's "most likely raised during interpreter shutdown":

Exception in thread Thread-70 (most likely raised during interpreter
shutdown):
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
  File
"/home/novalis/deliverance/src/deliverance/transcluder/threadpool.py",
line 91, in run
  File
"/home/novalis/deliverance/src/deliverance/transcluder/tasklist.py",
line 87, in get
  File "/usr/lib64/python2.4/threading.py", line 197, in wait
exceptions.TypeError: 'NoneType' object is not callable
Unhandled exception in thread started by
Error in sys.excepthook:

Original exception was:
[nothing is printed here]
------------
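For context, a traceback like this typically appears when daemon worker
threads are still blocked in a wait while the interpreter tears down
module globals. A minimal sketch of one way to avoid it is below: each
worker is sent a stop sentinel and joined before exit. TinyPool, _STOP
and shutdown() are hypothetical names for illustration; this is not the
code in transcluder/threadpool.py, and it assumes the workers pull their
tasks from a queue.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that tells a worker to exit


class TinyPool:
    """Illustrative stand-in for a thread pool (not Deliverance's)."""

    def __init__(self, size):
        self.tasks = queue.Queue()
        self.results = []
        self._lock = threading.Lock()
        self.workers = [threading.Thread(target=self._run)
                        for _ in range(size)]
        for worker in self.workers:
            worker.daemon = True  # daemon threads are not joined at exit
            worker.start()

    def _run(self):
        while True:
            task = self.tasks.get()  # blocks until a task or _STOP arrives
            if task is _STOP:
                return
            value = task()
            with self._lock:
                self.results.append(value)

    def submit(self, func):
        self.tasks.put(func)

    def shutdown(self):
        # One sentinel per worker, queued after all pending tasks,
        # then wait for every worker to finish before returning.
        for _ in self.workers:
            self.tasks.put(_STOP)
        for worker in self.workers:
            worker.join()


pool = TinyPool(4)
for n in range(10):
    pool.submit(lambda n=n: n * n)
pool.shutdown()  # no worker is left waiting when the interpreter exits
```

Calling shutdown() at the end of the test run would mean no thread is
still inside a wait when the interpreter exits. It would not address the
memory errors themselves.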
And sometimes, there's an error in the actual test:
---------
Traceback (most recent call last):
  File "deliverance/test_wsgi.py", line 361, in ?
    x[0](*x[1:])
  File "deliverance/test_wsgi.py", line 156, in do_aggregate
    html_string_compare(res.body, res2.body)
  File "deliverance/test_wsgi.py", line 61, in html_string_compare
    raise ValueError(
ValueError: Comparison failed between actual:
==================
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>I am a title</title>
<link rel="rss" href="/rss.xml">
<link rel="rss" href="/rss.xml">
</head>
<body>
<div id="navbar"><div id="nav">Additional Nav Info</div></div>
  Some text
  <div id="content">
<p>Paragraph one</p>
<p>Paragraph two</p>
<div id="external_content">

    external body text
    <br><br>
</div>
</div>
</body>
</html>


expected:
==================
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>I am a title</title>
    <link rel="rss" href="/rss.xml">
  </head>
  <body>
    <div id="navbar"><div id="nav">Additional Nav Info</div></div>
    Some text
    <div id="content">
      <p>Paragraph one</p>
      <p>Paragraph two</p>
      <div id="external_content">
        external body text
        <br><br>
      </div>
    </div>
  </body>
</html>


Report:
children length differs, 4 != 3
children 1 do not match: head
------------
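The "4 != 3" in the report matches the duplicated <link rel="rss">
element in the actual output's <head>. As a rough illustration, the
sketch below counts the direct children of <head> in trimmed copies of
the two documents using only the standard library. ChildCounter and
head_children are hypothetical helpers, not the html_string_compare
function from test_wsgi.py, and the VOID set only covers the tags that
appear in these snippets.

```python
from html.parser import HTMLParser

# Elements that have no closing tag in the snippets above (assumption:
# only these three matter here).
VOID = {"meta", "link", "br"}


class ChildCounter(HTMLParser):
    """Collect the tag names of the direct children of <head>."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.head_children = []

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] == "head":
            self.head_children.append(tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def head_children(html):
    parser = ChildCounter()
    parser.feed(html)
    parser.close()
    return parser.head_children


actual = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>I am a title</title>
<link rel="rss" href="/rss.xml">
<link rel="rss" href="/rss.xml">
</head><body></body></html>"""

expected = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>I am a title</title>
<link rel="rss" href="/rss.xml">
</head><body></body></html>"""

a, e = head_children(actual), head_children(expected)
if len(a) != len(e):
    print("children length differs, %d != %d" % (len(a), len(e)))
```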


Running valgrind shows a couple of memory errors.  The first is in
xmlFreeNode, when it attempts to get the dict from a doc that has been
freed.  The node in question is created at line 327 of tasklist.py in
transcluder -- but the error comes later, during garbage collection.

If anyone has any ideas, I'm all ears.


