[lxml-dev] Installing lxml 2.0beta1 via easy_install requires Cython; also, question about lxml.html.clean.clean_html

Jon Rosebaugh chairos at gmail.com
Sat Jan 12 05:14:32 CET 2008


I attempted to install lxml 2.0beta1 via easy_install (easy_install
lxml==2.0beta1), and the install failed. After some experimentation,
I discovered that the generated C files that are supposed to be included
in the download were missing. Once I installed a patched version of
Cython 0.9.6.10b (patched following the directions I found on this
list), lxml installed successfully. But I was very surprised that a
release download required Cython at all.
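One way to confirm this kind of problem before reaching for Cython is to look inside the downloaded source archive for the generated C files. The sketch below uses only Python's standard tarfile module. It assumes, without confirmation from lxml's packaging, that each generated .c file sits next to its .pyx source with the same base name. The helpers missing_c_sources and build_tarball are hypothetical, and the demo archive is built in memory from made-up member names.

```python
import io
import tarfile
from pathlib import PurePosixPath


def missing_c_sources(tar):
    """List .pyx members of an open TarFile that have no sibling .c file.

    Assumption (hypothetical): a generated C file shares its .pyx source's
    directory and base name, e.g. src/lxml/etree.pyx -> src/lxml/etree.c.
    """
    # Normalise names so "./pkg/a.c" and "pkg/a.c" compare equal.
    members = {PurePosixPath(name) for name in tar.getnames()}
    missing = []
    for name in sorted(tar.getnames()):
        path = PurePosixPath(name)
        if path.suffix == ".pyx" and path.with_suffix(".c") not in members:
            missing.append(name)
    return missing


def build_tarball(names):
    """Build an in-memory, uncompressed tarball of empty files (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)  # size defaults to 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


# Hypothetical member names, not the real contents of the lxml download.
demo = build_tarball([
    "lxml-2.0beta1/setup.py",
    "lxml-2.0beta1/src/lxml/etree.pyx",
    "lxml-2.0beta1/src/lxml/objectify.pyx",
    "lxml-2.0beta1/src/lxml/objectify.c",
])
print(missing_c_sources(demo))  # -> ['lxml-2.0beta1/src/lxml/etree.pyx']
```

To check a real download, open it with tarfile.open(path), which detects gzip compression automatically, and pass the result to missing_c_sources. A non-empty list suggests the archive can only be built with Cython available.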

Also, I'm not certain, but I think the lxml.html.clean.clean_html()
function may not be working properly. I followed the example at
http://codespeak.net/lxml/dev/lxmlhtml.html#cleaning-up-html but got
different output. I expected this:
<html>
  <body>
    <div>
      <style>/* deleted */</style>
      <a href="">a link</a>
      <a href="#">another link</a>
      <p>a paragraph</p>
      <div>secret EVIL!</div>
      of EVIL!
      Password:
      annoying EVIL!
      <a href="evil-site">spam spam SPAM!</a>
      <img src="evil!">
    </div>
  </body>
</html>

But got this:
<div><style>/* deleted */</style><body>

   <a href="">a link</a>
   <a href="#">another link</a>
   <p>a paragraph</p>
   <div>secret EVIL!</div>
    of EVIL!


     Password:
   annoying EVIL!<a href="evil-site">spam spam SPAM!</a>
   <img src="evil!"></body></div>
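The two listings differ both in whitespace and in nesting, which makes them hard to compare by eye. Below is a minimal sketch that uses Python's standard html.parser (not lxml). It drops whitespace-only text and lists the start tags in order. The helpers tag_sequence and start_tags are hypothetical, and EXPECTED and GOT are trimmed excerpts of the two outputs quoted above.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collect (kind, value) events: tags plus whitespace-normalised text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Collapse runs of whitespace; drop text that was only whitespace.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def tag_sequence(markup):
    parser = TagSequence()
    parser.feed(markup)
    parser.close()
    return parser.events


def start_tags(markup):
    return [value for kind, value in tag_sequence(markup) if kind == "start"]


# Trimmed excerpts of the expected and actual clean_html() output.
EXPECTED = """<html><body><div><style>/* deleted */</style>
  <a href="">a link</a></div></body></html>"""
GOT = """<div><style>/* deleted */</style><body>
   <a href="">a link</a></body></div>"""

print(start_tags(EXPECTED))  # -> ['html', 'body', 'div', 'style', 'a']
print(start_tags(GOT))       # -> ['div', 'style', 'body', 'a']
```

Once whitespace is ignored, the real difference is structural. The expected output nests everything as html, body, div. The actual output has no html element at all, and it opens div and style before body.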

