[lxml-dev] Installing lxml 2.0beta1 via easy_install requires Cython; also, question about lxml.html.clean.clean_html

Stefan Behnel stefan_ml at behnel.de
Sat Jan 12 09:46:36 CET 2008


Hi,

Jon Rosebaugh wrote:
> I attempted to install lxml 2.0beta1 via easy_install (easy_install
> lxml==2.0beta1), and it didn't work. After a bunch of experimentation,
> I discovered that the C files that are supposed to be present in the
> download were not present. After installing a patched version of
> Cython 0.9.6.10b (patched according to the directions I found on this
> list) lxml successfully installed.

Hmm, it shouldn't be that hard. The tgz I downloaded has the .c files, so
installing without Cython should work just fine. I just removed my local
Cython install and did an "easy_install lxml" (which downloaded, built and
installed 2.0beta1) and also an "easy_install lxml-2.0beta1.tar.gz". Both
worked just fine.

Maybe you had an older version of Cython installed? If one is found, it will
be used instead of the shipped .c files, and the build will obviously fail.
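To see whether that applies, here is a minimal sketch that checks if any Cython is importable and which version it is. It uses only the standard library (importlib.util.find_spec and importlib.metadata.version). The function name installed_cython_version is hypothetical, and the sketch assumes the build picks up whatever Cython is importable, as described above.

```python
import importlib.util
from importlib import metadata


def installed_cython_version():
    """Return the version of an importable Cython, or None if there is none.

    Hypothetical helper: a build that prefers Cython whenever it can import
    it would use whatever this finds, including an outdated release.
    """
    if importlib.util.find_spec("Cython") is None:
        return None
    try:
        return metadata.version("Cython")
    except metadata.PackageNotFoundError:
        # Importable, but without distribution metadata
        # (e.g. a source checkout placed directly on sys.path).
        return "unknown"


if __name__ == "__main__":
    version = installed_cython_version()
    if version is None:
        print("No Cython found; the shipped .c files would be compiled directly.")
    else:
        print(f"Cython {version} is importable and may be used by the build.")
```

If this reports an old Cython, removing it (or upgrading it) before running easy_install again should let the shipped .c files be used.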


> Also, I'm not sure, but I think the lxml.html.clean.clean_html()
> function might not be working properly? I followed the example at
> http://codespeak.net/lxml/dev/lxmlhtml.html#cleaning-up-html but got
> different results. I expected this:
> <html>
>   <body>
>     <div>
>       <style>/* deleted */</style>
>       <a href="">a link</a>
>       <a href="#">another link</a>
>       <p>a paragraph</p>
>       <div>secret EVIL!</div>
>       of EVIL!
>       Password:
>       annoying EVIL!
>       <a href="evil-site">spam spam SPAM!</a>
>       <img src="evil!">
>     </div>
>   </body>
> </html>
> 
> But got this:
> <div><style>/* deleted */</style><body>
> 
>    <a href="">a link</a>
>    <a href="#">another link</a>
>    <p>a paragraph</p>
>    <div>secret EVIL!</div>
>     of EVIL!
> 
> 
>      Password:
>    annoying EVIL!<a href="evil-site">spam spam SPAM!</a>
>    <img src="evil!"></body></div>

That one should work, too. I just ran lxmlhtml.txt as a doctest (which,
admittedly, wasn't included in the test suite before) and it just worked. The
same goes for test_clean.txt.
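For a quicker check than the full doctest, a minimal sketch that passes a small fragment through lxml.html.clean.clean_html and returns the cleaned string. The wrapper name clean_fragment and the sample markup are hypothetical; the ImportError guard is there because recent lxml releases ship the cleaner as a separate lxml_html_clean package, so the import may fail on a newer install.

```python
def clean_fragment(html):
    """Run lxml's clean_html on an HTML string.

    Hypothetical wrapper: returns the cleaned markup as a string, or None
    if lxml (or its HTML cleaner) cannot be imported.
    """
    try:
        from lxml.html.clean import clean_html
    except ImportError:
        # Either lxml is missing, or this lxml release expects the
        # separate lxml_html_clean package to be installed.
        return None
    # Given a string, clean_html parses it, strips scripts and other
    # unsafe content using the default Cleaner settings, and returns a string.
    return clean_html(html)


if __name__ == "__main__":
    sample = '<div><script>alert("evil")</script><p>a paragraph</p></div>'
    print(clean_fragment(sample))
```

The exact serialisation (whitespace, wrapping elements) can differ between libxml2 versions, which is why comparing against the documented output alone may be misleading.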

What's the version of libxml2 you are using? Can you try running the test
suite and see if that works for you?
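A minimal sketch for collecting that information: it reads the version tuples that lxml.etree exposes (LXML_VERSION, LIBXML_VERSION, LIBXML_COMPILED_VERSION) and, if present, runs a doc file through doctest.testfile. The helper name lxml_versions and the path doc/lxmlhtml.txt are assumptions; adjust the path to wherever lxmlhtml.txt lives in your source checkout.

```python
import doctest
import os


def lxml_versions():
    """Return lxml/libxml2 version tuples, or None if lxml is not installed.

    Hypothetical helper: a mismatch between the runtime and compiled-against
    libxml2 versions is worth mentioning in a bug report.
    """
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
    }


if __name__ == "__main__":
    print(lxml_versions())
    # Assumed location of the doc file; change it to match your checkout.
    path = "doc/lxmlhtml.txt"
    if os.path.exists(path):
        failed, attempted = doctest.testfile(path, module_relative=False)
        print(f"{attempted} examples run, {failed} failures")
    else:
        print(f"{path} not found; point 'path' at lxmlhtml.txt")
```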

Stefan


