[lxml-dev] lxml.html, now with ignored namespaces!

Stefan Behnel stefan_ml at behnel.de
Sat Jul 4 12:03:01 CEST 2009


Hi,

Geoffrey Sneddon wrote:
>>> The output:
>>> -----
>>> <html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
>>> cs="http://something.com/cs" xml:lang="en"
>>> lang="en"><head><title>Help!</title></head><body><p>My namespaces are
>>> going to disappear!</p><p content="fruit">FRUIT</p></body></html>
>>> -----
>
> My basic advice to the OP would be to use html5lib, which is far slower,
> but does cope with this fine.

Well, as I said, whether the namespace declarations survive just depends on the version of libxml2 that you are using.

>>> from lxml import etree
>>> print "lxml.etree:       ", etree.LXML_VERSION
lxml.etree:        (2, 2, 2, 0)
>>> print "libxml used:      ", etree.LIBXML_VERSION
libxml used:       (2, 6, 32)

>>> from lxml.html import fromstring

>>> document = fromstring("""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
... "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html
... xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
... xmlns:cs="http://something.com/cs" xml:lang="en"
... lang="en"><head><title>Help!</title></head><body><p>My namespaces are
... going to disappear!</p><p cs:content='fruit'>FRUIT</p></body></html>
... """)

>>> print etree.tostring(document)
<html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
xmlns:cs="http://something.com/cs" xml:lang="en"
lang="en"><head><title>Help!</title></head><body><p>My namespaces are
going to disappear!</p><p cs:content="fruit">FRUIT</p></body></html>
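
To check which behaviour your own libxml2 gives you, you could compare the namespace declarations in the serialised output against those in the input. Below is a minimal sketch for Python 3 that uses only the standard library's html.parser, so it works without lxml. The helper names namespace_declarations() and lost_declarations() are hypothetical and not part of lxml. The "broken" string is the output quoted earlier in this thread, where "xmlns:cs" came back as a plain "cs" attribute.

```python
from html.parser import HTMLParser


class _NamespaceCollector(HTMLParser):
    """Collect namespace declaration attributes (xmlns, xmlns:prefix)."""

    def __init__(self):
        super().__init__()
        self.declarations = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # HTMLParser lower-cases attribute names; keep only the
            # default declaration "xmlns" and prefixed "xmlns:..." ones.
            if name == "xmlns" or name.startswith("xmlns:"):
                self.declarations[name] = value


def namespace_declarations(markup):
    """Return {attribute name: namespace URI} for all xmlns declarations."""
    collector = _NamespaceCollector()
    collector.feed(markup)
    collector.close()
    return collector.declarations


def lost_declarations(before, after):
    """Declarations present in `before` that are missing or changed in `after`."""
    kept = namespace_declarations(after)
    return {name: uri for name, uri in namespace_declarations(before).items()
            if kept.get(name) != uri}


source = """<html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
xmlns:cs="http://something.com/cs" xml:lang="en"
lang="en"><head><title>Help!</title></head><body><p>My namespaces are
going to disappear!</p><p cs:content='fruit'>FRUIT</p></body></html>"""

# Output as reported earlier in the thread.
broken = """<html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
cs="http://something.com/cs" xml:lang="en"
lang="en"><head><title>Help!</title></head><body><p>My namespaces are
going to disappear!</p><p content="fruit">FRUIT</p></body></html>"""

print(lost_declarations(source, broken))
# {'xmlns:cs': 'http://something.com/cs'}
```

Feeding it the output of etree.tostring() from your own session instead of the "broken" string should give an empty dict when the declarations survive, as they do above.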


Stefan

