[lxml-dev] lxml.html, now with ignored namespaces!
Stefan Behnel
stefan_ml at behnel.de
Sat Jul 4 12:03:01 CEST 2009
Hi,
Geoffrey Sneddon wrote:
>>> The output:
>>> -----
>>> <html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
>>> cs="http://something.com/cs" xml:lang="en"
>>> lang="en"><head><title>Help!</title></head><body><p>My namespaces are
>>> going to disappear!</p><p content="fruit">FRUIT</p></body></html>
>>> -----
>
> My basic advice to the OP would be to use html5lib, which is far slower,
> but does cope with this fine.
Well, as I said, it just depends on the version of libxml2 that you are using.
With the versions below, the namespaces survive:
>>> from lxml import etree
>>> print "lxml.etree: ", etree.LXML_VERSION
lxml.etree: (2, 2, 2, 0)
>>> print "libxml used: ", etree.LIBXML_VERSION
libxml used: (2, 6, 32)
>>> from lxml.html import fromstring
>>> document = fromstring("""<!DOCTYPE html PUBLIC "-//W3C//DTD
... XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html
... xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
... xmlns:cs="http://something.com/cs" xml:lang="en"
... lang="en"><head><title>Help!</title></head><body><p>My namespaces are
... going to disappear!</p><p cs:content='fruit'>FRUIT</p></body></html>
... """)
>>> print etree.tostring(document)
<html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml"
xmlns:cs="http://something.com/cs" xml:lang="en"
lang="en"><head><title>Help!</title></head><body><p>My namespaces are
going to disappear!</p><p cs:content="fruit">FRUIT</p></body></html>
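
To see which behaviour a given installation shows, the serialized output can be compared against the input. Below is a minimal sketch using only the standard library. The helper name `lost_prefixes` and the regular expression are made up for illustration and are not part of lxml, and the check is purely textual rather than a real XML parse:

```python
import re

# Matches namespace declarations such as xmlns:cs="..." and captures the prefix.
# Rough textual check only (assumption: good enough for simple start tags).
_XMLNS_DECL = re.compile(r'\bxmlns:([A-Za-z_][\w.-]*)\s*=')


def lost_prefixes(source, serialized):
    """Return prefixes declared in `source` but no longer declared in `serialized`."""
    declared = set(_XMLNS_DECL.findall(source))
    kept = set(_XMLNS_DECL.findall(serialized))
    return sorted(declared - kept)


# Start tags taken from the messages in this thread.
SOURCE = ('<html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml" '
          'xmlns:cs="http://something.com/cs" xml:lang="en" lang="en">')
STRIPPED = ('<html xmlns="http://www.w3.org/TR/1999/REC-html-in-xml" '
            'cs="http://something.com/cs" xml:lang="en" lang="en">')

print(lost_prefixes(SOURCE, STRIPPED))  # ['cs']
print(lost_prefixes(SOURCE, SOURCE))    # []
```

In a real check, `serialized` would be the string returned by `etree.tostring(document)`, decoded to text if necessary.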
Stefan