[lxml-dev] UTF-8 not supported

Alexander Shigin shigin at rambler-co.ru
Tue May 26 11:04:25 CEST 2009


On Mon, 25/05/2009 at 14:41 -0500, Ovnicraft wrote:

> http://lxml.pastebin.com/m5d4a419: in this paste, the highlighted line doesn't
> appear in the output when I use utf-8

Oh, I get it. The XML specification says:
"""
In the absence of information provided by an external transport protocol
(e.g. HTTP or MIME), it is a fatal error for an entity including an
encoding declaration to be presented to the XML processor in an encoding
other than that named in the declaration, or for an entity which begins
with neither a Byte Order Mark nor an encoding declaration to use an
encoding other than UTF-8.
"""
i.e. UTF-8 is the default encoding for XML, so an XML library may omit
the encoding declaration when the encoding is UTF-8.
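You can check this default directly. Here is a minimal sketch, assuming lxml is installed. If it isn't, the sketch falls back to the standard library's xml.etree.ElementTree, which applies the same default rule for the declaration. The `serialize` helper is hypothetical and exists only for illustration. With `xml_declaration` left at its default, only the non-UTF-8 output gets a declaration:

```python
# Sketch: when does tostring() add an XML declaration by default?
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib ElementTree, whose tostring()
    # uses the same default rule (declaration only if not UTF-8/ASCII).
    import xml.etree.ElementTree as etree


def serialize(root, encoding):
    """Hypothetical helper: serialize with the default xml_declaration."""
    return etree.tostring(root, encoding=encoding)


root = etree.Element("openerp")

utf8_bytes = serialize(root, "utf-8")
latin1_bytes = serialize(root, "iso-8859-1")

# UTF-8 is the XML default, so no declaration is written.
print(utf8_bytes.startswith(b"<?xml"))    # False
# Any other encoding has to be declared, so a declaration is written.
print(latin1_bytes.startswith(b"<?xml"))  # True
```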

The tostring() function has an xml_declaration option. Setting it to
True or False forces lxml to write or omit the XML declaration,
including its encoding:

In [37]: print etree.tostring(openerp, pretty_print=True, encoding='utf-8', xml_declaration=True)
<?xml version='1.0' encoding='utf-8'?>
<openerp>
  <data noupdate="1">
    <record name="" id="">
      <field name="">field</field>
    </record>
  </data>
</openerp>
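The session above uses Python 2's `print` statement. Below is a rough Python 3 equivalent, assuming lxml is installed, with the same stdlib ElementTree fallback as an assumption. In Python 3, `tostring()` returns bytes when an encoding is given, so the result is decoded before printing. `pretty_print` is lxml-only, so this sketch leaves it out:

```python
# Sketch: Python 3 version of the session above.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumption: stdlib fallback

# Rebuild the tree shown in the example output.
openerp = etree.Element("openerp")
data = etree.SubElement(openerp, "data", noupdate="1")
record = etree.SubElement(data, "record", name="", id="")
field = etree.SubElement(record, "field", name="")
field.text = "field"

# Force the declaration even though UTF-8 does not require one.
with_decl = etree.tostring(openerp, encoding="utf-8", xml_declaration=True)
# Explicitly suppress it.
without_decl = etree.tostring(openerp, encoding="utf-8", xml_declaration=False)

# With an encoding, tostring() returns bytes; decode them to print text.
print(with_decl.decode("utf-8"))
print(without_decl.decode("utf-8"))
```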
