[lxml-dev] html encoding
Dirk Rothe
d.rothe at semantics.de
Thu Dec 4 12:57:26 CET 2008
On Thu, 04 Dec 2008 12:46:34 +0100, Daniel Jirku <nepi at gmx.ch> wrote:
> hi...
>
> My problem is, I suppose, well known, but I couldn't find any solution
> through my searches...
>
> I have a regular HTML link with a ? and an &. When I print the variable in
> Python, it looks fine... (like:
> http://www.somelink.com/site.html?param1=test&param2=hello), BUT when I
> add it to my root XML element with:
> adId1 = etree.SubElement(tagAd, "originalAdUrl")
> adId1.text = adUrl
>
> and then later write the XML to a file with this:
> toStringValue = etree.tostring(xmlTagRoot, encoding="utf-8",
> method="xml", xml_declaration=True, pretty_print=True)
> ...
>
> the tag has as its value the link with an &amp; instead of & !!
> How can I use the correct signs for persistent storage in an XML file...?
The XML processor has correctly escaped your "&" character as "&amp;". If
you deserialise (aka load) the file with an XML parser of your choice, it
will restore your "&" character.
See:
http://en.wikipedia.org/wiki/Character_encodings_in_HTML#XML_character_entity_references
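
A minimal round-trip sketch of the point above, in case it helps. The parent
element name "ad" and the snake_case variable names are assumptions made for
illustration, not taken from the original code. The stdlib fallback import is
only there so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: stdlib ElementTree offers the same calls used below.
    import xml.etree.ElementTree as etree

ad_url = "http://www.somelink.com/site.html?param1=test&param2=hello"

# Build a tiny tree; "ad" is a hypothetical parent element.
tag_ad = etree.Element("ad")
ad_id1 = etree.SubElement(tag_ad, "originalAdUrl")
ad_id1.text = ad_url

# Serialising escapes the "&" as "&amp;" in the output bytes.
serialized = etree.tostring(tag_ad, encoding="utf-8")

# Parsing the serialised bytes again undoes the escaping.
reloaded = etree.fromstring(serialized)
restored_url = reloaded.find("originalAdUrl").text

print(serialized.decode("utf-8"))
print(restored_url == ad_url)
```

So the "&amp;" only exists in the file on disk; any code that reads the file
through an XML parser gets the plain "&" back.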
--dirk