[lxml-dev] html encoding

Dirk Rothe d.rothe at semantics.de
Thu Dec 4 12:57:26 CET 2008


On Thu, 04 Dec 2008 12:46:34 +0100, Daniel Jirku <nepi at gmx.ch> wrote:

> hi...
>
> My problem is, I suppose, well known, but I couldn't find any solution  
> through my searches...
>
> I have a regular HTML link with a ? and an &. When I print the variable in  
> Python, it looks fine (like:  
> http://www.somelink.com/site.html?param1=test&param2=hello), BUT when I  
> add it to my root XML element with:
> adId1 = etree.SubElement(tagAd, "originalAdUrl")
> adId1.text = adUrl
>
> and then later serialise the XML for writing to a file with this:
> toStringValue = etree.tostring(xmlTagRoot, encoding="utf-8",  
> method="xml", xml_declaration=True, pretty_print=True)
> ...
>
> the tag's value is the link with an &amp; instead of & !!
> How can I use the correct characters for persistent storage in an XML file...?

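The XML serialiser has correctly escaped your "&" character. A literal "&"  
is reserved in XML as the start of an entity reference, so in element text  
it has to be written as "&amp;". If you deserialise (aka load) the file  
with an XML parser of your choice, it will restore your "&" character.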
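
A minimal sketch of that round trip, in case it helps. The root element  
name "ad" is made up for illustration, and the fallback to the standard  
library's ElementTree is only my assumption so that the snippet also runs  
where lxml is not installed; the calls used here exist in both.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib ElementTree API, which offers the
    # same Element / SubElement / tostring / fromstring calls used below.
    import xml.etree.ElementTree as etree

ad_url = "http://www.somelink.com/site.html?param1=test&param2=hello"

# Build a tiny tree; "ad" is a hypothetical root element name.
tag_ad = etree.Element("ad")
ad_id1 = etree.SubElement(tag_ad, "originalAdUrl")
ad_id1.text = ad_url  # assign the raw string, no manual escaping

# Serialising escapes the "&" as "&amp;" in the output bytes.
serialized = etree.tostring(tag_ad, encoding="utf-8")
print(serialized)

# Parsing the serialised bytes gives back the original string.
restored = etree.fromstring(serialized).find("originalAdUrl").text
print(restored == ad_url)
```

So the "&amp;" only exists in the file on disk; any code that reads the  
file through an XML parser sees the plain "&" again.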

see  
http://en.wikipedia.org/wiki/Character_encodings_in_HTML#XML_character_entity_references

--dirk

