[lxml-dev] [Question #61584]: Is it possible to make lxml use hex instead of decimal for unicode entities?

Stefan Behnel stefan_ml at behnel.de
Thu Feb 19 20:50:50 CET 2009


usernamenumber wrote:
> Thanks very much for the assistance, Stefan. You are a great help! As it
> turns out, write_c14n() actually uses yet another method of rendering
> entities, (\xc2\xa9 as opposed to &#169;)

Ah, right. Sure, it serialises to UTF-8, which doesn't require Unicode
characters to be escaped at all. What you see is just how the Python prompt
displays the resulting byte string on output.
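Here is a minimal sketch of what happens, assuming a current lxml on Python 3 (the one-element document `<doc>©</doc>` is a made-up example). The C14N output is a UTF-8 byte string, and the \xc2\xa9 in its repr is simply the two-byte UTF-8 encoding of U+00A9 COPYRIGHT SIGN, not an entity of any kind.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample document containing U+00A9 COPYRIGHT SIGN
root = etree.fromstring('<doc>\u00a9</doc>')

# write_c14n() writes canonical XML, which is always UTF-8 encoded
buf = BytesIO()
etree.ElementTree(root).write_c14n(buf)
data = buf.getvalue()

print(repr(data))            # b'<doc>\xc2\xa9</doc>' (raw UTF-8 bytes)
print(data.decode('utf-8'))  # decoding gives back the character itself
```

Decoding the bytes shows the character again, so nothing was lost or escaped; only the display differs.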


> so it looks like I may have
> to just suck it up and deal with the output of my port being slightly
> different from that of the original tool (I don't think it's worth the
> extra processing to translate all the entities after the fact). But
> being able to write out C14N (which I hadn't known about before now)
> might at least help me avoid this problem in the future.

Wise choice.


> I do have one other question coming from this: I can find functions for
> writing out c14n content for ElementTree objects, but nothing for
> rendering an Element (the result of etree.fromstring(), for example) in
> this way. Am I missing something, or if I am working with a string do I
> just need to load it into a StringIO and run etree.parse() on it?

You can get an ElementTree either by calling parse() or by wrapping an
existing Element in one, i.e.

    tree = etree.ElementTree(root_node)
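A minimal sketch comparing both routes for a string, assuming Python 3 (where BytesIO is the natural stand-in for the StringIO mentioned in the question). The sample XML and the helper c14n_bytes() are hypothetical, not part of lxml.

```python
from io import BytesIO

from lxml import etree

xml_text = '<root><child attr="1">text</child></root>'

# Route 1: parse from a file-like object, as suggested in the question
tree_from_parse = etree.parse(BytesIO(xml_text.encode('utf-8')))

# Route 2: parse the string into an Element, then wrap it in an ElementTree
root = etree.fromstring(xml_text)
tree_from_element = etree.ElementTree(root)


def c14n_bytes(tree):
    """Hypothetical helper: serialise a tree to canonical XML bytes."""
    buf = BytesIO()
    tree.write_c14n(buf)
    return buf.getvalue()


via_parse = c14n_bytes(tree_from_parse)
via_wrap = c14n_bytes(tree_from_element)
print(via_parse == via_wrap)  # True: both routes give the same C14N output
```

Wrapping does not copy anything: the ElementTree refers to the same Element objects, so changes made through root show up in its canonical output as well.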

http://codespeak.net/lxml/tutorial.html#the-elementtree-class
http://effbot.org/zone/element.htm#reading-and-writing-xml-files
http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class

Stefan

