[lxml-dev] resolve_entities=False seems to have no effect

Stefan Behnel stefan_ml at behnel.de
Sat Feb 7 14:35:04 CET 2009


Hi,

I forwarded your question to the lxml mailing list, which is a much better
place to discuss this as there are more people listening who might have an
idea.

http://comments.gmane.org/gmane.comp.python.lxml.devel/4359

usernamenumber wrote:
>> Well, what you get is well-formed XML. May I ask why you need the entity
>> references in the output?
> 
> I am calculating checksums based on the combined contents of several
> specific tags within a given document. The tool I am writing is designed
> to replace a pre-existing tool, which did the same thing and stored
> those checksums for comparison. The old tool does not convert entities,
> so in order for it to not generate a slew of false-negative checksum
> mismatches when we switch over, mine can't either.

It's rarely easy to replace a tool if you are required to mimic the
original quirks. The right way to do it is to calculate the checksums on
the parsed in-memory tree rather than the serialised XML stream. The second
best solution is to serialise to canonical XML (C14N) and to work on that.
But having checksums depend on a byte stream as serialised by a specific
tool is definitely not future-proof.
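
A minimal sketch of both options, not a definitive implementation. It uses
the standard library's xml.etree.ElementTree so that it runs without extra
packages; lxml.etree provides fromstring(), iter() and itertext() as well,
and has its own C14N support (for example etree.tostring(..., method="c14n")).
The function names, the tag names and the choice of SHA-1 are assumptions
made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree


def tree_checksum(xml_text, tag_names):
    """Hash the parsed text content of the selected tags, in document order."""
    root = ET.fromstring(xml_text)
    digest = hashlib.sha1()  # hash algorithm is an assumption
    for elem in root.iter():
        if elem.tag in tag_names:
            # itertext() yields the already-resolved text of the element
            digest.update("".join(elem.itertext()).encode("utf-8"))
    return digest.hexdigest()


def c14n_checksum(xml_text):
    """Hash the canonical (C14N) serialisation of the whole document."""
    canonical = ET.canonicalize(xml_text)  # requires Python 3.8+
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Two spellings of the same document: entity reference vs. literal character,
# single vs. double attribute quotes.
doc_a = "<doc id='1'><title>It&apos;s here</title><body>x</body></doc>"
doc_b = '<doc id="1"><title>It\'s here</title><body>x</body></doc>'

tags = {"title", "body"}
same_tree = tree_checksum(doc_a, tags) == tree_checksum(doc_b, tags)
same_c14n = c14n_checksum(doc_a) == c14n_checksum(doc_b)
```

Both comparisons come out equal because neither checksum depends on how the
source happened to spell a character or quote an attribute.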

To emulate the old behaviour, you could build the checksum from the
in-memory tree and replace all occurrences of »'« and »"« with their
escaped equivalents (&apos; and &quot;) before using a text value. If your
XML source documents consistently use the entity references everywhere,
this should yield the same checksums.
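
A rough sketch of that emulation, under the stated assumption that the
source documents always write the two quote characters as &apos; and &quot;.
It again uses the standard library's ElementTree as a stand-in for
lxml.etree; reescape(), legacy_checksum() and the use of SHA-1 are
hypothetical names and choices, not part of the old tool.

```python
import hashlib
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree


def reescape(text):
    """Put back the entity references the parser resolved for ' and "."""
    return text.replace("'", "&apos;").replace('"', "&quot;")


def legacy_checksum(xml_text, tag_names):
    """Hash re-escaped text values so they match the unparsed source text."""
    root = ET.fromstring(xml_text)
    digest = hashlib.sha1()  # hash algorithm is an assumption
    for elem in root.iter():
        if elem.tag in tag_names:
            value = reescape("".join(elem.itertext()))
            digest.update(value.encode("utf-8"))
    return digest.hexdigest()


raw_title = "It&apos;s &quot;fine&quot;"
source = "<doc><title>" + raw_title + "</title></doc>"

parsed_title = ET.fromstring(source).find("title").text
restored_title = reescape(parsed_title)

new_sum = legacy_checksum(source, {"title"})
old_sum = hashlib.sha1(raw_title.encode("utf-8")).hexdigest()
```

Here restored_title equals raw_title again, so new_sum matches a checksum
taken over the raw source text. Other escaped characters, such as &amp; or
&lt;, would need the same treatment if they occur in the checksummed tags.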

Does that help?

Stefan

