[lxml-dev] Is resolve_entities not working ??

Stefan Behnel stefan_ml at behnel.de
Tue Feb 19 16:00:47 CET 2008


Steve Howe wrote:
>> As the document does not specify a DTD, the entity "copy" is undefined,
>> which is an error if you instructed the parser to *resolve* the
>> entities.
> Agreed, but I set "resolve_entities=False" so it should not be resolving
> anything, right ? Or did I misunderstand something ?

Ah, sorry, I misread your example as saying "=True" ...

Documents that do not declare their entities are not well-formed:

---------------------------
Well-formedness constraint: Entity Declared

In a document without any DTD, a document with only an internal DTD subset
which contains no parameter entity references, or a document with
"standalone='yes'", for an entity reference that does not occur within the
external subset or a parameter entity, the Name given in the entity
reference MUST match that in an entity declaration that does not occur
within the external subset or a parameter entity, except that well-formed
documents need not declare any of the following entities: amp, lt, gt,
apos, quot. The declaration of a general entity MUST precede any reference
to it which appears in a default value in an attribute-list declaration.
---------------------------

with one exception:

---------------------------
Note that non-validating processors are not obligated to read and process
entity declarations occurring in parameter entities or in the external
subset; for such documents, the rule that an entity must be declared is a
well-formedness constraint only if standalone='yes'.
---------------------------

But since your document does not define an external subset, the parser
knows that the entity is not defined and that the document is not
well-formed. If you add a DOCTYPE that references an external DTD, the
parser will assume the entity is defined there (even if it does not load
it), and thus ignore the missing declaration (you should still get a
warning in the parser's "error_log", though).
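
A minimal sketch of the two cases, assuming a document shaped like the one
in the original question. The file name "example.dtd" is hypothetical and
is never loaded; the exact messages depend on the lxml/libxml2 versions in
use, so none are shown here.

```python
from lxml import etree

# Case 1: no DOCTYPE at all. The parser knows "copy" cannot be declared
# anywhere, so the reference is a fatal well-formedness error even with
# resolve_entities=False.
strict_parser = etree.XMLParser(resolve_entities=False)
try:
    etree.fromstring(b"<root>&copy;</root>", strict_parser)
    failed = False
except etree.XMLSyntaxError as exc:
    failed = True
    print("no DOCTYPE:", exc)

# Case 2: a DOCTYPE that points at an external subset. "example.dtd" is a
# made-up name; with the default load_dtd=False it is not read.
doctype_parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(
    b'<!DOCTYPE root SYSTEM "example.dtd"><root>&copy;</root>',
    doctype_parser,
)
print("with DOCTYPE:", etree.tostring(root))

# Whatever the parser reported about the undeclared entity ends up here.
for entry in doctype_parser.error_log:
    print(entry.level_name, entry.message)
```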

Also, if you add "recover=True" to the parser, it will ignore the
(otherwise fatal) error.
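
Roughly like this (a sketch only; what exactly remains in the tree in
place of the undefined reference depends on the libxml2 version, so check
the serialised result yourself):

```python
from lxml import etree

# recover=True asks libxml2 to keep going after fatal errors.
parser = etree.XMLParser(resolve_entities=False, recover=True)
root = etree.fromstring(b"<root>before &copy; after</root>", parser)

# The document is returned instead of an XMLSyntaxError being raised.
print(etree.tostring(root))

# The error is not lost; it is still recorded in the parser's log.
for entry in parser.error_log:
    print(entry.line, entry.level_name, entry.message)
```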

Note that since lxml 2.0, unresolved entity references appear as child
nodes in the tree, not as text.
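
To illustrate, here is a small sketch. It assumes an internal DTD subset
that declares "copy" (my addition, not part of the original example) so
that the document is well-formed on its own:

```python
from lxml import etree

# Assumption: the entity is declared in an internal subset, so no
# external DTD is involved.
xml = (
    b'<!DOCTYPE root [<!ENTITY copy "&#169;">]>'
    b"<root>A &copy; B</root>"
)

parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(xml, parser)

# The reference is a child node of <root>; the surrounding text is split
# into root.text and the entity node's tail.
entity = root[0]
print(repr(root.text))
print(entity.tag is etree.Entity, entity.name, entity.text)
print(repr(entity.tail))
```

The same nodes can be found with root.iter(etree.Entity) if you need to
post-process them.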

Stefan


