[lxml-dev] invalid tag names get serialized
Stefan Behnel
stefan_ml at behnel.de
Wed Jul 18 16:32:10 CEST 2007
Stefan Behnel wrote:
> jholg at gmx.de wrote:
>>>> detect it when you try to parse it back in (in vain). Would it be a
>>>> problem to have the tag name checked before it is set for an element?
>>> Not entirely "libxml2 behaviour", since it actually provides functions to
>>> check names. You just have to use them. Although 'just' is slightly too
>>> simplistic here. The straightforward patch actually breaks lots of test
>>> cases, e.g. getiterator('*').
>>>
>>> I'll have to look into this, but this is definitely 2.0 stuff. Maybe it
>>> would be enough to check names only in the factory functions, 'el.set()'
>>> and 'el.attrib.__setitem__()'. Lookup and search methods/functions don't
>>> have to care.
>> For my purposes, it would be sufficient if a tree did not serialize
>> successfully.
>
> :) that's actually the heaviest thing to implement, as we currently only pass
> a tree to libxml2 and let it do the rest.
>
> Also, it's too late and too hard to debug. No, this patch works much better,
> but the now failing tests seem to imply that Klingon tag names are not allowed
> in well-formed XML documents. I'll have to check if it's the XML spec that's
> xenophobic here or only libxml2...
Actually, it's the spec; libxml2 is right here. So the current trunk no longer
accepts invalid tag names at the API level when *creating* elements or
attributes. It still accepts them when searching tags or looking up attributes.
Stefan