[lxml-dev] invalid tag names get serialized

Stefan Behnel stefan_ml at behnel.de
Wed Jul 18 10:23:23 CEST 2007


jholg at gmx.de wrote:
>>> detect it when you try to parse it back in (in vain). Would it be a problem
>>> to have the tag name checked before it is set for an element?
>> Not entirely "libxml2 behaviour", since it actually provides functions to
>> check names. You just have to use them. Although 'just' is slightly too
>> simplistic here. The straightforward patch actually breaks lots of test
>> cases, e.g. getiterator('*').
>>
>> I'll have to look into this, but this is definitely 2.0 stuff. Maybe it would
>> be enough to check names only in the factory functions, 'el.set()' and
>> 'el.attrib.__setitem__()'. Lookup and search methods/functions don't have
>> to care.
> 
> For my purposes, it would be sufficient if a tree did not serialize
> successfully

:) That's actually the hardest thing to implement, as we currently just pass
the tree to libxml2 and let it do the rest.

Also, failing only at serialization time is too late and makes the error too
hard to debug. No, this patch works much better, but the now-failing tests seem
to imply that Klingon tag names are not allowed in well-formed XML documents.
I'll have to check whether it's the XML spec that's xenophobic here or only
libxml2...
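
To illustrate the idea of checking names only where they are set, here is a
minimal sketch in plain Python. It is not the attached patch and does not use
libxml2's own name-checking functions. The names `is_valid_xml_name` and
`checked_tag` are hypothetical, and the character ranges are an assumption:
they follow the Name production of XML 1.0 (Fifth Edition), which may well
differ from what libxml2 actually enforces.

```python
import re

# NameStartChar ranges from the XML 1.0 (Fifth Edition) grammar (assumption:
# libxml2 may apply different rules).
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# NameChar adds '-', '.', digits and a few combining/extender characters.
_NAME_CHAR = _NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

_NAME_RE = re.compile("[%s][%s]*" % (_NAME_START, _NAME_CHAR))


def is_valid_xml_name(name):
    """Return True if `name` matches the XML 1.0 Name production."""
    return _NAME_RE.fullmatch(name) is not None


def checked_tag(name):
    """Hypothetical hook for factory functions: reject bad names early."""
    if not is_valid_xml_name(name):
        raise ValueError("Invalid tag name %r" % (name,))
    return name


if __name__ == "__main__":
    for candidate in ("root", "ns:tag", "a b", "1abc", "*"):
        print(repr(candidate), is_valid_xml_name(candidate))
```

Note that the wildcard '*' is not a valid name under this check, which is one
reason why lookup and search methods such as getiterator('*') should not run
their argument through it; only the functions that create or set names should.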

Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: name-validation.patch
Type: text/x-diff
Size: 1992 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070718/b7f270c8/attachment.bin 

