[lxml-dev] invalid tag names get serialized
Stefan Behnel
stefan_ml at behnel.de
Wed Jul 18 20:25:11 CEST 2007
jholg at gmx.de wrote:
> The name check should go directly into _createElement,
No, _createElement() is only a tiny wrapper around the element node creation
in libxml2. No Python exceptions allowed there.
> otherwise etree.SubElement will not pick it up.
Then SubElement will get its own check. I factored out the exception raising
so that it's only a one-liner to prevent invalid tags from passing through the
API.
> I'm also pro renaming TagNameIsValid to NCNameIsValid, as it is used on attributes also.
I actually renamed it to "_xmlNameIsValid()". It's not a public function yet,
but I might reconsider that.
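The same check serves attribute names as well as tags, so it only needs to test the local part of a name. The sketch below assumes Clark notation (`{namespace}local`, the ElementTree convention for qualified names). `xml_name_is_valid`, `split_clark` and `check_name` are hypothetical helpers, and the ASCII-only pattern is again an assumed placeholder. As in an NCName, a colon is rejected.

```python
import re

# Assumed placeholder for the NCName rule (ASCII subset only; no colon).
_NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def xml_name_is_valid(name):
    """Hypothetical name check shared by tags and attribute names."""
    return _NCNAME_RE.match(name) is not None


def split_clark(name):
    """Split '{ns}local' into (ns, local); ns is None if there is none."""
    if name.startswith("{"):
        ns, sep, local = name[1:].partition("}")
        if not sep:
            raise ValueError(f"Unterminated namespace in {name!r}")
        return ns, local
    return None, name


def check_name(name, kind="tag"):
    # One validation path for element tags and attribute names alike;
    # the namespace URI is not an XML name, so only the local part is checked.
    _, local = split_clark(name)
    if not xml_name_is_valid(local):
        raise ValueError(f"Invalid {kind} name {name!r}")
    return local
```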
>> Also, it's too late and too hard to debug. No, this patch works much better,
>> but the now failing tests seem to imply that Klingon tag names are not allowed
>> in well-formed XML documents. I'll have to check if it's the XML spec that's
>> xenophobic here or only libxml2...
>
> I do think that the character \u1234 is not allowed for XML NCNames:
> BaseChar production snippet:
>
> [...] #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] [...]
Right, I noticed that, too. I have now fixed the test cases and added a bunch
of new ones.
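The argument about U+1234 can be checked mechanically. In the quoted BaseChar snippet the ranges ascend, and the entry after #x11F9 is [#x1E00-#x1E9B], so U+1234 falls in the gap between them. The sketch below encodes only that quoted excerpt. It demonstrates the gap and does not implement the full production.

```python
import bisect

# The BaseChar excerpt quoted above, as inclusive (start, end) code point
# ranges in the ascending order the spec lists them.
BASECHAR_EXCERPT = [
    (0x11EB, 0x11EB),
    (0x11F0, 0x11F0),
    (0x11F9, 0x11F9),
    (0x1E00, 0x1E9B),
    (0x1EA0, 0x1EF9),
]
_STARTS = [start for start, _ in BASECHAR_EXCERPT]


def in_ranges(ch, ranges=BASECHAR_EXCERPT, starts=_STARTS):
    """Return True if ch falls inside one of the sorted, inclusive ranges."""
    cp = ord(ch)
    i = bisect.bisect_right(starts, cp) - 1   # last range starting <= cp
    return i >= 0 and cp <= ranges[i][1]


def neighbours(ch, ranges=BASECHAR_EXCERPT, starts=_STARTS):
    """Return the ranges just below and just above ch (None at the ends)."""
    i = bisect.bisect_right(starts, ord(ch))
    below = ranges[i - 1] if i > 0 else None
    above = ranges[i] if i < len(ranges) else None
    return below, above


print(in_ranges("\u1234"))   # False
print(neighbours("\u1234"))  # ((4601, 4601), (7680, 7835)), i.e. #x11F9 and #x1E00-#x1E9B
```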
Stefan