[lxml-dev] invalid tag names get serialized

jholg at gmx.de
Wed Jul 18 17:04:13 CEST 2007


Hi,
I've just seen you've already been looking into this, so my comment below concerning test cases is just for reference, but:

The name check should go directly into _createElement; otherwise etree.SubElement will not pick it up. I'd also support renaming TagNameIsValid to NCNameIsValid, since it is used for attribute names as well.
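To illustrate why the check belongs in the shared factory, here is a minimal pure-Python sketch. It is not lxml's code: `nc_name_is_valid`, `_create_element`, `element`, `sub_element` and `Node` are hypothetical stand-ins, and the name rule is a deliberately simplified ASCII-only subset of NCName. It handles plain local names only, with no `{namespace}` prefix. Because both public constructors funnel through one factory, neither can bypass the check.

```python
import re

# Hypothetical, simplified NCName rule (ASCII subset only): a letter or
# underscore first, then letters, digits, '.', '-' or '_'. No colons.
# The real XML 1.0 rule also admits many non-ASCII letters.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def nc_name_is_valid(name: str) -> bool:
    """Return True if name passes the simplified NCName check."""
    return _ASCII_NCNAME.match(name) is not None


class Node:
    """Toy element node standing in for a real tree element."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.children = []  # child Node objects, in document order


def _create_element(tag: str) -> Node:
    # Single shared factory: every public constructor goes through here,
    # so the name check cannot be skipped by any one entry point.
    if not nc_name_is_valid(tag):
        raise ValueError(f"Invalid tag name {tag!r}")
    return Node(tag)


def element(tag: str) -> Node:
    return _create_element(tag)


def sub_element(parent: Node, tag: str) -> Node:
    # Validation happens in _create_element before the child is attached,
    # so an invalid name never reaches the tree.
    child = _create_element(tag)
    parent.children.append(child)
    return child
```

With this layout, `sub_element(root, "bad name")` raises `ValueError` just as `element("bad name")` does, and the parent is left unchanged.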

> Also, it's too late and too hard to debug. No, this patch works much
> better, but the now failing tests seem to imply that Klingon tag names
> are not allowed in well-formed XML documents. I'll have to check if
> it's the XML spec that's xenophobe here or only libxml2...

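I do think the character \u1234 is not allowed in XML NCNames. In the BaseChar production (XML 1.0, Appendix B), the listed code points jump from #x11F9 straight to #x1E00, so #x1234 is not covered: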

[...] #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] [...]
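The gap argument can be checked mechanically. This small sketch encodes only the entries quoted above, assuming, as the snippet suggests, that they are consecutive in the production. It then shows that U+1234 lies strictly between two adjacent entries and therefore is not a BaseChar. It says nothing about the other character classes, and the helper names are illustrative only.

```python
# Only the BaseChar entries quoted in the snippet above, as inclusive
# (low, high) code-point ranges; single code points have low == high.
BASECHAR_EXCERPT = [
    (0x11EB, 0x11EB),
    (0x11F0, 0x11F0),
    (0x11F9, 0x11F9),
    (0x1E00, 0x1E9B),
    (0x1EA0, 0x1EF9),
]


def in_ranges(cp, ranges):
    """True if code point cp falls inside any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def gap_containing(cp, ranges):
    """Return (prev_high, next_low) of the gap between two consecutive
    sorted ranges that strictly contains cp, or None if there is none."""
    for (_, prev_hi), (next_lo, _) in zip(ranges, ranges[1:]):
        if prev_hi < cp < next_lo:
            return prev_hi, next_lo
    return None


cp = ord("\u1234")
print(in_ranges(cp, BASECHAR_EXCERPT))  # False
print([hex(x) for x in gap_containing(cp, BASECHAR_EXCERPT)])  # ['0x11f9', '0x1e00']
```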

Thanks,
Holger



