[lxml-dev] Problem with ":" char in tag names

Stefan Behnel stefan_ml at behnel.de
Thu Aug 16 08:42:18 CEST 2007


Hi David,

Dave Kuhlman wrote:
> I've been using lxml and think it is great

:)


>, but ...

;) I just knew there was more to come...


> I recently installed lxml-1.3.3.  Now I find that the following
> gives me an error:
> 
>     In [3]: from lxml import etree
>     In [4]: etree.Element('abc:def')
>     ------------------------------------------------------------
>     Traceback (most recent call last):
>       File "<ipython console>", line 1, in <module>
>       File "etree.pyx", line 1801, in etree.Element
>       File "apihelpers.pxi", line 101, in etree._makeElement
>       File "apihelpers.pxi", line 723, in etree._getNsTag
>     ValueError: Invalid tag name
> 
> It's because of the ":" in the tag name.
> 
> That's critical for me, because I use lxml in my rst2odt project to
> produce OpenOffice ODF .odt files.  See:
> http://www.rexx.com/~dkuhlman/odtwriter.html
> 
> An ODF/.odt file is a zipped archive of XML files.  Those XML files
> contain many tags that contain colons.
> 
> Here are the relevant portions of the XML spec, I believe:
> 
>     http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-starttags
>     http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Name
> 
> Aren't I correct that a colon should be allowed in a tag name?
> 
> In apihelpers.pxi, it looks like the following lines were added in
> lxml version 1.3.3, and I believe they are raising the exception:
> 
>     elif cstd.strchr(c_tag, c':') is not NULL:
>         raise ValueError, "Invalid tag name"
> 
> Is there a reason for that?

lxml (read: libxml2) supports XML 1.0 (I don't think 1.1, which you cite
above, changed anything relevant here) and is generally namespace-aware. This
means that ":" is treated as the separator between a namespace prefix and the
local tag name, and is therefore not allowed as part of a plain
(namespace-less) tag name.
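
To illustrate the prefix/local-name split, here is a minimal sketch. The
namespace URI "http://example.com/abc" is made up for the example, and the
exact wording of the error message may differ between lxml versions:

```python
from lxml import etree

# Parse a document that binds the prefix "abc" to a (hypothetical) namespace URI.
doc = etree.fromstring('<abc:def xmlns:abc="http://example.com/abc"/>')

print(doc.prefix)  # the part before the ":"
print(doc.tag)     # the tag in "{namespace-uri}localname" form

# QName splits the qualified tag into its namespace and local name.
local_name = etree.QName(doc).localname

# Without a namespace, a ":" in the tag name is rejected.
try:
    etree.Element("abc:def")
    rejected = False
except ValueError:
    rejected = True
```

So the parser never sees "abc:def" as one name: it resolves the prefix and
keeps the namespace URI plus the local name "def".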

You mentioned ODF, which is heavily based on namespaces and, as far as I am
aware, doesn't use prefixes for anything but namespace references. So you
should be fine with the general namespace support in lxml.etree, where a tag
is written as "{namespace-uri}name" rather than "prefix:name".

http://codespeak.net/lxml/dev/tutorial.html#namespaces
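
A minimal sketch of what that could look like for an ODF paragraph element.
The names TEXT_NS and NSMAP are only illustrative, and I am assuming the
usual ODF "text" namespace URI here; check it against the files you generate:

```python
from lxml import etree

# Assumed ODF namespace for text content, conventionally bound to "text".
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
NSMAP = {"text": TEXT_NS}

# Write the tag as "{namespace-uri}localname" instead of "text:p".
p = etree.Element("{%s}p" % TEXT_NS, nsmap=NSMAP)
p.text = "Hello"

# Child elements inherit the prefix mapping from their parent.
span = etree.SubElement(p, "{%s}span" % TEXT_NS)

# The serializer puts the "text:" prefix back using the nsmap.
xml = etree.tostring(p)
print(xml)
```

The prefix only shows up again on serialization; inside the tree the
elements are identified by namespace URI and local name.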

Does that 'enlighten' you? :)

Stefan

