[lxml-dev] Problem with ":" char in tag names
Stefan Behnel
stefan_ml at behnel.de
Sun Aug 19 18:56:05 CEST 2007
Hi Martijn,
Martijn Faassen wrote:
> I agree that this is a bugfix; sorry for the confusion. I never tried
> to use namespace prefixes this way and used Clark notation
> consistently. My feedback is not coming from the perspective of
> supporting broken use, but wondering whether we cannot make lxml
> easier to use.
I understand. I just doubt it would become easier to use.
> Yes, and prefixes are used in the XML serialization. The way we have
> both prefixes and Clark notation *already* creates a lot of confusion
> for users.
That's why I would rather see it eliminated in XPath (with ETXPath)
than introduced in other parts of the API.
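For illustration, here is a minimal sketch of the difference between prefix-based XPath and ETXPath in lxml.etree. The namespace URI and element names are made up for the example; only `etree.XML`, `Element.xpath()` and `etree.ETXPath` are actual lxml API.

```python
from lxml import etree

NS = "http://example.com/ns"  # hypothetical namespace URI

root = etree.XML('<root xmlns:x="%s"><x:item/><x:item/></root>' % NS)

# Plain XPath needs a prefix-to-namespace mapping; the prefix chosen here
# ("p") does not have to match the one used in the document ("x").
by_prefix = root.xpath("//p:item", namespaces={"p": NS})

# ETXPath accepts the namespace directly in {namespace}name form,
# so no prefix is involved at all.
find_items = etree.ETXPath("//{%s}item" % NS)
by_clark = find_items(root)

print([el.tag for el in by_clark])
```

Both calls select the same two elements; the ETXPath version simply never mentions a prefix.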
>> Note that lxml nicely reassigns prefixes now when inserting an element into
>> an existing tree, so there really is no need to assign prefixes more than
>> once (if at all).
>
> Assigning prefixes, sure. *Using* prefixes is what I'm talking about.
But prefixes are error prone and this behaviour makes them even more error
prone. Prefixes are not equivalent to namespaces as more than one prefix can
map to the same namespace, in different parts of a document or even
concurrently. And since lxml.etree adapts namespace prefixes when merging
documents or adding new elements, you can get surprising behaviour depending
on the source of the document you are working on. If you only generate XML
from scratch without interacting with external code, you may be fine with
prefix notation, but if you work on existing documents or pipe XML through
external libraries, you may end up being surprised why lxml.etree starts
throwing exceptions at you when you continue working on the document you just
got back.
Allowing prefix notation in tag names encourages people to write code that
makes assumptions about their data that may not be true for 100% equivalent
data. And if you are aware of the potential pitfalls of such a feature, I
doubt that you would use it except for a very limited number of use cases.
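A small sketch of the point about equivalent data, assuming a made-up namespace URI and element names: the two documents below use different prefixes (one uses a default namespace) but carry the same namespaced elements, and code written against `{namespace}name` tags treats them identically. The import fallback to the standard library's ElementTree is only there so the snippet runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback; the API used below is the same
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # hypothetical namespace URI

# Same data, different prefixes.
doc_a = etree.fromstring('<a:root xmlns:a="%s"><a:child/></a:root>' % NS)
doc_b = etree.fromstring(
    '<root xmlns="%s"><b:child xmlns:b="%s"/></root>' % (NS, NS))

child_tag = "{%s}child" % NS

# The tag is independent of whatever prefix the serialized form used.
print(doc_a[0].tag == doc_b[0].tag == child_tag)

# A lookup by namespace-qualified tag works on both documents.
found_a = doc_a.find(child_tag)
found_b = doc_b.find(child_tag)
```

Code that instead looked for a literal "a:child" would only match the first document, although both are equivalent as far as XML namespaces are concerned.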
> In addition, the Clark notation pattern forces one to write code like this:
>
> SubElement(el, '{%s}foo' % MY_NS)
>
> i.e. people generally don't want to spell out their entire namespace
> URI over and over again when constructing XML.
I absolutely see that problem. But I do not think that supporting prefix
notation is a good way to solve this. I mean, the most common case where this
really hurts is that you use one single namespace in your application and have
to repeat it for every SubElement. But it's easy to write a factory that wraps
SubElement() and simply copies the namespace of the parent over to the new
child (if it doesn't provide one itself), something like this:
    from lxml import etree

    def SameNamespaceSubElement(parent, tag, *args, **kwargs):
        # Inherit the parent's namespace unless the tag already names one.
        if not tag.startswith("{") and parent.tag.startswith("{"):
            tag = parent.tag[:parent.tag.index("}")+1] + tag
        return etree.SubElement(parent, tag, *args, **kwargs)
(plus QName() support, plus a better name, etc.)
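As a usage sketch, the factory can be exercised like this. The function is repeated so the snippet is self-contained; the namespace URIs and element names are invented for the example, and QName support is still left out.

```python
try:
    from lxml import etree
except ImportError:  # fallback; the API used below is the same
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # hypothetical namespace URI


def SameNamespaceSubElement(parent, tag, *args, **kwargs):
    # Inherit the parent's namespace unless the tag already names one.
    if not tag.startswith("{") and parent.tag.startswith("{"):
        tag = parent.tag[:parent.tag.index("}") + 1] + tag
    return etree.SubElement(parent, tag, *args, **kwargs)


root = etree.Element("{%s}root" % NS)

# Plain tag name: the parent's namespace is copied over.
child = SameNamespaceSubElement(root, "child")

# Explicit namespace: left untouched.
other = SameNamespaceSubElement(root, "{http://example.com/other}child")

# Parent without a namespace: nothing to inherit.
plain_root = etree.Element("root")
plain_child = SameNamespaceSubElement(plain_root, "child")

print(child.tag, other.tag, plain_child.tag)
```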
>>> The nice thing is that you could avoid having to write '{%s}foo' %
>>> my_namespace a lot.
>> Feel free to assign it to a global constant or to use the E factory as in
>> lxml.html.builder.
>
> Yes, remember that I've used lxml before. :)
:)
> I often use a global constant. It still means I scatter "{%s}foo" %
> MY_GLOBAL_CONSTANT throughout my code. Meanwhile, I *already* have a
> "global constant" that I also set somewhere, in the XML, namely my
> namespace map.
I rather meant something like a module that keeps constants for every tag name
in a namespace, a bit like in lxml.html.builder, just with strings instead of
factories.
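A rough sketch of what such a constants module might look like. All names here (the namespace URI, `_qualified`, `ROOT`, `ITEM`, `TITLE`) are hypothetical; in practice the constants would live in their own module and be imported where needed.

```python
try:
    from lxml import etree
except ImportError:  # fallback; the API used below is the same
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # hypothetical namespace URI


def _qualified(local_name):
    # Build the {namespace}name string once, at import time.
    return "{%s}%s" % (NS, local_name)


# One string constant per tag name in the namespace.
ROOT = _qualified("root")
ITEM = _qualified("item")
TITLE = _qualified("title")

# Application code then uses the constants instead of string formatting.
root = etree.Element(ROOT)
item = etree.SubElement(root, ITEM)
title = etree.SubElement(item, TITLE)
title.text = "example"

print(root.find(ITEM).tag)
```

The namespace URI is spelled out in exactly one place, and a mistyped constant name fails immediately with a NameError instead of silently creating an element in the wrong namespace.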
>>> Of course this has consequences for other areas, such as 'tag', so I'm
>>> not sure whether this is a good idea, but throwing it in.
>> Right, it would let ".tag" return something other than what you passed into
>> the Element() function.
>
> Yes. If we make this change, we'd also need to figure out what happens
> if you explicitly *set* tag. Should we allow:
>
> foo.tag = 'foo:bar'
>>> foo.tag = 'foo:bar'
>>> print foo.tag
{http://whatever}bar
Perfectly understandable. If we implement that, I'm all for documenting it in
a doctest. People will have to see it to believe it. :)
Stefan