[lxml-dev] type of custom objects in XML-tree disappears

Stefan Behnel stefan_ml at behnel.de
Wed Jan 23 11:24:02 CET 2008


Hi,

Markus Hillebrand wrote:
> You wrote "You do not have to call set_element_class_lookup()
> each time as it sticks with the parser." ... - but that's exactly
> what I want to do: I need a tree with objects of different(!)
> classes.

That's perfectly fine, and there are ways to do that.

You can write your own lookup scheme based on XML attributes, namespace/tag,
some general element information or even full-fledged tree traversal.

http://codespeak.net/lxml/dev/element_classes.html
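
As a minimal sketch of a tag-based lookup scheme: the class names (Circle,
Square, ShapeLookup) and the sample document are made up for illustration and
do not come from this thread. It subclasses etree.CustomElementClassLookup and
returns None for anything it does not know, which hands the decision to the
fallback lookup.

```python
from lxml import etree

# Hypothetical element classes, one per tag name.
class Circle(etree.ElementBase):
    def kind(self):
        return "circle"

class Square(etree.ElementBase):
    def kind(self):
        return "square"

class ShapeLookup(etree.CustomElementClassLookup):
    """Pick the element class from the tag name."""
    _classes = {"circle": Circle, "square": Square}

    def lookup(self, node_type, document, namespace, name):
        if node_type != "element":
            return None  # comments, PIs etc. go to the fallback
        # None for unknown tags also means "use the fallback".
        return self._classes.get(name)

parser = etree.XMLParser()
parser.set_element_class_lookup(ShapeLookup())

root = etree.fromstring("<shapes><circle/><square/><other/></shapes>", parser)
kinds = [type(child).__name__ for child in root]
```

The lookup is set once on the parser and then applies to every tree that
parser builds, so a single tree can contain objects of several classes.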

What will /not/ work is merging elements from different trees into a tree that
has a different lookup scheme and expecting them to reappear in the new tree
with their original class - *except* if you keep Python references to each
object. That prevents the objects from being garbage collected, so the lookup
is not re-evaluated on access. But in this case you have to take care that
tree modifications are reflected in your cache of references.
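
A small sketch of the exception described above, assuming a made-up class
(Special) and two tiny documents. It only demonstrates what holds while a
reference is kept; what happens after the reference is dropped is described in
a comment rather than asserted.

```python
from lxml import etree

class Special(etree.ElementBase):
    """Hypothetical custom class used only for this illustration."""

# A parser whose lookup returns Special for every element.
special_parser = etree.XMLParser()
special_parser.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=Special))

target = etree.fromstring("<root/>")                # default lookup scheme
item = etree.fromstring("<item/>", special_parser)  # a Special object
target.append(item)                                 # moved into the other tree

# While `item` is still referenced, access through the tree hands back
# the very same Python object, so its class is unchanged.
same_object = target[0] is item
kept_class = type(target[0]) is Special

# Once the last reference to `item` is gone, a later target[0] creates a
# new object through the lookup that applies to the target tree, so it
# should not be expected to be a Special instance any more.
```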


> And I want to put additional data in those objects.

You can do that as long as it is reflected in the underlying XML (e.g. through
attributes in a separate namespace). lxml.objectify does this for type
annotations, for example.

You can /not/ do that if you want to keep the state in the Python objects -
again, with the exception of keeping the Python objects alive.
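
One way to sketch the first approach is a property that reads and writes a
namespaced attribute, so the data lives in the XML rather than in the Python
object. The namespace URI, the Task class and the "priority" name are all
assumptions made up for this example.

```python
from lxml import etree

# Hypothetical namespace for application data (an assumption for this sketch).
APP_NS = "http://example.org/app-data"
PRIORITY = "{%s}priority" % APP_NS

class Task(etree.ElementBase):
    """Hypothetical element class that keeps its extra data in the XML."""

    @property
    def priority(self):
        value = self.get(PRIORITY)
        return int(value) if value is not None else None

    @priority.setter
    def priority(self, value):
        # Stored as an attribute on the underlying XML node.
        self.set(PRIORITY, str(value))

parser = etree.XMLParser()
parser.set_element_class_lookup(etree.ElementDefaultClassLookup(element=Task))

root = etree.fromstring("<tasks><task/></tasks>", parser)
root[0].priority = 3         # written to the XML, not to the Python object
stored = root[0].priority    # read back from the XML attribute
serialized = etree.tostring(root)
```

Because the value is read from the attribute on every access, it does not
matter whether the Python object for the element was recreated in between.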


> So for me, it seems that lxml is not designed to manage
> objects of different classes in an XML tree.

It totally is, it just depends on how sophisticated your lookup scheme is.


> With a hack (calling
> setElementClassLookup() before each creation of an element)

You are assuming here that you can keep state in the Element objects, which in
this case means: their Python type.


> I'm
> able to create such trees and for some test cases it seems to work
> fine - despite these nasty things I reported. But it's not quite
> satisfying:
> 
>    - it's not free of side-effects, when I change the default
>      setElementClassLookup ...

Which is discouraged anyway, but helpful in some I-know-what-I'm-doing cases
where you are sure you're the only one to play with this.


> Finally my question: is it possible, that lxml supports that feature
> officially? For example by providing an explicit factory call
> like etree.createElement(class)?

No, lxml will not keep state in its Element proxies.

But again, objectify uses something similar: it determines the Python type of
an element value (string, int, ...) and stores it as a namespaced attribute.
When it has to determine the Element class to use for such an element, it uses
that information in the class lookup. When serialising, you can choose to
either keep these attributes in or to "deannotate()" the tree first.
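
A short sketch of what that looks like with lxml.objectify. The element names
("root", "amount") are arbitrary choices for this example.

```python
from lxml import etree, objectify

root = objectify.Element("root")
root.amount = 5   # objectify records the Python type on the new child

# The type is stored as a namespaced attribute on the element itself.
annotated_type = root.amount.get(objectify.PYTYPE_ATTRIBUTE)
element_class = type(root.amount).__name__

# Strip the type annotations again, e.g. before serialising.
objectify.deannotate(root)
after_deannotate = root.amount.get(objectify.PYTYPE_ATTRIBUTE)
clean_xml = etree.tostring(root)
```

Here annotated_type holds the recorded type name and after_deannotate is None,
since deannotate() removes the annotation attributes from the tree.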

Stefan


