[lxml-dev] type of custom objects in XML-tree disappears
Stefan Behnel
stefan_ml at behnel.de
Sun Jan 20 11:30:34 CET 2008
Hi,
I'm responding here as I don't think this is a bug. It should be discussed on
the list.
mh wrote:
> I want to use lxml to build up an XML tree with custom
> element objects.
>
> Background: I want to build up an XML-compatible syntax tree
> and provide additional methods and some non XML relevant
> attributes in the tree.
>
> lxml provides etree.XMLParser.setElementClassLookup(...)
> and etree.XMLParser.makeelement(...) methods to build up
> such a tree.
>
> It works fine to that point, but after some operations on the
> tree, the inserted elements are still there, but some of them
> have changed their type from my custom classes to etree.Element
> and my additional methods and attributes are lost.
>
> I tried to reproduce the error, but it's not easy to make
> it repeatable, so I wrote a brute-force script to provoke that
> kind of error (see the end of this message). I found this error
> on Mac OS X, Windows and Linux and on lxml 1.3.6 and lxml
> 2beta1 with Python 2.5.
>
> Maybe it's not exactly an explicit use case lxml was built for, but
> maybe it's worth thinking about that one.
>
> ----- SNIP -----
> from lxml import etree
> import random
> import sys
>
> class MyElement(etree.ElementBase):
>     TAG="MyElement"
>
> class Generator:
>     def __init__(self):
>         self.__oFactory=etree.XMLParser()
>         self.__oTree=etree.ElementTree(etree.Element("root"), None)
Note that the root element is not taking your element class setup into account
here, so it's using the standard class.
>     def CreateElement(self, oClass):
>         self.__oFactory.setElementClassLookup(etree.ElementDefaultClassLookup(oClass))
>         oNew=self.__oFactory.makeelement(oClass.TAG)
>         return oNew
You do not have to call set_element_class_lookup() each time as it sticks with
the parser. I'd rather write __init__ like this:
    parser = etree.XMLParser()
    parser.set_element_class_lookup(etree.ElementDefaultClassLookup(the_class))
    self.__makeelement = parser.makeelement
    self.__tree = etree.ElementTree(self.__makeelement("root"))
>     def Run(self):
>         try:
>             for i in range(0,200):
>                 self.Visit(self.__oTree.getroot())
>         except Exception, oError:
>             etree.dump(self.__oTree.getroot())
>
>     def Visit(self, oElement):
>         if oElement.tag!="root":
>             if not isinstance(oElement, MyElement):
>                 raise Exception("Failed")
>         nRandom=random.randint(0,2)
>         if nRandom==0:
>             oNew=self.CreateElement(MyElement)
>             oElement.append(oNew)
>         elif nRandom==1:
>             oNew=self.CreateElement(MyElement)
>             oElement.insert(0, oNew)
>         for oSub in oElement:
>             self.Visit(oSub)
>
> oGen=Generator()
> oGen.Run()
This seems to work just fine for me (which actually surprises me). The problem
is that your root element does not know about the factory you are using for
your other elements, but when Element proxy(!) objects are created for a tree,
they are based on the factory that is associated with the document that holds
the root element. So when element proxy objects are garbage collected and
later recreated on demand, they use the standard factory, not yours.
The way to fix your code is to do something like the code I posted above.
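As a minimal, self-contained sketch of that fix (assuming a current lxml on
Python 3; make_factory and describe are hypothetical names used only for
illustration, not part of lxml), the root and all children come from the same
parser, and the check runs after the temporary proxies have been dropped:

```python
import gc
from lxml import etree


class MyElement(etree.ElementBase):
    TAG = "MyElement"

    def describe(self):
        # Hypothetical extra method, standing in for the custom API
        # that must survive proxy re-creation.
        return "custom:" + self.tag


def make_factory(element_class):
    # Hypothetical helper: configure the class lookup once on the parser
    # and hand back its makeelement() as the element factory.
    parser = etree.XMLParser()
    parser.set_element_class_lookup(
        etree.ElementDefaultClassLookup(element=element_class))
    return parser.makeelement


makeelement = make_factory(MyElement)

# The root is created through the same parser, so its document carries
# the custom class lookup.
root = makeelement("root")
root.append(makeelement(MyElement.TAG))
root.insert(0, makeelement(MyElement.TAG))

# No references to the child proxies remain; force a collection anyway
# so that any access below has to build fresh proxy objects.
gc.collect()

all_custom = all(isinstance(el, MyElement) for el in root.iter())
print(all_custom, root[0].describe())
```

Creating the root with the parser's makeelement() is the important part: the
children inherit the lookup from the document they end up in, not from the
factory that first created them.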
Stefan