[lxml-dev] TreeBuilder implementation in lxml.etree
Stefan Behnel
stefan_ml at behnel.de
Fri Nov 23 12:31:47 CET 2007
Hi all,
lxml.etree now has an ET-compatible TreeBuilder class that is integrated into
the parser framework, i.e. you can create a parser with "target=TreeBuilder()"
and have it build a tree for you just the way ET does. Or, you can create a
TreeBuilder instance and call the event methods yourself to get the same
effect without parsing.
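As a rough illustration of both usages, here is a minimal sketch. The fallback import to the standard library's ElementTree is my own addition so that the snippet runs without lxml installed, and the sample document and variable names are made up for the example.

```python
# Prefer lxml.etree; fall back to the standard library's ElementTree,
# which offers the same TreeBuilder interface for this usage.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# 1) Let the parser drive the builder: the parser returns whatever
#    the target's close() method returns, i.e. the root element.
parser = etree.XMLParser(target=etree.TreeBuilder())
parsed_root = etree.XML("<root><child>text</child></root>", parser)

# 2) Call the event methods by hand, without any parsing.
builder = etree.TreeBuilder()
builder.start("root", {})
builder.start("child", {})
builder.data("text")
builder.end("child")
builder.end("root")
built_root = builder.close()

print(parsed_root.tag, parsed_root[0].text)
print(built_root.tag, built_root[0].text)
```

Both paths should end up with an equivalent tree: a "root" element with a single "child" that carries the text.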
There is one little difference to ET: the start() method has the following
signature:
    def start(self, tag, attrs, nsmap=None):
whereas in ET it's
    def start(self, tag, attrs):
so lxml.etree accepts an additional "nsmap" argument here. This is required as
lxml's parser would otherwise lose the namespace prefix mappings, so the
generated trees would declare namespaces wherever a tag uses them first in the
hierarchy and not where they were originally declared in the parsed document
(which usually means the root element). Also, this supports prefixed text
values that refer to declared namespaces (see for example the QName class).
If you want to write code that subclasses the TreeBuilder and that should
still work with both ET and lxml.etree, you should use the above signature.
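A portable subclass could look roughly like the sketch below. RecordingTreeBuilder and its seen_tags attribute are hypothetical names for this example, and the conditional call to the base class is my assumption about how to stay compatible with ET, whose start() does not accept a third argument.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


class RecordingTreeBuilder(etree.TreeBuilder):
    """Hypothetical subclass that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen_tags = []

    def start(self, tag, attrs, nsmap=None):
        self.seen_tags.append(tag)
        # ET calls start() with two arguments, so nsmap stays None here
        # and must not be forwarded to ET's two-argument start().
        if nsmap is None:
            return super().start(tag, attrs)
        # lxml.etree may pass the prefix mapping; hand it on unchanged.
        return super().start(tag, attrs, nsmap)


builder = RecordingTreeBuilder()
parser = etree.XMLParser(target=builder)
root = etree.XML("<a><b/><b/></a>", parser)
print(root.tag, builder.seen_tags)
```

The default of None keeps the method callable with two arguments, which is what ET expects, while still leaving room for the mapping that lxml.etree can supply.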
There is a bit of code that tries to figure out if the method can be called
with three arguments, but I'm not sure it works in all cases. It's essentially
this:
    import inspect
    arguments = inspect.getargspec(target.start)
    if len(arguments[0]) > 3:  # self + 3 arguments
        takes_nsmap = True
    elif arguments[1] is not None:  # '*args' parameter
        takes_nsmap = True
    else:
        takes_nsmap = False
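Note that inspect.getargspec() was removed in Python 3.11. The same check can be sketched with inspect.signature(), as below. This is only an approximation of the idea and not lxml's actual code; the takes_nsmap() helper and the three sample classes are hypothetical. For a bound method, inspect.signature() leaves out "self", so the threshold is three positional parameters rather than four.

```python
import inspect


def takes_nsmap(start_method):
    """Guess whether a bound start() method accepts a third argument."""
    try:
        params = inspect.signature(start_method).parameters.values()
    except (TypeError, ValueError):
        # No introspectable signature: assume the two-argument form.
        return False
    positional = 0
    for param in params:
        if param.kind is param.VAR_POSITIONAL:  # '*args' parameter
            return True
        if param.kind in (param.POSITIONAL_ONLY,
                          param.POSITIONAL_OR_KEYWORD):
            positional += 1
    return positional >= 3  # tag, attrs, nsmap ('self' is not counted)


class TwoArgTarget:
    def start(self, tag, attrs):
        pass


class ThreeArgTarget:
    def start(self, tag, attrs, nsmap=None):
        pass


class VarArgTarget:
    def start(self, *args):
        pass


print(takes_nsmap(TwoArgTarget().start))
print(takes_nsmap(ThreeArgTarget().start))
print(takes_nsmap(VarArgTarget().start))
```

Like the original check, this ignores a '**kwargs' parameter, since the third argument is passed positionally.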
Hope this is useful,
Stefan