[lxml-dev] Context of custom XPath functions
Stefan Behnel
stefan_ml at behnel.de
Thu Aug 23 21:59:11 CEST 2007
Hi,
Frederik Elwert wrote:
> On Thursday, 23.08.2007, at 12:42 +0200, Stefan Behnel wrote:
>> Frederik Elwert wrote:
>>> So how would one implement id() or any similar function with lxml? As
>>> far as I got with custom functions, they only can handle the arguments
>>> they get passed directly, but they don't know about the broader context.
>>> Any suggestions how I could solve this?
>> One way would be to define the functions local to each XPath call and provide
>> them with the necessary context yourself.
>
> I'm not sure I really understand how one would do this. But it sounds
> interesting, so could you give an example or further reference? I'm
> interested in a solution that I could get to work with current lxml, if
> possible.
There is an older API in lxml.etree that supports per-call extension
definitions. It's not very well documented, but you can pass a list of
dicts of the form
[ {(ns, name): function} ]
as the "extensions" keyword argument to the constructor of XPath evaluators.
It should work with any 1.x version of lxml.etree.
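
To make that concrete, here is a minimal sketch of an id()-like function that
gets its document through a closure instead of through the context argument.
The namespace URI, the function name "lookup" and the sample document are made
up for illustration, and I have only written this against the keyword-argument
form of etree.XPath, so treat it as a starting point rather than the one true
way:

```python
from lxml import etree

# Hypothetical namespace URI and function name, chosen only for this sketch.
EXT_NS = "http://example.org/xpath-ext"

doc = etree.XML(
    '<root><item id="a">first</item><item id="b">second</item></root>'
)

def make_lookup(root):
    """Build an id()-like function bound to one document root."""
    def lookup(context, id_value):
        # lxml passes its context object first; this sketch ignores it and
        # relies on the 'root' captured by the closure instead.
        # Assumes id_value arrives as a string (e.g. a string literal).
        return [el for el in root.iter("*") if el.get("id") == id_value]
    return lookup

# A list of {(namespace, name): function} dicts, built per call.
extensions = [{(EXT_NS, "lookup"): make_lookup(doc)}]

find_b = etree.XPath(
    'ext:lookup("b")',
    namespaces={"ext": EXT_NS},
    extensions=extensions,
)
result = find_b(doc)
print([el.text for el in result])
```

Since the function is created per evaluator, each XPath object can be bound to
a different document without any global registration.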
>> If you want a more global solution, you may have noticed that functions
>> receive a (currently empty) context object as first parameter. Maybe we should
>> make that a real context object or a dictionary that includes the reference to
>> the current document or its root node?
>
> I already thought about that, too. This would be great! Maybe a dict is
> a good idea, since one could add further context information in the
> future.
>
> I'm not sure, if the document itself or the root node would be the best
> choice. Another XForms function, current(), "Returns the context node
> used to initialize the evaluation of the containing XPath expression."
> <http://www.w3.org/TR/xforms11/#fn-current>. So maybe this information
> is most useful in general. One knows which element is the context for
> the xpath-function, and from this, one can get the doc and root by
> getroottree() etc.
>
> And the context node seems to be introduced quite well in XPath, as I
> just read in the XPath spec <http://www.w3.org/TR/xpath>. The
> introduction defines the context of XPath expressions as:
>
> * a node (the context node)
> * a pair of non-zero positive integers (the context position and
> the context size)
> * a set of variable bindings
> * a function library
> * the set of namespace declarations in scope for the expression
>
> So maybe this could be used as a reference for what to pass in a context
> dict. Function library and namespaces are present, anyway, so they may
> not be needed. Is this the case with XSLT-variables? I would guess so.
> So a context dict might contain the context node, and, if useful for
> anyone, context position and size.
>
> But whatever you find most practical, I really like the idea of context
> information for XPath functions!
Hmmm, having the context node available is obviously desirable, but it's not a
straightforward thing. Internally, lxml.etree does loads of C-ish stuff to
make sure the underlying C-tree stays consistent and allocated as long as
there are Python references to it. When we pass an Element for the current
context node, we may end up passing a node that is not part of the current
document (in XSLT, for example). It might even be a temporary node, which is
not linked to any of the documents that lxml.etree takes care of. So I think
this might get us into a lot of trouble if people start keeping a reference to
that node to work with it outside the context of the current function call.
Even deallocation might just crash in the case of a temporary node.
It already starts with the context itself. Allowing people to keep a reference
to the context to make it live outside of the function call is like pushing
around crash bugs. Ok, most people will not do this, but it happens.
I will really, really have to take a deep look into this before I consider
this a good idea.
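
If something along these lines does become available, the safe usage pattern
would probably be to copy plain values out of the node and drop the reference
before returning. The sketch below assumes an lxml release newer than the one
discussed here, in which the context object exposes a "context_node"
attribute; the function name "describe" and the sample document are
hypothetical:

```python
from lxml import etree

def describe(context):
    # Assumption: this lxml version provides context.context_node.
    node = context.context_node
    # Copy plain Python values out; do not store 'node' or 'context'
    # anywhere that outlives this call.
    return "%s:%s" % (node.tag, node.get("id"))

doc = etree.XML('<root><item id="a"/></root>')

# (None, name) registers the function without a namespace prefix.
describe_node = etree.XPath(
    "describe()",
    extensions={(None, "describe"): describe},
)
label = describe_node(doc[0])
print(label)
```

Nothing here keeps the node or the context alive after the call, which is the
part that worries me.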
Stefan