[lxml-dev] Context of custom XPath functions

Stefan Behnel stefan_ml at behnel.de
Thu Aug 23 21:59:11 CEST 2007


Hi,


Frederik Elwert wrote:
> On Thursday, 23.08.2007, at 12:42 +0200, Stefan Behnel wrote:
>> Frederik Elwert wrote:
>>> So how would one implement id() or any similar function with lxml? As
>>> far as I got with custom functions, they only can handle the arguments
>>> they get passed directly, but they don't know about the broader context.
>>> Any suggestions how I could solve this?
>> One way would be to define the functions local to each XPath call and provide
>> them with the necessary context yourself.
> 
> I'm not sure I really understand how one would do this. But it sounds
> interesting, so could you give an example or further reference? I'm
> interested in a solution that I could get to work with current lxml, if
> possible.

There is an older API in lxml.etree that supports per-call extension
definitions. It's not very well documented, but you can pass a list of

  [ {(ns, name):function} ]

dicts as the "extensions" keyword argument to the constructor of XPath
evaluators. It should work with any 1.x version of lxml.etree.
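
To make the per-call idea concrete, here is a minimal sketch of an id()-like
function that gets its document context through a closure. It assumes a
reasonably current lxml; the sample document, the "key" attribute and the
names make_lookup/lookup are made up for illustration and are not part of
lxml.

```python
from lxml import etree

root = etree.XML(
    '<root><item key="a">first</item><item key="b">second</item></root>'
)

def make_lookup(doc_root):
    """Build an id()-like function bound to one document (hypothetical helper)."""
    # Index the document once; the closure carries this context along.
    index = {el.get("key"): el for el in doc_root.iter("item")}

    def lookup(context, key):
        # 'context' is the first parameter lxml passes in; it is unused here.
        el = index.get(key)
        return [el] if el is not None else []

    return lookup

# One dict per extension set, keyed by (namespace, name); None = no namespace.
extensions = [{(None, "lookup"): make_lookup(root)}]
evaluate = etree.XPathEvaluator(root, extensions=extensions)

result = evaluate("lookup('b')")
print(result[0].text)
```

Since the function is created per evaluator, each document gets its own
index and nothing has to be registered globally.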


>> If you want a more global solution, you may have noticed that functions
>> receive a (currently empty) context object as first parameter. Maybe we should
>> make that a real context object or a dictionary that includes the reference to
>> the current document or its root node?
> 
> I already thought about that, too. This would be great! Maybe a dict is
> a good idea, since one could add further context information in the
> future.
> 
> I'm not sure whether the document itself or the root node would be the best
> choice. Another XForms function, current(), "Returns the context node
> used to initialize the evaluation of the containing XPath expression."
> <http://www.w3.org/TR/xforms11/#fn-current>. So maybe this information
> is most useful in general. One knows which element is the context for
> the xpath-function, and from this, one can get the doc and root by
> getroottree() etc.
> 
> And the context node seems to be introduced quite well in XPath, as I
> just read in the XPath spec <http://www.w3.org/TR/xpath>. The
> introduction defines the context of XPath expressions as:
> 
>       * a node (the context node)
>       * a pair of non-zero positive integers (the context position and
>         the context size)
>       * a set of variable bindings
>       * a function library
>       * the set of namespace declarations in scope for the expression
> 
> So maybe this could be used as a reference for what to pass in a context
> dict. Function library and namespaces are present, anyway, so they may
> not be needed. Is this the case with XSLT-variables? I would guess so.
> So a context dict might contain the context node, and, if useful for
> anyone, context position and size.
> 
> But whatever you find most practical, I really like the idea of context
> information for XPath functions!

Hmmm, having the context node available is obviously desirable, but it's not a
straightforward thing. Internally, lxml.etree does loads of C-ish stuff to
make sure the underlying C-tree stays consistent and allocated as long as
there are Python references to it. When we pass an Element for the current
context node, we may end up passing a node that is not part of the current
document (in XSLT, for example). It might even be a temporary node, which is
not linked to any of the documents that lxml.etree takes care of. So I think
this might get us into a lot of trouble if people start keeping a reference to
that node to work with it outside the context of the current function call.
Even deallocation might just crash in the case of a temporary node.

It already starts with the context itself. Allowing people to keep a reference
to the context to make it live outside of the function call is like pushing
around crash bugs. Ok, most people will not do this, but it happens.

I will really, really have to take a deep look into this before I consider
this a good idea.

Stefan

