[lxml-dev] problems with document(''), possibly thread related - LXML 'BUG'

Stefan Behnel stefan_ml at behnel.de
Fri Aug 15 08:52:03 CEST 2008


Hi,

Brad Clements wrote:
> document('') is "expected" to always mean "the current stylesheet" no
> matter what URL you named the stylesheet with. Could this be improved
> by having etree.XSLT attach the stylesheet doc to the returned
> stylesheet object, or is this too hard and tangled up inside libxslt?

The thing is that when a stylesheet says document(''), libxslt resolves the
empty URL relative to the stylesheet URL (i.e. replaces it with that URL) and
then asks lxml for that URL. So the only way to tell that the stylesheet itself
was meant is to compare the requested URL with the URL of the stylesheet. That
makes it identical to the case where you say document("the stylesheet url").
lxml handles this case directly, without calling a user-provided resolver.
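
As a rough illustration (a minimal sketch, not taken from the lxml sources), a
stylesheet can carry its own data and read it back through document(''). The
base URL, the "my" namespace and the greeting element below are made up for
the example.

```python
from lxml import etree

# Hypothetical stylesheet that stores a value in a non-XSLT namespace
# and reads it back from itself through document('').
XSL = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="urn:example:my">
  <my:greeting>hello</my:greeting>
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="document('')/xsl:stylesheet/my:greeting"/>
  </xsl:template>
</xsl:stylesheet>
"""

# The base URL is an assumption for this sketch; any URL that no other
# participating document uses should do.
stylesheet_root = etree.fromstring(
    XSL, base_url="memory://stylesheets/example-1.xsl")
transform = etree.XSLT(stylesheet_root)

# document('') resolves to the stylesheet URL, which lxml answers itself.
result = transform(etree.XML("<root/>"))
print(str(result))
```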


> Is there any documentation on the internal URL caching mechanism? Is the
> "cache" shared between parsers? Between threads?

It's local to a single XSLT call. As long as all documents that participate in
your XSL transformation (including the stylesheet itself) have unique URLs,
you will be safe.


> If I use fromstring(base_url="xyz") somewhere, then from a different
> parser have a stylesheet that does document('xyz'), will my resolver get
> called, or will the document that was generated by fromstring be used instead?

The only document URLs that will not be requested through your resolver are
those of the stylesheet and of the document that is being transformed.
Everything else will be requested from the resolver before it is added to the
cache.
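
One way to check this yourself is a resolver that records what it gets asked
for. This is only a sketch under assumptions: the memory:// URLs, the
RecordingResolver class and the extra document are invented for the example.
Note that the resolver has to be registered on the parser that parsed the
stylesheet.

```python
from lxml import etree

EXTRA_URL = "memory://app/extra.xml"   # hypothetical URL of a side document
STYLE_URL = "memory://app/style.xsl"   # hypothetical URL of the stylesheet


class RecordingResolver(etree.Resolver):
    """Serves one in-memory document and records every URL it is asked for."""

    def __init__(self):
        super().__init__()
        self.requested = []

    def resolve(self, url, public_id, context):
        self.requested.append(url)
        if url == EXTRA_URL:
            return self.resolve_string("<extra>from resolver</extra>", context)
        return None  # not ours: let lxml fall back to its default handling


XSL = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="document('%s')/extra"/>
    <xsl:value-of select="count(document('')/xsl:stylesheet)"/>
  </xsl:template>
</xsl:stylesheet>
""" % EXTRA_URL

resolver = RecordingResolver()
parser = etree.XMLParser()
parser.resolvers.add(resolver)

# The stylesheet is parsed with the parser that owns the resolver.
stylesheet_root = etree.fromstring(XSL, parser, base_url=STYLE_URL)
transform = etree.XSLT(stylesheet_root)
result = transform(etree.XML("<root/>"))

# Show which URLs actually reached the resolver.
print(resolver.requested)
```

If things work as described above, the side document shows up in the recorded
list and the stylesheet URL does not.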


> My WSGI code is generating stylesheets "on the fly" based on web
> requests, so I need to know more about the implementation details of the
> URL/document caching mechanism.

Giving each of them a unique base URL should work in any case.
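
For stylesheets generated per request, something along these lines might do.
It is a sketch only: compile_generated_stylesheet and the generated:// URL
scheme are hypothetical names, and uuid4 is just one way to get a URL that
nothing else uses.

```python
import uuid

from lxml import etree


def compile_generated_stylesheet(xsl_source):
    """Parse a generated stylesheet under a fresh, unique base URL."""
    # Hypothetical URL scheme; the point is only that it never repeats.
    base_url = "generated://stylesheets/%s.xsl" % uuid.uuid4().hex
    root = etree.fromstring(xsl_source, base_url=base_url)
    return etree.XSLT(root), base_url


XSL = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">ok</xsl:template>
</xsl:stylesheet>
"""

# Two compilations of the same source still get distinct base URLs.
transform_a, url_a = compile_generated_stylesheet(XSL)
transform_b, url_b = compile_generated_stylesheet(XSL)
output_a = str(transform_a(etree.XML("<root/>")))
print(url_a != url_b, output_a)
```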

Stefan


