[lxml-dev] problems with document(''), possibly thread related - LXML 'BUG'
Brad Clements
bkc at murkworks.com
Thu Aug 14 21:07:15 CEST 2008
Stefan Behnel wrote:
>
> You are deliberately lying to lxml, and still expect it to be so kind to do
> the right thing regardless?
>
Well, I didn't realize I was lying. :-(
> It sounds to me like the misunderstanding here is largely based on what the
> "base URL" of a document is. It's the URL that defines the origin of the
> document. Assuming that you will get the same document when you re-read its
> URL is not such a stupid idea, IMHO. Otherwise, the XSLT processor would have
> to re-parse a document each time it encounters a document() reference. That
> would really hurt performance.
>
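The "base URL" described above is the origin that lxml records for a document when it parses it. The following minimal sketch shows where that value ends up. It assumes `etree.fromstring()` as the parsing entry point, and the URL is a made-up example rather than one from this thread.

```python
from lxml import etree

XSLT_TEXT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><out/></xsl:template>
</xsl:stylesheet>"""

# Parse the same bytes twice: once with an explicit base URL, once without.
# "http://example.com/style.xsl" is a hypothetical URL chosen for illustration.
with_url = etree.fromstring(XSLT_TEXT, base_url="http://example.com/style.xsl")
without_url = etree.fromstring(XSLT_TEXT)

# docinfo.URL reports the origin lxml recorded for each parsed document.
print(with_url.getroottree().docinfo.URL)
print(without_url.getroottree().docinfo.URL)
```

lxml keeps whatever URL the caller passes in as the document's identity. It does not check that the URL actually serves these bytes.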
I agree with what you say.
However, it's surprising to find that document('') is affected this way.
document('') is expected to always mean "the current stylesheet", no
matter which base URL the stylesheet was parsed with.
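The following sketch checks the "current stylesheet" expectation in isolation. The stylesheet is parsed from a string with no base URL, and it counts its own templates through document(''). It assumes a current lxml release, and the 2008 version discussed here may behave differently.

```python
from lxml import etree

# The stylesheet reads itself via document('') and counts its xsl:template elements.
SELF_REFERENCING = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <n><xsl:value-of select="count(document('')//xsl:template)"/></n>
  </xsl:template>
</xsl:stylesheet>"""

# Parsed from a string without base_url, so no caller-supplied URL is involved.
transform = etree.XSLT(etree.XML(SELF_REFERENCING))
result = transform(etree.XML("<a/>"))
print(result.getroot().text)
```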
Could this be improved by having etree.XSLT attach the stylesheet doc to
the returned stylesheet object, or is this too hard and tangled up
inside libxslt?
Is there any documentation on the internal URL caching mechanism? Is the
"cache" shared between parsers? Between threads?
If I use fromstring(..., base_url="xyz") somewhere, and a stylesheet
parsed by a different parser then calls document('xyz'), will my resolver
get called, or will the document created by fromstring() be used instead?
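One way to probe this empirically is to register a resolver that logs every URL lxml asks it for. The sketch below uses lxml's documented `etree.Resolver` / `parser.resolvers.add()` hooks. The class name `LoggingResolver` and the URL `demo:lookup` are hypothetical, chosen only for illustration.

```python
from lxml import etree

class LoggingResolver(etree.Resolver):
    """Hypothetical helper: records every URL lxml asks it to resolve."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def resolve(self, url, public_id, context):
        self.seen.append(url)
        if url == "demo:lookup":
            # Serve a tiny in-memory document instead of loading anything.
            return self.resolve_string("<data>from resolver</data>", context)
        return None  # let lxml fall back to its default loading

STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="document('demo:lookup')/data"/></out>
  </xsl:template>
</xsl:stylesheet>"""

resolver = LoggingResolver()
parser = etree.XMLParser()
parser.resolvers.add(resolver)  # resolvers travel with the parser that parsed the stylesheet
transform = etree.XSLT(etree.XML(STYLESHEET, parser))

result = transform(etree.XML("<a/>"))
print(result.getroot().text)
print(resolver.seen)
```

You can run this per request, with and without another document already parsed under the same URL. It shows whether the resolver is consulted or a cached document is reused. It does not, by itself, document lxml's caching rules.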
How long are documents and their URLs "cached"?
My WSGI code is generating stylesheets "on the fly" based on web
requests, so I need to know more about the implementation details of the
URL/document caching mechanism.
Thanks
--
Brad Clements, bkc at murkworks.com (315)268-1000
http://www.murkworks.com
AOL-IM: BKClements