[lxml-dev] problems with document(''), possibly thread related - LXML 'BUG'

Brad Clements bkc at murkworks.com
Thu Aug 14 21:07:15 CEST 2008


Stefan Behnel wrote:
>
> You are deliberately lying to lxml, and still expect it to be so kind to do
> the right thing regardless?
>   
Well, I didn't realize I was lying.. :-(

> It sounds to me like the misunderstanding here is largely based on what the
> "base URL" of a document is. It's the URL that defines the origin of the
> document. Assuming that you will get the same document when you re-read its
> URL is not that a stupid idea, IMHO. Otherwise, the XSLT processor would have
> to re-parse a document each time it encounters a document() reference. That
> would really hurt performance.
>   
I agree with what you say.

However, it's a surprise to find that document('') is affected this way.

document('') is expected to always mean "the current stylesheet", no
matter what URL you gave the stylesheet.
Could this be improved by having etree.XSLT attach the stylesheet doc to 
the returned stylesheet object, or is this too hard and tangled up 
inside libxslt?
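
To make that expectation concrete, here is a minimal sketch, assuming a
recent lxml release. The `cfg` namespace and the `greeting` element are made
up for illustration. The stylesheet is parsed from a string with no base URL
and reads a data element out of itself through document(''):

```python
from lxml import etree

# Stylesheet carrying its own lookup data in a (hypothetical) cfg namespace.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cfg="urn:example:cfg">
  <cfg:greeting>hello</cfg:greeting>
  <xsl:template match="/">
    <out>
      <xsl:value-of select="document('')/xsl:stylesheet/cfg:greeting"/>
    </out>
  </xsl:template>
</xsl:stylesheet>
"""

# No base_url is given, so no URL claims the stylesheet lives elsewhere.
transform = etree.XSLT(etree.fromstring(STYLESHEET))
result = transform(etree.XML("<root/>"))

root = result.getroot()
print(root.tag, root.text.strip())
```

If the stylesheet were instead parsed with a base_url that points at some
other document, document('') could no longer be relied on to mean "this
stylesheet", which is the surprise described above.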


Is there any documentation on the internal URL caching mechanism? Is the 
"cache" shared between parsers? Between threads?

If I call etree.fromstring(text, base_url="xyz") somewhere, and a stylesheet
parsed by a different parser then calls document('xyz'), will my resolver get
called, or will the document that was parsed by fromstring() be used instead?
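
One way to check this empirically is sketched below. It is only a probe under
stated assumptions, not a description of how lxml or libxslt cache documents.
The `RecordingResolver` class and the `memo:xyz` URL are made up for
illustration, and the outcome may depend on the lxml and libxslt versions in
use:

```python
from lxml import etree


class RecordingResolver(etree.Resolver):
    """Hypothetical helper: records every URL it is asked to resolve."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def resolve(self, url, pubid, context):
        self.seen.append(url)
        if url == "memo:xyz":
            return self.resolve_string("<data>from resolver</data>", context)
        return None  # let the default lookup handle anything else


# A document parsed from a string, by the default parser, under "memo:xyz".
other = etree.fromstring("<data>from fromstring</data>", base_url="memo:xyz")

STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="document('memo:xyz')/data"/></out>
  </xsl:template>
</xsl:stylesheet>
"""

# The stylesheet goes through a separate parser that carries the resolver.
resolver = RecordingResolver()
parser = etree.XMLParser()
parser.resolvers.add(resolver)
transform = etree.XSLT(etree.fromstring(STYLESHEET, parser))
result = transform(etree.XML("<root/>"))

# Which source answered document('memo:xyz'), and was the resolver asked?
print("resolver saw:", resolver.seen)
print("document('memo:xyz') yielded:", result.getroot().text)
```

Running the same probe from several threads, or reusing one XSLT object
across requests, would show whether the answer changes over time.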

How long are documents and their URLs "cached"?

My WSGI code is generating stylesheets "on the fly" based on web 
requests, so I need to know more about the implementation details of the 
URL/document caching mechanism.

Thanks




-- 
Brad Clements,                bkc at murkworks.com    (315)268-1000
http://www.murkworks.com                          
AOL-IM: BKClements


