[lxml-dev] file descriptors leak since lxml 2.0.5 while resolving local DTD
Stefan Behnel
stefan_ml at behnel.de
Sat Jun 21 11:56:43 CEST 2008
Hi,
Dmitri Fedoruk wrote:
> I have a problem and, by binary search (based on the change log) :), I first
> found it in 2.0.5: it is the local file DTD resolver.
I'll take a look.
> This issue originates in
> http://article.gmane.org/gmane.comp.python.lxml.devel/3499
>
> Eventually I have to load the DTD in some specific cases for parsing. Even
> if I load it from local disk and cache it, parsing takes up to 10 times
> longer (40 ms instead of 4 ms).
>
> So, I came up with the following (ugly) solution:
>
> class LocalDTDResolver(etree.Resolver):
>     def __init__(self, conf):
>         self.conf = conf
>         self.cached = None
>     def resolve(self, url, id, context):
>         if not self.cached:
>             self.cached = self.resolve_filename(
>                 self.conf + '/vxml.dtd', context)
>         return self.cached
Not that ugly, but not very helpful either. You are caching the filename, not
the content. Check docloader.pxi to see how simple the machinery is here.
There isn't currently a way to return a parsed document from a resolver (and I
don't think libxml2 supports that), so I think the best you can do is to
return the content as a cached string, thus avoiding I/O but not the parse
overhead.
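A minimal sketch of that approach, assuming a resolver that reads the DTD from disk once and hands the cached bytes back through resolve_string(). The class name CachedDTDResolver, the dtd_path argument and the disk_reads counter are made up for illustration. Like the code quoted above, it answers every lookup with the same DTD.

```python
from lxml import etree


class CachedDTDResolver(etree.Resolver):
    """Hypothetical resolver: caches the DTD content, not the filename."""

    def __init__(self, dtd_path):
        super().__init__()
        self.dtd_path = dtd_path   # assumed location of the local DTD
        self._content = None       # DTD bytes, filled on first use
        self.disk_reads = 0        # illustration only: counts real file reads

    def resolve(self, system_url, public_id, context):
        if self._content is None:
            # First lookup: read the file once and keep the bytes in memory.
            with open(self.dtd_path, "rb") as f:
                self._content = f.read()
            self.disk_reads += 1
        # libxml2 still parses the DTD each time; only the file I/O is saved.
        return self.resolve_string(self._content, context)
```

It would be registered on a parser created with load_dtd=True via parser.resolvers.add(CachedDTDResolver(...)).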
> Systems are FreeBSD 6.2/7.0,
> lxml.etree: (2, 0, 5, 0)
> libxml used: (2, 6, 30)
> libxml compiled: (2, 6, 30)
> libxslt used: (1, 1, 22)
> libxslt compiled: (1, 1, 22)
>
> This code is run within mod_python3/apache2.2.8
Now that you mention it: are you using the single interpreter option in
mod_python or does it work without? I fixed a couple of threading things in
2.0.6, so that should now work without that work-around. But it's still
untested due to lack of feedback.
> Before 2.0.5 I had no problem when the resolvingParser was called. But
> since 2.0.5 I get this:
> # no call of resolving parser
> [root@machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
> kern.openfiles: 377
> # after a single (!) call of resolving parser
> [root@machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
> kern.openfiles: 11439
If you are really using the above code then it means that libxml2 is reading
the DTD internally. Maybe there's something more we have to clean up, or maybe
it's really a leak in libxml2. But the numbers you post here look very
unrealistic to me.
> And my local DTD file is opened about 11000 times (according to fstat
> and find -inode).
If you parse it once, libxml2 should open the DTD file once, and not more.
I'll look into that.
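Since kern.openfiles is a system-wide counter, a per-process count would help separate lxml/libxml2 from whatever else runs on the machine. A rough sketch, assuming it is enough to probe descriptor numbers up to a fixed cap. The names count_open_fds and max_fd are made up here and are not part of lxml or of any tool mentioned above.

```python
import os
import resource


def count_open_fds(max_fd=4096):
    """Count descriptors open in the current process (hypothetical helper).

    Probes fd numbers 0..limit-1 with os.fstat(); max_fd is an assumed cap
    so that a very large or unlimited RLIMIT_NOFILE does not make this slow.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft > max_fd:
        limit = max_fd
    else:
        limit = soft
    count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)      # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        count += 1
    return count
```

Calling it before and after a single parse inside the Apache process should show whether the DTD descriptors stay open there.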
Stefan