[lxml-dev] file descriptors leak since lxml 2.0.5 while resolving local DTD
Dmitri Fedoruk
dfedoruk at gmail.com
Fri Jun 20 22:50:52 CEST 2008
Greetings,
I've been using lxml 2.0 for a while, and today I decided to upgrade to
the most recent release, 2.0.7.
I ran into a problem and, by binary search (based on the change log) :),
found that it first appears in 2.0.5. It involves the local file DTD
resolver.
This issue originates in
http://article.gmane.org/gmane.comp.python.lxml.devel/3499
In some specific cases I have to load the DTD for parsing. Even if I
load it from local disk and cache it, parsing takes up to 10 times
longer (40ms instead of 4ms).
So I came up with the following (ugly) solution:
class LocalDTDResolver(etree.Resolver):
    def __init__(self, conf):
        self.conf = conf
        self.cached = None

    def resolve(self, url, id, context):
        if not self.cached:
            self.cached = self.resolve_filename(
                self.conf + '/vxml.dtd', context)
        return self.cached
class LxmlUser(...):
    # just the relevant snippets
    def __init__(...):
        self.xmlParser = etree.XMLParser(no_network=True,
            resolve_entities=False, load_dtd=False)
        self.resolvingParser = etree.XMLParser(no_network=False,
            resolve_entities=False, load_dtd=True)
        self.resolvingParser.resolvers.add(LocalDTDResolver(local_path))

    def call_parser(self, replies):
        for data in replies:
            if need_resolve:
                parser = self.resolvingParser
            else:
                parser = self.xmlParser
            xmlres = etree.parse(StringIO.StringIO(data), parser)
Systems are FreeBSD 6.2/7.0,
lxml.etree: (2, 0, 5, 0)
libxml used: (2, 6, 30)
libxml compiled: (2, 6, 30)
libxslt used: (1, 1, 22)
libxslt compiled: (1, 1, 22)
This code is run within mod_python3/apache2.2.8
Before 2.0.5 I had no problems when the resolvingParser was called.
Since 2.0.5, however, I see this:
# no call of resolving parser
[root at machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
kern.openfiles: 377
# after a single (!) call of resolving parser
[root at machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
kern.openfiles: 11439
And my local DTD file is opened about 11000 times (according to fstat
and find -inode).
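
A per-process count can complement the system-wide kern.openfiles
figure. The following is a minimal sketch, not part of lxml; the helper
name count_open_fds and the default limit of 1024 are assumptions chosen
for illustration:

```python
import os

def count_open_fds(limit=1024):
    """Count descriptors in [0, limit) that are open in this process.

    Hypothetical helper: probes each descriptor number with os.fstat(),
    which raises OSError for descriptors that are not open.
    """
    count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # not an open descriptor
        count += 1
    return count
```

Calling it before and after a single etree.parse() with the resolving
parser and comparing the two numbers should show whether the descriptors
are leaked by that process. Descriptors numbered at or above limit are
not counted, so the limit may need to be raised for a leak of this size.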
Am I doing something wrong by coding it this way, or is it a bug?
Cheers,
Dmitri