[lxml-dev] file descriptors leak since lxml 2.0.5 while resolving local DTD

Dmitri Fedoruk dfedoruk at gmail.com
Fri Jun 20 22:50:52 CEST 2008


Greetings,

I've been using 2.0 for a while, and today I decided to upgrade to the 
most recent release, 2.0.7.

I ran into a problem and, by bisecting the releases against the change 
log :), found that it first appears in 2.0.5. It involves the local file 
DTD resolver.

This issue originates in
http://article.gmane.org/gmane.comp.python.lxml.devel/3499

In some specific cases I have to load the DTD for parsing. Even if I 
load it from local disk and cache it, parsing takes up to 10 times 
longer (40ms instead of 4ms).

So I came up with the following (ugly) solution:

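from lxml import etree

class LocalDTDResolver(etree.Resolver):
    def __init__(self, conf):
        self.conf = conf      # directory that contains vxml.dtd
        self.cached = None    # result of the first resolve_filename() call

    def resolve(self, url, id, context):
        # Resolve the local DTD once, then hand back the cached result.
        if self.cached is None:
            self.cached = self.resolve_filename(self.conf + '/vxml.dtd', context)
        return self.cached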

class LxmlUser(...):
    # just the relevant snippets
    def __init__(...):
        # Plain parser: never loads the DTD.
        self.xmlParser = etree.XMLParser(no_network=True,
                                         resolve_entities=False,
                                         load_dtd=False)

        # Parser that loads the DTD through the local resolver.
        self.resolvingParser = etree.XMLParser(no_network=False,
                                               resolve_entities=False,
                                               load_dtd=True)
        self.resolvingParser.resolvers.add(LocalDTDResolver(local_path))

    def call_parser(self, replies):
        for data in replies:
            if need_resolve:
                parser = self.resolvingParser
            else:
                parser = self.xmlParser

            xmlres = etree.parse(StringIO.StringIO(data), parser)


Systems are FreeBSD 6.2/7.0,
lxml.etree:        (2, 0, 5, 0)
libxml used:       (2, 6, 30)
libxml compiled:   (2, 6, 30)
libxslt used:      (1, 1, 22)
libxslt compiled:  (1, 1, 22)

This code is run within mod_python3/apache2.2.8

Before 2.0.5 I had no problems when the resolvingParser was called. But 
from 2.0.5 onwards I get this:
# no call of resolving parser
[root@machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
kern.openfiles: 377
# after a single (!) call of resolving parser
[root@machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
kern.openfiles: 11439

And my local DTD file is opened about 11000 times (according to fstat 
and find -inode).
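
For anyone who wants to reproduce the count from inside the process 
rather than with fstat, here is a rough sketch. It is not part of lxml: 
count_open_fds_for is a hypothetical helper name, it assumes a POSIX 
system and Python 3 syntax, and the 65536 cap on the scan is an arbitrary 
choice of mine. It compares the (device, inode) pair of every open 
descriptor against that of the DTD file.

```python
import os
import resource

MAX_FDS_TO_SCAN = 65536  # arbitrary cap (assumption) to keep the scan cheap


def count_open_fds_for(path):
    """Count descriptors of this process that refer to the same file as path."""
    target = os.stat(path)
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY or soft_limit > MAX_FDS_TO_SCAN:
        soft_limit = MAX_FDS_TO_SCAN

    count = 0
    for fd in range(soft_limit):
        try:
            st = os.fstat(fd)
        except OSError:
            # Descriptor number is not open in this process.
            continue
        # Same device and inode means the same underlying file.
        if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
            count += 1
    return count
```

Calling count_open_fds_for(conf + '/vxml.dtd') before and after a single 
etree.parse() with the resolving parser should show whether the 
descriptors pile up in the parsing process itself.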

Am I doing something wrong in this code, or is it a bug?

Cheers,
Dmitri


