[lxml-dev] etree.parse hangs with a lot of parallel requests

Dmitri Fedoruk dfedoruk at gmail.com
Tue Apr 1 16:13:03 CEST 2008


Hi all,

I'm currently using lxml-2.0_1 (I have not upgraded to the most recent
versions, as I have not noticed any features relevant to me) with
libxml2-2.6.30 and libxslt-1.1.22 on FreeBSD 6.2 and 7.0. The application
runs within mod_python / apache 2.2.8.

My situation is pretty straightforward: fetch xml as plain text via
http, parse it to get an etree object, then apply xslt to get the
resulting html.

The code is the following:
self.xmlParser = etree.XMLParser(no_network=False,
                                 resolve_entities=False,
                                 load_dtd=True)

I use load_dtd=True as sometimes I encounter html entities in my input
data. They are included in my dtd in this way:
<!ENTITY % HTMLlat1 SYSTEM "xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol SYSTEM "xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial SYSTEM "xhtml-special.ent">
%HTMLspecial;

Eventually it all comes down to
...
xmlres = etree.parse( StringIO.StringIO( reply['data'] ), self.xmlParser )
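
For anyone who wants to reproduce this, here is a minimal self-contained
sketch of the parse-then-transform step. It is an illustration, not my real
code: the sample document, the stylesheet and the render() helper are made
up, and it is written for a current Python 3 rather than the mod_python
setup described above.

```python
import io

from lxml import etree

# Hypothetical stand-ins for the real HTTP reply body and stylesheet.
SAMPLE_XML = b"<doc><title>Hello</title></doc>"
SAMPLE_XSLT = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/doc">
    <html><body><h1><xsl:value-of select="title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>
"""

# Same parser options as in the post. Module-level objects are used only
# to keep the sketch short.
parser = etree.XMLParser(no_network=False, resolve_entities=False,
                         load_dtd=True)
transform = etree.XSLT(etree.XML(SAMPLE_XSLT))


def render(data):
    """Parse raw XML bytes and return the transformed HTML as a string."""
    tree = etree.parse(io.BytesIO(data), parser)
    return str(transform(tree))


if __name__ == "__main__":
    print(render(SAMPLE_XML))
```

The sample document has no DOCTYPE, so load_dtd=True has nothing to load
here; the real input references the DTD with the entity files shown above.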

And here I have serious problems. Parsing usually takes up to 100 ms
(even this is a critical amount of time for me), but sometimes it takes
3, 5 or even 60 seconds (!). This happens under heavy load
(~20 simultaneous parsings/transformations per sec).
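
To show how such timings could be collected outside of Apache, here is a
rough sketch that calls a function from several threads and records how long
each call took. Everything in it is an assumption for illustration:
measure_concurrent() and placeholder_parse() are hypothetical names, the
placeholder only sleeps, and the real etree.parse call would have to be
substituted for it. It targets a current Python 3.

```python
import threading
import time


def measure_concurrent(func, workers=20, calls_per_worker=5):
    """Call func() from several threads; return all durations, sorted."""
    durations = []
    lock = threading.Lock()

    def worker():
        for _ in range(calls_per_worker):
            start = time.perf_counter()
            func()
            elapsed = time.perf_counter() - start
            with lock:  # protect the shared list
                durations.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(durations)


def placeholder_parse():
    """Hypothetical stand-in for the real parse call."""
    time.sleep(0.01)


if __name__ == "__main__":
    times = measure_concurrent(placeholder_parse)
    print("calls: %d" % len(times))
    print("median: %.1f ms" % (times[len(times) // 2] * 1000))
    print("worst:  %.1f ms" % (times[-1] * 1000))
```

Comparing the median with the worst case should show whether the slow calls
are rare outliers or whether everything slows down as the load grows.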

So, I have several questions:
1) What am I doing wrong?
2) Is there any way to limit the runtime of etree.parse? Is there
any way to kill a thread, maybe? I cannot afford to wait even 150 ms,
let alone 1 second or more.
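
To make question 2 concrete, this is the kind of deadline I mean, sketched
with the Python 3 standard library only. call_with_deadline() and the pool
size are hypothetical choices made for this example. As far as I know, a
Python thread cannot be killed from outside, so the sketch only stops
waiting for the result; the slow call itself keeps running in the
background until it returns.

```python
import concurrent.futures
import time

# Small shared pool; the size is an arbitrary choice for this sketch.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(func, timeout_s, *args):
    """Return func(*args), or None if no result arrived within timeout_s."""
    future = _pool.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread is NOT killed here. It keeps running (and
        # keeps occupying a pool slot) until func returns by itself.
        return None


def slow_call():
    """Hypothetical stand-in for a parse that takes too long."""
    time.sleep(0.3)
    return "late"


if __name__ == "__main__":
    print(call_with_deadline(len, 1.0, "abc"))
    print(call_with_deadline(slow_call, 0.05))
```

Returning None on a timeout is ambiguous if the wrapped function can itself
return None; raising an exception instead would avoid that. Either way, this
only bounds the time the caller waits, not the work done by the parser.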

Any help would be appreciated!
Cheers,
Dmitri

