[lxml-dev] etree.parse hangs with a lot of parallel requests
Dmitri Fedoruk
dfedoruk at gmail.com
Tue Apr 1 16:13:03 CEST 2008
Hi all,
I'm using lxml-2.0_1 (I have not upgraded to the most recent
versions, as I have not noticed any features relevant to me),
libxml2-2.6.30, libxslt-1.1.22, on FreeBSD 6.2 and 7.0. The application
runs within mod_python / Apache 2.2.8.
My situation is pretty straightforward: fetch XML as plain text via
HTTP, parse it to get an etree object, then apply XSLT to get the
resulting HTML.
The code is the following:
self.xmlParser = etree.XMLParser(no_network=False, resolve_entities=False, load_dtd=True)
I use load_dtd=True because my input data sometimes contains HTML
entities. They are included in my DTD like this:
<!ENTITY % HTMLlat1 SYSTEM "xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol SYSTEM "xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial SYSTEM "xhtml-special.ent">
%HTMLspecial;
Eventually the code reaches the parse call:
...
xmlres = etree.parse( StringIO.StringIO( reply['data'] ), self.xmlParser )
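For reference, the whole fetch/parse/transform pipeline described above can be reduced to a minimal sketch along these lines. It is written for Python 3 (io.BytesIO instead of StringIO), and XML_DATA, XSLT_DATA and render() are hypothetical placeholders, not names from the real application; only the parser options are the ones from this post.

```python
import io

from lxml import etree

# Parser options as given in the post.
parser = etree.XMLParser(no_network=False, resolve_entities=False, load_dtd=True)

# Hypothetical stand-ins for the HTTP reply body and the stylesheet.
XML_DATA = b"<doc><title>Hello</title></doc>"
XSLT_DATA = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/doc">
    <html><body><h1><xsl:value-of select="title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>"""


def render(xml_bytes, transform, xml_parser):
    """Parse the XML bytes, apply the compiled XSLT, return the HTML as text."""
    tree = etree.parse(io.BytesIO(xml_bytes), xml_parser)
    return str(transform(tree))


# Compile the stylesheet once and reuse it for every request.
transform = etree.XSLT(etree.XML(XSLT_DATA))
html = render(XML_DATA, transform, parser)
```

The sample document has no DOCTYPE, so it does not exercise the DTD and entity loading that the real input triggers.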
And here I have serious problems. Parsing usually takes up to 100 ms
(and even that is critical for me). But sometimes a single parse takes
3, 5 or even 60 seconds (!). This happens under heavy load
(~20 simultaneous parsings/transformations per second).
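To put numbers on these spikes outside of Apache, the per-call latency could be measured with a small harness like the sketch below. It uses only the standard library; measure(), timed() and summarize() are hypothetical helper names, and the fn argument stands for whatever callable wraps the etree.parse call. A thread pool is only an approximation of the mod_python setup, so the results may differ from production.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn, payload):
    """Return the wall-clock seconds taken by one call to fn(payload)."""
    start = time.perf_counter()
    fn(payload)
    return time.perf_counter() - start


def measure(fn, payloads, workers=20):
    """Run fn over all payloads on `workers` threads; return sorted durations."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        durations = list(pool.map(lambda p: timed(fn, p), payloads))
    return sorted(durations)


def summarize(durations):
    """Reduce a sorted list of durations to count, median and maximum."""
    return {
        "count": len(durations),
        "median": durations[len(durations) // 2],
        "max": durations[-1],
    }
```

A call such as summarize(measure(parse_one, replies)) would then show whether the maximum drifts far from the median as the number of workers grows, where parse_one and replies are placeholders for the parse wrapper and a list of fetched documents.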
So, I have several questions:
1) What am I doing wrong?
2) Is there any way to limit the runtime of etree.parse? Is there
any way to kill a thread, maybe? I cannot afford to wait even 150 ms,
let alone 1 second or more.
Any help would be appreciated!
Cheers,
Dmitri