[lxml-dev] etree.parse hangs with a lot of parallel requests
Stefan Behnel
stefan_ml at behnel.de
Sun Apr 6 16:26:59 CEST 2008
Hi,
Dmitri Fedoruk wrote:
> The code is the following:
> self.xmlParser = etree.XMLParser(no_network=False, resolve_entities=False,
>                                  load_dtd=True)
>
> I use load_dtd=True as sometimes I encounter html entities in my input
> data. They are included in my dtd in this way:
> <!ENTITY % HTMLlat1 SYSTEM "xhtml-lat1.ent">
> %HTMLlat1;
>
> <!ENTITY % HTMLsymbol SYSTEM "xhtml-symbol.ent">
> %HTMLsymbol;
>
> <!ENTITY % HTMLspecial SYSTEM "xhtml-special.ent">
> %HTMLspecial;
>
> Then eventually it comes up to
> ...
> xmlres = etree.parse( StringIO.StringIO( reply['data'] ), self.xmlParser )
>
> And here I have serious problems. Parsing usually takes up to 100
> ms (even this is critical for me). But sometimes it takes 3, 5 or
> even 60 seconds (!). This happens under heavy load (~20 simultaneous
> parse/transform operations per second).
>
> So, I have several questions:
> 1) What am I doing wrong?
> 2) Is there any way to limit the runtime of the etree.parse? Is there
> any way to kill a thread maybe? I can not afford to wait even 150 ms,
> to say nothing about 1 second and more.
It seems you only want to load DTDs locally from disk, so setting
"no_network=True" (which is the default in lxml 2.0) instead of the
"no_network=False" in your code should prevent any accidental remote
access.
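
As a minimal sketch of what I mean, here is your parser setup with only
that one option changed. The inline document is a made-up stand-in for
reply['data'], and BytesIO is assumed in place of StringIO.StringIO so
that the snippet also runs on current Python versions:

```python
from io import BytesIO

from lxml import etree

# no_network=True prevents network access when looking up external
# documents (DTDs, entity files); only local files can be loaded.
parser = etree.XMLParser(no_network=True,
                         resolve_entities=False,
                         load_dtd=True)

# Hypothetical stand-in for reply['data']; it has no DOCTYPE, so no DTD
# lookup happens here at all.
data = b"<root><item>text</item></root>"

tree = etree.parse(BytesIO(data), parser)
root_tag = tree.getroot().tag
print(root_tag)
```

With this setting, a DTD or entity file that can only be reached over the
network should make the lookup fail instead of waiting on a remote
server.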
Does that help?
Stefan