[lxml-dev] etree.parse hangs with a lot of parallel requests

Dmitri Fedoruk dfedoruk at gmail.com
Wed Apr 16 19:50:26 CEST 2008


Hi!

Speaking again about the issue with DTD loading, parsing etc.

>  >> It seems you only want to parse DTDs locally from disc, so setting
>  >> "no_network=True" (which is the default in lxml 2.0) should prevent
>  >> any accidental remote access.
>  >
>  > Eventually it turned out that I'm working fine without DTD. So,
>  > setting no_network = True and load_dtd = False really solved the
>  > problem.
>
> Hmm, do you really need to turn off DTD loading or is disabling network
>  access enough? I wouldn't expect loading the DTD from the disk cache to
>  take that much time (although, if you can live without it and time is
>  really critical, then it's obviously better to save that bit of time
>  also).

I was wrong - I do need the DTD to resolve entities correctly. Sometimes
the input contains HTML entities such as &nbsp; and the like. My DTD
includes all the required entities, but it is referenced by URL. And the
only way to deal with such an entity is to load the DTD, isn't it?

What options do I have besides switching the URL to a local path in
the SYSTEM identifier? Setting up a DTD catalog on every machine that
runs the application? The ideal option would be to tell the parser
"load the given DTD from a given location (i.e. disk) and use it from
now on for parsing all incoming data", but is that possible?
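
For illustration, one approach that might fit this wish is a custom lxml
resolver registered on a shared parser. The following is only a minimal
sketch, not a confirmed answer from this thread. The URL, the class name
LocalDTDResolver, the helper make_parser and the one-entity DTD text are
all hypothetical stand-ins; a real setup would read the DTD text from disk
once at startup.

```python
from lxml import etree

# Hypothetical URL that incoming documents name in their DOCTYPE.
DTD_URL = "http://example.com/dtd/entities.dtd"

# Stand-in for the real DTD; in practice, read this text from a local file.
DTD_TEXT = '<!ENTITY nbsp "&#160;">'


class LocalDTDResolver(etree.Resolver):
    """Answer requests for one known DTD URL from an in-memory copy."""

    def __init__(self, url, dtd_text):
        super().__init__()
        self.url = url
        self.dtd_text = dtd_text

    def resolve(self, system_url, public_id, context):
        if system_url == self.url:
            # Hand the cached DTD text to the parser; no network access.
            return self.resolve_string(self.dtd_text, context)
        # Returning None lets lxml fall back to its default lookup.
        return None


def make_parser(url, dtd_text):
    """Build a parser that loads the DTD through the resolver above."""
    parser = etree.XMLParser(load_dtd=True, no_network=True,
                             resolve_entities=True)
    parser.resolvers.add(LocalDTDResolver(url, dtd_text))
    return parser


# Create the parser once and reuse it for all incoming documents.
parser = make_parser(DTD_URL, DTD_TEXT)
doc = (b'<!DOCTYPE doc SYSTEM "http://example.com/dtd/entities.dtd">'
       b'<doc>a&nbsp;b</doc>')
root = etree.fromstring(doc, parser)
```

If keeping the DTD text in memory is not wanted, the resolver could
return self.resolve_filename(path, context) with a local path instead.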

Cheers,
Dmitri

