[lxml-dev] etree.parse hangs with a lot of parallel requests

Stefan Behnel stefan_ml at behnel.de
Wed Apr 9 09:46:17 CEST 2008


>> It seems you only want to parse DTDs locally from disc, so setting
>> "no_network=True" (which is the default in lxml 2.0) should prevent any
>> accidental remote access.
>
> Eventually it turned out that I'm working fine without DTD. So,
> setting no_network = True and load_dtd = False really solved the problem.

Hmm, do you really need to turn off DTD loading, or is disabling network
access enough? I wouldn't expect loading the DTD from the disk cache to
take that much time (although, if you can live without it and time is
really critical, then it's obviously better to save that bit of time
as well).
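
For reference, a minimal sketch of the parser setup discussed here. The
sample document and its DTD URL are made up for illustration; only the
"no_network" and "load_dtd" options come from this thread.

```python
from io import BytesIO
from lxml import etree

# Parser that never goes to the network and does not load the DTD at all.
parser = etree.XMLParser(no_network=True, load_dtd=False)

# Hypothetical input: the DOCTYPE names a remote DTD (assumed URL),
# which is not fetched with the settings above.
xml = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE doc SYSTEM "http://example.invalid/sample.dtd">\n'
    b'<doc><item>hello</item></doc>'
)

tree = etree.parse(BytesIO(xml), parser)
root = tree.getroot()
print(root.tag, root[0].text)
```

If the DTD is needed after all (e.g. for entity definitions), setting
load_dtd=True while keeping no_network=True should restrict lookups to
local files and catalogs.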


> The parsing time is almost insignificant now.

Shameless plug:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/


> If you're interested, here it goes:
> http://beta.rambler.ru/srch?query=python+lxml&searchtype=web
> This is a search engine frontend, xml\xslt based, everything is
> carried via lxml.

Nice. That's a meta search engine, right?

Will that site stay online? I.e. does it make sense to link to it from our
"who uses lxml" FAQ entry?

http://codespeak.net/lxml/FAQ.html#who-uses-lxml

Stefan




