[lxml-dev] Fwd: News flash: Python possibly guilty in excessive DTD traffic
Stefan Behnel
stefan_ml at behnel.de
Sat Feb 9 09:05:56 CET 2008
Hi Sidnei,
Sidnei da Silva wrote:
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
> Does any of that apply to lxml?
I don't think so. The article is about DTD loading through urllib, whereas
lxml leaves that to libxml2's parser.
> I suppose lxml supports dtd catalogs?
Yes, libxml2 has catalog support (although you can compile that out), so it
normally treats network access as a last resort when resolving external
entities.
> Does it cache dtds in any way?
There is no internal document caching (except for repeated access to the same
document during a single operation, e.g. in XSLT). If you do not provide
catalogs on your system, that's your own 'decision'. You can still write your
own caching resolver in that case, but I would consider catalogs the best
solution to this problem.
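
For illustration, a caching resolver could look roughly like the sketch
below. It builds on lxml's etree.Resolver. The class name CachingDTDResolver,
the fetch callable, the fetch_count counter and the example.invalid URL are
made up for this example, and fake_fetch stands in for a real download so the
sketch runs without network access.

```python
from lxml import etree


class CachingDTDResolver(etree.Resolver):
    """Resolve external entities from an in-memory cache.

    Each system URL is fetched at most once per resolver instance.
    """

    def __init__(self, fetch):
        super().__init__()
        self._fetch = fetch    # hypothetical callable: URL -> bytes
        self._cache = {}       # system URL -> DTD content
        self.fetch_count = 0   # number of cache misses, for demonstration

    def resolve(self, system_url, public_id, context):
        if system_url not in self._cache:
            self._cache[system_url] = self._fetch(system_url)
            self.fetch_count += 1
        # Hand the cached content to the parser as the resolved document.
        return self.resolve_string(self._cache[system_url], context)


def fake_fetch(url):
    # Stand-in for a real download (assumption: a real one would go here).
    return b'<!ELEMENT root (#PCDATA)>\n<!ENTITY greeting "hello">'


resolver = CachingDTDResolver(fake_fetch)
parser = etree.XMLParser(load_dtd=True, no_network=True)
parser.resolvers.add(resolver)

doc = (b'<!DOCTYPE root SYSTEM "http://example.invalid/root.dtd">'
       b'<root>&greeting;</root>')

# Parse the same document several times; only the first parse should
# trigger a fetch, the later ones are served from the cache.
for _ in range(3):
    root = etree.fromstring(doc, parser)

print(resolver.fetch_count, root.text)
```

The cache lives only as long as the resolver object, so this helps within one
process at best. Catalogs cover all programs on the machine, which is why I
would still prefer them.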
Stefan
> ---------- Forwarded message ----------
> From: Guido van Rossum <guido at python.org>
> Date: 2008/2/9
> Subject: [Web-SIG] Fwd: [Baypiggies] News flash: Python possibly
> guilty in excessive DTD traffic
> To: Web SIG <web-sig at python.org>
>
>
> ---------- Forwarded message ----------
> From: Keith Dart ♂ <keith at dartworks.biz>
> Date: Feb 8, 2008 8:03 PM
> Subject: [Baypiggies] News flash: Python possibly guilty in excessive
> DTD traffic
> To: baypiggies at python.org
>
>
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
>
> This is interesting. I've noticed that when you use Python's XML
> package in validating mode it does try to fetch the DTD. Be careful
> when you use that.