[lxml-dev] lxml is loading dtd from w3.org but I don't want it to

Brad Clements bkc at murkworks.com
Sun Nov 23 02:28:11 CET 2008


I cannot seem to stop lxml from loading the DTD from w3.org when transforming a file.

I suppose I am doing something wrong, but I can't see what that could be.

I am using lxml inside a WSGI application, and every incoming request
causes my WSGI server to go back to w3.org to fetch the DTD (I am not
using local catalogs).

I am on Ubuntu 7.xx (can't recall exactly; two releases old), x86_64, using Python 2.5:

>>> sys.version
'2.5.1 (r251:54863, Mar  7 2008, 03:39:23) \n[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)]'

>>> lxml.etree.__version__
u'2.1.3'

>>> lxml.etree.LIBXSLT_COMPILED_VERSION
(1, 1, 21)
>>> lxml.etree.LIBXML_COMPILED_VERSION
(2, 6, 30)


My code looks like this (I added all the =False keywords to see if they
helped; they did not):

    from lxml import etree

    parser = etree.XMLParser(load_dtd=False, attribute_defaults=False,
                             dtd_validation=False)
    # Resolver is an application-defined class wrapping xml_src_object.resolve
    parser.resolvers.add(Resolver(resolver=xml_src_object.resolve))

    xml_doc = etree.fromstring(xml_src_object.get_source(), parser,
                               base_url=document_uri)
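To see exactly which URLs libxml2 asks for, and to keep any of them from reaching w3.org, one option is a resolver that records every request and answers http(s) URLs locally. This is a minimal sketch for current Python 3 and lxml, not the Python 2.5 / lxml 2.1.3 setup above. RecordingOfflineResolver, parse_with, the example.invalid URL and the tiny DTD are made up for illustration; only the etree.Resolver hooks (resolve, resolve_string, resolve_empty) and the XMLParser keywords come from lxml itself.

```python
from lxml import etree


class RecordingOfflineResolver(etree.Resolver):
    """Hypothetical diagnostic resolver (not part of lxml): records every
    external resource libxml2 requests and keeps http(s) fetches off the
    network by answering from local copies or with an empty document."""

    def __init__(self, local_copies=None):
        super().__init__()
        self.requested = []  # every URL libxml2 tried to load, in order
        self.local_copies = dict(local_copies or {})

    def resolve(self, url, pubid, context):
        self.requested.append(url)
        if url in self.local_copies:
            return self.resolve_string(self.local_copies[url], context)
        if url and url.startswith(("http://", "https://")):
            return self.resolve_empty(context)
        return None  # let lxml/libxml2 handle anything else as usual


# Made-up DTD location and content, used only for this demonstration.
DTD_URL = "http://example.invalid/demo.dtd"
DTD_TEXT = '<!ELEMENT root EMPTY><!ATTLIST root flavour CDATA "vanilla">'
DOC = '<!DOCTYPE root SYSTEM "%s"><root/>' % DTD_URL


def parse_with(**parser_options):
    """Parse DOC with the given XMLParser options; return (root, urls seen)."""
    resolver = RecordingOfflineResolver({DTD_URL: DTD_TEXT})
    parser = etree.XMLParser(**parser_options)
    parser.resolvers.add(resolver)
    root = etree.fromstring(DOC, parser)
    return root, resolver.requested


# DTD loading switched off: the resolver should never be consulted.
root_off, seen_off = parse_with(load_dtd=False)
# Attribute defaults need the DTD, so libxml2 has to ask for it.
root_on, seen_on = parse_with(attribute_defaults=True)

print(seen_off, root_off.get("flavour"))
print(seen_on, root_on.get("flavour"))
```

Registering a resolver like this on the real parser would show whether anything still requests the DTD while load_dtd=False is set, and which URL it asks for.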

Also, the XML or stylesheet source might be loaded this way:

    parser = etree.XMLParser(load_dtd=False, attribute_defaults=False,
                             dtd_validation=False, no_network=True)
    stylesheet_doc = etree.parse(xslt_src_object, parser)

where xslt_src_object is a string containing the path of the file to parse.
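Another place a fetch could come from is the transformation itself, since an XSLT run can open further resources (for example through document()). A minimal sketch, assuming current Python 3 and lxml: it parses the stylesheet by file path with the same locked-down parser keywords as above and also passes an etree.XSLTAccessControl that denies network reads and writes to libxslt. The stylesheet, the temporary file and the function names are made up for illustration.

```python
import os
import tempfile

from lxml import etree


def make_offline_parser():
    # Same keywords as the parser in the post above.
    return etree.XMLParser(load_dtd=False, attribute_defaults=False,
                           dtd_validation=False, no_network=True)


# Hypothetical stylesheet: copies /doc/title into an <out> element.
XSLT_TEXT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="/doc/title"/></out>
  </xsl:template>
</xsl:stylesheet>"""


def transform_offline(xml_bytes, stylesheet_path):
    parser = make_offline_parser()
    stylesheet_doc = etree.parse(stylesheet_path, parser)  # parse by path
    # Deny network access to libxslt for this stylesheet.
    access = etree.XSLTAccessControl(read_network=False, write_network=False)
    transform = etree.XSLT(stylesheet_doc, access_control=access)
    xml_doc = etree.fromstring(xml_bytes, parser)
    return transform(xml_doc)


# Write the stylesheet to a temporary file so it is loaded by path.
with tempfile.NamedTemporaryFile("wb", suffix=".xsl", delete=False) as fh:
    fh.write(XSLT_TEXT)
    path = fh.name
try:
    result = transform_offline(b"<doc><title>hello</title></doc>", path)
finally:
    os.remove(path)

print(str(result))
```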


I was previously running an older lxml, and it was also downloading
catalogs; I don't know which version that was, sorry. I upgraded to 2.1.3
today.

If I run the same transform with xsltproc, it is fast and does not
download the DTD.

I have been working on this for a couple of hours, so I may well have
made a mistake somewhere. However, this is code I've been using for more
than a year, and I very much doubt it has always been downloading the DTD.

I think I did an apt-get update/upgrade a few weeks back, so I may have
picked up a newer libxml2. Perhaps it isn't honoring load_dtd=False?
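One quick way to test the apt-get theory: lxml reports both the libxml2/libxslt versions it was compiled against (the *_COMPILED_VERSION values shown above) and the versions actually loaded at runtime (LIBXML_VERSION, LIBXSLT_VERSION). A small sketch for current Python 3 that prints any mismatch; the helper name libxml_mismatches is made up.

```python
from lxml import etree


def libxml_mismatches():
    """Return (library, compiled, runtime) for each library whose runtime
    version differs from the version lxml was built against."""
    pairs = [
        ("libxml2", etree.LIBXML_COMPILED_VERSION, etree.LIBXML_VERSION),
        ("libxslt", etree.LIBXSLT_COMPILED_VERSION, etree.LIBXSLT_VERSION),
    ]
    return [(name, built, running)
            for name, built, running in pairs if built != running]


print("lxml", etree.LXML_VERSION)
for name, built, running in libxml_mismatches():
    print(f"{name}: compiled against {built}, running {running}")
```

An empty result means the shared libraries in use match the ones lxml was built with, which would make a silent libxml2 upgrade less likely as the cause.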

Any ideas?


-- 
Brad Clements,                bkc at murkworks.com    (315)268-1000
http://www.murkworks.com                          
AOL-IM: BKClements


