[lxml-dev] Fwd: News flash: Python possibly guilty in excessive DTD traffic

jholg at gmx.de jholg at gmx.de
Tue Feb 12 08:50:16 CET 2008


Hi,


> Secondly, lxml 2.0 does not load referenced network resources by default.
> While it loads documents that you explicitly ask it to download by parsing
> from a URL, you will also have to explicitly tell it to enable network
> access for referenced resources like DTDs, schemas and the like, again, by
> configuring a parser.
> 
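
For reference, a minimal sketch of what "configuring a parser" could look like here, assuming the no_network keyword of etree.XMLParser is the switch meant above. The sample document and variable names are made up for illustration, and nothing is actually fetched in this snippet:

```python
from lxml import etree

# A parser created without arguments keeps network access for referenced
# resources switched off (no_network defaults to True).
default_parser = etree.XMLParser()

# Explicitly allow this parser to fetch referenced resources such as DTDs.
# load_dtd only matters for documents that carry a DOCTYPE declaration.
network_parser = etree.XMLParser(no_network=False, load_dtd=True)

# Hypothetical sample document; it references nothing external.
doc = etree.fromstring(b"<root><child/></root>", network_parser)
```

Passing the parser explicitly keeps the setting local to one call instead of changing a global default.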

A question on this:

I don't see any problems when parsing a schema over the network that
includes other schemas:

>>> print etree.__version__
2.0.0-51192
>>> root = objectify.parse("http://adevp02:8080/accountSummary-1.2.xsd").getroot()
>>> schema = etree.XMLSchema(root)
>>>
My simple HTTP server logs this:

adevp01.ae.hz.lbbw.sko.de - - [12/Feb/2008 08:49:19] "GET /accountSummary-1.2.xsd HTTP/1.0" 200 -
adevp01.ae.hz.lbbw.sko.de - - [12/Feb/2008 08:49:28] "GET /iso3currency.xsd HTTP/1.0" 200 -
adevp01.ae.hz.lbbw.sko.de - - [12/Feb/2008 08:49:28] "GET /iso3currency-1.0.xsd HTTP/1.0" 200 -

The first GET is the parse operation; the second and third GETs are the
"schemafying" of the parsed doc, i.e. the etree.XMLSchema(root) call.
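
The split between the two steps can be reproduced without a web server. The following is only a sketch under assumptions: local temporary files stand in for the HTTP URLs, and the file names, the "currency" type and the "amount" element are invented for illustration. It shows that parsing reads only the main schema, while the include is resolved when the XMLSchema object is built:

```python
import os
import tempfile

from lxml import etree

# Hypothetical included schema: a three-letter currency code type.
INCLUDED = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="currency">
    <xs:restriction base="xs:string">
      <xs:length value="3"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# Hypothetical main schema that pulls in the one above via xs:include.
MAIN = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="currency.xsd"/>
  <xs:element name="amount" type="currency"/>
</xs:schema>"""

workdir = tempfile.mkdtemp()
main_path = os.path.join(workdir, "main.xsd")
with open(os.path.join(workdir, "currency.xsd"), "wb") as f:
    f.write(INCLUDED)
with open(main_path, "wb") as f:
    f.write(MAIN)

# Step 1: parsing reads main.xsd only; xs:include is just an element so far.
root = etree.parse(main_path).getroot()

# Step 2: building the schema resolves the include relative to main.xsd.
schema = etree.XMLSchema(root)

# The "currency" type exists only in the included file, so successful
# validation means the include was loaded in step 2.
ok = schema.validate(etree.fromstring(b"<amount>EUR</amount>"))
bad = schema.validate(etree.fromstring(b"<amount>EURO</amount>"))
```

This says nothing about whether no_network is honoured for HTTP includes; it only illustrates which call triggers the extra loads.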

Now, what puzzles me is that I never set no_network to False.

Here's how I initialize lxml: 

from lxml import etree, objectify

def _register():
    """Register lxml objectify module with pytaf standard settings.

    Need not be called explicitly when importing xmsg from a pytaf
    installation, as this is done on first xmsg module import.
    """
    # set a default parser that removes whitespace in mixed-content elements
    parser = etree.XMLParser(remove_blank_text=True)

    # enable ns/tag-based lookup that falls back on pytype/xsi:type/guess-lookup
    lookup = etree.ElementNamespaceClassLookup(
        objectify.ObjectifyElementClassLookup())
    parser.setElementClassLookup(lookup)
    # set our parser as objectify default parser
    objectify.setDefaultParser(parser)
    # Set our parser as etree default parser, too. Otherwise etree.Element()
    # returns etree._Element instead of ObjectifiedElements
    etree.setDefaultParser(parser)

    # enable recursive pretty-printing of ObjectifiedElements
    objectify.enableRecursiveStr()
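
To show what the parser set up in _register() does in isolation, here is a small sketch. It is an illustration under assumptions, not the pytaf code: it uses the underscore spelling set_element_class_lookup found in current lxml releases instead of the camelCase name above, it passes the parser explicitly rather than installing it as a global default, and the sample XML is made up:

```python
from lxml import etree, objectify

# Same parser options as in _register(): drop whitespace-only text nodes.
parser = etree.XMLParser(remove_blank_text=True)

# Namespace/tag-based lookup that falls back on objectify's own lookup.
lookup = etree.ElementNamespaceClassLookup(
    objectify.ObjectifyElementClassLookup())
parser.set_element_class_lookup(lookup)

# Hypothetical sample document with indentation between the elements.
root = objectify.fromstring(b"<root>\n  <amount>3</amount>\n</root>", parser)

# No class is registered for this tag, so the fallback lookup applies and
# the leaf element is exposed as a number-like objectify element.
total = root.amount + 1
```

Note that no_network is never mentioned in this setup, so the parser keeps whatever default lxml applies.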

So why are the included schemas fetched over the network anyway?

Holger 

