[lxml-dev] network access in lxml API
Stefan Behnel
stefan_ml at behnel.de
Tue Feb 12 14:32:50 CET 2008
Hi Holger,
>> Secondly, lxml 2.0 does not load referenced network resources by default.
>> While it loads documents that you explicitly ask it to download by parsing
>> from a URL, you will also have to explicitly tell it to enable network
>> access for referenced resources like DTDs, schemas and the like, again, by
>> configuring a parser.
>
> A question on this:
>
> I don't see any problems when network-parsing a schema that includes other
> schemas:
>
> >>> print etree.__version__
> 2.0.0-51192
> >>> root = objectify.parse("http://adevp02:8080/accountSummary-1.2.xsd").getroot()
> >>> schema = etree.XMLSchema(root)
Hmm, ok, I was referring to the parsers. Schema imports will always work -
and I see no reason to disable them, as this would break schema handling.
If you want to use a schema from the network that uses imports (or one
that explicitly imports from a URL), be prepared for network access. Note
that this is still different from *parsing* the schema document. You have
to actually create an XMLSchema() object, which I find a pretty clear
indication that you want to use the schema, thus requiring the imports to
be resolved.
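
To make the two steps concrete, here is a minimal sketch. It uses a small
inline schema without imports (an assumption made so that it runs without
network access) and a made-up element name, "account". The parser options
shown, load_dtd and no_network, are regular XMLParser keyword arguments;
everything else is illustrative only.

```python
from lxml import etree

# Step 1: parsing. A parser only loads referenced resources over the
# network when told to; no_network defaults to True.
network_parser = etree.XMLParser(load_dtd=True, no_network=False)

# Parsing the schema document just builds a tree. Nothing is compiled
# and no imports are resolved at this point.
schema_doc = etree.fromstring(b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="account" type="xs:string"/>
</xs:schema>""")

# Step 2: creating the XMLSchema object compiles the schema. This is the
# step at which xs:import / xs:include references would be resolved.
schema = etree.XMLSchema(schema_doc)

# The explicitly configured parser is used for the instance document.
doc = etree.fromstring(b"<account>abc</account>", network_parser)
is_valid = schema.validate(doc)
```

With a schema that does import from a URL, the second step is where you
should expect the network to be hit, regardless of the parser options.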
But now that you mention it, I noticed that XSLT allows network access by
default. This means that you can use imports, but also that you can do
"document('http://evilsite.com')" in a stylesheet. I'm not sure into which
category this falls, the schema-handling or the parsers, but it looks more
like something that should not be restricted by default, as it's explicit
in the stylesheet. In the parser case, you'd have to explicitly enable
external loading anyway, so you can enable network access right in the
same line of code.
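
If you do want to restrict a stylesheet, something along these lines
should work, assuming your lxml version provides XSLTAccessControl and
the access_control argument of XSLT(). The stylesheet and the input
document are made up for illustration and do not call document().

```python
from lxml import etree

stylesheet = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="/root"/></out>
  </xsl:template>
</xsl:stylesheet>""")

# Deny network reads and writes for this stylesheet only. File access
# keeps its default (allowed) setting.
no_network = etree.XSLTAccessControl(read_network=False, write_network=False)

transform = etree.XSLT(stylesheet, access_control=no_network)
result = transform(etree.XML(b"<root>hello</root>"))
```

A stylesheet run through this transform that tries
"document('http://...')" should then be refused instead of fetching the
URL. The default, without access_control, stays permissive.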
So, to sum it up, I think it's ok the way it is now, and it's also easy to
use caching, just by not re-instantiating the schema/XSLT object too
often.
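
As a rough sketch of what I mean by caching: compile the schema once and
reuse the object for every document. The names ACCOUNT_SCHEMA and
validate_account are hypothetical, and the inline schema stands in for
one that would normally be loaded from a file or URL.

```python
from lxml import etree

# Compiled once at import time. Any imports the schema declares are
# resolved here, not on each validation.
ACCOUNT_SCHEMA = etree.XMLSchema(etree.XML(b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="account" type="xs:string"/>
</xs:schema>"""))


def validate_account(xml_bytes):
    """Validate a document against the already compiled schema."""
    return ACCOUNT_SCHEMA.validate(etree.fromstring(xml_bytes))
```

The same pattern applies to etree.XSLT objects: build the transform once
and call it for each input document.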
Stefan