[lxml-dev] document('') fixed
cazic at gmx.net
Fri Apr 21 16:06:21 CEST 2006
Hi,
> --- Original Message ---
> From: Stefan Behnel <behnel_ml at gkec.informatik.tu-darmstadt.de>
> To: cazic at gmx.net
> Cc: lxml-dev at codespeak.net
> Subject: Re: [lxml-dev] document('') fixed
> Date: Fri, 21 Apr 2006 14:39:53 +0200
[...]
> rather handle the lookup "manually"? That would require copying the document
> twice before the XSLT compilation, to use one copy for compilation and to
> store the other one. The doc loader would then return a copy of the second
> copy when the stylesheet URL is requested.
>
> Is that the correct approach? That would really make it a lot of deep copying.
> If this is really necessary, would you mind if I called this behaviour a bug
> in libxslt?
For whitespace-stripping see:
http://www.w3.org/TR/xslt#strip
or the XSLT 2.0 spec, which clarifies the intended behaviour
much better:
http://www.w3.org/TR/xslt20/#stylesheet-stripping
The elimination of xsl:text elements is specific to Libxslt,
but it is only an internal processing step, like the
pre-compilation of XPath expressions.
I learned that the spec of XSLT 2.0 clarifies the semantics
of the document() function (which, as I was told, was introduced
in an abandoned draft of XSLT 1.1 and never made it into the
recommendation):
"One effect of these rules is that unless XML entities or xml:base are used,
and provided that the base URI of the stylesheet module is known,
document("") refers to the document node of the containing stylesheet module
(the definitive rules are in [RFC3986]). The XML resource containing the
stylesheet module is processed exactly as if it were any other XML document,
for example there is no special recognition of xsl:text elements, and no
special treatment of comments and processing instructions."
(http://www.w3.org/TR/xslt20/#document)
So this mechanism relies on the base URI being known, and it
is not known if the stylesheet tree is constructed from an
in-memory string.
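To illustrate the base-URI point, here is a rough sketch using Python's standard urllib.parse.urljoin, which implements relative reference resolution along the lines of RFC 3986. The helper name resolve_document_arg and the file URL are made up for this example; this is not how lxml or Libxslt do it internally.

```python
from urllib.parse import urljoin, urlsplit


def resolve_document_arg(base_uri, href):
    """Resolve the argument of document() against the stylesheet's base URI.

    Hypothetical helper for illustration only. Returns None when the
    reference is relative and there is no base URI to resolve it against.
    """
    if base_uri:
        # An empty reference resolves to the base URI itself.
        return urljoin(base_uri, href)
    # Without a base URI only an absolute reference can be resolved.
    return href if urlsplit(href).scheme else None


# Stylesheet parsed from a file: the base URI is known (made-up path).
from_file = resolve_document_arg("file:///tmp/style.xsl", "")

# Stylesheet parsed from an in-memory string: no base URI at all.
from_string = resolve_document_arg(None, "")
```

With a known base URI the empty reference points back at the stylesheet file, while in the in-memory case there is nothing to resolve against.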
I haven't read RFC3986, but an interesting question for me
is whether the *string* containing the XML could be
treated as the document and be addressed/acquired via
the document("") function. So if you could tweak lxml to keep
a reference to that string, and feed Libxslt with it when
document("") is called, that would be a nice solution, I think.
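A minimal sketch of that idea, using only Python's standard xml.etree.ElementTree rather than lxml or Libxslt: the class StylesheetSource and its methods are hypothetical names, and the removal of xsl:text elements below merely imitates what a compilation step might do to its own tree.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


class StylesheetSource:
    """Hypothetical holder that keeps the original stylesheet string."""

    def __init__(self, xml_text):
        self.xml_text = xml_text

    def parse(self):
        # Every call parses the kept string again and returns a new tree.
        return ET.fromstring(self.xml_text)

    def load_document(self, href):
        # Only the document("") case is handled in this sketch.
        if href != "":
            raise LookupError("only document('') is handled here")
        return self.parse()


xsl_text = (
    '<xsl:stylesheet version="1.0" xmlns:xsl="%s">'
    '<xsl:template match="/"><xsl:text>hello</xsl:text></xsl:template>'
    '</xsl:stylesheet>' % XSL_NS
)
source = StylesheetSource(xsl_text)

# Imitate a compilation step that modifies its own copy of the tree.
compiled = source.parse()
for template in list(compiled.iter("{%s}template" % XSL_NS)):
    for text_el in template.findall("{%s}text" % XSL_NS):
        template.remove(text_el)

# document("") is served from the kept string, so it is unaffected.
fresh = source.load_document("")
```

The point of the design is that no deep copy of a tree is needed up front; the string is only parsed again if and when document("") is actually requested.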
Regards,
Kasimier