[lxml-dev] Overriding whitespace normalization under XSLT

Stefan Behnel stefan_ml at behnel.de
Sat Mar 17 06:33:10 CET 2007


Hi,

CC, hum? We should start writing up a hall of fame of organisations using lxml. :)


Nathan R. Yergler wrote:
> I'm not sure this is really the right list for this, but we're using
> lxml, and I'm hoping that someone will have the requisite expertise to
> enlighten me.

It looks a bit more like an XSLT question to me.


> Creative Commons has an XSLT transformation we use for generating
> license engine output; it's divided into a few templates, and one
> final template that assembles the pieces.  The problem is with respect
> to the "img" tag in the human readable copy-n-paste output.

You did not say what you are actually generating. XHTML? HTML? I assume it's
unindented XHTML from your problem statement. Would generating indented HTML
solve it?

  <xsl:output method="html" indent="yes" />


> The part that's problematic is this line:
> 
> 		<a rel="license" href="{$license-uri}"><img alt="Creative Commons
> License" style="border-width:0" src="{$license-button}" /></a><br/>
> 
> Note that there is a space between the closing quote of the src
> attribute on the image tag and the "/>" closing bracket.  When we
> process the transform, we consistently end up with
> 
> 		<a rel="license" href="..."><img alt="Creative Commons License"
> style="border-width:0" src="..."/></a><br/>
> 
> (note the space has been removed)

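That's perfectly well-formed XHTML, and to an XML parser both forms are
equivalent. But rumour has it that some older browsers that parse it as HTML
can't handle the form without the space (the HTML compatibility guidelines in
XHTML 1.0 recommend the space for that reason). It's just not old-style
HTML-ish enough.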

If you feel like it, you can also set UTF-8 as the output encoding in the
stylesheet, serialise the result to a string and do the replacement by hand
('/>' -> ' />') before sending it somewhere else. That's a hands-on approach,
but if you want to generate backwards-compatible XHTML, that's one way to get
closer.
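
A minimal sketch of that by-hand replacement, in case it helps. The function
names are made up for illustration, and the regex is deliberately naive: it
rewrites every '/>' that is not already preceded by whitespace, so it would
also touch one inside a comment or CDATA section. The second function assumes
lxml is installed and that stylesheet_doc and source_doc are already parsed
trees.

```python
import re


def add_space_before_empty_tag_close(serialized):
    """Rewrite '<br/>' style empty tags as '<br />'.

    Naive on purpose: it touches every '/>' that is not already preceded
    by whitespace, including any inside comments or CDATA sections.
    """
    return re.sub(r"(?<!\s)/>", " />", serialized)


def transform_to_compat_xhtml(stylesheet_doc, source_doc):
    """Apply an XSLT and post-process the result (hypothetical helper)."""
    # Imported here so the string helper above can be used without lxml.
    from lxml import etree

    transform = etree.XSLT(stylesheet_doc)
    result = transform(source_doc)
    # str() serialises the result tree as directed by <xsl:output>.
    return add_space_before_empty_tag_close(str(result))


print(add_space_before_empty_tag_close('<img src="button.png"/><br/>'))
# prints: <img src="button.png" /><br />
```

Tags that already have the space are left alone, so running the helper twice
on the same string does no harm.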

Stefan

