[lxml-dev] XML Documents & I18N (the way Cocoon does it)
Stefan Behnel
stefan_ml at behnel.de
Tue Apr 28 19:59:50 CEST 2009
Hi,
Alexis Georges wrote:
> I am maintaining a multilingual website which works with XML, XSLT to
> generate XHTML.
>
> I am working with Apache Cocoon (http://cocoon.apache.org/2.1/) using
> (among other things) their I18NTransformer. Basically I can use elements
> in the I18N (http://apache.org/cocoon/i18n/2.1) namespace, and then tell
> Cocoon to apply the I18NTransformer to the document; this replaces the
> I18N elements with a localized value (e.g. a formatted date/number, a
> translated label/attribute, etc...).
>
> I have been looking at lxml a little bit to see if I could move to a
> Python-based framework for the website. I am not quite sure how to go
> about the I18N part though.
>
> Using the Babel library (http://babel.edgewall.org/) along with request
> headers to generate localized data, I have everything I need. What is
> missing is the "parser" for the I18N elements. All I can think of right
> now is to implement a SAX parser, the way Cocoon does (in Java).
There is a SAX-like interface in lxml.etree, called "target parser".
However, if your documents fit into memory, using iterparse() is a lot
simpler (and likely not even much slower).
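
For comparison, here is a rough sketch of what the target parser interface looks like. It only collects the message keys found in i18n:text elements and does not rewrite the document, which is exactly the part that takes more work with a SAX-style approach. The class name I18nKeyCollector and the function collect_i18n_keys() are made up for this example, and the fallback import is only there so the sketch also runs where lxml is not installed (the standard library parser accepts the same kind of target object).

```python
try:
    from lxml import etree
except ImportError:  # the stdlib parser accepts the same target interface
    import xml.etree.ElementTree as etree

I18N_TEXT = "{http://apache.org/cocoon/i18n/2.1}text"


class I18nKeyCollector:
    """Parser target that records the text content of each i18n:text element."""

    def __init__(self):
        self.keys = []
        self._buffer = None  # list of text chunks while inside i18n:text

    def start(self, tag, attrib):
        if tag == I18N_TEXT:
            self._buffer = []

    def data(self, text):
        # Text may arrive in several chunks, so buffer it until end().
        if self._buffer is not None:
            self._buffer.append(text)

    def end(self, tag):
        if tag == I18N_TEXT and self._buffer is not None:
            self.keys.append("".join(self._buffer).strip())
            self._buffer = None

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        keys, self.keys = self.keys, []
        return keys


def collect_i18n_keys(xml_bytes):
    parser = etree.XMLParser(target=I18nKeyCollector())
    return etree.fromstring(xml_bytes, parser)
```

Note that no tree is built here: the parser only calls the target's methods, so producing a rewritten document would mean serializing the output yourself.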
With iterparse(), something like this might work:
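
    from lxml import etree

    context = etree.iterparse(
        "somefile.xml",
        tag="{http://apache.org/cocoon/i18n/2.1}*")

    for event, i18n_element in context:
        # get_i18n_replacement_for() is yours to write: it returns the
        # localized element for a given I18N element
        new_element = get_i18n_replacement_for(i18n_element)
        i18n_element.getparent().replace(i18n_element, new_element)

    # iterparse() exposes the root element once parsing has finished
    context.root.getroottree().write("newfile.xml")
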
See here for some documentation:
http://codespeak.net/lxml/parsing.html
You can also achieve the same thing in XSLT, or using XPath, or ...
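
As an illustration of the XPath route, here is a minimal sketch that works on an already parsed tree. The CATALOGUE dict, the choice of a plain <span> as the replacement, and the body of get_i18n_replacement_for() are all assumptions made up for the example; in a real setup the lookup would go through Babel or gettext, and the replacement would depend on which I18N element is being handled.

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical message catalogue; in practice this would come from Babel/gettext.
CATALOGUE = {"greeting": "Bonjour", "farewell": "Au revoir"}


def get_i18n_replacement_for(i18n_element):
    """Hypothetical helper: map <i18n:text>key</i18n:text> to <span>translation</span>."""
    key = (i18n_element.text or "").strip()
    span = etree.Element("span")
    span.text = CATALOGUE.get(key, key)  # fall back to the key itself
    return span


def localize(tree):
    """Replace every element in the I18N namespace (assumes the root is not one)."""
    # xpath() returns a plain list, so the tree can be modified while looping.
    for old in tree.xpath("//i18n:*", namespaces={"i18n": I18N_NS}):
        new = get_i18n_replacement_for(old)
        # The tail text stays attached to the old element on replace(),
        # so copy it over to keep the text that follows the element.
        new.tail = old.tail
        old.getparent().replace(old, new)
    return tree
```

Usage would be along the lines of localize(etree.parse("somefile.xml")).write("newfile.xml"), with the file names taken from the example above.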
Stefan