[lxml-dev] XML Documents & I18N (the way Cocoon does it)

Stefan Behnel stefan_ml at behnel.de
Tue Apr 28 19:59:50 CEST 2009


Hi,

Alexis Georges wrote:
> I am maintaining a multilingual website which uses XML and XSLT to
> generate XHTML.
> 
> I am working with Apache Cocoon (http://cocoon.apache.org/2.1/) using
> (among other things) their I18NTransformer. Basically I can use elements
> in the I18N (http://apache.org/cocoon/i18n/2.1) namespace, and then tell
> Cocoon to apply the I18NTransformer to the document; this replaces the
> I18N elements with a localized value (e.g. a formatted date/number, a
> translated label/attribute, etc.).
> 
> I have been looking at lxml a little bit to see if I could move to a
> Python-based framework for the website. I am not quite sure how to go
> about the I18N part though.
> 
> Using the Babel library (http://babel.edgewall.org/) along with request
> headers to generate localized data, I have everything I need. What is
> missing is the "parser" for the I18N elements. All I can think of right
> now is to implement a SAX parser, the way Cocoon does (in Java).

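There is a SAX-like interface in lxml.etree, called the "target parser"
interface: you pass the parser an object with start(), end(), data() and
close() methods, and it calls them as it reads the document.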
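
To give an idea of what that looks like, here is a minimal sketch of a
parser target. The I18NCollector class and the sample document are made up
for illustration, and it only collects the keys found in i18n:text
elements rather than replacing anything:

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"
I18N_TEXT = "{%s}text" % I18N_NS


class I18NCollector:
    """Hypothetical parser target: records the text of each i18n:text."""

    def __init__(self):
        self.keys = []        # message keys seen so far
        self._buffer = None   # list of text chunks while inside i18n:text

    def start(self, tag, attrib):
        if tag == I18N_TEXT:
            self._buffer = []

    def data(self, text):
        # data() may be called several times for one text node.
        if self._buffer is not None:
            self._buffer.append(text)

    def end(self, tag):
        if tag == I18N_TEXT and self._buffer is not None:
            self.keys.append("".join(self._buffer))
            self._buffer = None

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        keys, self.keys = self.keys, []
        return keys


# Made-up sample input.
SAMPLE = (
    b'<page xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    b'<h1><i18n:text>title</i18n:text></h1>'
    b'<p><i18n:text>greeting</i18n:text></p>'
    b'</page>'
)

parser = etree.XMLParser(target=I18NCollector())
found_keys = etree.fromstring(SAMPLE, parser)
```

Note that no tree is built in this mode, so a real transformer would have
to re-emit every event itself, which is why it is more work than it looks.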

However, if your documents fit into memory, using iterparse() is a lot
simpler (and likely not even much slower).

Something like this might work:

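     from lxml import etree

     # Only report elements in the Cocoon I18N namespace.
     context = etree.iterparse(
         "somefile.xml",
         tag="{http://apache.org/cocoon/i18n/2.1}*")

     for event, i18n_element in context:
         # get_i18n_replacement_for() is yours to write: it returns the
         # localized element for one I18N element.
         new_element = get_i18n_replacement_for(i18n_element)
         # replace() drops the old element's tail text, so carry it over.
         new_element.tail = i18n_element.tail
         i18n_element.getparent().replace(i18n_element, new_element)

     # The parsed tree is reachable through the iterator's root element.
     context.root.getroottree().write("newfile.xml")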
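
For completeness, here is a self-contained variant of the loop above that
you can run without any files. The CATALOG dict, the <span> wrapper and
the sample document are assumptions made up for the example; in your
setup the lookup would go through Babel instead. It copies the tail text
over explicitly, since replace() takes the old element's tail away with it:

```python
from io import BytesIO

from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical message catalog standing in for a Babel/gettext lookup.
CATALOG = {"greeting": "Bonjour", "farewell": "Au revoir"}


def get_i18n_replacement_for(i18n_element):
    """Return a plain <span> holding the translation of an i18n key."""
    key = (i18n_element.text or "").strip()
    span = etree.Element("span")
    span.text = CATALOG.get(key, key)   # fall back to the key itself
    return span


# Made-up sample input with text following the I18N element.
SOURCE = (
    b'<p xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    b'<i18n:text>greeting</i18n:text>, world!</p>'
)

context = etree.iterparse(BytesIO(SOURCE), tag="{%s}*" % I18N_NS)

for event, i18n_element in context:
    new_element = get_i18n_replacement_for(i18n_element)
    # Keep the text that follows the I18N element (", world!" here).
    new_element.tail = i18n_element.tail
    i18n_element.getparent().replace(i18n_element, new_element)

# Serialize the modified tree to bytes instead of writing a file.
result = etree.tostring(context.root)
```

This assumes the I18N elements are never the document root, as
getparent() returns None in that case.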

See here for some documentation:

http://codespeak.net/lxml/parsing.html

You can also achieve the same thing in XSLT, or using XPath, or ...
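
A rough sketch of the XPath route, assuming the whole document is parsed
up front and each i18n:text element should become plain text. The
TRANSLATIONS dict and the sample document are invented for the example:

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical catalog; a real one would come from Babel.
TRANSLATIONS = {"title": "Accueil"}

# Made-up sample input.
root = etree.fromstring(
    b'<h1 xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    b'<i18n:text>title</i18n:text> | Example</h1>'
)

for node in root.xpath("//i18n:text", namespaces={"i18n": I18N_NS}):
    parent = node.getparent()
    translated = TRANSLATIONS.get(node.text, node.text) or ""
    tail = node.tail or ""

    # Merge the translation and the tail into the preceding text,
    # because remove() takes the element's tail text along with it.
    previous = node.getprevious()
    if previous is not None:
        previous.tail = (previous.tail or "") + translated + tail
    else:
        parent.text = (parent.text or "") + translated + tail
    parent.remove(node)

# Drop the now unused i18n namespace declaration.
etree.cleanup_namespaces(root)
html = etree.tostring(root)
```

Like the iterparse() version, this assumes the matched elements always
have a parent.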

Stefan

