[lxml-dev] XML Documents & I18N (the way Cocoon does it)

Alexis Georges velvetcrafter.subscriber at gmail.com
Mon Jun 1 22:36:56 CEST 2009


Hi,

This is a bit late, but thanks for the response.

I am playing around with iterparse() and am following the advice you gave.

I have a question, though: I could not find a way to consume an element
and replace it with just text. For example, <i18n:text>hello</i18n:text>
found in the middle of a paragraph should be replaced by plain text, but
the replace() method requires the replacement to be an element.

Is this possible?
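
The closest I can picture is the rough sketch below. It assumes that the replacement text can simply be merged into the previous sibling's tail (or into the parent's text when there is no previous sibling). replace_with_text() and the translations dictionary are hypothetical names of my own, not part of lxml.

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"


def replace_with_text(element, text):
    """Remove `element` from its parent and leave `text` in its place.

    Hypothetical helper, not an lxml API.
    """
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot replace the root element with text")

    # Text following the element lives in its tail and would vanish
    # together with the element, so append it to the replacement text.
    text = (text or "") + (element.tail or "")

    previous = element.getprevious()
    if previous is not None:
        # The new text continues whatever followed the previous sibling.
        previous.tail = (previous.tail or "") + text
    else:
        # No earlier sibling: the new text continues the parent's own text.
        parent.text = (parent.text or "") + text

    parent.remove(element)


doc = etree.fromstring(
    '<p xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    "Say <i18n:text>hello</i18n:text> to <b>everyone</b>, "
    "<i18n:text>bye</i18n:text>!</p>"
)

# Made-up catalogue standing in for a real lookup (e.g. via Babel).
translations = {"hello": "bonjour", "bye": "au revoir"}

# Materialise the matches first, because the tree is modified in the loop.
for el in list(doc.iter("{%s}text" % I18N_NS)):
    replace_with_text(el, translations.get(el.text, el.text))

print(etree.tostring(doc, encoding="unicode"))
```

If there is a more direct way than juggling .text and .tail by hand, I would prefer that.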

Thanks!

Alexis Georges


On 28-Apr-09, at 1:59 PM, Stefan Behnel wrote:

> Hi,
>
> Alexis Georges wrote:
>> I am maintaining a multilingual website which uses XML and XSLT to
>> generate XHTML.
>>
>> I am working with Apache Cocoon (http://cocoon.apache.org/2.1/) using
>> (among other things) their I18NTransformer. Basically, I can use elements
>> in the I18N (http://apache.org/cocoon/i18n/2.1) namespace and then tell
>> Cocoon to apply the I18NTransformer to the document; this replaces the
>> I18N elements with a localized value (e.g. a formatted date/number, a
>> translated label/attribute, etc.).
>>
>> I have been looking at lxml a little bit to see if I could move to a
>> Python-based framework for the website. I am not quite sure how to go
>> about the I18N part though.
>>
>> Using the Babel library (http://babel.edgewall.org/) along with request
>> headers to generate localized data, I have everything I need. What is
>> missing is the "parser" for the I18N elements. All I can think of right
>> now is to implement a SAX parser, the way Cocoon does (in Java).
>
> There is a SAX-like interface in lxml.etree, called "target parser".
>
> However, if your documents fit into memory, using iterparse() is a lot
> simpler (and likely not even much slower).
>
> Something like this might work:
>
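>     from lxml import etree
>
>     # Only report elements in the Cocoon I18N namespace.
>     context = etree.iterparse(
>              "somefile.xml",
>              tag = "{http://apache.org/cocoon/i18n/2.1}*")
>
>     for event, i18n_element in context:
>         new_element = get_i18n_replacement_for(i18n_element)
>         # In lxml, tail text travels with its element, so carry it over.
>         new_element.tail = i18n_element.tail
>         i18n_element.getparent().replace(i18n_element, new_element)
>
>     # iterparse exposes the parsed root element as .root
>     context.root.getroottree().write("newfile.xml")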
>
> See here for some documentation:
>
> http://codespeak.net/lxml/parsing.html
>
> You can also achieve the same thing in XSLT, or using XPath, or ...
>
> Stefan
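
For comparison, the "target parser" interface mentioned above could be sketched roughly as follows. This is only an illustration of the SAX-like callbacks (start, data, end, close); the I18nKeyCollector class and the sample document are made up, and the sketch only collects the i18n:text keys rather than rewriting the document.

```python
from lxml import etree

I18N_TEXT = "{http://apache.org/cocoon/i18n/2.1}text"


class I18nKeyCollector:
    """Parser target that collects the text of every i18n:text element.

    Hypothetical example class; lxml only requires the callback methods.
    """

    def __init__(self):
        self.keys = []
        self._buffer = None  # None while outside an i18n:text element

    def start(self, tag, attrib):
        if tag == I18N_TEXT:
            self._buffer = []

    def data(self, data):
        # Character data may arrive in several chunks, so buffer it.
        if self._buffer is not None:
            self._buffer.append(data)

    def end(self, tag):
        if tag == I18N_TEXT and self._buffer is not None:
            self.keys.append("".join(self._buffer))
            self._buffer = None

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        keys, self.keys = self.keys, []
        return keys


parser = etree.XMLParser(target=I18nKeyCollector())
keys = etree.fromstring(
    '<p xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    "Say <i18n:text>hello</i18n:text> and <i18n:text>bye</i18n:text>.</p>",
    parser,
)
print(keys)
```

No tree is built in this mode, which is why iterparse() is the simpler choice when the document has to be written back out with the replacements applied.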


