[lxml-dev] Parsing i18n UTF-8 files

Matt Grove isno97 at yahoo.ca
Wed Jul 23 08:06:44 CEST 2008


Hi,

 

I'm trying to parse several UTF-8 encoded XML files in different languages,
ranging from English to Japanese, but I've run into some trouble. I first
want to parse the XML files, gather certain elements into a dictionary or
some other data structure, and then write them out to other files. I know
how to parse the files when they're in English, but I don't know how to read
the Japanese text (for example) without encountering encoding exceptions.
I've tried to understand the process from the tutorials, but I'm still
confused. Could someone steer me onto the right path?
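
For illustration only, here is a minimal sketch of the workflow described
above: parse, collect into a dictionary, write out again. It is not code from
this post. The element name `msg`, the `id` attribute, the `entries` root, the
file names, and the helpers `collect_texts`, `write_texts` and `demo` are all
assumptions made up for the example. The idea is to hand the parser a file
name so that it decodes the bytes itself, keep the text as ordinary strings
in between, and name the encoding explicitly only when writing.

```python
import os
import tempfile

try:
    from lxml import etree  # the library this list is about
except ImportError:
    # Standard-library fallback; it offers the same calls used below.
    import xml.etree.ElementTree as etree


def collect_texts(path, tag):
    """Map each matching element's (hypothetical) 'id' attribute to its text."""
    # Passing a file name lets the parser read raw bytes and apply the
    # encoding named in the XML declaration (UTF-8 when none is given).
    tree = etree.parse(path)
    return {elem.get("id"): elem.text for elem in tree.getroot().iter(tag)}


def write_texts(texts, path, tag):
    """Write the collected strings to a new XML file encoded as UTF-8."""
    root = etree.Element("entries")  # hypothetical root element name
    for key, value in texts.items():
        child = etree.SubElement(root, tag, id=key)
        child.text = value
    # The encoding is stated once, at serialisation time.
    etree.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)


def demo():
    """Round-trip a small Japanese sample through both helpers."""
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<messages><msg id="hello">こんにちは</msg>'
        '<msg id="bye">さようなら</msg></messages>'
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ja.xml")
        dst = os.path.join(tmp, "out.xml")
        with open(src, "wb") as f:  # write bytes, as a real file on disk would be
            f.write(sample.encode("utf-8"))
        texts = collect_texts(src, "msg")
        write_texts(texts, dst, "msg")
        return texts, collect_texts(dst, "msg")
```

Calling `demo()` returns the dictionary read from the sample file together
with the one read back from the rewritten file. Printing such strings to a
terminal that cannot display Japanese can still raise an encoding error, and
that would be a separate problem from the parsing itself.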

