[lxml-dev] Parsing i18n UTF-8 files
Matt Grove
isno97 at yahoo.ca
Wed Jul 23 08:06:44 CEST 2008
Hi,
I'm trying to parse a number of UTF-8 encoded XML files in several different
languages, ranging from English to Japanese, but I've run into some
trouble. I want to parse the XML files first, gather certain elements into
a dictionary or some other data structure, and then write them out to other
files. I know how to parse the files when they're in English, but I don't
know how to read the Japanese text (for example) without hitting
encoding exceptions. I've tried to understand the process from the
tutorials, but I'm still confused. Could someone steer me onto
the right path?
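
A minimal sketch of the workflow described above, not a definitive answer. It assumes Python 3, and the element name `entry`, the attribute `id`, and the helper names `collect_entries` and `write_entries` are all hypothetical. The idea is to hand the parser raw bytes so it can honour the encoding declared in the XML, and to name UTF-8 explicitly when writing the text back out. It falls back to the standard library's ElementTree if lxml is not installed.

```python
import io
import os
import tempfile

try:
    from lxml import etree  # preferred when lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API


def collect_entries(xml_bytes):
    """Parse raw XML bytes and map each entry's id attribute to its text."""
    # Bytes in, text out: the parser decodes according to the XML declaration.
    root = etree.fromstring(xml_bytes)
    return {entry.get("id"): entry.text for entry in root.iter("entry")}


def write_entries(entries, path):
    """Write the collected text to a file, encoding explicitly as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for key in sorted(entries):
            handle.write("%s\t%s\n" % (key, entries[key]))


# Hypothetical sample document; element and attribute names are made up.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<entries>'
    '<entry id="greeting">こんにちは</entry>'
    '<entry id="farewell">さようなら</entry>'
    '</entries>'
).encode("utf-8")

entries = collect_entries(SAMPLE)

# Round-trip through a temporary file to check the text survives intact.
out_path = os.path.join(tempfile.mkdtemp(), "entries.txt")
write_entries(entries, out_path)
with io.open(out_path, encoding="utf-8") as handle:
    written = handle.read()
```

For files on disk, `etree.parse(path)` reads the bytes itself and applies the same decoding, so the text never needs to be decoded by hand before parsing.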