[kupu-dev] UnicodeDecodeError with umlauts in image title
Duncan Booth
duncan.booth at suttoncourtenay.org.uk
Fri Jul 4 09:36:02 CEST 2008
Tim Terlegård <tim.terlegard at valentinewebsystems.se> wrote:
> Hi kupuers,
>
> I get an error when I have an image with a title that contains umlauts
> and use that image inside a document with caption enabled.
>
> The error is triggered by the transform in html2captioned.py on these
> lines:
>
> if isinstance(data, str):
>     data = data.decode('utf8')
> html = IMAGE_PATTERN.sub(replaceImage, data)
>
> replaceImage returns utf8, so data should also be utf8, otherwise the
> sub() method will fail when there are umlauts involved.
>
> Things work if I remove the conversion to unicode on the line above.
> I'm not sure why the conversion to unicode was added some months ago.
> I have changed the tests to use umlauts and removed the conversion to
> unicode. The tests pass. Should I commit this or is there something
> I'm missing?
>
> /Tim
>
No, don't commit that.
You haven't said which version of Plone you are using. The problem is
that some versions of Plone return unicode here and some return byte
strings, so the code has to work in both situations. However, it is
important that the regular expression works on unicode. You should
never run a regex over utf8 encoded strings, as it is possible (albeit
unlikely) that the regex could mess up parts of multi-byte encoded
characters. That's why the decode (and later on an encode) are present.
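
To illustrate the point about multi-byte characters, here is a minimal
sketch. It is written for Python 3, where bytes plays the role that str
played in the Python 2 code discussed in this thread. The sample title,
the pattern and the is_valid_utf8 helper are made up for the illustration
and are not taken from kupu.

```python
import re

title = "Bär"                      # hypothetical image title with an umlaut
encoded = title.encode("utf8")     # b'B\xc3\xa4r': the umlaut takes two bytes

# On the encoded bytes, "." matches a single byte, so a pattern that keeps
# the first two "characters" cuts the umlaut in half.
bytes_match = re.match(rb".{0,2}", encoded).group(0)

# On text, "." matches a whole character, so the umlaut stays intact.
text_match = re.match(r".{0,2}", title).group(0)


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf8")
    except UnicodeDecodeError:
        return False
    return True
```

Here bytes_match ends in the first half of the umlaut and no longer
decodes, while text_match can be encoded again without trouble.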
If on your system replaceImage is returning utf8 then the fix should be
to ensure that it decodes its result before returning. Probably change:

    return template(**d)

to:

    result = template(**d)
    if isinstance(result, str):
        result = result.decode('utf8')
    return result
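
The whole round trip (decode on the way in, substitute on unicode, encode
on the way out) might look roughly like the sketch below. It is a Python 3
analogue, so the check is against bytes rather than the Python 2 str.
IMAGE_PATTERN here is a simplified stand-in, and to_text, render_caption,
replace_image and convert are hypothetical names for the illustration, not
the actual code in html2captioned.py.

```python
import re

# Simplified stand-in for kupu's IMAGE_PATTERN (assumption, not the real one).
IMAGE_PATTERN = re.compile(r'<img alt="([^"]*)"\s*/>')


def to_text(value, encoding="utf8"):
    """Decode byte strings; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def render_caption(alt):
    # Pretend template that returns UTF-8 encoded bytes, as it seems to
    # on the system described in the quoted message.
    return ("<span>%s</span>" % alt).encode("utf8")


def replace_image(match):
    # Decode the template result before handing it back to the regex.
    return to_text(render_caption(match.group(1)))


def convert(data):
    """Accept bytes or text, substitute on text, return UTF-8 bytes."""
    html = IMAGE_PATTERN.sub(replace_image, to_text(data))
    return html.encode("utf8")
```

With this shape, convert gives the same encoded result whether it is
handed unicode or a utf8 byte string, which is the "works in both
situations" requirement mentioned above.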