From ilzogoiby at gmail.com Wed May 11 22:27:06 2011
From: ilzogoiby at gmail.com (Pedro Ferreira)
Date: Wed, 11 May 2011 22:27:06 +0200
Subject: [icalendar-dev] Possible problems with utf-8
Message-ID:

Hello,

I believe I may have found two problems in the way icalendar handles UTF-8 strings.

First, when a `vText` object is created, the constructor of the parent `unicode` class is called with no encoding parameter, which raises an exception if UTF-8 source strings contain non-ASCII characters.

The other problem is more complex, and seems to be related to the conversion of a `Contentline` object to a string, which in some situations causes some bytes from UTF-8 characters to be repeated when lines are folded. I haven't studied it in detail, but I found a way to reproduce it:

{{{
In [26]: from icalendar.parser import Contentline

In [27]: str(Contentline((74 * u'X\u00a0').encode('utf-8')))
Out[27]: 'X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\r\n \xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0\r\n X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\xa0X\xc2\r\n \xc2\xa0'

In [28]: str(Contentline((74 * u'X\u00a0').encode('utf-8'))).decode('utf-8')
[...]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc2 in position 73: invalid continuation byte
}}}

As you can see, there is a UTF-8 "half-character" (the lead byte `\xc2`) that gets repeated at each fold that lands in the middle of a character.

I attach a patch for both issues.
The solution for the second one is very inefficient, though, as it converts the string to a unicode object first (and then back to UTF-8).

Thanks,

Pedro
-------------- next part --------------
A non-text attachment was scrubbed...
Name: icalendar.diff
Type: text/x-patch
Size: 2195 bytes
Desc: not available
Url : http://codespeak.net/pipermail/icalendar-dev/attachments/20110511/fc0e4a06/attachment.bin

From Georger at cozi.com Fri May 27 03:45:28 2011
From: Georger at cozi.com (George Reilly)
Date: Fri, 27 May 2011 01:45:28 +0000
Subject: [icalendar-dev] iCalendar patches
Message-ID:

My colleagues and I forked the iCalendar package some time ago and fixed several issues that we found in iCalendar parsing and generation. You can see what we did at https://github.com/cozi/icalendar/commits/master-future

We've been using the iCalendar package to parse iCalendar feeds for a long time. I recently started using the package to generate feeds and I had to work around several shortcomings, such as

* hopelessly inadequate VTIMEZONE: no STANDARD or DAYLIGHT stanzas. I used vobject's implementation.
    * vobj = vobject.icalendar.TimezoneComponent(tzinfo)
    * vtz = icalendar.Timezone.from_string(vobj.serialize())
* no easy way to set the TZID parameter on vDatetimes to anything other than UTC
* not being able to add EXDATE. I forget the details. Something to do with the list construction.

I hope to have some additional patches for the package at some point.

--
/George V. Reilly | Cozi | Software Development Lead | m: 206 730-8474
GeorgeR at cozi.com | www.cozi.com | blogs.cozi.com/tech
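
On the folding problem reported in the first message above: one way to avoid both the split character and the round trip through a unicode object is to fold on the encoded bytes and move each cut point back until it no longer lands on a UTF-8 continuation byte. The following is only a minimal Python 3 sketch of that idea (the session above is Python 2). It is not the attached icalendar.diff and not icalendar's own code; `fold_utf8` and its `limit` parameter are hypothetical names, and the 75-octet limit with a CRLF-plus-space fold is taken from RFC 5545.

```python
def fold_utf8(data: bytes, limit: int = 75) -> bytes:
    """Fold UTF-8 encoded bytes so that no physical line exceeds `limit` octets.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8
    and contains no line breaks of its own.
    """
    if limit < 5:
        # A UTF-8 sequence is at most 4 octets, plus 1 for the leading space.
        raise ValueError("limit must be at least 5")

    chunks = []
    start = 0
    room = limit  # the first physical line has no leading space
    while len(data) - start > room:
        end = start + room
        # data[end] would be the first byte of the next line. If it is a
        # continuation byte (bit pattern 10xxxxxx), the cut is in the middle
        # of a character, so move the cut back to the character's lead byte.
        while end > start and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            # Only reachable with invalid UTF-8; cut at the raw position
            # rather than loop forever.
            end = start + room
        chunks.append(data[start:end])
        start = end
        room = limit - 1  # continuation lines spend one octet on the space
    chunks.append(data[start:])
    return b"\r\n ".join(chunks)


if __name__ == "__main__":
    sample = (74 * "X\u00a0").encode("utf-8")
    folded = fold_utf8(sample)
    # Every physical line is valid UTF-8 on its own.
    for line in folded.split(b"\r\n"):
        line.decode("utf-8")
    # Unfolding (removing CRLF + space) restores the input.
    print(folded.replace(b"\r\n ", b"") == sample)
```

With the same input as in the reproduction above, each physical line decodes on its own and removing every CRLF-plus-space pair gives back the original bytes, so no octet is repeated or lost.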