From mostawesomedude at gmail.com Fri Sep 3 09:03:54 2010
From: mostawesomedude at gmail.com (Corbin Simpson)
Date: Fri, 3 Sep 2010 00:03:54 -0700
Subject: [Cython] [PATCH] Turn std::invalid_argument into ValueError
In-Reply-To:
References:
Message-ID:

Hopefully I am doin' it right. invalid_argument seems like the correct
analogue to ValueError.

GMail shouldn't mangle this, but it wouldn't be the first time.

~ C.

--- Compiler/ExprNodes.py.back  2010-09-02 23:57:02.000000000 -0700
+++ Compiler/ExprNodes.py       2010-09-02 23:46:30.000000000 -0700
@@ -7028,6 +7028,9 @@
       ; // let the latest Python exn pass through and ignore the current one
     else
       throw;
+  } catch (const std::invalid_argument& exn) {
+    // Catch invalid_argument explicitly and raise a ValueError
+    PyErr_SetString(PyExc_ValueError, exn.what());
   } catch (const std::out_of_range& exn) {
     // catch out_of_range explicitly so the proper Python exn may be raised
     PyErr_SetString(PyExc_IndexError, exn.what());

--
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson

From robertwb at math.washington.edu Fri Sep 3 18:05:58 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 3 Sep 2010 09:05:58 -0700
Subject: [Cython] [PATCH] Turn std::invalid_argument into ValueError
In-Reply-To:
References:
Message-ID:

On Fri, Sep 3, 2010 at 12:03 AM, Corbin Simpson wrote:
> Hopefully I am doin' it right. invalid_argument seems like the correct
> analogue to ValueError.

Yes. Thanks!

> GMail shouldn't mangle this, but it wouldn't be the first time.

Could you send it as an attachment? Also, if you use hg to make the
patch (type make repo to upgrade cython directory to the full hg
repository) it will have metadata such as a changelog entry and
username attached, which would be preferable.

- Robert

> ~ C.
>
> --- Compiler/ExprNodes.py.back  2010-09-02 23:57:02.000000000 -0700
> +++ Compiler/ExprNodes.py       2010-09-02 23:46:30.000000000 -0700
> @@ -7028,6 +7028,9 @@
>        ; // let the latest Python exn pass through and ignore the current one
>      else
>        throw;
> +  } catch (const std::invalid_argument& exn) {
> +    // Catch invalid_argument explicitly and raise a ValueError
> +    PyErr_SetString(PyExc_ValueError, exn.what());
>    } catch (const std::out_of_range& exn) {
>      // catch out_of_range explicitly so the proper Python exn may be raised
>      PyErr_SetString(PyExc_IndexError, exn.what());
>
>
> --
> When the facts change, I change my mind. What do you do, sir? ~ Keynes
>
> Corbin Simpson
>
> _______________________________________________
> Cython-dev mailing list
> Cython-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/cython-dev
>

From mostawesomedude at gmail.com Fri Sep 3 22:31:11 2010
From: mostawesomedude at gmail.com (Corbin Simpson)
Date: Fri, 3 Sep 2010 13:31:11 -0700
Subject: [Cython] [PATCH] Turn std::invalid_argument into ValueError
In-Reply-To:
References:
Message-ID:

On Fri, Sep 3, 2010 at 9:05 AM, Robert Bradshaw wrote:
> Could you send it as an attachment? Also, if you use hg to make the
> patch (type make repo to upgrade cython directory to the full hg
> repository) it will have metadata such as a changelog entry and
> username attached, which would be preferable.

Whoo, hg. First time using it. It's like git and svn, so I think I did it right.

Is there ongoing work on the C++ support? There's still a couple
holes, for example wrapping this is still not right and I can't figure
out how to mod Cython appropriately.

template class SomeClass;

Since this is currently my Easy Path (the Hard Path involves modding
Panda3D to compile with C++0x) I'm willing to fix Cython bugs to make
it happen.

~ C.

--
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.patch
Type: application/octet-stream
Size: 1175 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20100903/d4c468f3/attachment.obj

From robertwb at math.washington.edu Sat Sep 4 00:09:16 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 3 Sep 2010 15:09:16 -0700
Subject: [Cython] [PATCH] Turn std::invalid_argument into ValueError
In-Reply-To:
References:
Message-ID:

On Fri, Sep 3, 2010 at 1:31 PM, Corbin Simpson wrote:
> On Fri, Sep 3, 2010 at 9:05 AM, Robert Bradshaw
> wrote:
>> Could you send it as an attachment? Also, if you use hg to make the
>> patch (type make repo to upgrade cython directory to the full hg
>> repository) it will have metadata such as a changelog entry and
>> username attached, which would be preferable.
>
> Whoo, hg. First time using it. It's like git and svn, so I think I did it right.

Looks good. Thanks! Pushed.

> Is there ongoing work on the C++ support? There's still a couple
> holes, for example wrapping this is still not right and I can't figure
> out how to mod Cython appropriately.
>
> template class SomeClass;

Int templates are still not supported, but you're not the first to
request them. C++ support is in no way complete, and there's still a
lot of low-hanging fruit, but I don't think anyone is actively working
on it right now (other than fixing a bug here and there).

> Since this is currently my Easy Path (the Hard Path involves modding
> Panda3D to compile with C++0x) I'm willing to fix Cython bugs to make
> it happen.

That would be great. Please feel free to ping the list if you have any
questions about how things work.

- Robert

From stefan_ml at behnel.de Sat Sep 4 17:29:30 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 04 Sep 2010 17:29:30 +0200
Subject: [Cython] C string literals
In-Reply-To:
References: <4C756BEA.40800@behnel.de>
Message-ID: <4C8265DA.4060702@behnel.de>

Carl Witty, 25.08.2010 22:21:
> On Wed, Aug 25, 2010 at 12:15 PM, Stefan Behnel wrote:
>> Lisandro Dalcin, 25.08.2010 20:28:
>>> When trying to cythonize my code using the -3 flag, I got many errors
>>> like the one below:
>>>
>>> Error converting Pyrex file to C:
>>> ------------------------------------------------------------
>>> ...
>>>       if not (PetscInitializeCalled): return
>>>       if (PetscFinalizeCalled): return
>>>       # deinstall custom error handler
>>>       ierr = PetscPopErrorHandlerPython()
>>>       if ierr != 0:
>>>           fprintf(stderr, "PetscPopErrorHandler() failed "
>>>                          ^
>>> ------------------------------------------------------------
>>>
>>> /u/dalcinl/Devel/petsc4py-dev/src/PETSc/PETSc.pyx:307:24: Unicode
>>> literals do not support coercion to C types other than Py_UNICODE.
>>
>> Right, the parser reads the literal as unicode string here before type
>> analysis figures out that it's really meant to be a bytes literal.
>>
>> This will be hard to change as recovering the original bytes literal is
>> impossible once it's converted to a unicode string (remember that you can
>> use arbitrary character escape sequences in the literal). So I'm leaning
>> towards keeping this as an error. After all, Unicode string literals is one
>> of the things that a user explicitly requests with the -3 switch.
>
> How about allowing it for ASCII literals and leaving it an error if
> there are any codepoints in the literal outside the 0-127 range?

It's not so unlikely that you find C (data) strings that contain (escaped)
non-ASCII characters. Those strings would need a 'b' prefix then.
So you'd end up with some C strings that work without prefix and others for
which you need a 'b', even if both clearly occur in a C char* context.

The problem is, unprefixed string literals found in source code compiled by
Cython are equally likely to be meant as unicode strings, byte strings, C
strings or polymorphic strings these days. There isn't one obvious "do what I
mean" way. Remember that Lisandro brought this up because Cython reported
an *error* when compiling the code. I find that a lot better than silently
accepting something that may not have been meant that way.

One thing we could do, however, is to parse all (unprefixed?) strings as
both unicode strings *and* byte strings. That would induce a (minor) bit of
overhead in the parser (both in terms of memory and speed), but it would
allow us to recover the original byte sequence of a Unicode string during
type analysis if we find that we need to coerce it to a byte string.

In case we need to, we could then even write both types of byte sequences
into the string constant table in the C file, so that we can recover the
exact byte sequence and the correct Unicode character sequence depending on
the CPython runtime.

Stefan

From stefan_ml at behnel.de Sat Sep 4 21:06:05 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 04 Sep 2010 21:06:05 +0200
Subject: [Cython] C string literals
In-Reply-To: <4C8265DA.4060702@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
Message-ID: <4C82989D.4070002@behnel.de>

Stefan Behnel, 04.09.2010 17:29:
> Carl Witty, 25.08.2010 22:21:
>> On Wed, Aug 25, 2010 at 12:15 PM, Stefan Behnel wrote:
>>> Lisandro Dalcin, 25.08.2010 20:28:
>>>> When trying to cythonize my code using the -3 flag, I got many errors
>>>> like the one below:
>>>>
>>>> Error converting Pyrex file to C:
>>>> ------------------------------------------------------------
>>>> ...
>>>>       if not (PetscInitializeCalled): return
>>>>       if (PetscFinalizeCalled): return
>>>>       # deinstall custom error handler
>>>>       ierr = PetscPopErrorHandlerPython()
>>>>       if ierr != 0:
>>>>           fprintf(stderr, "PetscPopErrorHandler() failed "
>>>>                          ^
>>>> ------------------------------------------------------------
>>>>
>>>> /u/dalcinl/Devel/petsc4py-dev/src/PETSc/PETSc.pyx:307:24: Unicode
>>>> literals do not support coercion to C types other than Py_UNICODE.
>>>
>>> Right, the parser reads the literal as unicode string here before type
>>> analysis figures out that it's really meant to be a bytes literal.
>>>
>>> This will be hard to change as recovering the original bytes literal is
>>> impossible once it's converted to a unicode string (remember that you can
>>> use arbitrary character escape sequences in the literal). So I'm leaning
>>> towards keeping this as an error. After all, Unicode string literals is one
>>> of the things that a user explicitly requests with the -3 switch.
>>
>> How about allowing it for ASCII literals and leaving it an error if
>> there are any codepoints in the literal outside the 0-127 range?
>
> It's not so unlikely that you find C (data) strings that contain (escaped)
> non-ASCII characters. Those strings would need a 'b' prefix then. So you'd
> end up with some C strings that work without prefix and others for which
> you need a 'b', even if both clearly occur in a C char* context.
>
> The problem is, unprefixed string literals found in source code compiled by
> Cython are equally likely to be meant as unicode strings, byte strings, C
> strings or polymorphic strings these days. There isn't one obvious "do what I
> mean" way. Remember that Lisandro brought this up because Cython reported
> an *error* when compiling the code. I find that a lot better than silently
> accepting something that may not have been meant that way.
>
> One thing we could do, however, is to parse all (unprefixed?) strings as
> both unicode strings *and* byte strings.
That would induce a (minor) bit of
> overhead in the parser (both in terms of memory and speed), but it would
> allow us to recover the original byte sequence of a Unicode string during
> type analysis if we find that we need to coerce it to a byte string.

http://trac.cython.org/cython_trac/ticket/575
http://hg.cython.org/cython-devel/rev/a0f2c20789e3

> In case we need to, we could then even write both types of byte sequences
> into the string constant table in the C file, so that we can recover the
> exact byte sequence and the correct Unicode character sequence depending on
> the CPython runtime.

Still open. The only obvious use case for this is when using unicode
escapes in 'str' literals, e.g. "abc\u0987". Here, the correct way to read
the literal as a byte string is as a 9 character string that reproduces the
escape sequence, whereas the correct unicode string would be a 4 character
literal that has the escape sequence resolved. The only way to do this is by
spelling out both literals in the C code.

Stefan

From stefan_ml at behnel.de Sat Sep 4 21:30:35 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 04 Sep 2010 21:30:35 +0200
Subject: [Cython] other issue with cython -3
In-Reply-To: <4C75787E.5070601@behnel.de>
References: <4C756C22.6070308@behnel.de> <4C75787E.5070601@behnel.de>
Message-ID: <4C829E5B.2050000@behnel.de>

Stefan Behnel, 25.08.2010 22:09:
> Lisandro Dalcin, 25.08.2010 21:57:
>> On 25 August 2010 16:16, Stefan Behnel wrote:
>>> Lisandro Dalcin, 25.08.2010 21:00:
>>>> $ cython -3 tmp.pyx
>>>>
>>>> Error converting Pyrex file to C:
>>>> ------------------------------------------------------------
>>>> ...
>>>> cdef str a = "abc"
>>>>              ^
>>>> ------------------------------------------------------------
>>>>
>>>> /u/dalcinl/tmp/tmp.pyx:1:13: Cannot convert Unicode string to 'str'
>>>> implicitly. This is not portable and requires explicit encoding.
>>>
>>> Same thing I said before: if you request unicode string literals, you get
>>> unicode literals.
>>
>> But in Python 3, the Python-level 'str' type actually is a unicode string!
>
> Ah, right, I missed that. Yes, I think it makes sense to statically
> associate 'str' with 'unicode' on -3.

Hmm, that's tricky, though. We can't change the Builtin scope itself as
that's a global thing, so we somehow need to intercept lookups of 'str'
before hitting the builtin scope and replace it by 'unicode', so that both
the 'cdef str' declaration work as well as Python references to the builtin
type. Ugly...

Stefan

From stefan_ml at behnel.de Sat Sep 4 21:43:21 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 04 Sep 2010 21:43:21 +0200
Subject: [Cython] other issue with cython -3
In-Reply-To: <4C829E5B.2050000@behnel.de>
References: <4C756C22.6070308@behnel.de> <4C75787E.5070601@behnel.de> <4C829E5B.2050000@behnel.de>
Message-ID: <4C82A159.4060007@behnel.de>

Stefan Behnel, 04.09.2010 21:30:
> Stefan Behnel, 25.08.2010 22:09:
>> Lisandro Dalcin, 25.08.2010 21:57:
>>> On 25 August 2010 16:16, Stefan Behnel wrote:
>>>> Lisandro Dalcin, 25.08.2010 21:00:
>>>>> $ cython -3 tmp.pyx
>>>>>
>>>>> Error converting Pyrex file to C:
>>>>> ------------------------------------------------------------
>>>>> ...
>>>>> cdef str a = "abc"
>>>>>              ^
>>>>> ------------------------------------------------------------
>>>>>
>>>>> /u/dalcinl/tmp/tmp.pyx:1:13: Cannot convert Unicode string to 'str'
>>>>> implicitly. This is not portable and requires explicit encoding.
>>>>
>>>> Same thing I said before: if you request unicode string literals, you get
>>>> unicode literals.
>>>
>>> But in Python 3, the Python-level 'str' type actually is a unicode string!
>>
>> Ah, right, I missed that. Yes, I think it makes sense to statically
>> associate 'str' with 'unicode' on -3.
>
> Hmm, that's tricky, though.
We can't change the Builtin scope itself as
> that's a global thing, so we somehow need to intercept lookups of 'str'
> before hitting the builtin scope and replace it by 'unicode', so that both
> the 'cdef str' declaration work as well as Python references to the builtin
> type. Ugly...

http://trac.cython.org/cython_trac/ticket/576

Stefan

From robertwb at math.washington.edu Sat Sep 4 22:04:41 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 4 Sep 2010 13:04:41 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C8265DA.4060702@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
Message-ID:

On Sat, Sep 4, 2010 at 8:29 AM, Stefan Behnel wrote:
> Carl Witty, 25.08.2010 22:21:
>> On Wed, Aug 25, 2010 at 12:15 PM, Stefan Behnel wrote:
>>> Lisandro Dalcin, 25.08.2010 20:28:
>>>> When trying to cythonize my code using the -3 flag, I got many errors
>>>> like the one below:
>>>>
>>>> Error converting Pyrex file to C:
>>>> ------------------------------------------------------------
>>>> ...
>>>>       if not (PetscInitializeCalled): return
>>>>       if (PetscFinalizeCalled): return
>>>>       # deinstall custom error handler
>>>>       ierr = PetscPopErrorHandlerPython()
>>>>       if ierr != 0:
>>>>           fprintf(stderr, "PetscPopErrorHandler() failed "
>>>>                          ^
>>>> ------------------------------------------------------------
>>>>
>>>> /u/dalcinl/Devel/petsc4py-dev/src/PETSc/PETSc.pyx:307:24: Unicode
>>>> literals do not support coercion to C types other than Py_UNICODE.
>>>
>>> Right, the parser reads the literal as unicode string here before type
>>> analysis figures out that it's really meant to be a bytes literal.
>>>
>>> This will be hard to change as recovering the original bytes literal is
>>> impossible once it's converted to a unicode string (remember that you can
>>> use arbitrary character escape sequences in the literal). So I'm leaning
>>> towards keeping this as an error.
After all, Unicode string literals is one
>>> of the things that a user explicitly requests with the -3 switch.
>>
>> How about allowing it for ASCII literals and leaving it an error if
>> there are any codepoints in the literal outside the 0-127 range?
>
> It's not so unlikely that you find C (data) strings that contain (escaped)
> non-ASCII characters. Those strings would need a 'b' prefix then. So you'd
> end up with some C strings that work without prefix and others for which
> you need a 'b', even if both clearly occur in a C char* context.

In my experience, non-ASCII literals are even more uncommon than
non-ASCII user data, but it would be really nice at least to handle
the ASCII case smoothly.

> The problem is, unprefixed string literals found in source code compiled by
> Cython are equally likely to be meant as unicode strings, byte strings, C
> strings or polymorphic strings these days. There isn't one obvious "do what I
> mean" way. Remember that Lisandro brought this up because Cython reported
> an *error* when compiling the code. I find that a lot better than silently
> accepting something that may not have been meant that way.
>
> One thing we could do, however, is to parse all (unprefixed?) strings as
> both unicode strings *and* byte strings. That would induce a (minor) bit of
> overhead in the parser (both in terms of memory and speed), but it would
> allow us to recover the original byte sequence of a Unicode string during
> type analysis if we find that we need to coerce it to a byte string.
>
> In case we need to, we could then even write both types of byte sequences
> into the string constant table in the C file, so that we can recover the
> exact byte sequence and the correct Unicode character sequence depending on
> the CPython runtime.

How about we parse the literals as unicode strings, and if used in a
bytes context we raise a compile time error if any characters are
larger than a char?
Thus "\u0001" would still be OK in a bytes context, but "\u1000" would not
be (compile time error). It may even be better to set the limit to 127, as
that is the truly unambiguous range, and require a prefix if you really
want something more.

- Robert

From stefan_ml at behnel.de Sun Sep 5 06:24:24 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 05 Sep 2010 06:24:24 +0200
Subject: [Cython] C string literals
In-Reply-To:
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
Message-ID: <4C831B78.1060405@behnel.de>

Robert Bradshaw, 04.09.2010 22:04:
> How about we parse the literals as unicode strings, and if used in a
> bytes context we raise a compile time error if any characters are
> larger than a char?

Can't work because you cannot recover the original byte sequence from a
decoded Unicode string. It may have used escapes or not, and it may or may
not be encodable using the source code encoding.

Stefan

From robertwb at math.washington.edu Sun Sep 5 07:06:08 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 4 Sep 2010 22:06:08 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C831B78.1060405@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de>
Message-ID:

On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote:
> Robert Bradshaw, 04.09.2010 22:04:
>> How about we parse the literals as unicode strings, and if used in a
>> bytes context we raise a compile time error if any characters are
>> larger than a char?
>
> Can't work because you cannot recover the original byte sequence from a
> decoded Unicode string. It may have used escapes or not, and it may or may
> not be encodable using the source code encoding.

I'm saying we shouldn't care about using escapes, and should raise a
compile time error if it's not encodable using the source encoding.
In other words, I'm not a fan of

    foo("abc \u0001")

behaving (in my opinion) very differently depending on whether foo
takes a char* or object argument. I'd rather have it be decoded as a
unicode string when reading the source, then if need be re-encoded as
bytes if possible. Probably the simplest thing to do here is only
allow ASCII in such string literals--this will handle the common
case, and I don't think it's a stretch for the user to have to be
explicit about b"..." vs u"..." for literals with high-value code
points.

- Robert

From stefan_ml at behnel.de Sun Sep 5 07:59:10 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 05 Sep 2010 07:59:10 +0200
Subject: [Cython] C string literals
In-Reply-To:
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de>
Message-ID: <4C8331AE.80003@behnel.de>

Robert Bradshaw, 05.09.2010 07:06:
> On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote:
>> Robert Bradshaw, 04.09.2010 22:04:
>>> How about we parse the literals as unicode strings, and if used in a
>>> bytes context we raise a compile time error if any characters are
>>> larger than a char?
>>
>> Can't work because you cannot recover the original byte sequence from a
>> decoded Unicode string. It may have used escapes or not, and it may or may
>> not be encodable using the source code encoding.
>
> I'm saying we shouldn't care about using escapes, and should raise a
> compile time error if it's not encodable using the source encoding.

In that case, you'd break most code that actually uses escapes. If the byte
values were correctly representable using the source encoding the escapes
wouldn't be necessary in the first place.

> In other words, I'm not a fan of
>
>     foo("abc \u0001")
>
> behaving (in my opinion) very differently depending on whether foo
> takes a char* or object argument.

It's Python compatible, though:

    Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 'abc \u0001'
    'abc \\u0001'
    >>> len('abc \u0001')
    10
    >>> u'abc \u0001'
    u'abc \x01'
    >>> len(u'abc \u0001')
    5

Same for Python 3 with the 'b' prefix on the byte string examples. The fix
I committed mimics this behaviour.

Stefan

From robertwb at math.washington.edu Mon Sep 6 18:24:28 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 6 Sep 2010 09:24:28 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C8331AE.80003@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
Message-ID:

On Sat, Sep 4, 2010 at 10:59 PM, Stefan Behnel wrote:
> Robert Bradshaw, 05.09.2010 07:06:
>> On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote:
>>> Robert Bradshaw, 04.09.2010 22:04:
>>>> How about we parse the literals as unicode strings, and if used in a
>>>> bytes context we raise a compile time error if any characters are
>>>> larger than a char?
>>>
>>> Can't work because you cannot recover the original byte sequence from a
>>> decoded Unicode string. It may have used escapes or not, and it may or may
>>> not be encodable using the source code encoding.
>>
>> I'm saying we shouldn't care about using escapes, and should raise a
>> compile time error if it's not encodable using the source encoding.
>
> In that case, you'd break most code that actually uses escapes. If the byte
> values were correctly representable using the source encoding the escapes
> wouldn't be necessary in the first place.

The most common escape is probably \n, followed by \0, \r, \t... As
for \uXXXX, that is just a superset of \xXX that only works for
unicode literals.

>> In other words, I'm not a fan of
>>
>>     foo("abc \u0001")
>>
>> behaving (in my opinion) very differently depending on whether foo
>> takes a char* or object argument.
>
> It's Python compatible, though:

No, it's not. Python doesn't have the concept of "used in a C context."

>     Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
>     [GCC 4.4.3] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> 'abc \u0001'
>     'abc \\u0001'
>     >>> len('abc \u0001')
>     10
>     >>> u'abc \u0001'
>     u'abc \x01'
>     >>> len(u'abc \u0001')
>     5
>
> Same for Python 3 with the 'b' prefix on the byte string examples.

When I see b"abc \u0001" or u"abc \u0001" I know exactly what it
means. When I see "abc \u0001" I have to know whether unicode literals
are enabled to know what it means, but now you've changed it so that's
not enough anymore--I have to determine whether it's being used in a
char* or object context, which I think is something we want to
minimize.

I'm with Lisandro and Carl Witty--how about just letting the parser
parse them as unicode literals and then only accepting conversion back
to char* for plain ASCII rather than introducing more complicated
logic and semantics?

- Robert

From dagss at student.matnat.uio.no Mon Sep 6 18:36:53 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Mon, 06 Sep 2010 18:36:53 +0200
Subject: [Cython] C string literals
In-Reply-To:
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
Message-ID: <4C8518A5.5060100@student.matnat.uio.no>

Robert Bradshaw wrote:
> On Sat, Sep 4, 2010 at 10:59 PM, Stefan Behnel wrote:
>
>> Robert Bradshaw, 05.09.2010 07:06:
>>
>>> On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote:
>>>
>>>> Robert Bradshaw, 04.09.2010 22:04:
>>>>
>>>>> How about we parse the literals as unicode strings, and if used in a
>>>>> bytes context we raise a compile time error if any characters are
>>>>> larger than a char?
>>>>>
>>>> Can't work because you cannot recover the original byte sequence from a
>>>> decoded Unicode string. It may have used escapes or not, and it may or may
>>>> not be encodable using the source code encoding.
>>>>
>>> I'm saying we shouldn't care about using escapes, and should raise a
>>> compile time error if it's not encodable using the source encoding.
>>>
>> In that case, you'd break most code that actually uses escapes. If the byte
>> values were correctly representable using the source encoding the escapes
>> wouldn't be necessary in the first place.
>>
>
> The most common escape is probably \n, followed by \0, \r, \t... As
> for \uXXXX, that is just a superset of \xXX that only works for
> unicode literals.
>
>
>>> In other words, I'm not a fan of
>>>
>>>     foo("abc \u0001")
>>>
>>> behaving (in my opinion) very differently depending on whether foo
>>> takes a char* or object argument.
>>>
>> It's Python compatible, though:
>>
>
> No, it's not. Python doesn't have the concept of "used in a C context."
>
>
>>     Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
>>     [GCC 4.4.3] on linux2
>>     Type "help", "copyright", "credits" or "license" for more information.
>>     >>> 'abc \u0001'
>>     'abc \\u0001'
>>     >>> len('abc \u0001')
>>     10
>>     >>> u'abc \u0001'
>>     u'abc \x01'
>>     >>> len(u'abc \u0001')
>>     5
>>
>> Same for Python 3 with the 'b' prefix on the byte string examples.
>>
>
> When I see b"abc \u0001" or u"abc \u0001" I know exactly what it
> means. When I see "abc \u0001" I have to know whether unicode literals
> are enabled to know what it means, but now you've changed it so that's
> not enough anymore--I have to determine whether it's being used in a
> char* or object context, which I think is something we want to
> minimize.
>
> I'm with Lisandro and Carl Witty--how about just letting the parser
> parse them as unicode literals and then only accepting conversion back
> to char* for plain ASCII rather than introducing more complicated
> logic and semantics?
>
I don't understand this suggestion. What happens in each of these cases,
for different settings of "from __future__ import unicode_literals"?

cdef char* x1 = 'abc\u0001'
cdef char* x2 = 'abc\x01'

Dag Sverre

From robertwb at math.washington.edu Mon Sep 6 19:01:48 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 6 Sep 2010 10:01:48 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C8518A5.5060100@student.matnat.uio.no>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no>
Message-ID:

On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn wrote:
> Robert Bradshaw wrote:
>> On Sat, Sep 4, 2010 at 10:59 PM, Stefan Behnel wrote:
>>
>>> Robert Bradshaw, 05.09.2010 07:06:
>>>
>>>> On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote:
>>>>
>>>>> Robert Bradshaw, 04.09.2010 22:04:
>>>>>
>>>>>> How about we parse the literals as unicode strings, and if used in a
>>>>>> bytes context we raise a compile time error if any characters are
>>>>>> larger than a char?
>>>>>>
>>>>> Can't work because you cannot recover the original byte sequence from a
>>>>> decoded Unicode string. It may have used escapes or not, and it may or may
>>>>> not be encodable using the source code encoding.
>>>>>
>>>> I'm saying we shouldn't care about using escapes, and should raise a
>>>> compile time error if it's not encodable using the source encoding.
>>>>
>>> In that case, you'd break most code that actually uses escapes. If the byte
>>> values were correctly representable using the source encoding the escapes
>>> wouldn't be necessary in the first place.
>>>
>>
>> The most common escape is probably \n, followed by \0, \r, \t... As
>> for \uXXXX, that is just a superset of \xXX that only works for
>> unicode literals.
>>
>>
>>>> In other words, I'm not a fan of
>>>>
>>>>     foo("abc \u0001")
>>>>
>>>> behaving (in my opinion) very differently depending on whether foo
>>>> takes a char* or object argument.
>>>>
>>> It's Python compatible, though:
>>>
>>
>> No, it's not. Python doesn't have the concept of "used in a C context."
>>
>>
>>>     Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
>>>     [GCC 4.4.3] on linux2
>>>     Type "help", "copyright", "credits" or "license" for more information.
>>>     >>> 'abc \u0001'
>>>     'abc \\u0001'
>>>     >>> len('abc \u0001')
>>>     10
>>>     >>> u'abc \u0001'
>>>     u'abc \x01'
>>>     >>> len(u'abc \u0001')
>>>     5
>>>
>>> Same for Python 3 with the 'b' prefix on the byte string examples.
>>>
>>
>> When I see b"abc \u0001" or u"abc \u0001" I know exactly what it
>> means. When I see "abc \u0001" I have to know whether unicode literals
>> are enabled to know what it means, but now you've changed it so that's
>> not enough anymore--I have to determine whether it's being used in a
>> char* or object context, which I think is something we want to
>> minimize.
>>
>> I'm with Lisandro and Carl Witty--how about just letting the parser
>> parse them as unicode literals and then only accepting conversion back
>> to char* for plain ASCII rather than introducing more complicated
>> logic and semantics?
>>
> I don't understand this suggestion. What happens in each of these cases,
> for different settings of "from __future__ import unicode_literals"?
>
> cdef char* x1 = 'abc\u0001'
> cdef char* x2 = 'abc\x01'

from __future__ import unicode_literals (or -3)

    len(x1) == 4
    len(x2) == 4

Otherwise

    len(x1) == 9
    len(x2) == 4

- Robert

From dalcinl at gmail.com Mon Sep 6 19:55:45 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 6 Sep 2010 14:55:45 -0300
Subject: [Cython] 'with gil' and non-threaded Python builds
Message-ID:

Today, I've built Python-2.6.5 with ./configure --without-threads.
Then the PyGILState_{Ensure|Release} calls are missing!!! Could any of
you please confirm this??
Any Cython code using 'with gil' is then broken. I implemented this fix in a helper C header file: #if !defined(WITH_THREAD) #undef PyGILState_Ensure #define PyGILState_Ensure() ((PyGILState_STATE)0) #undef PyGILState_Release #define PyGILState_Release(state) (state)=((PyGILState_STATE)0) #undef Py_BLOCK_THREADS #define Py_BLOCK_THREADS (_save)=(PyThreadState*)0; #undef Py_UNBLOCK_THREADS #define Py_UNBLOCK_THREADS (_save)=(PyThreadState*)0; #endif Should a hack like that be added to every Cython-generated C source? -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stefan_ml at behnel.de Mon Sep 6 20:02:31 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 20:02:31 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> Message-ID: <4C852CB7.1040506@behnel.de> Robert Bradshaw, 06.09.2010 18:24: > On Sat, Sep 4, 2010 at 10:59 PM, Stefan Behnel wrote: >> Robert Bradshaw, 05.09.2010 07:06: >>> On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote: >>>> Robert Bradshaw, 04.09.2010 22:04: >>>>> How about we parse the literals as unicode strings, and if used in a >>>>> bytes context we raise a compile time error if any characters are >>>>> larger than a char? >>>> >>>> Can't work because you cannot recover the original byte sequence from a >>>> decoded Unicode string. It may have used escapes or not, and it may or may >>>> not be encodable using the source code encoding. >>> >>> I'm saying we shouldn't care about using escapes, and should raise a >>> compile time error if it's not encodable using the source encoding. >> >> In that case, you'd break most code that actually uses escapes. 
If the byte >> values were correctly representable using the source encoding the escapes >> wouldn't be necessary in the first place. > > The most common escape is probably \n, followed by \0, \r, \t... As > for \uXXXX, that is just a superset of \xXX that only works for > unicode literals. Sure, and '\u...' is the only escape sequence that really makes a difference here. >>> In other words, I'm not a fan of >>> >>> foo("abc \u0001") >>> >>> behaving (in my opinion) very differently depending on whether foo >>> takes a char* or object argument. >> >> It's Python compatible, though: > > No, it's not. Python doesn't have the concept of "used in a C context." I meant the context of byte strings. Cython has always allowed C char* strings to be used without prefix, and I would expect that most people have used that in their code. I also don't see a problem with that. > When I see b"abc \u0001" or u"abc \u0001" I know exactly what it > means. When I see "abc \u0001" I have to know whether unicode literals > are enabled to know what it means, but now you've changed it so that's > not enough anymore C char* strings have always behaved like plain byte strings, and that's the right way to handle them. The only problem is that importing Future.unicode_literals breaks those literals. My change fixed that. Besides, I really don't think that people will use Unicode escapes when writing char* literals when the normal byte escapes are so much shorter and more readable. > I'm with Lisandro and Carl WItty--how about just letting the parser > parse them as unicode literals and then only accepting conversion back > to char* for plain ASCII rather than introducing more complicated > logic and semantics? As I said, that breaks non-ASCII strings. I don't see why we should make an exception only for ASCII when we can make it work in general. If you want, we can disallow (or warn about) Unicode escapes in those strings. I could live with that and it's easy to implement. 
You can still write their byte sequence down by escaping the leading '\u' as '\\u' or by prepending a 'b' to the string, so nothing is lost, and users are prevented from falling into the trap of believing that their explicitly escaped Unicode string will be passed as such into a char* accepting function (however unlikely it is that someone might get that idea...). Plus, when you use non-ASCII characters from the source code charset in such a string, you will get exactly the byte sequence of the source code. I think that's expected, too. I really cannot see anything being wrong with my fix. Stefan From stefan_ml at behnel.de Mon Sep 6 20:20:10 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 20:20:10 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> Message-ID: <4C8530DA.9030808@behnel.de> Robert Bradshaw, 06.09.2010 19:01: > On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn >> I don't understand this suggestion. What happens in each of these cases, >> for different settings of "from __future__ import unicode_literals"? >> >> cdef char* x1 = 'abc\u0001' As I said in my other mail, I don't think anyone would use the above in real code. The alternative below is just too obvious and simple. >> cdef char* x2 = 'abc\x01' > > from __future__ import unicode_literals (or -3) > > len(x1) == 4 > len(x2) == 4 > > Otherwise > > len(x1) == 9 > len(x2) == 4 Hmm, now *that* looks unexpected to me. The way I see it, a C string is the C equivalent of a Python byte string and should always and predictably behave like a Python byte string, regardless of the way Python object literals are handled. 
Stefan From dagss at student.matnat.uio.no Mon Sep 6 20:30:06 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 06 Sep 2010 20:30:06 +0200 Subject: [Cython] C string literals In-Reply-To: <4C8530DA.9030808@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> Message-ID: <4C85332E.2030802@student.matnat.uio.no> Stefan Behnel wrote: > Robert Bradshaw, 06.09.2010 19:01: > >> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn >> >>> I don't understand this suggestion. What happens in each of these cases, >>> for different settings of "from __future__ import unicode_literals"? >>> >>> cdef char* x1 = 'abc\u0001' >>> > > As I said in my other mail, I don't think anyone would use the above in > real code. The alternative below is just too obvious and simple. > > > >>> cdef char* x2 = 'abc\x01' >>> >> from __future__ import unicode_literals (or -3) >> >> len(x1) == 4 >> len(x2) == 4 >> >> Otherwise >> >> len(x1) == 9 >> len(x2) == 4 >> > > Hmm, now *that* looks unexpected to me. The way I see it, a C string is the > C equivalent of a Python byte string and should always and predictably > behave like a Python byte string, regardless of the way Python object > literals are handled. > While the "cdef char*" case isn't that horrible, f('abc\x01') is. Imagine throwing in a type in the signature of f and then get different data in. I really, really don't like having the value of a literal depend on type of the variable it gets assigned to (I know, I know about ints and so on, but let's try to keep the number of instances down). My vote is for identifying a set of completely safe strings (no \x or \u, ASCII-only) that is the same regardless of any setting, and allow that. Anything else, demand a b'' prefix to assign to a char*. Putting in a b'' isn't THAT hard. 
Dag Sverre From stefan_ml at behnel.de Mon Sep 6 20:56:38 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 20:56:38 +0200 Subject: [Cython] C string literals In-Reply-To: <4C85332E.2030802@student.matnat.uio.no> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> Message-ID: <4C853966.1020304@behnel.de> Dag Sverre Seljebotn, 06.09.2010 20:30: > Stefan Behnel wrote: >> Robert Bradshaw, 06.09.2010 19:01: >> >>> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn >>> >>>> I don't understand this suggestion. What happens in each of these cases, >>>> for different settings of "from __future__ import unicode_literals"? >>>> >>>> cdef char* x1 = 'abc\u0001' >>>> >> >> As I said in my other mail, I don't think anyone would use the above in >> real code. The alternative below is just too obvious and simple. >> >> >> >>>> cdef char* x2 = 'abc\x01' >>>> >>> from __future__ import unicode_literals (or -3) >>> >>> len(x1) == 4 >>> len(x2) == 4 >>> >>> Otherwise >>> >>> len(x1) == 9 >>> len(x2) == 4 >>> >> >> Hmm, now *that* looks unexpected to me. The way I see it, a C string is the >> C equivalent of a Python byte string and should always and predictably >> behave like a Python byte string, regardless of the way Python object >> literals are handled. >> > While the "cdef char*" case isn't that horrible, > > f('abc\x01') > > is. Imagine throwing in a type in the signature of f and then get > different data in. This case is unambiguous. But the following would change. # using default source code encoding UTF-8 cdef char* cstring = 'abcüöä' charfunc('abcüöä') pyfunc('abcüöä') Here, 'cstring' is assigned a 9 byte long C string which is also passed into charfunc().
When unicode_literals are enabled, pyfunc() would receive u'abcüöä', otherwise it would receive the same 9 bytes long byte string. # encoding: ISO-8859-1 cdef char* cstring = 'abcüöä' charfunc('abcüöä') pyfunc('abcüöä') assigns a 6 byte long C string, same for the charfunc() call. With unicode_literals, pyfunc() would receive u'abcüöä', otherwise, it would receive a 6 byte long byte string b'abcüöä'. With the ASCII-only proposal, both examples above would raise an error for the C string usage and behave as described for the Python strings. The same string as an escaped literal: cdef char* cstring = 'abc\xfc\xf6\xe4' cfunc('abc\xfc\xf6\xe4') pyfunc('abc\xfc\xf6\xe4') would assign/pass a 6 byte string, whereas it would be equally disallowed with the ASCII-only proposal. The Python case would pass a 6 character unicode or 6 bytes byte string, depending on unicode_literals. My point is that I don't see a reason for a compiler error. I find the above behaviour predictable and reasonable. > I really, really don't like having the value of a literal depend on type > of the variable it gets assigned to (I know, I know about ints and so > on, but let's try to keep the number of instances down). > > My vote is for identifying a set of completely safe strings (no \x or > \u, ASCII-only) that is the same regardless of any setting, and allow > that. Anything else, demand a b'' prefix to assign to a char*. Putting > in a b'' isn't THAT hard. Well, then why not keep it the way it was before and *always* require a 'b' prefix in front of char* literals when unicode_literals is enabled? After all, it's an explicit option, so users who want to enable it can be required to adapt their code accordingly.
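The byte counts quoted in this message can be checked with a short sketch. It uses plain Python 3 rather than Cython, purely for illustration, and starts from the escaped form 'abc\xfc\xf6\xe4' given in the message. The helper is_ascii_only() is a hypothetical stand-in for the ASCII-only proposal, not anything that exists in Cython.

```python
# Sketch only: Python 3, illustrative names, not Cython's implementation.
text = 'abc\xfc\xf6\xe4'  # the escaped literal from the message above

utf8_bytes = text.encode('utf-8')         # each non-ASCII character takes 2 bytes
latin1_bytes = text.encode('iso-8859-1')  # each character takes 1 byte


def is_ascii_only(s):
    """Hypothetical check standing in for the ASCII-only proposal."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


print(len(text), len(utf8_bytes), len(latin1_bytes))
print(is_ascii_only('abc'), is_ascii_only(text))
```

The same six characters give 9 bytes under UTF-8 and 6 bytes under ISO-8859-1, which is why the C string content depends on the source encoding unless the literal is written with byte escapes.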
Stefan From dagss at student.matnat.uio.no Mon Sep 6 21:36:32 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 06 Sep 2010 21:36:32 +0200 Subject: [Cython] C string literals In-Reply-To: <4C853966.1020304@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> Message-ID: <4C8542C0.6060104@student.matnat.uio.no> Stefan Behnel wrote: > Dag Sverre Seljebotn, 06.09.2010 20:30: > >> Stefan Behnel wrote: >> >>> Robert Bradshaw, 06.09.2010 19:01: >>> >>> >>>> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn >>>> >>>> >>>>> I don't understand this suggestion. What happens in each of these cases, >>>>> for different settings of "from __future__ import unicode_literals"? >>>>> >>>>> cdef char* x1 = 'abc\u0001' >>>>> >>>>> >>> As I said in my other mail, I don't think anyone would use the above in >>> real code. The alternative below is just too obvious and simple. >>> >>> >>> >>> >>>>> cdef char* x2 = 'abc\x01' >>>>> >>>>> >>>> from __future__ import unicode_literals (or -3) >>>> >>>> len(x1) == 4 >>>> len(x2) == 4 >>>> >>>> Otherwise >>>> >>>> len(x1) == 9 >>>> len(x2) == 4 >>>> >>>> >>> Hmm, now *that* looks unexpected to me. The way I see it, a C string is the >>> C equivalent of a Python byte string and should always and predictably >>> behave like a Python byte string, regardless of the way Python object >>> literals are handled. >>> >>> >> While the "cdef char*" case isn't that horrible, >> >> f('abc\x01') >> >> is. Imagine throwing in a type in the signature of f and then get >> different data in. >> > > This case is unambiguous. But the following would change. > > # using default source code encoding UTF-8 > > cdef char* cstring = 'abcüöä'
> > charfunc('abcüöä') > > pyfunc('abcüöä') > > Here, 'cstring' is assigned a 9 byte long C string which is also passed > into charfunc(). When unicode_literals are enabled, pyfunc() would receive > u'abcüöä', otherwise it would receive the same 9 bytes long byte > string. > > # encoding: ISO-8859-1 > > cdef char* cstring = 'abcüöä' > > charfunc('abcüöä') > > pyfunc('abcüöä') > > assigns a 6 byte long C string, same for the charfunc() call. With > unicode_literals, pyfunc() would receive u'abcüöä', otherwise, it would > receive a 6 byte long byte string b'abcüöä'. > > With the ASCII-only proposal, both examples above would raise an error for > the C string usage and behave as described for the Python strings. > > > The same string as an escaped literal: > > cdef char* cstring = 'abc\xfc\xf6\xe4' > > cfunc('abc\xfc\xf6\xe4') > > pyfunc('abc\xfc\xf6\xe4') > > would assign/pass a 6 byte string, whereas it would be equally disallowed > with the ASCII-only proposal. The Python case would pass a 6 character > unicode or 6 bytes byte string, depending on unicode_literals. > > My point is that I don't see a reason for a compiler error. I find the > above behaviour predictable and reasonable. > > > >> I really, really don't like having the value of a literal depend on type >> of the variable it gets assigned to (I know, I know about ints and so >> on, but let's try to keep the number of instances down). >> >> My vote is for identifying a set of completely safe strings (no \x or >> \u, ASCII-only) that is the same regardless of any setting, and allow >> that. Anything else, demand a b'' prefix to assign to a char*. Putting >> in a b'' isn't THAT hard. >> > > Well, then why not keep it the way it was before and *always* require a 'b' > prefix in front of char* literals when unicode_literals is enabled? After > all, it's an explicit option, so users who want to enable it can be > required to adapt their code accordingly.
> If this can get any momentum, I'm all for it (I was dismissing it when thinking about it because I thought it would meet opposition everywhere). It doesn't really make sense to assign unicode literals to char* in the first place to me, and with -3 or unicode_literals you're pretty much asking for having to do such a change. Dag Sverre From stefan_ml at behnel.de Mon Sep 6 21:52:05 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 21:52:05 +0200 Subject: [Cython] C string literals In-Reply-To: <4C8542C0.6060104@student.matnat.uio.no> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> Message-ID: <4C854665.30700@behnel.de> Dag Sverre Seljebotn, 06.09.2010 21:36: > Stefan Behnel wrote: >> Dag Sverre Seljebotn, 06.09.2010 20:30: >>> My vote is for identifying a set of completely safe strings (no \x or >>> \u, ASCII-only) that is the same regardless of any setting, and allow >>> that. Anything else, demand a b'' prefix to assign to a char*. Putting >>> in a b'' isn't THAT hard. >> >> Well, then why not keep it the way it was before and *always* require a 'b' >> prefix in front of char* literals when unicode_literals is enabled? After >> all, it's an explicit option, so users who want to enable it can be >> required to adapt their code accordingly. >> > If this can get any momentum, I'm all for it (I was dismissing it when > thinking about it because I thought it would meet opposition > everywhere). It doesn't really make sense to assign unicode literals to > char* in the first place to me, and with -3 or unicode_literals you're > pretty much asking for having to do such a change. It's certainly the cleanest way to handle this. 
Lisandro didn't like it when he stumbled over it, because it means that he actually has to change his code. It's easy to do that, given that Cython reports all such places with a compile error. It's just cumbersome. Maybe a Cython specific 2to3 tool could help. Stefan From dalcinl at gmail.com Mon Sep 6 21:59:29 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 6 Sep 2010 16:59:29 -0300 Subject: [Cython] C string literals In-Reply-To: <4C854665.30700@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> Message-ID: On 6 September 2010 16:52, Stefan Behnel wrote: > Dag Sverre Seljebotn, 06.09.2010 21:36: >> Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 06.09.2010 20:30: >>>> My vote is for identifying a set of completely safe strings (no \x or >>>> \u, ASCII-only) that is the same regardless of any setting, and allow >>>> that. Anything else, demand a b'' prefix to assign to a char*. Putting >>>> in a b'' isn't THAT hard. >>> >>> Well, then why not keep it the way it was before and *always* require a 'b' >>> prefix in front of char* literals when unicode_literals is enabled? After >>> all, it's an explicit option, so users who want to enable it can be >>> required to adapt their code accordingly. >>> >> If this can get any momentum, I'm all for it (I was dismissing it when >> thinking about it because I thought it would meet opposition >> everywhere). It doesn't really make sense to assign unicode literals to >> char* in the first place to me, and with -3 or unicode_literals you're >> pretty much asking for having to do such a change. > > It's certainly the cleanest way to handle this. 
Lisandro didn't like it > when he stumbled over it, because it means that he actually has to change > his code. It's easy to do that, given that Cython reports all such places > with a compile error. It's just cumbersome. Maybe a Cython specific 2to3 > tool could help. > Stefan, I do not have any problem about changing my code... What I really like and want is to have a single Cython source that generated C-code being able to run in Python-2 and Python-3 runtime... Because of this, In many cases I have to use unprefixed (but pure ASCII) string literals and use the 'str' type, expecting that to become PyUnicode on Py3 and PyString under Py2. -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stefan_ml at behnel.de Mon Sep 6 22:07:23 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 22:07:23 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> Message-ID: <4C8549FB.9090006@behnel.de> Lisandro Dalcin, 06.09.2010 21:59: > On 6 September 2010 16:52, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 06.09.2010 21:36: >>> Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 06.09.2010 20:30: >>>>> My vote is for identifying a set of completely safe strings (no \x or >>>>> \u, ASCII-only) that is the same regardless of any setting, and allow >>>>> that. Anything else, demand a b'' prefix to assign to a char*. Putting >>>>> in a b'' isn't THAT hard. 
>>>> >>>> Well, then why not keep it the way it was before and *always* require a 'b' >>>> prefix in front of char* literals when unicode_literals is enabled? After >>>> all, it's an explicit option, so users who want to enable it can be >>>> required to adapt their code accordingly. >>>> >>> If this can get any momentum, I'm all for it (I was dismissing it when >>> thinking about it because I thought it would meet opposition >>> everywhere). It doesn't really make sense to assign unicode literals to >>> char* in the first place to me, and with -3 or unicode_literals you're >>> pretty much asking for having to do such a change. >> >> It's certainly the cleanest way to handle this. Lisandro didn't like it >> when he stumbled over it, because it means that he actually has to change >> his code. It's easy to do that, given that Cython reports all such places >> with a compile error. It's just cumbersome. Maybe a Cython specific 2to3 >> tool could help. > > Stefan, I do not have any problem about changing my code... What I > really like and want is to have a single Cython source that generated > C-code being able to run in Python-2 and Python-3 runtime... Because > of this, In many cases I have to use unprefixed (but pure ASCII) > string literals and use the 'str' type, expecting that to become > PyUnicode on Py3 and PyString under Py2. That's not the problem we are discussing here, though. This thread is about *C string* literals, which should or should not change due to unicode_literals being enabled. That's the question. I think it makes sense to keep the two discussions separate. 
Stefan From kayhayen at gmx.de Mon Sep 6 23:04:22 2010 From: kayhayen at gmx.de (Kay Hayen) Date: Mon, 06 Sep 2010 23:04:22 +0200 Subject: [Cython] C string literals In-Reply-To: <4C8549FB.9090006@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> Message-ID: <4C855756.3040804@gmx.de> Am 06.09.2010 22:07, schrieb Stefan Behnel: > That's not the problem we are discussing here, though. This thread is about > *C string* literals, which should or should not change due to > unicode_literals being enabled. That's the question. > > I think it makes sense to keep the two discussions separate. Wasn't this why wchar_t was invented? I recall that some define, at least on Windows, allowed you to decide if it was unicode or ascii. So suppose that you have: cdef wchar_t *str1; str2 = "lalala" Now dependent on "unicode_literals" the wchar_t could change to be a "char" or a "Py_UNICODE". I think that Wikipedia has good information about this here: http://en.wikipedia.org/wiki/Wide_character#Programming_specifics This way the type of literals and C strings would always be in sync and it should allow you to address both platforms.
Yours, Kay From stefan_ml at behnel.de Mon Sep 6 23:12:45 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 23:12:45 +0200 Subject: [Cython] C string literals In-Reply-To: <4C855756.3040804@gmx.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> Message-ID: <4C85594D.2020307@behnel.de> Kay Hayen, 06.09.2010 23:04: > Am 06.09.2010 22:07, schrieb Stefan Behnel: >> That's not the problem we are discussing here, though. This thread is about >> *C string* literals, which should or should not change due to >> unicode_literals being enabled. That's the question. >> >> I think it makes sense to keep the two discussions separate. > > Wasn't this why wchar_t was invented? Now we're really drifting off-topic. ;) No, that's totally unrelated. The "wchar_t" type is used internally for Unicode strings (typedef-ed to "Py_UNICODE") by CPython on Windows. It has nothing to do with the char* type used to represent C (byte!) strings, and in particular, it has nothing to do with the *content* of C byte strings and its mapping from C byte string literals in Cython code, which is what this discussion is about. 
Stefan From kayhayen at gmx.de Mon Sep 6 23:21:48 2010 From: kayhayen at gmx.de (Kay Hayen) Date: Mon, 06 Sep 2010 23:21:48 +0200 Subject: [Cython] C string literals In-Reply-To: <4C85594D.2020307@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> <4C85594D.2020307@behnel.de> Message-ID: <4C855B6C.5070101@gmx.de> Am 06.09.2010 23:12, schrieb Stefan Behnel: > Kay Hayen, 06.09.2010 23:04: >> Am 06.09.2010 22:07, schrieb Stefan Behnel: >>> That's not the problem we are discussing here, though. This thread is about >>> *C string* literals, which should or should not change due to >>> unicode_literals being enabled. That's the question. >>> >>> I think it makes sense to keep the two discussions separate. >> >> Wasn't this why wchar_t was invented? > > Now we're really drifting off-topic. ;) > > No, that's totally unrelated. The "wchar_t" type is used internally for > Unicode strings (typedef-ed to "Py_UNICODE") by CPython on Windows. It has > nothing to do with the char* type used to represent C (byte!) strings, and > in particular, it has nothing to do with the *content* of C byte strings > and its mapping from C byte string literals in Cython code, which is what > this discussion is about. Well, if the discussion is about solving Lisandro's goals, then it is about how Python literals can be matched with a C type. And when a Python literal has a variant type depending on Parser flags, then C code could just as well have them too. That's why I recalled wchar_t, which historically has been used that way, but it could be any other type too, that is a "char" when no unicode_literals, and is a "Py_UNICODE" when they are active. 
If I understand Lisandro right, his goal is to write portable code. And portable code would best be served by having a C type equivalent of the Python type. Name it however you want, but I think it is needed to write code that doesn't convert the strings at all. That other topic, of C string literals seems like solving a problem that shouldn't exist. Why should the Python code use unicode literals and the C code not, that is just incompatible to Python. You know where I stand on that issue. Just don't use/allow "char *" as an interface with Python strings (as opposed to bytes) if you want to be Python 3 compatible. Yours, Kay From stefan_ml at behnel.de Mon Sep 6 23:34:29 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Sep 2010 23:34:29 +0200 Subject: [Cython] C string literals In-Reply-To: <4C855B6C.5070101@gmx.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de> Message-ID: <4C855E65.3030200@behnel.de> Kay Hayen, 06.09.2010 23:21: > Am 06.09.2010 23:12, schrieb Stefan Behnel: >> Kay Hayen, 06.09.2010 23:04: >>> Am 06.09.2010 22:07, schrieb Stefan Behnel: >>>> That's not the problem we are discussing here, though. This thread is about >>>> *C string* literals, which should or should not change due to >>>> unicode_literals being enabled. That's the question. >>>> >>>> I think it makes sense to keep the two discussions separate. >>> >>> Wasn't this why wchar_t was invented? >> >> Now we're really drifting off-topic. ;) >> >> No, that's totally unrelated. The "wchar_t" type is used internally for >> Unicode strings (typedef-ed to "Py_UNICODE") by CPython on Windows. 
It has >> nothing to do with the char* type used to represent C (byte!) strings, and >> in particular, it has nothing to do with the *content* of C byte strings >> and its mapping from C byte string literals in Cython code, which is what >> this discussion is about. > > Well, if the discussion is about solving Lisandro's goals, then it is > about how Python literals can be matched with a C type. Not at all. Please see the other thread that Lisandro started regarding the background ("other issues with cython -3"). It only deals with different Python string types, not with C types. > That other topic, of C string literals seems like solving a problem that > shouldn't exist. Why should the Python code use unicode literals and the > C code not, that is just incompatible to Python. I don't think I understand what you are trying to say here. We are talking about the interpretation of string literals *in Cython code* here. C code doesn't know about Unicode, it only knows about char*, i.e. byte sequences. It doesn't matter that there are types like "wchar_t" when you actually need to talk to something that wants a char*. Did you actually read the entire thread? Stefan From kayhayen at gmx.de Tue Sep 7 00:15:00 2010 From: kayhayen at gmx.de (Kay Hayen) Date: Tue, 07 Sep 2010 00:15:00 +0200 Subject: [Cython] C string literals In-Reply-To: <4C855E65.3030200@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de> <4C855E65.3030200@behnel.de> Message-ID: <4C8567E4.3060200@gmx.de> Hello Stefan, >> That other topic, of C string literals seems like solving a problem that >> shouldn't exist. 
Why should the Python code use unicode literals and the >> C code not, that is just incompatible to Python. > > I don't think I understand what you are trying to say here. We are talking > about the interpretation of string literals *in Cython code* here. C code > doesn't know about Unicode, it only knows about char*, i.e. byte sequences. > It doesn't matter that there are types like "wchar_t" when you actually > need to talk to something that wants a char*. Did you actually read the > entire thread? I surely read it entirely and with big interest. With my own compiler project Nuitka, I came across the issue too. And I have on my mind to do it that way, to keep the unicode or non-unicode nature of strings to follow the Python strings at hand. I may be misunderstanding people because of my different goal to stay close to CPython. I am coming from a standpoint where the Python semantics should be the Cython semantics. So naturally I suggest that Cython string literals are to be Python string literals and that there be no C literals, but instead only Python literals in the source code. Then allow users to cast or convert "pchar_t *" to "char *" which depending on the value of "unicode_literals" may or may not be trivial, making the C code portable if wanted, and to only do the conversion if necessary. I don't see any use having some of the proposed hybrid semantics somewhere between C, Python 2 and Python 3 literals, which is just another step to being neither C nor Python. Regarding C and unicode: While true that C doesn't have it, but I would be hardpressed to find a compiler that doesn't at least have types for it. Otherwise CPython probably would itself have a hard time to support unicode. See for example: http://www.gnu.org/s/libc/manual/html_node/Extended-Char-Intro.html I am not so sure, how CPython's UCS2 vs. UCS4 usage comes into play here, I just hope it's the compiler / platform standard, however big wchar_t is. 
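How big wchar_t is on a given platform can be checked from Python itself. The following is a small sketch using the standard ctypes module (not part of Cython or of the proposals in this thread); the result is typically 2 on Windows and 4 on most Unix-like systems, while char is always 1 byte.

```python
import ctypes

# Size in bytes of the platform's C "char" and "wchar_t" types,
# as seen by the C compiler that built this Python interpreter.
char_size = ctypes.sizeof(ctypes.c_char)
wchar_size = ctypes.sizeof(ctypes.c_wchar)

print(char_size, wchar_size)
```

Because the width varies between platforms, a wchar_t buffer and a char* buffer holding "the same" text generally have different byte layouts.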
Yours,
Kay

From robertwb at math.washington.edu Tue Sep 7 01:53:50 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 6 Sep 2010 16:53:50 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C8530DA.9030808@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
Message-ID:

On Mon, Sep 6, 2010 at 11:20 AM, Stefan Behnel wrote:
> Robert Bradshaw, 06.09.2010 19:01:
>> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn
>>> I don't understand this suggestion. What happens in each of these cases,
>>> for different settings of "from __future__ import unicode_literals"?
>>>
>>> cdef char* x1 = 'abc\u0001'
>
> As I said in my other mail, I don't think anyone would use the above in
> real code. The alternative below is just too obvious and simple.
>
>
>>> cdef char* x2 = 'abc\x01'
>>
>> from __future__ import unicode_literals (or -3)
>>
>>      len(x1) == 4
>>      len(x2) == 4
>>
>> Otherwise
>>
>>      len(x1) == 9
>>      len(x2) == 4
>
> Hmm, now *that* looks unexpected to me.

But this is *exactly* how Python handles

    x1 = 'abc\u0001'
    x2 = 'abc\x01'
    len(x1), len(x2)

both with and without unicode_literals.

> The way I see it, a C string is the
> C equivalent of a Python byte string and should always and predictably
> behave like a Python byte string, regardless of the way Python object
> literals are handled.

Python bytes are very different from strings. C (and most C libraries)
use char* for both strings and binary data.
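The lengths quoted above can be checked with a minimal sketch, assuming a Python 3 interpreter. Here an unprefixed literal stands in for a Python 2 literal under unicode_literals, and a bytes literal with an explicitly doubled backslash stands in for the Python 2 unprefixed byte string, in which \u is not an escape. The variable names are illustrative only.

```python
# Unprefixed literals in Python 3 are text, which matches Python 2 with
# "from __future__ import unicode_literals".
text_u = 'abc\u0001'   # \u0001 is a single code point
text_x = 'abc\x01'     # \x01 is a single code point

# Python 2 unprefixed byte string: \u is not an escape there, so the six
# characters \ u 0 0 0 1 are kept as written. The backslash is doubled
# here so that Python 3 does not warn about an invalid escape.
bytes_u = b'abc\\u0001'
bytes_x = b'abc\x01'   # \x01 is a single byte

lengths = (len(text_u), len(text_x), len(bytes_u), len(bytes_x))
print(lengths)
```

The first two entries correspond to the unicode_literals case and the last two to the "Otherwise" case.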
- Robert

From robertwb at math.washington.edu Tue Sep 7 01:54:15 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 6 Sep 2010 16:54:15 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C85332E.2030802@student.matnat.uio.no>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85332E.2030802@student.matnat.uio.no>
Message-ID:

On Mon, Sep 6, 2010 at 11:30 AM, Dag Sverre Seljebotn wrote:
> Stefan Behnel wrote:
>> Robert Bradshaw, 06.09.2010 19:01:
>>
>>> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn
>>>
>>>> I don't understand this suggestion. What happens in each of these cases,
>>>> for different settings of "from __future__ import unicode_literals"?
>>>>
>>>> cdef char* x1 = 'abc\u0001'
>>>>
>>
>> As I said in my other mail, I don't think anyone would use the above in
>> real code. The alternative below is just too obvious and simple.
>>
>>
>>
>>>> cdef char* x2 = 'abc\x01'
>>>>
>>> from __future__ import unicode_literals (or -3)
>>>
>>>      len(x1) == 4
>>>      len(x2) == 4
>>>
>>> Otherwise
>>>
>>>      len(x1) == 9
>>>      len(x2) == 4
>>>
>>
>> Hmm, now *that* looks unexpected to me.
>>
>> The way I see it, a C string is the
>> C equivalent of a Python byte string and should always and predictably
>> behave like a Python byte string, regardless of the way Python object
>> literals are handled.
>
> While the "cdef char*" case isn't that horrible,
>
> f('abc\x01')
>
> is. Imagine throwing in a type in the signature of f and then get
> different data in.
>
> I really, really don't like having the value of a literal depend on type
> of the variable it gets assigned to (I know, I know about ints and so
> on, but let's try to keep the number of instances down).

+1. This is the main reason I'm arguing my point. Literals should not
be re-interpreted based on context.

> My vote is for identifying a set of completely safe strings (no \x or
> \u, ASCII-only) that is the same regardless of any setting, and allow
> that. Anything else, demand a b'' prefix to assign to a char*. Putting
> in a b'' isn't THAT hard.

Sure. Many (most) libraries take char* for string values. I want to
avoid requiring special incantations (not that the 'b' is hard, but
one needs to know it) to write, e.g.,

    printf("Hello World\n")

Anything non-ASCII, well, it's reasonable to force users to think
about bytes vs. strings, encodings, etc.

- Robert

From robertwb at math.washington.edu Tue Sep 7 01:57:38 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 6 Sep 2010 16:57:38 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C8567E4.3060200@gmx.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de>
 <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de>
 <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de>
 <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de>
 <4C855E65.3030200@behnel.de> <4C8567E4.3060200@gmx.de>
Message-ID:

On Mon, Sep 6, 2010 at 3:15 PM, Kay Hayen wrote:
> Hello Stefan,
>
>>> That other topic, of C string literals seems like solving a problem that
>>> shouldn't exist. Why should the Python code use unicode literals and the
>>> C code not, that is just incompatible to Python.
>>
>> I don't think I understand what you are trying to say here. We are talking
>> about the interpretation of string literals *in Cython code* here. C code
>> doesn't know about Unicode, it only knows about char*, i.e. byte sequences.
>> It doesn't matter that there are types like "wchar_t" when you actually
>> need to talk to something that wants a char*. Did you actually read the
>> entire thread?
> > I surely read it entirely and with big interest. With my own compiler > project Nuitka, I came across the issue too. And I have on my mind to do > it that way, to keep the unicode or non-unicode nature of strings to > follow the Python strings at hand. I may be misunderstanding people > because of my different goal to stay close to CPython. > > I am coming from a standpoint where the Python semantics should be the > Cython semantics. So naturally I suggest that Cython string literals are > to be Python string literals and that there be no C literals, but > instead only Python literals in the source code. > > Then allow users to cast or convert "pchar_t *" to "char *" which > depending on the value of "unicode_literals" may or may not be trivial, > making the C code portable if wanted, and to only do the conversion if > necessary. > > I don't see any use having some of the proposed hybrid semantics > somewhere between C, Python 2 and Python 3 literals, which is just > another step to being neither C nor Python. > > Regarding C and unicode: While true that C doesn't have it, but I would > be hardpressed to find a compiler that doesn't at least have types for > it. Otherwise CPython probably would itself have a hard time to support > unicode. The key point, and the reason this comes up, is that we care about natively calling existing C libraries that take char* arguments. 
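As a rough illustration of why a char* argument forces the bytes-versus-text question, here is a minimal sketch that uses Python 3's ctypes as a stand-in for Cython (an assumption made only so the example runs without a compiler). `ctypes.c_char_p` plays the role of a C function parameter of type char*.

```python
import ctypes

# A char* on the C side is a sequence of bytes, so a bytes object maps
# onto it directly.
c_text = ctypes.c_char_p(b"Hello World\n")
round_trip = c_text.value

# A text (unicode) string has no byte representation until an encoding
# is chosen, so ctypes refuses to pass it as char*.
try:
    ctypes.c_char_p("Hello World\n")
    text_accepted = True
except TypeError:
    text_accepted = False

print(round_trip, text_accepted)
```

The thread is about whether Cython should perform that choice silently for unprefixed literals, and for which literal contents.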
- Robert From stefan_ml at behnel.de Tue Sep 7 08:46:30 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Sep 2010 08:46:30 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> Message-ID: <4C85DFC6.70402@behnel.de> Robert Bradshaw, 07.09.2010 01:53: > On Mon, Sep 6, 2010 at 11:20 AM, Stefan Behnel wrote: >> Robert Bradshaw, 06.09.2010 19:01: >>> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn >>>> I don't understand this suggestion. What happens in each of these cases, >>>> for different settings of "from __future__ import unicode_literals"? >>>> >>>> cdef char* x1 = 'abc\u0001' >> >> As I said in my other mail, I don't think anyone would use the above in >> real code. The alternative below is just too obvious and simple. >> >> >>>> cdef char* x2 = 'abc\x01' >>> >>> from __future__ import unicode_literals (or -3) >>> >>> len(x1) == 4 >>> len(x2) == 4 >>> >>> Otherwise >>> >>> len(x1) == 9 >>> len(x2) == 4 >> >> Hmm, now *that* looks unexpected to me. > > But this *exactly* how Python handles. > > x1 = 'abc\u0001' > x2 = 'abc\x01' > len(x1), len(x2) > > for with and without unicode_literals. Not for byte strings. Seriously, what you are trying to push here is that users must decide if they prefix a char* literal with a 'b' or not, depending on the content of the string. Sometimes, Cython will force them to do it, sometimes, it will just work, even for calls to exactly the same function. Great. Why can't we *always* require a 'b' or *always* make it work as expected? What would be wrong with that? >> The way I see it, a C string is the >> C equivalent of a Python byte string and should always and predictably >> behave like a Python byte string, regardless of the way Python object >> literals are handled. > > Python bytes are very different than strings. 
C (and most C libraries) > use char* for both strings and binary data. No. They use it for binary data and *encoded* text content, even if the encoding is ASCII. That's different. The fact that they accept text content encoded in ASCII, CP1250, UTF-8, UCS4, Latin-15, Kanji or whatever doesn't mean they know what Unicode is or even how to handle text. They may just store it away as binary, they may interpret it a filename encoded in a platform specific way, or they may pass it to a recoder. Cython can't know. The user will know it, though, and will (in almost all cases) pass content that suits the other side, be it ASCII encoded or not. Could you comment on this please? http://permalink.gmane.org/gmane.comp.python.cython.devel/10243 I think I made it pretty clear there what I think the two suitable alternatives are. Stefan From stefan_ml at behnel.de Tue Sep 7 10:14:29 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Sep 2010 10:14:29 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> Message-ID: <4C85F465.3080407@behnel.de> Robert Bradshaw, 07.09.2010 01:54: > On Mon, Sep 6, 2010 at 11:30 AM, Dag Sverre Seljebotn >> While the "cdef char*" case isn't that horrible, >> >> f('abc\x01') >> >> is. Imagine throwing in a type in the signature of f and then get >> different data in. >> >> I really, really don't like having the value of a literal depend on type >> of the variable it gets assigned to (I know, I know about ints and so >> on, but let's try to keep the number of instances down). > > +1. This is the main reason I'm arguing my point. Literals should not > be re-interpreted based on context. Well, they are, though. 
There's the context of the source code encoding, the context of unicode_literals, and the special case of 1-character and 1-byte literals in integer contexts. There's also the runtime specific interpretation of 'str', but that only affects literals indirectly, independent of their content. In addition to that, the "ASCII-only" proposal adds a similar context on top as the "char* == bytes" proposal. "ASCII-only" encodes Unicode strings to ASCII and rejects everything that doesn't fit, including explicit byte escapes. So you can't write cfunc("ao\xFF"), for example, although the code itself only uses plain ASCII characters. This creates an artificial difference between cfunc("ac\x7F") and cfunc("ac\x80") in the sense that one is allowed and the other is rejected and requires code modifications. "char* == bytes" encodes char* literals back to the byte sequence defined by the source code encoding, while properly handling all byte escapes in addition. So cfunc("ao\xFF") behaves exactly as written and cfunc("ao??") will be interpreted in the context of the source code encoding. >> My vote is for identifying a set of completely safe strings (no \x or >> \u, ASCII-only) that is the same regardless of any setting, and allow >> that. Anything else, demand a b'' prefix to assign to a char*. Putting >> in a b'' isn't THAT hard. > > Sure. Many (most) libraries take char* for string values. I want to > avoid requiting special incantations (not that the 'b' is hard, but > one needs to know it) to write, e.g, > > printf("Hello World\n") But that would only apply when you enable unicode_literals. I think it's reasonable to either a) require a 'b' prefix in that case or b) enforce bytes semantics for char* automatically. To make life easy for users, either b) can be applied or for a), we can let Cython generate a patch (or script) that prepends 'b' prefixes to all places where unprefixed string literals are used in a char* context. 
That way, the source code becomes safe regardless of the
unicode_literals setting.

Stefan

From robertwb at math.washington.edu Tue Sep 7 10:20:11 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 7 Sep 2010 01:20:11 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C85DFC6.70402@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85DFC6.70402@behnel.de>
Message-ID:

>>>> from __future__ import unicode_literals (or -3)
>>>>
>>>>       len(x1) == 4
>>>>       len(x2) == 4
>>>>
>>>> Otherwise
>>>>
>>>>       len(x1) == 9
>>>>       len(x2) == 4
>>>
>>> Hmm, now *that* looks unexpected to me.
>>
>> But this *exactly* how Python handles.
>>
>> x1 = 'abc\u0001'
>> x2 = 'abc\x01'
>> len(x1), len(x2)
>>
>> for with and without unicode_literals.
>
> Not for byte strings.

We're talking about unprefixed literals.

> Seriously, what you are trying to push here is that users must decide if
> they prefix a char* literal with a 'b' or not, depending on the content of
> the string.

Users are free to always prefix all their byte literals with 'b'; I'm
proposing that in the simple, unambiguous case they aren't forced to.

> Sometimes, Cython will force them to do it, sometimes, it will
> just work, even for calls to exactly the same function. Great. Why can't we
> *always* require a 'b'

I think this is overkill for the vast majority of libraries that I've
wrapped (admittedly mostly math), as well as all the standard C
libraries that take char* arguments (e.g. stdio, as in my previous
example).

> or *always* make it work as expected? What would be
> wrong with that?

Because clearly "what is expected" is not consistent across the
participants in this thread, and I'd certainly rather have an
unexpected compile-time error than unexpected (potentially undetected)
runtime behavior.

>>> The way I see it, a C string is the
>>> C equivalent of a Python byte string and should always and predictably
>>> behave like a Python byte string, regardless of the way Python object
>>> literals are handled.
>>
>> Python bytes are very different than strings. C (and most C libraries)
>> use char* for both strings and binary data.
>
> No. They use it for binary data and *encoded* text content, even if the
> encoding is ASCII. That's different. The fact that they accept text content
> encoded in ASCII, CP1250, UTF-8, UCS4, Latin-15, Kanji or whatever doesn't
> mean they know what Unicode is or even how to handle text. They may just
> store it away as binary, they may interpret it a filename encoded in a
> platform specific way, or they may pass it to a recoder. Cython can't know.
> The user will know it, though, and will (in almost all cases) pass content
> that suits the other side, be it ASCII encoded or not.

[Sigh] I know the difference, but to say the C statement

    char *x = "abc";

doesn't contain any strings, only encoded text content, is IMHO overly
pedantic, and I think it's too much to push this level of pedantry on
all our users when the result is unambiguous.

> Could you comment on this please?

Sure, at the risk of being redundant.

> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243
> I think I made it pretty clear there what I think the two suitable
> alternatives are.

Yes, you favor either (1) re-interpretation of literals depending on
the type context they're used in or (2) disallowing interpretation of
string literals when unicode literals are enabled.

I think (1) is a bad path to take and would prefer not to burden users
with (2).
- Robert

From stefan_ml at behnel.de Tue Sep 7 10:27:09 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 07 Sep 2010 10:27:09 +0200
Subject: [Cython] C string literals
In-Reply-To: <4C8567E4.3060200@gmx.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de>
 <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de>
 <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de>
 <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de>
 <4C855E65.3030200@behnel.de> <4C8567E4.3060200@gmx.de>
Message-ID: <4C85F75D.9070402@behnel.de>

Kay Hayen, 07.09.2010 00:15:
> I may be misunderstanding people
> because of my different goal to stay close to CPython.

You may want to keep the level of unnecessary FUD down, especially on this
list. Cython has very good and seamless support for Unicode and CPython's
string type semantics.

Stefan

From robertwb at math.washington.edu Tue Sep 7 10:24:08 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 7 Sep 2010 01:24:08 -0700
Subject: [Cython] C string literals
In-Reply-To: <4C853966.1020304@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de>
Message-ID:

On Mon, Sep 6, 2010 at 11:56 AM, Stefan Behnel wrote:
>> While the "cdef char*" case isn't that horrible,
>>
>> f('abc\x01')
>>
>> is. Imagine throwing in a type in the signature of f and then get
>> different data in.
>
> This case is unambiguous. But the following would change.
>
>     # using default source code encoding UTF-8
>
>     cdef char* cstring = 'abcüöä'
>
>     charfunc('abcüöä')
>
>     pyfunc('abcüöä')
>
> Here, 'cstring' is assigned a 9 byte long C string which is also passed
> into charfunc(). When unicode_literals are enabled, pyfunc() would receive
> u'abcüöä', otherwise it would receive the same 9 bytes long byte
> string.
>
>     # encoding: ISO-8859-1
>
>     cdef char* cstring = 'abcüöä'
>
>     charfunc('abcüöä')
>
>     pyfunc('abcüöä')
>
> assigns a 6 byte long C string, same for the charfunc() call. With
> unicode_literals, pyfunc() would receive u'abcüöä', otherwise, it would
> receive a 6 byte long byte string b'abcüöä'.
>
> With the ASCII-only proposal, both examples above would raise an error for
> the C string usage and behave as described for the Python strings.
>
>
> The same string as an escaped literal:
>
>     cdef char* cstring = 'abc\xfc\xf6\xe4'
>
>     cfunc('abc\xfc\xf6\xe4')
>
>     pyfunc('abc\xfc\xf6\xe4')
>
> would assign/pass a 6 byte string, whereas it would be equally disallowed
> with the ASCII-only proposal. The Python case would pass a 6 character
> unicode or 6 bytes byte string, depending on unicode_literals.
>
> My point is that I don't see a reason for a compiler error. I find the
> above behaviour predictable and reasonable.

The reason I don't see this as predictable is that the value of the
literal depends on knowing the signatures of cfunc and pyfunc (which
probably will not be named as informatively...).

Actually, b'abcüöä' is a syntax error in Python 3, "bytes can only
contain ASCII literal characters," which bolsters the argument for
requiring Cython to follow suit. (In fact, if byte literals are
interpreted as the literal bytes in the file, that means Python files
can't be naively re-encoded with a different encoding (and the header
fixed) without possibly changing the actual meaning of the program.)
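The Python 3 behaviour mentioned above can be observed with a small sketch, assuming a Python 3 interpreter. The non-ASCII characters are assumed to be ü, ö and ä, as implied by the escaped spelling 'abc\xfc\xf6\xe4' quoted earlier in this message. The file name "<demo>" is just a placeholder.

```python
# Source text holding a bytes literal with non-ASCII literal characters.
source = "data = b'abc\u00fc\u00f6\u00e4'"

try:
    compile(source, "<demo>", "exec")
    rejected = False
    message = ""
except SyntaxError as exc:
    rejected = True
    message = exc.msg

# The escaped spelling commits to exact byte values, so it compiles and
# does not depend on the encoding of the source file.
namespace = {}
exec(compile(r"data = b'abc\xfc\xf6\xe4'", "<demo>", "exec"), namespace)
escaped_length = len(namespace["data"])

print(rejected, message, escaped_length)
```

Because the escaped form names the bytes directly, re-encoding the source file cannot change what the program means.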
>> I really, really don't like having the value of a literal depend on type >> of the variable it gets assigned to (I know, I know about ints and so >> on, but let's try to keep the number of instances down). >> >> My vote is for identifying a set of completely safe strings (no \x or >> \u, ASCII-only) that is the same regardless of any setting, and allow >> that. Anything else, demand a b'' prefix to assign to a char*. Putting >> in a b'' isn't THAT hard. > > Well, then why not keep it the way it was before and *always* require a 'b' > prefix in front of char* literals when unicode_literals is enabled? After > all, it's an explicit option, so users who want to enable it can be > required to adapt their code accordingly. Why require it if there's absolutely no ambiguity? - Robert From robertwb at math.washington.edu Tue Sep 7 10:31:40 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 7 Sep 2010 01:31:40 -0700 Subject: [Cython] C string literals In-Reply-To: <4C85F465.3080407@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C85F465.3080407@behnel.de> Message-ID: On Tue, Sep 7, 2010 at 1:14 AM, Stefan Behnel wrote: > Robert Bradshaw, 07.09.2010 01:54: >> On Mon, Sep 6, 2010 at 11:30 AM, Dag Sverre Seljebotn >>> While the "cdef char*" case isn't that horrible, >>> >>> f('abc\x01') >>> >>> is. Imagine throwing in a type in the signature of f and then get >>> different data in. >>> >>> I really, really don't like having the value of a literal depend on type >>> of the variable it gets assigned to (I know, I know about ints and so >>> on, but let's try to keep the number of instances down). >> >> +1. This is the main reason I'm arguing my point. Literals should not >> be re-interpreted based on context. > > Well, they are, though. 
There's the context of the source code encoding,
> the context of unicode_literals, and the special case of 1-character and
> 1-byte literals in integer contexts. There's also the runtime specific
> interpretation of 'str', but that only affects literals indirectly,
> independent of their content.

OK, there's some context going on, but I don't find any of these as
egregious as depending on a definition that may be in another file,
especially because it could change, for the sake of optimization, in
a surprisingly incompatible way.

> In addition to that, the "ASCII-only" proposal adds a similar context on
> top as the "char* == bytes" proposal. "ASCII-only" encodes Unicode strings
> to ASCII and rejects everything that doesn't fit, including explicit byte
> escapes. So you can't write cfunc("ao\xFF"), for example, although the code
> itself only uses plain ASCII characters. This creates an artificial
> difference between cfunc("ac\x7F") and cfunc("ac\x80") in the sense that
> one is allowed and the other is rejected and requires code modifications.

I'm fine with this distinction, or (as someone proposed) marking a
string as unsafe if it has any escapes at all. Python 3 draws one line
at "ASCII only" for bytes types, though it does allow all escapes.

> "char* == bytes" encodes char* literals back to the byte sequence defined
> by the source code encoding, while properly handling all byte escapes in
> addition. So cfunc("ao\xFF") behaves exactly as written and cfunc("ao??")
> will be interpreted in the context of the source code encoding.
>
>
>>> My vote is for identifying a set of completely safe strings (no \x or
>>> \u, ASCII-only) that is the same regardless of any setting, and allow
>>> that. Anything else, demand a b'' prefix to assign to a char*. Putting
>>> in a b'' isn't THAT hard.
>>
>> Sure. Many (most) libraries take char* for string values.
I want to >> avoid requiting special incantations (not that the 'b' is hard, but >> one needs to know it) to write, e.g, >> >> printf("Hello World\n") > > But that would only apply when you enable unicode_literals. Yep. But if we plan to move to -3 being the default eventually, I'd like to make things easier not harder. > I think it's > reasonable to either a) require a 'b' prefix in that case or b) enforce > bytes semantics for char* automatically. To make life easy for users, > either b) can be applied or for a), we can let Cython generate a patch (or > script) that prepends 'b' prefixes to all places where unprefixed string > literals are used in a char* context. That way, the source code becomes > safe regardless of the unicode_literals setting. Neither are obvious to do for a newcomer, whereas a compiler error can give a hint. - Robert From robertwb at math.washington.edu Tue Sep 7 10:37:54 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 7 Sep 2010 01:37:54 -0700 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C85F465.3080407@behnel.de> Message-ID: On Tue, Sep 7, 2010 at 1:31 AM, Robert Bradshaw wrote: >> I think it's >> reasonable to either a) require a 'b' prefix in that case or b) enforce >> bytes semantics for char* automatically. To make life easy for users, >> either b) can be applied or for a), we can let Cython generate a patch (or >> script) that prepends 'b' prefixes to all places where unprefixed string >> literals are used in a char* context. That way, the source code becomes >> safe regardless of the unicode_literals setting. > > Neither are obvious to do for a newcomer, whereas a compiler error can > give a hint. 
Nevermind, if we require a 'b' prefix, the fix would be obvious to a newcomer, though not the best first impression. It's the shifting string literal interpretation that worries me most. - Robert From stefan_ml at behnel.de Tue Sep 7 11:04:25 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Sep 2010 11:04:25 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> Message-ID: <4C860019.2070607@behnel.de> Robert Bradshaw, 07.09.2010 10:24: > Actually, b'abc???' is a syntax error "bytes can only contain ASCII > literal characters," I didn't know and am deeply surprised they did that in Python 3. So, you're right, Python compatibility dictates that bytes literals contain only ASCII characters in this case. This would be something to change for -3, in addition to (but separate from) the existing unicode_literals behaviour. Note that Py2.[67] accept the above just fine with unicode_literals enabled. However, I don't see how this favours disallowing escaped byte values > 127 in char* literals, which appear even safer in this light. Stefan From stefan_ml at behnel.de Tue Sep 7 12:31:20 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Sep 2010 12:31:20 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85DFC6.70402@behnel.de> Message-ID: <4C861478.4060604@behnel.de> Robert Bradshaw, 07.09.2010 10:20: >> Could you comment on this please? 
>> >> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243 >> >> I think I made it pretty clear there what I think the two suitable >> alternatives are. > > Yes, you favor either (1) re-interpretation of the literal depending > on the type context they're used in or (2) disallowing interpretation > of string literals when unicode literal are enabled. > > I think (1) is a bad path to take and would prefer not to burden users > with (2). So, what about doing the following then: 1) we keep the current implementation as is, i.e. unprefixed string literals can coerce to char* literals during type analysis that match the byte sequence in the source file and properly handle byte escapes 2) with the -3 option, we disallow byte values > 127 in byte string literals and do not generate a byte string representation for unprefixed string literals that contain them, thus effectively preventing their coercion to char* That's basically the ASCII-only proposal with added escapes, and my proposal minus non-ASCII literal characters. Should make life easy for basically everyone, with the added benefit of increasing the compatibility with Python 3. We may additionally consider warning about '\u...' in unprefixed char* strings. I think this particular case will be rare enough to encourage a 'b' prefix or a '\\' escape. 
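One possible reading of the two rules above, written as a hedged Python sketch rather than as Cython's actual implementation: the helper names `may_coerce_to_char_ptr` and `has_unicode_escape` are hypothetical, and the input is assumed to be the raw source text between the quotes of an unprefixed literal, with escapes not yet processed.

```python
import re

def may_coerce_to_char_ptr(literal_body):
    """Hypothetical check for the -3 case: only ASCII characters may be
    written literally; byte escapes such as \\xFF are themselves spelled
    in ASCII and therefore pass."""
    return all(ord(ch) < 128 for ch in literal_body)

# A backslash that is not itself escaped, followed by u, U or N.
_UNICODE_ESCAPE = re.compile(r'(?<!\\)(?:\\\\)*\\[uUN]')

def has_unicode_escape(literal_body):
    """Hypothetical check for the optional warning about '\\u...' in a
    literal that is used in a char* context."""
    return _UNICODE_ESCAPE.search(literal_body) is not None

checks = (
    may_coerce_to_char_ptr(r'ao\xFF'),      # escaped byte > 127
    may_coerce_to_char_ptr('abc\u00fc'),    # non-ASCII literal character
    has_unicode_escape(r'abc\u0001'),       # warn candidate
    has_unicode_escape(r'abc\\u0001'),      # escaped backslash, no warning
)
print(checks)
```

Under this reading, cfunc("ao\xFF") keeps working with -3, while a literal containing a non-ASCII character needs an explicit encoding step or an escaped spelling.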
Stefan

From kayhayen at gmx.de Tue Sep 7 18:24:28 2010
From: kayhayen at gmx.de (Kay Hayen)
Date: Tue, 07 Sep 2010 18:24:28 +0200
Subject: [Cython] C string literals
In-Reply-To: <4C85F75D.9070402@behnel.de>
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de>
 <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de>
 <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de>
 <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de>
 <4C855E65.3030200@behnel.de> <4C8567E4.3060200@gmx.de>
 <4C85F75D.9070402@behnel.de>
Message-ID: <4C86673C.8050403@gmx.de>

On 07.09.2010 10:27, Stefan Behnel wrote:
> Kay Hayen, 07.09.2010 00:15:
>> I may be misunderstanding people
>> because of my different goal to stay close to CPython.
>
> You may want to keep the level of unnecessary FUD down, especially on this
> list. Cython has very good and seamless support for Unicode and CPython's
> string type semantics.

What you quoted was not a statement about Cython, but about me. I have a
lot of doubts about myself.

Therefore I might not understand that the discussion is about "Cython C
string literals", because I don't understand what that should be and why
it should exist, technically.

To me at least, if you have "unicode_literals", every literal is unicode
unless you say otherwise; no guessing is needed at all.

My proposal is to have a C type that represents strings, with its
meaning dependent on unicode_literals. Then have the user convert it to
UTF-8 if that is what he needs to work with. That conversion is
obviously very simple without "unicode_literals", and for a constant (we
are even talking about literals) it could be pre-determined in an
optimization step by either the C++ compiler or Cython.
This has the benefit of avoiding conversions where they are not needed,
and of allowing both the Python 2 and Python 3 runtimes to be targeted
with the same source code and no magic "how it is used" conversions
behind the user's back.

It has the other benefit of working with string objects as well, in the
same way as with Python 3.

The drawback is that you would need to cast if you use string objects or
literals. But whether that is needed is a question to be answered by the
user anyway. He may not be calling C functions that want UTF-8 but
instead just optimizing some string operations, for which conversions to
UTF-8 back and forth would be detrimental.

I actually see another benefit if unnecessary conversions are avoided.

So what's wrong with that approach in the first place?

Yours,
Kay

From kayhayen at gmx.de Tue Sep 7 18:39:41 2010
From: kayhayen at gmx.de (Kay Hayen)
Date: Tue, 07 Sep 2010 18:39:41 +0200
Subject: [Cython] C string literals
In-Reply-To:
References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de>
 <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de>
 <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de>
 <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de>
Message-ID: <4C866ACD.1010803@gmx.de>

On 07.09.2010 10:24, Robert Bradshaw wrote:
>
> Actually, b'abcüöä' is a syntax error "bytes can only contain ASCII
> literal characters,"

You can write b'abc\xc3\xbc\xc3\xb6\xc3\xa4' if you wish, though; this
is the UTF-8 form as obtained from bytes( 'abcüöä', 'utf8' ), and I am
sure the Latin-1 form could be used as well.

So you can't write "ü" but need to commit to an encoding by specifying
the exact byte value.

Hope this helps.
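The two byte forms mentioned above can be reproduced with a minimal sketch, assuming Python 3 and assuming the three non-ASCII characters are ü, ö and ä (which is what the quoted byte values decode to). The variable names are illustrative.

```python
# The text string, spelled with escapes so that this example does not
# depend on the encoding of the file it is saved in.
text = 'abc\u00fc\u00f6\u00e4'

utf8_bytes = bytes(text, 'utf8')        # same as text.encode('utf8')
latin1_bytes = text.encode('latin-1')   # one byte per character here

print(utf8_bytes, len(utf8_bytes))
print(latin1_bytes, len(latin1_bytes))
```

The same six characters give nine bytes in UTF-8 and six bytes in Latin-1, which is why a bytes literal has to name the byte values explicitly.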
Yours, Kay From stefan_ml at behnel.de Tue Sep 7 18:44:59 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Sep 2010 18:44:59 +0200 Subject: [Cython] C string literals In-Reply-To: <4C86673C.8050403@gmx.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de> <4C855E65.3030200@behnel.de> <4C8567E4.3060200@gmx.de> <4C85F75D.9070402@behnel.de> <4C86673C.8050403@gmx.d e> Message-ID: <4C866C0B.9030807@behnel.de> Kay Hayen, 07.09.2010 18:24: > Am 07.09.2010 10:27, schrieb Stefan Behnel: >> Kay Hayen, 07.09.2010 00:15: >>> I may be misunderstanding people >>> because of my different goal to stay close to CPython. >> >> You may want to keep the level of unnecessary FUD down, especially on this >> list. Cython has very good and seamless support for Unicode and CPython's >> string type semantics. > > What you quoted was not a statement about Cython, but about me. Hmm, guess I misunderstood you then. Sorry. You keep talking about "different goals" in contexts where it's not clear to me how Cython and Nuitka differ in their goals. (but let's don't discuss this in this thread) > Therefore I might not understand that the discussion is about "Cython C > string literals", because I don't understand what that should be and why > it should be technically. It's what you get when, for example, you write a string literal as input to a C function that accepts a char*. It's basically a Python bytes string mapped into C space, or an "unboxed" bytes string. Cython does these things in order to talk to foreign code. 
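A rough Python-level analogy for the "unboxed" bytes string described above, using the standard ctypes module rather than Cython itself: the buffer below holds the literal's bytes as a NUL-terminated C char array, which is roughly what a C function taking a char* would receive. This only illustrates the idea and makes no claim about how Cython generates its code.

```python
import ctypes

# create_string_buffer copies the bytes into a C char array and appends
# the terminating NUL byte that C string functions rely on.
buf = ctypes.create_string_buffer(b"abc\xc3\xbc")

c_array_size = len(buf)  # size of the C array, including the NUL
payload = buf.value      # bytes up to (not including) the first NUL
raw = buf.raw            # every byte in the array, NUL included
```

Note that the C side sees only byte values; whether `\xc3\xbc` means "ü" depends entirely on the encoding the caller and the C library agree on.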
Stefan From kayhayen at gmx.de Tue Sep 7 19:49:38 2010 From: kayhayen at gmx.de (Kay Hayen) Date: Tue, 07 Sep 2010 19:49:38 +0200 Subject: [Cython] C string literals In-Reply-To: <4C866C0B.9030807@behnel.de> References: <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de> <4C855E65.3030200@behnel.de> <4C8567E4.3060200@gmx.de> <4C85F75D.9070402@behnel.de> <4C86673C.8050403@gmx.d e> <4C866C0B.9030807@behnel.de> Message-ID: <4C867B32.8010808@gmx.de> Am 07.09.2010 18:44, schrieb Stefan Behnel: > Kay Hayen, 07.09.2010 18:24: >> Am 07.09.2010 10:27, schrieb Stefan Behnel: >>> Kay Hayen, 07.09.2010 00:15: >>>> I may be misunderstanding people >>>> because of my different goal to stay close to CPython. >>> >>> You may want to keep the level of unnecessary FUD down, especially on this >>> list. Cython has very good and seamless support for Unicode and CPython's >>> string type semantics. >> >> What you quoted was not a statement about Cython, but about me. > > Hmm, guess I misunderstood you then. Sorry.[...] My goals are independent of the tool. So it is true for Cython as much as for any other tool. I am interested in having Python semantics and faster execution. I heard that Cython is a tool for that. Is it not? I would therefore like to point out solutions that preserve Python semantics, but instead hand it over to C as well. >> Therefore I might not understand that the discussion is about "Cython C >> string literals", because I don't understand what that should be and why >> it should be technically. > > It's what you get when, for example, you write a string literal as input to > a C function that accepts a char*. 
It's basically a Python bytes string > mapped into C space, or an "unboxed" bytes string. Cython does these things > in order to talk to foreign code. From my point of view, if you want to call C function with C literals, why not use C in the first place. From a bindings point of view, I don't see the problem with the user getting a "pchar_t *" and him (or the compilers on the fly) to convert it as required. Yours, Kay From robertwb at math.washington.edu Tue Sep 7 20:07:17 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 7 Sep 2010 11:07:17 -0700 Subject: [Cython] C string literals In-Reply-To: <4C860019.2070607@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C860019.2070607@behnel.de> Message-ID: On Tue, Sep 7, 2010 at 2:04 AM, Stefan Behnel wrote: > Robert Bradshaw, 07.09.2010 10:24: >> Actually, b'abc???' is a syntax error "bytes can only contain ASCII >> literal characters," > > I didn't know and am deeply surprised they did that in Python 3. So, you're > right, Python compatibility dictates that bytes literals contain only ASCII > characters in this case. This is necessary for the source encoding to be orthogonal to the source content, as it should be. > This would be something to change for -3, in > addition to (but separate from) the existing unicode_literals behaviour. > Note that Py2.[67] accept the above just fine with unicode_literals enabled. > > However, I don't see how this favours disallowing escaped byte values > 127 > in char* literals, which appear even safer in this light. I'm actually not opposed to allowing escaped byte values in char* literals, it was just an easy place to draw the line. (In fact, I think it would make life easier.) 
What I am opposed to is treating escapes differently based on the way in which a literal is used. - Robert From robertwb at math.washington.edu Tue Sep 7 20:16:50 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 7 Sep 2010 11:16:50 -0700 Subject: [Cython] C string literals In-Reply-To: <4C861478.4060604@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85DFC6.70402@behnel.de> <4C861478.4060604@behnel.de> Message-ID: On Tue, Sep 7, 2010 at 3:31 AM, Stefan Behnel wrote: > Robert Bradshaw, 07.09.2010 10:20: >>> Could you comment on this please? >>> >>> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243 >>> >>> I think I made it pretty clear there what I think the two suitable >>> alternatives are. >> >> Yes, you favor either (1) re-interpretation of the literal depending >> on the type context they're used in or (2) disallowing interpretation >> of string literals when unicode literal are enabled. >> >> I think (1) is a bad path to take and would prefer not to burden users >> with (2). > > So, what about doing the following then: > > 1) we keep the current implementation as is, i.e. unprefixed string > literals can coerce to char* literals during type analysis that match the > byte sequence in the source file and properly handle byte escapes I'd be more OK with that, except for I'd rather have consistent handling of the \u escape. The -2 behavior is the same, the -3 behavior as below, so the from __future__ import unicode_literals is more of an intermediate step, so not quite as important in the long run. 
> 2) with the -3 option, we disallow byte values > 127 in byte string > literals and do not generate a byte string representation for unprefixed > string literals that contain them, thus effectively preventing their > coercion to char* > > That's basically the ASCII-only proposal with added escapes, and my > proposal minus non-ASCII literal characters. Should make life easy for > basically everyone, with the added benefit of increasing the compatibility > with Python 3. +1 > We may additionally consider warning about '\u...' in unprefixed char* > strings. I think this particular case will be rare enough to encourage a > 'b' prefix or a '\\' escape. If we do this, we should have a warning for sure. I'd love to hear what others think. - Robert From robertwb at math.washington.edu Tue Sep 7 20:35:28 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 7 Sep 2010 11:35:28 -0700 Subject: [Cython] C string literals In-Reply-To: <4C867B32.8010808@gmx.de> References: <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85332E.2030802@student.matnat.uio.no> <4C853966.1020304@behnel.de> <4C8542C0.6060104@student.matnat.uio.no> <4C854665.30700@behnel.de> <4C8549FB.9090006@behnel.de> <4C855756.3040804@gmx.de> <4C85594D.2020307@behnel.de> <4C855B6C.5070101@gmx.de> <4C855E65.3030200@behnel.de> <4C8567E4.3060200@gmx.de> <4C85F75D.9070402@behnel.de> <4C866C0B.9030807@behnel.de> <4C867B32.8010808@gmx.de> Message-ID: On Tue, Sep 7, 2010 at 10:49 AM, Kay Hayen wrote: > Am 07.09.2010 18:44, schrieb Stefan Behnel: >> Kay Hayen, 07.09.2010 18:24: >>> Am 07.09.2010 10:27, schrieb Stefan Behnel: >>>> Kay Hayen, 07.09.2010 00:15: >>>>> I may be misunderstanding people >>>>> because of my different goal to stay close to CPython. >>>> >>>> You may want to keep the level of unnecessary FUD down, especially on this >>>> list. 
Cython has very good and seamless support for Unicode and CPython's >>>> string type semantics. >>> >>> What you quoted was not a statement about Cython, but about me. >> >> Hmm, guess I misunderstood you then. Sorry.[...] > > My goals are independent of the tool. So it is true for Cython as much > as for any other tool. > > I am interested in having Python semantics and faster execution. I heard > that Cython is a tool for that. Is it not? I would therefore like to > point out solutions that preserve Python semantics, but instead hand it > over to C as well. We have lots of this, thanks mostly to Stefan's great work to support the Py_UNICODE type and its optimized interaction with unicode strings. Python semantics for faster execution is not the only goal--we also want to easily use and wrap existing C libraries. >>> Therefore I might not understand that the discussion is about "Cython C >>> string literals", because I don't understand what that should be and why >>> it should be technically. >> >> It's what you get when, for example, you write a string literal as input to >> a C function that accepts a char*. It's basically a Python bytes string >> mapped into C space, or an "unboxed" bytes string. Cython does these things >> in order to talk to foreign code. > > From my point of view, if you want to call C function with C literals, > why not use C in the first place. Because some of us prefer coding in Python :). But there's a lot of useful code out there in C. This is like asking, "if you want to have faster execution, why not use C in the first place?" > From a bindings point of view, I don't see the problem with the user > getting a "pchar_t *" and him (or the compilers on the fly) to convert > it as required. Creating a (logical or actual) Python string just to convert it to a C string for use with foreign code is wasted overhead compared to being able to use the C literal directly.
- Robert From stefan_ml at behnel.de Tue Sep 7 21:05:12 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Sep 2010 21:05:12 +0200 Subject: [Cython] C string literals In-Reply-To: References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85DFC6.70402@behnel.de> <4C861478.4060604@behnel.de> Message-ID: <4C868CE8.2080200@behnel.de> Robert Bradshaw, 07.09.2010 20:16: > On Tue, Sep 7, 2010 at 3:31 AM, Stefan Behnel wrote: >> Robert Bradshaw, 07.09.2010 10:20: >>>> Could you comment on this please? >>>> >>>> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243 >>>> >>>> I think I made it pretty clear there what I think the two suitable >>>> alternatives are. >>> >>> Yes, you favor either (1) re-interpretation of the literal depending >>> on the type context they're used in or (2) disallowing interpretation >>> of string literals when unicode literal are enabled. >>> >>> I think (1) is a bad path to take and would prefer not to burden users >>> with (2). >> >> So, what about doing the following then: >> >> 1) we keep the current implementation as is, i.e. unprefixed string >> literals can coerce to char* literals during type analysis that match the >> byte sequence in the source file and properly handle byte escapes > > I'd be more OK with that, except for I'd rather have consistent > handling of the \u escape. The -2 behavior is the same, the -3 > behavior as below, so the from __future__ import unicode_literals is > more of an intermediate step, so not quite as important in the long > run. I think so, too. In the long run, users should be able to appreciate -3 more than the partial imports. There's still some way to go to get it rolling smoothly (see Lisandro's "str" problem), but that'll come over time. 
>> 2) with the -3 option, we disallow byte values> 127 in byte string >> literals and do not generate a byte string representation for unprefixed >> string literals that contain them, thus effectively preventing their >> coercion to char* >> >> That's basically the ASCII-only proposal with added escapes, and my >> proposal minus non-ASCII literal characters. Should make life easy for >> basically everyone, with the added benefit of increasing the compatibility >> with Python 3. > > +1 Here's an attempt: http://hg.cython.org/cython-devel/rev/8f4cda480124 Hudson complains about one of the tests in Py<=2.5, but I should be able to fix that. >> We may additionally consider warning about '\u...' in unprefixed char* >> strings. I think this particular case will be rare enough to encourage a >> 'b' prefix or a '\\' escape. > > If we do this, we should have a warning for sure. It's generally valid Python to put a plain Unicode escape sequence into a byte string, but a warning will make it clear that it does have a code smell to do that because it makes the literal look like something that it is not. I think that in the context of char* literals, we are free to decide either way (as long as the char* context doesn't occur due to an internal optimisation of Cython...) > I'd love to hear what others think. Sure. Please give it a try. 
Stefan From dalcinl at gmail.com Wed Sep 8 04:35:17 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 7 Sep 2010 23:35:17 -0300 Subject: [Cython] C string literals In-Reply-To: <4C868CE8.2080200@behnel.de> References: <4C756BEA.40800@behnel.de> <4C8265DA.4060702@behnel.de> <4C831B78.1060405@behnel.de> <4C8331AE.80003@behnel.de> <4C8518A5.5060100@student.matnat.uio.no> <4C8530DA.9030808@behnel.de> <4C85DFC6.70402@behnel.de> <4C861478.4060604@behnel.de> <4C868CE8.2080200@behnel.de> Message-ID: On 7 September 2010 16:05, Stefan Behnel wrote: > Robert Bradshaw, 07.09.2010 20:16: >> On Tue, Sep 7, 2010 at 3:31 AM, Stefan Behnel wrote: >>> Robert Bradshaw, 07.09.2010 10:20: >>>>> Could you comment on this please? >>>>> >>>>> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243 >>>>> >>>>> I think I made it pretty clear there what I think the two suitable >>>>> alternatives are. >>>> >>>> Yes, you favor either (1) re-interpretation of the literal depending >>>> on the type context they're used in or (2) disallowing interpretation >>>> of string literals when unicode literal are enabled. >>>> >>>> I think (1) is a bad path to take and would prefer not to burden users >>>> with (2). >>> >>> So, what about doing the following then: >>> >>> 1) we keep the current implementation as is, i.e. unprefixed string >>> literals can coerce to char* literals during type analysis that match the >>> byte sequence in the source file and properly handle byte escapes >> >> I'd be more OK with that, except for I'd rather have consistent >> handling of the \u escape. The -2 behavior is the same, the -3 >> behavior as below, so the from __future__ import unicode_literals is >> more of an intermediate step, so not quite as important in the long >> run. > > I think so, too. In the long run, users should be able to appreciate -3 > more than the partial imports. 
There's still some way to go to get it > rolling smoothly (see Lisandro's "str" problem), but that'll come over time. > > >>> 2) with the -3 option, we disallow byte values > 127 in byte string >>> literals and do not generate a byte string representation for unprefixed >>> string literals that contain them, thus effectively preventing their >>> coercion to char* >>> >>> That's basically the ASCII-only proposal with added escapes, and my >>> proposal minus non-ASCII literal characters. Should make life easy for >>> basically everyone, with the added benefit of increasing the compatibility >>> with Python 3. >> >> +1 > > Here's an attempt: > > http://hg.cython.org/cython-devel/rev/8f4cda480124 > > Hudson complains about one of the tests in Py<=2.5, but I should be able to > fix that. > > >>> We may additionally consider warning about '\u...' in unprefixed char* >>> strings. I think this particular case will be rare enough to encourage a >>> 'b' prefix or a '\\' escape. >> >> If we do this, we should have a warning for sure. > > It's generally valid Python to put a plain Unicode escape sequence into a > byte string, but a warning will make it clear that it does have a code > smell to do that because it makes the literal look like something that it > is not. I think that in the context of char* literals, we are free to > decide either way (as long as the char* context doesn't occur due to an > internal optimisation of Cython...) > > >> I'd love to hear what others think. > > Sure. Please give it a try. > Now all of petsc4py cythonizes fine with -3, except for two lines, one of which is shown below. I understand that making that work could be hard or undesirable, but I really need to use a 'str' there (I need a PyString in Py2 and a PyUnicode in Py3). Error converting Pyrex file to C: ------------------------------------------------------------ ...
def __get__(self): if self.iset != NULL: CHKERR( ISGetLocalSize(self.iset, &self.size) ) cdef object size = toInt(self.size) cdef dtype descr = PyArray_DescrFromType(NPY_PETSC_INT) cdef str typestr = "=%c%d" % (descr.kind, descr.itemsize) ^ ------------------------------------------------------------ /u/dalcinl/Devel/petsc4py-dev/src/PETSc/petscis.pxi:165:39: Cannot convert Unicode string to 'str' implicitly. This is not portable and requires explicit encoding. > Stefan > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From robertwb at math.washington.edu Tue Sep 14 07:19:44 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 13 Sep 2010 22:19:44 -0700 Subject: [Cython] Cython distutils Message-ID: I've pushed an implementation of http://wiki.cython.org/enhancements/distutils_preprocessing . This allows one to write, e.g., module_list = cythonize("*.pyx") in your setup.py, and it handles the .pyx dependencies (including transitive dependence of libraries) and does all the .pyx -> .c translation. (In particular, this does away with the need for a customized build_ext extension, which is especially nice for projects that have their own.) There are also mechanisms for specifying more build information in the source file itself--see http://hg.cython.org/cython-devel/file/tip/tests/build/inline_distutils.srctree for an example. This will also be useful for non-distutils setups (e.g. pyximport and inline code). Try it out and let me know what you think in terms of features and API.
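To make "transitive dependence" concrete, here is a minimal plain-Python sketch of the kind of bookkeeping such a tool has to do: follow dependencies of dependencies, then regenerate the .c file if any of them is newer. This is not the cythonize implementation; the function names, the file names and the integer timestamps are all assumptions made for illustration.

```python
def transitive_deps(direct, start):
    """Return every file reachable from `start` through `direct`.

    `direct` maps a file name to the files it depends on directly
    (hypothetical data; a real tool would scan the sources).
    """
    seen = set()
    stack = list(direct.get(start, ()))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also protects against cycles
        seen.add(name)
        stack.extend(direct.get(name, ()))
    seen.discard(start)
    return seen


def needs_rebuild(pyx, c_file, direct, mtime):
    """True if `c_file` is missing or older than `pyx` or any dependency."""
    if c_file not in mtime:
        return True
    sources = {pyx} | transitive_deps(direct, pyx)
    return any(mtime[s] > mtime[c_file] for s in sources)


# Hypothetical project: a.pyx uses b.pxd, which in turn uses c.pxd.
direct = {"a.pyx": ["b.pxd"], "b.pxd": ["c.pxd"], "c.pxd": []}
mtime = {"a.pyx": 10, "b.pxd": 5, "c.pxd": 30, "a.c": 20}

deps = transitive_deps(direct, "a.pyx")
stale = needs_rebuild("a.pyx", "a.c", direct, mtime)
```

Here a.c is newer than a.pyx and b.pxd, but the indirect dependency c.pxd changed later, so a check that looked only at direct dependencies would miss the rebuild.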
- Robert From markflorisson88 at gmail.com Tue Sep 14 10:45:49 2010 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 14 Sep 2010 10:45:49 +0200 Subject: [Cython] Cython debugger Message-ID: Hello, There currently is no proper debugger available to debug Cython code. The current way we debug Cython code is: - the print statement - a C debugger - pdb None of the aforementioned methods is very powerful and time effective at the same time. For instance, 'print' requires recompilation and is not so powerful, pdb doesn't provide access to anything C-level and gdb often requires a lot of digging through generated C code. This is why I feel that there is a need to a proper Cython debugger.
This debugger should have, for starters, the following features: - Cython variable/type/function name correspondence (the ability to set breakpoints for Cython function/method names, the ability to inspect variables by their Cython name (be it C or Python variables), etc) - A line number and source code correspondence with the Cython code (ability to view the code at the C and Cython abstraction levels) - Be able to inspect Python code, print python backtraces, etc (there's already Misc/gdbinit in the Python source distribution and there is the EasierPythonDebugging project) Such a debugger would need some kind of information, namely it would need: - a mapping from Cython line numbers to a range of C line numbers - a mapping from Cython names to mangled C names - naming information with regard to scope My plan is to extend GDB with Python and introduce new Cython commands that would deal with these issues. It would use the information exported by the Cython compiler to realize this (the easiest and most portable way would probably be to write it to a separate file in some format (xml or json come to mind)). So I have a few questions for cython-dev. First of all, all the scope and naming information is in the AST, so I think the easiest way would be to traverse the AST and generate a similar tree where nodes contain information such as: - C name - Cython name - Line number - C object or Python object I think this code could be called from the 'parse' closure in Cython.Compiler.Main.create_parse() just before returning 'tree'. What do you think? The line number correspondence is a little harder, because I believe that no code is currently saving Cython line number information when generating C code. Seeing that CCodeWriter (and StringIOTree) have these Insertion Points I think line number information would have to be saved there, and all the code dealing with that would have to be adjusted. 
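The separate debug-information file proposed above (Cython names to C names, Cython lines to ranges of C lines) could be sketched roughly as follows. The JSON layout, the field names, the mangled C names and the line numbers are all invented for illustration; nothing here reflects a format that Cython actually emits.

```python
import json
from bisect import bisect_right

# Hypothetical debug-info file contents. "lines" holds
# [cython_line, first_c_line, last_c_line] entries sorted by first_c_line.
debug_info_json = """
{
  "names": {"spam": "__pyx_f_4eggs_spam", "count": "__pyx_v_count"},
  "lines": [[1, 120, 134], [2, 135, 150], [4, 151, 170]]
}
"""

info = json.loads(debug_info_json)


def c_range_for(info, cython_line):
    """Return (first_c_line, last_c_line) for a Cython line, or None."""
    for cy, c_start, c_end in info["lines"]:
        if cy == cython_line:
            return (c_start, c_end)
    return None


def cython_line_for(info, c_line):
    """Return the Cython line whose C range contains `c_line`, or None."""
    starts = [c_start for _, c_start, _ in info["lines"]]
    i = bisect_right(starts, c_line) - 1  # last range starting at or before c_line
    if i < 0:
        return None
    cy, c_start, c_end = info["lines"][i]
    return cy if c_start <= c_line <= c_end else None
```

A debugger front end could use the first lookup to place a breakpoint for a Cython line and the second to translate the current C position back into Cython source.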
Does that sound right or do you think there is an easier way to implement this? Another question is, if I would create a patch for Cython to have Cython emit this kind of debugger information, would it be accepted if accompanied by tests and according to the rules etc? I would be willing to maintain the code at least. And what is your general opinion of this debugger? What else would you see as an essential feature of such a Cython debugger? And do you think the actual debugger should be an external project or should it be part of Cython (if it's mature)? I think the former option would allow a more flexible release schedule. Kind regards, Mark From dagss at student.matnat.uio.no Tue Sep 14 11:20:22 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 14 Sep 2010 11:20:22 +0200 Subject: [Cython] Cython debugger In-Reply-To: References: Message-ID: <4C8F3E56.2020705@student.matnat.uio.no> mark florisson wrote: > Hello, > > There currently is no proper debugger available to debug Cython code. > The current way we debug Cython code is: > - the print statement > - a C debugger > - pdb > > None of the aforementioned methods is very powerful and time effective > at the same time. For instance, 'print' requires recompilation and is > not so powerful, pdb doesn't provide access to anything C-level and > gdb often requires a lot of digging through generated C code. This is > why I feel that there is a need to a proper Cython debugger. 
> > This debugger should have, for starters, the following features: > - Cython variable/type/function name correspondence (the ability to > set breakpoints for Cython function/method names, the ability to > inspect variables by their Cython name (be it C or Python variables), > etc) > - A line number and source code correspondence with the Cython code > (ability to view the code at the C and Cython abstraction levels) > - Be able to inspect Python code, print python backtraces, etc > (there's already Misc/gdbinit in the Python source distribution and > there is the EasierPythonDebugging project) > > Such a debugger would need some kind of information, namely it would need: > - a mapping from Cython line numbers to a range of C line numbers > - a mapping from Cython names to mangled C names > - naming information with regard to scope > > My plan is to extend GDB with Python and introduce new Cython commands > that would deal with these issues. It would use the information > exported by the Cython compiler to realize this (the easiest and most > portable way would probably be to write it to a separate file in some > format (xml or json come to mind)). > Excellent! This is great news! > So I have a few questions for cython-dev. > First of all, all the scope and naming information is in the AST, so I > think the easiest way would be to traverse the AST and generate a > similar tree where nodes contain information such as: > - C name > - Cython name > - Line number > - C object or Python object > > I think this code could be called from the 'parse' closure in > Cython.Compiler.Main.create_parse() just before returning 'tree'. What > do you think? > One might need to do it a little later, because C names and so on might not have been resolved at that stage. Because each node in the tree contains a pointer to their initial position anyway (in their "pos" attribute), I wonder if one should in fact do this at the very last, after code generation. 
Then we have the freedom to make up new C names during the code generation phase. Basically, this means writing a new transform and plugging it into the pipeline after code generation. Please ask for more details if you want and I hope somebody will help, although I'm a bit rushed at the moment myself. > The line number correspondence is a little harder, because I believe > that no code is currently saving Cython line number information when > generating C code. Seeing that CCodeWriter (and StringIOTree) have > these Insertion Points I think line number information would have to > be saved there, and all the code dealing with that would have to be > adjusted. Does that sound right or do you think there is an easier way > to implement this? > Actually, Cython can already emit gcc line directives through the --line-directives switch. If you can't use that directly with GDB then grepping the Cython source for "emit_linenums" should at least put you on track. Robert will know more about this, as he wrote that code. > Another question is, if I would create a patch for Cython to have > Cython emit this kind of debugger information, would it be accepted if > accompanied by tests and according to the rules etc? I would be > willing to maintain the code at least. And what is your general > opinion of this debugger? What else would you see as an essential > feature of such a Cython debugger? And do you think the actual > debugger should be an external project or should it be part of Cython > (if it's mature)? I think the former option would allow a more > flexible release schedule. > I think it is very likely that we would love to ship the necessary modifications (once stable) as part of main Cython. If your Cython patches can become part of mainline GDB then there's no question about it at all, but even if you have to maintain a forked GDB I'm all for it (and in that case we could host the forked GDB on cython.org as well and so on).
Let us know if you would like a more "official" branch on cython.org, although just keeping one on bitbucket.org is usually just as easy. Then we can pull the changes into mainline when things work well. Dag Sverre From markflorisson88 at gmail.com Tue Sep 14 11:56:25 2010 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 14 Sep 2010 11:56:25 +0200 Subject: [Cython] Cython debugger In-Reply-To: <4C8F3E56.2020705@student.matnat.uio.no> References: <4C8F3E56.2020705@student.matnat.uio.no> Message-ID: Dear Dag Sverre, thank you very much for your swift reply and help. Indeed, I noticed that the 'c_name' wasn't filled out at that point, so writing a transform sounds indeed like the better (and cleaner!) option. Regarding the line numbers, you can't access these pragmas from the Python API in GDB and in fact, it seems that these directives confuse gdb: http://paste.pocoo.org/show/261841/ . So I think I'll dig through the Cython source and look at the emit_linenums code. It's good to hear that you are willing to include such a patch. I think that indeed, making these changes part of gdb would be nice, but I think it should also be possible to install these gdb extensions manually (i.e., a separate project, perhaps shipped with Cython wouldn't hurt either). But I will definitely have a talk with the gdb community, and see if the Cython debugger could be integrated with EasierPythonDebugging. Yes, I think an official branch would be nice. Cheers, Mark On 14 September 2010 11:20, Dag Sverre Seljebotn wrote: > mark florisson wrote: >> Hello, >> >> There currently is no proper debugger available to debug Cython code. >> The current way we debug Cython code is: >> - the print statement >> - a C debugger >> - pdb >> >> None of the aforementioned methods is very powerful and time effective >> at the same time. 
For instance, 'print' requires recompilation and is >> not so powerful, pdb doesn't provide access to anything C-level and >> gdb often requires a lot of digging through generated C code. This is >> why I feel that there is a need to a proper Cython debugger. >> >> This debugger should have, for starters, the following features: >> - Cython variable/type/function name correspondence (the ability to >> set breakpoints for Cython function/method names, the ability to >> inspect variables by their Cython name (be it C or Python variables), >> etc) >> - A line number and source code correspondence with the Cython code >> (ability to view the code at the C and Cython abstraction levels) >> - Be able to inspect Python code, print python backtraces, etc >> (there's already Misc/gdbinit in the Python source distribution and >> there is the EasierPythonDebugging project) >> >> Such a debugger would need some kind of information, namely it would need: >> - a mapping from Cython line numbers to a range of C line numbers >> - a mapping from Cython names to mangled C names >> - naming information with regard to scope >> >> My plan is to extend GDB with Python and introduce new Cython commands >> that would deal with these issues. It would use the information >> exported by the Cython compiler to realize this (the easiest and most >> portable way would probably be to write it to a separate file in some >> format (xml or json come to mind)). >> > Excellent! This is great news! >> So I have a few questions for cython-dev. >> First of all, all the scope and naming information is in the AST, so I >> think the easiest way would be to traverse the AST and generate a >> similar tree where nodes contain information such as: >> - C name >> - Cython name >> - Line number >> - C object or Python object >> >> I think this code could be called from the 'parse' closure in >> Cython.Compiler.Main.create_parse() just before returning 'tree'. What >> do you think? 
>> > One might need to do it a little later, because C names and so on might > not have been resolved at that stage. > > Because each node in the tree contains a pointer to their initial > position anyway (in their "pos" attribute), I wonder if one should in > fact do this at the very last, after code generation. Then we have the > freedom to make up new C names during the code generation phase. > > Basically, this means writing a new transform and plugging it into the > pipeline after code generation. Please ask for more details if you want > and I hope somebody will help, although I'm a bit rushed at the moment > myself. >> The line number correspondence is a little harder, because I believe >> that no code is currently saving Cython line number information when >> generating C code. Seeing that CCodeWriter (and StringIOTree) have >> these Insertion Points I think line number information would have to >> be saved there, and all the code dealing with that would have to be >> adjusted. Does that sound right or do you think there is an easier way >> to implement this? >> > Actually, Cython can already emit gcc line directives through the > --line-directives switch. If you can't use that directly with GDB then > grepping the Cython source for "emit_linenums" should at least put you > on track. > > Robert will know more about this, as he wrote that code. >> Another question is, if I would create a patch for Cython to have >> Cython emit this kind of debugger information, would it be accepted if >> accompanied by tests and according to the rules etc? I would be >> willing to maintain the code at least. And what is your general >> opinion of this debugger? What else would you see as an essential >> feature of such a Cython debugger? And do you think the actual >> debugger should be an external project or should it be part of Cython >> (if it's mature)? I think the former option would allow a more >> flexible release schedule. 
>> > I think it is very likely that we would love to ship the necessary > modifications (once stable) as part of main Cython. If your Cython > patches can become part of mainline GDB then there's no question about > it at all, but even if you have to maintain a forked GDB I'm all for it > (and in that case we could host the forked GDB on cython.org as well and > so on). > > Let us know if you would like a more "official" branch on cython.org, > although just keeping one on bitbucket.org is usually just as easy. Then > we can pull the changes into mainline when things work well. > > Dag Sverre > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From dalcinl at gmail.com Tue Sep 14 15:57:48 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 14 Sep 2010 10:57:48 -0300 Subject: [Cython] [Numpy-discussion] Cython distutils In-Reply-To: References: Message-ID: On 14 September 2010 02:19, Robert Bradshaw wrote: > I've pushed an implementation of > http://wiki.cython.org/enhancements/distutils_preprocessing . This > allows one to write, e.g. > > module_list = cythonize("*.pyx") > What happens if you run 'python setup.py --help', for example? Will that trigger cythonization? > in your setup.py, and it handles the .pyx dependencies (including > transitive dependence of libraries) and does all the .pyx -> .c > translation. (In particular, this does away with the need for a > customized build_ext extension, which is especially nice for projects > their own. There are also mechanisms for specifying more build > information in the source file itself--see > http://hg.cython.org/cython-devel/file/tip/tests/build/inline_distutils.srctree > for an example. This will also be useful for non-distutils setups > (e.g. pyximport and inline code). > > Try it out and let me know what you think in terms of features and API.
> > - Robert > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From robertwb at math.washington.edu Tue Sep 14 17:42:57 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 14 Sep 2010 08:42:57 -0700 Subject: [Cython] [Numpy-discussion] Cython distutils In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 6:57 AM, Lisandro Dalcin wrote: > On 14 September 2010 02:19, Robert Bradshaw > wrote: >> I've pushed an implementation of >> http://wiki.cython.org/enhancements/distutils_preprocessing . This >> allows one to write, e.g. >> >> module_list = cythonize("*.pyx") >> > > What happens if you run 'python setup.py --help', for example? Will > that trigger cythonization? Yes. This is still a TODO item. Does anyone know if there is an easy way to detect this? >> in your setup.py, and it handles the .pyx dependencies (including >> transitive dependence of libraries) and does all the .pyx -> .c >> translation. (In particular, this does away with the need for a >> customized build_ext extension, which is especially nice for projects >> their own. There are also mechanisms for specifying more build >> information in the source file itself--see >> http://hg.cython.org/cython-devel/file/tip/tests/build/inline_distutils.srctree >> for an example. This will also be useful for non-distutils setups >> (e.g. pyximport and inline code). >> >> Try it out and let me know what you think in terms of features and API.
>> >> - Robert >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Lisandro Dalcin > --------------- > CIMEC (INTEC/CONICET-UNL) > Predio CONICET-Santa Fe > Colectora RN 168 Km 472, Paraje El Pozo > Tel: +54-342-4511594 (ext 1011) > Tel/Fax: +54-342-4511169 > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From dalcinl at gmail.com Tue Sep 14 17:54:26 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 14 Sep 2010 12:54:26 -0300 Subject: [Cython] [Numpy-discussion] Cython distutils In-Reply-To: References: Message-ID: On 14 September 2010 12:42, Robert Bradshaw wrote: > On Tue, Sep 14, 2010 at 6:57 AM, Lisandro Dalcin wrote: >> On 14 September 2010 02:19, Robert Bradshaw >> wrote: >>> I've pushed an implementation of >>> http://wiki.cython.org/enhancements/distutils_preprocessing . This >>> allows one to write, e.g. >>> >>> module_list = cythonize("*.pyx") >>> >> >> What happens if you run 'python setup.py --help', for example? Will >> that trigger cythonization? > > Yes. This is still a TODO item. Does anyone know if there is an easy way > to detect this? > AFAICT, the robust way is to implement a new "build_src" distutils command, and insert it into the build.sub_commands list before build_ext. IIUC, this is more or less what NumPy does. But this kind of monkeypatching is not a pleasure to code... >>> in your setup.py, and it handles the .pyx dependencies (including >>> transitive dependence of libraries) and does all the .pyx -> .c >>> translation. (In particular, this does away with the need for a >>> customized build_ext extension, which is especially nice for projects >>> their own.
There are also mechanisms for specifying more build >>> information in the source file itself--see >>> http://hg.cython.org/cython-devel/file/tip/tests/build/inline_distutils.srctree >>> for an example. This will also be useful for non-distutils setups >>> (e.g. pyximport and inline code). >>> >>> Try it out and let me know what you think in terms of features and API. >>> >>> - Robert >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> >> -- >> Lisandro Dalcin >> --------------- >> CIMEC (INTEC/CONICET-UNL) >> Predio CONICET-Santa Fe >> Colectora RN 168 Km 472, Paraje El Pozo >> Tel: +54-342-4511594 (ext 1011) >> Tel/Fax: +54-342-4511169 >> _______________________________________________ >> Cython-dev mailing list >> Cython-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/cython-dev >> > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From robertwb at math.washington.edu Tue Sep 14 18:14:33 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 14 Sep 2010 09:14:33 -0700 Subject: [Cython] [Numpy-discussion] Cython distutils In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 8:54 AM, Lisandro Dalcin wrote: > On 14 September 2010 12:42, Robert Bradshaw > wrote: >> On Tue, Sep 14, 2010 at 6:57 AM, Lisandro Dalcin wrote: >>> On 14 September 2010 02:19, Robert Bradshaw >>> wrote: >>>> I've pushed an implementation of >>>> http://wiki.cython.org/enhancements/distutils_preprocessing . This >>>> allows one to write, e.g. >>>> >>>>
module_list = cythonize("*.pyx") >>>> >>> >>> What happens if you run 'python setup.py --help', for example? Will >>> that trigger cythonization? >> >> Yes. This is still a TODO item. Does anyone know if there is an easy way >> to detect this? >> > > AFAICT, the robust way is to implement a new "build_src" distutils > command, and insert it into the build.sub_commands list before build_ext. > IIUC, this is more or less what NumPy does. But this kind of > monkeypatching is not a pleasure to code... I agree. And this would conflict with anyone else doing such monkeypatching as well... One of my goals was to completely dis-entangle this from the whole distutils stack. Another option might be to make module_list a lazily created iterable object. I wonder if distutils requires a list though... - Robert From dalcinl at gmail.com Tue Sep 14 18:30:04 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 14 Sep 2010 13:30:04 -0300 Subject: [Cython] [Numpy-discussion] Cython distutils In-Reply-To: References: Message-ID: On 14 September 2010 13:14, Robert Bradshaw wrote: > On Tue, Sep 14, 2010 at 8:54 AM, Lisandro Dalcin wrote: >> On 14 September 2010 12:42, Robert Bradshaw >> wrote: >>> On Tue, Sep 14, 2010 at 6:57 AM, Lisandro Dalcin wrote: >>>> On 14 September 2010 02:19, Robert Bradshaw >>>> wrote: >>>>> I've pushed an implementation of >>>>> http://wiki.cython.org/enhancements/distutils_preprocessing . This >>>>> allows one to write, e.g. >>>>> >>>>> module_list = cythonize("*.pyx") >>>>> >>>> >>>> What happens if you run 'python setup.py --help', for example? Will >>>> that trigger cythonization? >>> >>> Yes. This is still a TODO item. Does anyone know if there is an easy way >>> to detect this? >>> >> >> AFAICT, the robust way is to implement a new "build_src" distutils >> command, and insert it into the build.sub_commands list before build_ext. >> IIUC, this is more or less what NumPy does. But this kind of >> monkeypatching is not a pleasure to code...
> > I agree. And this would conflict with anyone else doing such > monkeypatching as well... Well, perhaps it can be done in a clever way, and also calling the command "build_pyx". The key here is to get the 'build' and 'build_ext' commands at runtime, and figure out the proper insertion point (i.e., before build_ext) for build_pyx in the build.sub_commands list... > One of my goals was to completely > dis-entangle this from the whole distutils stack. > As long as you use setup.py, you simply cannot dis-entangle distutils. Robert, your cythonize() API is nice; it will work in many cases and will be fine for many users. But my point is that at some point we should try to add an additional layer with a build_pyx command. > Another option might be to make module_list a lazily created iterable > object. I wonder if distutils requires a list though... > Yep, it requires a list... > - Robert > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From dmalcolm at redhat.com Tue Sep 14 18:39:39 2010 From: dmalcolm at redhat.com (David Malcolm) Date: Tue, 14 Sep 2010 12:39:39 -0400 Subject: [Cython] Cython debugger In-Reply-To: <4C8F3E56.2020705@student.matnat.uio.no> References: <4C8F3E56.2020705@student.matnat.uio.no> Message-ID: <1284482379.13330.333.camel@radiator.bos.redhat.com> On Tue, 2010-09-14 at 11:20 +0200, Dag Sverre Seljebotn wrote: (various comments inline) > mark florisson wrote: > > Hello, > > > > There currently is no proper debugger available to debug Cython > code. > > The current way we debug Cython code is: > > - the print statement > > - a C debugger > > - pdb > > > > None of the aforementioned methods is very powerful and time > effective > > at the same time.
For instance, 'print' requires recompilation and > is > > not so powerful, pdb doesn't provide access to anything C-level and > > gdb often requires a lot of digging through generated C code. This > is > > why I feel that there is a need to a proper Cython debugger. > > > > This debugger should have, for starters, the following features: > > - Cython variable/type/function name correspondence (the ability to > > set breakpoints for Cython function/method names, the ability to > > inspect variables by their Cython name (be it C or Python > variables), > > etc) > > - A line number and source code correspondence with the Cython code > > (ability to view the code at the C and Cython abstraction levels) > > - Be able to inspect Python code, print python backtraces, etc > > (there's already Misc/gdbinit in the Python source distribution and > > there is the EasierPythonDebugging project) I wrote the gdb7 hooks that I think you're referring to: http://fedoraproject.org/wiki/Features/EasierPythonDebugging This is in Python 2.7, Python 3.2a1, Fedora 13's builds of 2.6/3.1, and in some other distros. The code in question can be seen here: http://svn.python.org/view/python/trunk/Tools/gdb/libpython.py along with a test suite here: http://svn.python.org/view/python/trunk/Lib/test/test_gdb.py > > Such a debugger would need some kind of information, namely it would need: > > - a mapping from Cython line numbers to a range of C line numbers > > - a mapping from Cython names to mangled C names > > - naming information with regard to scope > > > > My plan is to extend GDB with Python and introduce new Cython commands > > that would deal with these issues. It would use the information > > exported by the Cython compiler to realize this (the easiest and most > > portable way would probably be to write it to a separate file in some > > format (xml or json come to mind)). 
> > [snip] > > > I think it is very likely that we would love to ship the necessary > modifications (once stable) as part of main Cython. If your Cython > patches can become part of mainline GDB then there's no question about > it at all, but even if you have to maintain a forked GDB I'm all for it > (and in that case we could host the forked GDB on cython.org as well and > so on). You shouldn't need to fork GDB; gdb 7 embeds python scripting. Unfortunately the exact methods you'll have in the gdb python classes in any given Linux distribution's build of gdb will vary (at risk of further blowing my own trumpet, much of this python integration was done by colleagues of mine at RH, so the Fedora build of GDB tends to be slightly ahead of FSF GDB in this regard). You may run into issues with optimization: under the wrong conditions, GCC can optimize away vital variables in such a way that GDB can't read them. I run into this with optimization on 64-bit with the hooks I wrote for python: it can't read the PyFrameObject *f var in PyEval_EvalFrameEx, arguably the most important local within the CPython runtime :( (again, my RH colleagues are working on this, but it will involve adding new functionality to DWARF). Having said all that, I implemented various CPython hooks using gdb7: py-list py-up/py-down py-print and these work with an unoptimized build of python. It ought to be possible to do something similar with cython code. It may not even be necessary to modify cython: perhaps some searching for locals named "__pyx_*" iirc would get you 70% of the way there? I can attest that having the prettyprinters enabled does make it much easier to debug cython code: all of the PyObject* get prettyprinted. One other thought: if it's possible to expose the cython structures in some meaningful way, perhaps we could change upstream python's gdb hooks to simply integrate them into the py-* commands I mentioned above? (so e.g.
cython c functions get somehow treated as python frames; currently I have a test predicate: Frame.is_evalframeex() which perhaps could be generalized?) (Not sure; it would complicate the selftests within python itself) > Let us know if you would like a more "official" branch on cython.org, > although just keeping one on bitbucket.org is usually just as easy. Then > we can pull the changes into mainline when things work well. > > Dag Sverre > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From robertwb at math.washington.edu Tue Sep 14 19:28:56 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 14 Sep 2010 10:28:56 -0700 Subject: [Cython] Cython debugger In-Reply-To: References: <4C8F3E56.2020705@student.matnat.uio.no> Message-ID: On Tue, Sep 14, 2010 at 2:56 AM, mark florisson wrote: > Dear Dag Sverre, > > thank you very much for your swift reply and help. Indeed, I noticed > that the 'c_name' wasn't filled out at that point, so writing a > transform sounds indeed like the better (and cleaner!) option. > > Regarding the line numbers, you can't access these pragmas from the > Python API in GDB and in fact, it seems that these directives confuse > gdb: http://paste.pocoo.org/show/261841/ . So I think I'll dig through > the Cython source and look at the emit_linenums code. This is because you need the #file as well as #line directive so it can correctly index into the "source." > It's good to hear that you are willing to include such a patch. I > think that indeed, making these changes part of gdb would be nice, but > I think it should also be possible to install these gdb extensions > manually (i.e., a separate project, perhaps shipped with Cython > wouldn't hurt either). But I will definitely have a talk with the gdb > community, and see if the Cython debugger could be integrated with > EasierPythonDebugging. 
Good debugging tools is an oft requested feature, and would be very useful. Anything you can do along these lines would be great! > Yes, I think an official branch would be nice. Send me a htpasswd file offline, and I'll set you up one. - Robert From dagss at student.matnat.uio.no Wed Sep 15 10:09:40 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 15 Sep 2010 10:09:40 +0200 Subject: [Cython] C++ docs bug Message-ID: <4C907F44.9070005@student.matnat.uio.no> In the C++ docs it says "C++ objects can now be stack-allocated." Should that be "not"? http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html Dag Sverre From markflorisson88 at gmail.com Wed Sep 15 11:55:53 2010 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 15 Sep 2010 11:55:53 +0200 Subject: [Cython] Cython debugger In-Reply-To: <1284482379.13330.333.camel@radiator.bos.redhat.com> References: <4C8F3E56.2020705@student.matnat.uio.no> <1284482379.13330.333.camel@radiator.bos.redhat.com> Message-ID: Dear David, On 14 September 2010 18:39, David Malcolm wrote: > On Tue, 2010-09-14 at 11:20 +0200, Dag Sverre Seljebotn wrote: > > (various comments inline) > >> mark florisson wrote: >> > Hello, >> > >> > There currently is no proper debugger available to debug Cython >> code. >> > The current way we debug Cython code is: >> > - the print statement >> > - a C debugger >> > - pdb >> > >> > None of the aforementioned methods is very powerful and time >> effective >> > at the same time. For instance, 'print' requires recompilation and >> is >> > not so powerful, pdb doesn't provide access to anything C-level and >> > gdb often requires a lot of digging through generated C code. This >> is >> > why I feel that there is a need to a proper Cython debugger. 
>> > >> > This debugger should have, for starters, the following features: >> > - Cython variable/type/function name correspondence (the ability to >> > set breakpoints for Cython function/method names, the ability to >> > inspect variables by their Cython name (be it C or Python >> variables), >> > etc) >> > - A line number and source code correspondence with the Cython code >> > (ability to view the code at the C and Cython abstraction levels) >> > - Be able to inspect Python code, print python backtraces, etc >> > (there's already Misc/gdbinit in the Python source distribution and >> > there is the EasierPythonDebugging project) > > I wrote the gdb7 hooks that I think you're referring to: > http://fedoraproject.org/wiki/Features/EasierPythonDebugging > > This is in Python 2.7, Python 3.2a1, Fedora 13's builds of 2.6/3.1, and > in some other distros. > > The code in question can be seen here: > http://svn.python.org/view/python/trunk/Tools/gdb/libpython.py > along with a test suite here: > http://svn.python.org/view/python/trunk/Lib/test/test_gdb.py Thanks for pointing us to the source code, I didn't realize this was included in the CPython source code distribution. >> > Such a debugger would need some kind of information, namely it would need: >> > - a mapping from Cython line numbers to a range of C line numbers >> > - a mapping from Cython names to mangled C names >> > - naming information with regard to scope >> > >> > My plan is to extend GDB with Python and introduce new Cython commands >> > that would deal with these issues. It would use the information >> > exported by the Cython compiler to realize this (the easiest and most >> > portable way would probably be to write it to a separate file in some >> > format (xml or json come to mind)). >> > > > [snip] > >> > >> I think it is very likely that we would love to ship the necessary >> modifications (once stable) as part of main Cython.
If your Cython >> patches can become part of mainline GDB then there's no question about >> it at all, but even if you have to maintain a forked GDB I'm all for it >> (and in that case we could host the forked GDB on cython.org as well and >> so on). > > You shouldn't need to fork GDB; gdb 7 embeds python scripting. > > Unfortunately the exact methods you'll have in the gdb python classes in > any given Linux distribution's build of gdb will vary (at risk of > further blowing my own trumpet, much of this python integration was done > by colleagues of mine at RH, so the Fedora build of GDB tends to be > slightly ahead of FSF GDB in this regard). > > You may run into issues with optimization: under the wrong conditions, > GCC can optimize away vital variables in such a way that GDB can't read > them. I run into this with optimization on 64-bit with the hooks I > wrote for python: it can't read the PyFrameObject *f var in > PyEval_EvalFrameEx, arguably the most important local within the CPython > runtime :( (again, my RH colleagues are working on this, but it will > involve adding new functionality to DWARF). > > Having said all that, I implemented various CPython hooks using gdb7: > py-list > py-up/py-down > py-print > and these work with an unoptimized build of python. > > It ought to be possible to do something similar with cython code. It > may not even be necessary to modify cython: perhaps some searching for > locals named "__pyx_*" iirc would get you 70% of the way there? Although that sounds like a wonderful idea, I think there are also issues with that. One issue is that a user must be able to set Cython breakpoints before the Cython module would be loaded, and for that the symbol name would be needed beforehand. Also, I don't know if these mangled names are consistent now and in the future and if you would be able to unambiguously associate a Cython variable name with a mangled name.
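To make the point about needing symbol names ahead of time more concrete, here is a minimal sketch of resolving names from a file exported by the compiler rather than by guessing at "__pyx_*" locals. Everything in it is an assumption for illustration: the JSON layout, the field names ("c_name", "locals", "lineno") and the mangled names themselves are made up and are not Cython's actual output.

```python
import json

# Hypothetical export format: one entry per Cython function, keyed by its
# qualified Cython name. Neither the layout nor the field names come from
# Cython itself, and the mangled names below are invented for the example.
EXAMPLE_EXPORT = """
{
  "spam.eggs": {"c_name": "__pyx_f_4spam_eggs", "lineno": 12,
                "locals": {"n": "__pyx_v_n"}},
  "spam.Ham.slice": {"c_name": "__pyx_pf_4spam_3Ham_slice", "lineno": 30,
                     "locals": {"self": "__pyx_v_self"}}
}
"""


def load_name_map(text):
    """Parse the exported JSON into a dict keyed by qualified Cython name."""
    return json.loads(text)


def breakpoint_symbol(name_map, cython_name):
    """Return the C symbol to break on for a qualified Cython function name.

    Because the lookup only needs the exported file, it works before the
    extension module has been loaded into the debugged process.
    """
    try:
        return name_map[cython_name]["c_name"]
    except KeyError:
        raise LookupError("no exported C name for %r" % cython_name)


def cython_local(name_map, cython_name, c_local):
    """Map a C local back to its Cython name; None means a temporary."""
    for cy_name, c_name in name_map[cython_name]["locals"].items():
        if c_name == c_local:
            return cy_name
    return None
```

With such a file, a debugger command could turn a request like "break spam.eggs" into a pending breakpoint on the exported C symbol, and could hide any C local that has no Cython counterpart.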
> I can attest that having the prettyprinters enabled does make it much > easier to debug cython code: all of the PyObject* get prettyprinted. I've been looking at the code and this is pretty neat. I did encounter some issues, for instance if you load the script before loading the python interpreter you get this traceback because these types are not defined at that time: Traceback (most recent call last): File "", line 1, in File ".libpython.py", line 49, in _type_size_t = gdb.lookup_type('size_t') RuntimeError: No type named size_t. So I think it would be a good idea to not make that code module-level. > > One other thought: if it's possible to expose the cython structures in > some meaningful way, perhaps we could change upstream python's gdb hooks > to simply integrate them into the py-* commands I mentioned above? (so > e.g. cython c functions get somehow treated as python frames; currently > I have a test predicate: > Frame.is_evalframeex() > which perhaps could be generalized?) > > (Not sure; it would complicate the selftests within python itself) > I think it would be hard to make them actual Python frames because creating frames in the inferior process from gdb is probably quite dangerous, and the alternative would be to modify Cython so that it creates Python stack frames (this sounds feasible but I think it might be a little bit of work). However, if this could be done (it would only do so if this 'debug' flag is active), then tracebacks and locals inspection etc wouldn't need special attention and the code would appear as normal Python code (apart from the Code objects obviously). However, this would form a problem for non-primitive C-type Cython variables. So at the very least we could have a 'py-locals' or some such command that would show the value of all the locals (the Python locals would be printed by py-print and C locals by gdb print). For regular python code it would show the locals from the current stack frame.
For the Cython part to work we would need information from the Cython compiler because we wouldn't want to list any temporary or irrelevant variables. So I think we should be able to integrate these two projects into one fruitful project, and with proper documentation it could help both regular Python users and Cython users. >> Let us know if you would like a more "official" branch on cython.org, >> although just keeping one on bitbucket.org is usually just as easy. Then >> we can pull the changes into mainline when things work well. >> >> Dag Sverre >> _______________________________________________ >> Cython-dev mailing list >> Cython-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/cython-dev > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > Cheers, Mark From cournape at gmail.com Wed Sep 15 14:58:19 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 15 Sep 2010 21:58:19 +0900 Subject: [Cython] [Numpy-discussion] Cython distutils In-Reply-To: References: Message-ID: On Wed, Sep 15, 2010 at 12:54 AM, Lisandro Dalcin wrote: > On 14 September 2010 12:42, Robert Bradshaw > wrote: >> On Tue, Sep 14, 2010 at 6:57 AM, Lisandro Dalcin wrote: >>> On 14 September 2010 02:19, Robert Bradshaw >>> wrote: >>>> I've pushed an implementation of >>>> http://wiki.cython.org/enhancements/distutils_preprocessing . This >>>> allows one to write, e.g. >>>> >>>> module_list = cythonize("*.pyx") >>>> >>> >>> What happens if you run 'python setup.py --help', for example? Will >>> that trigger cythonization? >> >> Yes. This is still a TODO item. Does anyone know if there is an easy way >> to detect this? >> > > AFAICT, the robust way is to implement a new "build_src" distutils > command, and insert it into the build.sub_commands list before build_ext. > IIUC, this is more or less what NumPy does.
But this kind of > monkeypatching is not a pleasure to code... Indeed, and I would not call it robust :) The next version of bento (to be released soon) will have relatively decent support for cython; in particular, chaining with other builders (so that you could, say, chain .pyx.in -> .pyx -> .c -> .o -> .so) will be possible without any monkey patch or other moronic operations. cheers, David From robertwb at math.washington.edu Wed Sep 15 17:49:44 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 15 Sep 2010 08:49:44 -0700 Subject: [Cython] Cython debugger In-Reply-To: References: <4C8F3E56.2020705@student.matnat.uio.no> <1284482379.13330.333.camel@radiator.bos.redhat.com> Message-ID: On Wed, Sep 15, 2010 at 2:55 AM, mark florisson wrote: >> It ought to be possible to do something similar with cython code. It >> may not even be necessary to modify cython: perhaps some searching for >> locals named "__pyx_*" iirc would get you 70% of the way there? > > Although that sounds like a wonderful idea, I think there are also > issues with that. One issue is that a user must be able to set Cython > breakpoints before the Cython module would be loaded, and for that the > symbol name would be needed beforehand. Also, I don't know if these > mangled names are consistent now and in the future and if you would be > able to unambiguously associate a Cython variable name with a mangled > name. Mangled names are deterministic and, though they're not guaranteed to be consistent from release to release, almost always are. >> I can attest that having the prettyprinters enabled does make it much >> easier to debug cython code: all of the PyObject* get prettyprinted. > > I've been looking at the code and this is pretty neat.
I did encounter > some issues, for instance if you load the script before loading the > python interpreter you get this traceback because these types are not > defined at that time: > > Traceback (most recent call last): > File "", line 1, in > File ".libpython.py", line 49, in > _type_size_t = gdb.lookup_type('size_t') > RuntimeError: No type named size_t. > > So I think it would be a good idea to not make that code module-level. > >> >> One other thought: if it's possible to expose the cython structures in >> some meaningful way, perhaps we could change upstream python's gdb hooks >> to simply integrate them into the py-* commands I mentioned above? (so >> e.g. cython c functions get somehow treated as python frames; currently >> I have a test predicate: >> Frame.is_evalframeex() >> which perhaps could be generalized?) >> >> (Not sure; it would complicate the selftests within python itself) >> > > I think it would be hard to make them actual Python frames because > creating frames in the inferior process from gdb is probably quite > dangerous, and the alternative would be to modify Cython so that it > creates Python stack frames (this sounds feasible but I think it might > be a little bit of work). However, if this could be done (it would > only do so if this 'debug' flag is active), then tracebacks and locals > inspection etc wouldn't need special attention and the code would > appear as normal Python code (apart from the Code objects obviously). > However, this would form a problem for non-primitive C-type Cython > variables. > > So at the very least we could have a 'py-locals' or some such command > that would show the value of all the locals (the Python locals would > be printed by py-print and C locals by gdb print). For regular python > code it would show the locals from the current stack frame. For the > Cython part to work we would need information from the Cython compiler > because we wouldn't want to list any temporary or irrelevant > variables.

Yes, this is what I was thinking, at least in terms of exposing stuff
to pdb (which is a complementary project). BTW, frames are already
created for functions when profiling is enabled.

> So I think we should be able to integrate these two projects into one
> fruitful project, and with proper documentation it could help both
> regular Python users and Cython users.

+1

- Robert

From robertwb at math.washington.edu Wed Sep 15 17:53:33 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 15 Sep 2010 08:53:33 -0700
Subject: [Cython] C++ docs bug
In-Reply-To: <4C907F44.9070005@student.matnat.uio.no>
References: <4C907F44.9070005@student.matnat.uio.no>
Message-ID:

On Wed, Sep 15, 2010 at 1:09 AM, Dag Sverre Seljebotn wrote:
> In the C++ docs it says "C++ objects can now be stack-allocated."
>
> Should that be "not"?
> http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html

They can be, but there should be some caveats (specifically, only
default constructors and they are all allocated at the beginning of
the method and deallocated at the end, unlike C++ scoping rules).

- Robert

From barbieri at profusion.mobi Fri Sep 17 15:07:53 2010
From: barbieri at profusion.mobi (Gustavo Sverzut Barbieri)
Date: Fri, 17 Sep 2010 10:07:53 -0300
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
Message-ID:

changeset:   3753:62e52f105bc0
tag:         tip
user:        Gustavo Sverzut Barbieri
date:        Fri Sep 17 09:42:57 2010 -0300
files:       Cython/Compiler/ModuleNode.py
description:
Force GCC>=4 to export module initialization function.

With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is
possible to have the compiler to produce binaries where all symbols
are hidden (local to the binary). To force a symbol to be visible one
must specify __attribute__ ((visibility("default"))).

This patch introduces pre-processor code to always force module
symbols to be visible.

NOTE: in theory pyport.h from Python would do the right thing, but it
does not define any special cases for GCC, just Windows. As Python is
not changing that easily and even if it does is just for future releases,
it's better to have those in Cython.

diff --git a/Cython/Compiler/ModuleNode.py b/Cython/Compiler/ModuleNode.py
--- a/Cython/Compiler/ModuleNode.py
+++ b/Cython/Compiler/ModuleNode.py
@@ -1671,9 +1671,15 @@
         header2 = "PyMODINIT_FUNC init%s(void)" % env.module_name
         header3 = "PyMODINIT_FUNC PyInit_%s(void)" % env.module_name
         code.putln("#if PY_MAJOR_VERSION < 3")
+        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
+        code.putln("__attribute__ ((visibility(\"default\")))")
+        code.putln("# endif")
         code.putln("%s; /*proto*/" % header2)
         code.putln(header2)
         code.putln("#else")
+        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
+        code.putln("__attribute__ ((visibility(\"default\")))")
+        code.putln("# endif")
         code.putln("%s; /*proto*/" % header3)
         code.putln(header3)
         code.putln("#endif")

-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbieri at gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cython-export-module-init-visibility-gcc.patch
Type: text/x-patch
Size: 1753 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20100917/264c5b45/attachment.bin

From dalcinl at gmail.com Fri Sep 17 16:07:31 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 17 Sep 2010 11:07:31 -0300
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To:
References:
Message-ID:

On 17 September 2010 10:07, Gustavo Sverzut Barbieri wrote:
> changeset:   3753:62e52f105bc0
> tag:         tip
> user:        Gustavo Sverzut Barbieri
> date:        Fri Sep 17 09:42:57 2010 -0300
> files:
Cython/Compiler/ModuleNode.py
> description:
> Force GCC>=4 to export module initialization function.
>
> With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is
> possible to have the compiler to produce binaries where all symbols
> are hidden (local to the binary). To force a symbol to be visible one
> must specify __attribute__ ((visibility("default"))).
>

Why do you need to force -fvisibility=hidden ? In general, Cython
emits 'static' storage specifier, unless 'public' or 'api' keywords
are involved.

> This patch introduces pre-processor code to always force module
> symbols to be visible.
>
> NOTE: in theory pyport.h from Python would do the right thing, but it
> does not define any special cases for GCC, just Windows. As Python is
> not changing that easily and even if it does is just for future releases,
> it's better to have those in Cython.
>

Did you bug Python?

>
> diff --git a/Cython/Compiler/ModuleNode.py b/Cython/Compiler/ModuleNode.py
> --- a/Cython/Compiler/ModuleNode.py
> +++ b/Cython/Compiler/ModuleNode.py
> @@ -1671,9 +1671,15 @@
>          header2 = "PyMODINIT_FUNC init%s(void)" % env.module_name
>          header3 = "PyMODINIT_FUNC PyInit_%s(void)" % env.module_name
>          code.putln("#if PY_MAJOR_VERSION < 3")
> +        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
> +        code.putln("__attribute__ ((visibility(\"default\")))")
> +        code.putln("# endif")
>          code.putln("%s; /*proto*/" % header2)
>          code.putln(header2)
>          code.putln("#else")
> +        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
> +        code.putln("__attribute__ ((visibility(\"default\")))")
> +        code.putln("# endif")
>          code.putln("%s; /*proto*/" % header3)
>          code.putln(header3)
>          code.putln("#endif")
>

It looks good, I support the patch. However,

1) Do you know how this would work in Windows with MinGW? I would
expect MinGW to do the right thing, but we have to be sure...

2) Do you know if this could affect full static Python builds? (I
mean, when you build a Python interpreter with all ext modules
'built-in' in the python binary) ? I cannot think of any problem, but
perhaps you do?

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From barbieri at profusion.mobi Fri Sep 17 16:34:27 2010
From: barbieri at profusion.mobi (Gustavo Sverzut Barbieri)
Date: Fri, 17 Sep 2010 11:34:27 -0300
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To:
References:
Message-ID:

On Fri, Sep 17, 2010 at 11:07 AM, Lisandro Dalcin wrote:
> On 17 September 2010 10:07, Gustavo Sverzut Barbieri
> wrote:
>> changeset:   3753:62e52f105bc0
>> tag:         tip
>> user:        Gustavo Sverzut Barbieri
>> date:        Fri Sep 17 09:42:57 2010 -0300
>> files:       Cython/Compiler/ModuleNode.py
>> description:
>> Force GCC>=4 to export module initialization function.
>>
>> With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is
>> possible to have the compiler to produce binaries where all symbols
>> are hidden (local to the binary). To force a symbol to be visible one
>> must specify __attribute__ ((visibility("default"))).
>>
>
> Why do you need to force -fvisibility=hidden ? In general, Cython
> emits 'static' storage specifier, unless 'public' or 'api' keywords
> are involved.

I don't. But we recently moved from setuptools to autoconf due to lots of
complaints of the former and I got tired and wasted 2 days to convert
all my bindings, and users had that in their CFLAGS as the whole
project compiles with it... except our bindings :-/

As for autoconf, later on I'll post the link to our examples... I even
created a cython.m4 to help finding cython, its version and already
cythonized files.
What made us run away from setuptools is the deep
level of magic and dynamic patching: setuptools/distutils checked for
pyrex, thus I was faking cython as pyrex in sys.modules, but then
people asked for easy_install and that was importing setuptools before
we fake pyrex and then to unpatch all the classes was being a major
PITA.  autoconf/automake is not much easier, as it requires lots of
code to get something done, but at least the rest of our developers
are used to it due to the C libraries... :-/

>> This patch introduces pre-processor code to always force module
>> symbols to be visible.
>>
>> NOTE: in theory pyport.h from Python would do the right thing, but it
>> does not define any special cases for GCC, just Windows. As Python is
>> not changing that easily and even if it does is just for future releases,
>> it's better to have those in Cython.
>>
>
> Did you bug Python?

not yet...

>> diff --git a/Cython/Compiler/ModuleNode.py b/Cython/Compiler/ModuleNode.py
>> --- a/Cython/Compiler/ModuleNode.py
>> +++ b/Cython/Compiler/ModuleNode.py
>> @@ -1671,9 +1671,15 @@
>>          header2 = "PyMODINIT_FUNC init%s(void)" % env.module_name
>>          header3 = "PyMODINIT_FUNC PyInit_%s(void)" % env.module_name
>>          code.putln("#if PY_MAJOR_VERSION < 3")
>> +        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
>> +        code.putln("__attribute__ ((visibility(\"default\")))")
>> +        code.putln("# endif")
>>          code.putln("%s; /*proto*/" % header2)
>>          code.putln(header2)
>>          code.putln("#else")
>> +        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
>> +        code.putln("__attribute__ ((visibility(\"default\")))")
>> +        code.putln("# endif")
>>          code.putln("%s; /*proto*/" % header3)
>>          code.putln(header3)
>>          code.putln("#endif")
>>
>
> It looks good, I support the patch. However,
>
> 1) Do you know how this would work in Windows with MinGW?
I would expect
> MinGW to do the right thing, but we have to be sure...

AFAIK it does the right thing as we have libraries with that and they
still work. Right now I don't have people to check it, but if you wish
we could add a !windows before the check for gcc.

> 2) Do you know if this could affect full static Python builds? (I
> mean, when you build a Python interpreter with all ext modules
> 'built-in' in the python binary) ? I cannot think of any problem, but
> perhaps you do?

no problems other than the symbol will be visible.

BR,

-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbieri at gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202

From dalcinl at gmail.com Fri Sep 17 16:58:37 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 17 Sep 2010 11:58:37 -0300
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To:
References:
Message-ID:

On 17 September 2010 11:34, Gustavo Sverzut Barbieri wrote:
> On Fri, Sep 17, 2010 at 11:07 AM, Lisandro Dalcin wrote:
>> On 17 September 2010 10:07, Gustavo Sverzut Barbieri
>> wrote:
>>> changeset:   3753:62e52f105bc0
>>> tag:         tip
>>> user:        Gustavo Sverzut Barbieri
>>> date:        Fri Sep 17 09:42:57 2010 -0300
>>> files:       Cython/Compiler/ModuleNode.py
>>> description:
>>> Force GCC>=4 to export module initialization function.
>>>
>>> With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is
>>> possible to have the compiler to produce binaries where all symbols
>>> are hidden (local to the binary). To force a symbol to be visible one
>>> must specify __attribute__ ((visibility("default"))).
>>>
>>
>> Why do you need to force -fvisibility=hidden ? In general, Cython
>> emits 'static' storage specifier, unless 'public' or 'api' keywords
>> are involved.
>
> I don't.
But we recently moved from setuptools to autoconf due lots of > complaints of the former and I got tired and wasted 2 days to convert > all my bindings, and users had that in their CFLAGS as the whole > project compiles with it... except our bindings :-/ > > As for autoconf, later on I'll post the link to our examples... I even > created a cython.m4 to help finding cython, its version and already > cythonized files. ? What made us run away from setuptools is the deep > level of magic and dynamic patching: setuptools/distutils checked for > pyrex, thus I was faking cython as pyrex in sys.__modules__, but then > people asked for easy_install and that was importing setuptools before > we fake pyrex and then to unpatch all the classes was being a major > PITA. ?autoconf/automake is not much easier, as it requires lots of > code to get something done, but at least the rest of our developers > are used to it due the C libraries... :-/ > > >>> This patch introduces pre-processor code to always force module >>> symbols to be visible. >>> >>> NOTE: in theory pyport.h from Python would do the right thing, but it >>> does not define any special cases for GCC, just Windows. As Python is >>> not changing that easily and even if it does is just for future releases, >>> it's better to have those in Cython. >>> >> >> Did you bug Python? > > not yet... > > >>> diff --git a/Cython/Compiler/ModuleNode.py b/Cython/Compiler/ModuleNode.py >>> --- a/Cython/Compiler/ModuleNode.py >>> +++ b/Cython/Compiler/ModuleNode.py >>> @@ -1671,9 +1671,15 @@ >>> ? ? ? ? header2 = "PyMODINIT_FUNC init%s(void)" % env.module_name >>> ? ? ? ? header3 = "PyMODINIT_FUNC PyInit_%s(void)" % env.module_name >>> ? ? ? ? code.putln("#if PY_MAJOR_VERSION < 3") >>> + ? ? ? ?code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)") >>> + ? ? ? ?code.putln("__attribute__ ((visibility(\"default\")))") >>> + ? ? ? ?code.putln("# endif") >>> ? ? ? ? code.putln("%s; /*proto*/" % header2) >>> ? ? ? ? 
code.putln(header2)
>>>          code.putln("#else")
>>> +        code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)")
>>> +        code.putln("__attribute__ ((visibility(\"default\")))")
>>> +        code.putln("# endif")
>>>          code.putln("%s; /*proto*/" % header3)
>>>          code.putln(header3)
>>>          code.putln("#endif")
>>>
>>
>> It looks good, I support the patch. However,
>>
>> 1) Do you know how this would work in Windows with MinGW? I would
>> expect MinGW to do the right thing, but we have to be sure...
>
> AFAIK it does the right thing as we have libraries with that and it
> still work. Right now I don't have people to check it, but if you wish
> we could add a !windows before the check for gcc.
>

No, I would prefer to test your patch in Windows and see if things
still work. But that would be next week.

>
>> 2) Do you know if this could affect full static Python builds? (I
>> mean, when you build a Python interpreter with all ext modules
>> 'built-in' in the python binary) ? I cannot think of any problem, but
>> perhaps you do?
>
> no problems other than the symbol will be visible.
>

Well, that could be a problem... Perhaps we should define a
CYTHON_MODINIT_FUNC_VISIBILITY macro, then as a last resort you can
change it. I'm thinking of something like:

#if !defined(CYTHON_MODINIT_FUNC_VISIBILITY)
#if defined(__GNUC__) && (__GNUC__ >= 4)
#define CYTHON_MODINIT_FUNC_VISIBILITY __attribute__ ((visibility("default")))
#else
#define CYTHON_MODINIT_FUNC_VISIBILITY
#endif
#endif

and then we can emit code like this:

PyMODINIT_FUNC init(void)
CYTHON_MODINIT_FUNC_VISIBILITY
{
....
}

What do you think?
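
A minimal sketch of how the macro idea could be combined with the
original patch, assuming two hypothetical helpers,
visibility_macro_lines() and module_init_header_lines(); neither exists
in Cython, where the real generator uses code.putln() inside
ModuleNode.py. The sketch only builds the C text, and it keeps the macro
in front of the prototype, as the original patch does with the raw
attribute.

```python
# Hypothetical helper names; none of this is Cython's actual API.
VISIBILITY_MACRO = "CYTHON_MODINIT_FUNC_VISIBILITY"


def visibility_macro_lines():
    """C preprocessor lines that define the macro unless the user already did."""
    return [
        "#if !defined(%s)" % VISIBILITY_MACRO,
        "#  if defined(__GNUC__) && (__GNUC__ >= 4)",
        '#    define %s __attribute__ ((visibility("default")))' % VISIBILITY_MACRO,
        "#  else",
        "#    define %s" % VISIBILITY_MACRO,
        "#  endif",
        "#endif",
    ]


def module_init_header_lines(module_name):
    """Build the C text for the module init prototype and function header."""
    header2 = "PyMODINIT_FUNC init%s(void)" % module_name
    header3 = "PyMODINIT_FUNC PyInit_%s(void)" % module_name
    lines = visibility_macro_lines()
    lines.append("#if PY_MAJOR_VERSION < 3")
    lines.append(VISIBILITY_MACRO)           # replaces the raw __attribute__ line
    lines.append("%s; /*proto*/" % header2)
    lines.append(header2)
    lines.append("#else")
    lines.append(VISIBILITY_MACRO)
    lines.append("%s; /*proto*/" % header3)
    lines.append(header3)
    lines.append("#endif")
    return lines


if __name__ == "__main__":
    print("\n".join(module_init_header_lines("foo")))
```

Because the definition is guarded by #if !defined(...), a build that
wants the old behaviour could presumably pass
-DCYTHON_MODINIT_FUNC_VISIBILITY= to define the macro as empty.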
-- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From barbieri at profusion.mobi Fri Sep 17 17:48:15 2010 From: barbieri at profusion.mobi (Gustavo Sverzut Barbieri) Date: Fri, 17 Sep 2010 12:48:15 -0300 Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function. In-Reply-To: References: Message-ID: On Fri, Sep 17, 2010 at 11:58 AM, Lisandro Dalcin wrote: > On 17 September 2010 11:34, Gustavo Sverzut Barbieri > wrote: >> On Fri, Sep 17, 2010 at 11:07 AM, Lisandro Dalcin wrote: >>> On 17 September 2010 10:07, Gustavo Sverzut Barbieri >>> wrote: >>>> changeset: ? 3753:62e52f105bc0 >>>> tag: ? ? ? ? tip >>>> user: ? ? ? ?Gustavo Sverzut Barbieri >>>> date: ? ? ? ?Fri Sep 17 09:42:57 2010 -0300 >>>> files: ? ? ? Cython/Compiler/ModuleNode.py >>>> description: >>>> Force GCC>=4 to export module initialization function. >>>> >>>> With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is >>>> possible to have the compiler to produce binaries where all symbols >>>> are hidden (local to the binary). To force a symbol to be visible one >>>> must specify __attribute__ ((visibility("default"))). >>>> >>> >>> Why do you need to force -fvisibility=hidden ? In general, Cython >>> emits 'static' storage specifier, unless 'public' or 'api' keywords >>> are involved. >> >> I don't. But we recently moved from setuptools to autoconf due lots of >> complaints of the former and I got tired and wasted 2 days to convert >> all my bindings, and users had that in their CFLAGS as the whole >> project compiles with it... except our bindings :-/ >> >> As for autoconf, later on I'll post the link to our examples... I even >> created a cython.m4 to help finding cython, its version and already >> cythonized files. ? 
What made us run away from setuptools is the deep >> level of magic and dynamic patching: setuptools/distutils checked for >> pyrex, thus I was faking cython as pyrex in sys.__modules__, but then >> people asked for easy_install and that was importing setuptools before >> we fake pyrex and then to unpatch all the classes was being a major >> PITA. ?autoconf/automake is not much easier, as it requires lots of >> code to get something done, but at least the rest of our developers >> are used to it due the C libraries... :-/ >> > > >> >>>> This patch introduces pre-processor code to always force module >>>> symbols to be visible. >>>> >>>> NOTE: in theory pyport.h from Python would do the right thing, but it >>>> does not define any special cases for GCC, just Windows. As Python is >>>> not changing that easily and even if it does is just for future releases, >>>> it's better to have those in Cython. >>>> >>> >>> Did you bug Python? >> >> not yet... >> >> >>>> diff --git a/Cython/Compiler/ModuleNode.py b/Cython/Compiler/ModuleNode.py >>>> --- a/Cython/Compiler/ModuleNode.py >>>> +++ b/Cython/Compiler/ModuleNode.py >>>> @@ -1671,9 +1671,15 @@ >>>> ? ? ? ? header2 = "PyMODINIT_FUNC init%s(void)" % env.module_name >>>> ? ? ? ? header3 = "PyMODINIT_FUNC PyInit_%s(void)" % env.module_name >>>> ? ? ? ? code.putln("#if PY_MAJOR_VERSION < 3") >>>> + ? ? ? ?code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)") >>>> + ? ? ? ?code.putln("__attribute__ ((visibility(\"default\")))") >>>> + ? ? ? ?code.putln("# endif") >>>> ? ? ? ? code.putln("%s; /*proto*/" % header2) >>>> ? ? ? ? code.putln(header2) >>>> ? ? ? ? code.putln("#else") >>>> + ? ? ? ?code.putln("# if defined(__GNUC__) && (__GNUC__ >= 4)") >>>> + ? ? ? ?code.putln("__attribute__ ((visibility(\"default\")))") >>>> + ? ? ? ?code.putln("# endif") >>>> ? ? ? ? code.putln("%s; /*proto*/" % header3) >>>> ? ? ? ? code.putln(header3) >>>> ? ? ? ? code.putln("#endif") >>>> >>> >>> It looks good, I support the patch. 
However, >>> >>> 1) Do you know how this would work in Windows with MinGW? I would >>> MinGW to do the right thing, but we have to be sure... >> >> AFAIK it does the right thing as we have libraries with that and it >> still work. Right now I don't have people to check it, but if you wish >> we could add a !windows before the check for gcc. >> > > No, I would prefer to test your patch in Windows and see if things > still work. But that would be next week. > >> >>> 2) Do you know if this could affect full static Python builds? (I >>> mean, when you build a Python interpreter with all ext modules >>> 'built-in' in the python binary) ? I cannot think of any problem, but >>> perhaps you do? >> >> no problems other than the symbol will be visible. >> > > Well, that could be a problem... Perhaps we should define a > CYTHON_MODINIT_FUNC_VISIBILITY macro, then as a last resort you can > change it. I'm thinking on something like: > > #if !defined(CYTHON_MODINIT_FUNC_VISIBILITY) > #if defined(__GNUC__) && (__GNUC__ >= 4) > #define CYTHON_MODINIT_FUNC_VISIBILITY __attribute__ ((visibility("default"))) > #else > #define CYTHON_MODINIT_FUNC_VISIBILITY > #endif > #endif > > and then we can emit code like this: > > PyMODINIT_FUNC init(void) > CYTHON_MODINIT_FUNC_VISIBILITY > { > .... > } > > What do you think? I'm fine with that, I just wanted the patch to remain as simple as possible, but this can be done as easily. -- Gustavo Sverzut Barbieri http://profusion.mobi embedded systems -------------------------------------- MSN: barbieri at gmail.com Skype: gsbarbieri Mobile: +55 (19) 9225-2202 From dagss at student.matnat.uio.no Sat Sep 18 16:17:01 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 18 Sep 2010 16:17:01 +0200 Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function. 
In-Reply-To: References: Message-ID: <4C94C9DD.9030002@student.matnat.uio.no> Gustavo Sverzut Barbieri wrote: > On Fri, Sep 17, 2010 at 11:07 AM, Lisandro Dalcin wrote: > >> On 17 September 2010 10:07, Gustavo Sverzut Barbieri >> wrote: >> >>> changeset: 3753:62e52f105bc0 >>> tag: tip >>> user: Gustavo Sverzut Barbieri >>> date: Fri Sep 17 09:42:57 2010 -0300 >>> files: Cython/Compiler/ModuleNode.py >>> description: >>> Force GCC>=4 to export module initialization function. >>> >>> With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is >>> possible to have the compiler to produce binaries where all symbols >>> are hidden (local to the binary). To force a symbol to be visible one >>> must specify __attribute__ ((visibility("default"))). >>> >>> >> Why do you need to force -fvisibility=hidden ? In general, Cython >> emits 'static' storage specifier, unless 'public' or 'api' keywords >> are involved. >> > > I don't. But we recently moved from setuptools to autoconf due lots of > complaints of the former and I got tired and wasted 2 days to convert > all my bindings, and users had that in their CFLAGS as the whole > project compiles with it... except our bindings :-/ > > As for autoconf, later on I'll post the link to our examples... I even > created a cython.m4 to help finding cython, its version and already > cythonized files. What made us run away from setuptools is the deep > level of magic and dynamic patching: setuptools/distutils checked for > pyrex, thus I was faking cython as pyrex in sys.__modules__, but then > people asked for easy_install and that was importing setuptools before > we fake pyrex and then to unpatch all the classes was being a major > PITA. autoconf/automake is not much easier, as it requires lots of > code to get something done, but at least the rest of our developers > are used to it due the C libraries... :-/ > I can very much sympathize. 
People quickly discover that setuptools and
distutils are not suitable for anything nontrivial and that the codebase
is beyond repair. As an anecdote, at EuroScipy2010, every time someone
mentioned they had problems building or distributing their Python
software the room echoed with "tell me about it". Building and
distributing Python software seems to have become somewhat of a standing
joke, at least within science and Python (with their need to mix
Fortran, Cython etc. and run on various nonstandard cluster hardware).

The reason so much Cython documentation is written using distutils is, I
think, because of the lack of an obvious alternative, not because anyone
thinks distutils is such a great thing.

IMO, if any solution comes to the mess of building Cython software it is
not going to come from distutils or related tools (interesting reads on
the subject are any rant David Cournapeau has on the subject on his blog
or distutils-sig). BTW, David is or was the maintainer of the build
system of NumPy and SciPy, which is about as complicated as it gets with
Python packages. He's writing the Bento project
(http://github.com/cournape/Bento) for packaging Python software, whose
primary feature is not being distutils :-) Apparently, Bento is able to
build projects with Cython in it now... I've been meaning to check it out
but haven't got around to it. If Bento lives up to its promise and gets
released in a stable version it'd be nice to establish usage of Bento as
well as or instead of distutils in the Cython docs.

Dag Sverre

From barbieri at profusion.mobi Sat Sep 18 21:57:06 2010
From: barbieri at profusion.mobi (Gustavo Sverzut Barbieri)
Date: Sat, 18 Sep 2010 16:57:06 -0300
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To: <4C94C9DD.9030002@student.matnat.uio.no> References: <4C94C9DD.9030002@student.matnat.uio.no> Message-ID: On Sat, Sep 18, 2010 at 11:17 AM, Dag Sverre Seljebotn wrote: > Gustavo Sverzut Barbieri wrote: >> On Fri, Sep 17, 2010 at 11:07 AM, Lisandro Dalcin wrote: >> >>> On 17 September 2010 10:07, Gustavo Sverzut Barbieri >>> wrote: >>> >>>> changeset: ? 3753:62e52f105bc0 >>>> tag: ? ? ? ? tip >>>> user: ? ? ? ?Gustavo Sverzut Barbieri >>>> date: ? ? ? ?Fri Sep 17 09:42:57 2010 -0300 >>>> files: ? ? ? Cython/Compiler/ModuleNode.py >>>> description: >>>> Force GCC>=4 to export module initialization function. >>>> >>>> With GCC's -fvisibility=hidden (both CFLAGS and LDFLAGS) it is >>>> possible to have the compiler to produce binaries where all symbols >>>> are hidden (local to the binary). To force a symbol to be visible one >>>> must specify __attribute__ ((visibility("default"))). >>>> >>>> >>> Why do you need to force -fvisibility=hidden ? In general, Cython >>> emits 'static' storage specifier, unless 'public' or 'api' keywords >>> are involved. >>> >> >> I don't. But we recently moved from setuptools to autoconf due lots of >> complaints of the former and I got tired and wasted 2 days to convert >> all my bindings, and users had that in their CFLAGS as the whole >> project compiles with it... except our bindings :-/ >> >> As for autoconf, later on I'll post the link to our examples... I even >> created a cython.m4 to help finding cython, its version and already >> cythonized files. ? What made us run away from setuptools is the deep >> level of magic and dynamic patching: setuptools/distutils checked for >> pyrex, thus I was faking cython as pyrex in sys.__modules__, but then >> people asked for easy_install and that was importing setuptools before >> we fake pyrex and then to unpatch all the classes was being a major >> PITA. 
?autoconf/automake is not much easier, as it requires lots of >> code to get something done, but at least the rest of our developers >> are used to it due the C libraries... :-/ >> > I can very much sympathize. People quickly discover that setuptools and > distutils are not suitable for anything nontrivial and that the codebase > is beyond repair. As an anecdote, at EuroScipy2010, every time someone > mentioned they had problems building or distributing their Python > software the room eccoed with "tell me about it". Building and > distributing Python software seems to have become somewhat of a standing > joke, at least within science and Python (with their need to mix > Fortran, Cython etc. and run on various nonstandard cluster hardware). > > The reason so much Cython documentation is written using distutils is, I > think, because of the lack of an obvious alternative, not because anyone > thinks distutils is such a great thing. > > IMO, if any solution comes to the mess of building Cython software it is > not going to come from distutils or related tools (interesting reads on > the subject are any rant David Cournapeau has on the subject on his blog > or distutils-sig). BTW, David is or was the maintainer of the build > system of NumPy and SciPy, which is about as complicated as it gets with > Python packages. He's writing the Bento project ( > http://github.com/cournape/Bento) for packaging Python software, which > primary feature is not being distutils :-) Apparently, Bento is able to > build projects with Cython in it now...I've been meaning to check it out > but haven't got around to it. If Bento lives up to its promise and gets > released in a stable version it'd be nice to establish usage of Bento as > well as or instead of distutils in the Cython docs. Our problems would be highly reduced if setuptools knew about Cython, simply by doing "have_cython_or_pyrex" instead of "have_pyrex", same for distutils. 
If one of the core developers can push for that simply
acceptance, then wonderful.

Another option is to make our package conflict with pyrex and install
a Pyrex.py that in turn imports Cython.

-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbieri at gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202

From cournape at gmail.com Sun Sep 19 08:17:07 2010
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 19 Sep 2010 15:17:07 +0900
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To:
References: <4C94C9DD.9030002@student.matnat.uio.no>
Message-ID:

>
> Our problems would be highly reduced if setuptools knew about Cython,
> simply by doing "have_cython_or_pyrex" instead of "have_pyrex", same
> for distutils. If one of the core developers can push for that simply
> acceptance, then wonderful.

FWIW, that's exactly what's broken in distutils. The very fact that
setuptools needs to know about cython *is* the issue. Because later,
you will have issues because of distribute instead of setuptools, and
then distutils2, and who knows.

cheers,

David

From robertwb at math.washington.edu Tue Sep 21 08:11:06 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 20 Sep 2010 23:11:06 -0700
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To:
References: <4C94C9DD.9030002@student.matnat.uio.no>
Message-ID:

On Sat, Sep 18, 2010 at 11:17 PM, David Cournapeau wrote:
>>
>> Our problems would be highly reduced if setuptools knew about Cython,
>> simply by doing "have_cython_or_pyrex" instead of "have_pyrex", same
>> for distutils. If one of the core developers can push for that simply
>> acceptance, then wonderful.
>
> FWIW, that's exactly what's broken in distutils. The very fact that
> setuptools needs to know about cython *is* the issue.
Because later,
> you will have issues because of distribute instead of setuptools, and
> then distutils2, and who knows.

Hence http://wiki.cython.org/enhancements/distutils_preprocessing .

Of course having setuptools know about Cython would be a simple step
forward--is the maintainer likely to take a patch?

- Robert

From cournape at gmail.com Tue Sep 21 16:42:26 2010
From: cournape at gmail.com (David Cournapeau)
Date: Tue, 21 Sep 2010 23:42:26 +0900
Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function.
In-Reply-To:
References: <4C94C9DD.9030002@student.matnat.uio.no>
Message-ID:

On Tue, Sep 21, 2010 at 3:11 PM, Robert Bradshaw wrote:
> On Sat, Sep 18, 2010 at 11:17 PM, David Cournapeau wrote:
>>>
>>> Our problems would be highly reduced if setuptools knew about Cython,
>>> simply by doing "have_cython_or_pyrex" instead of "have_pyrex", same
>>> for distutils. If one of the core developers can push for that simply
>>> acceptance, then wonderful.
>>
>> FWIW, that's exactly what's broken in distutils. The very fact that
>> setuptools needs to know about cython *is* the issue. Because later,
>> you will have issues because of distribute instead of setuptools, and
>> then distutils2, and who knows.
>
> Hence http://wiki.cython.org/enhancements/distutils_preprocessing .

The problem with those solutions is that you are only moving the issue
one layer above. For example, what if you wanted to generate .pyx from
some .pyx.src, running through numpy templating system ? You need
something more declarative in nature to be truly flexible IMO.

Now, this is obviously a good enough solution in the short term.

>
> Of course having setuptools know about Cython would be a simple step
> forward--is the maintainer likely to take a patch?

You would have to ask both setuptools and distribute maintainers.
I am not sure what the status of distribute is with the recent focus on distutils2 (which of course has exactly the same issue, except that now the API is incompatible...). David From robertwb at math.washington.edu Tue Sep 21 17:59:18 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 21 Sep 2010 08:59:18 -0700 Subject: [Cython] [PATCH] Force GCC>=4 to export module initialization function. In-Reply-To: References: <4C94C9DD.9030002@student.matnat.uio.no> Message-ID: On Tue, Sep 21, 2010 at 7:42 AM, David Cournapeau wrote: > On Tue, Sep 21, 2010 at 3:11 PM, Robert Bradshaw > wrote: >> On Sat, Sep 18, 2010 at 11:17 PM, David Cournapeau wrote: >>>> >>>> Our problems would be highly reduced if setuptools knew about Cython, >>>> simply by doing "have_cython_or_pyrex" instead of "have_pyrex", same >>>> for distutils. If one of the core developers can push for that simply >>>> acceptance, then wonderful. >>> >>> FWIW, that's exactly what's broken in distutils. The very fact that >>> setuptools needs to know about cython *is* the issue. Because later, >>> you will have issue because of distribute instead of setuptools, and >>> then distutils2, and who knows. >> >> Hence http://wiki.cython.org/enhancements/distutils_preprocessing . > > THe problem of those solutions is that you are only moving the issue > one layer above. For example, what if you wanted to generate .pyx from > some .pyx.src, running through numpy templating system ? You need > something more declarative in nature to be truely flexible IMO. I would consider that part of a previous build step, not something that should necessarily be part of Cython itself. My goal was to provide the mechanics of resolving Cython dependencies, transitive inline directives (such as language and library dependencies), and the .pyx to .c[pp] translation step as a self-contained, simple API. Hopefully one of the dozen or so packaging systems out there can make use of this (or they can adapt/roll their own).
In the meantime, it is hopefully simple enough to use directly as well. I certainly don't have an answer to the (hard) question of building and distributing packages. > Now, this is obviously a good enough solution in the short term. > >> >> Of course having setuptools know about Cython would be a simple step >> forward--is the maintainer likely to take a patch? > > You would have to ask both setuptools and distribute maintainers. I am > not sure what the status of distribute is with the recent focus of > distutils2 (which of course has exactly the same issue, except that > now the API is incompatible...). Yeah, I haven't followed what's going on there too closely, except that there doesn't seem to be a clear direction yet. - Robert From stefan at sun.ac.za Wed Sep 22 11:57:14 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 22 Sep 2010 11:57:14 +0200 Subject: [Cython] Changes from 0.12 to 0.13 Message-ID: Hi all, The image processing scikit fails to build under 0.13 with the following cryptic error: AttributeError: 'UnspecifiedType' object has no attribute 'create_from_py_utility_code' (full traceback below) Have you seen this before?
Regards Stéfan cython -o /home/stefan/src/scikits.image/scikits/image/opencv/opencv_cv.c.new /home/stefan/src/scikits.image/scikits/image/opencv/opencv_cv.pyx Traceback (most recent call last): File "/home/stefan/bin/cython", line 8, in main(command_line = 1) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Main.py", line 767, in main result = compile(sources, options) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Main.py", line 742, in compile return compile_multiple(source, options) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Main.py", line 714, in compile_multiple result = run_pipeline(source, options) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Main.py", line 583, in run_pipeline err, enddata = context.run_pipeline(pipeline, source) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Main.py", line 224, in run_pipeline data = phase(data) File "Visitor.py", line 276, in Cython.Compiler.Visitor.CythonTransform.__call__ (/home/stefan/src/cython/Cython/Compiler/Visitor.c:4924) File "Visitor.py", line 259, in Cython.Compiler.Visitor.VisitorTransform.__call__ (/home/stefan/src/cython/Cython/Compiler/Visitor.c:4684) File "Visitor.py", line 28, in Cython.Compiler.Visitor.BasicVisitor.visit (/home/stefan/src/cython/Cython/Compiler/Visitor.c:1178) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/ParseTreeTransforms.py", line 1150, in visit_ModuleNode node.body.analyse_expressions(node.scope) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Nodes.py", line 346, in analyse_expressions stat.analyse_expressions(env) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Nodes.py", line 346, in analyse_expressions stat.analyse_expressions(env) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Nodes.py", line 3270, in analyse_expressions self.analyse_types(env) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/Nodes.py", line 3364, in
analyse_types self.rhs = self.rhs.coerce_to(self.lhs.type, env) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/ExprNodes.py", line 572, in coerce_to src = CoerceFromPyTypeNode(dst_type, src, env) File "/home/stefan/lib/python2.6/site-packages/Cython/Compiler/ExprNodes.py", line 6712, in __init__ if not result_type.create_from_py_utility_code(env): AttributeError: 'UnspecifiedType' object has no attribute 'create_from_py_utility_code' From stefan at sun.ac.za Wed Sep 22 12:54:09 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 22 Sep 2010 12:54:09 +0200 Subject: [Cython] Cython slides / examples / exercises Message-ID: Hi all, I'm presenting a Cython introduction at the Trento Summerschool and, while I have some material (mostly NumPy related), I was wondering if there is any "stock" material floating around?
Regards Stéfan From dagss at student.matnat.uio.no Wed Sep 22 13:12:49 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 22 Sep 2010 13:12:49 +0200 Subject: [Cython] Cython slides / examples / exercises In-Reply-To: References: Message-ID: <4C99E4B1.8010707@student.matnat.uio.no> Stéfan van der Walt wrote: > Hi all, > > I'm presenting a Cython introduction at the Trento Summerschool and, > while I have some material (mostly NumPy related), I was wondering if > there is any "stock" material floating around? > My last talk is here: http://github.com/dagss/euroscipy2010/ and some other talks: http://wiki.cython.org/talks If you want the sources (Beamer/tex) of any of my talks please say so (and which); I couldn't find them right away so I'm not looking unless you want to use them. Dag Sverre From stephane.drouard at st.com Wed Sep 22 15:22:35 2010 From: stephane.drouard at st.com (Stephane DROUARD) Date: Wed, 22 Sep 2010 15:22:35 +0200 Subject: [Cython] Crash with "nogil" and "except +" Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> Hello, With the following code: test.pyx -------- cdef extern from "foo.h": void foo() nogil except + def bar(): with nogil: foo() foo.c ----- void foo() { throw std::runtime_error("foo exception"); } The following crashes: python -c "import test; test.bar()" Dumping at the code of bar() generated by Cython: { PyThreadState *_save; Py_UNBLOCK_THREADS /*try:*/ { try {foo();} catch(...)
{__Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} } /*finally:*/ { int __pyx_why; __pyx_why = 0; goto __pyx_L7; __pyx_L6: __pyx_why = 4; goto __pyx_L7; __pyx_L7:; Py_BLOCK_THREADS switch (__pyx_why) { case 4: goto __pyx_L1_error; } } } The problem is that __Pyx_CppExn2PyErr() accesses Python without the GIL. There are several ways of fixing this. 1/ The simplest, but not the most optimized one: acquire/release the GIL within __Pyx_CppExn2PyErr(). 2/ acquire/release the GIL in the catch clause only when "with nogil": try {foo();} catch(...) {PyGILState_STATE state = PyGILState_Ensure(); __Pyx_CppExn2PyErr(); PyGILState_Release(state); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} 3/ restore the thread in the catch clause: try {foo();} catch(...) { Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} } /*finally:*/ { int __pyx_why; Py_BLOCK_THREADS // <<<<<<<<<<<<<<<< __pyx_why = 0; goto __pyx_L7; __pyx_L6: __pyx_why = 4; goto __pyx_L7; __pyx_L7:; switch (__pyx_why) { case 4: goto __pyx_L1_error; } } } Cheers, Stephane From dalcinl at gmail.com Wed Sep 22 16:18:36 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 22 Sep 2010 11:18:36 -0300 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> Message-ID: On 22 September 2010 10:22, Stephane DROUARD wrote: > Hello, > > With the following code: > > test.pyx > -------- > cdef extern from "foo.h": > void foo() nogil except + > > def bar(): > with nogil: > foo() > > foo.c > ----- > void foo() > { > throw std::runtime_error("foo exception"); > } > > The following crashes: > python -c "import test; test.bar()" > > > Dumping at the code of bar() generated by Cython: > > { PyThreadState *_save; > Py_UNBLOCK_THREADS > /*try:*/ { > try {foo();} catch(...) {__Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > } > /*finally:*/ { > int __pyx_why; > __pyx_why = 0; goto __pyx_L7; > __pyx_L6: __pyx_why = 4; goto __pyx_L7; > __pyx_L7:; > Py_BLOCK_THREADS > switch (__pyx_why) { > case 4: goto __pyx_L1_error; > } > } > } > > The problem is that __Pyx_CppExn2PyErr() accesses Python without the GIL. > Good catch. > There are several ways of fixing this. > > 1/ The simplest, but not the most optimized one: acquire/release the GIL within __Pyx_CppExn2PyErr(). > Why do you say (1) is not the most optimized one? (Sorry, I'm not good at understanding issues with thread-based concurrency) > 2/ acquire/release the GIL in the catch clause only when "with nogil": > try {foo();} catch(...) {PyGILState_STATE state = PyGILState_Ensure(); __Pyx_CppExn2PyErr(); PyGILState_Release(state); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > How would this be different to (1) ? > 3/ restore the thread in the catch clause: > try {foo();} catch(...) { Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > } > /*finally:*/ { > int __pyx_why; > Py_BLOCK_THREADS // <<<<<<<<<<<<<<<< > __pyx_why = 0; goto __pyx_L7; > __pyx_L6: __pyx_why = 4; goto __pyx_L7; > __pyx_L7:; > switch (__pyx_why) { > case 4: goto __pyx_L1_error; > } > } > } > And this one looks bad, I think you need: Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); Py_UNBLOCK_THREADS .. Am I right?
What about doing this (ignore the line endings, all should be generated in a single line) { PyThreadState *_save; try { Py_UNBLOCK_THREADS foo(); Py_BLOCK_THREADS } catch(...) { Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); } Am I missing something? However, note that changing to this could be not so easy... -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stephane.drouard at st.com Wed Sep 22 16:57:44 2010 From: stephane.drouard at st.com (Stephane DROUARD) Date: Wed, 22 Sep 2010 16:57:44 +0200 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> Lisandro Dalcin wrote: > > 1/ The simplest, but not the most optimized one: acquire/release the GIL within __Pyx_CppExn2PyErr(). > > Why do you say (1) is not the most optimized one? (Sorry, I'm not good > at understanding issues with thread-based concurrency) Because it will acquire/release the GIL, even for functions that are not called without "with nogil". But maybe acquiring/releasing the GIL is not costy (I haven't checked). > > 2/ acquire/release the GIL in the catch clause only when "with nogil": > > try {foo();} catch(...) {PyGILState_STATE state = PyGILState_Ensure(); > __Pyx_CppExn2PyErr(); PyGILState_Release(state); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; > __pyx_clineno = __LINE__; goto __pyx_L6;}} > > > > How would this be different to (1) ? Because it can only acquires/releases the GIL when a function is called "with nogil" and not for the others (but more complex to implement as Cython needs to know the "with nogil" context. > > 3/ restore the thread in the catch clause: > > try {foo();} catch(...) 
{ Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = > __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > > } > > /*finally:*/ { > > int __pyx_why; > > Py_BLOCK_THREADS // <<<<<<<<<<<<<<<< > > __pyx_why = 0; goto __pyx_L7; > > __pyx_L6: __pyx_why = 4; goto __pyx_L7; > > __pyx_L7:; > > switch (__pyx_why) { > > case 4: goto __pyx_L1_error; > > } > > } > > } > > > > And this one looks bad, I think you need: Py_BLOCK_THREADS > __Pyx_CppExn2PyErr(); Py_UNBLOCK_THREADS .. Am I right? > > What about doing this (ignore the line endings, all should be > generated in a single line) > > { PyThreadState *_save; > try { > Py_UNBLOCK_THREADS > foo(); > Py_BLOCK_THREADS > } catch(...) { > Py_BLOCK_THREADS > __Pyx_CppExn2PyErr(); > } > > Am I missing something? However, note that changing to this could be > not so easy... Because you may have several function calls within "with nogil": def bar(): with nogil: foo() foo() and the generated code contains as many try/catch around the calls but only one thread save/restore: { PyThreadState *_save; Py_UNBLOCK_THREADS /*try:*/ { try {foo();} catch(...) {__Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} try {foo();} catch(...) 
{__Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L6;}} } /*finally:*/ { int __pyx_why; __pyx_why = 0; goto __pyx_L7; __pyx_L6: __pyx_why = 4; goto __pyx_L7; __pyx_L7:; Py_BLOCK_THREADS switch (__pyx_why) { case 4: goto __pyx_L1_error; } } } Stephane From dalcinl at gmail.com Wed Sep 22 17:16:37 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 22 Sep 2010 12:16:37 -0300 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> Message-ID: On 22 September 2010 11:57, Stephane DROUARD wrote: > Lisandro Dalcin wrote: >> > 1/ The simplest, but not the most optimized one: acquire/release the GIL > within __Pyx_CppExn2PyErr(). >> >> Why do you say (1) is not the most optimized one? (Sorry, I'm not good >> at understanding issues with thread-based concurrency) > > Because it will acquire/release the GIL, even for functions that are not called > without "with nogil". > But maybe acquiring/releasing the GIL is not costy (I haven't checked). > I'm not sure about this, either... >> > 2/ acquire/release the GIL in the catch clause only when "with nogil": >> > try {foo();} catch(...) {PyGILState_STATE state = PyGILState_Ensure(); >> __Pyx_CppExn2PyErr(); PyGILState_Release(state); {__pyx_filename = __pyx_f[0]; > __pyx_lineno = 6; >> __pyx_clineno = __LINE__; goto __pyx_L6;}} >> > >> >> How would this be different to (1) ? > > Because it can only acquires/releases the GIL when a function is called "with > nogil" and not for the others (but more complex to implement as Cython needs to > know the "with nogil" context. > >> > 3/ restore the thread in the catch clause: >> > try {foo();} catch(...)
{ Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); > {__pyx_filename = >> __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} >> > } >> > /*finally:*/ { >> > int __pyx_why; >> > Py_BLOCK_THREADS // <<<<<<<<<<<<<<<< >> > __pyx_why = 0; goto __pyx_L7; >> > __pyx_L6: __pyx_why = 4; goto __pyx_L7; >> > __pyx_L7:; >> > switch (__pyx_why) { >> > case 4: goto __pyx_L1_error; >> > } >> > } >> > } >> > >> >> And this one looks bad, I think you need: Py_BLOCK_THREADS >> __Pyx_CppExn2PyErr(); Py_UNBLOCK_THREADS .. Am I right? >> >> What about doing this (ignore the line endings, all should be >> generated in a single line) >> >> { PyThreadState *_save; >> try { >> Py_UNBLOCK_THREADS >> foo(); >> Py_BLOCK_THREADS >> } catch(...) { >> Py_BLOCK_THREADS >> __Pyx_CppExn2PyErr(); >> } >> >> Am I missing something? However, note that changing to this could be >> not so easy... > > Because you may have several function calls within "with nogil": > def bar(): > with nogil: > foo() > foo() > > and the generated code contains as many try/catch around the calls but only one > thread save/restore: > { PyThreadState *_save; > Py_UNBLOCK_THREADS > /*try:*/ { > try {foo();} catch(...) {__Pyx_CppExn2PyErr(); {__pyx_filename = > __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > try {foo();} catch(...) {__Pyx_CppExn2PyErr(); {__pyx_filename = > __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L6;}} > } > /*finally:*/ { > int __pyx_why; > __pyx_why = 0; goto __pyx_L7; > __pyx_L6: __pyx_why = 4; goto __pyx_L7; > __pyx_L7:; > Py_BLOCK_THREADS > switch (__pyx_why) { > case 4: goto __pyx_L1_error; > } > } > } > OK, you are definitely right...
A last thing: in my previous mail I commented about fixing you solution (3) like this: Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); Py_UNBLOCK_THREADS Is that right? Take into account that the /*finally:*/ block does Py_BLOCK_THREADS... If you can confirm that this works, I think you solution (3) with my fix (in case it is fine) is be better option, right? -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stephane.drouard at st.com Wed Sep 22 17:56:26 2010 From: stephane.drouard at st.com (Stephane DROUARD) Date: Wed, 22 Sep 2010 17:56:26 +0200 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> Lisandro Dalcin wrote: > >> > 3/ restore the thread in the catch clause: > >> > try {foo();} catch(...) { Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); > > {__pyx_filename = > >> __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > >> > } > >> > /*finally:*/ { > >> > int __pyx_why; > >> > Py_BLOCK_THREADS // <<<<<<<<<<<<<<<< > >> > __pyx_why = 0; goto __pyx_L7; > >> > __pyx_L6: __pyx_why = 4; goto __pyx_L7; > >> > __pyx_L7:; > >> > switch (__pyx_why) { > >> > case 4: goto __pyx_L1_error; > >> > } > >> > } > >> > } > >> > > >> > >> And this one looks bad, I think you need: Py_BLOCK_THREADS > >> __Pyx_CppExn2PyErr(); Py_UNBLOCK_THREADS .. Am I right? > >> > A last thing: in my previous mail I commented about fixing you > solution (3) like this: > > Py_BLOCK_THREADS > __Pyx_CppExn2PyErr(); > Py_UNBLOCK_THREADS > > Is that right? Take into account that the /*finally:*/ block does > Py_BLOCK_THREADS... 
> > If you can confirm that this works, I think you solution (3) with my > fix (in case it is fine) is be better option, right? Yes it works fine. It's a bit less efficient than what I proposed, as, when an error occurs: - it restores the thread context before Pyx_CppExn2PyErr(), - saves the thread context after Pyx_CppExn2PyErr(), - restores it in the /*finally:*/ block. whereas what I proposed only restores once the context (before Pyx_CppExn2PyErr(), skipped in the /*finally:*/ block). But do we have to really care of performance when an error occurs? Stephane From craigcitro at gmail.com Wed Sep 22 17:58:50 2010 From: craigcitro at gmail.com (Craig Citro) Date: Wed, 22 Sep 2010 08:58:50 -0700 Subject: [Cython] Changes from 0.12 to 0.13 In-Reply-To: References: Message-ID: Hi Stefan, > AttributeError: 'UnspecifiedType' object has no attribute > 'create_from_py_utility_code' > > (full traceback below) > > Have you seen this before? > Unfortunately, yes. It means that the type inferencer left something in a halfway state (i.e. it didn't finish inferring types everywhere). I'd love to see the code (I dealt with dozens of these during the 0.13 release cycle), but as a temporary workaround you can either (1) add explicit type info for the variable it's complaining about, which may involve a little digging, or (2) turn off type inference completely. 
-cc From robertwb at math.washington.edu Wed Sep 22 17:59:34 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 22 Sep 2010 08:59:34 -0700 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> Message-ID: On Wed, Sep 22, 2010 at 8:56 AM, Stephane DROUARD wrote: > Lisandro Dalcin wrote: > >> >> > 3/ restore the thread in the catch clause: >> >> > try {foo();} catch(...) { Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); >> > {__pyx_filename = >> >> __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} >> >> > } >> >> > /*finally:*/ { >> >> > int __pyx_why; >> >> > Py_BLOCK_THREADS // <<<<<<<<<<<<<<<< >> >> > __pyx_why = 0; goto __pyx_L7; >> >> > __pyx_L6: __pyx_why = 4; goto __pyx_L7; >> >> > __pyx_L7:; >> >> > switch (__pyx_why) { >> >> > case 4: goto __pyx_L1_error; >> >> > } >> >> > } >> >> > } >> >> > >> >> >> >> And this one looks bad, I think you need: Py_BLOCK_THREADS >> >> __Pyx_CppExn2PyErr(); Py_UNBLOCK_THREADS .. Am I right? >> >> >> A last thing: in my previous mail I commented about fixing you >> solution (3) like this: >> >> Py_BLOCK_THREADS >> __Pyx_CppExn2PyErr(); >> Py_UNBLOCK_THREADS >> >> Is that right? Take into account that the /*finally:*/ block does >> Py_BLOCK_THREADS... >> >> If you can confirm that this works, I think you solution (3) with my >> fix (in case it is fine) is be better option, right? > > Yes it works fine. > It's a bit less efficient than what I proposed, as, when an error occurs: > - it restores the thread context before Pyx_CppExn2PyErr(), > - saves the thread context after Pyx_CppExn2PyErr(), > - restores it in the /*finally:*/ block.
> whereas what I proposed only restores once the context (before Pyx_CppExn2PyErr(), skipped in the /*finally:*/ block). > > But do we have to really care of performance when an error occurs? We certainly care more about the non-error performance (and correctness of course :). - Robert From dalcinl at gmail.com Wed Sep 22 18:12:41 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 22 Sep 2010 13:12:41 -0300 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> Message-ID: On 22 September 2010 12:56, Stephane DROUARD wrote: >> >> If you can confirm that this works, I think you solution (3) with my >> fix (in case it is fine) is be better option, right? > > Yes it works fine. Nice. > It's a bit less efficient than what I proposed, as, when an error occurs: > - it restores the thread context before Pyx_CppExn2PyErr(), > - saves the thread context after Pyx_CppExn2PyErr(), > - restores it in the /*finally:*/ block. > whereas what I proposed only restores once the context (before Pyx_CppExn2PyErr(), skipped in the /*finally:*/ block). > Indeed (but only when an error occurs)... However, I'm a bit confused: Is my added restore in the finally block actually necessary? Is your original implementation correct if you do not restore in the finally block? > > But do we have to really care of performance when an error occurs? > I do not advocate using exceptions for control flow (though I do it in a few cases), so I do not care about performance in this case.
-- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stephane.drouard at st.com Wed Sep 22 18:38:28 2010 From: stephane.drouard at st.com (Stephane DROUARD) Date: Wed, 22 Sep 2010 18:38:28 +0200 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com> Lisandro Dalcin wrote: > > It's a bit less efficient than what I proposed, as, when an error occurs: > > - it restores the thread context before Pyx_CppExn2PyErr(), > > - saves the thread context after Pyx_CppExn2PyErr(), > > - restores it in the /*finally:*/ block. > > whereas what I proposed only restores once the context (before Pyx_CppExn2PyErr(), skipped in the > > /*finally:*/ block). > > Indeed (but only when an error occurs)... However, I'm a bit confused: > Is my added restore in the finally block actually necessary? Yes, to re-acquire the GIL. > Is you > original implementation correct if you do not restore in the finally > block? You need to restore in case you reach this block through the normal flow (no exceptions). Note that you can restore at the end of the /*try:*/ block, instead of in the /*finally:*/ block: { PyThreadState *_save; Py_UNBLOCK_THREADS /*try:*/ { try {foo();} catch(...) {Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} Py_BLOCK_THREADS } /*finally:*/ { int __pyx_why; __pyx_why = 0; goto __pyx_L7; __pyx_L6: __pyx_why = 4; goto __pyx_L7; __pyx_L7:; switch (__pyx_why) { case 4: goto __pyx_L1_error; } } } (it's exactly equivalent to my original proposal, but maybe cleaner that way...) 
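The two routes through the generated code above can be modeled in plain C++, without the Python API. This is only a minimal sketch under stated assumptions: `FakeGil`, `unblock_threads`, `block_threads`, `set_python_error`, and `run_nogil_section` are hypothetical stand-ins for the GIL state, `Py_UNBLOCK_THREADS`, `Py_BLOCK_THREADS`, `__Pyx_CppExn2PyErr()`, and the generated body of `bar()`; an early `return` plays the role of `goto __pyx_L6`. It is not what Cython emits, it only shows that the restore happens exactly once on either route.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the interpreter lock; not the CPython API.
struct FakeGil {
    bool held = true;        // the caller starts out holding the lock
    int restores = 0;        // how many times the lock was re-acquired
    bool error_set = false;  // whether a "Python error" was recorded
};

// Models Py_UNBLOCK_THREADS: give the lock up.
void unblock_threads(FakeGil &gil) { gil.held = false; }

// Models Py_BLOCK_THREADS: take the lock back.
void block_threads(FakeGil &gil) { gil.held = true; ++gil.restores; }

// Models __Pyx_CppExn2PyErr(): only valid while the lock is held.
void set_python_error(FakeGil &gil) {
    assert(gil.held && "error conversion needs the lock");
    gil.error_set = true;
}

// Stand-in for the wrapped C++ function foo().
void foo(bool fail) {
    if (fail) throw std::runtime_error("foo exception");
}

// Models the body generated for "with nogil: foo(); foo()".
// Returns true on the normal route, false on the error route.
bool run_nogil_section(FakeGil &gil, bool fail_first, bool fail_second) {
    unblock_threads(gil);
    try { foo(fail_first); }
    catch (...) { block_threads(gil); set_python_error(gil); return false; }
    try { foo(fail_second); }
    catch (...) { block_threads(gil); set_python_error(gil); return false; }
    block_threads(gil);  // normal route: restore at the end of the try block
    return true;
}
```

Calling `run_nogil_section` with a fresh `FakeGil` and any combination of failure flags leaves `held` true and `restores` equal to 1, which is the property discussed in the thread.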
> > > > But do we have to really care of performance when an error occurs? > > > > I do not advocate using exceptions for control flow (though I do it in > a few cases), so I do not care about performance in this case. Fine. Stephane From dalcinl at gmail.com Wed Sep 22 18:56:45 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 22 Sep 2010 13:56:45 -0300 Subject: [Cython] Crash with "nogil" and "except +" In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com> References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com> Message-ID: On 22 September 2010 13:38, Stephane DROUARD wrote: > > Note that you can restore at the end of the /*try:*/ block, instead of in the /*finally:*/ block: > > { PyThreadState *_save; > Py_UNBLOCK_THREADS > /*try:*/ { > try {foo();} catch(...) {Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}} > Py_BLOCK_THREADS > } > /*finally:*/ { > int __pyx_why; > __pyx_why = 0; goto __pyx_L7; > __pyx_L6: __pyx_why = 4; goto __pyx_L7; > __pyx_L7:; > switch (__pyx_why) { > case 4: goto __pyx_L1_error; > } > } > } > > (it's exactly equivalent to my original proposal, but maybe cleaner that way...) > Sorry, I'm still confused. In case of errors, your code does Py_BLOCK_THREADS twice! Is that fine? Isn't a matching Py_UNBLOCK_THREADS required after __Pyx_CppExn2PyErr()? In short, is the pure C code below right?
Py_BLOCK_THREADS foo() Py_UNBLOCK_THREADS Py_UNBLOCK_THREADS -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stefan at sun.ac.za Wed Sep 22 19:17:08 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 22 Sep 2010 19:17:08 +0200 Subject: [Cython] Changes from 0.12 to 0.13 In-Reply-To: References: Message-ID: Hi Craig On Wed, Sep 22, 2010 at 5:58 PM, Craig Citro wrote: >> Have you seen this before? >> > Unfortunately, yes. It means that the type inferencer left something > in a halfway state (i.e. it didn't finish inferring types everywhere). > I'd love to see the code (I dealt with dozens of these during the 0.13 > release cycle), but as a temporary workaround you can either (1) add > explicit type info for the variable it's complaining about, which may > involve a little digging, or (2) turn off type inference completely. Thanks for the quick response. You are more than welcome to have a look at the code here: http://github.com/stefanv/scikits.image/blob/master/scikits/image/opencv/opencv_cv.pyx The file is fairly short (mostly docstring). Regards Stéfan From craigcitro at gmail.com Wed Sep 22 19:42:02 2010 From: craigcitro at gmail.com (Craig Citro) Date: Wed, 22 Sep 2010 10:42:02 -0700 Subject: [Cython] Changes from 0.12 to 0.13 In-Reply-To: References: Message-ID: Hi Stefan, So here's an ugly answer: put the `cimport numpy as np` before the `import numpy as np`. I've got "clean up the semantics of import vs. cimport" on my todo list, but it's a big ugly ball of sadness, so it's still sitting on the todo list. ;) Probably the first fix is to write a simple pass that moves the cimports up ... I didn't actually compile the resulting C code (I don't have numpy on this machine), but let me know if that just generates bad C code.

-cc

From stephane.drouard at st.com Wed Sep 22 20:01:30 2010
From: stephane.drouard at st.com (Stephane DROUARD)
Date: Wed, 22 Sep 2010 20:01:30 +0200
Subject: [Cython] Crash with "nogil" and "except +"
In-Reply-To:
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com>,
Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>

Lisandro Dalcin wrote:
> > Note that you can restore at the end of the /*try:*/ block, instead of in the /*finally:*/ block:
> >
> >  { PyThreadState *_save;
> >    Py_UNBLOCK_THREADS
> >    /*try:*/ {
> >      try {foo();} catch(...) {Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}}
> >      Py_BLOCK_THREADS
> >    }
> >    /*finally:*/ {
> >      int __pyx_why;
> >      __pyx_why = 0; goto __pyx_L7;
> >      __pyx_L6: __pyx_why = 4; goto __pyx_L7;
> >      __pyx_L7:;
> >      switch (__pyx_why) {
> >        case 4: goto __pyx_L1_error;
> >      }
> >    }
> >  }
> >
> > (it's exactly equivalent to my original proposal, but maybe cleaner that way...)
>
> Sorry, I'm still confused. In case of errors, your code does
> Py_BLOCK_THREADS twice! Is that fine? Isn't a matching
> Py_UNBLOCK_THREADS required after __Pyx_CppExn2PyErr()? In short, is
> the pure C code below right?
>
> Py_BLOCK_THREADS
> foo()
> Py_UNBLOCK_THREADS
> Py_UNBLOCK_THREADS

Py_UNBLOCK_THREADS and Py_BLOCK_THREADS are only called once each, whatever the "route".

In case of no exceptions, the route is:

  Py_UNBLOCK_THREADS
  /*try:*/ {
    foo();
    Py_BLOCK_THREADS  // The one at the end of the /*try:*/ block, not the one in the catch block.
  }
  /*finally:*/ {
    ...
  }

In case of an exception, the route is:

  Py_UNBLOCK_THREADS
  /*try:*/ {
    try {foo();} catch(...) {Py_BLOCK_THREADS ... goto __pyx_L6;}}
  }
  /*finally:*/ {
    __pyx_L6: ...
  }

Py_BLOCK_THREADS is called from the catch block.
Due to "goto __pyx_L6", the Py_BLOCK_THREADS at the end of the /*try:*/ block is skipped.

Stephane

From dalcinl at gmail.com Wed Sep 22 20:58:06 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 22 Sep 2010 15:58:06 -0300
Subject: [Cython] Crash with "nogil" and "except +"
In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>
Message-ID:

On 22 September 2010 15:01, Stephane DROUARD wrote:
> Lisandro Dalcin wrote:
>
>> > Note that you can restore at the end of the /*try:*/ block, instead of in the /*finally:*/ block:
>> >
>> >  { PyThreadState *_save;
>> >    Py_UNBLOCK_THREADS
>> >    /*try:*/ {
>> >      try {foo();} catch(...) {Py_BLOCK_THREADS __Pyx_CppExn2PyErr(); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}}
>> >      Py_BLOCK_THREADS
>> >    }
>> >    /*finally:*/ {
>> >      int __pyx_why;
>> >      __pyx_why = 0; goto __pyx_L7;
>> >      __pyx_L6: __pyx_why = 4; goto __pyx_L7;
>> >      __pyx_L7:;
>> >      switch (__pyx_why) {
>> >        case 4: goto __pyx_L1_error;
>> >      }
>> >    }
>> >  }
>> >
>> > (it's exactly equivalent to my original proposal, but maybe cleaner that way...)
>>
>> Sorry, I'm still confused. In case of errors, your code does
>> Py_BLOCK_THREADS twice! Is that fine? Isn't a matching
>> Py_UNBLOCK_THREADS required after __Pyx_CppExn2PyErr()? In short, is
>> the pure C code below right?
>>
>> Py_BLOCK_THREADS
>> foo()
>> Py_UNBLOCK_THREADS
>> Py_UNBLOCK_THREADS
>
> Py_UNBLOCK_THREADS and Py_BLOCK_THREADS are only called once each, whatever the "route".
>
> In case of no exceptions, the route is:
>
>  Py_UNBLOCK_THREADS
>  /*try:*/ {
>    foo();
>    Py_BLOCK_THREADS  // The one at the end of the /*try:*/ block, not the one in the catch block.
>  }
>  /*finally:*/ {
>    ...
>  }
>
> In case of an exception, the route is:
>
>  Py_UNBLOCK_THREADS
>  /*try:*/ {
>    try {foo();} catch(...) {Py_BLOCK_THREADS ... goto __pyx_L6;}}
>  }
>  /*finally:*/ {
>    __pyx_L6: ...
>  }
>
> Py_BLOCK_THREADS is called from the catch block.
> Due to "goto __pyx_L6", the Py_BLOCK_THREADS at the end of the /*try:*/ block is skipped.
>

Ah! now I see.. so you are removing the Py_BLOCK_THREADS from finally
block and moving it up to the try block. Looks good...

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From stefan at sun.ac.za Wed Sep 22 22:24:14 2010
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Wed, 22 Sep 2010 22:24:14 +0200
Subject: [Cython] Changes from 0.12 to 0.13
In-Reply-To:
References:
Message-ID:

Hi Craig

On Wed, Sep 22, 2010 at 7:42 PM, Craig Citro wrote:
> So here's an ugly answer: put the `cimport numpy as np` before the
> `import numpy as np`. I've got "clean up the semantics of import vs.
> cimport" on my todo list, but it's a big ugly ball of sadness, so it's
> still sitting on the todo list. ;) Probably the first fix is to write
> a simple pass that moves the cimports up ...

Perfect, that did the job! Thanks a lot.

Regards
Stéfan

From stephane.drouard at st.com Thu Sep 23 11:42:54 2010
From: stephane.drouard at st.com (Stephane DROUARD)
Date: Thu, 23 Sep 2010 11:42:54 +0200
Subject: [Cython] Crash with "nogil" and "except +"
In-Reply-To:
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>
Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EDCB@SAFEX1MAIL2.st.com>

Lisandro Dalcin wrote:
> Ah! now I see.. so you are removing the Py_BLOCK_THREADS from finally
> block and moving it up to the try block. Looks good...

OK.


I checked other except clauses and propose a solution to fix them.

1/ void foo() nogil except +MemoryError

Generated:
      try {foo();} catch(...) { try { throw; } catch(const std::exception& exn) { PyErr_SetString(__pyx_builtin_MemoryError, exn.what()); } catch(...) { PyErr_SetNone(__pyx_builtin_MemoryError); }; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}}

Proposed fix:
      try {foo();} catch(...) { Py_BLOCK_THREADS try { throw; } catch(const std::exception& exn) { PyErr_SetString(__pyx_builtin_MemoryError, exn.what()); } catch(...) { PyErr_SetNone(__pyx_builtin_MemoryError); }; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}}


2/ void foo() nogil except *

Generated:
      foo(); if (unlikely(PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}

Proposed fix (need to restore the context before checking and save again in case there is no error):
      foo(); Py_BLOCK_THREADS if (unlikely(PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;} Py_UNBLOCK_THREADS


3/ int foo() nogil except -1

Generated:
      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}

Proposed fix:
      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1)) { Py_BLOCK_THREADS __pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}


4/ int foo() nogil except? -1

Generated:
      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1 && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}

Proposed fix:
      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1)) { Py_BLOCK_THREADS if (unlikely(PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;} Py_UNBLOCK_THREADS }


Cheers,
Stephane

From dalcinl at gmail.com Thu Sep 23 15:49:30 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 23 Sep 2010 10:49:30 -0300
Subject: [Cython] Crash with "nogil" and "except +"
In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EDCB@SAFEX1MAIL2.st.com>
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EDCB@SAFEX1MAIL2.st.com>
Message-ID:

On 23 September 2010 06:42, Stephane DROUARD wrote:
> Lisandro Dalcin wrote:
>
>> Ah! now I see.. so you are removing the Py_BLOCK_THREADS from finally
>> block and moving it up to the try block.  Looks good...
>
> OK.
>
>
> I checked other except clauses and propose a solution to fix them.
>
> 1/ void foo() nogil except +MemoryError
>
> Generated:
>      try {foo();} catch(...) { try { throw; } catch(const std::exception& exn) { PyErr_SetString(__pyx_builtin_MemoryError, exn.what()); } catch(...) { PyErr_SetNone(__pyx_builtin_MemoryError); }; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}}
>
> Proposed fix:
>      try {foo();} catch(...) { Py_BLOCK_THREADS try { throw; } catch(const std::exception& exn) { PyErr_SetString(__pyx_builtin_MemoryError, exn.what()); } catch(...) { PyErr_SetNone(__pyx_builtin_MemoryError); }; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}}
>
>
> 2/ void foo() nogil except *
>
> Generated:
>      foo(); if (unlikely(PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}
>
> Proposed fix (need to restore the context before checking and save again in case there is no error):
>      foo(); Py_BLOCK_THREADS if (unlikely(PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;} Py_UNBLOCK_THREADS
>
>
> 3/ int foo() nogil except -1
>
> Generated:
>      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}
>
> Proposed fix:
>      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1)) { Py_BLOCK_THREADS __pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}
>
>
> 4/ int foo() nogil except? -1
>
> Generated:
>      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1 && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;}
>
> Proposed fix:
>      __pyx_t_1 = foo(); if (unlikely(__pyx_t_1 == -1)) { Py_BLOCK_THREADS if (unlikely(PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L6;} Py_UNBLOCK_THREADS }
>
>
> Cheers,

Any chance you could help us by writing a comprehensive testcase to
speedup the resolution of this issue?

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From stephane.drouard at st.com Fri Sep 24 12:37:36 2010
From: stephane.drouard at st.com (Stephane DROUARD)
Date: Fri, 24 Sep 2010 12:37:36 +0200
Subject: [Cython] Crash with "nogil" and "except +"
In-Reply-To:
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EDCB@SAFEX1MAIL2.st.com>
Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05F342@SAFEX1MAIL2.st.com>

Lisandro Dalcin wrote:
> Any chance you could help us by writing a comprehensive testcase to
> speedup the resolution of this issue?

It depends what you really expect.

I assume you already have tests which exercise the 'except' clause.
So for me we only have to duplicate those tests declaring/adapting the C functions as 'nogil' and calling them "with nogil".
Do you agree?

If so, I may consider doing the porting of those tests.

How can we proceed?
Ideally, if you could send me the tests of the except clause (source, build, check, ...)
I can modify them and send them back to you to integrate into your database.

Stephane

From dalcinl at gmail.com Fri Sep 24 15:45:07 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 24 Sep 2010 10:45:07 -0300
Subject: [Cython] Crash with "nogil" and "except +"
In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05F342@SAFEX1MAIL2.st.com>
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EAA0@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EB64@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBBB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EBD8@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDDFE1C30@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05EDCB@SAFEX1MAIL2.st.com>
 <1BBEE2BA50AFBB41BDCE56494A11093AADDE05F342@SAFEX1MAIL2.st.com>
Message-ID:

On 24 September 2010 07:37, Stephane DROUARD wrote:
> Lisandro Dalcin wrote:
>
>> Any chance you could help us by writing a comprehensive testcase to
>> speedup the resolution of this issue?
>
> It depends what you really expect.
>
> I assume you already have tests which exercise the 'except' clause.
> So for me we only have to duplicate those tests declaring/adapting the C functions as 'nogil' and calling them "with nogil".
> Do you agree?
>
> If so, I may consider doing the porting of those tests.
>

Yes, I'm talking about you writing a test_except_nogil.pyx with Cython
code, probably using "cdef extern from" to include a header where you
can write foo(int i), bar(int i), these functions throwing C++
exceptions if i!=0, otherwise they do nothing.

> How can we proceed?
> Ideally, if you could send me the tests of the except clause (source, build, check, ...) I can modify them and send them back to you to integrate into your database.
>

Take a look at tests/run/cpp_exceptions.pyx, especially at
test_int_raw(). Your test is going to be a bit different, because you
are going to use nogil functions raising C++ exceptions.
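
As a rough illustration of what such a test has to pin down, here is a plain-Python model of the control flow discussed earlier in this thread. It is only a sketch: it is not Cython, not the actual test file, and it does not use the CPython API. GilModel, call_with_nogil and this foo() are hypothetical names made up for the illustration; block() and unblock() merely stand in for Py_BLOCK_THREADS and Py_UNBLOCK_THREADS.

```python
class GilModel:
    """Toy stand-in for the thread state; not the CPython API."""

    def __init__(self):
        self.held = True          # we enter the "with nogil" block holding the GIL
        self.block_calls = 0
        self.unblock_calls = 0

    def unblock(self):
        # models Py_UNBLOCK_THREADS: release the GIL
        assert self.held, "released the GIL twice"
        self.held = False
        self.unblock_calls += 1

    def block(self):
        # models Py_BLOCK_THREADS: re-acquire the GIL
        assert not self.held, "acquired the GIL twice"
        self.held = True
        self.block_calls += 1


def foo(i):
    # hypothetical test helper: raises if i != 0, otherwise does nothing
    if i != 0:
        raise ValueError("foo failed with i=%d" % i)


def call_with_nogil(func, arg):
    """Follow the proposed layout: re-acquire the GIL in the catch block or
    at the end of the try block, never in the finally block."""
    gil = GilModel()
    error = None
    gil.unblock()
    try:
        func(arg)
    except Exception as exc:
        gil.block()       # catch block: take the GIL before setting the error
        error = exc       # models __Pyx_CppExn2PyErr(); goto __pyx_L6
    else:
        gil.block()       # end of the try block, skipped on the error route
    return gil, error
```

On both routes the model ends up holding the GIL again, with exactly one release and one re-acquire, which is the property the real nogil test would need to exercise for each flavour of except clause.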

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From dalcinl at gmail.com Fri Sep 24 17:10:33 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 24 Sep 2010 12:10:33 -0300
Subject: [Cython] [PATCH] Cython: C++ exceptions and nogil
Message-ID:

Stephane, let's change our deal... Here you have a patch with new
testcase included, you can use "hg import --no-commit
cpp_exc_nogil.diff" to apply the patch. Try to run the test on your
side (python runtests.py cpp_exceptions_nogil), next test with your
own code. Next try hard to break my fix :-). If all is fine, I'll push
it.

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cpp_exc_nogil.diff
Type: text/x-patch
Size: 4538 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20100924/17b463be/attachment.bin

From kayhayen at gmx.de Sat Sep 25 09:09:24 2010
From: kayhayen at gmx.de (Kay Hayen)
Date: Sat, 25 Sep 2010 09:09:24 +0200
Subject: [Cython] Optimal/recommended compiler option sets for Cython
Message-ID: <4C9DA024.8030003@gmx.de>

Hello,

when I compile pure Python code with Cython 0.12 and/or Cython 0.13,
which are the options recommended for optimal performance. I am doing
some comparative benchmarking to CPython based on Valgrind and I wonder
if I am doing it correctly.

Currently I only use:

cython --embed

and from cython --help (0.12)

there wasn't much else I could try. so is that all?

Any difference expected between C and C++ code generation?

And then what to use for gcc (the C/C++ compiler of choice or is any
other working for you better on Linux), simply -O3 and be done with it?

Or is there any experience indicating that some special options will
give measurable improvements?

Obviously I would like to first use the correct options and only then
publish results. So please let me know. :-)

Yours,
Kay

From robertwb at math.washington.edu Sat Sep 25 09:34:13 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 25 Sep 2010 00:34:13 -0700
Subject: [Cython] Optimal/recommended compiler option sets for Cython
In-Reply-To: <4C9DA024.8030003@gmx.de>
References: <4C9DA024.8030003@gmx.de>
Message-ID:

On Sat, Sep 25, 2010 at 12:09 AM, Kay Hayen wrote:
>
> Hello,
>
> when I compile pure Python code with Cython 0.12 and/or Cython 0.13,
> which are the options recommended for optimal performance. I am doing
> some comparative benchmarking to CPython based on Valgrind and I wonder
> if I am doing it correctly.
>
> Currently I only use:
>
> cython --embed
>
> and from cython --help (0.12)
>
> there wasn't much else I could try. so is that all?

In general, there aren't dozens of knobs to twiddle like there are
with most compilers.

There are a couple of directives at
http://wiki.cython.org/enhancements/compilerdirectives that might make
a difference. For example, setting boundscheck and wraparound to False
will make things faster (removing the checks for negative and
out-of-bounds access of course), and infer_types=True (again, this
creates semantic changes, as integer literals will be inferred to be C
ints and thus might overflow). And I'd imagine that 0.13 will be an
improvement over 0.12, at least for some things.

> Any difference expected between C and C++ code generation?

Not that I am aware of. Cython doesn't use any specific C++ features
(unless you're explicitly linking against C++ libraries) so it should
all be the same (but it'd be interesting to test).

> And then what to use for gcc (the C/C++ compiler of choice or is any
> other working for you better on Linux), simply -O3 and be done with it?

That's all I do, but there's a lot of options to try, so it'd be
interesting to see if any make a noticeable impact. (Same with
comparing various compilers.)

> Or is there any experience indicating that some special options will
> give measurable improvements?
>
> Obviously I would like to first use the correct options and only then
> publish results. So please let me know. :-)

It should be noted that most of the effort has gone into making
*annotated* code fast, as most users care more about a 100x speedup
with a little bit of work than a 2x speedup for free.

- Robert

From kayhayen at gmx.de Sat Sep 25 10:15:00 2010
From: kayhayen at gmx.de (Kay Hayen)
Date: Sat, 25 Sep 2010 10:15:00 +0200
Subject: [Cython] Optimal/recommended compiler option sets for Cython
In-Reply-To:
References: <4C9DA024.8030003@gmx.de>
Message-ID: <4C9DAF84.2050806@gmx.de>

Hello Robert,

>> there wasn't much else I could try. so is that all?
>
> In general, there aren't dozens of knobs to twiddle like there are
> with most compilers.
>
> There are a couple of directives at
> http://wiki.cython.org/enhancements/compilerdirectives that might make
> a difference.

Ah my bad, I searched for "Options". Might be a good idea to make these
a bit more prominent.

> For example, setting boundscheck and wraparound to False
> will make things faster (removing the checks for negative and
> out-of-bounds access of course), and infer_types=True (again, this
> creates semantic changes, as integer literals will be inferred to be C
> ints and thus might overflow). And I'd imagine that 0.13 will be an
> improvement over 0.12, at least for some things.

Thanks for the pointers, I believe I will use these sets then:

compatibility:
infer_types = True   # speed
nonecheck = True     # CPython compatibility
wraparound = True    # CPython compatibility
boundscheck = True   # CPython compatibility
cdivision = True     # CPython compatibility

speed:
infer_types = True
nonecheck = False
wraparound = False
boundscheck = False
cdivision = False

>> And then what to use for gcc (the C/C++ compiler of choice or is any
>> other working for you better on Linux), simply -O3 and be done with it?
>
> That's all I do, but there's a lot of options to try, so it'd be
> interesting to see if any make a noticeable impact. (Same with
> comparing various compilers.)

I will play around for sure.

>> Or is there any experience indicating that some special options will
>> give measurable improvements?
>>
>> Obviously I would like to first use the correct options and only then
>> publish results. So please let me know. :-)
>
> It should be noted that most of the effort has gone into making
> *annotated* code fast, as most users care more about a 100x speedup
> with a little bit of work than a 2x speedup for free.

I will always make that clear when posting results, Robert. I am a
minority. Using the above option set "compatibility" indicates that already.

I am not against annotations at all, I just believe they should be
in Python and ideally do a check in Python too. And initially I want to
stretch the borders of what's possible with automatic type inference or
guessing.

Thanks for this, I will come back with results then. I will include 2
sets of cython directives, aggressive and compatibility and see how far
I get with that.

Yours,
Kay

From dsdale24 at gmail.com Sat Sep 25 17:10:33 2010
From: dsdale24 at gmail.com (Darren Dale)
Date: Sat, 25 Sep 2010 11:10:33 -0400
Subject: [Cython] question about cimports, python3
Message-ID:

Hello,

I am attempting to contribute to the h5py project by porting the code
to python3.
The code is available in the py3k branch at github
(http://github.com/darrendale/h5py/tree/py3k). So far, I am able to
build and install h5py using python3, but when I try to import it, I
get the following error:

Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py", line 24, in <module>
    from . import h5
  File "h5e.pxd", line 20, in init h5py.h5py.h5 (h5py/h5.c:5499)
ImportError: No module named h5py.h5e

I don't understand why the error message refers to "h5py.h5py.h5",
should I be concerned about this? I checked the contents of my h5py
install directory, and h5e.{py, pyc, so, pyx} is certainly there. Here
is the top of my h5py/h5.pyx file:

---
include "config.pxi"

from h5e cimport register_thread

import atexit
import threading
---

Could anyone offer a suggestion, or prod me for additional information
that I should have provided?

Thanks,
Darren

From dalcinl at gmail.com Sat Sep 25 20:03:12 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Sat, 25 Sep 2010 15:03:12 -0300
Subject: [Cython] question about cimports, python3
In-Reply-To:
References:
Message-ID:

On 25 September 2010 12:10, Darren Dale wrote:
> Hello,
>
> I am attempting to contribute to the h5py project by porting the code
> to python3. The code is available in the py3k branch at github
> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to
> build and install h5py using python3, but when I try to import it, I
> get the following error:
>
> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import h5py
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py",
> line 24, in <module>

Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file?

If it is a zip file, then: 1) remove your install (do not forget to
remove the entry for easy-install.pth). 2) next edit your setup.py and
pass zip_safe=False to the setup() function call.

Finally, consider stop using setuptools. And stop using easy_install, use pip.

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From dsdale24 at gmail.com Sat Sep 25 21:04:39 2010
From: dsdale24 at gmail.com (Darren Dale)
Date: Sat, 25 Sep 2010 15:04:39 -0400
Subject: [Cython] question about cimports, python3
In-Reply-To:
References:
Message-ID:

On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote:
> On 25 September 2010 12:10, Darren Dale wrote:
>> Hello,
>>
>> I am attempting to contribute to the h5py project by porting the code
>> to python3. The code is available in the py3k branch at github
>> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to
>> build and install h5py using python3, but when I try to import it, I
>> get the following error:
>>
>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33)
>> [GCC 4.4.5] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import h5py
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py",
>> line 24, in <module>
>
> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file?

It is a directory, not a zipfile.

[...]
> Finally, consider stop using setuptools. And stop using easy_install, use pip.

The trouble I am having is not related to
distribute/setuptools/easy_install. I get the same import errors if I
clean my environment of distribute/setuptools/easy_install/h5py, and
then build and install h5py from scratch using distutils.

Darren

From dalcinl at gmail.com Sat Sep 25 23:25:09 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Sat, 25 Sep 2010 18:25:09 -0300
Subject: [Cython] question about cimports, python3
In-Reply-To:
References:
Message-ID:

On 25 September 2010 16:04, Darren Dale wrote:
> On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote:
>> On 25 September 2010 12:10, Darren Dale wrote:
>>> Hello,
>>>
>>> I am attempting to contribute to the h5py project by porting the code
>>> to python3. The code is available in the py3k branch at github
>>> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to
>>> build and install h5py using python3, but when I try to import it, I
>>> get the following error:
>>>
>>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33)
>>> [GCC 4.4.5] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> import h5py
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>>  File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py",
>>> line 24, in <module>
>>
>> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file?
>
> It is a directory, not a zipfile.
>

OK.

I've cloned your repo and tested with cython-devel, and I cannot
reproduce your error with Python 2.6.

With Python 3.1 (debug build), I get this:

$ python3.1
Python 3.1.1 (r311:74480, Jun 25 2010, 11:49:56)
[GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "h5py/__init__.py", line 24, in <module>
    from . 
import h5
  File "h5e.pyx", line 1, in init h5py.h5e (h5py/h5e.c:3538)
  File "h5e.pxd", line 20, in init h5py.h5 (h5py/h5.c:5430)
RuntimeError: maximum recursion depth exceeded while calling a Python object

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From robertwb at math.washington.edu Sun Sep 26 05:17:55 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 25 Sep 2010 20:17:55 -0700
Subject: [Cython] Optimal/recommended compiler option sets for Cython
In-Reply-To: <4C9DAF84.2050806@gmx.de>
References: <4C9DA024.8030003@gmx.de> <4C9DAF84.2050806@gmx.de>
Message-ID:

On Sat, Sep 25, 2010 at 1:15 AM, Kay Hayen wrote:
>
> Hello Robert,
>
>>> there wasn't much else I could try. so is that all?
>>
>> In general, there aren't dozens of knobs to twiddle like there are
>> with most compilers.
>>
>> There are a couple of directives at
>> http://wiki.cython.org/enhancements/compilerdirectives that might make
>> a difference.
>
> Ah my bad, I searched for "Options". Might be a good idea to make these
> a bit more prominent.

Seconded.

>> For example, setting boundscheck and wraparound to False
>> will make things faster (removing the checks for negative and
>> out-of-bounds access of course), and infer_types=True (again, this
>> creates semantic changes, as integer literals will be inferred to be C
>> ints and thus might overflow). And I'd imagine that 0.13 will be an
>> improvement over 0.12, at least for some things.
>
> Thanks for the pointers, I believe I will use these sets then:
>
> compatibility:
> infer_types = True   # speed
> nonecheck = True     # CPython compatibility
> wraparound = True    # CPython compatibility
> boundscheck = True   # CPython compatibility
> cdivision = True     # CPython compatibility
>
> speed:
> infer_types = True
> nonecheck = False
> wraparound = False
> boundscheck = False
> cdivision = False

cdivision=False is the CPython compatibility mode (as cdivision=True
does C division rather than Python division). Other than that, it
looks good. I wouldn't expect too much of a difference as most of
these directives control overheads that are negligible when one is
manipulating a lot of Python objects (but important when everything is
pure C.)

>  >> And then what to use for gcc (the C/C++ compiler of choice or is any
>>> other working for you better on Linux), simply -O3 and be done with it?
>>
>> That's all I do, but there's a lot of options to try, so it'd be
>> interesting to see if any make a noticeable impact. (Same with
>> comparing various compilers.)
>
> I will play around for sure.
>
>>> Or is there any experience indicating that some special options will
>>> give measurable improvements?
>>>
>>> Obviously I would like to first use the correct options and only then
>>> publish results. So please let me know. :-)
>>
>> It should be noted that most of the effort has gone into making
>> *annotated* code fast, as most users care more about a 100x speedup
>> with a little bit of work than a 2x speedup for free.
>
> I will always make that clear when posting results, Robert. I am a
> minority. Using the above option set "compatibility" indicates that already.
>
> I am not against annotations at all, I just believe they should be
> in Python and ideally do a check in Python too. And initially I want to
> stretch the borders of what's possible with automatic type inference or
> guessing.
>
> Thanks for this, I will come back with results then. I will include 2
> sets of cython directives, aggressive and compatibility and see how far
> I get with that.

Looking forward to seeing the results.
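
To make the cdivision remark concrete, here is a small pure-Python sketch of the two division conventions. This is an illustration only, not Cython output: c_div and c_mod are hypothetical helpers that emulate C99-style truncation toward zero, which is assumed to be what "C division" means here, while python_div is ordinary Python floor division.

```python
def python_div(a, b):
    # Python semantics (cdivision=False): floor division
    return a // b


def c_div(a, b):
    # C99-style semantics (cdivision=True): quotient truncated toward zero
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def c_mod(a, b):
    # C99-style remainder: takes the sign of the dividend
    return a - b * c_div(a, b)


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(a, b, "python:", python_div(a, b), a % b, "c:", c_div(a, b), c_mod(a, b))
```

The two conventions agree when both operands are positive and differ as soon as exactly one of them is negative, so benchmarks that only divide positive integers would not notice the switch.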

- Robert

From dsdale24 at gmail.com Mon Sep 27 15:00:08 2010
From: dsdale24 at gmail.com (Darren Dale)
Date: Mon, 27 Sep 2010 09:00:08 -0400
Subject: [Cython] question about cimports, python3
In-Reply-To:
References:
Message-ID:

On Sat, Sep 25, 2010 at 5:25 PM, Lisandro Dalcin wrote:
> On 25 September 2010 16:04, Darren Dale wrote:
>> On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote:
>>> On 25 September 2010 12:10, Darren Dale wrote:
>>>> Hello,
>>>>
>>>> I am attempting to contribute to the h5py project by porting the code
>>>> to python3. The code is available in the py3k branch at github
>>>> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to
>>>> build and install h5py using python3, but when I try to import it, I
>>>> get the following error:
>>>>
>>>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33)
>>>> [GCC 4.4.5] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>> import h5py
>>>> Traceback (most recent call last):
>>>>  File "<stdin>", line 1, in <module>
>>>>  File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py",
>>>> line 24, in <module>
>>>
>>> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file?
>>
>> It is a directory, not a zipfile.
>>
>
> OK.
>
> I've cloned your repo and tested with cython-devel, and I cannot
> reproduce your error with Python 2.6.

Right, that was the point of my inquiry.

> With Python 3.1 (debug build), I get this:
>
> $ python3.1
> Python 3.1.1 (r311:74480, Jun 25 2010, 11:49:56)
> [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import h5py
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "h5py/__init__.py", line 24, in <module>
>    from . import h5
>  File "h5e.pyx", line 1, in init h5py.h5e (h5py/h5e.c:3538)
>  File "h5e.pxd", line 20, in init h5py.h5 (h5py/h5.c:5430)
> RuntimeError: maximum recursion depth exceeded while calling a Python object

Right. Does anyone have any ideas why these imports work with
python-2.6 and not python-3.1?

Darren

From dsdale24 at gmail.com Mon Sep 27 15:10:28 2010
From: dsdale24 at gmail.com (Darren Dale)
Date: Mon, 27 Sep 2010 09:10:28 -0400
Subject: [Cython] question about cimports, python3
In-Reply-To:
References:
Message-ID:

On Mon, Sep 27, 2010 at 9:00 AM, Darren Dale wrote:
> On Sat, Sep 25, 2010 at 5:25 PM, Lisandro Dalcin wrote:
>> On 25 September 2010 16:04, Darren Dale wrote:
>>> On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote:
>>>> On 25 September 2010 12:10, Darren Dale wrote:
>>>>> Hello,
>>>>>
>>>>> I am attempting to contribute to the h5py project by porting the code
>>>>> to python3. The code is available in the py3k branch at github
>>>>> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to
>>>>> build and install h5py using python3, but when I try to import it, I
>>>>> get the following error:
>>>>>
>>>>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33)
>>>>> [GCC 4.4.5] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>>> import h5py
>>>>> Traceback (most recent call last):
>>>>>  File "<stdin>", line 1, in <module>
>>>>>  File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py",
>>>>> line 24, in <module>
>>>>
>>>> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file?
>>>
>>> It is a directory, not a zipfile.
>>>
>>
>> OK.
>>
>> I've cloned your repo and tested with cython-devel, and I cannot
>> reproduce your error with Python 2.6.
>
> Right, that was the point of my inquiry.
> >> With Python 3.1 (debug build), I get this: >> >> $ python3.1 >> Python 3.1.1 (r311:74480, Jun 25 2010, 11:49:56) >> [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import h5py >> Traceback (most recent call last): >> ?File "", line 1, in >> ?File "h5py/__init__.py", line 24, in >> ? ?from . import h5 >> ?File "h5e.pyx", line 1, in init h5py.h5e (h5py/h5e.c:3538) >> ?File "h5e.pxd", line 20, in init h5py.h5 (h5py/h5.c:5430) >> RuntimeError: maximum recursion depth exceeded while calling a Python object > > Right. Does anyone have any ideas why these imports work with > python-2.6 and not python-3.1? Additional information: I just tried installing that py3k branch on a Snow Leopard machine at work, using the python provided by MacPorts. In this case, I'm getting: >>> import h5py Traceback (most recent call last): File "", line 1, in File "/Users/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-macosx-10.6-x86_64.egg/h5py/__init__.py", line 34, in from . import h5, h5a, h5d, h5f, h5fd, h5g, h5l, h5o, h5i, h5p, h5r, h5s, h5t, h5z File "h5t.pxd", line 17, in init h5py.h5a (h5py/h5a.c:5248) File "h5p.pxd", line 23, in init h5py.h5t (h5py/h5t.c:16481) File "h5t.pxd", line 17, in init h5py.h5p (h5py/h5p.c:9297) ImportError: No module named h5t Does python-3 have circular import issues that were not present in python-2? 
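
For readers unfamiliar with the failure mode being asked about, here is
a minimal pure-Python sketch of how two modules importing names from
each other can fail at import time. The module names demo_h5t and
demo_h5p are hypothetical stand-ins; this does not reproduce the h5py
cimport case and says nothing about how Cython-generated module init
code behaves.

```python
import importlib
import os
import sys
import tempfile


def circular_import_error():
    """Create two throwaway modules that import each other; return the error."""
    with tempfile.TemporaryDirectory() as tmp:
        # demo_h5t needs a name from demo_h5p before defining its own class.
        with open(os.path.join(tmp, "demo_h5t.py"), "w") as f:
            f.write("from demo_h5p import PropID\nclass TypeID: pass\n")
        # demo_h5p needs a name from demo_h5t, which is only half initialized.
        with open(os.path.join(tmp, "demo_h5p.py"), "w") as f:
            f.write("from demo_h5t import TypeID\nclass PropID: pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_h5t")
        except ImportError as exc:
            return exc
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(tmp)
            sys.modules.pop("demo_h5t", None)
            sys.modules.pop("demo_h5p", None)
    return None


print(type(circular_import_error()).__name__)
```

Whether such a cycle is tolerated depends on which names are needed at
import time, so the same layout can work in one setup and fail in another.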
Darren From dalcinl at gmail.com Mon Sep 27 15:49:27 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 27 Sep 2010 10:49:27 -0300 Subject: [Cython] question about cimports, python3 In-Reply-To: References: Message-ID: On 27 September 2010 10:10, Darren Dale wrote: > On Mon, Sep 27, 2010 at 9:00 AM, Darren Dale wrote: >> On Sat, Sep 25, 2010 at 5:25 PM, Lisandro Dalcin wrote: >>> On 25 September 2010 16:04, Darren Dale wrote: >>>> On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote: >>>>> On 25 September 2010 12:10, Darren Dale wrote: >>>>>> Hello, >>>>>> >>>>>> I am attempting to contribute to the h5py project by porting the code >>>>>> to python3. The code is available in the py3k branch at github >>>>>> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to >>>>>> build and install h5py using python3, but when I try to import it, I >>>>>> get the following error: >>>>>> >>>>>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33) >>>>>> [GCC 4.4.5] on linux2 >>>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>>>> import h5py >>>>>> Traceback (most recent call last): >>>>>> ?File "", line 1, in >>>>>> ?File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py", >>>>>> line 24, in >>>>> >>>>> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file? >>>> >>>> It is a directory, not a zipfile. >>>> >>> >>> OK. >>> >>> I've cloned your repo and tested with cython-devel, and I cannot >>> reproduce your error with Python 2.6. >> >> Right, that was the point of my inquiry. >> >>> With Python 3.1 (debug build), I get this: >>> >>> $ python3.1 >>> Python 3.1.1 (r311:74480, Jun 25 2010, 11:49:56) >>> [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>>>>> import h5py >>> Traceback (most recent call last): >>> ?File "", line 1, in >>> ?File "h5py/__init__.py", line 24, in >>> ? 
?from . import h5 >>> ?File "h5e.pyx", line 1, in init h5py.h5e (h5py/h5e.c:3538) >>> ?File "h5e.pxd", line 20, in init h5py.h5 (h5py/h5.c:5430) >>> RuntimeError: maximum recursion depth exceeded while calling a Python object >> >> Right. Does anyone have any ideas why these imports work with >> python-2.6 and not python-3.1? > > Additional information: I just tried installing that py3k branch on a > Snow Leopard machine at work, using the python provided by MacPorts. > In this case, I'm getting: > >>>> import h5py > Traceback (most recent call last): > ?File "", line 1, in > ?File "/Users/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-macosx-10.6-x86_64.egg/h5py/__init__.py", > line 34, in > ? ?from . import h5, h5a, h5d, h5f, h5fd, h5g, h5l, h5o, h5i, h5p, > h5r, h5s, h5t, h5z > ?File "h5t.pxd", line 17, in init h5py.h5a (h5py/h5a.c:5248) > ?File "h5p.pxd", line 23, in init h5py.h5t (h5py/h5t.c:16481) > ?File "h5t.pxd", line 17, in init h5py.h5p (h5py/h5p.c:9297) > ImportError: No module named h5t > Is h5py know to have circular imports? -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From dsdale24 at gmail.com Mon Sep 27 16:08:26 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Mon, 27 Sep 2010 10:08:26 -0400 Subject: [Cython] question about cimports, python3 In-Reply-To: References: Message-ID: On Mon, Sep 27, 2010 at 9:49 AM, Lisandro Dalcin wrote: > On 27 September 2010 10:10, Darren Dale wrote: >> On Mon, Sep 27, 2010 at 9:00 AM, Darren Dale wrote: >>> On Sat, Sep 25, 2010 at 5:25 PM, Lisandro Dalcin wrote: >>>> On 25 September 2010 16:04, Darren Dale wrote: >>>>> On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote: >>>>>> On 25 September 2010 12:10, Darren Dale wrote: >>>>>>> Hello, >>>>>>> >>>>>>> I am attempting to contribute to the h5py project by porting the code >>>>>>> to python3. 
The code is available in the py3k branch at github >>>>>>> (http://github.com/darrendale/h5py/tree/py3k). So far, I am able to >>>>>>> build and install h5py using python3, but when I try to import it, I >>>>>>> get the following error: >>>>>>> >>>>>>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33) >>>>>>> [GCC 4.4.5] on linux2 >>>>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>>>>> import h5py >>>>>>> Traceback (most recent call last): >>>>>>> ?File "", line 1, in >>>>>>> ?File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py", >>>>>>> line 24, in >>>>>> >>>>>> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file? >>>>> >>>>> It is a directory, not a zipfile. >>>>> >>>> >>>> OK. >>>> >>>> I've cloned your repo and tested with cython-devel, and I cannot >>>> reproduce your error with Python 2.6. >>> >>> Right, that was the point of my inquiry. >>> >>>> With Python 3.1 (debug build), I get this: >>>> >>>> $ python3.1 >>>> Python 3.1.1 (r311:74480, Jun 25 2010, 11:49:56) >>>> [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2 >>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>> import h5py >>>> Traceback (most recent call last): >>>> ?File "", line 1, in >>>> ?File "h5py/__init__.py", line 24, in >>>> ? ?from . import h5 >>>> ?File "h5e.pyx", line 1, in init h5py.h5e (h5py/h5e.c:3538) >>>> ?File "h5e.pxd", line 20, in init h5py.h5 (h5py/h5.c:5430) >>>> RuntimeError: maximum recursion depth exceeded while calling a Python object >>> >>> Right. Does anyone have any ideas why these imports work with >>> python-2.6 and not python-3.1? >> >> Additional information: I just tried installing that py3k branch on a >> Snow Leopard machine at work, using the python provided by MacPorts. 
>> In this case, I'm getting:
>>
>>>>> import h5py
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/Users/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-macosx-10.6-x86_64.egg/h5py/__init__.py",
>> line 34, in <module>
>>     from . import h5, h5a, h5d, h5f, h5fd, h5g, h5l, h5o, h5i, h5p,
>> h5r, h5s, h5t, h5z
>>   File "h5t.pxd", line 17, in init h5py.h5a (h5py/h5a.c:5248)
>>   File "h5p.pxd", line 23, in init h5py.h5t (h5py/h5t.c:16481)
>>   File "h5t.pxd", line 17, in init h5py.h5p (h5py/h5p.c:9297)
>> ImportError: No module named h5t
>>
>
> Is h5py know to have circular imports?

Nothing that has ever manifested as a problem with python-2. But yes,
there are some cases, as illustrated by my last post, where one cython
extension cimports from another, and vice versa.

Darren

From stephane.drouard at st.com  Mon Sep 27 17:12:30 2010
From: stephane.drouard at st.com (Stephane DROUARD)
Date: Mon, 27 Sep 2010 17:12:30 +0200
Subject: [Cython] [PATCH] Cython: C++ exceptions and nogil
In-Reply-To:
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0CF842@SAFEX1MAIL2.st.com>
Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0CFD7A@SAFEX1MAIL2.st.com>

Lisandro Dalcin wrote:
> Stephane, let's change our deal... Here you have a patch with new
> testcase included, you can use "hg import --no-commit
> cpp_exc_nogil.diff" to apply the patch.
>
> Try to run the test on your side (python runtests.py
> cpp_exceptions_nogil), next test with your own code. Next try hard to
> break my fix. If all is fine, I'll push it.

I checked your patch with my code and it really fixes the issue.

I also added some tests to raise interrupts within an except clause and
a finally clause.

Question: do you plan or not to support the same for 'except *' and
'except[?] '? If not, you should maybe issue an error if such functions
are declared nogil.

Cheers,
Stephane
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cpp_exceptions_nogil.pyx
Type: application/octet-stream
Size: 5154 bytes
Desc: cpp_exceptions_nogil.pyx
Url : http://codespeak.net/pipermail/cython-dev/attachments/20100927/689d4419/attachment.obj

From dalcinl at gmail.com  Mon Sep 27 22:55:34 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 27 Sep 2010 17:55:34 -0300
Subject: [Cython] Asking for advice: exception handling with nogil
Message-ID:

consider this code:

cdef void foo() except * with gil:
    raise ValueError

cdef int bar() except ? -1 with gil:
    raise ValueError

with nogil:
    foo()

with nogil:
    bar()

Currently, Cython generates segfaulting code because in both cases it
is using PyErr_Occurred() without re-acquiring the GIL. I think the
easiest solution would be to use a utility code wrapping around
PyErr_Occurred() and implemented with the PyGILState_XXX API's, more
or less like below:

int __Pyx_PyErr_Occurred_WithGIL()
{
  int err;
  PyGILState_STATE _save;
  _save = PyGILState_Ensure();
  err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* */
  PyGILState_Release(_save);
  return err;
}

Of course, this function is going to be used ONLY when calling C
functions within nogil blocks.

Would such fix be fine? I would prefer to use the Py_[UN]BLOCK_THREADS
macros, but in such case the fix is not trivial for me and I do not
want to make a mess...

PS: note that in order to fix C++ exc handling, now SimpleCallNode gets
a "nogil" attribute at analyse_c_function_call() ...
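
The acquire/check/release sequence proposed above can be sketched from
Python through ctypes, assuming a CPython interpreter where
ctypes.pythonapi is available. The name err_occurred_with_gil is
hypothetical. The real fix would live in Cython's generated C code; when
called from ordinary Python code the GIL is already held and no error is
pending, so this only illustrates the ordering of the three calls.

```python
import ctypes

# Declare the C signatures of the CPython API functions used below.
_api = ctypes.pythonapi
_api.PyGILState_Ensure.restype = ctypes.c_int      # PyGILState_STATE is an enum
_api.PyGILState_Ensure.argtypes = []
_api.PyGILState_Release.restype = None
_api.PyGILState_Release.argtypes = [ctypes.c_int]
_api.PyErr_Occurred.restype = ctypes.c_void_p      # borrowed PyObject*, NULL if unset
_api.PyErr_Occurred.argtypes = []


def err_occurred_with_gil():
    """Return True if the error indicator is set, holding the GIL for the check."""
    state = _api.PyGILState_Ensure()          # acquire (or re-enter) the GIL
    try:
        # ctypes maps a NULL pointer result to None.
        return _api.PyErr_Occurred() is not None
    finally:
        _api.PyGILState_Release(state)        # restore the previous GIL state


print(err_occurred_with_gil())
```

PyGILState_Ensure() may be called whether or not the calling thread
already holds the GIL, which is what makes it the simpler of the two
options discussed in this thread.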
-- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stephane.drouard at st.com Tue Sep 28 10:04:50 2010 From: stephane.drouard at st.com (Stephane DROUARD) Date: Tue, 28 Sep 2010 10:04:50 +0200 Subject: [Cython] Asking for advice: exception handling with nogil In-Reply-To: References: Message-ID: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0CFF18@SAFEX1MAIL2.st.com> Lisandro Dalcin wrote: > > consider this code: > > cdef void foo() except * with gil: > raise ValueError > > cdef int bar() except ? -1 with gil: > raise ValueError > > with nogil: > foo() > > with nogil: > bar() > > Currently, Cython generates segfaulting code because in both cases it > is using PyErr_Occurred() without re-acquiring the GIL. I think the > easiest solution would be to use a utility code wrapping around > PyErr_Occurred() and implemented with the PyGILState_XXX API's, more > or less like below: > > int __Pyx_PyErr_Occurred_WithGIL() > { > int err; > _save = PyGILState_Ensure(); > err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* */ > PyGILState_Release(_save); > return err; > } > > Of course, this function is going to be used ONLY when calling C > functions within nogil blocks. > > Would such fix be fine? I would prefer to use the Py_[UN]BLOCK_THREAD > macros, but in such case the fix is not trivial for me and I do not > want to make a mess... 
>

What about something like:

int __Pyx_PyErr_Occurred(PyThreadState *_save)
{
  int err;
  Py_BLOCK_THREADS
  err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* */
  Py_UNBLOCK_THREADS
  return err;
}

Stephane

From dalcinl at gmail.com  Tue Sep 28 16:08:20 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 28 Sep 2010 11:08:20 -0300
Subject: [Cython] Asking for advice: exception handling with nogil
In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0CFF18@SAFEX1MAIL2.st.com>
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0CFF18@SAFEX1MAIL2.st.com>
Message-ID:

On 28 September 2010 05:04, Stephane DROUARD wrote:
> Lisandro Dalcin wrote:
>
>>
>> consider this code:
>>
>> cdef void foo() except * with gil:
>>     raise ValueError
>>
>> cdef int bar() except ? -1 with gil:
>>     raise ValueError
>>
>> with nogil:
>>     foo()
>>
>> with nogil:
>>     bar()
>>
>> Currently, Cython generates segfaulting code because in both cases it
>> is using PyErr_Occurred() without re-acquiring the GIL. I think the
>> easiest solution would be to use a utility code wrapping around
>> PyErr_Occurred() and implemented with the PyGILState_XXX API's, more
>> or less like below:
>>
>> int __Pyx_PyErr_Occurred_WithGIL()
>> {
>> int err;
>> _save = PyGILState_Ensure();
>> err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* */
>> PyGILState_Release(_save);
>> return err;
>> }
>>
>> Of course, this function is going to be used ONLY when calling C
>> functions within nogil blocks.
>>
>> Would such fix be fine? I would prefer to use the Py_[UN]BLOCK_THREAD
>> macros, but in such case the fix is not trivial for me and I do not
>> want to make a mess...
>>
>
> What about something like:
> int __Pyx_PyErr_Occurred(PyThreadState *_save)
> {
>   int err;
>   Py_BLOCK_THREADS
>   err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* */
>   Py_UNBLOCK_THREADS
>   return err;
> }
>

Mmm, yes... I definitely like it...
-- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From robertwb at math.washington.edu Tue Sep 28 18:02:56 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 28 Sep 2010 09:02:56 -0700 Subject: [Cython] Asking for advice: exception handling with nogil In-Reply-To: References: Message-ID: On Mon, Sep 27, 2010 at 1:55 PM, Lisandro Dalcin wrote: > consider this code: > > cdef void foo() except * with gil: > ? ?raise ValueError > > cdef int bar() except ? -1 with gil: > ? ?raise ValueError > > with nogil: > ? ?foo() > > with nogil: > ? ?bar() > > > Currently, Cython generates segfaulting code because in both cases it > is using PyErr_Occurred() without re-acquiring the GIL. I think the > easiest solution would be to use a utility code wrapping around > PyErr_Occurred() and implemented with the PyGILState_XXX API's, more > or less like below: > > int __Pyx_PyErr_Occurred_WithGIL() > { > int err; > _save = PyGILState_Ensure(); > err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* ?*/ > PyGILState_Release(_save); > return err; > } > > Of course, this function is going to be used ONLY when calling C > functions within nogil blocks. > > Would such fix be fine? I would prefer to use the Py_[UN]BLOCK_THREAD > macros, but in such case the fix is not trivial for me and I do not > want to make a mess... > Yes, I would be fine with that for a fix. - Robert From dalcinl at gmail.com Tue Sep 28 18:33:53 2010 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 28 Sep 2010 13:33:53 -0300 Subject: [Cython] Asking for advice: exception handling with nogil In-Reply-To: References: Message-ID: On 28 September 2010 13:02, Robert Bradshaw wrote: > On Mon, Sep 27, 2010 at 1:55 PM, Lisandro Dalcin wrote: >> consider this code: >> >> cdef void foo() except * with gil: >> ? ?raise ValueError >> >> cdef int bar() except ? 
-1 with gil:
>>     raise ValueError
>>
>> with nogil:
>>     foo()
>>
>> with nogil:
>>     bar()
>>
>>
>> Currently, Cython generates segfaulting code because in both cases it
>> is using PyErr_Occurred() without re-acquiring the GIL. I think the
>> easiest solution would be to use a utility code wrapping around
>> PyErr_Occurred() and implemented with the PyGILState_XXX API's, more
>> or less like below:
>>
>> int __Pyx_PyErr_Occurred_WithGIL()
>> {
>> int err;
>> _save = PyGILState_Ensure();
>> err = !!PyErr_Occurred(); /* note: PyErr_Occurred() returns PyObject* */
>> PyGILState_Release(_save);
>> return err;
>> }
>>
>> Of course, this function is going to be used ONLY when calling C
>> functions within nogil blocks.
>>
>> Would such fix be fine? I would prefer to use the Py_[UN]BLOCK_THREAD
>> macros, but in such case the fix is not trivial for me and I do not
>> want to make a mess...
>>
>
> Yes, I would be fine with that for a fix.
>

Still not sure about it... We also have to handle this:
http://bugs.python.org/issue9972

And I would like to change GILStatNode&GILExitNode:

{PyThreadState *_save; Py_UNBLOCK_THREADS .... Py_BLOCK_THREADS}

with these macros:

Py_BEGIN_ALLOW_THREADS ... Py_END_ALLOW_THREADS

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From matej at laitl.cz  Tue Sep 28 19:14:07 2010
From: matej at laitl.cz (Matěj Laitl)
Date: Tue, 28 Sep 2010 19:14:07 +0200
Subject: [Cython] Wrong C code generated (cdef class < cdef class inheritance, cpdef methods, pure python mode)
Message-ID: <4ca22262.8e05df0a.2a39.ffffaa6d@mx.google.com>

Hi!
I have problems with cython dealing with the following example case
(python 2.6.5, cython 0.13):

2 cdef classes, Pdf and GaussPdf; GaussPdf inherits Pdf. Both are written in
pure python file pdfs.py and use augmentation pdfs.pxd for cythoning.
(attached) Both have just cpdef methods, GaussPdf overrides 4 of 5 Pdf's
methods. When tested as Python code, everything is okay, but when compiled
using cython, I get the following error:

>>> from pdfs import Pdf
>>> p = Pdf()
>>> p.shape()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pdfs.py", line 15, in pybayes.pdfs.Pdf.shape (pybayes/pdfs.c:975)
TypeError: Cannot convert pybayes.pdfs.Pdf to pybayes.pdfs.GaussPdf

It should have thrown NotImplementedError, not TypeError!

I think the problem is in the generated C code. First couple of lines from
the python wrapper around the Pdf.shape() C function:

static PyObject *__pyx_pf_7pybayes_4pdfs_3Pdf_shape(PyObject *__pyx_v_self,
CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = NULL;
  PyObject *__pyx_t_1 = NULL;
  __Pyx_RefNannySetupContext("shape");
  __Pyx_XDECREF(__pyx_r);
  if (!(likely(((__pyx_v_self) == Py_None) ||
likely(__Pyx_TypeTest(__pyx_v_self, __pyx_ptype_7pybayes_4pdfs_GaussPdf)))))
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto
__pyx_L1_error;}

The last line is #975. Notice the __Pyx_TypeTest(..self, ..GaussPdf) - I would
expect it would read __Pyx_TypeTest(..self, ..Pdf) instead!

Also, vtabstruct for Pdf seems strange (it contains GaussPdf methods), but
that may be correct inheritance magic.

Is this really a bug in Cython, or am I doing something completely wrong?

Thanks,
Matěj Laitl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pdfs.py
Type: text/x-python
Size: 2802 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20100928/20cac6b5/attachment.py
-------------- next part --------------
# Copyright (c) 2010 Matej Laitl
# Distributed under the terms of the GNU General Public License v2 or any
# later version of the license, at your option.

"""Cython augmentation file for pdfs.py"""

cimport cython
from numpy cimport ndarray

cdef class Pdf:
    cpdef tuple shape(self)
    cpdef ndarray mean(self)
    cpdef ndarray variance(self)
    cpdef object eval_log(self, ndarray x)  # TODO: dtype of all arrays
    cpdef ndarray sample(self)

cdef class GaussPdf(Pdf):
    cdef public ndarray mu  # TODO: readonly
    cdef public ndarray R  # TODO: readonly

    cpdef tuple shape(self)
    cpdef ndarray mean(self)
    cpdef ndarray variance(self)
    #cpdef eval_log(self, x):  # TODO
    @cython.locals(z = ndarray)
    cpdef ndarray sample(self)

From robertwb at math.washington.edu  Tue Sep 28 19:25:42 2010
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 28 Sep 2010 10:25:42 -0700
Subject: [Cython] Wrong C code generated (cdef class < cdef class inheritance, cpdef methods, pure python mode)
In-Reply-To: <4ca22262.8e05df0a.2a39.ffffaa6d@mx.google.com>
References: <4ca22262.8e05df0a.2a39.ffffaa6d@mx.google.com>
Message-ID:

On Tue, Sep 28, 2010 at 10:14 AM, Matěj Laitl wrote:
> Hi!
> I have problems with cython dealing with following example case (python 2.6.5,
> cython 0.13):
>
> 2 cdef classes, Pdf and GaussPdf; GaussPdf inherits Pdf. Both are written in
> pure python file pdfs.py and use augmentation pdfs.pxd for cythoning.
> (attached) Both have just cpdef methods, GaussPdf overrides 4 of 5 Pdf's
> methods. When tested as Python code, everything is okay, but when compiled
> using cython, I get following error:
>
>>>> from pdfs import Pdf
>>>> p = Pdf()
>>>> p.shape()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pdfs.py", line 15, in pybayes.pdfs.Pdf.shape (pybayes/pdfs.c:975)
> TypeError: Cannot convert pybayes.pdfs.Pdf to pybayes.pdfs.GaussPdf
>
> It should have thrown NotImplementedError, not TypeError!
>
> I think the problem is in the generated C code.
First couple of lines from > python wrapper around Pdf.mean() C function: > > static PyObject *__pyx_pf_7pybayes_4pdfs_3Pdf_shape(PyObject *__pyx_v_self, > CYTHON_UNUSED PyObject *unused) { > ?PyObject *__pyx_r = NULL; > ?PyObject *__pyx_t_1 = NULL; > ?__Pyx_RefNannySetupContext("shape"); > ?__Pyx_XDECREF(__pyx_r); > ?if (!(likely(((__pyx_v_self) == Py_None) || > likely(__Pyx_TypeTest(__pyx_v_self, __pyx_ptype_7pybayes_4pdfs_GaussPdf))))) > {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto > __pyx_L1_error;} > > The last line is #975. Notice the __Pyx_TypeTest(..self, ..GaussPdf) - I would > expect it would read __Pyx_TypeTest(..self, .Pdf) instead! > > Also, vtabstruct for Pdf seems strange (it contains GaussPdf methods), but > that may be correct inheritance magic. > > Is this really a bug in Cython, or am I doing something completely wrong? No, this looks like a bug on our side. Clearly the pure mode + augmented .pxd is insufficiently tested. I have an idea where it might be happening too... - Robert From dsdale24 at gmail.com Tue Sep 28 22:54:03 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 28 Sep 2010 16:54:03 -0400 Subject: [Cython] question about cimports, python3 In-Reply-To: References: Message-ID: On Mon, Sep 27, 2010 at 10:08 AM, Darren Dale wrote: > On Mon, Sep 27, 2010 at 9:49 AM, Lisandro Dalcin wrote: >> On 27 September 2010 10:10, Darren Dale wrote: >>> On Mon, Sep 27, 2010 at 9:00 AM, Darren Dale wrote: >>>> On Sat, Sep 25, 2010 at 5:25 PM, Lisandro Dalcin wrote: >>>>> On 25 September 2010 16:04, Darren Dale wrote: >>>>>> On Sat, Sep 25, 2010 at 2:03 PM, Lisandro Dalcin wrote: >>>>>>> On 25 September 2010 12:10, Darren Dale wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> I am attempting to contribute to the h5py project by porting the code >>>>>>>> to python3. The code is available in the py3k branch at github >>>>>>>> (http://github.com/darrendale/h5py/tree/py3k). 
So far, I am able to >>>>>>>> build and install h5py using python3, but when I try to import it, I >>>>>>>> get the following error: >>>>>>>> >>>>>>>> Python 3.1.2 (release31-maint, Sep 17 2010, 20:27:33) >>>>>>>> [GCC 4.4.5] on linux2 >>>>>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>>>>>> import h5py >>>>>>>> Traceback (most recent call last): >>>>>>>> ?File "", line 1, in >>>>>>>> ?File "/home/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-linux-x86_64.egg/h5py/__init__.py", >>>>>>>> line 24, in >>>>>>> >>>>>>> Is h5py-1.3.1.dev-py3.1-linux-x86_64.egg a directory or a zip file? >>>>>> >>>>>> It is a directory, not a zipfile. >>>>>> >>>>> >>>>> OK. >>>>> >>>>> I've cloned your repo and tested with cython-devel, and I cannot >>>>> reproduce your error with Python 2.6. >>>> >>>> Right, that was the point of my inquiry. >>>> >>>>> With Python 3.1 (debug build), I get this: >>>>> >>>>> $ python3.1 >>>>> Python 3.1.1 (r311:74480, Jun 25 2010, 11:49:56) >>>>> [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2 >>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>>> import h5py >>>>> Traceback (most recent call last): >>>>> ?File "", line 1, in >>>>> ?File "h5py/__init__.py", line 24, in >>>>> ? ?from . import h5 >>>>> ?File "h5e.pyx", line 1, in init h5py.h5e (h5py/h5e.c:3538) >>>>> ?File "h5e.pxd", line 20, in init h5py.h5 (h5py/h5.c:5430) >>>>> RuntimeError: maximum recursion depth exceeded while calling a Python object >>>> >>>> Right. Does anyone have any ideas why these imports work with >>>> python-2.6 and not python-3.1? >>> >>> Additional information: I just tried installing that py3k branch on a >>> Snow Leopard machine at work, using the python provided by MacPorts. 
>>> In this case, I'm getting:
>>>
>>>>>> import h5py
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/Users/darren/.local/lib/python3.1/site-packages/h5py-1.3.1.dev-py3.1-macosx-10.6-x86_64.egg/h5py/__init__.py",
>>> line 34, in <module>
>>>     from . import h5, h5a, h5d, h5f, h5fd, h5g, h5l, h5o, h5i, h5p,
>>> h5r, h5s, h5t, h5z
>>>   File "h5t.pxd", line 17, in init h5py.h5a (h5py/h5a.c:5248)
>>>   File "h5p.pxd", line 23, in init h5py.h5t (h5py/h5t.c:16481)
>>>   File "h5t.pxd", line 17, in init h5py.h5p (h5py/h5p.c:9297)
>>> ImportError: No module named h5t
>>>
>>
>> Is h5py know to have circular imports?
>
> Nothing that has ever manifested as a problem with python-2. But yes,
> there are some cases, as illustrated by my last post, where one cython
> extension cimports from another, and vice versa.

I have also asked at comp.lang.python, but didn't get any responses.
Does anyone else have any ideas?

Thanks,
Darren

From dalcinl at gmail.com  Wed Sep 29 14:07:09 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 29 Sep 2010 09:07:09 -0300
Subject: [Cython] [PATCH] exception checks within nogil blocks
In-Reply-To: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0D049F@SAFEX1MAIL2.st.com>
References: <1BBEE2BA50AFBB41BDCE56494A11093AADDE0D0285@SAFEX1MAIL2.st.com> <1BBEE2BA50AFBB41BDCE56494A11093AADDE0D049F@SAFEX1MAIL2.st.com>
Message-ID:

On 29 September 2010 06:21, Stephane DROUARD wrote:
> Lisandro Dalcin wrote:
>
>> OK, many thanks. Not sure if I'll push it right now, I'm still
>> thinking about other issue we need to handle
>> (http://bugs.python.org/issue9972)
>
> Wouldn't something like that solve the issue:
>
> #ifdef WITH_THREAD
>  #define __Pyx_PyGILState_Ensure()   PyGILState_Ensure()
>  #define __Pyx_PyGILState_Release(s) PyGILState_Release(s)
> #else
>  #define __Pyx_PyGILState_Ensure()   PyGILState_LOCKED
>  #define __Pyx_PyGILState_Release(s)
> #endif
>
> To be used like:
> PyGILState_STATE state = __Pyx_PyGILState_Ensure()
> ...
> __Pyx_PyGILState_Release(state)
>
> Note that you may get a warning saying that state is not used in non-thread mode.
>
> Stephane
>

Yes, I'm using something very similar:
http://code.google.com/p/mpi4py/source/browse/trunk/src/atimport.h#290

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From cython at mspacek.mm.st  Thu Sep 30 00:06:38 2010
From: cython at mspacek.mm.st (Martin Spacek)
Date: Wed, 29 Sep 2010 22:06:38 +0000 (UTC)
Subject: [Cython] redundant code in pyxbuild.py, [build] compiler=mingw32 still ignored?
Message-ID:

Hello,

I just noticed a block of what I believe is some old redundant code in
pyximport/pyxbuild.py
(http://hg.cython.org/cython-devel/file/019d05a57e89/pyximport/pyxbuild.py).
Lines 64 to 67 should probably be removed:

64     config_files = dist.find_config_files()
65     try: config_files.remove('setup.cfg')
66     except ValueError: pass
67     dist.parse_config_files(config_files)
68
69     cfgfiles = dist.find_config_files()
70     try: cfgfiles.remove('setup.cfg')
71     except ValueError: pass
72     dist.parse_config_files(cfgfiles)
73     try:
74         ok = dist.parse_command_line()
75     except DistutilsArgError:
76         raise

"config_files" isn't used anywhere else in the file. I don't have hg to
submit a patch.

I only came across this because I'm still getting this error in Python
2.6.6, Cython 0.13, on Windows 2000 when building a cython module with
pyximport:

ImportError: Building module failed: ['DistutilsPlatformError: Unable to find vcvarsall.bat\n']

This is in spite of having a Python26/Lib/distutils/distutils.cfg file
containing "[build] compiler = mingw32". Yet it appears that pyximport is
still trying to call msvc to compile the .c file.
Here's the full traceback:

Traceback (most recent call last):
  File "C:\bzr\spyke\dev\spyke\main.py", line 11, in <module>
    from climbing import climb # .pyx file
  File "C:\bin\Python26\lib\site-packages\pyximport\pyximport.py", line 328, in load_module
    self.pyxbuild_dir)
  File "C:\bin\Python26\lib\site-packages\pyximport\pyximport.py", line 180, in load_module
    so_path = build_module(name, pyxfilename, pyxbuild_dir)
  File "C:\bin\Python26\lib\site-packages\pyximport\pyximport.py", line 164, in build_module
    reload_support=pyxargs.reload_support)
  File "C:\bin\Python26\lib\site-packages\pyximport\pyxbuild.py", line 86, in pyx_to_dll
    dist.run_commands()
  File "C:\bin\Python26\lib\distutils\dist.py", line 975, in run_commands
    self.run_command(cmd)
  File "C:\bin\Python26\lib\distutils\dist.py", line 995, in run_command
    cmd_obj.run()
  File "C:\bin\Python26\lib\distutils\command\build_ext.py", line 340, in run
    self.build_extensions()
  File "C:\bin\Python26\lib\site-packages\Cython\Distutils\build_ext.py", line 81, in build_extensions
    self.build_extension(ext)
  File "C:\bin\Python26\lib\distutils\command\build_ext.py", line 499, in build_extension
    depends=ext.depends)
  File "C:\bin\Python26\lib\distutils\msvc9compiler.py", line 458, in compile
    self.initialize()
  File "C:\bin\Python26\lib\distutils\msvc9compiler.py", line 368, in initialize
    vc_env = query_vcvarsall(VERSION, plat_spec)
  File "C:\bin\Python26\lib\distutils\msvc9compiler.py", line 260, in query_vcvarsall
    raise DistutilsPlatformError("Unable to find vcvarsall.bat")
ImportError: Building module failed: ['DistutilsPlatformError: Unable to find vcvarsall.bat\n']

This was apparently dealt with by a patch last year, related to the code
block above (pyximport ignores [build] compiler = mingw32):

http://codespeak.net/pipermail/cython-dev/2009-May/005417.html

I've dropped into the debugger at line 72 above, and "cfgfiles" has just
one entry that points to my distutils.cfg file. Any ideas?
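
One way to rule out a malformed config file is to parse it independently
of distutils. The sketch below uses the standard configparser module as
a stand-in (it is not the parser distutils itself runs), and the function
name configured_compiler is hypothetical; it only confirms that a
"[build] compiler" entry is readable from the text.

```python
import configparser


def configured_compiler(cfg_text):
    """Return the [build] compiler value from distutils-style config text, or None."""
    parser = configparser.RawConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option("build", "compiler"):
        return parser.get("build", "compiler")
    return None


# Contents equivalent to the distutils.cfg described in this message.
sample = "[build]\ncompiler = mingw32\n"
print(configured_compiler(sample))
```

If the value comes back as expected, the file itself is fine and the
setting is being dropped somewhere between parsing and the build_ext
command, which is consistent with the debugger observation above.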
Just as a disclaimer, I'm using the Cython 0.13 win32 .exe installer from Christoph Gohlke (http://www.lfd.uci.edu/~gohlke/pythonlibs/#cython)

Cheers,

Martin

From stefan_ml at behnel.de Thu Sep 30 17:20:14 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 30 Sep 2010 17:20:14 +0200
Subject: [Cython] Optimal/recommended compiler option sets for Cython
In-Reply-To: <4C9DA024.8030003@gmx.de>
References: <4C9DA024.8030003@gmx.de>
Message-ID: <4CA4AAAE.5030906@behnel.de>

Kay Hayen, 25.09.2010 09:09:
> And then what to use for gcc (the C/C++ compiler of choice or is any
> other working for you better on Linux), simply -O3 and be done with it?

I commonly also use an appropriate value for "-march", such as "-march=core2". Usually does a lot better than the plain i386 code that it would generate otherwise. There's also a generic "-march=native" option that "does the right thing".

Stefan

From mike at pythonlibrary.org Thu Sep 16 16:08:25 2010
From: mike at pythonlibrary.org (Mike Driscoll)
Date: Thu, 16 Sep 2010 14:08:25 -0000
Subject: [Cython] GSoC Article for PSF and my blog
Message-ID:

Hi,

I am working on an article for the Python Software Foundation's blog, http://pyfound.blogspot.com/, about the various Python projects that were worked on during this year's Google Summer of Code program. They want me to write up something about what projects were worked on and what the results were. I found your project information here: http://wiki.python.org/moin/SummerOfCode/2010

Anyway, since the PSF blog article will be brief, I thought I would also write up a longer article about your projects on my personal blog as well. The information I found on the Python wiki page was pretty brief, so I would appreciate it if you could tell me the following:

1) What was worked on during the GSoC project
2) How many students helped with a guess at how many hours were put in
3) Your experiences being a mentor

If possible, I would like the student's perspective too.
Feel free to forward my information to them. Also feel free to opt out and I won't write anything more than the already public info I can find.

Thanks a lot for your help!

--
-----------------
Mike Driscoll

Blog: http://blog.pythonlibrary.org