From stefan_ml at behnel.de Sun Jun 1 14:10:11 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 01 Jun 2008 14:10:11 +0200 Subject: [Cython] about let users have more control on garbage collection In-Reply-To: References: <48418CC9.2030103@behnel.de> Message-ID: <484291A3.5090909@behnel.de> Hi, Lisandro Dalcin wrote: > cdef class Mat: # <- proxy class > cdef PetscMat # <- PETSc object, a pointer to a refcounted struct. > > I need to store arbitrary Python objects inside the PETSc object, I'm > actually storing a dictionary. This dictionary have to survive until > the PETSc object is deallocated, but that deallocation do not > necesarily occurs when the instance of my proxy class is deallocated, > because the PETSc object can still 'live' (that is, a referece being > owned) inside other PETSc objects. Can't you work with singleton proxies? That's what we do in lxml. There's never more than one Python Element proxy for an XML node struct. We use a factory function to build a proxy for a struct and keep a back pointer from the struct to the class, so that we can reuse an already existing proxy. All the Python state is kept in attributes of the proxy object, and the struct can be freed when the proxy is garbage collected by Python. Stefan From fperez.net at gmail.com Sun Jun 1 22:18:29 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 1 Jun 2008 13:18:29 -0700 Subject: [Cython] Cython tutorial at Scipy 2008? Message-ID: Hi folks, I'm in charge of co-organizing (with Travis Oliphant) the tutorial sessions at Scipy 2008, to be held in August of this year at Caltech: http://www.scipy.org/SciPy2008 http://www.scipy.org/SciPy2008/Tutorials Any chance someone from the Cython gang might come and be willing to give a cython tutorial? We don't have too many details sorted out yet, so we're just feeling things out... 
Cheers, f From wstein at gmail.com Sun Jun 1 22:23:46 2008 From: wstein at gmail.com (William Stein) Date: Sun, 1 Jun 2008 13:23:46 -0700 Subject: [Cython] Cython tutorial at Scipy 2008? In-Reply-To: References: Message-ID: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> On Sun, Jun 1, 2008 at 1:18 PM, Fernando Perez wrote: > Hi folks, > > I'm in charge of co-organizing (with Travis Oliphant) the tutorial > sessions at Scipy 2008, to be held in August of this year at Caltech: > > http://www.scipy.org/SciPy2008 > http://www.scipy.org/SciPy2008/Tutorials > > Any chance someone from the Cython gang might come and be willing to > give a cython tutorial? We don't have too many details sorted out > yet, so we're just feeling things out... > Robert Bradshaw and I will likely both be in nearby San Diego then, in which case one or both of us could potentially come. William From fperez.net at gmail.com Mon Jun 2 02:46:15 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 1 Jun 2008 17:46:15 -0700 Subject: [Cython] Cython tutorial at Scipy 2008? In-Reply-To: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> References: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> Message-ID: On Sun, Jun 1, 2008 at 1:23 PM, William Stein wrote: > On Sun, Jun 1, 2008 at 1:18 PM, Fernando Perez wrote: >> Hi folks, >> >> I'm in charge of co-organizing (with Travis Oliphant) the tutorial >> sessions at Scipy 2008, to be held in August of this year at Caltech: >> >> http://www.scipy.org/SciPy2008 >> http://www.scipy.org/SciPy2008/Tutorials >> >> Any chance someone from the Cython gang might come and be willing to >> give a cython tutorial? We don't have too many details sorted out >> yet, so we're just feeling things out... >> > > Robert Bradshaw and I will likely both be in nearby San Diego then, > in which case one or both of us could potentially come. Great. 
I had also listed a Sage tutorial as a potential topic, so as long as some of you are around, perhaps you could pitch in for that too? Right now we're just getting feelers for both who could teach what, and what people want to see. It's a bit of a matchmaking game, but there's no point in offering something if we don't have anyone to teach it. Note that in addition to the 2 days of tutorials and 2 days of conference, the weekend is reserved for dev sprints. It would be very nice to have some joint work on sage/cython/numpy/scipy take place. Many of the usual suspects will be there. Cheers, f From wstein at gmail.com Mon Jun 2 03:46:36 2008 From: wstein at gmail.com (William Stein) Date: Sun, 1 Jun 2008 18:46:36 -0700 Subject: [Cython] Cython tutorial at Scipy 2008? In-Reply-To: References: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> Message-ID: <85e81ba30806011846id8eb8b2w40828096229f3d6@mail.gmail.com> On Sun, Jun 1, 2008 at 5:46 PM, Fernando Perez wrote: > On Sun, Jun 1, 2008 at 1:23 PM, William Stein wrote: >> On Sun, Jun 1, 2008 at 1:18 PM, Fernando Perez wrote: >>> Hi folks, >>> >>> I'm in charge of co-organizing (with Travis Oliphant) the tutorial >>> sessions at Scipy 2008, to be held in August of this year at Caltech: >>> >>> http://www.scipy.org/SciPy2008 >>> http://www.scipy.org/SciPy2008/Tutorials >>> >>> Any chance someone from the Cython gang might come and be willing to >>> give a cython tutorial? We don't have too many details sorted out >>> yet, so we're just feeling things out... >>> >> >> Robert Bradshaw and I will likely both be in nearby San Diego then, >> in which case one or both of us could potentially come. > > Great. I had also listed a Sage tutorial as a potential topic, so as > long as some of you are around, perhaps you could pitch in for that > too? Right now we're just getting feelers for both who could teach I am certainly enthusiastic about doing a Sage tutorial. > what, and what people want to see. 
It's a bit of a matchmaking game, > but there's no point in offering something if we don't have anyone to > teach it. > > Note that in addition to the 2 days of tutorials and 2 days of > conference, the weekend is reserved for dev sprints. It would be very > nice to have some joint work on sage/cython/numpy/scipy take place. > Many of the usual suspects will be there. > Excellent. William From fperez.net at gmail.com Mon Jun 2 04:05:15 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 1 Jun 2008 19:05:15 -0700 Subject: [Cython] Cython tutorial at Scipy 2008? In-Reply-To: <85e81ba30806011846id8eb8b2w40828096229f3d6@mail.gmail.com> References: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> <85e81ba30806011846id8eb8b2w40828096229f3d6@mail.gmail.com> Message-ID: On Sun, Jun 1, 2008 at 6:46 PM, William Stein wrote: > I am certainly enthusiastic about doing a Sage tutorial. Ok, thanks for the info. We'll get in touch with you when as soon as we know a bit more regarding schedule, etc. Cheers, f From ondrej at certik.cz Mon Jun 2 13:47:39 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 2 Jun 2008 13:47:39 +0200 Subject: [Cython] SWIG and Cython In-Reply-To: References: <96de71860805301247t5c6e53e8x953734e26c1340de@mail.gmail.com> Message-ID: <85b5c3130806020447g608bc3cfr104b1d8dfb5c1232@mail.gmail.com> On Fri, May 30, 2008 at 11:43 PM, Lisandro Dalcin wrote: > Well, In the case of wrapping large, legacy C++ libraries, then the > choice is not easy, it depend on many factors. I'll try to elaborate > > For something like MPI and PETSc in case for mpi4py and petsc4py, > which are both being converted to Cython, the option is really clear > for me: just use Cython, do not bother with SWIG. > > Why use Cython? Their native API's are just C (well, MPI has a > standard C++ API, but not much more featured than the C one). 
Wrapping > a C API and providing just C-ish functions at the Python level is just > unpythonic and painfull to use that plain C. You really need a trully > OO Python API, if not, you Python level API is just much crap than the > C one, hard to use, perhaps the only you alleviate is error checking. > Building a OO API targeting Python through using Cython and calling a > your legacy library C API is just easy, fast, convenient than using > SWIG and next use the SWIG generated code to implement a > better-looking, high-level, pythonic API for Python-level consumption. > > In the case of a large C++ API its depend on how much of that API it > makes sense to expose to Python. Let suppose we have an API with a lot > of classes and a lot of methods (let say 10,20,30 classes, with > 10,10,30 methods each). Then wrapping the full API in the current > status of Cython, that is going to be a real pain, to much manual. > BUT, if you are going to use only a few of your classes, and only > expose a few of your methods, or even want to expose new methods, more > convenient to use in the Python side, then, again I will definitelly > choose Cython. > > My personal experience in going from SWIG to Cython is worth to take > into account: > > Implementing a really good, robust, full featured, pythonic access to > full PETSc functionalities for petsc4py took me about two years of > carefull writting of typemaps, implementing Python extension types BY > HAND, and writting a log or C code inside custom typemaps, with Python > and Numpy C API calls plus PETSc calls. > > Then, after the success with Cython for the mpi4py case, I started the > process of rewritting petsc4py from scratch, working at night, let say > from 9:00 PM to until 1:00 AM or 2:00 AM. In about 10 days, yes, JUST > 10 DAYS of night work, I'm near to have implemented all the features > of the previous implementation, and the new implementation is actually > bether and more featured.... 
>
>
> In short, if you really, really, really want to wrap the FULL C++
> library, then the Cython way will be harder than the SWIG way. But I
> really doubt that you really need to access the FULL C++ API. But one
> thing that you should take for granted: the Cython way will let you
> write better-looking, more easily maintainable source code.
>
> That's all, folks!

Thanks for sharing this, it is very useful info!

Ondrej

From dalcinl at gmail.com Mon Jun 2 16:58:10 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 2 Jun 2008 11:58:10 -0300
Subject: [Cython] about let users have more control on garbage collection
In-Reply-To: <484291A3.5090909@behnel.de>
References: <48418CC9.2030103@behnel.de> <484291A3.5090909@behnel.de>
Message-ID:

On 6/1/08, Stefan Behnel wrote:
> Hi,
> Can't you work with singleton proxies? That's what we do in lxml. There's
> never more than one Python Element proxy for an XML node struct.

I'll take a look, but I believe such an approach, even if possible, would
complicate my implementation a lot.

Stefan, if it is not much work for you, could you point me to the actual
lxml code implementing all this machinery?

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From robertwb at math.washington.edu Mon Jun 2 20:43:08 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 2 Jun 2008 11:43:08 -0700
Subject: [Cython] SWIG and Cython
In-Reply-To:
References: <96de71860805301247t5c6e53e8x953734e26c1340de@mail.gmail.com>
Message-ID:

On May 30, 2008, at 3:20 PM, Lisandro Dalcin wrote:
> On 5/30/08, rahul garg wrote:
>> I am a little curious about ctypes.
Have you looked at ctypes and >> what >> advantages/disadvantages do you think it has over SWIG and Cython for >> wrapping the libraries you are interested in? Of course ctypes >> doesnt work >> for C++ but any other specific problems? > > Well, a C library API is not just functions and data exported by the > linker. It is also #define'd macros, typedef'd stuff, custom C structs > with many slots. If you want to write a full featured wrapper to a > complex library, the ctypes way is going to be even more painfull than > SWIG or Cython. > > Suppose you want to access, in the MPI case, MPI_COMM_WORLD. Look at > the header files of major's MPI out there (MPICH2, OpenMPI, HP MPI, > IBM MPI): in all cases MPI_COMM_WORLD is a #defined thing, in some > case an integer, in other a pointer. And that applies to all MPI > predefined handles (operations, datatypes, etc,) and other stuff. You > just cannot access and manage a full C API in ctypes; at least, I do > not know how to handle that. > > For example, the new Cython-based mpi4py can be build against Python > version ranging from 2.3 to 3.0 (yes, it is Py3K ready!), and with all > major MPI's out there, either if they are MPI-1 or MPI-2 > implementation, in Linux, Mac OS X, and Windows. I'll never buy the > ctypes way until someone can show me that this degree of portability > is possible. Even if someone can do this with ctypes, I guess such a > code would not be easy to understand and maintain. > > For the case of petsc4py, the same comments apply. PETSc C API do have > macros. Furthermore, it's API changes from release to release (PETSc > developers does not bother from backward compatibility issues). But > petsc4py, either the former SWIG-based or the newer Cython-based, can > be used with petsc-2.3.2, petsc-2.3.3, or the latest development copy. > All this with a layer of compatibility functions and macros. How in > the hell a serious developer is going to handle all that with ctypes?? 
> > Please note that I do not have nothing against ctypes. It is a > wonderful and very good tool, but you cannot use it in all scenarios. > However, if the libraries are all C functions, and they do not use > macros in any way, then ctypes is still a good option, no compilation > needed, pure python code. Of course, be prepared to get a segfault > from time to time, not because of a ctypes bug, but because of your > bugs. Very good summary. Also, I would like to note that if you're actually using (rather than just wrapping) a C library then ctypes will be vastly slower. - Robert From robertwb at math.washington.edu Mon Jun 2 21:22:27 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 2 Jun 2008 12:22:27 -0700 Subject: [Cython] Cython tutorial at Scipy 2008? In-Reply-To: References: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> Message-ID: <3C70C136-EBE6-432E-80BF-FCEA66DBDEFD@math.washington.edu> On Jun 1, 2008, at 5:46 PM, Fernando Perez wrote: > On Sun, Jun 1, 2008 at 1:23 PM, William Stein > wrote: >> On Sun, Jun 1, 2008 at 1:18 PM, Fernando Perez >> wrote: >>> Hi folks, >>> >>> I'm in charge of co-organizing (with Travis Oliphant) the tutorial >>> sessions at Scipy 2008, to be held in August of this year at >>> Caltech: >>> >>> http://www.scipy.org/SciPy2008 >>> http://www.scipy.org/SciPy2008/Tutorials >>> >>> Any chance someone from the Cython gang might come and be willing to >>> give a cython tutorial? We don't have too many details sorted out >>> yet, so we're just feeling things out... >>> >> >> Robert Bradshaw and I will likely both be in nearby San Diego then, >> in which case one or both of us could potentially come. > > Great. I had also listed a Sage tutorial as a potential topic, so as > long as some of you are around, perhaps you could pitch in for that > too? Right now we're just getting feelers for both who could teach > what, and what people want to see. 
It's a bit of a matchmaking game, > but there's no point in offering something if we don't have anyone to > teach it. > > Note that in addition to the 2 days of tutorials and 2 days of > conference, the weekend is reserved for dev sprints. It would be very > nice to have some joint work on sage/cython/numpy/scipy take place. > Many of the usual suspects will be there. Yes, I'll be around and would be willing to help out with something like that. - Robert From robertwb at math.washington.edu Mon Jun 2 21:43:52 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 2 Jun 2008 12:43:52 -0700 Subject: [Cython] Py3K: recent rename PyString -> PyBytes In-Reply-To: <48418AFE.2080303@behnel.de> References: <483B26B0.6060409@behnel.de> <483E536F.2080005@behnel.de> <483ECBE3.6010200@behnel.de> <483F0D0F.2080002@behnel.de> <483FE3D8.7020203@behnel.de> <1D4FE830-2596-4050-8FE4-4F5753893A57@math.washington.edu> <48418AFE.2080303@behnel.de> Message-ID: <322897BC-78CB-484D-BF55-EE3EB15AFBB6@math.washington.edu> On May 31, 2008, at 10:29 AM, Stefan Behnel wrote: > Hi, > > Robert Bradshaw wrote: >> On May 30, 2008, at 4:24 AM, Stefan Behnel wrote: >>> Also, code like >>> >>> code.putln('"%s",' % name) >>> >>> may not work in all cases. If we can't be sure it's a pure ASCII >>> identifier >>> (i.e. it was parsed as an identifier by the current Cython parser), >>> the name >>> has to be encoded as UTF-8 first and escaped using >>> Utils.escape_byte_string(). >> >> On this note, what happens when a cdef variable (like above) has non- >> ASCII characters in it? > > Not sure what you mean here. > > Do you mean in its name? That can't currently happen as the scanner > only > allows pure ASCII alphanumeric identifiers. Yep, that's what I meant (and, in fact, that's what the code above is using). > I don't think we should support > PEP 3131 in Cython, the mapping to C identifiers would become too > obscure. 
> > http://www.python.org/dev/peps/pep-3131/ Anything we don't support is a bug. > Keyword arguments are a different thing, but allowing non-ASCII > keywords in > function signatures will require us to write our own > ParseTupleAndKeywords() > to keep up compatibility with Py2, so I don't find that worth > supporting > either (for now). Although having our own PTAK() where you could > simply pass > Python strings as acceptable keyword list would be helpful in > general... We wouldn't have to back-port this to Py2, it would be an error in this case (maybe at C compile time there would an error raised if non- ascii identifiers are used). - Robert From robertwb at math.washington.edu Mon Jun 2 21:46:22 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 2 Jun 2008 12:46:22 -0700 Subject: [Cython] Problems with example In-Reply-To: References: Message-ID: <841A23EA-3C4D-4D96-989A-773D855D038F@math.washington.edu> On May 31, 2008, at 8:43 AM, Tommy Grav wrote: > I found the solution > > gcc -dynamiclib -undefined dynamic_lookup -single_module -o c1.so c1.o > > Note that python will not find libraries called *.dylib in your > work directory, > while *.so works fine. Does any know why? Glad you were able to figure out a solution. No idea why .so files are required--this sounds like a Python issue so perhaps someone there would know. > > Cheers > Tommy > > On May 30, 2008, at 9:22 PM, Tommy Grav wrote: > >> Hi everyone. >> >> I am on a Powerbook running 10.5.3 and trying to go through the >> example on >> http://www.perrygeo.net/wordpress/?p=116 , which fails. Can anyone >> tell me what I am doing wrong? 
>>
>> I create the c1.pxd
>> and run
>>
>> cython c1.pyx
>> gcc -c -fPIC -I/usr/include/python2.5 cl.c
>> gcc -dynamiclib -o c1.dylib -dylib c1.o
>> Undefined symbols:
>> "_PyDict_New", referenced from:
>> ___Pyx_Import in c1.o
>>
>> ld: symbol(s) not found
>> collect2: ld returned 1 exit status
>>
>> _______________________________________________
>> Cython-dev mailing list
>> Cython-dev at codespeak.net
>> http://codespeak.net/mailman/listinfo/cython-dev
>
> _______________________________________________
> Cython-dev mailing list
> Cython-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/cython-dev

From king at mathematik.uni-jena.de Mon Jun 2 21:51:31 2008
From: king at mathematik.uni-jena.de (Simon King)
Date: Mon, 2 Jun 2008 21:51:31 +0200 (CEST)
Subject: [Cython] Problem with cdef long
Message-ID:

Dear Cython team,

hopefully the following really is a Cython question, not a Sage question.

Write a file Problem.pyx:

    ctypedef struct Term_t:
        long coef

    cdef class Term:
        cdef Term_t Data
        def __init__(self, c):
            self.Data.coef = c
        def coefficient(self):
            return self.Data.coef


Start Sage and do
    sage: attach Problem.pyx
    sage: T=Term(3)
    sage: type(T.coefficient())

Then the result is
    <type 'int'>
and *not* <type 'long'> !

Why is it of type 'int' although coef is defined 'long' in Term_t?
How can I avoid this automatic down-grading of coef?

I really want coef to be of type 'long', since 'int' isn't good enough in
my application.

Yours
Simon

From fperez.net at gmail.com Mon Jun 2 21:51:02 2008
From: fperez.net at gmail.com (Fernando Perez)
Date: Mon, 2 Jun 2008 12:51:02 -0700
Subject: [Cython] Cython tutorial at Scipy 2008?
In-Reply-To: <3C70C136-EBE6-432E-80BF-FCEA66DBDEFD@math.washington.edu>
References: <85e81ba30806011323h3c7bac75ua046616e5f6d54a8@mail.gmail.com> <3C70C136-EBE6-432E-80BF-FCEA66DBDEFD@math.washington.edu>
Message-ID:

On Mon, Jun 2, 2008 at 12:22 PM, Robert Bradshaw wrote:
> Yes, I'll be around and would be willing to help out with something
> like that.
Great, I've put you guys in as potential presenters. I'll let you know
once I have more details.

Cheers,

f

From robertwb at math.washington.edu Mon Jun 2 22:00:32 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 2 Jun 2008 13:00:32 -0700
Subject: [Cython] Problem with cdef long
In-Reply-To:
References:
Message-ID: <7DB5F9C2-332B-4C3D-ACC6-A18B8576772D@math.washington.edu>

On Jun 2, 2008, at 12:51 PM, Simon King wrote:
> Dear Cython team,
>
> hopefully the following really is a Cython question, not a Sage
> question.
>
> Write a file Problem.pyx:
>
>     ctypedef struct Term_t:
>         long coef
>
>     cdef class Term:
>         cdef Term_t Data
>         def __init__(self, c):
>             self.Data.coef = c
>         def coefficient(self):
>             return self.Data.coef
>
>
> Start Sage and do
>     sage: attach Problem.pyx
>     sage: T=Term(3)
>     sage: type(T.coefficient())
>
> Then the result is
>     <type 'int'>
> and *not* <type 'long'> !
>
> Why is it of type 'int' although coef is defined 'long' in Term_t?

I think your confusion here is over the difference between C
ints/longs/etc. and Python ints/longs. The Python "int" type is a Python
object that wraps a C long. The Python "long" type is an
arbitrary-precision integer. Your Term.coefficient function returns a
Python object, so it takes self.Data.coef (which is a C long) and wraps
it in a Python object (of type "int").

> How can I avoid this automatic down-grading of coef?
>
> I really want coef to be of type 'long', since 'int' isn't good
> enough in
> my application.

When you say "int" isn't good enough, do you mean you need arbitrary
precision? Because there isn't a (simple) C type that will give you that
(you would have to use mpz_t or something like that).
- Robert From stefan_ml at behnel.de Tue Jun 3 13:48:15 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 3 Jun 2008 13:48:15 +0200 (CEST) Subject: [Cython] Next release Message-ID: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Hi, the Python betas have been postponed by a week, to June 11: http://permalink.gmane.org/gmane.comp.python.python-3000.devel/13681 That's close to dev1, however, once "cython-release" is updated and released, we can always apply selected changes to it to keep releasing micro versions from there if we notice any incompatible changes in Py3/2.6. So we don't really have to wait for the final beta releases. I wouldn't mind a release next monday. BTW, what's the next version anyway? Has anyone taken a look at the latest Pyrex changes? Is there anything left that's worth merging in so that we can release a true 0.9.8? Stefan From stefan_ml at behnel.de Mon Jun 2 21:13:47 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Jun 2008 21:13:47 +0200 Subject: [Cython] about let users have more control on garbage collection In-Reply-To: References: <48418CC9.2030103@behnel.de> <484291A3.5090909@behnel.de> Message-ID: <4844466B.8030406@behnel.de> Hi, Lisandro Dalcin wrote: > On 6/1/08, Stefan Behnel wrote: >> Hi, >> Can't you work with singleton proxies? That's what we do in lxml. There's >> never more than one Python Element proxy for an XML node struct. > > I'll take a look, but I believe such approach, even if possible, would > complicate a lot my implementation. Not sure. A factory function for creating proxies is a good idea in general, and once that's in place, you can do all sorts of weird things in there. > Stefan, iff it is no much work for you, could you point me a link to > the actual lxml code implementing all this machinery? Ah, asking about the deep magic, right? 
;) Here's the _elementFactory() function: http://codespeak.net/svn/lxml/trunk/src/lxml/lxml.etree.pyx and a bit more of the proxy machinery is in here: http://codespeak.net/svn/lxml/trunk/src/lxml/proxy.pxi But that truly is black magic... Stefan From robertwb at math.washington.edu Tue Jun 3 23:13:53 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 3 Jun 2008 14:13:53 -0700 Subject: [Cython] Next release In-Reply-To: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> On Jun 3, 2008, at 4:48 AM, Stefan Behnel wrote: > Hi, > > the Python betas have been postponed by a week, to June 11: > > http://permalink.gmane.org/gmane.comp.python.python-3000.devel/13681 :-( Oh well... > That's close to dev1, however, once "cython-release" is updated and > released, we can always apply selected changes to it to keep releasing > micro versions from there if we notice any incompatible changes in > Py3/2.6. So we don't really have to wait for the final beta releases. Sure. > I wouldn't mind a release next monday. Sounds like a good target to me. > BTW, what's the next version anyway? Has anyone taken a look at the > latest > Pyrex changes? Is there anything left that's worth merging in so > that we > can release a true 0.9.8? I looked at all the Pyrex changes and incorporated everything we didn't already have except for the GIL and dependancy tracking stuff. Someone who actually deals a lot with multi-threaded code should probably comment on the GIL stuff, and the dependancy tracking stuff looks mostly good though I'd like to know what you think before blindly merging it over. 
- Robert From stefan_ml at behnel.de Tue Jun 3 23:06:46 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 03 Jun 2008 23:06:46 +0200 Subject: [Cython] Py3K: recent rename PyString -> PyBytes In-Reply-To: <48418AFE.2080303@behnel.de> References: <483B26B0.6060409@behnel.de> <483E536F.2080005@behnel.de> <483ECBE3.6010200@behnel.de> <483F0D0F.2080002@behnel.de> <483FE3D8.7020203@behnel.de> <1D4FE830-2596-4050-8FE4-4F5753893A57@math.washington.edu> <48418AFE.2080303@behnel.de> Message-ID: <4845B266.4040905@behnel.de> Hi, Robert Bradshaw wrote: > Stefan Behnel wrote: >>> On this note, what happens when a cdef variable (like above) has non- >>> ASCII characters in it? >> >> Do you mean in its name? That can't currently happen as the scanner only >> allows pure ASCII alphanumeric identifiers. > > Yep, that's what I meant (and, in fact, that's what the code above is > using). I know, I just wanted to mention it to be sure you're aware of it. If you say not supporting PEP 3131 is a bug, then even this may stop working one day. I was actually considering opening C source output files with a plain ASCII codec, just to make sure we get an error if we accidentally output any non-ASCII characters into the C source. I couldn't do that yet, one reason being that we currently write names as unicode strings but string literals as encoded byte strings. That will have to be fixed when migrating Cython itself to Py3, which is strict about the type of output streams (text vs. byte streams). I consider encoding at the very end to be the right thing to do, even more so the more unicode support we enable in various places. Maybe the CCodeWriter should get dedicated methods for writing byte string literals, unicode string literals and literal identifier names, so that we end up with a single place to handle output encodings. 
You'd then pass in the source code input encoding on creation and it would just do the right thing, depending on the method through which the code content came in. Or, instead of doing the string formatting outside of the call, we could pass all values in as "*args" and do the formatting in the code writer, where we could then convert EncodedString instances amongst the arguments, for example. I'm not sure yet what's the right way of handling this... >> Keyword arguments are a different thing, but allowing non-ASCII keywords in >> function signatures will require us to write our own ParseTupleAndKeywords() >> to keep up compatibility with Py2 > > We wouldn't have to back-port this to Py2, it would be an error in > this case (maybe at C compile time there would an error raised if non- > ascii identifiers are used). That's a good point. In Py2, keyword arguments *must* be byte strings. Although ASCII is not enforced, you can only pass non-ASCII keywords using "**dict" and accept them using "**kwargs", so it would be ok IMHO if we just generated an "#if Py2 #error" directive when we find non-ASCII keywords in the signature. Stefan From bblais at gmail.com Wed Jun 4 15:41:00 2008 From: bblais at gmail.com (Brian Blais) Date: Wed, 4 Jun 2008 09:41:00 -0400 Subject: [Cython] accessing an array of ints Message-ID: <814FE4F9-2AEB-4FCA-A020-178062EAE2AD@gmail.com> Hello, If I have an array of ints, as in numpy: a=numpy.zeros(5,int) and I call a cython function like: myfun(a) and in the cython I have: cpdef myfun(c_numpy.ndarray A): # stuff here how can I access the data pointer? I tried: cdef int *ap= A.data but that doesn't seem to work (it works for double *, but not int *). I must be missing something really simple. 
thanks, Brian Blais From languitar at semipol.de Wed Jun 4 16:57:53 2008 From: languitar at semipol.de (Johannes Wienke) Date: Wed, 04 Jun 2008 16:57:53 +0200 Subject: [Cython] char* and NULL in log statements Message-ID: <4846AD71.50407@semipol.de> Hi, I've just noticed that using NULL values in char* is a bad idea when converting them to a python string. This results in a segfault: cdef void doSomething(char *string): print string def doIt(): cdef char* string = NULL doSomething(NULL) *Wouldn't it be a good idea to automatically convert them to None?* For example I have a lot of debug messages that simply indicate which function is currently called and with which arguments. At the moment this is always 4 extra lines to check for the NULL value, which could even be semantically correct for the rest of that function: cdef void doSomething(char *string): if string == NULL: printString = None else: printString = string print printString # now I want to use NULL... def doIt(): cdef char* string = NULL doSomething(NULL) Even if I use a custom function for this (which isn't easy to manage in formatted strings), this will be or is a common cause for bugs. I don't want to know the times that I have forgotten to check this... - Johannes -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080604/9a939503/attachment.pgp From dagss at student.matnat.uio.no Wed Jun 4 17:48:51 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 04 Jun 2008 17:48:51 +0200 Subject: [Cython] accessing an array of ints In-Reply-To: <814FE4F9-2AEB-4FCA-A020-178062EAE2AD@gmail.com> References: <814FE4F9-2AEB-4FCA-A020-178062EAE2AD@gmail.com> Message-ID: <4846B963.3000301@student.matnat.uio.no> Brian Blais wrote: > Hello, > > If I have an array of ints, as in numpy: > > a=numpy.zeros(5,int) > > and I call a cython function like: > > myfun(a) > > and in the cython I have: > > cpdef myfun(c_numpy.ndarray A): > > # stuff here > > > how can I access the data pointer? I tried: > > cdef int *ap= A.data > > but that doesn't seem to work (it works for double *, but not int > *). I must be missing something really simple. > 1) Be aware that the "int" is really two different types (the latter one being a C "int", the former an identifier for numpy). Better use "numpy.int32" for numpy.zeros and a datatype that you know is 32 bit int in the cdef. I think that what you have written will fail on 64- bit machines but I'm not sure (could imagine that NumPy would choose int64 as the default "int", while the C compiler would stick with 32 bit *shrug*). There are some npy_int32 etc. typedefs/defines in the NumPy header files that are better used for C types rather than plain "int". 2) It will still only work if you have exactly the array created by zero above. I.e., if you instead pass something like numpy.zeros(10, int)[::-2] for instance, then the buffer isn't modified -- instead, something called "strides" is set in the struct which tells you to iterator "-2" for every step along the 1st dimension. Same goes for numpy.zeros((5,5,5), int)[1,:,2] and so on. 
So

a) Either make absolutely sure you have a C-contiguous array or

b) Assign ap as you do, but take the strides into account when indexing. NumPy strides are byte offsets, so apply them to the char* data pointer: (<int*>(A.data + i * A.strides[0]))[0] rather than ap[i]. ((<int*>(A.data + i*A.strides[0] + j*A.strides[1]))[0] will give you 2D access and so on)

Does this answer your question?

Dag Sverre

From stefan_ml at behnel.de Wed Jun 4 19:00:10 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 04 Jun 2008 19:00:10 +0200
Subject: [Cython] remaining Pyrex changes (was: Next release)
In-Reply-To: <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu>
References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu>
Message-ID: <4846CA1A.2060706@behnel.de>

Hi,

Robert Bradshaw wrote:
> I looked at all the Pyrex changes and incorporated everything we
> didn't already have

Nice! I noticed you added a couple of things, but I lost track of what was in and what wasn't. Glad to hear most things are there already.

> except for the GIL and dependency tracking stuff.
> Someone who actually deals a lot with multi-threaded code should
> probably comment on the GIL stuff,

I'll take a look at it. At first glance, it seems to be mostly bug fixes and a couple of simplifications, so I don't expect big surprises.

> and the dependency tracking stuff
> looks mostly good though I'd like to know what you think before
> blindly merging it over.

That looks just fine to me. I did a mostly manual merge and it seems to work for me.

I also merged (or rather reimplemented) the package resolution stuff that Pyrex does when searching for include/cimport files. This implies that when you cimport a .pxd from a package directory such as "package/module.pxd", there must be a "package/__init__.py[x]" in the search path. Cython didn't care about that before, but I think it makes sense to restrict package directory imports to the way Python would do them.
That said, both imports from package directories and those from qualified dotted file names ("package.module.pxd") should work (lxml uses the latter). Robert, I don't know what naming convention you use in Sage, so could you test that the changes didn't break it? Stefan From stefan_ml at behnel.de Wed Jun 4 19:47:08 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 04 Jun 2008 19:47:08 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846AD71.50407@semipol.de> References: <4846AD71.50407@semipol.de> Message-ID: <4846D51C.2030607@behnel.de> Hi, Johannes Wienke wrote: > I've just noticed that using NULL values in char* is a bad idea when > converting them to a python string. Every pointer can potentially be NULL. If Cython added NULL checks to everything that might be NULL, you'd get pretty inefficient code with loads of expensive conditionals. You should take care to write robust code yourself. > *Wouldn't it be a good idea to automatically convert them to None?* What would be gained from having to check for None instead of having to check for NULL? Checking for NULL, referencing None and then checking for None is definitely more expensive than a straight check for NULL in your code. If you *really* want None, then you can use something like this: cdef inline stringOrNone(char* value): if value is NULL: return None return value and wrap all your char*->byte string conversions explicitly in a call to this function. Remember, explicit is better than implicit. 
Stefan From languitar at semipol.de Wed Jun 4 20:33:56 2008 From: languitar at semipol.de (Johannes Wienke) Date: Wed, 04 Jun 2008 20:33:56 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846D51C.2030607@behnel.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> Message-ID: <4846E014.5080300@semipol.de> Am 06/04/2008 07:47 PM schrieb Stefan Behnel: > Johannes Wienke wrote: >> I've just noticed that using NULL values in char* is a bad idea when >> converting them to a python string. > > Every pointer can potentially be NULL. If Cython added NULL checks to > everything that might be NULL, you'd get pretty inefficient code with loads of > expensive conditionals. You should take care to write robust code yourself. To my mind only char pointers would need this extra behavior as they have a somewhat special role in C because of the absence a string type. >> *Wouldn't it be a good idea to automatically convert them to None?* > > What would be gained from having to check for None instead of having to check > for NULL? Checking for NULL, referencing None and then checking for None is > definitely more expensive than a straight check for NULL in your code. None is automatically converted to Python strings, so you can use it in every string or print statement without troubles. This conversion should also only happen if the char* needs to be converted to a python string. Than it is in your own control to do this. In all cases where the string is not NULL this would only cost one little comparison with NULL. Moreover I think it's more desirable to have code that is less error prone (a program dying because of segfault is a serious bug) than the tiny speed up by wasting _one_ comparison with NULL. As someone else on this list said before: cython is all about writing C without writing C... And if I am not writing C, or at least try to minimize the part, I don't want to waste my time with ugly C memory management stuff. 
> If you *really* want None, then you can use something like this: > > cdef inline stringOrNone(char* value): > if value is NULL: return None > return value That's exactly what I'm doing now but that's error-prone as you have to do this manually and can forget it. - Johannes -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080604/be1436ce/attachment-0001.pgp From martin at martincmartin.com Wed Jun 4 21:38:22 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Wed, 04 Jun 2008 15:38:22 -0400 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846D51C.2030607@behnel.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> Message-ID: <4846EF2E.1000605@martincmartin.com> Stefan Behnel wrote: > Hi, > > Johannes Wienke wrote: >> I've just noticed that using NULL values in char* is a bad idea when >> converting them to a python string. > > Every pointer can potentially be NULL. If Cython added NULL checks to > everything that might be NULL, you'd get pretty inefficient code with loads of > expensive conditionals. You should take care to write robust code yourself. Actually, one of the lessons of JVM optimizations is that NULL checks are only a single instruction when they're not non-null. You can tell the compiler to predict that the "non-null" will be taken, and when it is, there's no branch penalty. However, I agree that it's against the spirit of C/C++, and hence Cython, to automatically check all pointers for null. >> *Wouldn't it be a good idea to automatically convert them to None?* > > What would be gained from having to check for None instead of having to check > for NULL? Checking for NULL, referencing None and then checking for None is > definitely more expensive than a straight check for NULL in your code. 
> > If you *really* want None, then you can use something like this: > > cdef inline stringOrNone(char* value): > if value is NULL: return None > return value > > and wrap all your char*->byte string conversions explicitly in a call to this > function. Remember, explicit is better than implicit. > > Stefan > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From dalcinl at gmail.com Wed Jun 4 21:45:52 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 4 Jun 2008 16:45:52 -0300 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846E014.5080300@semipol.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> Message-ID: Johannes, I think you are right. The char* -> str conversion should return None if the pointer is NULL. This is even consistent with the form Py_BuildValue() works in Python C API. As a side note, the mapping of NULL pointers to None is what SWIG does in almost all cases. I would not follow that approach in Cython for all cases, but the char* -> str is special enough. Stefan, are you completelly sure that the performance implications of checking for NULL pointers is this case are noticeable enough as to do not follow the safe path? IMHO, I would avoid the chances for segfaults. On 6/4/08, Johannes Wienke wrote: > Am 06/04/2008 07:47 PM schrieb Stefan Behnel: > > > Johannes Wienke wrote: > >> I've just noticed that using NULL values in char* is a bad idea when > >> converting them to a python string. > > > > Every pointer can potentially be NULL. If Cython added NULL checks to > > everything that might be NULL, you'd get pretty inefficient code with loads of > > expensive conditionals. You should take care to write robust code yourself. 
> > > To my mind only char pointers would need this extra behavior as they > have a somewhat special role in C because of the absence a string type. > > > >> *Wouldn't it be a good idea to automatically convert them to None?* > > > > What would be gained from having to check for None instead of having to check > > for NULL? Checking for NULL, referencing None and then checking for None is > > definitely more expensive than a straight check for NULL in your code. > > > None is automatically converted to Python strings, so you can use it in > every string or print statement without troubles. This conversion should > also only happen if the char* needs to be converted to a python string. > Than it is in your own control to do this. In all cases where the string > is not NULL this would only cost one little comparison with NULL. > > Moreover I think it's more desirable to have code that is less error > prone (a program dying because of segfault is a serious bug) than the > tiny speed up by wasting _one_ comparison with NULL. As someone else on > this list said before: cython is all about writing C without writing > C... And if I am not writing C, or at least try to minimize the part, I > don't want to waste my time with ugly C memory management stuff. > > > > If you *really* want None, then you can use something like this: > > > > cdef inline stringOrNone(char* value): > > if value is NULL: return None > > return value > > > That's exactly what I'm doing now but that's error-prone as you have to > do this manually and can forget it. 
> > > - Johannes > > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From stefan_ml at behnel.de Wed Jun 4 21:54:12 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 04 Jun 2008 21:54:12 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846E014.5080300@semipol.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> Message-ID: <4846F2E4.302@behnel.de> Hi, Johannes Wienke wrote: > Am 06/04/2008 07:47 PM schrieb Stefan Behnel: > > To my mind only char pointers would need this extra behavior as they > have a somewhat special role in C because of the absence a string type. Then why None and not ''? And why None and not a ValueError? > None is automatically converted to Python strings, so you can use it in > every string or print statement without troubles. Not on my side of the cable: Python 2.5.1 (r251:54863, Mar 7 2008, 04:10:12) [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> "abc" + None Traceback (most recent call last): File "", line 1, in TypeError: cannot concatenate 'str' and 'NoneType' objects And the following will still crash, even under your proposal: cdef char* s = NULL py_s = s print PyString_GET_SIZE(py_s) And would you also want this to work: cdef char* s = None > don't want to waste my time with ugly C memory management stuff. This is not about memory management at all. This is about making sure your code handles corner cases correctly. 
Where does the NULL pointer come from anyway? Is it maybe an error return of a function that you didn't catch? You wouldn't want Cython to find your off-by-one errors also, would you? >> If you *really* want None, then you can use something like this: >> >> cdef inline stringOrNone(char* value): >> if value is NULL: return None >> return value > > That's exactly what I'm doing now but that's error-prone as you have to > do this manually and can forget it. Why? Isn't it good practice to guard your code against anticipated invalid data? Don't you ask yourself from time to time when you write Python code: "can this variable be None?" So why not ask "can this pointer be NULL" in Cython? I think it's an advantage that you get a straight crash when your code contains the obvious bug that you forgot to test a NULL pointer for a NULL value, instead of an automatic coercing to None, which might propagate through your application and induce hard to track down bugs elsewhere in your program. Usually, C value handling happens within a limited scope (e.g. a function), while Python values tend to cross API boundaries into user code. Stefan From dalcinl at gmail.com Wed Jun 4 22:08:42 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 4 Jun 2008 17:08:42 -0300 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846F2E4.302@behnel.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> Message-ID: On 6/4/08, Stefan Behnel wrote: > Then why None and not ''? And why None and not a ValueError? Well, I would not object ValueError. That would be far better than a segfault. > > None is automatically converted to Python strings, so you can use it in > > every string or print statement without troubles. 
> > Not on my side of the cable: > TypeError: cannot concatenate 'str' and 'NoneType' objects I believe Johannes was actually talking about str(None) -> "None" > And the following will still crash, even under your proposal: > > cdef char* s = NULL > py_s = s > print PyString_GET_SIZE(py_s) So now I definitely believe that the 'py_s = s' line should generate a ValueError. > And would you also want this to work: > > cdef char* s = None I do not believe Johannes asked for this. That should never be valid, moreover, that should by a Cython compile-time syntax error (I believe it is, right?) > > don't want to waste my time with ugly C memory management stuff. > This is not about memory management at all. This is about making sure your > code handles corner cases correctly. Where does the NULL pointer come from > anyway? Is it maybe an error return of a function that you didn't catch? You > wouldn't want Cython to find your off-by-one errors also, would you? So the correct way of handling it is, again, raise ValueError. A segfault should never be an option. > > I think it's an advantage that you get a straight crash when your code > contains the obvious bug that you forgot to test a NULL pointer for a NULL I really do not see the advantage of a crash. Python has exceptions!! In short, would it make sense to generate ValueError for the char* -> str conversion? Honestly, the performance issues are going to be really small. 
--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From languitar at semipol.de Wed Jun 4 22:38:16 2008
From: languitar at semipol.de (Johannes Wienke)
Date: Wed, 04 Jun 2008 22:38:16 +0200
Subject: [Cython] char* and NULL in log statements
In-Reply-To: <4846F2E4.302@behnel.de>
References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de>
Message-ID: <4846FD38.3080106@semipol.de>

Hi,

Am 06/04/2008 09:54 PM schrieb Stefan Behnel:
> Johannes Wienke wrote:
>> Am 06/04/2008 07:47 PM schrieb Stefan Behnel:
>>
>> To my mind only char pointers would need this extra behavior as they
>> have a somewhat special role in C because of the absence a string type.
>
> Then why None and not ''? And why None and not a ValueError?

Because None has for python the same meaning as NULL in C. ValueError would be another possibility. Nevertheless if NULL can be a legal value for the rest of that function, this would be as awkward to handle as the explicit check for NULL if I only want a safe print statement.

>> None is automatically converted to Python strings, so you can use it in
>> every string or print statement without troubles.
>
> Not on my side of the cable:
>
> Python 2.5.1 (r251:54863, Mar 7 2008, 04:10:12)
> [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> "abc" + None
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: cannot concatenate 'str' and 'NoneType' objects

Maybe I wasn't precise enough. None can be displayed using str() and can therefore be used without worries in format strings.
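
Both observations are easy to reproduce side by side. A short sketch in plain Python, run as Python 3 here (an assumption; the session quoted in the thread is Python 2.5): formatting None with %s goes through str() and works, while concatenating None to a string raises TypeError.

```python
value = None  # stands in for what a NULL char* would become under the proposal

# %-formatting calls str() on its argument, so None is accepted.
formatted = "argument: '%s'" % value
print(formatted)

# Concatenation performs no implicit conversion and fails.
try:
    "abc" + value
    concatenation_failed = False
except TypeError as exc:
    concatenation_failed = True
    print("TypeError:", exc)
```

So a None result is safe in log-style format strings, but not in every string operation.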
Moreover a python exception is a _much_ better error indicator than a segfault without any stack trace. > And the following will still crash, even under your proposal: > > cdef char* s = NULL > py_s = s > print PyString_GET_SIZE(py_s) I've never used the python API, so I have no clue how it is done there. > And would you also want this to work: > > cdef char* s = None For my purpose that's not necessary but maybe someone else will need this... Well, I don't know if that would be a good idea. Looks strange to give an explicit C variable an explicit python value. But that's only what I think at first sight. >> don't want to waste my time with ugly C memory management stuff. > > This is not about memory management at all. This is about making sure your > code handles corner cases correctly. Where does the NULL pointer come from > anyway? Is it maybe an error return of a function that you didn't catch? You > wouldn't want Cython to find your off-by-one errors also, would you? If I access memory that is not memory I own or that points to somewhere undefined this is a kind of memory management, to my mind. Maybe that's a strong personal bias as my preferred language is Java. But in pure Python there are also rarely any cases where you have to worry about directly working on the memory. Of course I want to check for NULL values and I'm a fan of defensive programming but life could be much easier if another source for segfaults is removed. And that's absolutely in the spirit of defensive programming (have a look at the chapter in Code Complete). I could understand that this is a bad idea to implement if there are any other reasons than speed. Is there any semantic problem that could arise for current code? I can't think of anything. 
>>> If you *really* want None, then you can use something like this: >>> >>> cdef inline stringOrNone(char* value): >>> if value is NULL: return None >>> return value >> That's exactly what I'm doing now but that's error-prone as you have to >> do this manually and can forget it. > > Why? Isn't it good practice to guard your code against anticipated invalid > data? Don't you ask yourself from time to time when you write Python code: > "can this variable be None?" So why not ask "can this pointer be NULL" in Cython? Of course this good practice and I totally agree that you have to guard code from potential error values. But this is some extra work you have to do as the programmer and programmers make mistakes and forget things and so on... So it would be really defensive if the language helps programmers to avoid such bugs. Especially because they result in ugly segfaults. And as Cython is about annotating python with C features I would claim that the main thinking is done in python and there it wouldn't be such a problem to use None. For example this works like a charm in python and is a typical logger pattern: def foo(maybeNone): logger.debug("Calling foo, argument: '%s'." % maybeNone) # other stuff > I think it's an advantage that you get a straight crash when your code > contains the obvious bug that you forgot to test a NULL pointer for a NULL > value, instead of an automatic coercing to None, which might propagate through > your application and induce hard to track down bugs elsewhere in your program. > Usually, C value handling happens within a limited scope (e.g. a function), > while Python values tend to cross API boundaries into user code. But what is a straight crash useful for, if it gives you nearly no cue where the error occurred? A python stack trace is so much more useful, even if it is much longer. And maybe it's even more robust as the error may somehow be corrected a few levels above the actually corrupt function call. 
Just as another thought I can give you some background why I have to constantly log method calls: For my project I provide an already existing API to old plugins written in C. So I have simply no control about how they call my API. For some of these plugins I even don't have the source code. The easiest way to track the control flow is to log method calls that print out the arguments the given to the function. - Johanne -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080604/83978cc9/attachment.pgp From languitar at semipol.de Wed Jun 4 22:45:12 2008 From: languitar at semipol.de (Johannes Wienke) Date: Wed, 04 Jun 2008 22:45:12 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> Message-ID: <4846FED8.3050303@semipol.de> Hi, Am 06/04/2008 10:08 PM schrieb Lisandro Dalcin: > In short, would it make sense to generate ValueError for the char* -> > str conversion? Honestly, the performance issues are going to be > really small. Just take a look at my reply to Stefan. In summary: ValueError to my mind is hard to use as I can do a lot of stuff with None in Python. I don't think that's an error case. In C, NULL is a special case exactly as None is in python. So why not use None? - Johannes -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080604/6204b919/attachment.pgp From dagss at student.matnat.uio.no Wed Jun 4 23:36:00 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 04 Jun 2008 23:36:00 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4846FD38.3080106@semipol.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> <4846FD38.3080106@semipol.de> Message-ID: <48470AC0.7040901@student.matnat.uio.no> Johannes Wienke wrote: > Hi, > > Am 06/04/2008 09:54 PM schrieb Stefan Behnel: >> Johannes Wienke wrote: >>> Am 06/04/2008 07:47 PM schrieb Stefan Behnel: >>> >>> To my mind only char pointers would need this extra behavior as they >>> have a somewhat special role in C because of the absence a string type. >> Then why None and not ''? And why None and not a ValueError? > > Because None has for python the same meaning as NULL in C. ValueError > would be another possibility. Nevertheless if NULL can be a legal value > for the rest of that function, this would be as awkward to handle as the > explicit check for NULL if I only want a safe print statement. I have to agree with Stefan (not for performance reasons, but for not bogging down the language with too much magic). In Cython you do have to care about whether you are dealing with a C pointer or a Python object. Blurring it in this specific case doesn't seem like a good idea, to me it seems to simply encourage bad habits. (For instance, what kind of behaviour are you assuming for non-ASCII values in that automatic char* to str conversion?) char* is usually used to call into legacy C code, if you need to print them I'd argue that in most cases you are converting to char* one step too early. But if you really need to print char* directly, print "%s ... 
%s" % (cb2str(a), cb2str(b))

is *a lot* better than % (a, b) because

a) it is explicit what's going on (conversion of a byte buffer to a string is *not* trivial and should not be transparent anyway)

b) you can change cb2str to use the charset you are assuming

c) your code can be made to work in Python 3

d) A function taking a char* may very well be taking unprintable characters, perhaps cb2str should really hex-encode the data instead

and so on.

Raising ValueError (or similar) as a generic feature for "any C pointer which coerces automatically to Python objects" (which currently is char* only?) seems like a good feature (though depending on a decision being reached on the char* coercion business then that might be deprecated anyway, and then it may not be worth the effort).

However, Cython cannot be Java because C is not Java. It will always be possible to do (0)[0] and segfault; that's part of the deal when writing C.

Cython can very well work safely; simply use Python strings!; avoid all pointers, etc. Java doesn't have a char* either. The Cython equivalent to java.lang.String is a Python str, not a char*!

--
Dag Sverre

From languitar at semipol.de Wed Jun 4 23:42:37 2008
From: languitar at semipol.de (Johannes Wienke)
Date: Wed, 04 Jun 2008 23:42:37 +0200
Subject: [Cython] char* and NULL in log statements
In-Reply-To: <48470AC0.7040901@student.matnat.uio.no>
References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> <4846FD38.3080106@semipol.de> <48470AC0.7040901@student.matnat.uio.no>
Message-ID: <48470C4D.5060605@semipol.de>

Hi,

Am 06/04/2008 11:36 PM schrieb Dag Sverre Seljebotn:
> Johannes Wienke wrote:
>> Hi,
>>
>> Am 06/04/2008 09:54 PM schrieb Stefan Behnel:
>>> Johannes Wienke wrote:
>>>> Am 06/04/2008 07:47 PM schrieb Stefan Behnel:
>>>>
>>>> To my mind only char pointers would need this extra behavior as they
>>>> have a somewhat special role in C because of the absence a string type.
>>> Then why None and not ''? And why None and not a ValueError? >> Because None has for python the same meaning as NULL in C. ValueError >> would be another possibility. Nevertheless if NULL can be a legal value >> for the rest of that function, this would be as awkward to handle as the >> explicit check for NULL if I only want a safe print statement. > > char* is usually used to call into legacy C code, if you need to print > them I'd argue that in most cases you are converting to char* one step > too early. But if you really need to print char* directly, Well but for my purposes the legacy code calls into cython. > print "%s ... %s" % (cb2str(a), cb2str(b)) What is that function? I have never seen that before? Conversion problems are of course a much better reason not to implement this than speed reasons. [...] > However, Cython cannot be Java because C is not Java. It will always be > possible to do (0)[0] and segfault; that's part of the deal when > writing C. That's true. But that's a much more obvious bug. > Cython can very well work safely; simply use Python strings!; avoid all > pointers, etc. Java doesn't have a char* either. The Cython equivalent > to java.lang.String is a Python str, not a char*! But I can't, because the pugins I have to wrap work in C and there may be other situations where an existing C API requires callbacks and so on. - Johannes -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080604/d66957fb/attachment.pgp From dagss at student.matnat.uio.no Wed Jun 4 23:55:43 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 04 Jun 2008 23:55:43 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <48470C4D.5060605@semipol.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> <4846FD38.3080106@semipol.de> <48470AC0.7040901@student.matnat.uio.no> <48470C4D.5060605@semipol.de> Message-ID: <48470F5F.4010209@student.matnat.uio.no> Johannes Wienke wrote: > Hi, > > Am 06/04/2008 11:36 PM schrieb Dag Sverre Seljebotn: >> Johannes Wienke wrote: >>> Hi, >>> >>> Am 06/04/2008 09:54 PM schrieb Stefan Behnel: >>>> Johannes Wienke wrote: >>>>> Am 06/04/2008 07:47 PM schrieb Stefan Behnel: >>>>> >>>>> To my mind only char pointers would need this extra behavior as they >>>>> have a somewhat special role in C because of the absence a string >>>>> type. >>>> Then why None and not ''? And why None and not a ValueError? >>> Because None has for python the same meaning as NULL in C. ValueError >>> would be another possibility. Nevertheless if NULL can be a legal value >>> for the rest of that function, this would be as awkward to handle as the >>> explicit check for NULL if I only want a safe print statement. > > >> char* is usually used to call into legacy C code, if you need to print >> them I'd argue that in most cases you are converting to char* one step >> too early. But if you really need to print char* directly, > > Well but for my purposes the legacy code calls into cython. > >> print "%s ... %s" % (cb2str(a), cb2str(b)) > > What is that function? I have never seen that before? 
I just renamed what Stefan already provided you (I think you may have missed his point though, go reread his first post). Here, I'll reimplement it for you:

    cdef inline object cb2str(char* bytebuf):
        if bytebuf == NULL:
            return "None"
        else:
            return bytebuf

Then, for Python 3 compatibility and nice logging of any binary data you are passed, you can switch to

    cdef inline object cb2str(char* bytebuf):
        if bytebuf == NULL:
            return u"None"
        else:
            try:
                return unicode(bytebuf, "iso-8859-1")
            except:
                # have some code to return a hex-formatted string instead,
                # as the char* doesn't contain latin data...

And so on. The point is: Outputting a char* to a logfile is a potentially complex task with associated business logic. Stick it in a function.

--
Dag Sverre

From dagss at student.matnat.uio.no Wed Jun 4 23:59:42 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Wed, 04 Jun 2008 23:59:42 +0200
Subject: [Cython] char* and NULL in log statements
In-Reply-To: <4846E014.5080300@semipol.de>
References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de>
Message-ID: <4847104E.6010003@student.matnat.uio.no>

Johannes Wienke wrote:
> Am 06/04/2008 07:47 PM schrieb Stefan Behnel:
>> If you *really* want None, then you can use something like this:
>>
>> cdef inline stringOrNone(char* value):
>>     if value is NULL: return None
>>     return value
>
> That's exactly what I'm doing now but that's error-prone as you have to
> do this manually and can forget it.

(Sorry about telling you to reread, I can see that you commented on it already.)

Well, in my mind, this is a reason for supporting Stefan in removing auto-coercion of char* to Python strings altogether (that is suggested once down in those unicode discussion threads, right Stefan?). Then you would get a nice compiler error when you forget it, and it won't be error-prone.
-- Dag Sverre From languitar at semipol.de Thu Jun 5 00:02:53 2008 From: languitar at semipol.de (Johannes Wienke) Date: Thu, 05 Jun 2008 00:02:53 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4847104E.6010003@student.matnat.uio.no> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4847104E.6010003@student.matnat.uio.no> Message-ID: <4847110D.1090101@semipol.de> Am 06/04/2008 11:59 PM schrieb Dag Sverre Seljebotn: > Johannes Wienke wrote: >> Am 06/04/2008 07:47 PM schrieb Stefan Behnel: >>> If you *really* want None, then you can use something like this: >>> >>> cdef inline stringOrNone(char* value): >>> if value is NULL: return None >>> return value >> That's exactly what I'm doing now but that's error-prone as you have to >> do this manually and can forget it. > > (Sorry about telling you to reread, I can see that you commented on it > already.) > > Well, in my mind, this is a reason for supporting Stefan in removing > auto-coercion of char* to Python strings altogether (that is suggested > once down in those unicode discussion threads, right Stefan?). Then you > would get a nice compiler error when you forget it, and it won't be > error-prone. Now with the context of encoding issues I can see the point. Things you don't have to think of to often as Java developer. ;) Removing the conversion then would be a good idea. Compiler warnings are of course the best bug prevention. On the other hand this really handy... Hard to decide. - Johannes -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080605/6d167a1b/attachment.pgp

From dagss at student.matnat.uio.no Thu Jun 5 00:06:26 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Thu, 05 Jun 2008 00:06:26 +0200
Subject: [Cython] char* and NULL in log statements
In-Reply-To: <48470F5F.4010209@student.matnat.uio.no>
References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> <4846FD38.3080106@semipol.de> <48470AC0.7040901@student.matnat.uio.no> <48470C4D.5060605@semipol.de> <48470F5F.4010209@student.matnat.uio.no>
Message-ID: <484711E2.9000709@student.matnat.uio.no>

Dag Sverre Seljebotn wrote:
> Then, for Python 3 compatibility and nice logging of any binary data you
> are passed, you can switch to
>
>     cdef inline object cb2str(char* bytebuf):
>         if bytebuf == NULL:
>             return u"None"
>         else:
>             try:
>                 return unicode(bytebuf, "iso-8859-1")
>             except:
>                 # have some code to return a hex-formatted string instead,
>                 # as the char* doesn't contain latin data...
>
> And so on. The point is: Outputting a char* to a logfile is a
> potentially complex task with associated business logic. Stick it in a
> function.

To expand on that: For your purposes, outputting "None" is probably
wrong anyway. What happens if the client library passes a char*
containing the string "None"? How do you distinguish it?

I.e. something like this is probably better (cb means "charbuf" btw):

    cdef inline object cbrepr(char* bytebuf):
        if bytebuf == NULL:
            return u'None'
        else:
            try:
                return u'"%s"' % unicode(bytebuf, "iso-8859-1")
            except:
                return u'binary data:%s' % your_hex_dump_func(bytebuf)

(But now we're straying rather far from Cython and into general
programming...)
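
For readers who want to try the idea outside of Cython, here is a minimal plain-Python sketch of the cbrepr() approach above. It rests on a few assumptions that are not in the original message: None stands in for a NULL char*, the text encoding is taken to be UTF-8 (decoding with "iso-8859-1" accepts every byte value in Python, so the fallback branch would never be reached), and binascii.hexlify plays the role of the unspecified your_hex_dump_func.

```python
import binascii
from typing import Optional


def cbrepr(bytebuf: Optional[bytes]) -> str:
    """Return an unambiguous, log-friendly representation of a byte buffer."""
    if bytebuf is None:
        # None stands in for a NULL char*; it is printed without quotes,
        # so it cannot be confused with the literal text "None".
        return "None"
    try:
        # Assumption: textual data is UTF-8. Quoting the result keeps the
        # string "None" distinguishable from a missing buffer.
        return '"%s"' % bytebuf.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid text: fall back to a hex dump of the raw bytes.
        return "binary data:%s" % binascii.hexlify(bytebuf).decode("ascii")
```

With this, a missing buffer and a buffer holding the four characters N-o-n-e produce different log output, which is the distinction the message asks for.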
-- Dag Sverre From dagss at student.matnat.uio.no Thu Jun 5 00:10:38 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 05 Jun 2008 00:10:38 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <4847110D.1090101@semipol.de> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4847104E.6010003@student.matnat.uio.no> <4847110D.1090101@semipol.de> Message-ID: <484712DE.9080904@student.matnat.uio.no> Johannes Wienke wrote: > Am 06/04/2008 11:59 PM schrieb Dag Sverre Seljebotn: >> Johannes Wienke wrote: >>> Am 06/04/2008 07:47 PM schrieb Stefan Behnel: >>>> If you *really* want None, then you can use something like this: >>>> >>>> cdef inline stringOrNone(char* value): >>>> if value is NULL: return None >>>> return value >>> That's exactly what I'm doing now but that's error-prone as you have to >>> do this manually and can forget it. >> >> (Sorry about telling you to reread, I can see that you commented on it >> already.) >> >> Well, in my mind, this is a reason for supporting Stefan in removing >> auto-coercion of char* to Python strings altogether (that is suggested >> once down in those unicode discussion threads, right Stefan?). Then >> you would get a nice compiler error when you forget it, and it won't >> be error-prone. > > Now with the context of encoding issues I can see the point. Things you > don't have to think of to often as Java developer. ;) Removing the > conversion then would be a good idea. Compiler warnings are of course > the best bug prevention. On the other hand this really handy... Hard to > decide. To further expand on this point (for the purposes on the ongoing month-long argument about char* behaviour on this mailing list): If you were doing the same thing in Java (i.e. 
interfacing with a C library), I can tell you that you *would* have to
care about encoding issues :-) There's no way Java would have let you
create a string without specifying the encoding somewhere.

-- 
Dag Sverre

From dalcinl at gmail.com Thu Jun 5 01:36:38 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 4 Jun 2008 20:36:38 -0300
Subject: [Cython] char* and NULL in log statements
In-Reply-To: <4847104E.6010003@student.matnat.uio.no>
References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4847104E.6010003@student.matnat.uio.no>
Message-ID:

On 6/4/08, Dag Sverre Seljebotn wrote:
> Well, in my mind, this is a reason for supporting Stefan in removing
> auto-coercion of char* to Python strings altogether (that is suggested
> once down in those unicode discussion threads, right Stefan?).

Well, I believe that is the right approach. However, what would be the
way to generate a byte string from a char* pointer?

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From greg.ewing at canterbury.ac.nz Thu Jun 5 04:06:54 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 05 Jun 2008 14:06:54 +1200
Subject: [Cython] char* and NULL in log statements
In-Reply-To:
References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de>
Message-ID: <48474A3E.4050505@canterbury.ac.nz>

Lisandro Dalcin wrote:
> Stefan, are you completely sure that the performance implications of
> checking for NULL pointers in this case are noticeable enough to
> justify not following the safe path?

Also keep in mind that this is only going to be done when you're about
to construct a Python string, which is a fairly expensive operation.
-- Greg From stefan_ml at behnel.de Thu Jun 5 08:17:58 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 05 Jun 2008 08:17:58 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: <484711E2.9000709@student.matnat.uio.no> References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4846F2E4.302@behnel.de> <4846FD38.3080106@semipol.de> <48470AC0.7040901@student.matnat.uio.no> <48470C4D.5060605@semipol.de> <48470F5F.4010209@student.matnat.uio.no> <484711E2.9000709@student.matnat.uio.no> Message-ID: <48478516.6070107@behnel.de> Hi, Dag Sverre Seljebotn wrote: > For your purposes, outputting "None" is probably > wrong anyway. What happens if the client library passes a char* > containing the string "None"? How do you distinguish it? Yes, that's a valid point (and you're right that Johannes pretty much missed my point). It would be weird if this worked: cdef char* s = NULL py_s = s assert py_s is None but this didn't: cdef char* s py_s = None s = py_s So if both would work, you might really end up with situations where you pass None into a C function as a NULL pointer. Sounds pretty dangerous to me. Even if we don't allow the second bit, the first case would already blur the border between NULL pointers and Python objects, which currently is very well separated. I think it would be a step backwards to cut into that. 
Stefan From stefan_ml at behnel.de Thu Jun 5 08:39:21 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 05 Jun 2008 08:39:21 +0200 Subject: [Cython] char* and NULL in log statements In-Reply-To: References: <4846AD71.50407@semipol.de> <4846D51C.2030607@behnel.de> <4846E014.5080300@semipol.de> <4847104E.6010003@student.matnat.uio.no> Message-ID: <48478A19.2030607@behnel.de> Hi, Lisandro Dalcin wrote: > On 6/4/08, Dag Sverre Seljebotn wrote: >> Well, in my mind, this is a reason for supporting Stefan in removing >> auto-coercion of char* to Python strings altogether (that is suggested >> once down in those unicode discussion threads, right Stefan?). > > Well, I believe that is the right approach. However, what would be the > way to generate a byte string from a char* pointer? The Python equivalent of a C char* is a byte string ("bytes" or "bytearray" in Py3). I totally support auto-coercion between byte strings and char*. I'm just opposed to coercing a unicode object to a char*, as that tends to be an easy source of bugs rather than something that makes your life easier. My current favourite are file names. lxml deals with two types of path names: URLs and filesystem paths. Both are UTF-8 encoded when coming from an XML file and users commonly pass byte strings in Py2 and unicode strings in Py3. UTF-8 encoded URLs are fine in the case of libxml2, but to access a file on the local file system, the file name must always use the local file system encoding, which is often an ISO encoding or stuff like cp1252 (IIRC). So it is actually pretty involved to encode a file path (once you know that it actually *is* a file path and not a URL), or to decode a user provided byte string path into a unicode string, e.g. to print it in an error message (which must use the encoding of the output device!). Encodings can really, really become a complex matter... 
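
The file name case above can be illustrated with a small Python 3 sketch. The function names and the is_url flag are hypothetical, the caller is assumed to already know whether a path is a URL, and this is not how lxml itself does it; it only shows that the two kinds of path need different encodings.

```python
import os


def encode_path_for_c(path: str, is_url: bool) -> bytes:
    """Encode a path for a C library that expects a char*."""
    if is_url:
        # URLs are passed on UTF-8 encoded.
        return path.encode("utf-8")
    # Local file names must use the file system encoding, which
    # os.fsencode() applies for the current platform.
    return os.fsencode(path)


def decode_path_for_message(raw_path: bytes) -> str:
    """Decode a user-provided byte string path, e.g. for an error message."""
    # os.fsdecode() is the inverse of os.fsencode(); encoding the result
    # for the output device is still left to the caller.
    return os.fsdecode(raw_path)
```

Even this simple split leaves the hard part open, namely deciding which of the two cases a given string belongs to.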
Stefan

From stefan_ml at behnel.de Thu Jun 5 15:05:17 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 05 Jun 2008 15:05:17 +0200
Subject: [Cython] nogil checking
Message-ID: <4847E48D.2070103@behnel.de>

Hi,

I merged most of the nogil checking code from Pyrex into Cython, but I
noticed that it will break a lot of code. It makes Cython pretty strict
about what is allowed in a nogil function and what isn't. It even checks
function pointer assignments for "nogil" matches, so you really have to
take care when declaring and assigning callback functions.

The problem is that it's not always possible to fix such code, as
"nogil" is more about semantics than about syntax. In lxml, I use the
same (SAX2) callback function struct in a number of places, and I
sometimes release the GIL when passing it into libxml2 and sometimes I
keep holding it, depending on what I consider faster. The related
callback functions are designed to handle exactly their specific case,
so some are declared "with GIL" and others do not have a declaration as
they know the GIL will be held when they get called. So this requires an
explicit cast now. And a cast always holds the risk of shadowing real
bugs.

Here's an example. Say, we have three functions that implement a
callback:

    cdef void c1():
        # Python stuff

    cdef void c2() with GIL:
        # Python stuff

    cdef void c3() nogil:
        # C stuff

For the following callback function type

    ctypedef void (*callback)()

all of the three functions should be assignable, as we know we hold the
GIL when it's called, whereas for

    ctypedef void (*callback)() nogil

only c2 and c3 should work, as we know we do not hold it.

Pyrex currently distinguishes "nogil" and "with GIL" functions generally
from normal cdef functions, so c2 and c3 cannot be assigned to the first
callback pointer. I consider that a bug in Pyrex.

Below is a very simple patch that fixes the problem above. It's a bit
hackish in that it's not restricted to an assignment, but it seems to
work for me.
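
The assignability rule described in this message can be modelled in a few lines of plain Python. This is only a sketch: FuncType and assignable are hypothetical names, not the compiler's classes, and treating a "with GIL" function as callable without the GIL (because it acquires the GIL itself) is an assumption taken from the statement that c2 should work for the nogil callback type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncType:
    name: str
    # True if the function may be called while the GIL is not held.
    # Assumption: this holds for "nogil" and for "with GIL" functions.
    nogil: bool


def assignable(pointer_type: FuncType, func: FuncType) -> bool:
    """Return True if func may be assigned to a pointer of pointer_type."""
    # A nogil pointer may be called without the GIL, so it only accepts
    # functions that tolerate that. A plain pointer accepts everything.
    if pointer_type.nogil and not func.nogil:
        return False
    return True


c1 = FuncType("c1", nogil=False)            # plain cdef function
c2 = FuncType("c2 with GIL", nogil=True)    # acquires the GIL itself
c3 = FuncType("c3 nogil", nogil=True)       # pure C code

plain_callback = FuncType("callback", nogil=False)
nogil_callback = FuncType("callback nogil", nogil=True)
```

Under this model the plain callback type accepts c1, c2 and c3, while the nogil callback type rejects only c1, matching the behaviour asked for above.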
Stefan

diff -r d69b3342f623 Cython/Compiler/PyrexTypes.py
--- a/Cython/Compiler/PyrexTypes.py Thu Jun 05 12:20:27 2008 +0200
+++ b/Cython/Compiler/PyrexTypes.py Thu Jun 05 14:09:21 2008 +0200
@@ -695,7 +695,7 @@ class CFuncType(CType):
                 return 0
         if not self.same_calling_convention_as(other_type):
             return 0
-        if self.nogil != other_type.nogil:
+        if self.nogil and not other_type.nogil:
             return 0
         return 1

From dalcinl at gmail.com Thu Jun 5 16:24:17 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 5 Jun 2008 11:24:17 -0300
Subject: [Cython] Context.extract_module_name() not being used in Context.compile() at Main.py
Message-ID:

Stefan, could you review if the one-line patch I've attached is right? I
believe you forgot to do that.

Without this patch, Cython just does not work (at Cython compilation
time) for me, not easy to figure out why, but I guess *.pxd's are not
being found.

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
-------------- next part --------------
A non-text attachment was scrubbed...
Name: find_module.patch Type: application/octet-stream Size: 621 bytes Desc: not available Url : http://codespeak.net/pipermail/cython-dev/attachments/20080605/5bc17a4f/attachment.obj

From dalcinl at gmail.com Thu Jun 5 17:21:58 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 5 Jun 2008 12:21:58 -0300
Subject: [Cython] GCC warnings about uninitialized variables
Message-ID:

GCC complains about uninitialized variables when compiling the
generated C sources in some cases like the one below:

    cdef int CHKERR(int ierr) except -1:
        if ierr==0: return 0
        raise RuntimeError

    cdef int obj2int(object ob) except *:
        return ob

    def foo(a):
        cdef int i = obj2int(a)
        CHKERR(i)

The warning is something like:

    retval.c: In function '__pyx_pf_6retval_foo':
    retval.c:244: warning: '__pyx_r' may be used uninitialized in this function

Then I noticed that Cython does not initialize '__pyx_r' to NULL and
that's the source of the problem.

IMHO, we should initialize it to NULL. If it ever happens that (because
of a bug) Cython generates bad code, it's far easier to discover it from
a SystemError exception (because the function returned NULL and an
exception was not raised) than from a segfault because the return value
is just garbage.

Would a patch for this be accepted (well, it would be just a one-line
patch)?
If it is accepted, which do you prefer for the generated C code:

    PyObject *__pyx_r = 0;

or

    PyObject *__pyx_r = NULL;

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From stefan_ml at behnel.de Thu Jun 5 21:18:24 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 05 Jun 2008 21:18:24 +0200
Subject: [Cython] Context.extract_module_name() not being used in Context.compile() at Main.py
In-Reply-To:
References:
Message-ID: <48483C00.4040302@behnel.de>

Hi,

Lisandro Dalcin wrote:
> Stefan, could you review if the one-line patch I've attached is right?

Doesn't

    module_name = full_module_name or self.extract_module_name(source, options)

work for you? "full_module_name" is something that is extracted from
distutils, so it's the best guess we can start with if it's provided.

Stefan

From dalcinl at gmail.com Thu Jun 5 22:46:06 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 5 Jun 2008 17:46:06 -0300
Subject: [Cython] Context.extract_module_name() not being used in Context.compile() at Main.py
In-Reply-To: <48483C00.4040302@behnel.de>
References: <48483C00.4040302@behnel.de>
Message-ID:

On 6/5/08, Stefan Behnel wrote:
> Doesn't
>
>     module_name = full_module_name or self.extract_module_name(source, options)
>
> work for you?

It's not working, but I believe I definitely caught the problem...

> "full_module_name" is something that is extracted from
> distutils, so it's the best guess we can start with if it's provided.

But then what are those lines near the beginning of Context.compile(),
pasted below?
    if full_module_name is None:
        full_module_name, _ = os.path.splitext(source)
        full_module_name = re.sub(r'[\\/]', '.', full_module_name)
        full_module_name = re.sub(r'[^\w.]', '_', full_module_name)

If I comment out those lines, then all works with your latest patch.
And now I believe they should be removed. It seems that all the previous
hackery was in fact what was letting Cython manage modules better than
Pyrex (which required *.pyx files with full dotted names for
package/module management, right?). In short, the lines above seem to
be obsolete...

>
> Stefan
>

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From greg.ewing at canterbury.ac.nz Fri Jun 6 05:48:08 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 06 Jun 2008 15:48:08 +1200
Subject: [Cython] nogil checking
In-Reply-To: <4847E48D.2070103@behnel.de>
References: <4847E48D.2070103@behnel.de>
Message-ID: <4848B378.4040103@canterbury.ac.nz>

Stefan Behnel wrote:
> cdef void c1():
> cdef void c2() with GIL:
> cdef void c3() nogil:
>
> For the following callback function type
>
>     ctypedef void (*callback)()
>
> all of the three functions should be assignable

Yes, it's being too strict at the moment. I'll see about fixing it.
-- Greg From greg.ewing at canterbury.ac.nz Fri Jun 6 06:09:39 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 06 Jun 2008 16:09:39 +1200 Subject: [Cython] Context.extract_module_name() not being used in Context.compile() at Main.py In-Reply-To: References: Message-ID: <4848B883.1060904@canterbury.ac.nz> Lisandro Dalcin wrote: > Without this patch, Cython just does not work (at Cython compilation > time) for me, not easy to figure out why, but I guess *.pxd's are not > being found. You may need to put __init__.py or __init__.pyx files in your source directories so that they will be recognised as packages. -- Greg From stefan_ml at behnel.de Fri Jun 6 07:46:48 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 06 Jun 2008 07:46:48 +0200 Subject: [Cython] Context.extract_module_name() not being used in Context.compile() at Main.py In-Reply-To: References: <48483C00.4040302@behnel.de> Message-ID: <4848CF48.7070400@behnel.de> Hi, Lisandro Dalcin wrote: > if full_module_name is None: > full_module_name, _ = os.path.splitext(source) > full_module_name = re.sub(r'[\\/]', '.', full_module_name) > full_module_name = re.sub(r'[^\w.]', '_', full_module_name) > > Iff I comment-out those lines, then all works with your latest patch. another good catch, thanks! Stefan From stefan_ml at behnel.de Fri Jun 6 08:33:25 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 06 Jun 2008 08:33:25 +0200 Subject: [Cython] GCC warnings about uninitialized variables In-Reply-To: References: Message-ID: <4848DA35.506@behnel.de> Hi, Lisandro Dalcin wrote: > GCC complains about uninitialized variables when compiling the > generated C sources in some cases like the below: > > cdef int CHKERR(int ierr) except -1: > if ierr==0: return 0 > raise RuntimeError > > cdef int obj2int(object ob) except *: > return ob > > def foo(a): > cdef int i = obj2int(a) > CHKERR(i) I added that as test case under tests/run/exceptionpropagation.pyx. 
> The warning is something like :
>
> retval.c: In function '__pyx_pf_6retval_foo':
> retval.c:244: warning: '__pyx_r' may be used uninitialized in this function

GCC points to the wrong function here (maybe due to inlining), it
actually means obj2int(). Looking at the generated code:

"""
static int __pyx_f_20exceptionpropagation_obj2int(PyObject *__pyx_v_ob) {
  int __pyx_r;
  int __pyx_1;
  __pyx_1 = __pyx_PyInt_int(__pyx_v_ob); if (unlikely((__pyx_1 == (int)-1) {...; goto __pyx_L1;}
  __pyx_r = __pyx_1;
  goto __pyx_L0;

  __pyx_r = 0; /* sadly, this is unused code ... */
  goto __pyx_L0;
  __pyx_L1:;
  __Pyx_AddTraceback("exceptionpropagation.obj2int");
  __pyx_L0:;
  return __pyx_r;
}
"""

In foo(), Cython correctly generates

"""
  __pyx_r = Py_None; Py_INCREF(Py_None); /* initialisation in normal case */
  goto __pyx_L0;
  __pyx_L1:;
  __Pyx_AddTraceback("exceptionpropagation.foo");
  __pyx_r = NULL; /* initialisation in exception case */
  __pyx_L0:;
  return __pyx_r;
"""

I'm not sure how to fix this (I've never even used "except *"). Maybe an
initialisation in exactly that case makes sense?

Stefan

From stefan_ml at behnel.de Fri Jun 6 17:14:59 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 06 Jun 2008 17:14:59 +0200
Subject: [Cython] GCC warnings about uninitialized variables
In-Reply-To: <4848DA35.506@behnel.de>
References: <4848DA35.506@behnel.de>
Message-ID: <48495473.5060807@behnel.de>

Stefan Behnel wrote:
> Lisandro Dalcin wrote:
>> GCC complains about uninitialized variables when compiling the
>> generated C sources in some cases like the below:
>>
>> cdef int CHKERR(int ierr) except -1:
>>     if ierr==0: return 0
>>     raise RuntimeError
>>
>> cdef int obj2int(object ob) except *:
>>     return ob
>>
>> def foo(a):
>>     cdef int i = obj2int(a)
>>     CHKERR(i)
>
> I added that as test case under tests/run/exceptionpropagation.pyx.

The following works for me (it's in cython-devel).
Stefan # HG changeset patch # User Stefan Behnel # Date 1212765117 -7200 # Node ID 7a3fa433aaf4da1c2e2f655facd8889658820e8c # Parent e92098251a9d38ef700c78bfb6ed593f685ba563 fix return value setting for 'except *' functions diff -r e92098251a9d -r 7a3fa433aaf4 Cython/Compiler/Nodes.py --- a/Cython/Compiler/Nodes.py Fri Jun 06 08:31:11 2008 +0200 +++ b/Cython/Compiler/Nodes.py Fri Jun 06 17:11:57 2008 +0200 @@ -956,6 +956,8 @@ class FuncDefNode(StatNode, BlockNode): exc_check = self.caller_will_check_exceptions() if err_val is not None or exc_check: code.putln('__Pyx_AddTraceback("%s");' % self.entry.qualified_name) + if err_val is None and self.return_type.default_value: + err_val = self.return_type.default_value if err_val is not None: code.putln( "%s = %s;" % ( From dalcinl at gmail.com Fri Jun 6 18:11:16 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 6 Jun 2008 13:11:16 -0300 Subject: [Cython] GCC warnings about uninitialized variables In-Reply-To: <48495473.5060807@behnel.de> References: <4848DA35.506@behnel.de> <48495473.5060807@behnel.de> Message-ID: This is not working for me in other cases, I'll try to make equivalent examples and then I'll come back. On 6/6/08, Stefan Behnel wrote: > > Stefan Behnel wrote: > > Lisandro Dalcin wrote: > >> GCC complains about uninitialized variables when compiling the > >> generated C sources in some cases like the below: > >> > >> cdef int CHKERR(int ierr) except -1: > >> if ierr==0: return 0 > >> raise RuntimeError > >> > >> cdef int obj2int(object ob) except *: > >> return ob > >> > >> def foo(a): > >> cdef int i = obj2int(a) > >> CHKERR(i) > > > > I added that as test case under tests/run/exceptionpropagation.pyx. > > > The following works for me (it's in cython-devel). 
> > Stefan > > > # HG changeset patch > # User Stefan Behnel > # Date 1212765117 -7200 > # Node ID 7a3fa433aaf4da1c2e2f655facd8889658820e8c > # Parent e92098251a9d38ef700c78bfb6ed593f685ba563 > fix return value setting for 'except *' functions > > diff -r e92098251a9d -r 7a3fa433aaf4 Cython/Compiler/Nodes.py > --- a/Cython/Compiler/Nodes.py Fri Jun 06 08:31:11 2008 +0200 > +++ b/Cython/Compiler/Nodes.py Fri Jun 06 17:11:57 2008 +0200 > @@ -956,6 +956,8 @@ class FuncDefNode(StatNode, BlockNode): > exc_check = self.caller_will_check_exceptions() > if err_val is not None or exc_check: > code.putln('__Pyx_AddTraceback("%s");' % > self.entry.qualified_name) > + if err_val is None and self.return_type.default_value: > + err_val = self.return_type.default_value > if err_val is not None: > code.putln( > "%s = %s;" % ( > > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From stefan_ml at behnel.de Fri Jun 6 22:03:08 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 06 Jun 2008 22:03:08 +0200 Subject: [Cython] remaining Pyrex changes In-Reply-To: <4846CA1A.2060706@behnel.de> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> Message-ID: <484997FC.7010207@behnel.de> Hi, Stefan Behnel wrote: > Robert Bradshaw wrote: >> I looked at all the Pyrex changes and incorporated everything we >> didn't already have >> except for the GIL and dependancy tracking stuff. 
I finished merging both now and also the parser cleanup (it now has a parser context object instead of tons of pass-through keyword arguments, that was so over-due...). After adapting lxml to the stricter nogil checking (and applying the temporary (?) fix I posted), all seems to work just fine for me. I think the next big test is Sage then. Anyone? :) Robert, if I'm not mistaken, this means that the next Cython release will be 0.9.8.15, right? Could you double-check that all Pyrex changes are in now? We seem to be missing at least the forward declaration simplifications, Pyrex changesets 71/74/76. I think they'd be nice to have. We also have to check that we added all new test cases from Pyrex, at least into the tests/compile and tests/errors directories. I'll have close to no time this weekend, but since this is a simple task, maybe someone else could take a start here? More tests means a better release on Monday. Stefan From robertwb at math.washington.edu Fri Jun 6 22:10:11 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 6 Jun 2008 13:10:11 -0700 Subject: [Cython] remaining Pyrex changes In-Reply-To: <484997FC.7010207@behnel.de> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> Message-ID: On Jun 6, 2008, at 1:03 PM, Stefan Behnel wrote: > Hi, > > Stefan Behnel wrote: >> Robert Bradshaw wrote: >>> I looked at all the Pyrex changes and incorporated everything we >>> didn't already have >>> except for the GIL and dependancy tracking stuff. > > I finished merging both now and also the parser cleanup (it now has > a parser > context object instead of tons of pass-through keyword arguments, > that was so > over-due...). After adapting lxml to the stricter nogil checking > (and applying > the temporary (?) fix I posted), all seems to work just fine for me. 
> > I think the next big test is Sage then. Anyone? :) I plan on doing that later tonight after I get back... I've compiled sage recently with cython-devel, so don't expect any big surprises. > Robert, if I'm not mistaken, this means that the next Cython > release will be > 0.9.8.15, right? Could you double-check that all Pyrex changes are > in now? We > seem to be missing at least the forward declaration > simplifications, Pyrex > changesets 71/74/76. I think they'd be nice to have. Our circular import stuff is a superset of this, but maybe I'm mistaken here. We could support this too if you want. > We also have to check that we added all new test cases from Pyrex, > at least > into the tests/compile and tests/errors directories. I'll have > close to no > time this weekend, but since this is a simple task, maybe someone > else could > take a start here? More tests means a better release on Monday. > > Stefan > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From stefan_ml at behnel.de Sat Jun 7 08:04:04 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 07 Jun 2008 08:04:04 +0200 Subject: [Cython] remaining Pyrex changes In-Reply-To: References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> Message-ID: <484A24D4.7050901@behnel.de> Hi, Robert Bradshaw wrote: > On Jun 6, 2008, at 1:03 PM, Stefan Behnel wrote: >> We seem to be missing at least the forward declaration >> simplifications, Pyrex >> changesets 71/74/76. I think they'd be nice to have. > > Our circular import stuff is a superset of this Partly, yes. However, it's also a way to simplify declarations. 
From what I understood so far, it basically restricts class forward declarations to saying "this is a class", without repeating things like base classes and maybe even the [] renaming of exported types. That allows you to keep all of that in just one place in your source. Stefan From languitar at semipol.de Sun Jun 8 01:24:11 2008 From: languitar at semipol.de (Johannes Wienke) Date: Sun, 08 Jun 2008 01:24:11 +0200 Subject: [Cython] Accessing C attributes from Python functions in the interpreter Message-ID: <484B189B.8030402@semipol.de> Hi, maybe I've just got a lack of understanding or I don't know... I've got a class that in general looks like this: cpdef class Foo: cdef char* myData cdef void setData(Foo self, char *data) self.myData = data cpdef doSomething(Foo self) print self.myData The problem is that working with myData directly from Cython is no problem but calling doSomething from the python interpreter causes a segfault because self.myData then points to NULL. I've also observed that self in setData points to a different address than self in doSomething if it is called from the interpreter. What's the problem with this approach? Thanks Johannes -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080608/ea2a702d/attachment.pgp From dagss at student.matnat.uio.no Sun Jun 8 01:48:07 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 8 Jun 2008 01:48:07 +0200 (CEST) Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <484B189B.8030402@semipol.de> References: <484B189B.8030402@semipol.de> Message-ID: <64259.193.157.229.67.1212882487.squirrel@webmail.uio.no> Johannes Wienke wrote: > Hi, > > maybe I've just got a lack of understanding or I don't know... 
>
> I've got a class that in general looks like this:
>
> cpdef class Foo:
>     cdef char* myData
>
>     cdef void setData(Foo self, char *data)
>         self.myData = data
>
>     cpdef doSomething(Foo self)
>         print self.myData
>
> The problem is that working with myData directly from Cython is no
> problem but calling doSomething from the python interpreter causes a
> segfault because self.myData then points to NULL. I've also observed
> that self in setData points to a different address than self in
> doSomething if it is called from the interpreter.

Can you provide a full running session of what you try to do? I.e. can
you confirm that you do something like this:

    Py> import foo
    Py> a = foo.Foo()
    Py> a.setData("aaaa")
    Py> a.doSomething()

and it still crashes? (I'm guessing so...and in that case, I don't have
a clue, wait for somebody else to respond :-) ).

(But do double-check that the "self" argument is typed correctly. What
happens if you try to assign "self" to a cdef-ed local variable, i.e.

    cdef Foo x = self
    print x.myData

?)

Dag Sverre

From robertwb at math.washington.edu Sun Jun 8 02:00:41 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 7 Jun 2008 17:00:41 -0700
Subject: [Cython] Accessing C attributes from Python functions in the interpreter
In-Reply-To: <484B189B.8030402@semipol.de>
References: <484B189B.8030402@semipol.de>
Message-ID:

On Jun 7, 2008, at 4:24 PM, Johannes Wienke wrote:
> Hi,
>
> maybe I've just got a lack of understanding or I don't know...
>
> I've got a class that in general looks like this:
>
> cpdef class Foo:
>     cdef char* myData
>
>     cdef void setData(Foo self, char *data)
>         self.myData = data
>
>     cpdef doSomething(Foo self)
>         print self.myData
>
> The problem is that working with myData directly from Cython is no
> problem but calling doSomething from the python interpreter causes a
> segfault because self.myData then points to NULL.
I've also observed > that self in setData points to a different address than self in > doSomething if it is called from the interpreter. > > What's the problem with this approach? Where are you setting your myData? If you do Foo().doSomething() then it will segfault because the data isn't set yet. If you have def set_data_from_python(self, py_string): self.myData = py_string Then the conversion will happen, but myData is set to point to the inside of py_string (i.e. no copying is actually done) and so the instant py_string gets deallocated the actual char* gets reclaimed. Hopefully this helps, though the code snippet above doesn't crash by itself. - Robert -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://codespeak.net/pipermail/cython-dev/attachments/20080607/1de1ce27/attachment.pgp From greg.ewing at canterbury.ac.nz Sun Jun 8 01:55:50 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 08 Jun 2008 11:55:50 +1200 Subject: [Cython] remaining Pyrex changes In-Reply-To: <484A24D4.7050901@behnel.de> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> Message-ID: <484B2006.3070308@canterbury.ac.nz> Robert Bradshaw wrote: > Our circular import stuff is a superset of this What does Cython do differently in this area? 
-- Greg From robertwb at math.washington.edu Sun Jun 8 02:05:28 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Sat, 7 Jun 2008 17:05:28 -0700 Subject: [Cython] remaining Pyrex changes In-Reply-To: <484B2006.3070308@canterbury.ac.nz> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> <484B2006.3070308@canterbury.ac.nz> Message-ID: <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> On Jun 7, 2008, at 4:55 PM, Greg Ewing wrote: > Robert Bradshaw wrote: > >> Our circular import stuff is a superset of this > > What does Cython do differently in this area? In a.pyx you can do "from b cimport B" and from b.pyx you can do "from a cimport A" without any problems. - Robert From greg.ewing at canterbury.ac.nz Sun Jun 8 02:31:43 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 08 Jun 2008 12:31:43 +1200 Subject: [Cython] remaining Pyrex changes In-Reply-To: <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> <484B2006.3070308@canterbury.ac.nz> <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> Message-ID: <484B286F.3050200@canterbury.ac.nz> Robert Bradshaw wrote: > In a.pyx you can do "from b cimport B" and from b.pyx you can do > "from a cimport A" without any problems. There's never been any problem with that in Pyrex, as far as I know. The problems occur when .pxd files cimport from each other, not .pyx files. That's what the recently added forward declaration features are addressing. Does Cython have a different way of handling circular cimports among .pxd files? If so, how does it work? 
-- Greg From gfurnish at gfurnish.net Sun Jun 8 02:48:18 2008 From: gfurnish at gfurnish.net (Gary Furnish) Date: Sat, 7 Jun 2008 17:48:18 -0700 Subject: [Cython] remaining Pyrex changes In-Reply-To: <484B286F.3050200@canterbury.ac.nz> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> <484B2006.3070308@canterbury.ac.nz> <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> <484B286F.3050200@canterbury.ac.nz> Message-ID: <8f8f8530806071748l29495fcct602d4f6c5c390ce@mail.gmail.com> Classes already get forward declared; we thus just essentially run a dependency sorting algorithm on the classes to make sure they are output in the correct order and this gives us circular imports in pxd files. On Sat, Jun 7, 2008 at 5:31 PM, Greg Ewing wrote: > Robert Bradshaw wrote: > >> In a.pyx you can do "from b cimport B" and from b.pyx you can do >> "from a cimport A" without any problems. > > There's never been any problem with that in Pyrex, as > far as I know. > > The problems occur when .pxd files cimport from each > other, not .pyx files. That's what the recently > added forward declaration features are addressing. > > Does Cython have a different way of handling circular > cimports among .pxd files? If so, how does it work? 
> > -- > Greg > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From greg.ewing at canterbury.ac.nz Sun Jun 8 03:02:29 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 08 Jun 2008 13:02:29 +1200 Subject: [Cython] remaining Pyrex changes In-Reply-To: <8f8f8530806071748l29495fcct602d4f6c5c390ce@mail.gmail.com> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> <484B2006.3070308@canterbury.ac.nz> <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> <484B286F.3050200@canterbury.ac.nz> <8f8f8530806071748l29495fcct602d4f6c5c390ce@mail.gmail.com> Message-ID: <484B2FA5.6070502@canterbury.ac.nz> Gary Furnish wrote: > Classes already get forward declared; we thus just essentially run a > dependency sorting algorithm on the classes to make sure they are > output in the correct order and this gives us circular imports in pxd > files. You mean if a cimport references something that's not defined yet, it's assumed to be a class? That's not necessarily correct -- it could be a struct or union, or a typedef referring to just about any type. 
-- Greg From gfurnish at gfurnish.net Sun Jun 8 03:34:18 2008 From: gfurnish at gfurnish.net (Gary Furnish) Date: Sat, 7 Jun 2008 18:34:18 -0700 Subject: [Cython] remaining Pyrex changes In-Reply-To: <484B2FA5.6070502@canterbury.ac.nz> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> <484B2006.3070308@canterbury.ac.nz> <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> <484B286F.3050200@canterbury.ac.nz> <8f8f8530806071748l29495fcct602d4f6c5c390ce@mail.gmail.com> <484B2FA5.6070502@canterbury.ac.nz> Message-ID: <8f8f8530806071834w929683dtff1f0d453b343ad6@mail.gmail.com> No, I mean we basically run a sort on the vtabs, after everything is imported. This plus a sort on (something else) is sufficient to make circular pxd imports work. On Sat, Jun 7, 2008 at 6:02 PM, Greg Ewing wrote: > Gary Furnish wrote: >> Classes already get forward declared; we thus just essentially run a >> dependency sorting algorithm on the classes to make sure they are >> output in the correct order and this gives us circular imports in pxd >> files. > > You mean if a cimport references something that's > not defined yet, it's assumed to be a class? > > That's not necessarily correct -- it could be a > struct or union, or a typedef referring to just > about any type. 
> > -- > Greg > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From greg.ewing at canterbury.ac.nz Sun Jun 8 06:20:13 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 08 Jun 2008 16:20:13 +1200 Subject: [Cython] GCC warnings about uninitialized variables In-Reply-To: References: Message-ID: <484B5DFD.70209@canterbury.ac.nz> Lisandro Dalcin wrote: > GCC complains about uninitialized variables when compiling the > generated C sources in some cases like the below: > > cdef int obj2int(object ob) except *: > return ob As an aside, it's probably more efficient to declare a function like that as something like cdef int obj2int(object ob) except? -1: or some other unlikely value in place of -1. Then callers will only call PyErr_Occurred() if that particular value is returned, rather than every time. The only time you should really have to use except * is with a function returning void.
-- Greg From stefan_ml at behnel.de Sun Jun 8 07:44:24 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 08 Jun 2008 07:44:24 +0200 Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <484B189B.8030402@semipol.de> References: <484B189B.8030402@semipol.de> Message-ID: <484B71B8.90504@behnel.de> Hi, as a side-note: Johannes Wienke wrote: > cpdef class Foo: > cdef char* myData > > cdef void setData(Foo self, char *data) > self.myData = data no need to declare the type of self here, you can just write cdef void setData(self, char *data) self.myData = data Stefan From stefan_ml at behnel.de Sun Jun 8 08:21:27 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 08 Jun 2008 08:21:27 +0200 Subject: [Cython] remaining Pyrex changes In-Reply-To: <8f8f8530806071748l29495fcct602d4f6c5c390ce@mail.gmail.com> References: <54075.194.114.62.69.1212493695.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <18257ABF-C06F-4C78-A4D8-D35267426525@math.washington.edu> <4846CA1A.2060706@behnel.de> <484997FC.7010207@behnel.de> <484A24D4.7050901@behnel.de> <484B2006.3070308@canterbury.ac.nz> <8D0A9ECF-786C-4D15-9E4F-DE04AD16849C@math.washington.edu> <484B286F.3050200@canterbury.ac.nz> <8f8f8530806071748l29495fcct602d4f6c5c390ce@mail.gmail.com> Message-ID: <484B7A67.1030204@behnel.de> Hi, Gary Furnish wrote: > Classes already get forward declared; we thus just essentially run a > dependency sorting algorithm on the classes to make sure they are > output in the correct order and this gives us circular imports in pxd > files. the code for that is in the method sort_type_hierarchy() in ModuleNode.py. 
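As a rough illustration of the dependency sorting described above: this is a minimal sketch in plain Python, not the actual sort_type_hierarchy() code from ModuleNode.py. The function name sort_by_base and the name-to-base dictionary are made up for the example. It only shows the idea of emitting every base class before the classes that derive from it.

```python
def sort_by_base(classes):
    """Order class names so that every base class precedes its subclasses.

    `classes` maps a class name to the name of its base class (or None).
    Both the function and this data layout are hypothetical.
    """
    ordered = []
    seen = set()

    def visit(name):
        # Skip names already handled and bases defined elsewhere.
        if name in seen or name not in classes:
            return
        seen.add(name)
        base = classes[name]
        if base is not None:
            visit(base)       # emit the base class first
        ordered.append(name)  # then the class itself

    # Iterate in sorted order so the result is deterministic.
    for name in sorted(classes):
        visit(name)
    return ordered


hierarchy = {"Element": "Node", "Node": None, "Comment": "Node"}
print(sort_by_base(hierarchy))
```

With the sample hierarchy, Node comes out first, followed by the two classes that inherit from it.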
Stefan From languitar at semipol.de Sun Jun 8 10:04:48 2008 From: languitar at semipol.de (Johannes Wienke) Date: Sun, 08 Jun 2008 10:04:48 +0200 Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <484B71B8.90504@behnel.de> References: <484B189B.8030402@semipol.de> <484B71B8.90504@behnel.de> Message-ID: <484B92A0.5010202@semipol.de> Hi, On 06/08/2008 07:44 AM, Stefan Behnel wrote: > as a side-note: > > Johannes Wienke wrote: >> cpdef class Foo: >> cdef char* myData >> >> cdef void setData(Foo self, char *data) >> self.myData = data > > no need to declare the type of self here, you can just write > > cdef void setData(self, char *data) > self.myData = data Yes, I know. That was just one of the things I tried while hunting for the bug, but without luck. I will try to break it down to a working example today. Johannes From languitar at semipol.de Sun Jun 8 11:10:08 2008 From: languitar at semipol.de (Johannes Wienke) Date: Sun, 08 Jun 2008 11:10:08 +0200 Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <64259.193.157.229.67.1212882487.squirrel@webmail.uio.no> References: <484B189B.8030402@semipol.de> <64259.193.157.229.67.1212882487.squirrel@webmail.uio.no> Message-ID: <484BA1F0.60209@semipol.de> On 06/08/2008 01:48 AM, Dag Sverre Seljebotn wrote: > Johannes Wienke wrote: >> Hi, >> >> maybe I've just got a lack of understanding or I don't know...
>>
>> I've got a class that in general looks like this:
>>
>> cpdef class Foo:
>>     cdef char* myData
>>
>>     cdef void setData(Foo self, char *data)
>>         self.myData = data
>>
>>     cpdef doSomething(Foo self)
>>         print self.myData
>>
>> The problem is that working with myData directly from Cython is no
>> problem but calling doSomething from the python interpreter causes a
>> segfault because self.myData then points to NULL. I've also observed
>> that self in setData points to a different address than self in
>> doSomething if it is called from the interpreter.
>
> Can you provide a full running session of what you try to do?

Ok, I have found the problem but have absolutely no clue how to solve it:
My code has to run inside Reinteract
(http://fishsoup.net/software/reinteract/). At the bottom of that page the
author explains how Reinteract works:
"However, if reinteract detects that a statement modifies a variable, then
it makes a shallow copy of the variable using copy.copy()"

This call to copy seems to be the problem, because the C-level data seems
to get lost while copying. Here's a little test case:

cdef extern from "string.h":
    char *strcpy(char *dest, char *src)
    long strlen(char *s)

cdef extern from "stdlib.h":
    void *calloc(long nmemb, long size)

# extension types are declared with cdef, not cpdef
cdef class Foo:

    cdef char *arg

    def fillArg(self, string):
        s = string
        # +1 leaves room for the terminating NUL that strcpy writes
        self.arg = <char *>calloc(strlen(s) + 1, sizeof(char))
        strcpy(self.arg, s)

    def tellArg(self):
        if self.arg == NULL:
            print "I don't have an arg"
        else:
            print self.arg

>>> from ship import test
>>> f = test.Foo()
>>> f.fillArg("Hey")
>>> f.tellArg()
Hey
>>> import copy
>>> fCopy = copy.copy(f)
>>> fCopy.tellArg()
I don't have an arg

In my case I simply didn't check that arg was not NULL, and that resulted
in the segfault.

But is there a way to convince copy to also copy the C-level attributes?

Thanks in advance
Johannes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080608/c3caae25/attachment-0001.pgp From dagss at student.matnat.uio.no Sun Jun 8 11:16:15 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 8 Jun 2008 11:16:15 +0200 (CEST) Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <484BA1F0.60209@semipol.de> References: <484B189B.8030402@semipol.de> <64259.193.157.229.67.1212882487.squirrel@webmail.uio.no> <484BA1F0.60209@semipol.de> Message-ID: <64326.193.157.229.67.1212916575.squirrel@webmail.uio.no> Johannes Wienke wrote: > > But is there a way to convince copy to copy also the C declarations? > Google for "python copy module". The copy module docs state that it uses the pickling API. So see the Pyrex docs for pickling and try that... Dag Sverre From languitar at semipol.de Sun Jun 8 12:02:41 2008 From: languitar at semipol.de (Johannes Wienke) Date: Sun, 08 Jun 2008 12:02:41 +0200 Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <64326.193.157.229.67.1212916575.squirrel@webmail.uio.no> References: <484B189B.8030402@semipol.de> <64259.193.157.229.67.1212882487.squirrel@webmail.uio.no> <484BA1F0.60209@semipol.de> <64326.193.157.229.67.1212916575.squirrel@webmail.uio.no> Message-ID: <484BAE41.4070108@semipol.de> On 06/08/2008 11:16 AM, Dag Sverre Seljebotn wrote: > Johannes Wienke wrote: >> But is there a way to convince copy to copy also the C declarations? >> > > Google for "python copy module". The copy module docs then will state that > it uses the pickling API. So see the pyrex docs for pickling and try > that... Thanks for that hint, but where exactly do I find something about pickling in the Pyrex docs? Furthermore, wouldn't it be worth mentioning this in the Cython docs (preferably the Sphinx documentation)?
Johannes From dagss at student.matnat.uio.no Sun Jun 8 12:57:44 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 8 Jun 2008 12:57:44 +0200 (CEST) Subject: [Cython] Accessing C attributes from Python functions in the interpreter In-Reply-To: <484BAE41.4070108@semipol.de> References: <484B189B.8030402@semipol.de> <64259.193.157.229.67.1212882487.squirrel@webmail.uio.no> <484BA1F0.60209@semipol.de> <64326.193.157.229.67.1212916575.squirrel@webmail.uio.no> <484BAE41.4070108@semipol.de> Message-ID: <64409.193.157.229.67.1212922664.squirrel@webmail.uio.no> Johannes Wienke wrote: > On 06/08/2008 11:16 AM, Dag Sverre Seljebotn wrote: >> Johannes Wienke wrote: >>> But is there a way to convince copy to copy also the C declarations? >>> >> >> Google for "python copy module". The copy module docs then will state >> that it uses the pickling API. So see the pyrex docs for pickling and try >> that... > > Thanks for that hint, but where exactly do I find something about > pickling in the Pyrex docs? > I'm sorry, I think I've misunderstood something myself here. It looks like the pickling API is a generic Python feature that also applies to extension classes, and not really a Cython (or Pyrex) feature as such. So the documentation for it can be found in the Python pickle module docs.
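To make the pickling hook concrete, here is a minimal sketch in plain Python. The class below is only a stand-in for the extension type from the thread: a bytearray plays the role of the calloc'ed char* buffer, and the method name tell_arg is made up. The point it shows is that copy.copy() will call __reduce__ when a class defines it, so the class itself decides how its state is rebuilt in the copy.

```python
import copy


class Foo:
    """Pure-Python stand-in for an extension type that owns a C buffer."""

    def __init__(self, data=None):
        # The bytearray plays the role of the calloc'ed char* in the thread.
        self._buf = None if data is None else bytearray(data)

    def tell_arg(self):
        return None if self._buf is None else bytes(self._buf)

    def __reduce__(self):
        # copy.copy() and pickle both use this hook: return a callable
        # plus the arguments needed to rebuild an equivalent object.
        return (Foo, (self.tell_arg(),))


f = Foo(b"Hey")
f_copy = copy.copy(f)
print(f_copy.tell_arg())
```

In an extension type the same method would presumably convert the C string to a Python string for the argument tuple and allocate a fresh buffer when the copy is constructed; that part is an assumption and is not shown here.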
Also there's this thread from not long ago: http://thread.gmane.org/gmane.comp.python.cython.devel/1896 Dag Sverre From greg.ewing at canterbury.ac.nz Sun Jun 8 13:44:31 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 08 Jun 2008 23:44:31 +1200 Subject: [Cython] ANN: Pyrex 0.9.8.3 Message-ID: <484BC61F.8040407@canterbury.ac.nz> Pyrex 0.9.8.3 is now available: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ Compiling multiple .pyx files in one go works properly now, and can be substantially faster if you have a lot of modules that cimport from each other. I had to rearrange various things to make this work, so I hope I haven't broken anything. The compatibility rules for nogil function pointers have been fixed, so you can assign a nogil function to a function pointer that isn't declared nogil (but not the other way around). Plus a few other bug fixes listed in CHANGES. What is Pyrex? -------------- Pyrex is a language for writing Python extension modules. It lets you freely mix operations on Python and C data, with all Python reference counting and error checking handled automatically. From stefan_ml at behnel.de Sun Jun 8 14:59:28 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 08 Jun 2008 14:59:28 +0200 Subject: [Cython] Profiling Cython Message-ID: <484BD7B0.80504@behnel.de> Hi, I did a tiny bit of profiling on Cython compiling lxml.etree. 
Here are the numbers I get: """ 3127018 function calls (2777951 primitive calls) in 25.128 CPU seconds Ordered by: internal time, call count List reduced from 1035 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 119144 4.579 0.000 4.671 0.000 Scanners.py:148(run_machine_inlined) 18316 1.586 0.000 1.586 0.000 codecs.py:371(read) 119144 1.074 0.000 5.746 0.000 Scanners.py:109(scan_a_token) 88362 1.023 0.000 7.976 0.000 Scanners.py:88(read) 29680 0.673 0.000 1.659 0.000 Code.py:103(mark_pos) 77 0.588 0.008 0.588 0.008 posixpath.py:168(exists) 88362 0.538 0.000 8.514 0.000 Scanning.py:397(next) 23318/2985 0.495 0.000 0.517 0.000 Nodes.py:155(end_pos) 70446 0.442 0.000 0.591 0.000 Code.py:62(put) 90657/10562 0.351 0.000 4.549 0.000 Parsing.py:59(p_binop_expr) [...] """ So the major headache here is Scanners.py in Plex. The method at the top-rank is a huge function. According to the comments, it's the result of inlining a couple of method calls that originally lead to slow code, and it looks heavily profiled already. Assuming that further optimisation attempts were rather futile, I just compiled the module with Cython. The first (obvious) result is that the internal calls disappear from the profile log, as they are now internal C calls. The call that remains is Scanner.next(), which originally took an accumulated 8.5 seconds. In the compiled version, it's down to just over 5 seconds, that's more than 40 percent faster. 
"""
         2595627 function calls (2246560 primitive calls) in 18.681 CPU seconds

   Ordered by: internal time, call count
   List reduced from 1028 to 20 due to restriction <20>

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      88362    4.246    0.000    5.041    0.000 Scanning.py:397(next)
      29680    0.673    0.000    1.632    0.000 Code.py:103(mark_pos)
      70446    0.439    0.000    0.586    0.000 Code.py:62(put)
90657/10562    0.335    0.000    3.228    0.000 Parsing.py:59(p_binop_expr)
 23318/2985    0.316    0.000    0.338    0.000 Nodes.py:155(end_pos)
      65791    0.295    0.000    0.903    0.000 Code.py:47(putln)
      29677    0.289    0.000    0.915    0.000 Code.py:93(file_contents)
       4724    0.287    0.000    0.292    0.000 Symtab.py:532(allocate_temp)
      88070    0.247    0.000    0.247    0.000 ExprNodes.py:192(subexpr_nodes)
      52071    0.232    0.000    0.232    0.000 Nodes.py:82(__init__)
[...]
"""

In total, I get an improvement of 12% in compilation time. That makes me
think that it's actually worth putting the compilation into Cython's own
setup.py, and installing the compiled Scanning module next to the Python
one (Python prefers C extensions on import).

Here's a patch, what do you think?

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compile-scan