From dagss at student.matnat.uio.no Tue Apr 1 20:33:06 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 01 Apr 2008 20:33:06 +0200
Subject: [Cython] Robin
Message-ID: <47F27FE2.8090309@student.matnat.uio.no>

Just saw this:

http://robin.python-hosting.com/

Seems like a good tool taking it at face value, but I didn't try it or
look at the source (there is a numpy directory in the source tree
though...). Perhaps there is something we can lift from it...

I saw it on the Python gsoc list when somebody was looking for mentors
for it, full email:

Hello all,

I am an M.Sc. student at Tel-Aviv University, Israel, and an open source
developer of the project "Robin" (see http://corwin.amber.googlepages.com/,
http://robin.python-hosting.com for a taste) which is somewhat of a SWIG
alternative for automatically creating Python bindings for C++ libraries.
It provides several features missing from existing tools, and is also
cleaner, maintainable, and extendable.

I would like to push Robin forward as part of Google's "Summer of Code",
which means:

1. Create a good reference manual and tutorial (with demos)
2. Compile packages for popular OS distros
3. Stabilize Robin such that it would be capable of seamlessly generating
   bindings for some well-known open-source C++ libraries such as
   a. webkit (http://webkit.org)
   b. libtorrent (http://libtorrent.rakshasa.no/)

And if anyone has any other libraries to "challenge" Robin I would be
happy to use them as test cases and refine Robin that way.

I am still looking for a mentor from PSF.

thank you,
-- Shachar

--
Dag Sverre

From dagss at student.matnat.uio.no Tue Apr 1 23:34:58 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 01 Apr 2008 23:34:58 +0200
Subject: [Cython] Generic
Message-ID: <47F2AA82.2080102@student.matnat.uio.no>

Another draft from me that is an attempt to see if my drafts and
Fabrizio's drafts have common traits that can be developed together.
http://wiki.cython.org/enhancements/treevisitors

Stefan, we already talked about some of this...

--
Dag Sverre

From dagss at student.matnat.uio.no Tue Apr 1 23:36:05 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 01 Apr 2008 23:36:05 +0200
Subject: [Cython] Generic tree visitors
In-Reply-To: <47F2AA82.2080102@student.matnat.uio.no>
References: <47F2AA82.2080102@student.matnat.uio.no>
Message-ID: <47F2AAC5.8060304@student.matnat.uio.no>

...should have been the title.

--
Dag Sverre

From dagss at student.matnat.uio.no Wed Apr 2 20:37:46 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Wed, 02 Apr 2008 20:37:46 +0200
Subject: [Cython] Wiki cleanup / CEP numbering scheme
Message-ID: <47F3D27A.80501@student.matnat.uio.no>

I'd like to spend an hour or so cleaning up the enhancements page if
this (I guess it is not a necessity, but if it should be done anyway
then this is a good time). If I get thumbs up for the following I'll get
at it.

I propose the following scheme:

Number each document with a CEP number. Numbering scheme:

CEP1xx: "Frontend". User experience CEPs and meta-CEPs; high-level goals
        that drive the development process ("better C++ support", "better
        numpy support") as well as build system, command-line interface,
        usage as library etc.
CEP2xx: Better support for compiling non-extended Python code
CEP3xx: New features for the Cython language that don't overlap with
        Python (templates, code in pxd files)
CEP4xx: Optimizations -- things that don't change the Cython language
        but which change the C output. For instance the for-in => for-from
        transform.
CEP9xx: Cython implementation things (transformation pipelines,
        development of new parser, etc.)

The important thing isn't that we cover every case now, but that
documents that are numbered now don't have to be renumbered.
Each document has the following fields (guidelines only):
 - CEP status: Anyone can submit a CEP with status "Idea" (working on
   it) or "Proposed" (believe it is ready for getting accepted), project
   administrators can set status to "Accepted"
 - Implementation status: "None", "Prototyped", "Patch ready", "Committed
   in #...." etc.

Author field is not necessary (one can check wiki info), however if under
development the developer "owning" the implementation initiative for it
should be noted. Also the CEPs that go into a GSoC application should
note that in a Comment field.

In the wiki:
 - Pages keep their current URL, but the CEP number is noted on the
   enhancements page and in the title
 - The enhancements page has a link to a description of this system, then
   CEPs by overall category. However the CEPs are NOT necessarily sorted,
   as one might want to create subgroups of CEPs
 - A brainstorming section at the end of enhancements contains stuff that
   is on its way to become a CEP but is not specific enough (however the
   bar shouldn't be too high if it is evident that the idea is an isolated
   concept that doesn't need to be broken up, such as "better local
   variable handling")

--
Dag Sverre

From robertwb at math.washington.edu Thu Apr 3 07:13:05 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 2 Apr 2008 22:13:05 -0700
Subject: [Cython] cimport inside a package
In-Reply-To: <20080327162858.937d684c.simon@arrowtheory.com>
References: <20080327161750.083aecce.simon@arrowtheory.com>
 <49F941B9-B09C-4B26-8751-974329FE45D2@math.washington.edu>
 <20080327162858.937d684c.simon@arrowtheory.com>
Message-ID:

I think the problem is that it's getting confused between the names
"foo" and "zap.foo." I'm not sure, however, what the correct solution
to this is, other than giving the names via setup.py, but perhaps
someone more familiar with distutils and/or compiling extensions by
hand would have a better idea what's going on.
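
A rough sketch of what "giving the names via setup.py" could amount to,
assuming the sources live in a zap/ directory as in Simon's example: each
extension gets its dotted name (zap.foo rather than foo). The helper
qualified_module_name is hypothetical, made up for illustration; it is
not part of Cython or distutils. Whether this alone resolves the import
error is not confirmed in this thread.

```python
from pathlib import PurePosixPath


def qualified_module_name(pyx_path):
    """Map a source path such as 'zap/foo.pyx' to the dotted name 'zap.foo'.

    Hypothetical helper for illustration only.
    """
    path = PurePosixPath(pyx_path)
    # Drop the .pyx suffix and join the path components with dots.
    return ".".join(path.with_suffix("").parts)


# Assumed layout: both modules live inside the zap package directory.
sources = ["zap/foo.pyx", "zap/bar.pyx"]

# (name, sources) pairs, in the argument order that distutils'
# Extension(name, sources) takes.
extension_specs = [(qualified_module_name(src), [src]) for src in sources]
```

Each pair would then be handed to Extension in setup.py, so that the
compiled module is registered as zap.foo instead of a top-level foo.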
- Robert

On Mar 27, 2008, at 1:28 PM, Simon Burton wrote:

>
>
> I have a Makefile ( the -I. doesn't seem to help):
>
>
> bar.so: bar.o
>         gcc $(CFLAGS) -shared bar.o -o bar.so
>
> bar.o: bar.c
>         gcc $(CFLAGS) -c bar.c -I/usr/include/python2.5
>
> bar.c: bar.pyx
>         cython -I. bar.pyx
>
>
>
> foo.so: foo.o
>         gcc $(CFLAGS) -shared foo.o -o foo.so
>
> foo.o: foo.c
>         gcc $(CFLAGS) -c foo.c -I/usr/include/python2.5
>
> foo.c: foo.pyx
>         cython -I. foo.pyx
>
>
>
>
> On Thu, 27 Mar 2008 13:25:45 -0700
> Robert Bradshaw wrote:
>
>> What does your setup.py look like?
>>
>> On Mar 27, 2008, at 1:17 PM, Simon Burton wrote:
>>
>>> == foo.pxd ==
>>>
>>>
>>> cdef int foo()
>>>
>>>
>>> == foo.pyx ==
>>>
>>> cdef int foo():
>>>     print "hi foo"
>>>     return 9
>>>
>>>
>>> == bar.pyx ==
>>>
>>> cimport foo
>>>
>>> def bar():
>>>     print foo.foo()
>>>
>>> =============
>>>
>>> All good so far.
>>>
>>> If I put the above in a regular python package called zap,
>>> and then (from outside of the zap package):
>>>
>>> $ python
>>> Python 2.5.1 (r251:54863, Mar 7 2008, 04:10:12)
>>> [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more
>>> information.
>>> >>> from zap import bar
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "bar.pyx", line 2, in bar
>>> ImportError: No module named foo
>>> >>>
>>>
>>> The C code in bar.c is trying to import module bar.
>>> What is the correct cimport invocation in bar.pyx ?
>>> I tried some obvious permutations ("from zap cimport bar"), but
>>> nothing got past
>>> the cython compiler.
>>>
>>> Simon.
>>>
>>> _______________________________________________
>>> Cython-dev mailing list
>>> Cython-dev at codespeak.net
>>> http://codespeak.net/mailman/listinfo/cython-dev
>>

From robertwb at math.washington.edu Thu Apr 3 08:19:16 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 2 Apr 2008 23:19:16 -0700
Subject: [Cython] --cplus option to the "cython" command line script
In-Reply-To: <47DEEAF2.6070805@martincmartin.com>
References: <47DEEAF2.6070805@martincmartin.com>
Message-ID: <044F70B6-0E1A-4294-874E-5AAE38447627@math.washington.edu>

Done.

On Mar 17, 2008, at 3:04 PM, Martin C. Martin wrote:

> Hi,
>
> the --cplus option doesn't appear in the usage, and the comments in the
> code say it's only supported on MacOS X. But that isn't true, it's used
> by the distutils when you specify language="c++".
>
> Can it be added to the usage in Cython/Compiler/CmdLine.py ?
>
> Best,
> Martin

From mistobaan at gmail.com Thu Apr 3 08:44:35 2008
From: mistobaan at gmail.com (Fabrizio Milo aka misto)
Date: Thu, 3 Apr 2008 08:44:35 +0200
Subject: [Cython] Wiki cleanup / CEP numbering scheme
In-Reply-To: <47F3D27A.80501@student.matnat.uio.no>
References: <47F3D27A.80501@student.matnat.uio.no>
Message-ID:

Great, I had this idea in mind too. Thumbs up!
Fabrizio

From robertwb at math.washington.edu Fri Apr 4 01:26:59 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Thu, 3 Apr 2008 16:26:59 -0700
Subject: [Cython] [Pyrex] C++ forward declaration
In-Reply-To: <47F5663B.4010100@acusim.com>
References: <47F5663B.4010100@acusim.com>
Message-ID: <2D8F0128-966B-4965-83E9-F069FC1EC680@math.washington.edu>

On Apr 3, 2008, at 4:20 PM, Ravi Lanka wrote:

> Pyrex gurus,
>
> I am trying to wrap two C++ classes in a Namespace called
> "MyNameSpace". I was breezing through using some of the ideas shared by
> Lenard and others. I got stuck with a case of forward declaration as
> below. The two classes I have, A and B, have methods that require the
> objects corresponding to B and A. If I try to declare, say class B,
> without all the methods before class A and then try to create the
> complete code for class B, it complains that class B is re-declared.
> How do I get around this problem ?
>
> cdef extern from "A.h":
>     ctypedef struct A "MyNameSpace::A":
>         void (*foo)( B data )
>
> cdef extern from "B.h":
>     ctypedef struct B "MyNameSpace::B":
>         void (*bar)( A data )
>
> thanks
> Ravi

I believe you have to do a forward declaration. The following works
for me:

cdef extern from "B.h":
    ctypedef struct B

cdef extern from "A.h":
    ctypedef struct A "MyNameSpace::A":
        void foo(B data)

cdef extern from "B.h":
    ctypedef struct B "MyNameSpace::B":
        void bar(A data)

cdef A a
cdef B b

# this will compile...but don't run it
a.foo(b)
b.bar(a)

- Robert

From ggellner at uoguelph.ca Fri Apr 4 05:19:21 2008
From: ggellner at uoguelph.ca (Gabriel Gellner)
Date: Thu, 3 Apr 2008 23:19:21 -0400
Subject: [Cython] access to enclosing scope in a cdef
Message-ID: <20080404031921.GA6772@giton>

Sorry for the simple question, but I couldn't figure out how to do this
myself...

I am wrapping a C-code ode solver that uses a callback after each time step.
So to save the output I need to append these values to a global array.

Roughly I want to have a calling function that creates a numpy array that the
callback will write to. The problem is I don't know how to make this kind of
global variable in cython.

The skeleton that I would like to be able to do (given that I have removed all
the real logic):

# This is the solution callback (not the model definition for those who use
# ode solvers) that is the ode solver is returning the *y values after each
# step.
cdef void solout(int nr, double *x, double *y):
    # I will have a loop in the future, but as long as I can
    # save a single value to the array I can solve this myself.
    output.data[0][0] = y[0]

# This is the python driver
def ode(t):
    dim = 4 # hard set to make the code short . . .

    # Allocate my array
    output = numpy.zeros((4, len(t)))

    # set solver values
    y0 =
    # Call the solver, (I assume func, the model, is defined elsewhere . . .)
    odesolve(func, y0, t)

    # now at this point I would want output to have its first row set

I hope this is clear, if my simplification has made it worse I can give a full
example of what the code would look like (that is I can get everything to work
with print statements, I just am not able to save the output . . .)

Any help would be greatly appreciated!

Gabriel

From robertwb at math.washington.edu Fri Apr 4 09:22:04 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 4 Apr 2008 00:22:04 -0700
Subject: [Cython] Cython 0.9.6.13rc1 is up.
Message-ID: <46256EE6-B976-418A-99FE-73E0C2D43DF0@math.washington.edu>

This has the new repository hierarchy, so you won't be able to pull
from the online -devel ones. If no one reports any bugs in it then I
will release tomorrow.
http://sage.math.washington.edu/home/robertwb/cython/

- Robert

From dagss at student.matnat.uio.no Fri Apr 4 11:29:04 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Fri, 04 Apr 2008 11:29:04 +0200
Subject: [Cython] Wiki cleanup / CEP numbering scheme
In-Reply-To: <47F3D27A.80501@student.matnat.uio.no>
References: <47F3D27A.80501@student.matnat.uio.no>
Message-ID: <47F5F4E0.6030207@student.matnat.uio.no>

Dag Sverre Seljebotn wrote:
> I'd like to spend an hour or so cleaning up the enhancements page if
> this (I guess it is not a necessity, but if it should be done anyway
> then this is a good time). If I get thumbs up for the following I'll get
> at it.
>
OK, this is done (not all the spec titles etc. are changed accordingly
but we'll take that as we go).

Dag Sverre

From mistobaan at gmail.com Fri Apr 4 15:14:57 2008
From: mistobaan at gmail.com (Fabrizio Milo aka misto)
Date: Fri, 4 Apr 2008 15:14:57 +0200
Subject: [Cython] Wiki cleanup / CEP numbering scheme
In-Reply-To: <47F5F4E0.6030207@student.matnat.uio.no>
References: <47F3D27A.80501@student.matnat.uio.no>
 <47F5F4E0.6030207@student.matnat.uio.no>
Message-ID:

> OK, this is done (not all the spec titles etc. are changed accordingly
> but we'll take that as we go).

Good Job!

Fabrizio

--------------------------
Luck favors the prepared mind. (Pasteur)

From simon at arrowtheory.com Fri Apr 4 16:29:37 2008
From: simon at arrowtheory.com (Simon Burton)
Date: Fri, 4 Apr 2008 10:29:37 -0400
Subject: [Cython] access to enclosing scope in a cdef
In-Reply-To: <20080404031921.GA6772@giton>
References: <20080404031921.GA6772@giton>
Message-ID: <20080404102937.b7140134.simon@arrowtheory.com>

On Thu, 3 Apr 2008 23:19:21 -0400
Gabriel Gellner wrote:

> Sorry for the simple question, but I couldn't figure out how to do this
> myself...
>
> I am wrapping a C-code ode solver that uses a callback after each time step.
> So to save the output I need to append these values to a global array.
>
> Roughly I want to have a calling function that creates a numpy array that the
> callback will write to. The problem is I don't know how to make this kind of
> global variable in cython.
>
> The skeleton that I would like to be able to do (given that I have removed all
> the real logic):
>
> # This is the solution callback (not the model definition for those who use
> # ode solvers) that is the ode solver is returning the *y values after each
> # step.
> cdef void solout(int nr, double *x, double *y):
>     # I will have a loop in the future, but as long as I can
>     # save a single value to the array I can solve this myself.
>     output.data[0][0] = y[0]
>
> # This is the python driver
> def ode(t):

    global output

>     dim = 4 # hard set to make the code short . . .
>
>     # Allocate my array
>     output = numpy.zeros((4, len(t)))
>
>     # set solver values
>     y0 =
>     # Call the solver, (I assume func, the model, is defined elsewhere . . .)
>     odesolve(func, y0, t)
>
>     # now at this point I would want output to have its first row set
>
> I hope this is clear, if my simplification has made it worse I can give a full
> example of what the code would look like (that is I can get everything to work
> with print statements, I just am not able to save the output . . .)
>
> Any help would be greatly appreciated!
>
> Gabriel

From robertwb at math.washington.edu Fri Apr 4 19:20:26 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 4 Apr 2008 10:20:26 -0700
Subject: [Cython] Wiki cleanup / CEP numbering scheme
In-Reply-To: <47F5F4E0.6030207@student.matnat.uio.no>
References: <47F3D27A.80501@student.matnat.uio.no>
 <47F5F4E0.6030207@student.matnat.uio.no>
Message-ID: <8A34BEE8-3E20-4510-BCCE-0BE6F468589C@math.washington.edu>

On Apr 4, 2008, at 2:29 AM, Dag Sverre Seljebotn wrote:

> Dag Sverre Seljebotn wrote:
>> I'd like to spend an hour or so cleaning up the enhancements page if
>> this (I guess it is not a necessity, but if it should be done anyway
>> then this is a good time). If I get thumbs up for the following I'll get
>> at it.
>>
> OK, this is done (not all the spec titles etc. are changed accordingly
> but we'll take that as we go).

Thanks!

From robertwb at math.washington.edu Sat Apr 5 03:01:37 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 4 Apr 2008 18:01:37 -0700
Subject: [Cython] [Pyrex] C++ forward declaration
In-Reply-To: <47F6A9FF.1020005@telus.net>
References: <47F5663B.4010100@acusim.com>
 <2D8F0128-966B-4965-83E9-F069FC1EC680@math.washington.edu>
 <47F6A9FF.1020005@telus.net>
Message-ID: <2700F0D2-3FE8-447A-BE07-C1D3B0A187C0@math.washington.edu>

On Apr 4, 2008, at 3:21 PM, Lenard Lindstrom wrote:

> First, the Pyrex 0.9.6.4 compiler rejected the above code given by
> Robert Bradshaw. Second, even when altered to use function pointers the
> generated code was wrong.

Sorry, I just hastily typed this up in my email to demonstrate the
concept of forward declaration (which was the missing piece). Also,
you're right about needing to do function pointers if you're using
Pyrex.

> The global variable b was declared as "B", not
> "MyNameSpace::B".
> This example corrects both problems:
>
> cdef extern from "B.h":
>     ctypedef struct B "MyNameSpace::B"
>
> cdef extern from "A.h":
>     ctypedef struct A "MyNameSpace::A":
>         void (* foo)(B data)
>
> cdef extern from "B.h":
>     ctypedef struct B "MyNameSpace::B":
>         void (* bar)(A data)
>
> cdef A a
> cdef B b
>
> # this will compile...but don't run it
> a.foo(b)
> b.bar(a)
>
> Note that B.h will be included before A.h in the generated C file.
>
> --
> Lenard Lindstrom
>
>
>
> _______________________________________________
> Pyrex mailing list
> Pyrex at lists.copyleft.no
> http://lists.copyleft.no/mailman/listinfo/pyrex

From robertwb at math.washington.edu Sat Apr 5 03:25:19 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 4 Apr 2008 18:25:19 -0700
Subject: [Cython] Cython 0.9.6.13 Released
Message-ID: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu>

Hi All.

Cython 0.9.6.13 is available for download at http://cython.org or
http://pypi.python.org/pypi/Cython/0.9.6.13. The main improvements to
this release are

- C++ exception handling (Felix Wu)
- (optional) C line numbers in Errors (Gary Furnish)
- some circular cimports (Gary Furnish)
- (experimental) parse tree transforms (Dag Seljebotn)
- struct member functions automatically coerced to function pointers
  (for easier C++ wrapping)
- no unneeded incref on function arguments
- allow single-character ascii literals to be used as ints (no need
  for c'x' notation)
- better support for using arrays as pointers

There are also several bugfixes and pre-Py3K changes due to Robert
Bradshaw, Stefan Behnel, Jum Kleckner, and Chris Perkins. The
compiler and package repositories have been merged, and while all
history has been preserved it is a completely new repository now.

We are looking forward to lots of development at Sage Developer Days
1 (http://wiki.sagemath.org/dev1) and hopefully some Google Summer of
Code projects over the summer.
- Robert

From ggellner at uoguelph.ca Sat Apr 5 06:34:16 2008
From: ggellner at uoguelph.ca (Gabriel Gellner)
Date: Sat, 5 Apr 2008 00:34:16 -0400
Subject: [Cython] python callback
Message-ID: <20080405043416.GA9069@basestar>

Could anyone give me a tip on how to convert a python function
of the form

def func(x):
    return some_operation(x)

to a C function of the form

cdef void cfunc(double x, double *y):
    y = some_operation(x)

That is I want to be able to turn a python function into
a C function with pass by reference return semantics.

I looked at the cheesefinder example, and I am not sure how to adapt it, as I
want to be able to have the user supply the python function, and I can't think
of how to make a cdef callback function that knows what this function is, that
is

cdef void callback(double x, double *y):
    y = func(x)

I am not sure how to set func in this way, without rewriting the cdef each
time. In python I would just use a higher order function . . .

Thanks,
Gabriel

From michael.abshoff at googlemail.com Sat Apr 5 06:27:08 2008
From: michael.abshoff at googlemail.com (Michael.Abshoff)
Date: Sat, 05 Apr 2008 06:27:08 +0200
Subject: [Cython] Cython 0.9.6.13 Released
In-Reply-To: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu>
References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu>
Message-ID: <47F6FF9C.5090509@gmail.com>

Robert Bradshaw wrote:
> Hi All.
>
> Cython 0.9.6.13 is available for download at http://cython.org or
> http://pypi.python.org/pypi/Cython/0.9.6.13.
> The main improvements to
> this release are
>
> - C++ exception handling (Felix Wu)
> - (optional) C line numbers in Errors (Gary Furnish)
> - some circular cimports (Gary Furnish)
> - (experimental) parse tree transforms (Dag Seljebotn)
> - struct member functions automatically coerced to function pointers
>   (for easier C++ wrapping)
> - no unneeded incref on function arguments
> - allow single-character ascii literals to be used as ints (no need
>   for c'x' notation)
> - better support for using arrays as pointers
>
> There are also several bugfixes and pre-Py3K changes due to Robert
> Bradshaw, Stefan Behnel, Jum Kleckner, and Chris Perkins. The
> compiler and package repositories have been merged, and while all
> history has been preserved it is a completely new repository now.
>
> We are looking forward to lots of development at Sage Developer Days
> 1 (http://wiki.sagemath.org/dev1) and hopefully some Google Summer of
> Code projects over the summer.
>
> - Robert

Hi guys,

Sage with the new Cython has a lot of leaks exposed by valgrind:

==9372== LEAK SUMMARY:
==9372==    definitely lost: 232,533 bytes in 3,757 blocks.

Those leaks are all considered "possibly lost" or "still reachable" with
Cython 0.9.6.12, so it seems to be mostly an accounting issue
[valgrind's accounting of possibly lost vs. still reachable vs.
definitely lost can vary depending on other leaks, but I will spare you
the details here].

While we have looked into this in the past at various Sage Days it might
be a good idea to put this on the agenda for Dev Day 1 since this
currently adds massive noise to the interesting bits when looking for
memleaks in Sage. I can add a suppression file, but I would consider
that only a last resort.

Keep up the great work.

Cheers,

Michael

From dagss at student.matnat.uio.no Sat Apr 5 14:31:55 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sat, 05 Apr 2008 14:31:55 +0200
Subject: [Cython] Refactoring Errors + test harness
Message-ID: <47F7713B.8020301@student.matnat.uio.no>

I've been playing with making Errors an object that is passed into
functions (rather than "global variable"). I think this should happen at
some point as the current approach is rather nasty.

This is not a pointless exercise; the long-term aim for this would be
to make it possible to export Cython cleanly as a library rather than
just a command-line program, which in turn is necessary for.

Also one should already operate with separate error contexts: Some
errors (like, in initialization of Builtins.py) should be ignored (now
they are recorded and subsequently deleted) while other errors
(initialization in Symtab.py) should probably raise exceptions instead
of reporting errors as they would signify Cython bugs.

The changes needed were pretty extensive though. I ended up giving every
single function in 'ExprNodes.py', 'Nodes.py', 'ModuleNode.py',
'Symtab.py' a new parameter, "ctx", specifying the "compilation context"
one is in and which carries error() and warning() functions. (I use a
script to do this; it goes in two passes; first extract and change all
function definitions, and then change all function calls using a name
used in a function definition.)

The alternative would be to basically set the context parameter as
attributes on the nodes themselves; I felt this was unnatural as method
calls on nodes make changes on the node structure, while errors
encountered while doing that are something that should be reported to
the caller of the method.

My question then:
- Will this get into Cython if I spend more time on it and finish it?
  It is a quite dangerous change, although now seems the right time to do
  it (having just done a release one has time to discover any bugs
  introduced).
- What's the best method to regression test easily? I've not built SAGE
  yet and would prefer something lighter if available; somebody has a
  test harness set up? Is it in the Cython repo?

--
Dag Sverre

From dagss at student.matnat.uio.no Sat Apr 5 14:33:52 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sat, 05 Apr 2008 14:33:52 +0200
Subject: [Cython] Refactoring Errors + test harness
In-Reply-To: <47F7713B.8020301@student.matnat.uio.no>
References: <47F7713B.8020301@student.matnat.uio.no>
Message-ID: <47F771B0.4010809@student.matnat.uio.no>

>
> This is not a pointless exercise; the long-term aim for this would be
> to make it possible to export Cython cleanly as a library rather than
> just a command-line program, which in turn is necessary for.
>
...necessary for creating small experimentation programs that only play
with certain parts of Cython rather than having to always do a full
compile.

OK, I'll readily admit the benefits aren't *that* great. I'm easily
convinced into dropping it too.

--
Dag Sverre

From simon at arrowtheory.com Sat Apr 5 18:58:21 2008
From: simon at arrowtheory.com (simon at arrowtheory.com)
Date: Sat, 5 Apr 2008 12:58:21 -0400 (EDT)
Subject: [Cython] python callback
In-Reply-To: <20080405043416.GA9069@basestar>
References: <20080405043416.GA9069@basestar>
Message-ID: <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com>

> Could anyone give me a tip on how to convert a python function
> of the form
>
> def func(x):
>     return some_operation(x)
>
> to a C function of the form
>
> cdef void cfunc(double x, double *y):
>     y = some_operation(x)

      y[0] = some_operation(x)

>
> That is I want to be able to turn a python function into
> a C function with pass by reference return semantics.
>
> I looked at the cheesefinder example, and I am not sure how to adapt it, as I
> want to be able to have the user supply the python function, and I can't think
> of how to make a cdef callback function that knows what this function is, that
> is
>
> cdef void callback(double x, double *y):
>     y = func(x)
>
> I am not sure how to set func in this way, without rewriting the cdef each
> time. In python I would just use a higher order function . . .

in general, you need an extra "void *" argument in the callback
function (which you would use to pass a python callable).
Well designed C libraries use this convention, but it
sounds like you don't have this.

>
> Thanks,
> Gabriel

Simon.

From ggellner at uoguelph.ca Sat Apr 5 19:41:03 2008
From: ggellner at uoguelph.ca (Gabriel Gellner)
Date: Sat, 5 Apr 2008 13:41:03 -0400
Subject: [Cython] python callback
In-Reply-To: <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com>
References: <20080405043416.GA9069@basestar>
 <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com>
Message-ID: <20080405174103.GA12685@basestar>

On Sat, Apr 05, 2008 at 12:58:21PM -0400, simon at arrowtheory.com wrote:
> > Could anyone give me a tip on how to convert a python function
> > of the form
> >
> > def func(x):
> >     return some_operation(x)
> >
> > to a C function of the form
> >
> > cdef void cfunc(double x, double *y):
> >     y = some_operation(x)
>
> y[0] = some_operation(x)
>
Yeah, sorry that was a typo...

> >
> > That is I want to be able to turn a python function into
> > a C function with pass by reference return semantics.
> >
> > I looked at the cheesefinder example, and I am not sure how to adapt it, as I
> > want to be able to have the user supply the python function, and I can't think
> > of how to make a cdef callback function that knows what this function is, that
> > is
> >
> > cdef void callback(double x, double *y):
> >     y = func(x)
> >
> > I am not sure how to set func in this way, without rewriting the cdef each
> > time. In python I would just use a higher order function . . .
>
> in general, you need an extra "void *" argument in the callback
> function (which you would use to pass a python callable).
> Well designed C libraries use this convention, but it
> sounds like you don't have this.
>
Curses, I guess I will have to change the original code :-(

Thanks for the help.

Gabriel

From robertwb at math.washington.edu Sun Apr 6 01:20:22 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 5 Apr 2008 16:20:22 -0700
Subject: [Cython] Cython 0.9.6.13 Released
In-Reply-To: <47F6FF9C.5090509@gmail.com>
References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu>
 <47F6FF9C.5090509@gmail.com>
Message-ID: <2B328C76-3102-4199-9319-D2EDFAE22338@math.washington.edu>

On Apr 4, 2008, at 9:27 PM, Michael.Abshoff wrote:

> Robert Bradshaw wrote:
>> Hi All.
>>
>> Cython 0.9.6.13 is available for download at http://cython.org or
>> http://pypi.python.org/pypi/Cython/0.9.6.13.
>> The main improvements to
>> this release are
>>
>> - C++ exception handling (Felix Wu)
>> - (optional) C line numbers in Errors (Gary Furnish)
>> - some circular cimports (Gary Furnish)
>> - (experimental) parse tree transforms (Dag Seljebotn)
>> - struct member functions automatically coerced to function pointers
>>   (for easier C++ wrapping)
>> - no unneeded incref on function arguments
>> - allow single-character ascii literals to be used as ints (no need
>>   for c'x' notation)
>> - better support for using arrays as pointers
>>
>> There are also several bugfixes and pre-Py3K changes due to Robert
>> Bradshaw, Stefan Behnel, Jum Kleckner, and Chris Perkins. The
>> compiler and package repositories have been merged, and while all
>> history has been preserved it is a completely new repository now.
>>
>> We are looking forward to lots of development at Sage Developer Days
>> 1 (http://wiki.sagemath.org/dev1) and hopefully some Google Summer of
>> Code projects over the summer.
>>
>> - Robert
>
> Hi guys,
>
> Sage with the new Cython has a lot of leaks exposed by valgrind:
>
> ==9372== LEAK SUMMARY:
> ==9372==    definitely lost: 232,533 bytes in 3,757 blocks.
>
> Those leaks are all considered "possibly lost" or "still reachable" with
> Cython 0.9.6.12, so it seems to be mostly an accounting issue
> [valgrind's accounting of possibly lost vs. still reachable vs.
> definitely lost can vary depending on other leaks, but I will spare you
> the details here].

To clarify, it doesn't sound like there are significantly many new
leaks, but the accounting has shifted around a bit.

> While we have looked into this in the past at various Sage Days it might
> be a good idea to put this on the agenda for Dev Day 1 since this
> currently adds massive noise to the interesting bits when looking for
> memleaks in Sage. I can add a suppression file, but I would consider
> that only a last resort.
Perhaps one thing that might cut down the noise is disabling
intern_names and cache_builtins in

http://hg.cython.org/cython/file/b68682070c8e/Cython/Compiler/Options.py

(Also, on that note, I think we want to ship c_line_in_traceback = 1
with Sage.) I also made the default cleanup level 0 again, as it was
causing segfaults in people's code. Here is the issue:

---- a.pyx ----

import b
foo = b.B()

---- b.pyx ---

class B:
    def __del__(self):
        print "hi"

----

Now B *cannot* be deleted after the module b is cleaned up because
the string "hi" may have been deallocated (sometimes it seems to
still work, e.g. if the memory involved hasn't been dirtied by
anything else).

There's also the issue of the dictionaries that don't go away when
they should.

- Robert

From michael.abshoff at googlemail.com Sun Apr 6 01:39:32 2008
From: michael.abshoff at googlemail.com (Michael.Abshoff)
Date: Sun, 06 Apr 2008 01:39:32 +0200
Subject: [Cython] Cython 0.9.6.13 Released
In-Reply-To: <2B328C76-3102-4199-9319-D2EDFAE22338@math.washington.edu>
References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu>
 <47F6FF9C.5090509@gmail.com>
 <2B328C76-3102-4199-9319-D2EDFAE22338@math.washington.edu>
Message-ID: <47F80DB4.4050807@gmail.com>

Robert Bradshaw wrote:
> On Apr 4, 2008, at 9:27 PM, Michael.Abshoff wrote:
>> Robert Bradshaw wrote:
>>> Hi All.
>>>
>>> Cython 0.9.6.13 is available for download at http://cython.org or
>>> http://pypi.python.org/pypi/Cython/0.9.6.13.
The main improvements to
>>> this release are
>>>
>>> - C++ exception handling (Felix Wu)
>>> - (optional) C line numbers in Errors (Gary Furnish)
>>> - some circular cimports (Gary Furnish)
>>> - (experimental) parse tree transforms (Dag Seljebotn)
>>> - struct member functions automatically coerced to function pointers
>>>   (for easier C++ wrapping)
>>> - no unneeded incref on function arguments
>>> - allow single-character ascii literals to be used as ints (no need
>>>   for c'x' notation)
>>> - better support for using arrays as pointers
>>>
>>> There are also several bugfixes and pre-Py3K changes due to Robert
>>> Bradshaw, Stefan Behnel, Jum Kleckner, and Chris Perkins. The
>>> compiler and package repositories have been merged, and while all
>>> history has been preserved it is a completely new repository now.
>>>
>>> We are looking forward to lots of development at Sage Developer Days
>>> 1 (http://wiki.sagemath.org/dev1) and hopefully some Google Summer of
>>> Code projects over the summer.
>>>
>>> - Robert
>> Hi guys,
>>
>> Sage with the new Cython has a lot of leaks exposed by valgrind:
>>
>> ==9372== LEAK SUMMARY:
>> ==9372== definitely lost: 232,533 bytes in 3,757 blocks.
>>
>> Those leaks are all considered "possibly lost" or "still reachable"
>> with Cython 0.9.6.12, so it seems to be mostly an accounting issue
>> [valgrind's accounting of possibly lost vs. still reachable vs.
>> definitely lost can vary depending on other leaks, but I will spare
>> you the details here].
>
> To clarify, it doesn't sound like there are significantly many new
> leaks, but the accounting has shifted around a bit.

Correct. While I do not have precise figures yet it is very close to
what I saw with 0.9.6.12 and Sage 2.11.

>> While we have looked into this in the past at various Sage Days it
>> might be a good idea to put this on the agenda for Dev Day 1 since this
>> currently adds massive noise to the interesting bits when looking for
>> memleaks in Sage.
I can add a suppression file, but I would consider >> that only a last resort. > > Perhaps one thing that might cut down the noise is disabling > intern_names and cache_builtins in > > http://hg.cython.org/cython/file/b68682070c8e/Cython/Compiler/Options.py > > (Also, on that note, I think we want to ship c_line_in_traceback = 1 > with Sage.) I also made the default cleanup level 0 again, as it was > causing Segfaults in peoples code. Here is the issue: Ok, I will dial that up and see if there is any trouble in Sage. I remember some trouble with integer.pyx but we did fix those issues when we finally squashed #1337. > ---- a.pyx ---- > > import b > foo = b.B() > > ---- b.pyx --- > > class B: > def __del__(self): > print "hi" > > ---- > > Now B *cannot* be deleted after the module b is cleaned up because > the string "hi" may have been deallocated (sometimes it seems to > still work, e.g. if the memory involved hasn't been dirtied by > anything else). > > There's also the issue of the dictionaries that don't go away when > they should. Yes, hopefully we can spend some time on this before, during or after Dev1 since I will be in Seattle for about three weeks around Dev1 :) > - Robert Cheers, Michael > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From f.guerrieri at gmail.com Sun Apr 6 12:57:48 2008 From: f.guerrieri at gmail.com (Francesco Guerrieri) Date: Sun, 6 Apr 2008 12:57:48 +0200 Subject: [Cython] Installing on Windows Message-ID: <79b79e730804060357j74b04d2xb5d4959f9a13f566@mail.gmail.com> Hello, I have added on the wiki a short tutorial on installing cython on windows, based on my experience. I thought it was useful to collect the steps on the wiki. 
Please check the contents to see if I misunderstood something (the
steps described have actually worked for me but maybe I was in some
corner case and didn't know that :) )

I put a link to the page under the section Cython installers

bye
Francesco

From dagss at student.matnat.uio.no Sun Apr 6 13:43:30 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sun, 06 Apr 2008 13:43:30 +0200
Subject: [Cython] Results of XPathTransform / W3CDOM experiments
Message-ID: <47F8B762.4010700@student.matnat.uio.no>

I'm still fooling around with some experiments. The following is now
working (in my local repo) as a way of transforming for-in loops over
range into for-froms:

class ForInToForFrom(XPathTransform):
    @template("pyr:ForInStatNode[iterator/pyr:IteratorNode/sequence/pyr:SimpleCallNode/function/pyr:NameNode/@name = 'range']")
    def for_in_range_to_for_from_range(self, node):
        result = Nodes.ForFromStatNode(...
        ...
        return result

Everything happens on the Pyrex tree, there's no translation to XML or
anything like that. Example attached (though you can't run it outside my
repo, it's just for demonstration).

The question is: Is this a way forward for transforms?

For some more examples, consider that one could for instance select all
equality statements that must have some coercion by

"pyr:SimpleAssignmentNode[lhs/@type != rhs/@type]"

But this is contrived, coercion won't work this way. But also consider
that one can select inner functions by

"pyr:FuncDefNode//pyr:FuncDefNode"

and outer functions only by

"pyr:ModuleNode/body/pyr:FuncDefNode"

and so on.

The gains are highest if XPath selections are used for all transforms
written, because then the finite state machines (see below) can (in
principle at least) be combined so that only one tree traversal per
phase is needed regardless of how the code is modularized into multiple
transforms.
(If combining, one must use a subset of XPath where only the
descendants axis is available outside of predicates, I guess this is the
same as XSLT match statements?).

What I've done:
- Put a subset of the W3C DOM API on top of the tree. No modifications
to the Cython code tree were necessary except adding a base class (and I
finally had a legitimate use for a metaclass or two. Yay!). A
"side-effect" is that the tree can be streamed to XML (see example code).
- Use the webpath XPath 2.0 transform to select nodes
(http://sourceforge.net/projects/webpath), and act on them on traversal.

Questions:
- Anyone know of good DOM transformation libraries for Cython?
- Does anyone think this would be useful?
- Does anyone think this could be a standard for writing transforms?
- Any other good uses for a W3C DOM on our parse trees? (it is a
separate component) I'll assume that streaming in and out of XSLT is not
going to be convenient, but something else perhaps?

Some notes:
- It currently scales horribly with the number of "templates"; one full
tree traversal per match. In order to fix this, one either has to find a
better XPath library (which must be hacked a bit), an XSLT processor or
similar implemented entirely in Python, or a full-time week is needed to
improve webpath by using a Finite State Automata library (which does the
standard conversion of non-deterministic automata to deterministic
automata, there are several good ones and this is not too hard to do).
Does it matter if we do 30 traversals on the tree rather than 2-3? As
long as it can be optimized "in principle"?
- On the other hand, once that is done, one can "combine" tree
traversals so that multiple transforms work in the same traversal,
meaning that the number of traversals will be reduced compared to what
is in sight now.
- But, the current less efficient implementation is working.
I might probably leave it for now at this because the gains seem less than the effort, but if anyone thinks this is interesting then speak up and we can see. -- Dag Sverre -------------- next part -------------- A non-text attachment was scrubbed... Name: testbed.py Type: text/x-python Size: 2476 bytes Desc: not available Url : http://codespeak.net/pipermail/cython-dev/attachments/20080406/2ca2cea1/attachment-0001.py From dagss at student.matnat.uio.no Sun Apr 6 13:53:54 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 Apr 2008 13:53:54 +0200 Subject: [Cython] Results of XPathTransform / W3CDOM experiments In-Reply-To: <47F8B762.4010700@student.matnat.uio.no> References: <47F8B762.4010700@student.matnat.uio.no> Message-ID: <47F8B9D2.6030307@student.matnat.uio.no> > > Questions: > - Anyone know of good DOM transformation libraries for Cython? _P_ython. -- Dag Sverre From martin at martincmartin.com Sun Apr 6 14:46:56 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Sun, 06 Apr 2008 08:46:56 -0400 Subject: [Cython] Lisp inspired transforms Message-ID: <47F8C640.10901@martincmartin.com> Hi all, I've been doing some thinking and prototyping of a transform system inspired by Common Lisp macros. You can see the results as the newest CEP: http://wiki.cython.org/enhancements/metaprogramming Briefly, it allows you to define a transform in the Cython source code. The transform runs at compile time, and takes the *parse trees* of its arguments. In the examples, I define a simple symbolic differentiator "deriv" which means you can write: def eggs(a, b): return deriv(5*a+b, a) and it is translated at compile time to: def eggs(a, b): return (((5 * 1) + (0 * a)) + 0) You can use this, for example, with numerical optimization techniques like Newton's method. 
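For reference, the Newton iteration itself is short. The following is a minimal Python sketch, assuming the function passed in returns the pair (f(x), f'(x)) for f(x) = 5*x**3 - 10. The derivative is written out by hand here rather than produced by the deriv transform, and the names myfunct and newtons are only borrowed for illustration; the tol and max_iter parameters are assumptions, not part of the patch.

```python
def myfunct(x):
    """Return (f(x), f'(x)) for f(x) = 5*x**3 - 10; derivative written by hand."""
    return 5 * x**3 - 10, 15 * x**2


def newtons(f, x, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until |f(x)| <= tol."""
    for _ in range(max_iter):
        value, slope = f(x)
        if abs(value) <= tol:
            return x
        x = x - value / slope
    raise RuntimeError("Newton iteration did not converge")


# Start from 10.0; the root of 5*x**3 - 10 is the cube root of 2.
root = newtons(myfunct, 10.0)
```

With a compile-time transform, the second element of the returned pair would be generated from the first instead of being typed in by the programmer.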
An example which differentiates 5 * x**3 - 10 and finds its root is
given on the above wiki page, here's the output:

>>> TestTrans.myfunct(10.0)
(4990.0, 1500.0)
>>> TestTrans.newtons(TestTrans.myfunct, 10.0)
f( 10.0 ) = 4990.0 , f'( 10.0 ) = 1500.0
f( 6.67333333333 ) = 1475.93037185 , f'( 6.67333333333 ) = 668.000666667
f( 4.46385893383 ) = 434.735082042 , f'( 4.46385893383 ) = 298.890548717
f( 3.00936301916 ) = 126.267956666 , f'( 3.00936301916 ) = 135.843986716
f( 2.07985587115 ) = 34.9852072625 , f'( 2.07985587115 ) = 64.8870066716
f( 1.54068464016 ) = 8.28568621842 , f'( 1.54068464016 ) = 35.6056374064
f( 1.3079774954 ) = 1.18847303518 , f'( 1.3079774954 ) = 25.6620769269
f( 1.26166506953 ) = 0.0415843883547 , f'( 1.26166506953 ) = 23.8769812152
f( 1.25992345957 ) = 5.73769235945e-05 , f'( 1.25992345957 ) = 23.8111068596
f( 1.2599210499 ) = 1.09734443754e-10 , f'( 1.2599210499 ) = 23.8110157797
f( 1.25992104989 ) = 0.0 , f'( 1.25992104989 ) = 23.8110157795
1.2599210498948732

It's also useful for moving computation from run time to compile time,
and makes it easy for the programmer to specify what should happen
where. There are many other use cases as well.

I'm attaching a patch that implements the examples. It's very
proof-of-concept, which means it's really rough. :) I also implemented
the bare minimum to get the examples working, but it gives you an idea.

Best,
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CythonMetaprogramming.tar.gz
Type: application/gzip
Size: 8137 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20080406/de466ac3/attachment.bin

From mistobaan at gmail.com Sun Apr 6 17:19:41 2008
From: mistobaan at gmail.com (Fabrizio Milo aka misto)
Date: Sun, 6 Apr 2008 17:19:41 +0200
Subject: [Cython] Results of XPathTransform / W3CDOM experiments
In-Reply-To: <47F8B762.4010700@student.matnat.uio.no>
References: <47F8B762.4010700@student.matnat.uio.no>
Message-ID:

Dag,

Do you want to transform the ExprTree or the DOCUMENT (i.e. the C
file) produced by the ExprTree?

I understood that you were working on the second point, but it seems
that you are working on the first point!

Matching an expr tree with XPath seems to me like shooting a fly
with a cannon.

It will be very slow and not all the possible matching rules will be
possible with XPath, plus I imagine that there will be some trickery
with recursive functions.

I would prefer other forms of matching, e.g.

class ForInToForFrom(Transformation):

    def ex_match1 (self, tree):
        return len(tree) >=3 and tree is ForInStatNode and tree[1] is
            SimpleCallNode and tree[2] is NameNode and tree[2].name == RANGE_ID

    def ex_match2 (self, tree):
        return tree == ForInStatNode ( SimpleCallNode ( NameNode
            (name = RANGE_ID)))

    def eg_match3 (self,tree):
        subtree = buildMatch ("""
            ForInStatNode:
                SimpleCallNode:
                    NameNode:
                        name = range
            """)
        return subtree.matches(tree)

    def transform (self, tree):
        return Nodes.ForFromStatNode(...)

Plex has a DSL to express matching rules; we could readapt it to match
Nodes instead of chars ...

To better understand what kind of transformation we would like to
apply I think we should have a 10 / 20 item list of possible
transforms.
From this list we should derive the best form to express the matching
against the ExprTree or the AST.

These things could be done in N different ways. Finding the simplest
and most effective one is the challenge.
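To make the second matching style (comparing against a small pattern tree) concrete, here is a minimal Python sketch. It assumes a toy Node class with a kind, positional children and keyword attributes; Node, matches and the example trees are hypothetical stand-ins for illustration and are not Cython's real node classes or an existing API.

```python
class Node(object):
    """Toy stand-in for a parse tree node; not Cython's real node classes."""
    def __init__(self, kind, children=(), **attrs):
        self.kind = kind
        self.children = list(children)
        self.attrs = attrs


def matches(pattern, tree):
    """True if tree has the pattern's kind and attributes, and each pattern
    child matches the tree child in the same position. Extra attributes and
    extra trailing children on the tree are ignored."""
    if pattern.kind != tree.kind:
        return False
    for key, value in pattern.attrs.items():
        if tree.attrs.get(key) != value:
            return False
    if len(pattern.children) > len(tree.children):
        return False
    return all(matches(p, t) for p, t in zip(pattern.children, tree.children))


# Pattern for "for ... in range(...)": only the iterator side is constrained.
range_loop = Node("ForInStatNode", [
    Node("SimpleCallNode", [Node("NameNode", name="range")]),
])

# A loop over range(10) with a body, and a loop over a plain name.
loop = Node("ForInStatNode", [
    Node("SimpleCallNode", [Node("NameNode", name="range"),
                            Node("IntNode", value=10)]),
    Node("StatListNode"),
])
other = Node("ForInStatNode", [Node("NameNode", name="items")])
```

Here matches(range_loop, loop) holds and matches(range_loop, other) does not. The pattern is ordinary Python data, so no separate pattern language or parser is needed, at the price of only being able to express positional, top-down matches.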
Fabrizio -------------------------- Luck favors the prepared mind. (Pasteur) From dagss at student.matnat.uio.no Sun Apr 6 18:29:15 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 Apr 2008 18:29:15 +0200 Subject: [Cython] Results of XPathTransform / W3CDOM experiments In-Reply-To: References: <47F8B762.4010700@student.matnat.uio.no> Message-ID: <47F8FA5B.5070907@student.matnat.uio.no> > Matching an expr tree with XPath, seems to me like shooting to a fly > with a cannon. > That appears to be my conclusion as well, I just had to explore it (and discuss it). I wonder if you overestimate how heavy XPath (or a subset) is though as the alternatives you suggest seems to me to be just as heavy! > It will be very slow and not all the possible matching rules will be > possible with XPath plus I imagine that will be some trickery with > recursive functions. > Slow: Not slower than any alternatives doing the same thing. Matching possible: Yes, one would be able to call into any Python function so... Recursive functions: Not sure what you mean. There are potential problems related with recursive transforms processes in general but that is independent of the mechanism used for node matching. > class ForInToForFrom(Transformation): > > def ex_match1 (self, tree): > return len(tree) >=3 and tree is ForInStatNode and tree[1] is > SimpleCallNode and tree[2] is NameNode and tree[2].name == RANGE_ID > I like this. From a theoretic perspective though: But how would you select nodes depending on where it is located (ie all nested functions)? So you need to access parent stack as well. And then (in theory) you have algorithmic performance worse than with XPath, because with XPath you can turn stuff like (function//function)|(function//for) into function//(function|for) and this transformation is done at Cython launch-time (and if a problem can be pickled to speed up Cython launch). 
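The "parent stack" point can be illustrated with a small Python sketch: a traversal that hands each node to the matcher together with its ancestors, which is enough to pick out nested functions (roughly what "pyr:FuncDefNode//pyr:FuncDefNode" selects). The Node class and helper names are hypothetical and exist only for this example; this is one traversal per query, with none of the automaton merging discussed above.

```python
class Node(object):
    """Toy tree node used only for this sketch."""
    def __init__(self, kind, name=None, children=()):
        self.kind = kind
        self.name = name
        self.children = list(children)


def walk_with_ancestors(node, ancestors=()):
    """Yield (node, ancestors) pairs in pre-order; ancestors runs root-first."""
    yield node, ancestors
    for child in node.children:
        for pair in walk_with_ancestors(child, ancestors + (node,)):
            yield pair


def nested_functions(root):
    """Names of function nodes that have another function node above them."""
    return [node.name
            for node, ancestors in walk_with_ancestors(root)
            if node.kind == "FuncDefNode"
            and any(a.kind == "FuncDefNode" for a in ancestors)]


module = Node("ModuleNode", children=[
    Node("FuncDefNode", "outer", [
        Node("FuncDefNode", "inner"),
        Node("ForInStatNode", children=[Node("FuncDefNode", "deep")]),
    ]),
    Node("FuncDefNode", "top"),
])
```

For this tree, nested_functions(module) gives the names "inner" and "deep"; "outer" and "top" have no function among their ancestors.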
> def ex_match2 (self, tree): > return tree == ForInStatNode ( SimpleCallNode ( NameNode > (name = RANGE_ID))) > This would be about the same thing as an XPath approach! But one has to write the == machinery, and since that is done manually towards our exact tree then some optimizations might creep in. > def eg_match3 (self,tree): > subtree = buildMatch (""" > ForInStatNode: > SimpleCallNode: > NameNode: > name = range > """) > return subtree.matches(tree) > Now you have exactly the XPath approach, except that we have to reimplement a custom parser and specify a custom language! > To better understand what kind of transformation we would like to > apply I think we should have a 10 / 20 item list of possible > transforms. > Well said. I think it will probably be the first one, but at least now XPath is a bit explored so that one can recognize it if it is needed... -- Dag Sverre From ggellner at uoguelph.ca Mon Apr 7 07:57:26 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Mon, 7 Apr 2008 01:57:26 -0400 Subject: [Cython] python callback In-Reply-To: <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com> References: <20080405043416.GA9069@basestar> <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com> Message-ID: <20080407055726.GA9347@basestar> Looking through the scipy odeint C wrapper, I realized the simple pattern to solve this (just having the callback call a global python function, that is set by the driver.) Does anyone think this is worth making a Wiki page for? If so I would be happy to write up the solution. 
Gabriel On Sat, Apr 05, 2008 at 12:58:21PM -0400, simon at arrowtheory.com wrote: > > Could anyone give me a tip on how to convert a python function > > of the form > > > > def func(x): > > return some_operation(x) > > > > to a C function of the form > > > > cdef void cfunc(double x, double *y): > > y = some_operation(x) > > y[0] = some_operation(x) > > > > > That is I want to be able to turn a python function into > > a C function with pass by reference return semantics. > > > > I looked at the cheesefinder example, and I am not sure how to adapt it, > > as I > > want to be able to have the user supply the python function, and I can't > > think > > of how to make a cdef callback function that knows what this function is, > > that > > is > > > > cdef void callback(souble x, double *y): > > y = func(x) > > > > I am not sure how to set func in this way, without rewriting the cdef each > > time. In python I would just use a higher order function . . . > > in general, you need an extra "void *" argument in the callback > function (which you would use to pass a python callable). > Well designed C libraries use this convention, but it > sounds like you don't have this. > > > > > Thanks, > > Gabriel > > _______________________________________________ > > Cython-dev mailing list > > Cython-dev at codespeak.net > > http://codespeak.net/mailman/listinfo/cython-dev > > > > Simon. > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From stefan_ml at behnel.de Mon Apr 7 08:21:31 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 Apr 2008 08:21:31 +0200 Subject: [Cython] Refactoring Errors + test harness In-Reply-To: <47F7713B.8020301@student.matnat.uio.no> References: <47F7713B.8020301@student.matnat.uio.no> Message-ID: <47F9BD6B.6080506@behnel.de> Hi, Dag Sverre Seljebotn wrote: > - What's the best method to regression test easily? 
I've not built SAGE > yet and would prefer something lighter if available; somebody has a test > harness set up? Is it in the Cython repo? Sure, seen the little "runtests.py" script? The tests are in the "tests" directory. Stefan From stefan_ml at behnel.de Mon Apr 7 11:25:21 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 Apr 2008 11:25:21 +0200 Subject: [Cython] Cython 0.9.6.13 Released In-Reply-To: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu> References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu> Message-ID: <47F9E881.7080405@behnel.de> Hi Robert, thanks for this release. A few comments: The file "MANIFEST" shouldn't be in the repository as it's autogenerated from the file "MANIFEST.in" by distutils. Also, there seems to be a problem with cimporting public extension classes from another Cython module. I was away last week, so I couldn't test with the release candidate, but building lxml now gives me: ----------------- Traceback (most recent call last): File "setup.py", line 106, in **extra_options File "distutils/core.py", line 151, in setup File "distutils/dist.py", line 974, in run_commands File "distutils/dist.py", line 994, in run_command File "distutils/command/build_ext.py", line 290, in run File "/.../Cython/Distutils/build_ext.py", line 81, in build_extensions ext.sources = self.cython_sources(ext.sources, ext) File "/.../Cython/Distutils/build_ext.py", line 193, in cython_sources full_module_name=module_name) File "/.../Cython/Compiler/Main.py", line 304, in compile return context.compile(source, options, full_module_name) File "/.../Cython/Compiler/Main.py", line 201, in compile tree.process_implementation(scope, options, result) File "/.../Cython/Compiler/ModuleNode.py", line 78, in process_implementation self.generate_c_code(env, options, result) File "/.../Cython/Compiler/ModuleNode.py", line 262, in generate_c_code self.generate_declarations_for_modules(env, modules, code.h) File 
"/.../Cython/Compiler/ModuleNode.py", line 454, in generate_declarations_for_modules vtabslot_list = self.generate_vtabslot_list(vtabslot_dict) File "/.../Cython/Compiler/ModuleNode.py", line 418, in generate_vtabslot_list if(recurse_vtabslot_check_inheritance(vtab_list[j],vtab_list[i], vtab_dict)==1): File "/.../Cython/Compiler/ModuleNode.py", line 44, in recurse_vtabslot_check_inheritance base = dict[base.type.base_type.objstruct_cname] KeyError: 'LxmlElementBase' ----------------- This class is cimported from this definition in a .pxd: cdef extern from "lxml.etree_api.h": cdef class lxml.etree.ElementBase(_Element) \ [ object LxmlElementBase ]: ... http://codespeak.net/svn/lxml/trunk/src/lxml/etreepublic.pxd by this Cython file: http://codespeak.net/svn/lxml/trunk/src/lxml/lxml.objectify.pyx I don't think there is a test case for cimporting extension classes yet. I'll have to see when I find the time to look into this, but maybe not before the end of the week. Stefan From dagss at student.matnat.uio.no Mon Apr 7 11:35:38 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 7 Apr 2008 11:35:38 +0200 (CEST) Subject: [Cython] [Fwd: [Numpy-discussion] New project : Spyke python-to-C compiler] Message-ID: <50815.193.157.243.12.1207560938.squirrel@webmail.uio.no> I already replied saying that Cython has many of the same goals (though as I'm not on python-dev I'm not sure if my email gets through, and somebody might want to create a more informed and better response; didn't have time). Dag Sverre ---------------------------- Original Message ---------------------------- Subject: [Numpy-discussion] New project : Spyke python-to-C compiler From: "Rahul Garg" Date: Mon, April 7, 2008 02:48 To: numpy-discussion at scipy.org python-dev at python.org -------------------------------------------------------------------------- Note this message has been posted to numpy-discussion and python-dev. 
Sorry for the multiple posting but I thought both python devs and numpy users will be interested. If you believe your list should not receive this email, let me know. Also I just wanted to introduce myself since I may ask doubts about Python and Numpy internals from time to time :) Hi. I am a student at Univ of Alberta doing my masters in computing science. I am writing a Python-to-C compiler as one part of my thesis. The compiler, named Spyke, will be made available in a couple of weeks and is geared towards scientific applications and will therefore focus mostly on needs of scientific app developers. What is Spyke? In many performance critical projects, it is often necessary to rewrite parts of the application in C. However writing C wrappers can be time consuming. Spyke offers an alternative approach. You add annotations to your Python code as strings. These strings are discarded by the Python interpreter but these are interpreted as types by Spyke compiler to convert to C. Example : "int -> int" def f(x): return 2*x In this case the Spyke compiler will consider the string "int -> int" as a decalration that the function accepts int as parameter and returns int. Spyke will then generate a C function and a wrapper function. This idea is directly copied from PLW (Python Language Wrapper) project. Once Python3k arrives, much of these declarations will be moved to function annotations and class decorators. This way you can do all your development and debugging interactively using the standard Python interpreter. When you need to compile to C, you just add type annotations to places that you want to convert and invoke spyke on the annotated module. This is different from Pyrex because Pyrex does not accept Python code. With Spyke, your code is 100% pure python. Spyke has basic support for functions and classes. Spyke can do very basic type inference for local variables in function bodies. 
Spyke also has partial support for homogenous lists and dictionaries and fixed length tuples. One big advantage of Spyke is that it understands at least part of numpy. Numpy arrays are treated as fundamental types and Spyke knows what C code to generate for slicing/indexing of numpy arrays etc. This should help a lot in scientific applications. Note that Spyke can handle only a subset of Python. Exceptions, iterators, generators, runtime code generation of any kind etc is not handled. Nested functions will be added soon. I will definitely add some of these missing features based on what is actually required for real world Python codes. Currently if Spyke does not understand a function, it just leaves it as Python code. Classes can be handled but special methods are not currently supported. The support of classes is a little brittle because I am trying to resolve some issues b/w old and new style of classes. Where is Spyke? Spyke will be available as a binary only release in a couple of weeks. I intend to make it open source after a few months. Spyke is written in Python and Java and should be platform independant. I do intend to make the source open in a few months. Right now its undergoing very rapid development and has negligible amounts of documentation so the source code right now is pretty useless to anyone else anyway. I need help: However I need a bit of help. I am having a couple of problems : a) I am finding it hard to get pure Python+NumPy testing codes. I need more codes to test the compiler. Developing a compiler without a test-suite is kind of useless. If you have some pure Python codes which need better performance, please contact me. I guarantee that your codes will not be released to public without your permission but might be referenced in academic publications. I can also make the compiler available to you hopefully after 10th of April. Its kind of unstable currently. 
I will also need your help in annotating the provided testing codes since I probably wont know what your application is doing. b) Libraries which interface with C/C++ : Many codes in SciPy for instance have mixed language codes. Part of the code is written in C/C++. Spyke only knows how to annotated Python codes. For C/C++ libraries wrapped into Python modules, Spyke will therefore need to know at least 2 things : i) The mapping of a C function name/struct etc to Python ii) The type information of the said C function. There are many many ways that people interact with C code. People either write wrappers manually, or use autogenerated wrappers using SWIG or SIP Boost.Python etc., use Pyrex or Cython while some people use ctypes. I dont have the time or resources to support these multitude of methods. I considered trying to parse the C code implementing wrappers but its "non-trivial" to put it mildly. Parsing only SWIG generated code is another possibility but its still hard. Another approach that I am seriously considering is to support a subset of ctypes (with additional restriction) instead. But my question is : Is ctypes good enough for most of you? Ctypes cannot interface with C++ code but its pure Python. However I have not seen too many instances of people using ctypes. c) Strings as type declarations : Do you think I should use decorators instead at least for function type declarations? thanks for patiently reading this, comments and inquiries sought. 
rahul _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From robertwb at math.washington.edu Mon Apr 7 19:09:10 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 7 Apr 2008 10:09:10 -0700 Subject: [Cython] python callback In-Reply-To: <20080407055726.GA9347@basestar> References: <20080405043416.GA9069@basestar> <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com> <20080407055726.GA9347@basestar> Message-ID: On Apr 6, 2008, at 10:57 PM, Gabriel Gellner wrote: > Looking through the scipy odeint C wrapper, I realized the simple > pattern to > solve this (just having the callback call a global python function, > that is > set by the driver.) > > Does anyone think this is worth making a Wiki page for? If so I > would be happy > to write up the solution. Yes, certainly. Thanks. > > Gabriel > > On Sat, Apr 05, 2008 at 12:58:21PM -0400, simon at arrowtheory.com wrote: >>> Could anyone give me a tip on how to convert a python function >>> of the form >>> >>> def func(x): >>> return some_operation(x) >>> >>> to a C function of the form >>> >>> cdef void cfunc(double x, double *y): >>> y = some_operation(x) >> >> y[0] = some_operation(x) >> >>> >>> That is I want to be able to turn a python function into >>> a C function with pass by reference return semantics. >>> >>> I looked at the cheesefinder example, and I am not sure how to >>> adapt it, >>> as I >>> want to be able to have the user supply the python function, and >>> I can't >>> think >>> of how to make a cdef callback function that knows what this >>> function is, >>> that >>> is >>> >>> cdef void callback(souble x, double *y): >>> y = func(x) >>> >>> I am not sure how to set func in this way, without rewriting the >>> cdef each >>> time. In python I would just use a higher order function . . . 
>> >> in general, you need an extra "void *" argument in the callback >> function (which you would use to pass a python callable). >> Well designed C libraries use this convention, but it >> sounds like you don't have this. >> >>> >>> Thanks, >>> Gabriel >>> _______________________________________________ >>> Cython-dev mailing list >>> Cython-dev at codespeak.net >>> http://codespeak.net/mailman/listinfo/cython-dev >>> >> >> Simon. >> >> _______________________________________________ >> Cython-dev mailing list >> Cython-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/cython-dev > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From robertwb at math.washington.edu Mon Apr 7 19:10:05 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 7 Apr 2008 10:10:05 -0700 Subject: [Cython] Cython 0.9.6.13 Released In-Reply-To: <47F9E881.7080405@behnel.de> References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu> <47F9E881.7080405@behnel.de> Message-ID: I'm pretty sure I know what this is and how to fix it. - Robert On Apr 7, 2008, at 2:25 AM, Stefan Behnel wrote: > Hi Robert, > > thanks for this release. A few comments: > > The file "MANIFEST" shouldn't be in the repository as it's > autogenerated from > the file "MANIFEST.in" by distutils. > > Also, there seems to be a problem with cimporting public extension > classes > from another Cython module. 
> I was away last week, so I couldn't test with the release candidate,
> but building lxml now gives me:
>
> -----------------
> Traceback (most recent call last):
>   File "setup.py", line 106, in
>     **extra_options
>   File "distutils/core.py", line 151, in setup
>   File "distutils/dist.py", line 974, in run_commands
>   File "distutils/dist.py", line 994, in run_command
>   File "distutils/command/build_ext.py", line 290, in run
>   File "/.../Cython/Distutils/build_ext.py", line 81, in build_extensions
>     ext.sources = self.cython_sources(ext.sources, ext)
>   File "/.../Cython/Distutils/build_ext.py", line 193, in cython_sources
>     full_module_name=module_name)
>   File "/.../Cython/Compiler/Main.py", line 304, in compile
>     return context.compile(source, options, full_module_name)
>   File "/.../Cython/Compiler/Main.py", line 201, in compile
>     tree.process_implementation(scope, options, result)
>   File "/.../Cython/Compiler/ModuleNode.py", line 78, in process_implementation
>     self.generate_c_code(env, options, result)
>   File "/.../Cython/Compiler/ModuleNode.py", line 262, in generate_c_code
>     self.generate_declarations_for_modules(env, modules, code.h)
>   File "/.../Cython/Compiler/ModuleNode.py", line 454, in generate_declarations_for_modules
>     vtabslot_list = self.generate_vtabslot_list(vtabslot_dict)
>   File "/.../Cython/Compiler/ModuleNode.py", line 418, in generate_vtabslot_list
>     if(recurse_vtabslot_check_inheritance(vtab_list[j],vtab_list[i], vtab_dict)==1):
>   File "/.../Cython/Compiler/ModuleNode.py", line 44, in recurse_vtabslot_check_inheritance
>     base = dict[base.type.base_type.objstruct_cname]
> KeyError: 'LxmlElementBase'
> -----------------
>
> This class is cimported from this definition in a .pxd:
>
>     cdef extern from "lxml.etree_api.h":
>         cdef class lxml.etree.ElementBase(_Element) \
>              [ object LxmlElementBase ]:
>             ...
>
> http://codespeak.net/svn/lxml/trunk/src/lxml/etreepublic.pxd
>
> by this Cython file:
>
> http://codespeak.net/svn/lxml/trunk/src/lxml/lxml.objectify.pyx
>
>
> I don't think there is a test case for cimporting extension classes
> yet.
>
> I'll have to see when I find the time to look into this, but maybe
> not before the end of the week.
>
> Stefan

From stefan_ml at behnel.de Mon Apr 7 19:35:27 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 07 Apr 2008 19:35:27 +0200
Subject: [Cython] python callback
In-Reply-To: <20080407055726.GA9347@basestar>
References: <20080405043416.GA9069@basestar>
 <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com>
 <20080407055726.GA9347@basestar>
Message-ID: <47FA5B5F.6020101@behnel.de>

Gabriel Gellner wrote:
> Looking through the scipy odeint C wrapper, I realized the simple pattern to
> solve this (just having the callback call a global python function, that is
> set by the driver.)
>
> Does anyone think this is worth making a Wiki page for? If so I would be happy
> to write up the solution.

There already is a (tiny) FAQ entry on callbacks, but it would be good to have
a real page on this topic (even if your solution here is less common than
passing a void*).

http://wiki.cython.org/FAQ#head-0ea9184c404ed73456c5988eaf208d91bf04accb

Stefan

From robertwb at math.washington.edu Mon Apr 7 19:59:32 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 7 Apr 2008 10:59:32 -0700
Subject: [Cython] Lisp inspired transforms
In-Reply-To: <47F8C640.10901@martincmartin.com>
References: <47F8C640.10901@martincmartin.com>
Message-ID: <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu>

On Apr 6, 2008, at 5:46 AM, Martin C. Martin wrote:

> Hi all,
>
> I've been doing some thinking and prototyping of a transform system
> inspired by Common Lisp macros.
> You can see the results as the newest CEP:
>
> http://wiki.cython.org/enhancements/metaprogramming
>
> Briefly, it allows you to define a transform in the Cython source
> code. The transform runs at compile time, and takes the *parse
> trees* of its arguments. In the examples, I define a simple
> symbolic differentiator "deriv" which means you can write:
>
> def eggs(a, b):
>     return deriv(5*a+b, a)
>
> and it is translated at compile time to:
>
> def eggs(a, b):
>     return (((5 * 1) + (0 * a)) + 0)
>
> You can use this, for example, with numerical optimization
> techniques like Newton's method. An example which differentiates
> 5 * x**3 - 10 and finds its root is given on the above wiki page,
> here's the output:
>
> >>> TestTrans.myfunct(10.0)
> (4990.0, 1500.0)
> >>> TestTrans.newtons(TestTrans.myfunct, 10.0)
> f( 10.0 ) = 4990.0 , f'( 10.0 ) = 1500.0
> f( 6.67333333333 ) = 1475.93037185 , f'( 6.67333333333 ) = 668.000666667
> f( 4.46385893383 ) = 434.735082042 , f'( 4.46385893383 ) = 298.890548717
> f( 3.00936301916 ) = 126.267956666 , f'( 3.00936301916 ) = 135.843986716
> f( 2.07985587115 ) = 34.9852072625 , f'( 2.07985587115 ) = 64.8870066716
> f( 1.54068464016 ) = 8.28568621842 , f'( 1.54068464016 ) = 35.6056374064
> f( 1.3079774954 ) = 1.18847303518 , f'( 1.3079774954 ) = 25.6620769269
> f( 1.26166506953 ) = 0.0415843883547 , f'( 1.26166506953 ) = 23.8769812152
> f( 1.25992345957 ) = 5.73769235945e-05 , f'( 1.25992345957 ) = 23.8111068596
> f( 1.2599210499 ) = 1.09734443754e-10 , f'( 1.2599210499 ) = 23.8110157797
> f( 1.25992104989 ) = 0.0 , f'( 1.25992104989 ) = 23.8110157795
> 1.2599210498948732
>
> It's also useful for moving computation from run time to compile
> time, and makes it easy for the programmer to specify what should
> happen where.
>
> There are many other use cases as well.
>
> I'm attaching a patch that implements the examples. It's very
> proof-of-concept, which means it's really rough.
> :) I also implemented the bare minimum to get the examples working,
> but it gives you an idea.
>
> Best,
> Martin

Thanks. This looks very interesting. Also, I much prefer this type of
"macro" to the text-substitution C-style macros that have occasionally
been suggested.

One thought that kept going through my head as I was reading this,
however, is that one of the current defects (in my mind) is its
inconsistency with actual Python, and the associated learning curve to
read and write Python. Some of this inconsistency (e.g. (minimal) static
typing information) is necessary to achieve the kind of speedups we
need, but I think the focus should be on features that narrow the gap
between Cython and Python, not those that widen it.

On the other hand, features like the one above can be very useful, and
it'd be a shame to deny their availability to "power users" (though it
would take away one thing that I really like about Cython--almost any
Python programmer can at least read a Cython function without any
additional knowledge).

- Robert

From robertwb at math.washington.edu Mon Apr 7 20:19:14 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 7 Apr 2008 11:19:14 -0700
Subject: [Cython] Results of XPathTransform / W3CDOM experiments
In-Reply-To:
References: <47F8B762.4010700@student.matnat.uio.no>
Message-ID:

On Apr 6, 2008, at 4:43 AM, Dag Sverre Seljebotn wrote:

> I'm still fooling around with some experiments. The following is
> now working (in my local repo) as a way of transforming for-froms:
>
> class ForInToForFrom(XPathTransform):
>     @template("pyr:ForInStatNode[iterator/pyr:IteratorNode/sequence/pyr:SimpleCallNode/function/pyr:NameNode/@name = 'range']")
>     def for_in_range_to_for_from_range(self, node):
>         result = Nodes.ForFromStatNode(...
>         ...
>         return result
>
> Everything happens on the Pyrex tree, there's no translation to XML
> or anything like that.
> Example attached (though you can't run it outside my repo, it's just
> for demonstration).
>
> The question is: Is this a way forward for transforms?

Perhaps, but I doubt it. I would echo Fabrizio's call for a dozen or so
transformations that one would want to do, with examples showing that
the xpath way of going about it is cleaner. Also, I don't see the end
user as writing their own transformations much (nor do I think it's a
good idea to encourage it--it makes the language much more obscure). It
also greatly increases the dependencies to run Cython.

> For some more examples, consider that one could for instance select
> all equality statements that must have some coercion by
>
> "pyr:SimpleAssignmentNode[lhs/@type != rhs/@type]"
>
> But this is contrived, coercion won't work this way.

I think coercion (especially the relationship between types) is too
complicated to be done this way. (OK, maybe it's possible, but it would
almost certainly be horribly inefficient and hard to understand).

> But also consider that one can select inner functions by
>
> "pyr:FuncDefNode//pyr:FuncDefNode"
>
> and outer functions only by
>
> "pyr:ModuleNode/body/pyr:FuncDefNode"
>
> and so on.
>
> The gains are highest if XPath selections are used for all
> transforms written, because then the finite state machines (see
> below) can (in principle at least) be combined so that only one
> tree traversal per phase is needed regardless of how the code is
> modularized into multiple transforms. (If combining, one must use a
> subset of XPath where only the descendants axis is available
> outside of predicates, I guess this is the same as XSLT match
> statements?).

I think most of the code processing is done in "phases" rather than a
bunch of transforms that can be done all at once. Optimizations are
perhaps an exception.

> What I've done:
> - Put a subset of the W3C DOM API on top of the tree.
>   No modifications to the Cython code tree were necessary except
>   adding a base class (and I finally had a legitimate use for a
>   metaclass or two. Yay!). A "side-effect" is that the tree can be
>   streamed to XML (see example code).
> - Use the webpath XPath 2.0 transform to select nodes
>   (http://sourceforge.net/projects/webpath), and act on them on
>   traversal.
>
> Questions:
> - Anyone know of good DOM transformation libraries for Cython?

Perhaps lxml does this?

> - Does anyone think this would be useful?
> - Does anyone think this could be a standard for writing transforms?
> - Any other good uses for a W3C DOM on our parse trees? (it is a
>   separate component) I'll assume that streaming in and out of XSLT
>   is not going to be convenient, but something else perhaps?
>
> Some notes:
> - It currently scales horribly with the number of "templates"; one
>   full tree traversal per match. In order to fix this, one either has
>   to find a better XPath library (which must be hacked a bit), an
>   XSLT processor or similar implemented entirely in Python, or a
>   full-time week is needed to improve webpath by using a Finite State
>   Automata library (which does the standard non-deterministic
>   automata to deterministic automata conversion, there are several
>   good ones and this is not too hard to do).

Again, there's the question of dependencies. I'd rather not require
anything but Python itself.

>   Does it matter if we do 30 traversals on the tree rather than 2-3?
>   As long as it can be optimized "in principle"?
>
> - On the other hand, once that is done, one can "combine" tree
>   traversals so that multiple transforms work in the same traversal,
>   meaning that the number of traversals will be reduced compared to
>   what is in sight now.
>
> - But, the current less efficient implementation is working.
>
> I might probably leave it for now at this because the gains seem
> less than the effort, but if anyone thinks this is interesting then
> speak up and we can see.
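
A small sketch of the kind of node selection described above, for readers who want to try the idea without the patch. It uses Python's standard ast module rather than the Pyrex tree or webpath, so the node classes (ast.For, ast.Call, ast.Name) and the helper name find_range_loops are assumptions made for illustration, not the XPathTransform API from this thread:

```python
import ast

# Sample input: two loops over range() and one loop over something else.
SOURCE = """
for i in range(10):
    pass
for x in items:
    pass
def outer():
    for j in range(3):
        pass
"""


def find_range_loops(tree):
    """Return the For nodes whose iterator is a direct call to the name 'range'.

    This is roughly the selection that the XPath expression on
    ForInStatNode in the quoted message is meant to express.
    """
    matches = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.For)
                and isinstance(node.iter, ast.Call)
                and isinstance(node.iter.func, ast.Name)
                and node.iter.func.id == "range"):
            matches.append(node)
    return matches


loops = find_range_loops(ast.parse(SOURCE))
# Names of the loop variables of the selected loops, sorted for stability.
loop_vars = sorted(node.target.id for node in loops)
```

Here loop_vars holds the loop variables of the two range() loops; the loop over items is not selected. Each such hand-written predicate costs one traversal, which is the scaling concern raised in the notes above.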
Perhaps it can be offered as a plugin, so people who want to do things
like this can use it. But I'm not convinced that this is the direction
we want to take for the Cython core.

- Robert

From stefan_ml at behnel.de Mon Apr 7 20:58:07 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 07 Apr 2008 20:58:07 +0200
Subject: [Cython] Lisp inspired transforms
In-Reply-To: <47F8C640.10901@martincmartin.com>
References: <47F8C640.10901@martincmartin.com>
Message-ID: <47FA6EBF.3050702@behnel.de>

Hi,

Martin C. Martin wrote:
> I've been doing some thinking and prototyping of a transform system
> inspired by Common Lisp macros. You can see the results as the newest CEP:
>
> http://wiki.cython.org/enhancements/metaprogramming
>
> Briefly, it allows you to define a transform in the Cython source code.
> The transform runs at compile time, and takes the *parse trees* of its
> arguments.

I like the way this reads, yes.

I'd like to see the "deftrans" functions in a separate source file, maybe a
".pxt"? I wouldn't want to see them mixed with normal Cython code.

Stefan

From martin at martincmartin.com Mon Apr 7 23:05:08 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Mon, 07 Apr 2008 17:05:08 -0400
Subject: [Cython] Lisp inspired transforms
In-Reply-To: <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu>
References: <47F8C640.10901@martincmartin.com>
 <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu>
Message-ID: <47FA8C84.5070207@martincmartin.com>

Robert Bradshaw wrote:
>
> Thanks. This looks very interesting. Also, I much prefer this type of
> "macro" to the text-substitution C-style macros that have occasionally
> been suggested.

Glad you like it. It's certainly a lot cleaner.

> One thought that kept going through my head as I was reading this,
> however, is that one of the current defects (in my mind) is its
> inconsistency with actual Python, and the associated learning curve to
> read and write Python. Some of this inconsistency (e.g. (minimal) static
> typing information) is necessary to achieve the kind of speedups we
> need, but I think the focus should be on features that narrow the gap
> between Cython and Python, not those that widen it.
>
> On the other hand, features like the one above can be very useful, and
> it'd be a shame to deny their availability to "power users" (though it
> would take away one thing that I really like about Cython--almost any
> Python programmer can at least read a Cython function without any
> additional knowledge).

Yes, this certainly adds a whole new dimension to Cython that's not in
Python.

Perhaps this is a time to reflect on the goals that Cython has had up to
now, and whether to expand them. Cython and Pyrex started with the goal
of making it easy to wrap existing C/C++ code as Python extensions, but
if you add the ability to declare that a variable has a Python type, you
get a language where you can code Python when you need to, but then when
you need speed, sprinkle a few type declarations around and get code and
speed similar to C.

Is it worth widening the gap a little to provide a useful language in
its own right? The gap is still much, much smaller than that between C++
and C, for example.

Best,
Martin

From ggellner at uoguelph.ca Mon Apr 7 23:10:01 2008
From: ggellner at uoguelph.ca (Gabriel Gellner)
Date: Mon, 7 Apr 2008 17:10:01 -0400
Subject: [Cython] python callback
In-Reply-To: <47FA5B5F.6020101@behnel.de>
References: <20080405043416.GA9069@basestar>
 <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com>
 <20080407055726.GA9347@basestar> <47FA5B5F.6020101@behnel.de>
Message-ID: <20080407211001.GA11424@basestar>

On Mon, Apr 07, 2008 at 07:35:27PM +0200, Stefan Behnel wrote:
>
> Gabriel Gellner wrote:
> > Looking through the scipy odeint C wrapper, I realized the simple pattern to
> > solve this (just having the callback call a global python function, that is
> > set by the driver.)
> >
> > Does anyone think this is worth making a Wiki page for? If so I would be happy
> > to write up the solution.
>
> There already is a (tiny) FAQ entry on callbacks, but it would be good to have
> a real page on this topic (even if your solution here is less common than
> passing a void*).
>
> http://wiki.cython.org/FAQ#head-0ea9184c404ed73456c5988eaf208d91bf04accb
>
> Stefan
>
One thing about this FAQ entry: the Cython releases don't seem to include
the callback folder in Demos. I had to go into the Mercurial repo to get it.

Gabriel

From dagss at student.matnat.uio.no Tue Apr 8 00:34:44 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 8 Apr 2008 00:34:44 +0200 (CEST)
Subject: [Cython] Results of XPathTransform / W3CDOM experiments
In-Reply-To:
References: <47F8B762.4010700@student.matnat.uio.no>
Message-ID: <51334.193.157.243.12.1207607684.squirrel@webmail.uio.no>

> xpath way of going about it is cleaner. Also, I don't see the end
> user as writing their own transformations much (nor do I think it's a
> good idea to encourage it--it makes the language much more obscure).

This was definitely considered a feature to help with developing Cython,
if transforms made a coding-style convenient where each change could be
described in relative isolation in one function.

But I'll definitely stick it in the drawer. Fun experiment though. I
should have known that finally having a legitimate use for Python
metaclasses was too good to be true... ;-)

Dag Sverre

From martin at martincmartin.com Tue Apr 8 03:45:38 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Mon, 07 Apr 2008 21:45:38 -0400
Subject: [Cython] Lisp inspired transforms
In-Reply-To: <47FA6EBF.3050702@behnel.de>
References: <47F8C640.10901@martincmartin.com> <47FA6EBF.3050702@behnel.de>
Message-ID: <47FACE42.9050702@martincmartin.com>

Hi Stefan,

Stefan Behnel wrote:
> Hi,
>
> Martin C. Martin wrote:
>> I've been doing some thinking and prototyping of a transform system
>> inspired by Common Lisp macros. You can see the results as the newest CEP:
>>
>> http://wiki.cython.org/enhancements/metaprogramming
>>
>> Briefly, it allows you to define a transform in the Cython source code.
>> The transform runs at compile time, and takes the *parse trees* of its
>> arguments.
>
> I like the way this reads, yes.
>
> I'd like to see the "deftrans" functions in a separate source file, maybe a
> ".pxt"? I wouldn't want to see them mixed with normal Cython code.

It's true they're an "advanced" feature, but they're often closely tied
to other code you write. For example, suppose you'd like to keep track
of some statistics on a given expression. It could be CPU time, memory
allocated, or a simple count of the number of times you reach a line.
And you'd like to put these throughout your code, to measure different
sections. (We do this so we know where both time and memory allocation
are going.)

You could keep a hash table that maps the name of the section to the
accumulating value, but if you don't want the hash table overhead, you
could assign a numeric id to each name, and use that id to index into an
array. In the example below, I'm assuming names_to_ids and num_ids are
available at compile time, and that the final value of num_ids makes it
into the C code.

names_to_ids = {}
num_ids = 0

# Get names_to_ids[name] if it exists, otherwise add it to the hash
# table.
def getid(name):
    global num_ids  # needed because num_ids is rebound below
    if name in names_to_ids:
        return names_to_ids[name]
    else:
        id = num_ids
        names_to_ids[name] = id
        num_ids += 1
        return id

record_array = [0]*num_ids  # Created at runtime, ideally as a C array.

# Generating IDs and converting names to IDs happens at compile time.
deftrans record(name, *expressions):
    id = getid(name)
    return "record_internal(%c, *%c)" % (id, expressions)

# Get the start time, then evaluate the expressions, then get the
# end time, subtract to get the elapsed time, and accumulate that.
def record_internal(id, *expressions):
    ...
    record_array[id] += elapsed_time

So here there's a lot of mixing of transforms and regular functions. It
seems unnatural to me to put record() into a separate file.

What do you think?

Best,
Martin

From robertwb at math.washington.edu Tue Apr 8 09:52:11 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 8 Apr 2008 00:52:11 -0700
Subject: [Cython] Lisp inspired transforms
In-Reply-To: <47FA8C84.5070207@martincmartin.com>
References: <47F8C640.10901@martincmartin.com>
 <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu>
 <47FA8C84.5070207@martincmartin.com>
Message-ID: <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu>

On Apr 7, 2008, at 2:05 PM, Martin C. Martin wrote:

> Robert Bradshaw wrote:
>> Thanks. This looks very interesting. Also, I much prefer this type
>> of "macro" to the text-substitution C-style macros that have
>> occasionally been suggested.
>
> Glad you like it. It's certainly a lot cleaner.
>
>> One thought that kept going through my head as I was reading this,
>> however, is that one of the current defects (in my mind) is its
>> inconsistency with actual Python, and the associated learning
>> curve to read and write Python. Some of this inconsistency (e.g.
>> (minimal) static typing information) is necessary to achieve the
>> kind of speedups we need, but I think the focus should be on
>> features that narrow the gap between Cython and Python, not those
>> that widen it.
>>
>> On the other hand, features like the one above can be very useful,
>> and it'd be a shame to deny their availability to "power users"
>> (though it would take away one thing that I really like about
>> Cython--almost any Python programmer can at least read a Cython
>> function without any additional knowledge).
>
>
> Yes, this certainly adds a whole new dimension to Cython that's not
> in Python.
>
> Perhaps this is a time to reflect on the goals that Cython has had
> up to now, and whether to expand them. Cython and Pyrex started
> with the goal of making it easy to wrap existing C/C++ code as
> Python extensions, but if you add the ability to declare that a
> variable has a Python type, you get a language where you can code
> Python when you need to, but then when you need speed, sprinkle a
> few type declarations around and get code and speed similar to C.
>
> Is it worth widening the gap a little to provide a useful language
> in its own right? The gap is still much, much smaller than that
> between C++ and C, for example.

The goal of Cython is for it to be the best way to write Python
extension modules. This has two (interrelated) sub-goals: make it easy
to wrap C/C++ libraries, and make it a good Python -> C compiler. The
target (and largest) audience is especially Python programmers.

Though it is tempting to head down the language development path,
adding little (or big) features that make it more powerful than
Python itself, I think doing so will actually be counterproductive to
the goals stated above. Perhaps there could be a Cython++ that is a
proper superset of the Cython language with more powerful features
(though I'd hope not near the gap of C vs.
C++) but in the near term we should be focusing on things like being
able to compile all of Python as it is.

- Robert

From robertwb at math.washington.edu Tue Apr 8 10:08:14 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 8 Apr 2008 01:08:14 -0700
Subject: [Cython] New project : Spyke python-to-C compiler
In-Reply-To: <20080407175606.d1tlnbczk0wo8sk0@webmail.ualberta.ca>
References: <20080407175606.d1tlnbczk0wo8sk0@webmail.ualberta.ca>
Message-ID:

Forwarding some correspondence with an author of another Python-to-C
compiler:

On Apr 7, 2008, at 4:56 PM, Rahul Garg wrote:

> Quoting Robert Bradshaw:
>
>> Have you heard of Cython before? Do you have any thoughts on how it
>> compares/overlaps/relates to Spyke?
>>
>> - Robert
>
> Hi.
> 1. Cython and Spyke have certain differences but mostly related to the
> surface syntax accepted. From what I understand (and do correct me if
> I am wrong) Cython is a Python-like language, but Cython code isn't 100%
> Python and does not run on the Python interpreter.
>
> Example :
>
> Let's assume you had the following Python code :
>
> def f(x): return 2*x
>
> If you want to convert this to C, then from what I understand in
> Pyrex you would write a different file with the extension pyx. The
> function would look like :
>
> cdef int f(int x): return 2*x
>
> Now this code is very similar to Python but will not run on the Python
> interpreter. Spyke on the other hand would want you to declare types
> within the Python source itself :
>
> "int -> int"
> def f(x): return 2*x
>
> This string is discarded by the Python interpreter, so it's still valid
> Python code and will run on the Python interpreter as well as being
> capable of being compiled. So it's mostly a matter of surface syntax
> currently and it might even be possible to interconvert the 2 formats.
>
> Cython is also ahead in terms of language supported and probably gives
> better performance currently.
> But Spyke takes the philosophy that the code should run on the Python
> interpreter, so it has no additional keywords etc. and all additional
> info is specified through mechanisms ignored by the Python
> interpreter. Spyke copies ideas from the Python Language Wrapper
> (PLW) project. You can get a paper on PLW here :
> http://www.cs.utk.edu/~luszczek/pubs/plw-200605.pdf

This is a difference, but just on the surface. We have a goal to be
able to support this kind of thing as well (though we don't intend on
making the old syntax obsolete, as it is very easy to use when there
are a lot of types).

> 2. Long term : Spyke is actually for my thesis on compiler
> optimizations for dynamic languages. Some experiments with parallel
> constructs are also on the roadmap. Also I will focus heavily on
> loop optimizations and such for numpy code. Spyke does currently
> understand how slicing and indexing work and numpy arrays are
> sort-of native datatypes for Spyke. I also intend to implement a
> whole program compiler. I have a number of ideas related to targeting
> a modified Python interpreter but those probably will be in a
> different branch.

That's cool. Most of the differences seem to be in this point, though
we hope to get better NumPy support soon.

> 3. Integration : Spyke is written in Java mostly. On code level
> therefore it's difficult to integrate Spyke and Cython but ideas can
> freely go back and forth. I will follow Cython closely. Many of the
> issues remain the same since the semantics of the language
> supported is similar.

We'll keep our eye on Spyke too.

Thanks for answering. Do you mind if I forward this response to the
cython mailing list?

- Robert

From robertwb at math.washington.edu Tue Apr 8 10:11:36 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 8 Apr 2008 01:11:36 -0700
Subject: [Cython] New project : Spyke python-to-C compiler
In-Reply-To: <20080407184837.4yn4yzipc888o44s@webmail.ualberta.ca>
References: <20080407175606.d1tlnbczk0wo8sk0@webmail.ualberta.ca>
 <15BA280D-E9EB-4A4E-8604-D2133E3D279A@math.washington.edu>
 <20080407184837.4yn4yzipc888o44s@webmail.ualberta.ca>
Message-ID: <2831E432-CA4B-4F62-94A7-C13D68F2B317@math.washington.edu>

On Apr 7, 2008, at 5:48 PM, Rahul Garg wrote:

>
> Quoting Robert Bradshaw:
>
>> Thanks for answering. Do you mind if I forward this response to the
>> cython mailing list?
>>
>> - Robert
>
> 1. Please forward if appropriate. Also include this reply please :)
>
> 2. I forgot to ask : Do you use some particular tests/benchmarks
> etc? I can try and adapt those to Spyke-syntax if some test cases
> are available.

Not much, but Stefan Behnel did do some tests at
http://codespeak.net/pipermail/cython-dev/2008-March/000107.html .

> 3. Looking forward : with function annotations in Python-3k it
> might actually be a good idea to work towards a unified syntax.
> That way code will be portable b/w the 2 compilers and can also be
> adopted by other tools like editors, debuggers and such. Of course
> this is just an idea off the top of my head so need to look into
> more details.

Yes, this would be very good. We certainly plan to make use of function
annotations (and decorators) to be able to specify types.
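
To make the annotation idea concrete, here is a minimal sketch assuming plain Python 3 function annotations. The helper declared_types is hypothetical; it only shows how a tool could read the declared types from ordinary Python source, and it is not syntax or API that Cython or Spyke provides:

```python
def f(x: int) -> int:
    # Ordinary Python: the annotations are stored on the function object
    # and do not change how the interpreter runs the code.
    return 2 * x


def declared_types(func):
    """Split a function's annotations into (argument types, return type).

    Hypothetical helper for illustration: a compiler or editor could read
    the same information to pick C types for arguments and the result.
    """
    hints = dict(func.__annotations__)
    return_type = hints.pop("return", None)
    return hints, return_type


arg_types, return_type = declared_types(f)
```

The same source still runs unchanged on the interpreter, which is the portability property point 3 above is after.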

- Robert

From dagss at student.matnat.uio.no Tue Apr 8 10:29:38 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 08 Apr 2008 10:29:38 +0200
Subject: [Cython] uneval + me and Martin's macro discussion
Message-ID: <47FB2CF2.6000907@student.matnat.uio.no>

Over the weekend me and Martin discussed (long-term, hypothetically) how
macros could be "better integrated" with the Python way of doing things
(and macros in general); for anyone interested. It ended up at 40 k, and
can be found here:

http://wiki.cython.org/enhancements/uneval/emails

Anyway, my own private conclusions on this ended up in this pre-CEP:

http://wiki.cython.org/enhancements/uneval

--
Dag Sverre

From robertwb at math.washington.edu Tue Apr 8 10:38:02 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 8 Apr 2008 01:38:02 -0700
Subject: [Cython] python callback
In-Reply-To: <20080407211001.GA11424@basestar>
References: <20080405043416.GA9069@basestar>
 <1243.70.107.93.141.1207414701.squirrel@webmail10.pair.com>
 <20080407055726.GA9347@basestar> <47FA5B5F.6020101@behnel.de>
 <20080407211001.GA11424@basestar>
Message-ID:

Thanks. I just added it.

On Apr 7, 2008, at 2:10 PM, Gabriel Gellner wrote:

> On Mon, Apr 07, 2008 at 07:35:27PM +0200, Stefan Behnel wrote:
>>
>> Gabriel Gellner wrote:
>>> Looking through the scipy odeint C wrapper, I realized the simple
>>> pattern to solve this (just having the callback call a global
>>> python function, that is set by the driver.)
>>>
>>> Does anyone think this is worth making a Wiki page for? If so I
>>> would be happy to write up the solution.
>>
>> There already is a (tiny) FAQ entry on callbacks, but it would be
>> good to have a real page on this topic (even if your solution here
>> is less common than passing a void*).
>>
>> http://wiki.cython.org/FAQ#head-0ea9184c404ed73456c5988eaf208d91bf04accb
>>
>> Stefan
>>
> One thing about this FAQ entry: the Cython releases don't seem to
> include the callback folder in Demos. I had to go into the Mercurial
> repo to get it.
>
> Gabriel

From dagss at student.matnat.uio.no Tue Apr 8 10:41:00 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 08 Apr 2008 10:41:00 +0200
Subject: [Cython] Numpy and Lisp inspired transforms
In-Reply-To: <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu>
References: <47F8C640.10901@martincmartin.com>
 <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu>
 <47FA8C84.5070207@martincmartin.com>
 <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu>
Message-ID: <47FB2F9C.5040608@student.matnat.uio.no>

>
> Though it is tempting to head down the language development path,
> adding little (or big) features that make it more powerful than
> Python itself, I think doing so will actually be counterproductive to
> the goals stated above. Perhaps there could be a Cython++ that is a
>

I'm for the moment considering that using Martin's code here, and
extending the approach to have support for "member macros" (that can
also be used for operator overloading), numpy.ndarray.__getitem__ could
perhaps be implemented quicker if Martin's work is included. Which might
"up" the priority. (Not saying that approach is taken, but it is a
thought).

As for Cython as a language, one also has to consider templates,
function overloading, and other stuff that it is "natural" to add
because we now have a _typed_ (templates, overloading) _compiled_
(macros) language.
Though perhaps not macros as powerful as these... (shrug)

Priorities are a different matter though, and probably most of what I've
proposed (as features in themselves, rather than NumPy requirements)
might have less priority than getting Python code to run (which is why I
should get more down-to-earth with my thoughts about Cython real soon.
Which I will.).

--
Dag Sverre

From robertwb at math.washington.edu Tue Apr 8 10:40:06 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 8 Apr 2008 01:40:06 -0700
Subject: [Cython] Cython 0.9.6.13 Released
In-Reply-To: <47F9E881.7080405@behnel.de>
References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu>
 <47F9E881.7080405@behnel.de>
Message-ID: <99CAFB68-4433-4AFF-A25B-B093EF56D76A@math.washington.edu>

On Apr 7, 2008, at 2:25 AM, Stefan Behnel wrote:

> Hi Robert,
>
> thanks for this release. A few comments:
>
> The file "MANIFEST" shouldn't be in the repository as it's
> autogenerated from the file "MANIFEST.in" by distutils.

Yep. Realized that after the fact. It's not in the directory any more.

> Also, there seems to be a problem with cimporting public extension
> classes from another Cython module. I was away last week, so I
> couldn't test with the release candidate, but building lxml now
> gives me:

And this is exactly what I was hoping to avoid with an rc...bad timing
I guess. Anyways, I've fixed this now and released 0.9.6.13.1.
> > ----------------- > Traceback (most recent call last): > File "setup.py", line 106, in > **extra_options > File "distutils/core.py", line 151, in setup > File "distutils/dist.py", line 974, in run_commands > File "distutils/dist.py", line 994, in run_command > File "distutils/command/build_ext.py", line 290, in run > File "/.../Cython/Distutils/build_ext.py", line 81, in > build_extensions > ext.sources = self.cython_sources(ext.sources, ext) > File "/.../Cython/Distutils/build_ext.py", line 193, in > cython_sources > full_module_name=module_name) > File "/.../Cython/Compiler/Main.py", line 304, in compile > return context.compile(source, options, full_module_name) > File "/.../Cython/Compiler/Main.py", line 201, in compile > tree.process_implementation(scope, options, result) > File "/.../Cython/Compiler/ModuleNode.py", line 78, in > process_implementation > self.generate_c_code(env, options, result) > File "/.../Cython/Compiler/ModuleNode.py", line 262, in > generate_c_code > self.generate_declarations_for_modules(env, modules, code.h) > File "/.../Cython/Compiler/ModuleNode.py", line 454, in > generate_declarations_for_modules > vtabslot_list = self.generate_vtabslot_list(vtabslot_dict) > File "/.../Cython/Compiler/ModuleNode.py", line 418, in > generate_vtabslot_list > if(recurse_vtabslot_check_inheritance(vtab_list[j],vtab_list[i], > vtab_dict)==1): > File "/.../Cython/Compiler/ModuleNode.py", line 44, in > recurse_vtabslot_check_inheritance > base = dict[base.type.base_type.objstruct_cname] > KeyError: 'LxmlElementBase' > ----------------- > > This class is cimported from this definition in a .pxd: > > cdef extern from "lxml.etree_api.h": > cdef class lxml.etree.ElementBase(_Element) \ > [ object LxmlElementBase ]: > ... > > http://codespeak.net/svn/lxml/trunk/src/lxml/etreepublic.pxd > > by this Cython file: > > http://codespeak.net/svn/lxml/trunk/src/lxml/lxml.objectify.pyx > > > I don't think there is a test case for cimporting extension classes > yet. 
> > I'll have to see when I find the time to look into this, but maybe > not before > the end of the week. > > Stefan From stefan_ml at behnel.de Tue Apr 8 14:38:24 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 8 Apr 2008 14:38:24 +0200 (CEST) Subject: [Cython] New project : Spyke python-to-C compiler In-Reply-To: <2831E432-CA4B-4F62-94A7-C13D68F2B317@math.washington.edu> References: <20080407175606.d1tlnbczk0wo8sk0@webmail.ualberta.ca> <15BA280D-E9EB-4A4E-8604-D2133E3D279A@math.washington.edu> <20080407184837.4yn4yzipc888o44s@webmail.ualberta.ca> <2831E432-CA4B-4F62-94A7-C13D68F2B317@math.washington.edu> Message-ID: <58749.194.114.62.67.1207658304.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Hi, > On Apr 7, 2008, at 5:48 PM, Rahul Garg wrote: >> 2. I forgot to ask : Do you use some particular tests/benchmarks >> etc? I can try and adapt those to Spyke-syntax if some test cases >> are available. Besides the pybench test runs that Robert showed you, we also have a test suite. It's quite simple and based on doctest. http://hg.cython.org/cython-devel/file/e005b58d83b8/tests/run/ The main idea is to let Cython compile a .pyx source file to an extension module and run the doctest strings of that module from Python. This provides a very comfortable way of comparing test results between Cython and Python. 
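To make the idea concrete, here is a minimal sketch of the doctest-driven approach in plain Python. It is not the actual Cython test runner: the module source, the name `example_test`, and the helper `run_module_doctests` are made up for illustration, and a module built from a string stands in for the compiled extension module.

```python
import doctest
import types

# Hypothetical stand-in for a .pyx test file. In the real test suite this
# source would be compiled by Cython into an extension module; here it is
# executed as plain Python so that the sketch stays self-contained.
SOURCE = '''
__doc__ = """
>>> add(1, 2)
3
>>> len(greeting())
5
"""

def add(a, b):
    return a + b

def greeting():
    return "hello"
'''


def run_module_doctests(name, source):
    """Build a module object from source and run its module-level doctests."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)  # assumption: replaces "compile and import"
    return doctest.testmod(module, verbose=False)


results = run_module_doctests("example_test", SOURCE)
print(results.failed, results.attempted)
```

In the real suite the module would come from compiling a .pyx file; the doctest step itself stays the same, which is what makes comparing results against plain Python straightforward.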
Here is an example I like: http://hg.cython.org/cython-devel/file/e005b58d83b8/tests/run/unicodeliterals.pyx Stefan From stefan_ml at behnel.de Tue Apr 8 14:50:04 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 8 Apr 2008 14:50:04 +0200 (CEST) Subject: [Cython] Lisp inspired transforms In-Reply-To: <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> Message-ID: <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Robert Bradshaw wrote: > Though it is tempting to head down the language development path, > adding little (or big) features that make it more powerful than > Python itself, I think doing so will actually be counterproductive to > the goals stated above. Perhaps there could be a Cython++ that is a > proper superset of the Cython language with more powerful features > (though I'd hope not near the gap of C vs. C++) but in the near term > we should be focusing on things like being able to compile all of > Python as it is. I agree with Robert. As long as Cython does not support closures, for example, it cannot come close enough to being a real option for speeding up existing (non-static) Python code and making it easy to use for non-C-but-Python programmers. For the time being, we should try to a) implement as many Python (3?) language features as possible, keeping in mind that a correct implementation is more important than a fast one, especially for dynamic features that do not have a direct counterpart in the C world. b) get a well-designed and well-integrated compile-time code transformation infrastructure in place, thus allowing to provide pluggable language enhancements *later* and independent of the core compiler, which could then become an advanced Cython++ distribution (or make it back into mainstream). 
I see the major focus here on adjusting the line between compile-time and runtime code evaluation, and maybe some additional AOP features (such as the ones Martin described). I think it would help Cython a *lot* to have a stable core language feature set that is well based in the Python language, *before* we start extending the language with all sorts of 'cool' new features that may already have some sort of (run-time) equivalent in Python. For a programming language, stability is a very valuable feature of its own - and there should (preferably) be "one way to do it". That's why I like Martin's transformers, for example: they look like plain Python but run at compile time. I think that's the way to do it. Stefan From martin at martincmartin.com Tue Apr 8 15:20:32 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Tue, 08 Apr 2008 09:20:32 -0400 Subject: [Cython] Lisp inspired transforms In-Reply-To: <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <47FB7120.6030104@martincmartin.com> Stefan Behnel wrote: > Robert Bradshaw wrote: >> Though it is tempting to head down the language development path, >> adding little (or big) features that make it more powerful than >> Python itself, I think doing so will actually be counterproductive to >> the goals stated above. Perhaps there could be a Cython++ that is a >> proper superset of the Cython language with more powerful features >> (though I'd hope not near the gap of C vs. C++) but in the near term >> we should be focusing on things like being able to compile all of >> Python as it is. > > I agree with Robert.
As long as Cython does not support closures, for > example, it cannot come close enough to being a real option for speeding > up existing (non-static) Python code and making it easy to use for > non-C-but-Python programmers. Both your and Robert's thoughts are very wise. > For the time being, we should try to > > a) implement as many Python (3?) language features as possible, keeping in > mind that a correct implementation is more important than a fast one, > especially for dynamic features that do not have a direct counterpart in > the C world. > > b) get a well-designed and well-integrated compile-time code > transformation infrastructure in place, thus allowing to provide pluggable > language enhancements *later* and independent of the core compiler, which > could then become an advanced Cython++ distribution (or make it back into > mainstream). I see the major focus here on adjusting the line between > compile-time and runtime code evaluation, and maybe some additional AOP > features (as the ones Martin described). > > I think it would help Cython a *lot* to have a stable core language > feature set that is well based in the Python language, *before* we start > extending the language with all sorts of 'cool' new features that may > already have some sort of (run-time) equivalent in Python. For a > programming language, stability is a very valuable feature of its own - > and there should (preferrably) be "one way to do it". That's why I like > Martin's transformers, for example, they look like plain Python but run at > compile time. I think that's the way to do it. Do you see deftrans as the "well-integrated compile-time code transformation infrastructure," and thus a target for (base) Cython, rather than Cython++? Or by infrastructure, do you mean a change to the compiler to support Python-coded transforms, without saying where these transforms come from? 
In the latter case, deftrans (a syntactical convention for defining the transforms) becomes part of Cython++ and, if the experiment works out, is incorporated back into Cython. Best, Martin From martin at martincmartin.com Tue Apr 8 15:46:24 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Tue, 08 Apr 2008 09:46:24 -0400 Subject: [Cython] Lisp inspired transforms In-Reply-To: <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <47FB7730.4060503@martincmartin.com> To recap the motivation for having it in base Cython: - C++ template metaprogramming shows the value of compile time metaprogramming, i.e. writing programs that run at compile time and that produce programs as their output. - However, C++ templates are essentially a separate, purely functional programming language embedded in C++, with different syntax and semantics than C++. Of course, they were never really intended or designed for metaprogramming. But now that they're being used that way, it's clear that the substitution model of templates is awkward and unnatural. If we're going to write little functions to run at compile time, it makes sense to use the syntax and semantics of the base language. Compile time metaprogramming doesn't exist in Python, so adding it to Cython means extending Cython beyond what Python has. There are a couple of options: 1. Add a way to generate C++ templates, and use that for metaprogramming. It keeps Cython as "writing both Python and C++ with an extended Python syntax." 2. Don't add metaprogramming, even C++ templates. Keeps Cython close to Python and C and C++, including all the good & bad parts. 3.
Add a way to specify Python code for the transformation. This recognizes that metaprogramming is a valuable activity that Cython developers will want to do; that the existing way to do it in C++ is more-or-less not up to the task; and that it's better to provide a new, cleaner mechanism using what we've learned in hindsight. Of course, metaprogramming in an imperative, stateful language opens a can of worms, e.g. it will be valuable to modify Python data at compile time, and have that serialized once all transforming is done, then loaded at the start of runtime. I don't think any of these problems are particularly difficult though. So, what do people think is the best way forward for Cython? Best, Martin From beach at verinet.com Tue Apr 8 16:33:33 2008 From: beach at verinet.com (David J. C. Beach) Date: Tue, 8 Apr 2008 08:33:33 -0600 Subject: [Cython] Fwd: Lisp inspired transforms References: Message-ID: <8AE9A196-EA5C-4C38-9152-7369AAE3790A@verinet.com> I've been experimenting with Cython using a simple Python preprocessor that I wrote. It would seem that everything I've done with my preprocessor could potentially be done with "Lisp inspired transforms". I've been particularly interested in creating a math library for small matrices and vectors which uses explicit loop-unrolling at compile time. I believe that the idea behind Cython calls for this type of metaprogramming more than Python itself. The reason for this is that Cython is capable of running sophisticated algorithms at C-speed, and this means that compile-time inlining, unrolling, and policy-based abstractions (a la Alexandrescu) become somewhat compelling. It also seems possible to implement generator functions in Pyrex using these code transforms. I do, however, understand the hesitance to add a Cython feature that falls completely outside both Python and C heritage. I have 11 years of experience with Python and 13 with C++.
The kind of hatred I've developed for C++ is the slow-building kind that grows over years of watching superior alternatives foregone because "C++ is an industry standard". I implore you *not* to adopt the C++ template meta-programming syntax! FWIW, I like options 3, 2, and 1, in that order. C++ is an absurdly hard-to-parse language. I fear that #1 could derail the project. Just my $0.02. David On Apr 8, 2008, at 7:46 AM, Martin C. Martin wrote: > To recap the motivation for having it in base Cython: > > - C++ template metaprogramming shows the value of compile time > metaprogramming, i.e. writing programs that run at compile time and > that > produce programs as their output. > > - However, C++ templates are essentially a separate, purely functional > programming language embedded in C++, with different syntax and > semantics than C+. Of course, they were never really intended or > designed for metaprogramming. But now that they're being used that > way, > it's clear that the substitution model of templates is awkward and > unnatural. If we're going to write little functions to run at compile > time, it makes sense to use the syntax and semantics of the base > language. > > Compile time metaprogramming doesn't exist in Python, so adding it to > Cython means extending Cython beyond what Python has. There are a > couple options: > > 1. Add a way to generate C++ templates, and use that for > metaprogramming. It keeps Cython as "writing both Python and C++ with > an extended Python syntax." > > 2. Don't add metaprogramming, even C++ templates. Keeps Cython > close to > Python and C and C++, including all the good & bad parts. > > 3. Add a way to specify Python code for the transformation. This > recognizes that metaprogramming is a valuable activity that Cython > developers will want to do; that the existing way to do it in C++ is > more-or-less not up to the task; and that it's better to provide a > new, > cleaner mechanism using what we've learned in hindsight. 
> > Of course, metaprogramming in an imperative, stateful language, > opens a > can of worms, e.g. it will be valuable to modify Python data at > compile > time, and have that serialized once all transforming is done, then > loaded at the start of runtime. I don't think any of these problems > are > particularly difficult though. > > So, what do people this is the best way forward for Cython? > > Best, > Martin > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev -- David Beach From stefan_ml at behnel.de Tue Apr 8 18:09:37 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 8 Apr 2008 18:09:37 +0200 (CEST) Subject: [Cython] Lisp inspired transforms In-Reply-To: <47FB7730.4060503@martincmartin.com> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47FB7730.4060503@martincmartin.com> Message-ID: <45162.194.114.62.67.1207670977.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Martin C. Martin wrote: > Compile time metaprogramming doesn't exist in Python, so adding it to > Cython means extending Cython beyond what Python has. Cython has a couple of additional features that make sense because it is a compiled language. I think what you call "metaprogramming" (and generally most things that allow doing things at compile-time instead of run-time) makes sense for Cython. > There are a couple options: > > 1. Add a way to generate C++ templates, and use that for > metaprogramming. It keeps Cython as "writing both Python and C++ with > an extended Python syntax." But that would be C++ specific and can't work with C. > 3. Add a way to specify Python code for the transformation. 
This > recognizes that metaprogramming is a valuable activity that Cython > developers will want to do; that the existing way to do it in C++ is > more-or-less not up to the task; and that it's better to provide a new, > cleaner mechanism using what we've learned in hindsight. I would say so. I would currently position it as a) an extension mechanism for Cython itself and b) an advanced feature that most people won't use (in the same way most people don't use metaclasses) - but as usual with OSS, you never know what people will use it for. > Of course, metaprogramming in an imperative, stateful language, opens a > can of worms, e.g. it will be valuable to modify Python data at compile > time, and have that serialized once all transforming is done, then > loaded at the start of runtime. I don't think any of these problems are > particularly difficult though. > > So, what do people this is the best way forward for Cython? I'll have to take a closer look at your proposal and compare it a bit more to the other approaches we had so far (especially Dag's work), before I make up my mind about it. Maybe others can already comment a bit deeper on this. 
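As a rough illustration of moving work from run time to compile time with transforms written in ordinary Python, the sketch below folds constant arithmetic in a syntax tree before any code runs. It uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) as a stand-in; it is not Martin's deftrans syntax nor Cython's internal tree API, and `ConstantFolder` and `fold_constants` are hypothetical names.

```python
import ast
import operator

# Operators this sketch knows how to evaluate ahead of time.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace `<number> op <number>` subtrees with a single constant."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = _FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Parse source, fold constant arithmetic, and return the new source."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# The constant part (8 * 4) is evaluated by the transform, not at run time.
print(fold_constants("offset = 8 * 4 * ix"))  # prints: offset = 32 * ix
```

The transform itself is plain Python that runs before the program does, which is the property the thread is weighing; how such transforms would be declared and hooked into Cython is exactly the open question.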
Stefan From dagss at student.matnat.uio.no Tue Apr 8 18:47:42 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 08 Apr 2008 18:47:42 +0200 Subject: [Cython] Lisp inspired transforms In-Reply-To: <45162.194.114.62.67.1207670977.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47FB7730.4060503@martincmartin.com> <45162.194.114.62.67.1207670977.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <47FBA1AE.7020004@student.matnat.uio.no> +1 for polishing it and providing option c) as a plugin for now, seeing how it goes, and discussing inclusion in main Cython after it has proven itself. > I'll have to take a closer look at your proposal and compare it a bit more > to the other approaches we had so far (especially Dag's work), before I > make up my mind about it. Maybe others can already comment a bit deeper on > this. > Since you bring up my name: a) Clean NumPy integration (that is, with only a pxd file, not a full NumPy plugin) needs some kind of metaprogramming support, but can either work with Martin's explicit approach or my implicit approach, doesn't matter much. (The plan is to not use meta-programming at first, but that will be slow and metaprogramming is key to getting full NumPy speed). b) About my work in relation to this, see the uneval page: http://wiki.cython.org/enhancements/uneval If Martin's work is accepted now, and my own approach for meta-programming is ever done later, then uneval provides a very natural bridge between them. The two seem to be very complementary. Martin's is "explicit" and simple but for advanced users, mine is "easy-to-use" for beginners but more difficult to really understand for advanced users.
So doing Martin's first, and then see if my more complicated approach is really needed should be fine as long as uneval provides a natural transition path. uneval() would return the same kind of tree that Martin allows work on, whatever that tree ends up being (as I understand it the exact syntax used is an example, one should add a small API layer on top to isolate it more from Cython core). -- Dag Sverre From stefan_ml at behnel.de Tue Apr 8 18:43:41 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 08 Apr 2008 18:43:41 +0200 Subject: [Cython] Cython 0.9.6.13 Released In-Reply-To: <99CAFB68-4433-4AFF-A25B-B093EF56D76A@math.washington.edu> References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu> <47F9E881.7080405@behnel.de> <99CAFB68-4433-4AFF-A25B-B093EF56D76A@math.washington.edu> Message-ID: <47FBA0BD.5050207@behnel.de> Hi, Robert Bradshaw wrote: > On Apr 7, 2008, at 2:25 AM, Stefan Behnel wrote: >> there seems to be a problem with cimporting public extension classes >> from another Cython module. > > I've fixed this now and released 0.9.6.13.1. Still broken, but in a different way. Changeset c2a988dd8e6c by Gary Furnish has this diff: ----------------------------------- @@ -356,23 +467,6 @@ class ModuleNode(Nodes.Node, Nodes.Block self.generate_struct_union_definition(entry, code) elif type.is_enum: self.generate_enum_definition(entry, code) - elif type.is_extension_type: - self.generate_obj_struct_definition(type, code) [...] ----------------------------------- removing these two lines keeps the extension type structs from appearing in the API header file. The changeset is pretty huge and contains more than one change, so I'm not sure I understand everything it does. Gary, could you explain it to me? 
Stefan From gfurnish at gfurnish.net Tue Apr 8 20:17:51 2008 From: gfurnish at gfurnish.net (Gary Furnish) Date: Tue, 8 Apr 2008 12:17:51 -0600 Subject: [Cython] Cython 0.9.6.13 Released In-Reply-To: <47FBA0BD.5050207@behnel.de> References: <4CCD2032-38DC-4334-9AD4-1B21A8EDF897@math.washington.edu> <47F9E881.7080405@behnel.de> <99CAFB68-4433-4AFF-A25B-B093EF56D76A@math.washington.edu> <47FBA0BD.5050207@behnel.de> Message-ID: <8f8f8530804081117q1dee0d66q7bc9188875bd22c8@mail.gmail.com> I'll have to look at the code later tonight. It is possible something got removed that should not have been. On Tue, Apr 8, 2008 at 10:43 AM, Stefan Behnel wrote: > Hi, > > Robert Bradshaw wrote: > > On Apr 7, 2008, at 2:25 AM, Stefan Behnel wrote: > >> there seems to be a problem with cimporting public extension classes > >> from another Cython module. > > > > I've fixed this now and released 0.9.6.13.1. > > Still broken, but in a different way. Changeset c2a988dd8e6c by Gary > Furnish > has this diff: > > ----------------------------------- > @@ -356,23 +467,6 @@ class ModuleNode(Nodes.Node, Nodes.Block > self.generate_struct_union_definition(entry, code) > elif type.is_enum: > self.generate_enum_definition(entry, code) > - elif type.is_extension_type: > - self.generate_obj_struct_definition(type, code) > [...] > ----------------------------------- > > removing these two lines keeps the extension type structs from appearing > in > the API header file. > > The changeset is pretty huge and contains more than one change, so I'm not > sure I understand everything it does. Gary, could you explain it to me? > > Stefan
From stefan_ml at behnel.de Tue Apr 8 19:50:31 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 08 Apr 2008 19:50:31 +0200 Subject: [Cython] Cython circular cdef import patch In-Reply-To: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> References: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> Message-ID: <47FBB067.5080001@behnel.de> Hi, Gary Furnish wrote: > This patch adds extra logic to code generation to sort dependencies to > guarantee that C code is generated in the right order for circular cdef > imports. It does NOT solve python circular import issues. > a.pyx: cimport b > b.pyx: cimport a > Is thus legal, but you can not reference a or b in the global namespace of > b.pyx or a.pyx (such as to instantiate a class). > This patch also modifies cython to output the exact line in the C code where > an exception was thrown in addition to the currently displayed pyx file and > line. This enables significantly faster developmental debugging. > Finally it splits module initialization into two phases: one that initiates > types and handles imports, and another that executes python commands at the > global namespace level. This will be more useful as Cython starts to assume > more advanced optimization and code generation features > The patch is available at: > http://trac.sagemath.org/sage_trac/ticket/2655(the third attachment > only) and is based against 0.9.6.12, > although I can rebase if needed. I am hoping this can be merged into > Cython. The patch doesn't work for me. When I fix the problems with the header file generation, I end up with C code that defines the "__pyx_obj_*" class structs in the wrong order.
Note that there are no circular dependencies involved in my case and that the two classes where the problem occurs are defined in the same source file (although their highest base class is defined in a separate file, not sure if that matters. Personally, seeing all problems I have with this patch and not really seeing a clear gain, I would just revert the patch for now. Stefan From gfurnish at gfurnish.net Wed Apr 9 00:00:54 2008 From: gfurnish at gfurnish.net (Gary Furnish) Date: Tue, 8 Apr 2008 16:00:54 -0600 Subject: [Cython] Cython circular cdef import patch In-Reply-To: <47FBB067.5080001@behnel.de> References: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> <47FBB067.5080001@behnel.de> Message-ID: <8f8f8530804081500p18b87acesd9e1d9103b67bfc6@mail.gmail.com> Can you produce a testcase for this issue? It is *critical* for fast symbolics in Sage, so I would prefer to fix the bug as opposed to reverting. I see the problem now, but I will need the test cases you are using to produce a fix. Once I have them it should be a relatively easy patch. On Tue, Apr 8, 2008 at 11:50 AM, Stefan Behnel wrote: > Hi, > > Gary Furnish wrote: > > This patch adds extra logic to code generation to sort dependencies to > > guarantee that C code is generated in the right order for circular cdef > > imports. It does NOT solve python circular import issues. > > a.pyx: cimport b > > b.pyx: cimport a > > Is thus legal, but you can not reference a or b in the global namespace > of > > b.pyx or a.pyx (such as to instantiate a class). > > This patch also modifies cython to output the exact line in the C code > where > > an exception was thrown in addition to the currently displayed pyx file > and > > line. This enables significantly faster developmental debugging. > > Finally it splits module initialization into two phases: one that > initiates > > types and handles imports, and another that executes python commands at > the > > global namespace level. 
This will be more useful as Cython starts to > assume > > more advanced optimization and code generation features > > The patch is available at: > > http://trac.sagemath.org/sage_trac/ticket/2655 (the third attachment > > only) and is based against 0.9.6.12, > > although I can rebase if needed. I am hoping this can be merged into > > Cython. > > The patch doesn't work for me. When I fix the problems with the header > file > generation, I end up with C code that defines the "__pyx_obj_*" class > structs > in the wrong order. Note that there are no circular dependencies involved > in > my case and that the two classes where the problem occurs are defined in > the > same source file (although their highest base class is defined in a > separate > file, not sure if that matters. > > Personally, seeing all problems I have with this patch and not really > seeing a > clear gain, I would just revert the patch for now. > > Stefan
From robertwb at math.washington.edu Wed Apr 9 01:32:03 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 8 Apr 2008 16:32:03 -0700 Subject: [Cython] Lisp inspired transforms In-Reply-To: <47FBA1AE.7020004@student.matnat.uio.no> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47FB7730.4060503@martincmartin.com> <45162.194.114.62.67.1207670977.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47FBA1AE.7020004@student.matnat.uio.no> Message-ID: <8BF96D07-9478-4E38-87A8-64966FBCCC8B@math.washington.edu> On Apr 8, 2008, at 9:47 AM, Dag Sverre Seljebotn wrote: > +1 for polishing it and provide option c) as a plugin for now and see > how it goes, and discuss inclusion in main Cython after it has proven > itself. >> I'll have to take a closer look at your proposal and compare it a >> bit more >> to the other approaches we had so far (especially Dag's work), >> before I >> make up my mind about it. Maybe others can already comment a bit >> deeper on >> this. >> > Since you bring up my name: > > a) Clean NumPy integration (that is, with only a pxd file, not a full > NumPy plugin) needs some kind of metaprogramming support, but can > either > work with Martin's explicit approach or my implicit approach, doesn't > matter much. (The plan is to not use meta-programming at first, but > that > will be slow and metaprogramming is key to getting full NumPy speed). There will be a little bit of metaprogramming required for NumPy support (e.g. to get the type declarations right) but I think the crucial piece to make things run efficiently and smoothly is extensive compile-time evaluation of expressions.
To be very specific about NumPy, the array has the format ctypedef extern class numpy.ndarray [object PyArrayObject]: cdef char *data cdef int nd cdef npy_intp *dimensions cdef npy_intp *strides cdef object base cdef dtype descr cdef int flags To access an element (i.e. __getitem__) of an ndarray A one does A.data[sizeof(A.descr.type) * A.strides[0] * ix] (well, the actual code is a bit more complicated than this, using index2ptr and all). In any case, the point is that if we have compile-time information about A then this can be simplified to a single array lookup with nothing more than compile-time evaluation (and a little compile-time type analysis). In this case the type parameters are exactly the instance member fields. If they are not known at compile time then the code produced is the same (though it won't be as completely evaluated). I'll admit I'm waving my hands a bit as to how to handle the actual types themselves, but I think it could be done in a similar manner. This is much weaker than full metaprogramming, but is easy to understand, implement, and read (especially compared to trying to implement array indexing as a series of tree transformations). > b) About my work in relation to this, see the uneval page: > > http://wiki.cython.org/enhancements/uneval > > If Martin's work is accepted now, and my own approach for > meta-programming is ever done later, then uneval provides a very > natural > bridge between them. The two seems to be very complementary. > Martin's is > "explicit" and simple but for advanced users, mine is "easy-to-use" > for > beginners but more difficult to really understand for advanced > users. So > doing Martin's first, and then see if my more complicated approach is > really needed should be fine as long as uneval provides a natural > transition path.
> > uneval() would return the same kind of tree that Martin allows work > on, > whatever that tree ends up being (as I understand it the exact syntax > used is an example, one should add a small API layer on top to isolate > it more from Cython core). The uneval idea is a very interesting one, and certainly has a very pythonic feel to it. One thing I don't like is all of these are very closely coupled to the actual Cython parse tree--you are right in that there should be some abstraction. There is also the question of when the transformations get done, as I'd imagine some of them would be type-dependent. - Robert From robertwb at math.washington.edu Wed Apr 9 01:32:10 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 8 Apr 2008 16:32:10 -0700 Subject: [Cython] Lisp inspired transforms In-Reply-To: <45162.194.114.62.67.1207670977.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <47F8C640.10901@martincmartin.com> <35BE3266-C5D5-4695-A90A-9B749E91453B@math.washington.edu> <47FA8C84.5070207@martincmartin.com> <489C6F51-72E7-4B39-B1BB-D1EF9C3B5365@math.washington.edu> <8568.194.114.62.67.1207659004.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47FB7730.4060503@martincmartin.com> <45162.194.114.62.67.1207670977.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <86943EA1-B72C-4AA9-9EA1-04CDB3685222@math.washington.edu> On Apr 8, 2008, at 9:09 AM, Stefan Behnel wrote: > Martin C. Martin wrote: >> Compile time metaprogramming doesn't exist in Python, so adding it to >> Cython means extending Cython beyond what Python has. > > Cython has a couple of additional features that make sense because > it is a > compiled language. I think what you call "metaprogramming" (and > generally > most things that allow doing things at compile-time instead of run- > time) > makes sense for Cython. > >> There are a couple options: >> >> 1. Add a way to generate C++ templates, and use that for >> metaprogramming. 
>> It keeps Cython as "writing both Python and C++ with an extended
>> Python syntax."
>
> But that would be C++ specific and can't work with C.

I agree--I don't think we should make any Cython features dependent on
C++, and don't know that the metaprogramming would even map nicely (let
alone comprehensibly) from the one language to the other. On a
completely orthogonal note, I would like to make Cython template-aware
(enough) to make it easy to wrap templated C++ libraries.

>> 3. Add a way to specify Python code for the transformation. This
>> recognizes that metaprogramming is a valuable activity that Cython
>> developers will want to do; that the existing way to do it in C++ is
>> more-or-less not up to the task; and that it's better to provide a
>> new, cleaner mechanism using what we've learned in hindsight.
>
> I would say so. I would currently position it as a) an extension
> mechanism for Cython itself and b) an advanced feature that most
> people won't use (in the same way most people don't use metaclasses) -
> but as usual with OSS, you never know what people will use it for.
>
>
>> Of course, metaprogramming in an imperative, stateful language,
>> opens a can of worms, e.g. it will be valuable to modify Python data
>> at compile time, and have that serialized once all transforming is
>> done, then loaded at the start of runtime. I don't think any of these
>> problems are particularly difficult though.
>>
>> So, what do people think is the best way forward for Cython?
>
> I'll have to take a closer look at your proposal and compare it a bit
> more to the other approaches we had so far (especially Dag's work),
> before I make up my mind about it. Maybe others can already comment a
> bit deeper on this.

I have thought a lot about this and I think the best way forward for
Cython is to become as close to Python (from the end user's point of
view) as possible without sacrificing speed and the ability to easily
wrap existing libraries.
Cython as a tool for use by Python developers has a much greater
potential than Cython as a new language. Unfortunately, when it comes to
adding new features to the language, these two views of Cython are
mutually opposed, and I think we should stick with the first. I do not
want to devalue the benefits of metaprogramming (and probably several
other features that will come up which Python is lacking) but if we
decide to add such features I think there should be a clear distinction
between this "Cython+" language and "normal Cython." (Not necessarily a
separate compiler, but at least a separate file extension). Also, I
think we should mostly focus on catching up to where Python is now
before expending too much effort going beyond.

- Robert

From rasjidw at gmail.com Wed Apr 9 02:03:37 2008
From: rasjidw at gmail.com (Rasjid Wilcox)
Date: Wed, 9 Apr 2008 10:03:37 +1000
Subject: [Cython] Embedding Cython
Message-ID:

Hi,

I've just been playing a little with Pyrex and Cython. In particular,
testing the Embedding Pyrex HOWTO at
http://www.freenet.org.nz/python/embeddingpyrex/.

It works with Pyrex, but not with Cython. In particular,

$ cython testpyx.pyx

Error converting Pyrex file to C:
------------------------------------------------------------
...

# IMPORTANT - we need to explicitly prototype the function
cdef public void inittestpyx()
                            ^
------------------------------------------------------------

.../mycode/python/cython/testpyx.pyx:29:28: Non-extern C function
declared but not defined

============================================

Any ideas on how to get this working with Cython?

Cheers,

Rasjid.

From rasjidw at gmail.com Wed Apr 9 03:02:36 2008
From: rasjidw at gmail.com (Rasjid Wilcox)
Date: Wed, 9 Apr 2008 11:02:36 +1000
Subject: [Cython] Embedding Cython
In-Reply-To:
References:
Message-ID:

On Wed, Apr 9, 2008 at 10:03 AM, Rasjid Wilcox wrote:
> Hi,
>
> I've just been playing a little with Pyrex and Cython.
In particular, > testing the Embedding Pyrex HOWTO at > http://www.freenet.org.nz/python/embeddingpyrex/. > > It works with Pyrex, but not with Cython. Ah, correction. It no longer works with either. More precisely, it works with the version of Pyrex in a default Ubuntu 7.10 install (Pyrex 0.9.5.1a) but does not work with the current version of Pyrex (0.9.6.4). I was testing it with Cython 0.9.6.13 which (if I understand things correctly) is based on Pyrex 0.9.6, so it is to be expected that it does not work. Any suggestions on how to get this working on the current version of either Pyrex or Cython would be appreciated. Cheers, Rasjid. From rasjidw at gmail.com Wed Apr 9 03:25:47 2008 From: rasjidw at gmail.com (Rasjid Wilcox) Date: Wed, 9 Apr 2008 11:25:47 +1000 Subject: [Cython] Embedding Cython In-Reply-To: References: Message-ID: Solved the problem. I changed: cdef public void inittestpyx() to: cdef extern from "testpyx.h": void inittestpyx() and it all works like a charm, at least on Linux. Cheers, Rasjid. From robertwb at math.washington.edu Wed Apr 9 04:48:18 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 8 Apr 2008 19:48:18 -0700 Subject: [Cython] Embedding Cython In-Reply-To: References: Message-ID: On Apr 8, 2008, at 6:25 PM, Rasjid Wilcox wrote: > Solved the problem. > > I changed: > > cdef public void inittestpyx() > > to: > > cdef extern from "testpyx.h": > void inittestpyx() > > and it all works like a charm, at least on Linux. Glad you were able to figure it out. This should work fine on other platforms too--I'm actually surprised the old way worked. - Robert From dagss at student.matnat.uio.no Wed Apr 9 11:51:13 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 Apr 2008 11:51:13 +0200 Subject: [Cython] Prototype patch for closures/inner functions Message-ID: <47FC9191.7020802@student.matnat.uio.no> Stefan wrote: > I agree with Robert. 
> As long as Cython does not support closures, for
> example, it cannot come close enough to being a real option for speeding

I couldn't resist the challenge :-)

Attached is a prototype using transforms to add closure support to
Cython.

NB! It's not ready for prime-time yet. Unfortunately I must leave it for
some days now, so I post it in prototype state. Mainly so that others
don't start on the same thing (though feel free to take over this, just
give me a note).

I mainly wrote it to see what it would be like to write a "real"
transform. Which was not too bad... It is a bit hacky but I think the
approach should give correct results.

Known fatal bugs:
- I don't know the first thing about reference counting, CPython etc..
  At least on one occasion I messed up GC-ing. This is probably
  something that others will be much quicker at spotting.
- No name mangling, inner functions cannot collide in name with outer.
  Quick fix but don't have time now; also one might want a more generic
  "name mangler" support mechanism rather than just checking for
  collisions, not sure about how to do this.
- No consideration for anything nontrivial: The global keyword and
  accessing variables in modules come to mind.

Conscious limitations:
- Only Python def's, not inner cdefs, and all bound vars are bound as
  Python objects. This restriction can be removed later (for
  optimization), but I think Python-only inner defs should work fine.

Strategy:
- First, run a transform that records used and assigned symbols within
  a function (don't know if this is already done anywhere, probably this
  might be redundant and done in the scope system? Suggestions?).
- Then, run a transform which lifts out inner functions, and replaces
  them with assignment to a method bound to a tuple containing the
  variables that should be bound (apparently this works. But one might
  create a specific type containing a tuple as a field as well.)
- The lifted-out functions get an instruction added first to unpack the
  "self" tuple to the correct names (making the names shadow the outer
  scope).

How to run (if you want to help me out or have a look, otherwise don't
bother):

Apply patch.

$ cat <<END > test.pyx
def make_adder(n):
    def adder(x):
        return x + n
    return adder

def timesthree(n):
    def util(x):
        return x * 2
    return n + util(n)
END
$ cat <<END > test.sh
python cython.py \
  -Tafter_parse:Cython.Compiler.Transforms.InnerFunctions.FunctionSymbols \
  -Tafter_parse:Cython.Compiler.Transforms.InnerFunctions.InnerFunctions \
  -Tafter_analyse_function:Cython.Compiler.Transforms.InnerFunctions.MethodTableIndex \
  test.pyx
gcc -Wall -shared -fPIC -I/usr/include/python2.5 -o test.so test.c
END
$ python
>>> import test
>>> a = test.make_adder(20)
>>> b = test.make_adder(10)
>>> a
>>> a(13)
33
>>> b(13)
23
>>> test.timesthree(100)
300

--
Dag Sverre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: innerfuncs.diff
Type: text/x-patch
Size: 14964 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20080409/0d8e7126/attachment-0001.bin

From dagss at student.matnat.uio.no Wed Apr 9 11:55:34 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Wed, 09 Apr 2008 11:55:34 +0200
Subject: [Cython] Prototype patch for closures/inner functions
In-Reply-To: <47FC9191.7020802@student.matnat.uio.no>
References: <47FC9191.7020802@student.matnat.uio.no>
Message-ID: <47FC9296.2050105@student.matnat.uio.no>

>
> $ cat <<END > test.sh
> python cython.py \
> -Tafter_parse:Cython.Compiler.Transforms.InnerFunctions.FunctionSymbols
> \
> -Tafter_parse:Cython.Compiler.Transforms.InnerFunctions.InnerFunctions \
>
> -Tafter_analyse_function:Cython.Compiler.Transforms.InnerFunctions.MethodTableIndex
> \
> test.pyx
> gcc -Wall -shared -fPIC -I/usr/include/python2.5 -o test.so test.c
> END

(Line-wrap messed it up).
Also, to be completely clear, when it is finished it will definitely self-register, no arguments should be needed :-) -- Dag Sverre From dagss at student.matnat.uio.no Wed Apr 9 12:02:59 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 Apr 2008 12:02:59 +0200 Subject: [Cython] Prototype patch for closures/inner functions In-Reply-To: <47FC9191.7020802@student.matnat.uio.no> References: <47FC9191.7020802@student.matnat.uio.no> Message-ID: <47FC9453.2000702@student.matnat.uio.no> > > Apply patch. Also, touch Cython/Compiler/Transforms/__init__.py Looks like hg diff didn't include an added, empty file in the diff... -- Dag Sverre From dagss at student.matnat.uio.no Wed Apr 9 12:30:17 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 Apr 2008 12:30:17 +0200 Subject: [Cython] Prototype patch for closures/inner functions In-Reply-To: <47FC9191.7020802@student.matnat.uio.no> References: <47FC9191.7020802@student.matnat.uio.no> Message-ID: <47FC9AB9.1070500@student.matnat.uio.no> > > Known fatal bugs: > - I don't know the first thing about reference counting, CPython etc.. > At least on one occasion I messed up GC-ing. This is probably > something that others will be much quicker at spotting. > - No name mangling, inner functions cannot collide in name with outer. > Quick fix but don't have time now; also one might want a more generic > "name mangler" support mechanism rather than just checking for > collisions, not sure about how to do this. > - No consideration for anything nontrivial: The global keyword and > accessing variables in modules comes to mind. Add this one: - If the inner function calls cdef functions, one will have problems because it will try to include the name of the cdef function in the closure. I guess this really wasn't that ready. But as I said, I have to take a break and better to post anything at all. +, if I'm heading in the wrong direction (scope is already calculated...) 
someone can help me out. -- Dag Sverre From stefan_ml at behnel.de Wed Apr 9 12:57:01 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 9 Apr 2008 12:57:01 +0200 (CEST) Subject: [Cython] Cython circular cdef import patch In-Reply-To: <8f8f8530804081500p18b87acesd9e1d9103b67bfc6@mail.gmail.com> References: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> <47FBB067.5080001@behnel.de> <8f8f8530804081500p18b87acesd9e1d9103b67bfc6@mail.gmail.com> Message-ID: <44871.194.114.62.66.1207738621.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Gary Furnish wrote: > Can you produce a testcase for this issue? It is *critical* for fast > symbolics in Sage, so I would prefer to fix the bug as opposed to > reverting. > I see the problem now, but I will need the test cases you are using to > produce a fix. Once I have them it should be a relatively easy patch. Personally, I do not consider the patch that was applied a clean patch, as it mixes a number of independent smaller and bigger changes - and a subset of them has proven to be problematic. So I attached a changeset that reverts the parts that I think were related to the behaviour I see and that does some minor source cleanup of the affected areas. I had to do that by hand, as the three related changesets (one by you, two by Robert) are split around the package merge, which makes it really hard to get them reverted cleanly. I will try to come up with a test case - which may take a bit longer as I am not sure the current test suite is ready to handle multi-file tests (although a mix of .pyx and .pxd should work, I'll have to check that). I would ask you to provide a clean patch against my reverted source version, so that we have a single clean changeset of what you wanted to achieve, possibly followed by one or more bug fix commits that eventually lead to a working patch. 
I'm not questioning the intention of your changes in general, I'm just concerned about a) getting the bugs out and b) making it traceable in the revision history what change had what effects. Smaller, separate self-contained commits are always better than a mix of different changes in a big patch. BTW, what is the "init2" function for? From what I read, you seem to have cut it out of the original module init function. What's its purpose? (implying: what would be a clearer name than "init2"?) And: why is it declared "PyMODINIT_FUNC" instead of plain "void", which seems to be the way you call it? Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: revert-circular-cdef-changes.patch Type: application/octet-stream Size: 15125 bytes Desc: not available Url : http://codespeak.net/pipermail/cython-dev/attachments/20080409/f141747b/attachment-0001.obj From martin at martincmartin.com Wed Apr 9 17:25:58 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Wed, 09 Apr 2008 11:25:58 -0400 Subject: [Cython] Another potential optimization possible with transforms Message-ID: <47FCE006.9040301@martincmartin.com> Hi, There's a potential optimization I mentioned on the Lisp inspired transforms page, where you could reorder bitfields in order to pack them most efficiently. Eerily, someone at my job just committed something that did just that. We have a custom defstruct, called defstruct-bv, which allows you to specify bit fields. (Lisp doesn't come with bitfields.) Here's the checkin message: Log: At compile time, automatically optimize the layout of bits in a defstruct-bv definition to minimize the number of words used by the resulting struct. This is an instance of the bin-packing problem, which is NP-hard. Fortunately, it has a good heuristic solution, first-fit decreasing, which gets very close to optimal answers. I just implemented that as a wrapper around the existing defstruct-bv, which I renamed defstruct-bv-internal. 
There is no longer any reason to worry about the order of bit field definitions within a lisp struct. If you have a struct that you don't want this to happen to (perhaps because you have a performance consideration for the layout), add (:optimize-p NIL) as an option to defstruct-bv. I was somewhat disappointed with the results: most of our structs were already laid out as optimally as this algorithm is able to achieve. The single exception was the faring-atom struct, which got 16 bytes smaller. Best, Martin From robertwb at math.washington.edu Wed Apr 9 19:35:04 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 9 Apr 2008 10:35:04 -0700 Subject: [Cython] Another potential optimization possible with transforms In-Reply-To: <47FCE006.9040301@martincmartin.com> References: <47FCE006.9040301@martincmartin.com> Message-ID: <351CC26C-59C7-4A45-BA96-5F2440F992D5@math.washington.edu> On Apr 9, 2008, at 8:25 AM, Martin C. Martin wrote: > Hi, > > There's a potential optimization I mentioned on the Lisp inspired > transforms page, where you could reorder bitfields in order to pack > them > most efficiently. Eerily, someone at my job just committed something > that did just that. We have a custom defstruct, called defstruct-bv, > which allows you to specify bit fields. (Lisp doesn't come with > bitfields.) Here's the checkin message: > > Log: > At compile time, automatically optimize the layout of bits in a > defstruct-bv definition to minimize the number of words used by the > resulting struct. > > This is an instance of the bin-packing problem, which is NP-hard. > Fortunately, it has a good heuristic solution, first-fit decreasing, > which gets very close to optimal answers. I just implemented that as > a wrapper around the existing defstruct-bv, which I renamed > defstruct-bv-internal. > > There is no longer any reason to worry about the order of bit field > definitions within a lisp struct. 
> > If you have a struct that you don't want this to happen to (perhaps > because you have a performance consideration for the layout), add > (:optimize-p NIL) as an option to defstruct-bv. > > I was somewhat disappointed with the results: most of our structs were > already laid out as optimally as this algorithm is able to achieve. > The single exception was the faring-atom struct, which got 16 bytes > smaller. > > Best, > Martin I believe in Cython we would just rely on the C compiler to make optimizations of this nature. - Robert From martin at martincmartin.com Wed Apr 9 20:08:52 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Wed, 09 Apr 2008 14:08:52 -0400 Subject: [Cython] Another potential optimization possible with transforms In-Reply-To: <351CC26C-59C7-4A45-BA96-5F2440F992D5@math.washington.edu> References: <47FCE006.9040301@martincmartin.com> <351CC26C-59C7-4A45-BA96-5F2440F992D5@math.washington.edu> Message-ID: <47FD0634.5000407@martincmartin.com> Robert Bradshaw wrote: > On Apr 9, 2008, at 8:25 AM, Martin C. Martin wrote: >> Hi, >> >> There's a potential optimization I mentioned on the Lisp inspired >> transforms page, where you could reorder bitfields in order to pack them >> most efficiently. Eerily, someone at my job just committed something >> that did just that. We have a custom defstruct, called defstruct-bv, >> which allows you to specify bit fields. (Lisp doesn't come with >> bitfields.) Here's the checkin message: >> ... > > I believe in Cython we would just rely on the C compiler to make > optimizations of this nature. That's the point, C prescribes that it *can't* make such optimizations. C fields (including bitfields) must appear in memory in the order they appear in the source code. For bit fields, this is an advantage when you're, say, parsing the header of some binary file, or writing a device driver. But if you just want a struct that stores some data compactly, the C compiler can't help you out in that case. 
Best, Martin From ndbecker2 at gmail.com Wed Apr 9 20:34:30 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 09 Apr 2008 14:34:30 -0400 Subject: [Cython] Another potential optimization possible with transforms References: <47FCE006.9040301@martincmartin.com> <351CC26C-59C7-4A45-BA96-5F2440F992D5@math.washington.edu> <47FD0634.5000407@martincmartin.com> Message-ID: Martin C. Martin wrote: > > > Robert Bradshaw wrote: >> On Apr 9, 2008, at 8:25 AM, Martin C. Martin wrote: >>> Hi, >>> >>> There's a potential optimization I mentioned on the Lisp inspired >>> transforms page, where you could reorder bitfields in order to pack them >>> most efficiently. Eerily, someone at my job just committed something >>> that did just that. We have a custom defstruct, called defstruct-bv, >>> which allows you to specify bit fields. (Lisp doesn't come with >>> bitfields.) Here's the checkin message: >>> > ... >> >> I believe in Cython we would just rely on the C compiler to make >> optimizations of this nature. > > That's the point, C prescribes that it *can't* make such optimizations. > C fields (including bitfields) must appear in memory in the order they > appear in the source code. > > For bit fields, this is an advantage when you're, say, parsing the > header of some binary file, or writing a device driver. But if you just > want a struct that stores some data compactly, the C compiler can't help > you out in that case. > > Best, What about gcc's 'packed' attribute? `packed' The `packed' attribute specifies that a variable or structure field should have the smallest possible alignment--one byte for a variable, and one bit for a field, unless you specify a larger value with the `aligned' attribute. Here is a structure in which the field `x' is packed, so that it immediately follows `a': struct foo { char a; int x[2] __attribute__ ((packed)); }; From martin at martincmartin.com Wed Apr 9 20:39:24 2008 From: martin at martincmartin.com (Martin C. 
Martin) Date: Wed, 09 Apr 2008 14:39:24 -0400 Subject: [Cython] Another potential optimization possible with transforms In-Reply-To: References: <47FCE006.9040301@martincmartin.com> <351CC26C-59C7-4A45-BA96-5F2440F992D5@math.washington.edu> <47FD0634.5000407@martincmartin.com> Message-ID: <47FD0D5C.6090605@martincmartin.com> Neal Becker wrote: > Martin C. Martin wrote: > >> >> Robert Bradshaw wrote: >>> On Apr 9, 2008, at 8:25 AM, Martin C. Martin wrote: >>>> Hi, >>>> >>>> There's a potential optimization I mentioned on the Lisp inspired >>>> transforms page, where you could reorder bitfields in order to pack them >>>> most efficiently. Eerily, someone at my job just committed something >>>> that did just that. We have a custom defstruct, called defstruct-bv, >>>> which allows you to specify bit fields. (Lisp doesn't come with >>>> bitfields.) Here's the checkin message: >>>> >> ... >>> I believe in Cython we would just rely on the C compiler to make >>> optimizations of this nature. >> That's the point, C prescribes that it *can't* make such optimizations. >> C fields (including bitfields) must appear in memory in the order they >> appear in the source code. >> >> For bit fields, this is an advantage when you're, say, parsing the >> header of some binary file, or writing a device driver. But if you just >> want a struct that stores some data compactly, the C compiler can't help >> you out in that case. >> >> Best, > > What about gcc's 'packed' attribute? > > `packed' > The `packed' attribute specifies that a variable or structure field > should have the smallest possible alignment--one byte for a > variable, and one bit for a field, unless you specify a larger > value with the `aligned' attribute. > > Here is a structure in which the field `x' is packed, so that it > immediately follows `a': > > struct foo > { > char a; > int x[2] __attribute__ ((packed)); > }; Access to non-word aligned ints can be very slow, that's why packed isn't the default. 
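[Editorial sketch, not part of the original message.] The size versus alignment trade-off being discussed can be poked at from Python with ctypes, assuming a platform where int is wider than one byte. `Natural` and `Packed` are hypothetical names mirroring Neal's `struct foo`; ctypes' `_pack_ = 1` is only an approximation of gcc's per-field `packed` attribute, so this illustrates the layout effect, not what gcc itself emits.

```python
import ctypes


class Natural(ctypes.Structure):
    # Mirrors `struct foo { char a; int x[2]; }` with default alignment.
    _fields_ = [("a", ctypes.c_char), ("x", ctypes.c_int * 2)]


class Packed(ctypes.Structure):
    # _pack_ = 1 requests byte alignment for every field; it has to be
    # set before _fields_. This is an assumption standing in for gcc's
    # __attribute__((packed)) on the field x.
    _pack_ = 1
    _fields_ = [("a", ctypes.c_char), ("x", ctypes.c_int * 2)]


def describe(struct_type):
    """Return (total size in bytes, byte offset of field x)."""
    return ctypes.sizeof(struct_type), struct_type.x.offset


print("natural:", describe(Natural))
print("packed: ", describe(Packed))
```

In the packed variant x starts right after a, so the struct is smaller but the ints are no longer aligned to their natural boundary, which is the access cost the thread is weighing against the space saved.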
The use case is more like:

    struct foo
    {
        int a;
        void *p;
        int b;
    };

On a 64 bit machine with 32 bit ints, that structure takes up 3 words
(24 bytes). Instead, if you wrote it as:

    struct foo
    {
        int a;
        int b;
        void *p;
    };

everything is properly aligned, and it only takes 2 words (16 bytes).

Best,
Martin

From stefan_ml at behnel.de Wed Apr 9 17:48:17 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 09 Apr 2008 17:48:17 +0200
Subject: [Cython] Cython circular cdef import patch
In-Reply-To: <8f8f8530804081500p18b87acesd9e1d9103b67bfc6@mail.gmail.com>
References: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> <47FBB067.5080001@behnel.de> <8f8f8530804081500p18b87acesd9e1d9103b67bfc6@mail.gmail.com>
Message-ID: <47FCE541.9050204@behnel.de>

Hi,

Gary Furnish wrote:
> Can you produce a testcase for this issue? It is *critical* for fast
> symbolics in Sage, so I would prefer to fix the bug as opposed to reverting.
> I see the problem now, but I will need the test cases you are using to
> produce a fix. Once I have them it should be a relatively easy patch.

Here is a test case for the ordering bug (and a fix for the test
runner).

A test for the header file generation is harder to write, as it requires
breaking out of the current "compile[-run]" test scheme into a
"compile-compile[-run]" scheme that builds a module against the
previously generated header file. I currently have no idea how to enable
that in anything but an ugly hackish way.

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-case-for-cdef-extern-class-definitions.patch Type: text/x-patch Size: 2068 bytes Desc: not available Url : http://codespeak.net/pipermail/cython-dev/attachments/20080409/05bd687f/attachment.bin From stefan_ml at behnel.de Wed Apr 9 19:07:58 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 Apr 2008 19:07:58 +0200 Subject: [Cython] "Strict aliasing" warnings on generated code Message-ID: <47FCF7EE.3050407@behnel.de> Hi, does anyone know how to get rid of all the "dereferencing type-punned pointer will break strict-aliasing rules" warnings that I see in the tests? I mean, except by switching them off on the gcc command line or passing "-fno-strict-aliasing". :) It complains about code lines where we cast public PyTypeObject pointers to PyObject pointers, as in the following examples: PyObject_Call(((PyObject*)&PyString_Type), __pyx_1, NULL); Py_INCREF(((PyObject*)&PyString_Type)); and in the first two lines of the function __Pyx_PyObject_IsTrue: static INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { if (x == Py_True) return 1; else if (x == Py_False) return 0; else return PyObject_IsTrue(x); } where Python's bool header file defines Py_True as #define Py_True ((PyObject *) &_Py_TrueStruct) It would be nice if gcc considered PyObject and PyTypeObject equivalent with respect to the strict aliasing assumptions. Is there any way to rewrite this to make gcc a bit smarter here? Stefan From sven at pyrex.berkvens.net Wed Apr 9 22:43:50 2008 From: sven at pyrex.berkvens.net (Sven Berkvens-Matthijsse) Date: Wed, 9 Apr 2008 22:43:50 +0200 Subject: [Cython] "Strict aliasing" warnings on generated code In-Reply-To: <47FCF7EE.3050407@behnel.de> References: <47FCF7EE.3050407@behnel.de> Message-ID: <20080409204350.GA15930@berkvens.net> > Hi, Hello, > does anyone know how to get rid of all the "dereferencing > type-punned pointer will break strict-aliasing rules" warnings that > I see in the tests? 
> I mean, except by switching them off on the gcc
> command line or passing "-fno-strict-aliasing". :)

Technically, -fno-strict-aliasing is probably your only option. You are
accessing the same memory through pointers to differently-sized types
here, and that violates the strict aliasing rules, which allow GCC to
assume that an object can only change (and thus, parts of it may be
cached in registers and such) if an object of the same size (and nearly
the same type) is written to.

The warnings may actually be real warnings that GCC has in fact
generated incorrect code. I've seen this happen before and debugging
this kind of stuff is very difficult and actually required me to examine
the assembler code in the end to see what GCC had made of my code (and
it compiled it in such a way that the caching caused off-by-one errors).

> It complains about code lines where we cast public PyTypeObject
> pointers to PyObject pointers, as in the following examples:
>
> PyObject_Call(((PyObject*)&PyString_Type), __pyx_1, NULL);
>
> Py_INCREF(((PyObject*)&PyString_Type));

Correct, these are pointers to differently-sized types (and they are
non-const), so this causes a warning.

> and in the first two lines of the function __Pyx_PyObject_IsTrue:
>
> static INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
>     if (x == Py_True) return 1;
>     else if (x == Py_False) return 0;
>     else return PyObject_IsTrue(x);
> }

I'm not sure why GCC is complaining here, the value is not passed on to
a function that may modify the object, only its address is used here.

> where Python's bool header file defines Py_True as
>
> #define Py_True ((PyObject *) &_Py_TrueStruct)
>
>
> It would be nice if gcc considered PyObject and PyTypeObject
> equivalent with respect to the strict aliasing assumptions. Is there
> any way to rewrite this to make gcc a bit smarter here?

No, I don't think so. You need to compile with -fno-strict-aliasing.
In fact, Python itself is compiled with this option as well (which is unavoidable and also necessary). Modules must be built with this option enabled too. It looks like distutils does NOT compile C files with this option, and this may actually build loadable modules that break for mysterious reasons! > Stefan -- With kind regards, Sven From dg at pnylab.com Wed Apr 9 23:10:44 2008 From: dg at pnylab.com (Dan Gindikin) Date: Wed, 9 Apr 2008 21:10:44 +0000 (UTC) Subject: [Cython] (no subject) References: <47FCF7EE.3050407@behnel.de> <20080409204350.GA15930@berkvens.net> Message-ID: Sven Berkvens-Matthijsse writes: > > > Hi, > > Hello, > > > does anyone know how to get rid of all the "dereferencing > > type-punned pointer will break strict-aliasing rules" warnings that > > I see in the tests? I mean, except by switching them off on the gcc > > command line or passing "-fno-strict-aliasing". :) > I think you really need "-fno-strict-aliasing". I've seen "cdef class" inheritance break without it: the vtable of class method pointers gets screwed up between base and derived classes, and you end up calling the wrong functions. From stefan_ml at behnel.de Thu Apr 10 13:59:26 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 10 Apr 2008 13:59:26 +0200 Subject: [Cython] (no subject) In-Reply-To: References: <47FCF7EE.3050407@behnel.de> <20080409204350.GA15930@berkvens.net> Message-ID: <47FE011E.3030203@behnel.de> Dan Gindikin wrote: > Sven Berkvens-Matthijsse writes: >>> does anyone know how to get rid of all the "dereferencing >>> type-punned pointer will break strict-aliasing rules" warnings that >>> I see in the tests? I mean, except by switching them off on the gcc >>> command line or passing "-fno-strict-aliasing". :) > > I think you really need "-fno-strict-aliasing". I've seen "cdef class" > inheritance break without it: the vtable of class method pointers gets screwed up > between base and derived classes, and you end up calling the wrong functions. 
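[Editorial sketch, not part of the original messages.] One way to check whether a given interpreter's recorded build flags already include -fno-strict-aliasing, and to add the flag explicitly otherwise, is to look at `sysconfig`. The helper names below are hypothetical, the result depends entirely on how the local Python was built, and the flag only makes sense for gcc-compatible compilers.

```python
import sysconfig

FLAG = "-fno-strict-aliasing"


def interpreter_cflags():
    """Return the C compiler flags recorded when this interpreter was built."""
    parts = []
    for name in ("CFLAGS", "OPT"):
        value = sysconfig.get_config_var(name)
        if value:  # the variable may be missing, e.g. on Windows builds
            parts.extend(value.split())
    return parts


def extra_args_for_extension():
    """Flags worth passing explicitly if the build defaults lack FLAG."""
    return [] if FLAG in interpreter_cflags() else [FLAG]


print(extra_args_for_extension())
```

The returned list could then be handed to the `extra_compile_args` argument of a distutils/setuptools `Extension`, so that the generated C file is compiled with the flag regardless of the defaults.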
Thanks Sven, Dan, I guess that's the option to take then. Stefan From rob at tvcentric.com Thu Apr 10 17:26:47 2008 From: rob at tvcentric.com (Rob Shortt) Date: Thu, 10 Apr 2008 12:26:47 -0300 Subject: [Cython] defining module constants Message-ID: <47FE31B7.9020507@tvcentric.com> Hello, I'm creating some python bindings to a C library. I've defined everything from the lib's header file in my pxd file and would like to expose many of the defined enums in my pyrex module, not as members of a class, but toplevel to my module. For example, in my pxd file I have stuff like: cdef extern from "directfb.h": ctypedef enum DFBResult: DFB_OK DFB_FAILURE DFB_INIT DFB_BUG and so on... What can I add to my pyx file so that my resulting module will have these, like: import directfb print directfb.DFB_OK Please don't tell me I don't have to redefine them all in my pyx file, but some equvalent to PyModule_AddIntConstant(m, "DFB_OK", DFB_OK) would be ok. Thanks, -Rob -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080410/a33010e0/attachment.pgp From dagss at student.matnat.uio.no Thu Apr 10 18:45:12 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 Apr 2008 18:45:12 +0200 Subject: [Cython] Offtopic: Good Python IDEs? Message-ID: <47FE4418.40703@student.matnat.uio.no> This far I've been editing Python code with "gedit", and so I'm wondering: What's your favorite? emacs? Eclipse? eric? I'd especially like stuff like - Quick jumping to classes and up and down between class hierarchies on overriden methods - Refactoring tools - Automatic import writing - Completion... Though integrated debugging, test running etc. are "nice" too. 
Dag Sverre From ellisonbg.net at gmail.com Thu Apr 10 18:51:10 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 10:51:10 -0600 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) Message-ID: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> Hi, (dual posted to sage and cython) A few of us (ipython and mpi4py devs) are wondering what the best/safest way of allocating dynamic memory in a local scope (method/function) is when using cython. An example would be if you need an array of c ints that is locally scoped. The big question is how to make sure that the memory gets freed - even if something goes wrong in the function/method. That is, you want to prevent memory leaks. It looks like in sage, the sage_malloc/sage_free functions are used for this purpose: from sage/graphs/graph_isom.pyx: 176 def incorporate_permutation(self, gamma): 202 cdef int *_gamma = sage_malloc( n * sizeof(int) ) 203 if not _gamma: 204 raise MemoryError("Error allocating memory.") 205 for k from 0 <= k < n: 206 _gamma[k] = gamma[k] 207 self._incorporate_permutation(_gamma, n) 208 sage_free(_gamma) Because sage_malloc is #defined to malloc in stdsage.h, I think there is a significant potential for memory leaks in code like this. Are we thinking correctly on this issue? Isn't this a huge problem? Lisandro Dalcin (author of mpi4py) came up with the following trick that, while more complicated, prevents memory leaks: cdef extern from "Python.h": object PyString_FromStringAndSize(char*,Py_ssize_t) char* PyString_AS_STRING(object) cdef inline object pyalloc_i(int size, int **i): if size < 0: size = 0 cdef Py_ssize_t n = size * sizeof(int) cdef object ob = PyString_FromStringAndSize(NULL, n) i[0] = PyString_AS_STRING(ob) return ob and now def foo(sequence): cdef int size = len(sequence), cdef int *buf = NULL cdef object tmp = pyalloc_i(size, &buf) This could probably be adapted into a malloc-like function. What do people think? 
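[Editor's sketch, not part of the original thread.] The ownership idea behind the trick above (let a Python object own the raw buffer, so reference counting releases it even when an exception propagates) can be illustrated in plain Python with ctypes. This is an analogue under stated assumptions, not Lisandro's code: it uses a ctypes array instead of a string object, and copy_to_int_buffer is a hypothetical name.

```python
import ctypes


def copy_to_int_buffer(sequence):
    """Copy a sequence of Python ints into a C int array owned by a Python object."""
    n = len(sequence)
    # (ctypes.c_int * n) is an array type; calling it allocates zero-filled
    # storage whose lifetime is tied to the returned Python object.
    buf = (ctypes.c_int * n)()
    for k in range(n):
        # A non-integer item raises TypeError here. No explicit free is
        # needed: buf is released when its last reference goes away.
        buf[k] = sequence[k]
    return buf


ok = copy_to_int_buffer([3, 1, 2])
print(list(ok))

try:
    copy_to_int_buffer([1, "not an int"])
except TypeError as exc:
    print("conversion failed, buffer reclaimed automatically:", exc)
```

The cost is the one raised elsewhere in the thread: the memory now comes from the interpreter's side rather than a plain malloc, which may matter for large allocations and for debugging tools.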
Thanks, Brian From lev at columbia.edu Thu Apr 10 18:52:10 2008 From: lev at columbia.edu (Lev Givon) Date: Thu, 10 Apr 2008 12:52:10 -0400 Subject: [Cython] Offtopic: Good Python IDEs? In-Reply-To: <47FE4418.40703@student.matnat.uio.no> References: <47FE4418.40703@student.matnat.uio.no> Message-ID: <20080410165210.GD8998@localhost.cc.columbia.edu> Received from Dag Sverre Seljebotn on Thu, Apr 10, 2008 at 12:45:12PM EDT: > This far I've been editing Python code with "gedit", and so I'm > wondering: What's your favorite? emacs? Eclipse? eric? > > I'd especially like stuff like > - Quick jumping to classes and up and down between class hierarchies on > overriden methods > - Refactoring tools > - Automatic import writing > - Completion... > > Though integrated debugging, test running etc. are "nice" too. > > Dag Sverre I don't know whether it is integrated into any IDEs, but support for the nice graphical Python debugger winpdb [1] has been added to the development version of ipython. L.G. [1] http://www.winpdb.org From wstein at gmail.com Thu Apr 10 18:57:04 2008 From: wstein at gmail.com (William Stein) Date: Thu, 10 Apr 2008 09:57:04 -0700 Subject: [Cython] [sage-devel] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> Message-ID: <85e81ba30804100957j547e4057k2576292522f09ab2@mail.gmail.com> On Thu, Apr 10, 2008 at 9:51 AM, Brian Granger wrote: > > Hi, > > (dual posted to sage and cython) > > A few of us (ipython and mpi4py devs) are wondering what the > best/safest way of allocating dynamic memory in a local scope > (method/function) is when using cython. An example would be if you > need an array of c ints that is locally scoped. > > The big question is how to make sure that the memory gets freed - even > if something goes wrong in the function/method. That is, you want to > prevent memory leaks. 
It looks like in sage, the > sage_malloc/sage_free functions are used for this purpose: > > from sage/graphs/graph_isom.pyx: > > 176 def incorporate_permutation(self, gamma): > 202 cdef int *_gamma = sage_malloc( n * sizeof(int) ) > 203 if not _gamma: > 204 raise MemoryError("Error allocating memory.") > 205 for k from 0 <= k < n: > 206 _gamma[k] = gamma[k] > 207 self._incorporate_permutation(_gamma, n) > 208 sage_free(_gamma) > > Because sage_malloc is #defined to malloc in stdsage.h, > I think there > is a significant potential for memory leaks in code like this. Are we > thinking correctly on this issue? Yes. In the above code one could easily cause a serious memory leak by input a gamma so that gamma[k], for some k with 0 <= k < n, results in an exception. > Isn't this a huge problem? Yes it's a problem. > Lisandro Dalcin (author of mpi4py) came up with the following trick > that, while more complicated, prevents memory leaks: > > cdef extern from "Python.h": > object PyString_FromStringAndSize(char*,Py_ssize_t) > char* PyString_AS_STRING(object) > > cdef inline object pyalloc_i(int size, int **i): > if size < 0: size = 0 > cdef Py_ssize_t n = size * sizeof(int) > cdef object ob = PyString_FromStringAndSize(NULL, n) > i[0] = PyString_AS_STRING(ob) > return ob > > and now > > def foo(sequence): > cdef int size = len(sequence), > cdef int *buf = NULL > cdef object tmp = pyalloc_i(size, &buf) > > This could probably be adapted into a malloc-like function. What do > people think? Could you explain what the point is? Is it that this is a trick so that Cython will correctly garbage collect the allocated memory, even if an exception occurs? 
-- William From michael.abshoff at googlemail.com Thu Apr 10 18:27:41 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Thu, 10 Apr 2008 18:27:41 +0200 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> Message-ID: <47FE3FFD.7030401@gmail.com> Brian Granger wrote: > Hi, Hi Brian, > (dual posted to sage and cython) > > A few of us (ipython and mpi4py devs) are wondering what the > best/safest way of allocating dynamic memory in a local scope > (method/function) is when using cython. An example would be if you > need an array of c ints that is locally scoped. > > The big question is how to make sure that the memory gets freed - even > if something goes wrong in the function/method. That is, you want to > prevent memory leaks. It looks like in sage, the > sage_malloc/sage_free functions are used for this purpose: They generally aren't used in most of the code. The idea for those functions is that in the future we can wrap other allocators like slab allocators. > from sage/graphs/graph_isom.pyx: > > 176 def incorporate_permutation(self, gamma): > 202 cdef int *_gamma = sage_malloc( n * sizeof(int) ) > 203 if not _gamma: > 204 raise MemoryError("Error allocating memory.") > 205 for k from 0 <= k < n: > 206 _gamma[k] = gamma[k] > 207 self._incorporate_permutation(_gamma, n) > 208 sage_free(_gamma) > > Because sage_malloc is #defined to malloc in stdsage.h, I think there > is a significant potential for memory leaks in code like this. Are we > thinking correctly on this issue? Isn't this a huge problem? Well, I don't see an advantage in using Python's allocator there. It is likely slower for large allocations and make debugging memory issues much more complicated since issues like pointer corruption is significantly harder to debug. 
> Lisandro Dalcin (author of mpi4py) came up with the following trick > that, while more complicated, prevents memory leaks: > > cdef extern from "Python.h": > object PyString_FromStringAndSize(char*,Py_ssize_t) > char* PyString_AS_STRING(object) > > cdef inline object pyalloc_i(int size, int **i): > if size < 0: size = 0 > cdef Py_ssize_t n = size * sizeof(int) > cdef object ob = PyString_FromStringAndSize(NULL, n) > i[0] = PyString_AS_STRING(ob) > return ob > > and now > > def foo(sequence): > cdef int size = len(sequence), > cdef int *buf = NULL > cdef object tmp = pyalloc_i(size, &buf) > > This could probably be adapted into a malloc-like function. What do > people think? We valgrind the complete test suite at least weekly and with ever increasing doctest coverage I don't see that we have a problem. Python's memory management has also some serious issues and I doubt it will offer any advantage for anything but loads of small allocs. And by switching to a slab allocator on our end will fix that problem. > Thanks, > > Brian Cheers, Michael > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From martin at martincmartin.com Thu Apr 10 19:03:26 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Thu, 10 Apr 2008 13:03:26 -0400 Subject: [Cython] Offtopic: Good Python IDEs? In-Reply-To: <47FE4418.40703@student.matnat.uio.no> References: <47FE4418.40703@student.matnat.uio.no> Message-ID: <47FE485E.7020006@martincmartin.com> For editing, eclipse and WingIDE are about the same, but WingIDE has a great debugger, far better than what Eclipse has. Dag Sverre Seljebotn wrote: > This far I've been editing Python code with "gedit", and so I'm > wondering: What's your favorite? emacs? Eclipse? eric? 
> > I'd especially like stuff like > - Quick jumping to classes and up and down between class hierarchies on > overriden methods > - Refactoring tools > - Automatic import writing > - Completion... > > Though integrated debugging, test running etc. are "nice" too. > > Dag Sverre > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From ellisonbg.net at gmail.com Thu Apr 10 19:04:06 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 11:04:06 -0600 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <85e81ba30804100957j547e4057k2576292522f09ab2@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <85e81ba30804100957j547e4057k2576292522f09ab2@mail.gmail.com> Message-ID: <6ce0ac130804101004j1b5cd58bx68653ac0343fb059@mail.gmail.com> > > Lisandro Dalcin (author of mpi4py) came up with the following trick > > that, while more complicated, prevents memory leaks: > > > > cdef extern from "Python.h": > > object PyString_FromStringAndSize(char*,Py_ssize_t) > > char* PyString_AS_STRING(object) > > > > cdef inline object pyalloc_i(int size, int **i): > > if size < 0: size = 0 > > cdef Py_ssize_t n = size * sizeof(int) > > cdef object ob = PyString_FromStringAndSize(NULL, n) > > i[0] = PyString_AS_STRING(ob) > > return ob > > > > and now > > > > def foo(sequence): > > cdef int size = len(sequence), > > cdef int *buf = NULL > > cdef object tmp = pyalloc_i(size, &buf) > > > > This could probably be adapted into a malloc-like function. What do > > people think? > > Could you explain what the point is? Is it that this is a trick so that > Cython will correctly garbage collect the allocated memory, even > if an exception occurs? Yes, that is the idea. 
By having a python object that knows about the memory, the garbage collection should prevent memory leaks if an exception occurs. I am not sure if Lisandro has proved that this is the case - he is just wondering how people typically handle this case. Seems like we are not alone in thinking this is a problem though. Brian > -- William > > --~--~---------~--~----~------------~-------~--~----~ > To post to this group, send email to sage-devel at googlegroups.com > To unsubscribe from this group, send email to sage-devel-unsubscribe at googlegroups.com > For more options, visit this group at http://groups.google.com/group/sage-devel > URLs: http://www.sagemath.org > -~----------~----~----~----~------~----~------~--~--- > > From wstein at gmail.com Thu Apr 10 19:05:23 2008 From: wstein at gmail.com (William Stein) Date: Thu, 10 Apr 2008 10:05:23 -0700 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE3FFD.7030401@gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> Message-ID: <85e81ba30804101005q60f8432exa0a525c10174b663@mail.gmail.com> On Thu, Apr 10, 2008 at 9:27 AM, Michael.Abshoff wrote: > > Brian Granger wrote: > > Hi, > > Hi Brian, > > > > (dual posted to sage and cython) > > > > A few of us (ipython and mpi4py devs) are wondering what the > > best/safest way of allocating dynamic memory in a local scope > > (method/function) is when using cython. An example would be if you > > need an array of c ints that is locally scoped. > > > > The big question is how to make sure that the memory gets freed - even > > if something goes wrong in the function/method. That is, you want to > > prevent memory leaks. It looks like in sage, the > > sage_malloc/sage_free functions are used for this purpose: > > They generally aren't used in most of the code. They should be. 
> The idea for those > functions is that in the future we can wrap other allocators like slab > allocators. > > > > from sage/graphs/graph_isom.pyx: > > > > 176 def incorporate_permutation(self, gamma): > > 202 cdef int *_gamma = sage_malloc( n * sizeof(int) ) > > 203 if not _gamma: > > 204 raise MemoryError("Error allocating memory.") > > 205 for k from 0 <= k < n: > > 206 _gamma[k] = gamma[k] > > 207 self._incorporate_permutation(_gamma, n) > > 208 sage_free(_gamma) > > > > Because sage_malloc is #defined to malloc in stdsage.h, I think there > > is a significant potential for memory leaks in code like this. Are we > > thinking correctly on this issue? Isn't this a huge problem? > > Well, I don't see an advantage in using Python's allocator there. It is > likely slower for large allocations and make debugging memory issues > much more complicated since issues like pointer corruption is > significantly harder to debug. Since our answers differ, I should clarify. The point is that by giving bad input a user could cause a memory leak. It's something our doctests probably wouldn't notice. > > > > Lisandro Dalcin (author of mpi4py) came up with the following trick > > that, while more complicated, prevents memory leaks: > > > > cdef extern from "Python.h": > > object PyString_FromStringAndSize(char*,Py_ssize_t) > > char* PyString_AS_STRING(object) > > > > cdef inline object pyalloc_i(int size, int **i): > > if size < 0: size = 0 > > cdef Py_ssize_t n = size * sizeof(int) > > cdef object ob = PyString_FromStringAndSize(NULL, n) > > i[0] = PyString_AS_STRING(ob) > > return ob > > > > and now > > > > def foo(sequence): > > cdef int size = len(sequence), > > cdef int *buf = NULL > > cdef object tmp = pyalloc_i(size, &buf) > > > > This could probably be adapted into a malloc-like function. What do > > people think? > > We valgrind the complete test suite at least weekly and with ever > increasing doctest coverage I don't see that we have a problem. 
Python's > memory management has also some serious issues and I doubt it will offer > any advantage for anything but loads of small allocs. And by switching > to a slab allocator on our end will fix that problem. I think the point Brian is making is that if you think about a certain class of code that is common in Sage/Numpy/Scipy, you can construct situations where exceptions are raised and code isn't freed. We're almost surely not actually testing these situations since they are "exceptional". But they do exist, and we've been planning to address them "some day". -- William Stein Associate Professor of Mathematics University of Washington http://wstein.org From wstein at gmail.com Thu Apr 10 19:07:14 2008 From: wstein at gmail.com (William Stein) Date: Thu, 10 Apr 2008 10:07:14 -0700 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804101004j1b5cd58bx68653ac0343fb059@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <85e81ba30804100957j547e4057k2576292522f09ab2@mail.gmail.com> <6ce0ac130804101004j1b5cd58bx68653ac0343fb059@mail.gmail.com> Message-ID: <85e81ba30804101007j65880919s7b1206e1f84136ac@mail.gmail.com> On Thu, Apr 10, 2008 at 10:04 AM, Brian Granger wrote: > > > > Lisandro Dalcin (author of mpi4py) came up with the following trick > > > that, while more complicated, prevents memory leaks: > > > > > > cdef extern from "Python.h": > > > object PyString_FromStringAndSize(char*,Py_ssize_t) > > > char* PyString_AS_STRING(object) > > > > > > cdef inline object pyalloc_i(int size, int **i): > > > if size < 0: size = 0 > > > cdef Py_ssize_t n = size * sizeof(int) > > > cdef object ob = PyString_FromStringAndSize(NULL, n) > > > i[0] = PyString_AS_STRING(ob) > > > return ob > > > > > > and now > > > > > > def foo(sequence): > > > cdef int size = len(sequence), > > > cdef int *buf = NULL > > > cdef object tmp = pyalloc_i(size, &buf) > > > > > > This could probably 
be adapted into a malloc-like function. What do > > > people think? > > > > Could you explain what the point is? Is it that this is a trick so that > > Cython will correctly garbage collect the allocated memory, even > > if an exception occurs? > > Yes, that is the idea. By having a python object that knows about the > memory, the garbage collection should prevent memory leaks if an > exception occurs. I am not sure if Lisandro has proved that this is > the case - he is just wondering how people typically handle this case. > > Seems like we are not alone in thinking this is a problem though. > It's I think a general problem not only in Python but in programming in general. The authors of http://www.flintlib.org/ -- a pure C library -- spent a lot of time worrying about this, just to I think decide that it's a really hard problem. That said, there is definitely a significant class of problems in the context of Cython that could be fixed if Lisandro's suggestion works. Thanks for posting it! William From dagss at student.matnat.uio.no Thu Apr 10 19:07:39 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 Apr 2008 19:07:39 +0200 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> Message-ID: <47FE495B.7050803@student.matnat.uio.no> (I'm not on the SAGE list so not dual-posting) > A few of us (ipython and mpi4py devs) are wondering what the > best/safest way of allocating dynamic memory in a local scope > (method/function) is when using cython. An example would be if you > need an array of c ints that is locally scoped. > Won't try-finally work here? 
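[Editor's sketch, not part of the original thread.] A pure-Python model of the try/finally pattern asked about here. CountingAllocator is a hypothetical stand-in for sage_malloc/sage_free that only counts live blocks, so the leak can be observed; in real Cython code the same shape would presumably put the free call in the finally clause.

```python
class CountingAllocator:
    """Hypothetical stand-in for a malloc/free pair that tracks live blocks."""

    def __init__(self):
        self.live = 0

    def malloc(self, n):
        self.live += 1
        return [0] * n

    def free(self, block):
        self.live -= 1


def copy_unprotected(alloc, gamma):
    """Mirrors the quoted loop: an exception while converting skips the free."""
    buf = alloc.malloc(len(gamma))
    for k in range(len(gamma)):
        buf[k] = int(gamma[k])  # may raise; the free below is then never reached
    alloc.free(buf)


def copy_protected(alloc, gamma):
    """Same work, but the free sits in a finally clause."""
    buf = alloc.malloc(len(gamma))
    try:
        for k in range(len(gamma)):
            buf[k] = int(gamma[k])
    finally:
        alloc.free(buf)  # runs on success and when an exception propagates


for copy in (copy_unprotected, copy_protected):
    alloc = CountingAllocator()
    try:
        copy(alloc, [1, 2, "bad"])
    except ValueError:
        pass
    print(copy.__name__, "live blocks after failure:", alloc.live)
```

The unprotected variant leaves one block outstanding after bad input, which is the kind of leak a doctest suite with only well-formed inputs would not exercise.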
Dag Sverre From ellisonbg.net at gmail.com Thu Apr 10 19:11:43 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 11:11:43 -0600 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE3FFD.7030401@gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> Message-ID: <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> On Thu, Apr 10, 2008 at 10:27 AM, Michael.Abshoff wrote: > Brian Granger wrote: > > > Hi, > > > > Hi Brian, > > > > > (dual posted to sage and cython) > > > > A few of us (ipython and mpi4py devs) are wondering what the > > best/safest way of allocating dynamic memory in a local scope > > (method/function) is when using cython. An example would be if you > > need an array of c ints that is locally scoped. > > > > The big question is how to make sure that the memory gets freed - even > > if something goes wrong in the function/method. That is, you want to > > prevent memory leaks. It looks like in sage, the > > sage_malloc/sage_free functions are used for this purpose: > > > > They generally aren't used in most of the code. The idea for those > functions is that in the future we can wrap other allocators like slab > allocators. Maybe so, but I am not very familiar with the sage codebase and I quickly found numerous examples of sage_malloc :) Also, from William's response it sounds like the idea is that sage code _would_ use these functions. Also, how else would you allocate dynamic memory?
> > > > from sage/graphs/graph_isom.pyx: > > > > 176 def incorporate_permutation(self, gamma): > > 202 cdef int *_gamma = sage_malloc( n * sizeof(int) ) > > 203 if not _gamma: > > 204 raise MemoryError("Error allocating memory.") > > 205 for k from 0 <= k < n: > > 206 _gamma[k] = gamma[k] > > 207 self._incorporate_permutation(_gamma, n) > > 208 sage_free(_gamma) > > > > Because sage_malloc is #defined to malloc in stdsage.h, I think there > > is a significant potential for memory leaks in code like this. Are we > > thinking correctly on this issue? Isn't this a huge problem? > > > > Well, I don't see an advantage in using Python's allocator there. It is > likely slower for large allocations and make debugging memory issues much > more complicated since issues like pointer corruption is significantly > harder to debug. I am not concerned about performance here, but rather memory leaks. I don't think it is a good idea to trade memory leaks for performance. > > > > > Lisandro Dalcin (author of mpi4py) came up with the following trick > > that, while more complicated, prevents memory leaks: > > > > cdef extern from "Python.h": > > object PyString_FromStringAndSize(char*,Py_ssize_t) > > char* PyString_AS_STRING(object) > > > > cdef inline object pyalloc_i(int size, int **i): > > if size < 0: size = 0 > > cdef Py_ssize_t n = size * sizeof(int) > > cdef object ob = PyString_FromStringAndSize(NULL, n) > > i[0] = PyString_AS_STRING(ob) > > return ob > > > > and now > > > > def foo(sequence): > > cdef int size = len(sequence), > > cdef int *buf = NULL > > cdef object tmp = pyalloc_i(size, &buf) > > > > This could probably be adapted into a malloc-like function. What do > > people think? > > > We valgrind the complete test suite at least weekly and with ever > increasing doctest coverage I don't see that we have a problem. Python's > memory management has also some serious issues and I doubt it will offer any > advantage for anything but loads of small allocs. 
And by switching to a slab > allocator on our end will fix that problem. But the test suite doesn't test for all of the odd input that users will feed to sage. These are the cases that will leak memory and there is no possible way to test for all of them. Also debugging memory leaks is super nasty. Compared to that pain, having a slightly slower memory allocator is not a big deal. Brian > > > Thanks, > > > > Brian > > > > Cheers, > > Michael > > > > _______________________________________________ > > Cython-dev mailing list > > Cython-dev at codespeak.net > > http://codespeak.net/mailman/listinfo/cython-dev > > > > > > From ellisonbg.net at gmail.com Thu Apr 10 19:12:45 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 11:12:45 -0600 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE495B.7050803@student.matnat.uio.no> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE495B.7050803@student.matnat.uio.no> Message-ID: <6ce0ac130804101012i76c7865epf2c209b27438ee63@mail.gmail.com> > (I'm not on the SAGE list so not dual-posting) > > > A few of us (ipython and mpi4py devs) are wondering what the > > best/safest way of allocating dynamic memory in a local scope > > (method/function) is when using cython. An example would be if you > > need an array of c ints that is locally scoped. > > > Won't try-finally work here? I don't see why it wouldn't. But typical usages of sage_malloc are not protected in that way.
> Dag Sverre > > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From ellisonbg.net at gmail.com Thu Apr 10 19:13:43 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 11:13:43 -0600 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <85e81ba30804101007j65880919s7b1206e1f84136ac@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <85e81ba30804100957j547e4057k2576292522f09ab2@mail.gmail.com> <6ce0ac130804101004j1b5cd58bx68653ac0343fb059@mail.gmail.com> <85e81ba30804101007j65880919s7b1206e1f84136ac@mail.gmail.com> Message-ID: <6ce0ac130804101013k51cea873g6d978fc931857f26@mail.gmail.com> > It's I think a general problem not only in Python but in programming > in general. The authors of http://www.flintlib.org/ -- a pure C library -- > spent a lot of time worrying about this, just to I think decide that it's > a really hard problem. Very true. > That said, there is definitely a significant class of problems in the context > of Cython that could be fixed if Lisandro's suggestion works. Thanks > for posting it! I will talk with Lisandro more about this and see if we can explore his trick further. 
Brian > > > William > > --~--~---------~--~----~------------~-------~--~----~ > To post to this group, send email to sage-devel at googlegroups.com > To unsubscribe from this group, send email to sage-devel-unsubscribe at googlegroups.com > For more options, visit this group at http://groups.google.com/group/sage-devel > URLs: http://www.sagemath.org > -~----------~----~----~----~------~----~------~--~--- > > From michael.abshoff at googlemail.com Thu Apr 10 19:02:40 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Thu, 10 Apr 2008 19:02:40 +0200 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> Message-ID: <47FE4830.6070907@gmail.com> Brian Granger wrote: > On Thu, Apr 10, 2008 at 10:27 AM, Michael.Abshoff > wrote: >> Brian Granger wrote: >> >>> Hi, >>> >> Hi Brian, >> >> Hi Brian, >> >>> (dual posted to sage and cython) >>> >>> A few of us (ipython and mpi4py devs) are wondering what the >>> best/safest way of allocating dynamic memory in a local scope >>> (method/function) is when using cython. An example would be if you >>> need an array of c ints that is locally scoped. >>> >>> The big question is how to make sure that the memory gets freed - even >>> if something goes wrong in the function/method. That is, you want to >>> prevent memory leaks. It looks like in sage, the >>> sage_malloc/sage_free functions are used for this purpose: >>> >> They generally aren't used in most of the code. The idea for those >> functions is that in the future we can wrap other allocators like slab >> allocators. 
> > Maybe so, but I am am not very familiar with the sage codebase and I > quickly found numerous examples of sage_malloc :) Also, from Williams > reponse it sounds like the idea is that sage code _would_ use these > functions. Also, how else would you allocate dynamic memory? Well, in the end you end up using sbrk() anyway, but I don't see what is wrong with malloc itself? sage_malloc was introduced a while back to make it possible to switch to a slab allocator like omalloc potentially to see if there is any benefit from it. >> >>> from sage/graphs/graph_isom.pyx: >>> >>> 176 def incorporate_permutation(self, gamma): >>> 202 cdef int *_gamma = sage_malloc( n * sizeof(int) ) >>> 203 if not _gamma: >>> 204 raise MemoryError("Error allocating memory.") >>> 205 for k from 0 <= k < n: >>> 206 _gamma[k] = gamma[k] >>> 207 self._incorporate_permutation(_gamma, n) >>> 208 sage_free(_gamma) >>> >>> Because sage_malloc is #defined to malloc in stdsage.h, I think there >>> is a significant potential for memory leaks in code like this. Are we >>> thinking correctly on this issue? Isn't this a huge problem? >>> >> Well, I don't see an advantage in using Python's allocator there. It is >> likely slower for large allocations and make debugging memory issues much >> more complicated since issues like pointer corruption is significantly >> harder to debug. > > I am not concerned about performance here, but rather memory leaks. I > don't think it is a good idea to trade memory leaks for performance. Absolutely not, but if you want to write exception safe extensions just write them in exception safe C++ using autoptr & friends. While you might not be too concerned about performance here the issue is still debuggability. If you ran standard Python under valgrind (after disabling pymalloc) you will get mabshoff at sage:/scratch/mabshoff/release-cycle/sage-3.0.alpha2/local/bin$ valgrind --tool=memcheck --leak-resolution=high ./python ==12347== Memcheck, a memory error detector. 
==12347== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al. ==12347== Using LibVEX rev 1812, a library for dynamic binary translation. ==12347== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP. ==12347== Using valgrind-3.4.0.SVN, a dynamic binary instrumentation framework. ==12347== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al. ==12347== For more details, rerun with: -v ==12347== Python 2.5.1 (r251:54863, Apr 6 2008, 21:59:15) [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> ==12347== ==12347== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 9 from 2) ==12347== malloc/free: in use at exit: 599,274 bytes in 2,441 blocks. ==12347== malloc/free: 12,890 allocs, 10,449 frees, 2,420,712 bytes allocated. ==12347== For counts of detected errors, rerun with: -v ==12347== searching for pointers to 2,441 not-freed blocks. ==12347== checked 998,864 bytes. ==12347== ==12347== LEAK SUMMARY: ==12347== definitely lost: 0 bytes in 0 blocks. ==12347== possibly lost: 15,736 bytes in 54 blocks. ==12347== still reachable: 583,538 bytes in 2,387 blocks. ==12347== suppressed: 0 bytes in 0 blocks. So we have roughly 2,400 blocks of memory that Python itself cannot properly deallocate due to problems with their own garbage collection. I will spare you the numbers for Sage (they are much worst due to still to be solved problems with Cython and extensions in general) and starting to poke for leaks in that pile of crap is not something I find appealing. Sure, once all mallocs in Sage are converted to sage_malloc one could switch and see what happens, but I can guarantee you that debugging some problem stemming from us doing something stupid with malloc compared to tracking it down inside Python is not a contest. Check out #1337 in our track to see such a case. And by the way: The python interpreter itself does leak some small bits of memory while running. 
So the above while it looks like a really good idea is far from a fool proof solution. >> >> >>> Lisandro Dalcin (author of mpi4py) came up with the following trick >>> that, while more complicated, prevents memory leaks: >>> >>> cdef extern from "Python.h": >>> object PyString_FromStringAndSize(char*,Py_ssize_t) >>> char* PyString_AS_STRING(object) >>> >>> cdef inline object pyalloc_i(int size, int **i): >>> if size < 0: size = 0 >>> cdef Py_ssize_t n = size * sizeof(int) >>> cdef object ob = PyString_FromStringAndSize(NULL, n) >>> i[0] = PyString_AS_STRING(ob) >>> return ob >>> >>> and now >>> >>> def foo(sequence): >>> cdef int size = len(sequence), >>> cdef int *buf = NULL >>> cdef object tmp = pyalloc_i(size, &buf) >>> >>> This could probably be adapted into a malloc-like function. What do >>> people think? >>> > >> We valgrind the complete test suite at least weekly and with ever >> increasing doctest coverage I don't see that we have a problem. Python's >> memory management has also some serious issues and I doubt it will offer any >> advantage for anything but loads of small allocs. And by switching to a slab >> allocator on our end will fix that problem. > > But, test test suite doesn't test for all of the odd input that users > will feed to sage. These are the cases that will leak memory and > there is not possible way to test for all of them. Also debugging > memory leaks is super nasty. Compared to that pain, having a slightly > slower memory allocator is not a big deal. Well, as long as you write code in C memory leaks when something goes wrong is something you have to live with. And python is far from perfect regarding memory management too IMHO as I point out above. Another issue is that once we have extensions that do work with threads we no longer can use Python's allocation since it isn't thread safe. 
Adding some more checks to the Sage codebase around allocations is something that ought to be done, but fixing potential memory leaks from garbage input is low on my personal list of priorities as long as we have real leaks to deal with. Feel free to try out the above and report back if it fixes issues and how much of an impact on performance it has. > Brian Cheers, Michael >>> Thanks, >>> >>> Brian >>> >> Cheers, >> >> Michael >> >> >>> _______________________________________________ >>> Cython-dev mailing list >>> Cython-dev at codespeak.net >>> http://codespeak.net/mailman/listinfo/cython-dev >>> >>> >> > From wstein at gmail.com Thu Apr 10 19:35:54 2008 From: wstein at gmail.com (William Stein) Date: Thu, 10 Apr 2008 10:35:54 -0700 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804101012i76c7865epf2c209b27438ee63@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE495B.7050803@student.matnat.uio.no> <6ce0ac130804101012i76c7865epf2c209b27438ee63@mail.gmail.com> Message-ID: <85e81ba30804101035u61c92ev551c1b256e733c7e@mail.gmail.com> On Thu, Apr 10, 2008 at 10:12 AM, Brian Granger wrote: > > (I'm not on the SAGE list so not dual-posting) > > > > > A few of us (ipython and mpi4py devs) are wondering what the > > > best/safest way of allocating dynamic memory in a local scope > > > (method/function) is when using cython. An example would be if you > > > need an array of c ints that is locally scoped. > > > > > Won't try-finally work here? > > I don't see why it wouldn't. But, typical usages of sage_malloc are > not protected in that way. Maybe they should be. Note by the way though that it's a bit tricky since in the finally block you have to know whether things went wrong and memory has to be freed or not.
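One common way around the bookkeeping worry with try-finally is to start every pointer out as NULL and, in the finally block, free only the ones that are no longer NULL. The sketch below models that in plain Python so it can be run without a compile step. It is not Sage or Cython code: `TrackingAllocator` is a hypothetical stand-in for malloc/free that only records which blocks are still live, `copy_permutation` is an invented example function, and `None` plays the role of a NULL pointer.

```python
class TrackingAllocator:
    """Hypothetical stand-in for malloc/free; records blocks not yet freed."""

    def __init__(self):
        self.live = set()
        self._next_handle = 1

    def malloc(self, nbytes):
        # Hand out an opaque handle instead of a real pointer.
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        # Raises KeyError on a double free, which makes mistakes visible.
        self.live.remove(handle)


def copy_permutation(alloc, gamma):
    """Validate gamma using two scratch buffers, freeing both on every path."""
    buf = None   # plays the role of a NULL pointer
    seen = None
    try:
        buf = alloc.malloc(4 * len(gamma))
        values = [int(g) for g in gamma]   # raises on garbage input
        seen = alloc.malloc(len(gamma))
        if sorted(values) != list(range(len(values))):
            raise ValueError("not a permutation of 0..n-1")
        return values
    finally:
        # Free whatever was actually allocated; there is no need to
        # know at which step the function failed.
        if seen is not None:
            alloc.free(seen)
        if buf is not None:
            alloc.free(buf)
```

If a buffer is meant to outlive the function on success, one option under the same scheme is to reset its variable to `None` just before returning, so the finally block skips it.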
-- William From wstein at gmail.com Thu Apr 10 19:40:31 2008 From: wstein at gmail.com (William Stein) Date: Thu, 10 Apr 2008 10:40:31 -0700 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE4830.6070907@gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> Message-ID: <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> On Thu, Apr 10, 2008 at 10:02 AM, Michael.Abshoff wrote: > > > But, test test suite doesn't test for all of the odd input that users > > will feed to sage. These are the cases that will leak memory and > > there is not possible way to test for all of them. Also debugging > > memory leaks is super nasty. Compared to that pain, having a slightly > > slower memory allocator is not a big deal. > > Well, as long as you write code in C memory leaks when something goes > wrong is something you have to live with. And python is far from perfect > regarding memory management too IMHO as I point out above. Another issue > is that once we have extensions that do work with threads we no longer > can use Python's allocation since it isn't thread safe. > > Adding some more checks to the Sage codebase around allocations is > something that ought to be done, but on the list of things to fix > potential memory leaks from garbage input is low on the list of my > personal priority as long as we have real leaks to deal with. Feel free > to try out the above and report back if it fixes issues and how much of > an impact on performance it has. > Just for the record, I think Brian isn't suggesting we do anything differently with Sage. He's writing lots of _new_ code using Cython for his distributed matrix arrays project, and ran into this problem, and thought -- surely the Sage folks have solved this. 
Then he looked at our code for "the solution" and noticed that we haven't. That said, this is definitely not the most important thing for *us* to worry about at this point. We have many more important problems to solve first. But I'm really glad Brian is raising this issue, etc. -- William From ellisonbg.net at gmail.com Thu Apr 10 19:54:58 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 11:54:58 -0600 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE4830.6070907@gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> Message-ID: <6ce0ac130804101054y3d5bdb73hcc54bb49e032edec@mail.gmail.com> > Well, in the end you end up using sbrk() anyway, but I don't see what is > wrong with malloc itself? sage_malloc was introduced a while back to > make it possible to switch to a slab allocator like omalloc potentially > to see if there is any benefit from it. And that makes sense. > Absolutely not, but if you want to write exception safe extensions just > write them in exception safe C++ using autoptr & friends. While you > might not be too concerned about performance here the issue is still > debuggability. If you ran standard Python under valgrind (after > disabling pymalloc) you will get I do think performance is important, but not at the expense of potential memory leaks. I don't have experience running valgrind with python, but from what I have gleaned from others, you need to run valgrind with valgrind-python.supp that is in the Misc directory of the Python source tree. Details are here: http://svn.python.org/projects/python/trunk/Misc/README.valgrind My impression from others is that the memory problems you are seeing here will go away if you use this .supp file. Not sure though. 
I do know that the python-devs use valgrind to detect real memory leaks and there is _no_ way that they actually have thousands of them. > mabshoff at sage:/scratch/mabshoff/release-cycle/sage-3.0.alpha2/local/bin$ > valgrind --tool=memcheck --leak-resolution=high ./python > ==12347== Memcheck, a memory error detector. > ==12347== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al. > ==12347== Using LibVEX rev 1812, a library for dynamic binary translation. > ==12347== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP. > ==12347== Using valgrind-3.4.0.SVN, a dynamic binary instrumentation > framework. > ==12347== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al. > ==12347== For more details, rerun with: -v > ==12347== > Python 2.5.1 (r251:54863, Apr 6 2008, 21:59:15) > [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> > ==12347== > ==12347== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 9 from 2) > ==12347== malloc/free: in use at exit: 599,274 bytes in 2,441 blocks. > ==12347== malloc/free: 12,890 allocs, 10,449 frees, 2,420,712 bytes > allocated. > ==12347== For counts of detected errors, rerun with: -v > ==12347== searching for pointers to 2,441 not-freed blocks. > ==12347== checked 998,864 bytes. > ==12347== > ==12347== LEAK SUMMARY: > ==12347== definitely lost: 0 bytes in 0 blocks. > ==12347== possibly lost: 15,736 bytes in 54 blocks. > ==12347== still reachable: 583,538 bytes in 2,387 blocks. > ==12347== suppressed: 0 bytes in 0 blocks. > > So we have roughly 2,400 blocks of memory that Python itself cannot > properly deallocate due to problems with their own garbage collection. See above. > I > will spare you the numbers for Sage (they are much worst due to still to > be solved problems with Cython and extensions in general) and starting > to poke for leaks in that pile of crap is not something I find > appealing. 
Sure, once all mallocs in Sage are converted to sage_malloc > one could switch and see what happens, but I can guarantee you that > debugging some problem stemming from us doing something stupid with > malloc compared to tracking it down inside Python is not a contest. > Check out #1337 in our track to see such a case. True, malloc is more straightforward in that sense. > And by the way: The python interpreter itself does leak some small bits > of memory while running. So the above while it looks like a really good > idea is far from a fool proof solution. Is your argument: there are already lots of memory leaks in python and sage so a few more is not a big deal? Brian
From ellisonbg.net at gmail.com Thu Apr 10 19:57:07 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 11:57:07 -0600 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com>
<85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> Message-ID: <6ce0ac130804101057m261149e3y487923d634ad4682@mail.gmail.com> > Just for the record, I think Brian isn't suggesting we do anything > differently with Sage. He's writing lots of _new_ code using > Cython for his distributed matrix arrays project, and ran into this problem, > and thought -- surely the Sage folks have solved this. Then he looked > at our code for "the solution" and noticed that we haven't. Very true. It is somewhat comforting to know that we are not crazy in wondering what the best solution to the problem is. We very well may punt and use malloc :) > That said, this is definitely not the most important thing for *us* > to worry about at this point. We have many more important > problems to solve first. Understandable. Us too. > But I'm really glad Brian is raising this > issue, etc. > -- William > > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From michael.abshoff at googlemail.com Thu Apr 10 19:33:55 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Thu, 10 Apr 2008 19:33:55 +0200 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> Message-ID: <47FE4F83.5070908@gmail.com> William Stein wrote: > On Thu, Apr 10, 2008 at 10:02 AM, Michael.Abshoff > wrote: >> > But, test test suite doesn't test for all of the odd input that users >> > will feed to sage. These are the cases that will leak memory and >> > there is not possible way to test for all of them. 
Also debugging >> > memory leaks is super nasty. Compared to that pain, having a slightly >> > slower memory allocator is not a big deal. >> >> Well, as long as you write code in C memory leaks when something goes >> wrong is something you have to live with. And python is far from perfect >> regarding memory management too IMHO as I point out above. Another issue >> is that once we have extensions that do work with threads we no longer >> can use Python's allocation since it isn't thread safe. >> >> Adding some more checks to the Sage codebase around allocations is >> something that ought to be done, but on the list of things to fix >> potential memory leaks from garbage input is low on the list of my >> personal priority as long as we have real leaks to deal with. Feel free >> to try out the above and report back if it fixes issues and how much of >> an impact on performance it has. >> > > Just for the record, I think Brian isn't suggesting we do anything > differently with Sage. He's writing lots of _new_ code using > Cython for his distributed matrix arrays project, and ran into this problem, > and thought -- surely the Sage folks have solved this. Then he looked > at our code for "the solution" and noticed that we haven't. Yes, without a doubt, but I am skeptical that that solution would work [Brian never claimed it did and actually raised that concern]. > That said, this is definitely not the most important thing for *us* > to worry about at this point. We have many more important > problems to solve first. But I'm really glad Brian is raising this > issue, etc. > Sure and it is certainly good to be discussed. I didn't want to be dismissive about the idea, it is just that I have been in the "debugging memory leaks in Cython extension" trenches for the last eight months and hence I do not trust python or its memory management at all any more. 
And having been burned over and over again has left me the way I am ;) > -- William Cheers, Michael From ellisonbg.net at gmail.com Thu Apr 10 20:14:44 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 12:14:44 -0600 Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE4F83.5070908@gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> <47FE4F83.5070908@gmail.com> Message-ID: <6ce0ac130804101114s38ce52e2xe52c7c85db36a4bc@mail.gmail.com> > Sure and it is certainly good to be discussed. I didn't want to be > dismissive about the idea, it is just that I have been in the "debugging > memory leaks in Cython extension" trenches for the last eight months and > hence I do not trust python or its memory management at all any more. > And having been burned over and over again has left me the way I am ;) Your comments make more sense in this light. Sounds painful :) Just out of curiosity - are the problems with cython itself or with how people are using/misusing it?
Brian > > -- William > > Cheers, > > Michael > > > > > _______________________________________________ > > Cython-dev mailing list > > Cython-dev at codespeak.net > > http://codespeak.net/mailman/listinfo/cython-dev > > > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From michael.abshoff at googlemail.com Thu Apr 10 19:48:57 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Thu, 10 Apr 2008 19:48:57 +0200 Subject: [Cython] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804101054y3d5bdb73hcc54bb49e032edec@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> <6ce0ac130804101054y3d5bdb73hcc54bb49e032edec@mail.gmail.com> Message-ID: <47FE5309.2060803@gmail.com> Brian Granger wrote: Hi Brian, >> Well, in the end you end up using sbrk() anyway, but I don't see what is >> wrong with malloc itself? sage_malloc was introduced a while back to >> make it possible to switch to a slab allocator like omalloc potentially >> to see if there is any benefit from it. > > And that makes sense. > > >> Absolutely not, but if you want to write exception safe extensions just >> write them in exception safe C++ using autoptr & friends. While you >> might not be too concerned about performance here the issue is still >> debuggability. If you ran standard Python under valgrind (after >> disabling pymalloc) you will get > > I do think performance is important, but not at the expense of > potential memory leaks. Sure, we want both. > I don't have experience running valgrind with > python, but from what I have gleaned from others, you need to run > valgrind with valgrind-python.supp that is in the Misc directory of > the Python source tree. 
Details are here: > > http://svn.python.org/projects/python/trunk/Misc/README.valgrind > > My impression from others is that the memory problems you are seeing > here will go away if you use this .supp file. Not sure though. No, they are not. Suppressing still reachable memory doesn't make the problem go away. It is just a cosmetic solution and hides real bugs. > I do > know that the python-devs use valgrind to detect real memory leaks and > there is _no_ way that they actually have thousands of them. Well, those aren't technically leaks, but memory that is not properly deallocated at exit. The amount is more or less constant independent on how long you run a python session. The amount usually grows once you import more modules and it is a bug in my book if you do not properly dealloc all memory and let the heap tear down at the program exit take care of it. What happens is that if you do not free a reference for some piece of memory and you do that repeatedly in your code you end up with a lot of stale memory chunks that all get reaped at exit as still reachable. And that is a very real problem if you chose to ignore those chunks since the system frees them at python's exit anyway. I have found a bug like that for example in Singular where the slab allocator did hide the problem, so this is a real issue. >> mabshoff at sage:/scratch/mabshoff/release-cycle/sage-3.0.alpha2/local/bin$ >> valgrind --tool=memcheck --leak-resolution=high ./python >> ==12347== Memcheck, a memory error detector. >> ==12347== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al. >> ==12347== Using LibVEX rev 1812, a library for dynamic binary translation. >> ==12347== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP. >> ==12347== Using valgrind-3.4.0.SVN, a dynamic binary instrumentation >> framework. >> ==12347== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al. 
>> ==12347== For more details, rerun with: -v >> ==12347== >> Python 2.5.1 (r251:54863, Apr 6 2008, 21:59:15) >> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> >> ==12347== >> ==12347== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 9 from 2) >> ==12347== malloc/free: in use at exit: 599,274 bytes in 2,441 blocks. >> ==12347== malloc/free: 12,890 allocs, 10,449 frees, 2,420,712 bytes >> allocated. >> ==12347== For counts of detected errors, rerun with: -v >> ==12347== searching for pointers to 2,441 not-freed blocks. >> ==12347== checked 998,864 bytes. >> ==12347== >> ==12347== LEAK SUMMARY: >> ==12347== definitely lost: 0 bytes in 0 blocks. >> ==12347== possibly lost: 15,736 bytes in 54 blocks. >> ==12347== still reachable: 583,538 bytes in 2,387 blocks. >> ==12347== suppressed: 0 bytes in 0 blocks. >> >> So we have roughly 2,400 blocks of memory that Python itself cannot >> properly deallocate due to problems with their own garbage collection. > > See above. > >> I >> will spare you the numbers for Sage (they are much worst due to still to >> be solved problems with Cython and extensions in general) and starting >> to poke for leaks in that pile of crap is not something I find >> appealing. Sure, once all mallocs in Sage are converted to sage_malloc >> one could switch and see what happens, but I can guarantee you that >> debugging some problem stemming from us doing something stupid with >> malloc compared to tracking it down inside Python is not a contest. >> Check out #1337 in our track to see such a case. > > True, malloc is more straightforward in that sense. > >> And by the way: The python interpreter itself does leak some small bits >> of memory while running. So the above while it looks like a really good >> idea is far from a fool proof solution. 
> > Is your argument: there are already lots of memory leaks in python and > sage so a few more is not a big deal? No: My argument is that the solution you suggested is not something that will work in the general case, but obfuscate real problems at the expense of some corner cases. As I mentioned in another email: We must do more input checking to avoid memory leaks, but on a C level that is the price you pay. Your needs might be different than Sage's and if from your perspective the [potential] performance penalty and also the [in my eyes] much higher debugging complexity are worth it I would be curious to hear how it works out. We do take memory leaks very, very seriously and have found numerous issues in Sage code as well as the external libraries. And if you look at mathematical code these days those leaks that cause trouble are real leaks in the code and not in the corner cases. Once all of those are wiped out we can go after the next problem. > Brian > Cheers, Michael From robertwb at math.washington.edu Thu Apr 10 20:23:37 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 10 Apr 2008 11:23:37 -0700 Subject: [Cython] [sage-devel] Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> Message-ID: <8EB835A2-CB78-4918-A61D-79CBB880EE8E@math.washington.edu> On Apr 10, 2008, at 9:51 AM, Brian Granger wrote: > > Hi, > > (dual posted to sage and cython) > > A few of us (ipython and mpi4py devs) are wondering what the > best/safest way of allocating dynamic memory in a local scope > (method/function) is when using cython. An example would be if you > need an array of c ints that is locally scoped. > > The big question is how to make sure that the memory gets freed - even > if something goes wrong in the function/method. That is, you want to > prevent memory leaks. 
It looks like in sage, the > sage_malloc/sage_free functions are used for this purpose: > > from sage/graphs/graph_isom.pyx: > > 176 def incorporate_permutation(self, gamma): > 202 cdef int *_gamma = sage_malloc( n * sizeof > (int) ) > 203 if not _gamma: > 204 raise MemoryError("Error allocating memory.") > 205 for k from 0 <= k < n: > 206 _gamma[k] = gamma[k] > 207 self._incorporate_permutation(_gamma, n) > 208 sage_free(_gamma) > > Because sage_malloc is #defined to malloc in stdsage.h, I think there > is a significant potential for memory leaks in code like this. Are we > thinking correctly on this issue? Isn't this a huge problem? > > Lisandro Dalcin (author of mpi4py) came up with the following trick > that, while more complicated, prevents memory leaks: > > cdef extern from "Python.h": > object PyString_FromStringAndSize(char*,Py_ssize_t) > char* PyString_AS_STRING(object) > > cdef inline object pyalloc_i(int size, int **i): > if size < 0: size = 0 > cdef Py_ssize_t n = size * sizeof(int) > cdef object ob = PyString_FromStringAndSize(NULL, n) > i[0] = PyString_AS_STRING(ob) > return ob > > and now > > def foo(sequence): > cdef int size = len(sequence), > cdef int *buf = NULL > cdef object tmp = pyalloc_i(size, &buf) > > This could probably be adapted into a malloc-like function. What do > people think? This is actually the same solution that popped into my mind when I first read your question, and I think it is a very good one. Using string objects is particularly interesting because the they are highly optimized (for example, the buffer allocation is done as part of the object allocation). I would probably have it take a void** rather than make an allocator specific to ints. If people like this idea, I could add such a function to Sage. Maybe it would even be worthwhile adding it to Cython (in one of the included header files). Alternatively, using a try-finally will work. 
If the buffers all are set to start as NULL then one just frees the non-NULL buffers at the end (so one doesn't have to keep track of where in the procedure things went wrong). - Robert From mmartin at itasoftware.com Wed Apr 9 17:21:07 2008 From: mmartin at itasoftware.com (Martin C. Martin) Date: Wed, 09 Apr 2008 11:21:07 -0400 Subject: [Cython] Another potential optimization possible with transforms Message-ID: <47FCDEE3.2080107@itasoftware.com> Hi, There's a potential optimization I mentioned on the Lisp inspired transforms page, where you could reorder bitfields in order to pack them most efficiently. Eerily, someone at my job just committed something that did just that. We have a custom defstruct, called defstruct-bv, which allows you to specify bit fields. (Lisp doesn't come with bitfields.) Here's the checkin message: Log: At compile time, automatically optimize the layout of bits in a defstruct-bv definition to minimize the number of words used by the resulting struct. This is an instance of the bin-packing problem, which is NP-hard. Fortunately, it has a good heuristic solution, first-fit decreasing, which gets very close to optimal answers. I just implemented that as a wrapper around the existing defstruct-bv, which I renamed defstruct-bv-internal. There is no longer any reason to worry about the order of bit field definitions within a lisp struct. If you have a struct that you don't want this to happen to (perhaps because you have a performance consideration for the layout), add (:optimize-p NIL) as an option to defstruct-bv. I was somewhat disappointed with the results: most of our structs were already laid out as optimally as this algorithm is able to achieve. The single exception was the faring-atom struct, which got 16 bytes smaller. 
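The first-fit decreasing heuristic named in the check-in message can be sketched roughly as follows. This is not the defstruct-bv implementation, which is not shown in the message: the function name `pack_bitfields`, the `(name, width)` tuple format, the 32-bit default word size, and the rule that a field never straddles a word boundary are all assumptions made for illustration.

```python
def pack_bitfields(fields, word_bits=32):
    """Assign bit fields to words using first-fit decreasing.

    fields is a list of (name, width_in_bits) pairs. The result is a
    list of words, each a list of the (name, width) pairs placed in it.
    """
    words = []  # each entry is [free_bits, members]
    # "Decreasing": consider the widest fields first.
    for name, width in sorted(fields, key=lambda f: f[1], reverse=True):
        if width > word_bits:
            raise ValueError("field %r does not fit in one word" % (name,))
        # "First fit": use the first word that still has room.
        for word in words:
            if word[0] >= width:
                word[0] -= width
                word[1].append((name, width))
                break
        else:
            # No existing word has room, so open a new one.
            words.append([word_bits - width, [(name, width)]])
    return [members for _free, members in words]
```

For example, fields of widths 12, 20, 12 and 20 declared in that order need three 32-bit words if they are laid out in declaration order without straddling, while the function above places them in two.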
Best,
Martin

From michael.abshoff at googlemail.com Thu Apr 10 20:10:12 2008
From: michael.abshoff at googlemail.com (Michael.Abshoff)
Date: Thu, 10 Apr 2008 20:10:12 +0200
Subject: [Cython] [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere)
In-Reply-To: <6ce0ac130804101114s38ce52e2xe52c7c85db36a4bc@mail.gmail.com>
References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> <47FE4F83.5070908@gmail.com> <6ce0ac130804101114s38ce52e2xe52c7c85db36a4bc@mail.gmail.com>
Message-ID: <47FE5804.1000409@gmail.com>

Brian Granger wrote:

Hi Brian,

>> Sure and it is certainly good to be discussed. I didn't want to be
>> dismissive about the idea, it is just that I have been in the "debugging
>> memory leaks in Cython extension" trenches for the last eight months and
>> hence I do not trust python or its memory management at all any more.
>> And having been burned over and over again has left me the way I am ;)
>
> Your comments make more sense in this light. Sounds painful :)
>
> Just our of curiosity - are the problem with cython itself or how
> people are using/misusing it?

It is a general problem when writing extensions and Cython [via code written by Robert Bradshaw] has started to add code to deal with the situation. But the deallocation code can cause trouble when extensions are carefully written (i.e. interdependencies) and I hope that during Dev1 I will have time to delve into this.

If I had time I would lock myself in a room with the Python code for two weeks and try to figure this out. So far that hasn't happened due to lack of time.
Maybe I need to go on "vacation" for two weeks where due to unforeseen circumstances I cannot be reached ;) > Brian > Cheers, Michael From robertwb at math.washington.edu Thu Apr 10 20:47:06 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 10 Apr 2008 11:47:06 -0700 Subject: [Cython] [sage-devel] Re: [sage-devel] Re: Locally scoped dynamic memory (in SAGE and elsewhere) In-Reply-To: <47FE5804.1000409@gmail.com> References: <6ce0ac130804100951y1d5dd66wf9ec8e23f32e7720@mail.gmail.com> <47FE3FFD.7030401@gmail.com> <6ce0ac130804101011q5b14eea2sd9ef1f954edd17db@mail.gmail.com> <47FE4830.6070907@gmail.com> <85e81ba30804101040r1ec4412al1f3a66812cbe18ae@mail.gmail.com> <47FE4F83.5070908@gmail.com> <6ce0ac130804101114s38ce52e2xe52c7c85db36a4bc@mail.gmail.com> <47FE5804.1000409@gmail.com> Message-ID: <9C4B8C1E-EB48-4B8B-84AA-7BD4988F879E@math.washington.edu> On Apr 10, 2008, at 11:10 AM, Michael.Abshoff wrote: > > Brian Granger wrote: > > Hi Brian, > >>> Sure and it is certainly good to be discussed. I didn't want to be >>> dismissive about the idea, it is just that I have been in the >>> "debugging >>> memory leaks in Cython extension" trenches for the last eight >>> months and >>> hence I do not trust python or its memory management at all any >>> more. >>> And having been burned over and over again has left me the way I >>> am ;) >> >> Your comments make more sense in this light. Sounds painful :) >> >> Just our of curiosity - are the problem with cython itself or how >> people are using/misusing it? > > It is a general problem when writing extensions and Cython [via code > written by Robert Bradshaw] has started to add code to deal with the > situation. But the deallocation code can cause trouble when extensions > are carefully written (i.e. interdependencies) and I hope that during > Dev1 I will have time to delve into this. 
To rephrase the problem, as the Python interpreter and environment get torn down, it becomes less and less safe to run the code invoked by deallocating objects. Most people don't worry about this because the whole process is about to be terminated, releasing all requested memory, but it does produce noise if one is trying to do memory profiling (e.g. with valgrind). - Robert From wstein at gmail.com Fri Apr 11 01:31:53 2008 From: wstein at gmail.com (William Stein) Date: Thu, 10 Apr 2008 16:31:53 -0700 Subject: [Cython] Introductory Cython talk Message-ID: <85e81ba30804101631p728ca5e3j3e8235781c0abd0d@mail.gmail.com> Hi, Robert Bradshaw gave a very introductory talk about Cython yesterday to my undergraduate class, which I video'd and uploaded to google video here: http://wiki.wstein.org/2008/480a/schedule/2008-04-09 (The aspect ratio is wrong... sigh, but otherwise the video works fine.) -- William Stein Associate Professor of Mathematics University of Washington http://wstein.org From ellisonbg.net at gmail.com Fri Apr 11 05:24:19 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 10 Apr 2008 21:24:19 -0600 Subject: [Cython] Proposal: idea for automatic management of dynamic memory Message-ID: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> Hi all, After todays discussion about the best way to allocate dynamic memory in a cython method or function, Lisandro Dalcin and I discussed other ways of making a cleaner interface for this that doesn't have memory leaks. The purpose of this email is to summarize Lisandro's idea for how cython could handle this in a very nice manner. Background ----------------- In a function it is often necessary to allocate dynamic memory. It is desirable to have something that it 1) fast and 2) doesn't lead to memory leaks if something goes wrong and 3) is easy to debug. 
Proposal
-------------

The main idea (Lisandro came up with this) is to build these capabilities into cython itself by introducing the following syntax:

cdef foo(size):
    cdef double a[size]  # This is like the dynamic arrays in C99
    # do stuff with a, but don't worry about deallocating the memory!!!

Under the hood cython would do the following:

1) It would use malloc to allocate the actual memory and cast it to the correct type.

double *a = malloc(sizeof(double)*size);

2) It would create a private copy of the pointer to the memory location that could be used to free the memory later. It is important to make a copy so the user can do pointer arithmetic and still have the free work properly.

double *private_ptr = a;

3) The trick is to make sure that free(private_ptr) is called when the function's scope is finished. How can this be done? One way would be to have a private/hidden python object (it would have to be a C extension type) that 1) holds private_ptr as an attribute and 2) calls free(private_ptr) when it is garbage collected.

The idea is that cython itself could add the extra code to create the private pointer and the hidden python object. The user could then simply use the array in the local scope and not have to worry about memory leaks. It would also be fast and easy to debug as malloc would be used. There would be a very small overhead introduced due to the creation of the hidden python object, but it should be minimal.

One other point. Lisandro thought that python's allocator _should_ be used. Here is what he said about this:

> Anyway, I want to point you some fact:
> * Python allocator is really fast, cleverly implemented.
> * If you request more than 256 bytes, python allocator actually calls malloc !!!
> * so the whole point of the python allocator is optimize memory allocation for small objects
>
> So I do not really think that using the python allocator is a bad
> thing.
> I believe it is actually a good thing: you avoid system
> malloc/free for allocating small arrays.

So possibly this could be tried instead of using malloc in the implementation of this. But malloc (or some other allocator) could be used.

But...neither Lisandro nor I are familiar with cython's implementation, and thus we are probably not the ones to implement this. We at least wanted to run this by the list to see what people think. This would seem to solve the potential problems created in sage by the usage of sage_malloc. Maybe someone would like the idea and want to implement it in cython ;-)

cheers,

Brian

From michael.abshoff at googlemail.com Fri Apr 11 05:25:32 2008
From: michael.abshoff at googlemail.com (Michael.Abshoff)
Date: Fri, 11 Apr 2008 05:25:32 +0200
Subject: [Cython] Proposal: idea for automatic management of dynamic memory
In-Reply-To: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com>
References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com>
Message-ID: <47FEDA2C.7050009@gmail.com>

Brian Granger wrote:
> Hi all,

Hi,

> After todays discussion about the best way to allocate dynamic memory
> in a cython method or function, Lisandro Dalcin and I discussed other
> ways of making a cleaner interface for this that doesn't have memory
> leaks. The purpose of this email is to summarize Lisandro's idea for
> how cython could handle this in a very nice manner.
>
> Background
> -----------------
>
> In a function it is often necessary to allocate dynamic memory. It is
> desirable to have something that it 1) fast and 2) doesn't lead to
> memory leaks if something goes wrong and 3) is easy to debug.
>
> Proposal
> -------------
>
> The main idea (Lisandro came up with this) is to build these
> capabilities into cython itself by introducing the following syntax:
>
> cdef foo(size):
>     cdef double a[size]  # This is like the dynamic arrays in C99
>     # do stuff with a, but don't worry about deallocating the memory!!!
> > Underneath the hood cython would do the following: > > 1) It would use malloc to allocate the actual memory and cast it to > the correct type. > > double *a = malloc(sizeof(double)*size); > > 2) It would create a private copy of the pointer to the memory > location that could be used to free the memory later. It is important > to make a copy so the user can do pointer arith and still have the > free work properly. > > double *private_ptr = a; > > 3) The trick is to make sure that free(private_ptr) is called when the > function's scope is finished. How can this be done. One way would be > to have a private/hidden python object (it would have to be a C > extension type) that 1) hold's private_ptr as an attribute and 2) > calls free(private_ptr) when it is garbage collected. > > The idea is that cython itself could add the extra code to create the > private pointer and the hidden python object. The user could then > simply use the array in the local scope and not have to worry about > memory leaks. If would also be fast and easy to debug as malloc would > be used. There would be a very small overhead introduced due to the > creation of the hidden python object, but it should be minimal. > > One other point. Lisandro thought that python's allocator _should_ be > used. Here was what he said about this: > >> Anyway, I want to point you some fact: >> * Python allocator is really fast, cleverly implemented. >> * If you request more than 256 bytes, python allocator actually calls malloc !!! >> * so the whole point of the python allocator is optimize memory allocation for small objects My criticism about performance was related to turn the allocated memory into Python objects. As you point out for small allocs python is quite efficient. I have seen figures of a four time speedup thrown around, but I have never verified them, so I would be quite distrustful of the figures. There certainly is a speed improvement. 
>> So I do not really think that using the python allocator is a bad >> thing. I believe it is actually a good thing: you avoid system >> malloc/free for allocating small arrays. It can also be a question of heap fragmentation, so using a slab allocator in certain areas can be good. But the general purpose Cython code is unlikely to benefit from this. But at least offering the capability to wrap the allocation so that one could replace the default allocator would be a nice feature. > So possibly this could be tried instead of using malloc in the > implementation of this. But malloc (or some other allocator) could be > used. > > But...neither Lisandro or I are familiar with cython's implementation, > and thus we are probably not the ones to implement this. We at least > wanted to run this by the list to see what people think. This would > seem to solve the potential problems created in sage by the usage of > sage_malloc. Maybe someone would like the idea and want to implement > it in cython ;-) I like the idea a lot, but since I don't really double in Cython I won't be the one implementing them. > cheers, > > Brian Cheers, Michael > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From stefan_ml at behnel.de Fri Apr 11 07:55:17 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 07:55:17 +0200 Subject: [Cython] Cython circular cdef import patch In-Reply-To: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> References: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com> Message-ID: <47FEFD45.3080809@behnel.de> Hi again, Gary Furnish wrote: > Finally it splits module initialization into two phases: one that initiates > types and handles imports, and another that executes python commands at the > global namespace level. If the goal is to clean up the module initialisation code, then two functions are not enough. 
IMHO, clean module init code consists of one top-level init function that calls separate functions to 1) create the module, 2) intern constants, 3) set up global names, 4) initialise the C-API, 5) handle cimports, 6) set up types, 7) execute module-level code. (Don't know if this is a complete list). Can you provide a patch that does that? Stefan From stefan_ml at behnel.de Fri Apr 11 10:12:33 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 10:12:33 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> Message-ID: <47FF1D71.1070500@behnel.de> Hi, Brian Granger wrote: > In a function it is often necessary to allocate dynamic memory. It is > desirable to have something that it 1) fast and 2) doesn't lead to > memory leaks if something goes wrong and 3) is easy to debug. > > Proposal > ------------- > > The main idea (Lisandro came up with this) is to build these > capabilities into cython itself by introducing the following syntax: > > cdef foo(size): > cdef double a[size] # This is like the dynamic arrays in C99 > # do stuff with a, but don't worry about deallocating the memory!!! I actually like the syntax as it is rather clear that this is memory local to the function. It's just like having it allocated on the stack, with the exception that it depends on run-time state. So the automatic deallocation at function exit is not unexpected. Regarding PyMalloc, I think it's a good idea to use it, but you have to take care in "nogil" functions. They require plain malloc()/free() calls. 
Stefan From dagss at student.matnat.uio.no Fri Apr 11 10:16:04 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 10:16:04 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> Message-ID: <47FF1E44.6040202@student.matnat.uio.no> > The main idea (Lisandro came up with this) is to build these > capabilities into cython itself by introducing the following syntax: > > cdef foo(size): > cdef double a[size] # This is like the dynamic arrays in C99 > # do stuff with a, but don't worry about deallocating the memory!!! > It sure looks like it would be convenient; but I'll take the role as an advocate against it :-) 1) Having "cdef double a[size]" allocate anything is very against the Python syntax -- nothing is ever allocated in Python without a function call and an assignment (same as in Java). To somebody who comes from a strict Python background and not C, it really looks like only a variable type declaration, not object allocation. In Cython, this Python way of working is mostly kept, and this would be a step *away* from Python in terms of the "feel" of the language. 2) This kind of duplicates the behaviour of the "with" keyword (which is not present in Cython today (either), but definitely is a goal, and is present in current Python). However, using the with keyword would be a bit less convenient, given function template support and with keyword support in Cython it could look something like this: with carray(double, size) as buf: cdef double* a = buf.data # do stuff with a If you want a different memory allocator, simply use a different function than carray... 3) Long-term Python users with very little C experience might end up doing something like cdef double a[size] # fill in a return a (Or, some more complex variant that we can't emit a warning for). 
With the "with" keyword however, Python users will *know* not to return a. (*) This is assuming a new carray template function as well, and so assumes even more Cython development. One might have to type "a" as well but with type inference it might look something like the above. 3) Just an overall comment: I personally think NumPy arrays are excellent for this. I'd have no problems personally with using a NumPy array only in order to allocate memory and then pass that memory on to a C library for instance. (The problem is, I suppose, having to depend on the NumPy library...though investing effort in creating a garbage collected array type when NumPy already has that seems too much like reinventing the wheel to me.) This will become more convenient than today if Cython grows better NumPy support. > 3) The trick is to make sure that free(private_ptr) is called when the > function's scope is finished. How can this be done. One way would be > to have a private/hidden python object (it would have to be a C > extension type) that 1) hold's private_ptr as an attribute and 2) > calls free(private_ptr) when it is garbage collected. > This is very trivially implemented by having Cython automatically wrap the function with try/finally prior to generating C code. Not going to be a problem at all. (But it can be done the way you say as well...) Dag Sverre From dagss at student.matnat.uio.no Fri Apr 11 10:17:57 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 10:17:57 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF1E44.6040202@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> Message-ID: <47FF1EB5.6000206@student.matnat.uio.no> > (*) This is assuming a new carray template function as well, and so > assumes even more Cython development. 
One might have to type "a" as well > but with type inference it might look something like the above. > Sorry, ignore this paragraph, artifact of email rewrite... Dag Sverre From stefan_ml at behnel.de Fri Apr 11 10:27:24 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 10:27:24 +0200 Subject: [Cython] Offtopic: Good Python IDEs? In-Reply-To: <47FE4418.40703@student.matnat.uio.no> References: <47FE4418.40703@student.matnat.uio.no> Message-ID: <47FF20EC.4040803@behnel.de> Hi, Dag Sverre Seljebotn wrote: > This far I've been editing Python code with "gedit", and so I'm > wondering: What's your favorite? emacs? Eclipse? eric? I exclusively use emacs for Python/Cython. It has a great Python major mode and the Cython mode is just as fine. It also has support for bicycle repair man (for refactorings), though I never really needed that. For debugging, I mostly use print and unit tests for Python and print, tests and valgrind for Cython, so I can't really comment on the debugging environments (which actually *are* available for emacs). Stefan From dagss at student.matnat.uio.no Fri Apr 11 10:29:05 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 10:29:05 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF1E44.6040202@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> Message-ID: <47FF2151.3060304@student.matnat.uio.no> > 3) Just an overall comment: I personally think NumPy arrays are > excellent for this. I'd have no problems personally with using a NumPy > array only in order to allocate memory and then pass that memory on to a > C library for instance. (The problem is, I suppose, having to depend on > the NumPy library...though investing effort in creating a garbage > collected array type when NumPy already has that seems too much like > reinventing the wheel to me.) 
This will become more convenient than > today if Cython grows better NumPy support. > One more thought: In Python 3000, the "buffer" interface is going to put a standard on array objects for buffer interchange between any Python library: http://www.python.org/dev/peps/pep-3118/ Introducing a convenient syntax for C arrays (or our own, Invented Here, garbage collected array type) would mean lots of code incompatible with this API is written. (Which might be fine in most cases; but I'd rather not have to _think_ too much when writing Cython code, ie "do I need this type of array or this type", I'd rather just always use one array type that does it all.) Having the de facto array for Cython code being NumPy arrays conveniently solves this problem, as NumPy will implement that API. One could add through convenient wrapper functions and typedefs etc. Remember, having functionality available doesn't automatically mean "heavy". Dag Sverre From stefan_ml at behnel.de Fri Apr 11 11:13:24 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 11:13:24 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF1E44.6040202@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> Message-ID: <47FF2BB4.6050209@behnel.de> Hi, just a couple of additional comments here. I do see the advantage of the "with" proposal, but that would require us to have a couple of keywords for things like "carray" that the compiler would have to recognise. Dag Sverre Seljebotn wrote: > If you want a different memory allocator, simply use a different > function than carray... If you want a different allocator than used by the proposed syntax, don't use the syntax. IMHO, it's perfectly ok if syntactic sugar helps in most cases but not all. 
> 3) Long-term Python users with very little C experience might end up > doing something like > > cdef double a[size] > # fill in a > return a This is a problem that could be caught in the compiler. Returning a local non-scalar/non-pointer variable could just be forbidden. And the remaining case where you return an explicitly created pointer to the same thing is just stupidity. > 3) Just an overall comment: I personally think NumPy arrays are > excellent for this. I'd have no problems personally with using a NumPy > array only in order to allocate memory and then pass that memory on to a > C library for instance. (The problem is, I suppose, having to depend on > the NumPy library...though investing effort in creating a garbage > collected array type when NumPy already has that seems too much like > reinventing the wheel to me.) This will become more convenient than > today if Cython grows better NumPy support. Requiring NumPy for what the proposal tries to achieve clearly looks like overkill to me. >> 3) The trick is to make sure that free(private_ptr) is called when the >> function's scope is finished. That's actually trivial, as the generated function body always goes through the same (or a limited number of different) cleanup code sequences at the end. Stefan From dagss at student.matnat.uio.no Fri Apr 11 11:56:20 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 11:56:20 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF2BB4.6050209@behnel.de> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> Message-ID: <47FF35C4.5080006@student.matnat.uio.no> (Hmm. I'm really not that against the approach. But I'll make sure the arguments against it are at least heard.) I see two kinds of uses for arrays in Cython: 1) Users that simply wants to allocate an array and do stuff with it. 
In these cases, having something a bit more capable than a standard array is always going to be an advantage -- even if one doesn't use anything but C array capabilities, being able to quickly insert a print statement to dump the array during debugging, pass it to NumPy functions for debugging purposes (hmmm, what's wrong...perhaps "any(x > 1)"?) and so on is convenient _during development_. (Though not necessarily NumPy, see below).

2) One wants to know exactly what is going on and be as near C as possible, possibly when wrapping C libraries. But then, nothing can beat try/finally anyway, which really isn't that bad to write and makes it very clear exactly what is going on:

cdef char* data = NULL
try:
    data = getmem(100)
    ...
finally:
    if data: free(data)  # can't remember if free checks for null, but could make xfree anyway.

Introducing some special syntax candy for the landscape that is "in-between" these two options just doesn't seem worth it (it makes the Cython language heavier and ultimately more difficult to learn). Especially when with this syntax candy

a) it looks like the data is going to be allocated on the stack
b) in a language that doesn't already have a concept of allocating objects on the stack (as opposed to C and C++), and
c) magically it doesn't allocate it on the stack anyway

(BTW, not using try/finally in the SAGE code posted does to me (to be honest) just fall into the category of bad and/or sloppy programming, and one shouldn't make changes to Cython on the basis of that code.)

If NumPy is overkill then perhaps one should instead (as has been suggested a few times already the last day) make another "buffer" library that operates in the same manner with respect to Cython (reference counted etc., but no syntax candy) but is simpler (always one-dimensional char* buffer for instance). This could quickly be implemented in an inlineable pxd file that is shipped with Cython, and potentially be inlined completely in a few years of Cython development.
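The try/finally pattern discussed in this message (start every buffer as NULL, release only the non-NULL ones at the end) can be sketched in plain Python. This is only an illustration under stated assumptions: `TrackingAllocator` is a hypothetical toy stand-in for malloc/free that records which blocks are still live, `None` plays the role of NULL, and `process` is a made-up function name.

```python
class TrackingAllocator:
    """Toy stand-in for malloc/free that records live blocks (hypothetical)."""

    def __init__(self):
        self.live = set()
        self._next_handle = 1

    def malloc(self, nbytes):
        # Hand out an opaque handle instead of real memory.
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.remove(handle)


def process(alloc, values, fail=False):
    # Every buffer starts out as None (the stand-in for NULL) ...
    first = None
    second = None
    try:
        first = alloc.malloc(8 * len(values))
        second = alloc.malloc(8 * len(values))
        if fail:
            raise ValueError("simulated failure in the middle of the function")
        return sum(values)
    finally:
        # ... so cleanup does not need to know where things went wrong:
        # it simply releases whatever was actually allocated.
        for buf in (first, second):
            if buf is not None:
                alloc.free(buf)


if __name__ == "__main__":
    alloc = TrackingAllocator()
    print(process(alloc, [1, 2, 3]))
    print("live blocks after success:", len(alloc.live))
    try:
        process(alloc, [1, 2, 3], fail=True)
    except ValueError:
        pass
    print("live blocks after failure:", len(alloc.live))
```

The point of the sketch is only that the cleanup block is identical for the success path and the failure path, which is what makes the pattern easy to audit.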
BTW: Why would NumPy be overkill? Because of a few extra bytes of memory per array object? Invoking the incantation "overkill" to me only suggests the Not Built Here syndrome; I always like to talk about specific, more rational reasons like memory usage, runtime performance, library dependency, ...

Dag Sverre

From robertwb at math.washington.edu Fri Apr 11 12:20:31 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 11 Apr 2008 03:20:31 -0700
Subject: [Cython] Proposal: idea for automatic management of dynamic memory
In-Reply-To: <47FF1E44.6040202@student.matnat.uio.no>
References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no>
Message-ID:

On Apr 11, 2008, at 1:16 AM, Dag Sverre Seljebotn wrote:

>
>> The main idea (Lisandro came up with this) is to build these
>> capabilities into cython itself by introducing the following syntax:
>>
>> cdef foo(size):
>>     cdef double a[size]  # This is like the dynamic arrays in C99
>>     # do stuff with a, but don't worry about deallocating the memory!!!

I really like this syntax, and it seems easy enough to support. One could then even do things like "for x in a" correctly, do bounds checking (or should we let the user shoot themselves in the foot for speed--or perhaps it could usually be optimized away), and coerce it to (or from?) a Python object. It has one drawback, which is that it can never be re-sized, and it cannot be used in any control structures (e.g. inside an if statement or loop).

> It sure looks like it would be convenient; but I'll take the role
> as an
> advocate against it :-)
>
> 1) Having "cdef double a[size]" allocate anything is very against the
> Python syntax -- nothing is ever allocated in Python without a
> function
> call and an assignment (same as in Java). To somebody who comes from a
> strict Python background and not C, it really looks like only a
> variable
> type declaration, not object allocation.
In Cython, this Python way of > working is mostly kept, and this would be a step *away* from Python in > terms of the "feel" of the language. It looks like cdef double a[10], which is currently legal. > 2) This kind of duplicates the behaviour of the "with" keyword > (which is > not present in Cython today (either), but definitely is a goal, and is > present in current Python). > > However, using the with keyword would be a bit less convenient, given > function template support and with keyword support in Cython it could > look something like this: > > with carray(double, size) as buf: > cdef double* a = buf.data > # do stuff with a > > If you want a different memory allocator, simply use a different > function than carray... > > 3) Long-term Python users with very little C experience might end up > doing something like > > cdef double a[size] > # fill in a > return a If the return type is double*, an error can be given at compile time, and if it's object it could even coerce to a Python list. I would say a can neither be assigned to or used as an assignment to something else--just indexed. > (Or, some more complex variant that we can't emit a warning for). With > the "with" keyword however, Python users will *know* not to return a. I think it will be less clear that a is volatile, as it doesn't show up in the with statement at all (and imagine if the assignment were made much later). > (*) This is assuming a new carray template function as well, and so > assumes even more Cython development. One might have to type "a" as > well > but with type inference it might look something like the above. > > 3) Just an overall comment: I personally think NumPy arrays are > excellent for this. I'd have no problems personally with using a NumPy > array only in order to allocate memory and then pass that memory on > to a > C library for instance. 
> (The problem is, I suppose, having to
> depend on
> the NumPy library...though investing effort in creating a garbage
> collected array type when NumPy already has that seems too much like
> reinventing the wheel to me.) This will become more convenient than
> today if Cython grows better NumPy support.

One can make something much lighter weight than numpy--the PyString_FromStringAndSize example that started this thread is *much* faster than creating a NumPy array, let alone a call to malloc() and free(). Although I don't anticipate using Cython much without having NumPy around, I don't want to make it a requirement for effectively using Cython.

>> 3) The trick is to make sure that free(private_ptr) is called when
>> the
>> function's scope is finished. How can this be done. One way
>> would be
>> to have a private/hidden python object (it would have to be a C
>> extension type) that 1) hold's private_ptr as an attribute and 2)
>> calls free(private_ptr) when it is garbage collected.
>>
> This is very trivially implemented by having Cython automatically wrap
> the function with try/finally prior to generating C code. Not going to
> be a problem at all. (But it can be done the way you say as well...)

It's even easier than that: the generated functions all have a single exit point (once they've successfully parsed arguments at least) that runs cleanup code (such as deallocating any remaining temp variables, etc.). This could just be added here.
- Robert From stefan_ml at behnel.de Fri Apr 11 12:40:48 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 12:40:48 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF35C4.5080006@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> Message-ID: <47FF4030.2030504@behnel.de> Hi Dag, I agree that the impact of the syntactic change is likely low, given the fact that it doesn't support allocation any more fine-grained than a function body. BTW, do we actually support cdef's inside blocks? Dag Sverre Seljebotn wrote: > Introducing some special syntax candy for the landscape that is > "in-between" these two options just doesn't seem worth it (it makes the > Cython language heavier and ultimately more difficult to learn). > Especially when with this syntax candy > > a) it looks like the data is going to be allocated on the stack > b) in a language that doesn't already have a concept of allocating > objects on the stack (as opposed to C and C++), and Not much of a problem IMHO. > c) magically it doesn't allocate it on the stack anyway I was just suggesting the similarity, not in the way it works internally, but in the way it works from a user POV. There is no real difference between cdef int[10] myarray and cdef int[some_value] myarray except that the second is currently illegal. > BTW: Why would NumPy be overkill? Because of an external dependency for what seems to be a very simple feature?
Stefan From stefan_ml at behnel.de Fri Apr 11 12:47:45 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 12:47:45 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> Message-ID: <47FF41D1.20507@behnel.de> Hi, Robert Bradshaw wrote: >>> The main idea (Lisandro came up with this) is to build these >>> capabilities into cython itself by introducing the following syntax: >>> >>> cdef foo(size): >>> cdef double a[size] # This is like the dynamic arrays in C99 >>> # do stuff with a, but don't worry about deallocating the memory!!! [...] > and coerce it too (or from?) a Python object. I find that a funny idea. We could generate an internal array iterator class that yields the coerced Python values in order, so that "list(a)" would work. :) >> 3) Long-term Python users with very little C experience might end up >> doing something like >> >> cdef double a[size] >> # fill in a >> return a > > If the return type is double*, an error can be given at compile time, > and if it's object it could even coerce to a Python list. I would say > a can neither be assigned to or used as an assignment to something > else--just indexed. Exactly. It's not a pointer, it's an array. Stefan From robertwb at math.washington.edu Fri Apr 11 12:52:16 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 11 Apr 2008 03:52:16 -0700 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF35C4.5080006@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> Message-ID: On Apr 11, 2008, at 2:56 AM, Dag Sverre Seljebotn wrote: > (Hmm. I'm really not that against the approach. 
But I'll make sure the > arguments against it are at least heard.) Yes, your input is valued! > I see two kinds of uses for arrays in Cython: > > 1) Users that simply wants to allocate an array and do stuff with > it. In > these cases, having something a bit more capable than a standard array > is always going to be an advantage -- even if one doesn't use anything > but C array capabilities, being able to quickly insert a print > statement > to dump the array during debugging, pass it to NumPy functions for > debugging purposes (hmmm, what's wrong...perhaps "any(x > 1)"?) and so > on is convenient _during development_. (Though not necesarrily NumPy, > see below). I don't see the utility as being limited to development. > 2) One wants to know exactly what is going on and be as near C as > possible, possibly when wrapping C libraries. But then, nothing can > beat > try/finally anyway, which really isn't that bad to write and makes it > very clear exactly what is going on: > > cdef char* data = NULL > try: > data = getmem(100) > ... > finally: > if data: free(data) # can't remember if free checks for null, but > could make xfree anyway. > > > Introducing some special syntax candy for the landscape that is > "in-between" these two options just doesn't seem worth it (it makes > the > Cython language heavier and ultimately more difficult to learn). > Especially when with this syntax candy > > a) it looks like the data is going to be allocated on the stack That's what it acts like. > b) in a language that doesn't already have a concept of allocating > objects on the stack (as opposed to C and C++), and All of your other objects, including fixed-length arrays, are allocated on the stack. > c) magically it doesn't allocate it on the stack anyway If the user doesn't know about stack vs. heap allocation, this won't bother them. If they do, then they're probably savvy enough to not worry about it. 
> (BTW, not using try/finally in the SAGE code posted does to me (to be > honest) just fall into the category of bad and/or sloppy programming, > and one shouldn't make changes to Cython on the basis of that code.) I agree, but there has to be a better way so that the user doesn't have to worry about it. Cython takes care of refcounting, and it would be nice if it took care of simple C array memory management too (as most Python programmers are not familiar with managing their own memory, and that is an area where it is really easy to shoot oneself in the foot). > If NumPy is overkill then perhaps one should instead (as has been > suggested a few times already the last day) make another "buffer" > library that operates in the same manner with respect to Cython > (reference counted etc., but no syntax candy) but is simpler (always > one-dimensional char* buffer for instance). This could quickly be > implemented in an inlineable pxd file that is shipped with Cython, and > potentially be inlined completely in a few years of Cython > development. This is the direction I would lean, and it would be very easy to do with the kind of improvements that we have talked about for easy NumPy array support. It has the advantage of being able to create them, pass them around, etc. but the disadvantage that one needs the GIL to rely on Python's refcounting infrastructure. In my mind, arrays are primitive enough that perhaps syntactic sugar should be developed, e.g. cdef double[] a = cdef double[size] Perhaps there should be a CEP with several alternatives/pros/cons? I think the main point is that people want to be able to use arrays without having to manually malloc/free (including error-handling). However, allowing non-constant sized array declarations seems like much lower hanging fruit (as well as a much smaller change). > BTW: Why would NumPy be overkill? Because of a few extra bytes of > memory > per array object?
Invoking the incantation "overkill" to me only > suggests the Not Built Here syndrome, I always like to talk about > specific, more rational reasons like memory usage, runtime > performance, > library dependency, ... Library dependancy is the obvious drawback--as nice as NumPy is we are not going to require it to use Cython. Runtime performance is also an issue, NumPy arrays are fast, but if you've looked at the code it is obvious it is nowhere near as fast to create one as a call to malloc(). - Robert From dagss at student.matnat.uio.no Fri Apr 11 13:21:14 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 13:21:14 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF4030.2030504@behnel.de> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> Message-ID: <47FF49AA.5000207@student.matnat.uio.no> "cdef double a[10]" currently being legal is a good point. I consider myself beaten :-) (my argument from language complexity doesn't hold) Just keep the feature away from introductory tutorials for numerical programmers :-) I see lots of premature optimization coming from that; and NumPy definitely doesn't have overhead that's not O(1) within the function scope. Robert wrote: > In my mind, arrays are primitive enough that perhaps syntactic sugar should be developed, e.g. > cdef double[] a = cdef double[size] I like Brian's proposal better. This throws in all kind of questions about what kind of allocation semantics Cython really has. Currently one has two ways of variable allocation: i) "stack" (real or emulated); without assignments, ii) Python refcounted object construction with assignment to reference. What exactly is the above? 
The matter of Cython object allocation is confused enough as it is; in my opinion there already is a problem and one shouldn't make it worse. Simple demonstration: a) For instance, consider the type name being used for constructing objects in Python as well as specifying variable type in Cython. "int" means completely different things depending on the context! (And is really not a name collision issue, "numpy.ndarray" will also come to mean different things depending on context, or at least "c_numpy.ndarray" vs. "numpy.ndarray" which is hardly any better.) b) This mixture of models means that somebody coming from C++ might expect to be able to do cdef list x x.append(10) c) While somebody from a Python background will expect to be able to do: cdef int x[10] = ... cdef int y[10] = x ...and so on... (well, we might be able to fix c). My assertion here is that Cython is already a bit difficult to learn properly! (Meaning it doesn't get into the fingers as much as a more consistent, well thought through language does as Python or C++ or Java). I don't say we should solve this here and now (or even that it can be solved), I just say that all this means that real changes in this area should come after the current model has been evaluated and all the problems with intuivity that is already present has been discussed. That is however a discussion probably better left for after we have full Python support and one starts looking at type inference. (But as "cdef double a[10]" is already valid, allowing "cdef double a[n]" isn't really a change anyway, and so one can just go ahead). Stefan wrote: > BTW, do we actually support cdef's inside blocks? > No, I forgot. That was one of the first things that surprised me when starting to use Cython, it would be nice to have that (any philosophical reasons against it, or just that it was simpler to implement?) 
Dag Sverre From stefan_ml at behnel.de Fri Apr 11 13:35:47 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 13:35:47 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> Message-ID: <47FF4D13.4090806@behnel.de> Hi, I added a preliminary CEP 512 about better array support in Cython. http://wiki.cython.org/enhancements/arraytypes Stefan From robertwb at math.washington.edu Fri Apr 11 13:59:00 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 11 Apr 2008 04:59:00 -0700 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF49AA.5000207@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> <47FF49AA.5000207@student.matnat.uio.no> Message-ID: On Apr 11, 2008, at 4:21 AM, Dag Sverre Seljebotn wrote: > "cdef double a[10]" currently being legal is a good point. I consider > myself beaten :-) (my argument from language complexity doesn't hold) > > Just keep the feature away from introductory tutorials for numerical > programmers :-) I see lots of premature optimization coming from that; On the other hand, new programmers are even better at creating segfaults/memory leaks due to pointer mismanagement than the rest of us :) > and NumPy definitely doesn't have overhead that's not O(1) within the > function scope. > > Robert wrote: > >> In my mind, arrays are primitive enough that perhaps syntactic sugar > should be developed, e.g. >> cdef double[] a = cdef double[size] > > I like Brian's proposal better. Yes. These are two separate things. > This throws in all kind of questions > about what kind of allocation semantics Cython really has. 
> Currently one > has two ways of variable allocation: i) "stack" (real or emulated); > without assignments, ii) Python refcounted object construction with > assignment to reference. What exactly is the above? Python refcounted, which is more pythonic (though harder to implement). > The matter of Cython object allocation is confused enough as it is; in > my opinion there already is a problem and one shouldn't make it worse. > Simple demonstration: > > a) For instance, consider the type name being used for constructing > objects in Python as well as specifying variable type in Cython. "int" > means completely different things depending on the context! (And is > really not a name collision issue, "numpy.ndarray" will also come to > mean different things depending on context, or at least > "c_numpy.ndarray" vs. "numpy.ndarray" which is hardly any better.) Why would one have "c_numpy.ndarray" vs "numpy.ndarray"? int is strange because one has Python ints and C ints, but we didn't get to choose the names here. Both uses of types are already natural to python users, e.g. a = int(b) # as a constructor isinstance(a, int) # as a type declaration > b) This mixture of models means that somebody coming from C++ might > expect to be able to do > > cdef list x > x.append(10) I don't see why someone would expect this to work, but even so Cython behaves first like Python, then like C, and then (if ever) like C++. > c) While somebody from a Python background will expect to be able > to do: > > cdef int x[10] = ... > cdef int y[10] = x > > ...and so on... (well, we might be able to fix c). Yes, I think we could fix this by copying. > My assertion here is that Cython is already a bit difficult to learn > properly! (Meaning it doesn't get into the fingers as much as a more > consistent, well thought through language does as Python or C++ or > Java). Yes, I have to agree here, and this is because it sits on the boundary between two very different languages. 
I found Cython incredibly easy to learn (after learning Python) but sometimes it feels like I'm the exception. The goal is, however, to reduce the gap between Python and Cython so that the learning curve is even more shallow. A question that still remains (and I don't think there's a quick and easy answer to it) is whether it's better to create a way for the user to use C arrays as if they were Python objects (possibly introducing new syntactic sugar), or leave the language simpler and force users to learn the ins and outs of manual pointer memory management. > I don't say we should solve this here and now (or even that it can be > solved), I just say that all this means that real changes in this area > should come after the current model has been evaluated and all the > problems with intuivity that is already present has been discussed. > That > is however a discussion probably better left for after we have full > Python support and one starts looking at type inference. (But as "cdef > double a[10]" is already valid, allowing "cdef double a[n]" isn't > really > a change anyway, and so one can just go ahead). I fully agree. > > > Stefan wrote: >> BTW, do we actually support cdef's inside blocks? >> > No, I forgot. That was one of the first things that surprised me when > starting to use Cython, it would be nice to have that (any > philosophical > reasons against it, or just that it was simpler to implement?) I actually have no idea--this is just how Cython was. I don't think there's any technical barrier (and I do find it annoying sometimes), but it is a reminder that Python locals are not scoped by block (and so nor should Cython's). 
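To make the remark about Python's scoping concrete, here is a small plain-Python sketch (the function name `block_scope_demo` is made up for illustration). Names bound inside an `if` or `for` body belong to the enclosing function, so they remain visible after the block ends, and the type of `x` depends on which branch ran.

```python
def block_scope_demo(flag):
    # 'x' is bound in both branches but is a single function-level local.
    if flag:
        x = 2.0
    else:
        x = 2
    # 'i' and 'inner' are bound inside the loop body and survive it.
    for i in range(3):
        inner = i * 10
    return x, i, inner


result_true = block_scope_demo(True)
result_false = block_scope_demo(False)

# All of these names are plain locals of the function, not of any block.
local_names = block_scope_demo.__code__.co_varnames
```

This is the behaviour a block-level `cdef` would have to coexist with: either typed names follow the same function-wide rule, or they introduce a second, C-like scoping rule alongside it.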
- Robert From stefan_ml at behnel.de Fri Apr 11 14:30:14 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Apr 2008 14:30:14 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> <47FF49AA.5000207@student.matnat.uio.no> Message-ID: <47FF59D6.6020300@behnel.de> Hi, Robert Bradshaw wrote: > On Apr 11, 2008, at 4:21 AM, Dag Sverre Seljebotn wrote: >> Stefan wrote: >>> BTW, do we actually support cdef's inside blocks? >>> >> No, I forgot. That was one of the first things that surprised me when >> starting to use Cython, it would be nice to have that (any philosophical >> reasons against it, or just that it was simpler to implement?) > > I actually have no idea--this is just how Cython was. I don't think > there's any technical barrier (and I do find it annoying sometimes), but > it is a reminder that Python locals are not scoped by block (and so nor > should Cython's). I don't quite remember my first reaction, but I guess I would also be surprised as a newbie. This is (again) sort of an in-between C and Python question. "cdef" enters C space, where block-scoping makes sense. However, we'd have to resolve all sorts of weird semantic nonsense, such as: cdef object i for i in range(10): cdef long i = i ... I don't feel like bothering with that... 
Stefan From ndbecker2 at gmail.com Fri Apr 11 14:40:56 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 11 Apr 2008 08:40:56 -0400 Subject: [Cython] Proposal: idea for automatic management of dynamic memory References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> <47FF49AA.5000207@student.matnat.uio.no> Message-ID: Too bad (IMO) that we're not using c++ - it would really simplify this memory management issue. From dagss at student.matnat.uio.no Fri Apr 11 14:51:04 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 14:51:04 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF59D6.6020300@behnel.de> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> <47FF49AA.5000207@student.matnat.uio.no> <47FF59D6.6020300@behnel.de> Message-ID: <47FF5EB8.4070904@student.matnat.uio.no> > This is (again) sort of an in-between C and Python question. "cdef" enters C > space, where block-scoping makes sense. However, we'd have to resolve all > sorts of weird semantic nonsense, such as: > > cdef object i > for i in range(10): > cdef long i = i > ... > > I don't feel like bothering with that... > Hmm. You're right. Even worse: if mytest: cdef float x = 2 else: cdef int x = 2 print x One would either have to use "C-like scope" for typed variables (doesn't smell good), or know about execution paths and raise compiler errors on the wrong scenarios (don't like that either). If there are some simple rules to allow it in a few simple obvious cases it would be good though, and just don't allow any possible nonsense. 
Something like: syntax error: The variable "x" is used outside of the typed block. Move the type declaration. syntax error: "x" has a type declared more than once within a function. If one can do this check it should handle most cases, and one can implement it simply by moving all the cdefs (splitting cdef and assignment if necessary) to the top of the function during compile. But perhaps better left for later. Dag Sverre From dagss at student.matnat.uio.no Fri Apr 11 14:53:09 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 14:53:09 +0200 Subject: [Cython] CEP 513 - Unified Python and Cython namespaces Message-ID: <47FF5F35.5040409@student.matnat.uio.no> I know I said deal with this later, but because it potentially affects type parametrization syntax, perhaps deal with a few, small aspects of it soon. Wrote this (I've thought about this for about a month but didn't see a need to tell anyone about it until now...): http://wiki.cython.org/enhancements/unifiednamespace Dag Sverre From martin at martincmartin.com Fri Apr 11 14:59:23 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Fri, 11 Apr 2008 08:59:23 -0400 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF59D6.6020300@behnel.de> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> <47FF49AA.5000207@student.matnat.uio.no> <47FF59D6.6020300@behnel.de> Message-ID: <47FF60AB.3070108@martincmartin.com> Stefan Behnel wrote: > Hi, > > This is (again) sort of an in-between C and Python question. "cdef" enters C > space, where block-scoping makes sense. However, we'd have to resolve all > sorts of weird semantic nonsense, such as: > > cdef object i > for i in range(10): > cdef long i = i > ... > > I don't feel like bothering with that...
I'd just point out that Stroustrup added "define variables anywhere" because there are times that constructors need values that are computed during the block. If we're going to support C++, we may have to support that. Best, Martin From ellisonbg.net at gmail.com Fri Apr 11 16:56:40 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Fri, 11 Apr 2008 08:56:40 -0600 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF1E44.6040202@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> Message-ID: <6ce0ac130804110756j2daf38ceq6932a69cbdbfb294@mail.gmail.com> > 1) Having "cdef double a[size]" allocate anything is very against the > Python syntax -- nothing is ever allocated in Python without a function call > and an assignment (same as in Java). To somebody who comes from a strict > Python background and not C, it really looks like only a variable type > declaration, not object allocation. In Cython, this Python way of working is > mostly kept, and this would be a step *away* from Python in terms of the > "feel" of the language. As other people have mentioned, Cython is (unapologetically) Python with C data types, so this syntax will be at home. > 2) This kind of duplicates the behaviour of the "with" keyword (which is > not present in Cython today (either), but definitely is a goal, and is > present in current Python). I very much like his idea of using with for these things. But, with is 2.5 only and I think this is an important enough issue that it should be a lower level syntax that cython supports. But, with could definitely be used for more complex or custom allocators. Question: does cython have something like a standard library? It seems like such context managers should go there. Also such a standard library would provide a place for lots of commonly used things that shouldn't be in the actual core language. 
I think that is better than the language itself growing all of these (with) capabilities. I do think that dynamic arrays should be a part of the language though. From ellisonbg.net at gmail.com Fri Apr 11 16:59:09 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Fri, 11 Apr 2008 08:59:09 -0600 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF2151.3060304@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2151.3060304@student.matnat.uio.no> Message-ID: <6ce0ac130804110759s65d8b882md144742578a1ac02@mail.gmail.com> > One more thought: In Python 3000, the "buffer" interface is going to put a > standard on array objects for buffer interchange between any Python library: > > http://www.python.org/dev/peps/pep-3118/ > > Introducing a convenient syntax for C arrays (or our own, Invented Here, > garbage collected array type) would mean lots of code incompatible with this > API is written. (Which might be fine in most cases; but I'd rather not have > to _think_ too much when writing Cython code, ie "do I need this type of > array or this type", I'd rather just always use one array type that does it > all.) I guess my view is that the buffer interface is not orthogonal to this type of thing - rather it will make it even easier to build things that could interoperate. Indeed one possible implementation strategy for these dynamic arrays would be to use buffers. 
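For readers unfamiliar with the buffer interface mentioned above, the following minimal sketch uses only the Python 3 standard library (`array.array` and `memoryview`, both of which speak the PEP 3118 protocol). It is not the design proposed in this thread; it only shows what "interoperating through buffers" means: a second object views the same memory without copying it.

```python
from array import array

# A contiguous block of four C doubles, managed by Python's refcounting.
values = array("d", [0.0] * 4)

# memoryview requests the buffer from the array; no data is copied.
view = memoryview(values)

# Writing through the view changes the underlying array.
view[1] = 2.5

element_format = view.format   # struct-style type code of each element
element_size = view.itemsize   # bytes per element
total_bytes = view.nbytes      # size of the whole block in bytes
```

Any consumer that understands the protocol, for example a C library wrapper, could use the same block in the same way.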
From ellisonbg.net at gmail.com Fri Apr 11 17:02:31 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Fri, 11 Apr 2008 09:02:31 -0600 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF4D13.4090806@behnel.de> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF4D13.4090806@behnel.de> Message-ID: <6ce0ac130804110802s490e42e5y8b0badc77245d754@mail.gmail.com> > I added a preliminary CEP 512 about better array support in Cython. > > http://wiki.cython.org/enhancements/arraytypes > Great! One thing to keep in mind (I will add this to the wiki) is that the goal of this proposal is to avoid people using raw unprotected malloc calls that could lead to memory leaks. Currently sage uses such unprotected calls through sage_malloc. This proposal would provide a simple, high-performance and safe alternative. I definitely think that NumPy arrays are much too heavyweight for this purpose and introduce an unacceptable dependency. Cheers, Brian From dagss at student.matnat.uio.no Fri Apr 11 17:09:18 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 Apr 2008 17:09:18 +0200 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <6ce0ac130804110756j2daf38ceq6932a69cbdbfb294@mail.gmail.com> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <6ce0ac130804110756j2daf38ceq6932a69cbdbfb294@mail.gmail.com> Message-ID: <47FF7F1E.7050403@student.matnat.uio.no> > I guess my view is that the buffer interface is not orthogonal to this > type of thing - rather it will make it even easier to build things > that could interoperate. Indeed one possible implementation strategy > for these dynamic arrays would be to use buffers. It is completely orthogonal to the "with" statement, but only almost to "cdef double a[size]".
"cdef double a[size]" should probably be implemented as one specific type, and what has been considered natural in the discussion until now (I think) is pure double* with getmem and auto-free. That's what the syntax suggests anyway. This means that the "default" Cython array object then probably won't support that standard. Which I now think is fine anyway. I'm just saying that with a slightly different syntax, it could be more natural to support the buffer API with "native" Cython arrays. But one can do both. Or are you suggesting that the array allocation syntax you propose can have overridable allocators and create many different types of arrays? Dag Sverre From robertwb at math.washington.edu Fri Apr 11 21:31:56 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 11 Apr 2008 12:31:56 -0700 Subject: [Cython] Proposal: idea for automatic management of dynamic memory In-Reply-To: <47FF5EB8.4070904@student.matnat.uio.no> References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <47FF2BB4.6050209@behnel.de> <47FF35C4.5080006@student.matnat.uio.no> <47FF4030.2030504@behnel.de> <47FF49AA.5000207@student.matnat.uio.no> <47FF59D6.6020300@behnel.de> <47FF5EB8.4070904@student.matnat.uio.no> Message-ID: <5316E3EA-8815-44CE-A0A5-A1D3E90F4DCF@math.washington.edu> On Apr 11, 2008, at 5:51 AM, Dag Sverre Seljebotn wrote: > >> This is (again) sort of an in-between C and Python question. >> "cdef" enters C >> space, where block-scoping makes sense. However, we'd have to >> resolve all >> sorts of weird semantic nonsense, such as: >> >> cdef object i >> for i in range(10): >> cdef long i = i >> ... >> >> I don't feel like bothering with that... >> > Hmm. You're right. 
Even worse: > > if mytest: > cdef float x = 2 > else: > cdef int x = 2 > print x > > One would either have to use "C-like scope" for typed variables > (doesn't smell good), or know about execution paths and raise > compiler errors on the wrong scenarios (don't like that either). If > there are some simple rules to allow it in a few simple obvious > cases it would be good though, and just don't allow any possible > nonsense. Something like: > > syntax error: The variable "x" is used outside of the typed block. > Move the type declaration. > syntax error: "x" has a type declared more than once within a > function. > > If one can do this check it should handle most cases, and one can > implement it simply by moving all the cdefs (splitting cdef and > assignment if necessary) to the top of the function during compile. > > But perhaps better left for later. This is really easy to change, as currently one can do if mytest: x = 2 else: x = 2 print x which is implicitly if mytest: cdef object x = 2 else: cdef object x = 2 print x and all variable declarations are moved to the top of the function at compile time anyways. One is not allowed to redeclare variables, so there would be a compile-time error. The only question is if it is misleading that the variables are not block-level (and shouldn't be, otherwise things will be very inconsistent and confusing to Python developers)? - Robert From robertwb at math.washington.edu Fri Apr 11 22:06:24 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 11 Apr 2008 13:06:24 -0700 Subject: [Cython] CEP 513 - Unified Python and Cython namespaces In-Reply-To: <47FF5F35.5040409@student.matnat.uio.no> References: <47FF5F35.5040409@student.matnat.uio.no> Message-ID: <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> I added several comments. I'm not convinced it will make things clearer (or if it will muddy things up even more) but definitely food for thought.
- Robert On Apr 11, 2008, at 5:53 AM, Dag Sverre Seljebotn wrote: > I know I said deal with this later, but because it potentially affects > type parametrization syntax, perhaps deal with a few, small aspects of > it soon. > > Wrote this (I've thought about this for about a month but didn't see a > need to tell anyone about it until now...): > > http://wiki.cython.org/enhancements/unifiednamespace > > Dag Sverre > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From dalcinl at gmail.com Fri Apr 11 22:29:22 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 11 Apr 2008 17:29:22 -0300 Subject: [Cython] introduce myself, and unimportant gcc warning... Message-ID: Hi all, perhaps you remember my name from some previous posts (By Brian Granger) about automatic mem management. I started to experiment with Cython for (re)implementing mpi4py, a Python port to MPI[1/2] spec following the API of the standard C++ bindings for MPI-2. However, I'll do this by only accessing the C bindings of MPI as defined in MPI-1 and MPI-2. In about 4 nights, I was able to wrap a significant portion of MPI specs, maintaining backwards compatibility with the former mpi4py code (a mix of pure Python and a hand-written C extension). To be honest, I believed that Cython was just another approach, now I believe Cython is THE approach. As it seems the development of Cython is very, very fast (thanks for that!), I'll do my everyday work following the HG repo 'cython-devel'. At some point, I'll try to dive in the Cython internals in order to be able to fix possible problems. Perhaps at some point I'll ask for commit permissions. And then, a small gcc (-Wall -pedantic) warning MPI.c: In function __Pyx_UnpackItem: MPI.c:36454: warning: ISO C90 does not support the `z' printf length modifier Of course, Python2.5 uses '%zd' everywhere for printing Py_ssize_t.
Does anyone know if this can be worked around? At first, it seems to me that there is no way.

Regards,

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From dalcinl at gmail.com Sat Apr 12 00:53:29 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 11 Apr 2008 19:53:29 -0300
Subject: [Cython] void foo(void) illegal?
Message-ID:

What's the rationale of the following being illegal in Cython?

cdef extern from "bar.h":
    void foo(void)

If I'm not wrong, in a C context (but not C++), the declarations

void foo(void);
void foo();

are not equivalent, the second actually means something like

void foo(...)

Anyway, as void foo(void) is a valid C (C++?) function declaration, I would ask for that form being legal in Cython, unless there is some technical issue like leading to an ambiguous grammar.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From dalcinl at gmail.com Sat Apr 12 01:58:50 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 11 Apr 2008 20:58:50 -0300
Subject: [Cython] a = b = (cdef) cval
Message-ID:

It seems the code generated by

cdef MyClass cval = MyClass()

a = b = cval

is not equivalent in semantics to Python, 'a is b' should be true.

I actually did this

cdef int cval = 10
a = b = cval

then 'a is b' is true, because of the internal Python cache for small integers.
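
As a point of comparison, here is a minimal plain-Python sketch of the identity question raised above. It is not Cython's generated code; the class name MyClass only mirrors the snippet, and the helper function is hypothetical. In pure Python a chained assignment evaluates the right-hand side once and binds every target to that one object, so the identity test holds without relying on any integer cache:

```python
class MyClass(object):
    """Stand-in for the extension type in the snippet above (hypothetical)."""
    pass


def chained_assignment():
    # The right-hand side is evaluated once; both names are then bound
    # to that single object, left to right.
    a = b = MyClass()
    return a, b


a, b = chained_assignment()
print(a is b)
```

Whether the same holds once a C-typed value has to be converted to a Python object at each assignment is exactly the point under discussion.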

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From robertwb at math.washington.edu Sat Apr 12 02:00:15 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 11 Apr 2008 17:00:15 -0700
Subject: [Cython] void foo(void) illegal?
In-Reply-To:
References:
Message-ID:

On Apr 11, 2008, at 3:53 PM, Lisandro Dalcin wrote:

> What's the rationale of the following being illegal in Cython?
>
> cdef extern from "bar.h":
>     void foo(void)

This is only because foo() is the way to specify a function with no arguments in Python. It is more useful to think of Cython as Python (+ some static declarations) than to try and think of it as being a syntactically-different variant of C.

> If I'm not wrong, in a C context (but not C++), the declarations
>
> void foo(void);
> void foo();
>
> are not equivalent, the second actually means something like
>
> void foo(...)

It's even worse, it means "I'm just too lazy to tell you what the arguments are, but if you use the wrong ones bad things could happen." IIRC it's only around for historical reasons and officially discouraged by ANSI C, and it would be bad (in my opinion) to allow such archaic and confusing notions into Cython.

> Anyway, as void foo(void) is a valid C (C++?) function declaration, I
> would ask for that form being legal in Cython, unless there is some
> technical issue like leading to an ambiguous grammar.

If we allow foo() and foo(void) in Cython then people might wonder if the two have different meanings (as they do in C), and such a declaration might look confusing to a Python developer. On the other hand, it will make it easier to do copy-pasting from header files (which eventually, I hope, could be automated in many cases).

- Robert

From robertwb at math.washington.edu Sat Apr 12 02:22:03 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 11 Apr 2008 17:22:03 -0700
Subject: [Cython] a = b = (cdef) cval
In-Reply-To:
References:
Message-ID: <28E29F85-ECD8-46BC-8B17-ECC61AAE5EC0@math.washington.edu>

On Apr 11, 2008, at 4:58 PM, Lisandro Dalcin wrote:

> It seems the code generated by
>
> cdef MyClass cval = MyClass()
>
> a = b = cval
>
> is not equivalent in semantics to Python, 'a is b' should be true.

Actually it follows Python semantics exactly. It does not follow C semantics (which treats b = cval as an expression). There are many fewer cases in Python where code is executed on assignment, but unpacking is one of them. Here is a demonstration:

class Unpackable:
    def __iter__(self):
        try:
            self.n += 1
            print "unpacking", self.n
            return iter([self.n])
        except AttributeError:
            self.n = 0
            return iter(self)

(a,) = (b,) = (c,) = Unpackable(); print a, b, c

unpacking 1
unpacking 2
unpacking 3
1 2 3

> I actually did this
>
> cdef int cval = 10
> a = b = cval
>
> then 'a is b' is true, because of the internal Python cache for
> small integers.

Yep.

I am grateful for your attention to details, I am sure it will help make Cython better.

- Robert

From robertwb at math.washington.edu Sat Apr 12 02:28:36 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 11 Apr 2008 17:28:36 -0700
Subject: [Cython] introduce myself, and unimportant gcc warning...
In-Reply-To:
References:
Message-ID:

On Apr 11, 2008, at 1:29 PM, Lisandro Dalcin wrote:

> Hi all, perhaps you remember my name from some previous posts (by
> Brian Granger) about automatic mem management.

Hi. I have to say I thought your idea of using Strings as refcounted buffers was particularly clever :).

> I started to experiment with Cython for (re)implementing mpi4py, a
> Python port to MPI[1/2] spec following the API of the standard C++
> bindings for MPI-2.
However, I'll do this by only accessing the C
> bindings of MPI as defined in MPI-1 and MPI-2.
>
> In about 4 nights, I was able to wrap a significant portion of the MPI
> specs, maintaining backwards compatibility with the former mpi4py code
> (a mix of pure Python and a hand-written C extension). To be honest, I
> believed that Cython was just another approach, now I believe Cython
> is THE approach.

Thanks. That's how I feel too (not that I'm biased or anything... :-).

> As it seems the development of Cython is very, very fast (thanks for
> that!), I'll do my everyday work following the HG repo 'cython-devel'.

Be aware that cython-devel is occasionally unstable, but I usually try and keep it relatively clean.

> At some point, I'll try to dive into the Cython internals in order to be
> able to fix possible problems. Perhaps at some point I'll ask for
> commit permissions.

Sure. Also, mercurial is a distributed revision control system, so you don't even need commit permissions to do development.

> And then, a small gcc (-Wall -pedantic) warning
>
> MPI.c: In function __Pyx_UnpackItem:
> MPI.c:36454: warning: ISO C90 does not support the `z' printf
> length modifier
>
> Of course, Python2.5 uses '%zd' everywhere for printing Py_ssize_t.
> Does anyone know if this can be worked around? At first, it seems
> to me that there is no way.

I'm not sure, but I bet it's some compiler flag or compiler version issue. Are you using the same compiler you used to compile Python? If it is a compiler version issue, I believe most (all?) instances of %zd are surrounded by ifdef statements anyway, which could be modified to check the compiler as well as the Python version number.

- Robert

From dalcinl at gmail.com Fri Apr 11 18:59:08 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 11 Apr 2008 13:59:08 -0300
Subject: [Cython] Proposal: idea for automatic management of dynamic memory
In-Reply-To: <47FF7F1E.7050403@student.matnat.uio.no>
References: <6ce0ac130804102024p2942e662oeea61175859a4be9@mail.gmail.com> <47FF1E44.6040202@student.matnat.uio.no> <6ce0ac130804110756j2daf38ceq6932a69cbdbfb294@mail.gmail.com> <47FF7F1E.7050403@student.matnat.uio.no>
Message-ID:

After this long discussion, I would like to add some points:

* Variable-length arrays are already valid constructs in C99. Furthermore, 'sizeof' returns at runtime the size of the array!!!

* They are almost identical in syntax and semantics to the dynamic array extension of GCC, which has been available for a long time. An example from the manpages of GCC:

FILE *
concat_fopen (char *s1, char *s2, char *mode)
{
  char str[strlen (s1) + strlen (s2) + 1];
  strcpy (str, s1);
  strcat (str, s2);
  return fopen (str, mode);
}

* Another not very well known way of semi-automatic dynamic memory allocation is the function 'alloca' (please consult your linux info pages). I'm not sure if this is GCC-only or supported on every platform. An example from the libc info pages:

int
open2 (char *str1, char *str2, int flags, int mode)
{
  char *name = (char *) alloca (strlen (str1) + strlen (str2) + 1);
  stpcpy (stpcpy (name, str1), str2);
  return open_or_report_error (name, flags, mode);
}

Finally, I definitely think that variable-sized arrays should be supported in Cython, as it already is a C99 feature. Moreover, in the case of a C99 compiler, perhaps Cython could just let the compiler manage the allocation. Of course, Cython should also emit code (protected with macros) for the case where a generated C file is not being compiled with a C99 compiler.

And finally, I now believe that if you do

cdef double a[size]

then the 'a' pointer should not be allowed to change, that is, something like 'a++' or 'a+=1' will generate a Cython error. This restriction is perhaps premature, Cython should follow C99 rules here (and I do not know all the details).

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From robertwb at math.washington.edu Sat Apr 12 03:26:09 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 11 Apr 2008 18:26:09 -0700
Subject: [Cython] void foo(void) illegal?
In-Reply-To:
References: <2D243B47-EE17-41BF-B354-B4DB8720FA21@math.washington.edu>
Message-ID:

On Apr 11, 2008, at 6:17 PM, Lisandro Dalcin wrote:

> On 4/11/08, Robert Bradshaw
>
>> What if the error message made it very clear in the (compile time)
>> error message that a function declared foo(void) in c should be
>> declared foo() in Cython? Would that be sufficient?
>
> Yes, regarding all your comments, I now believe that the best is to
> just generate a better error message

OK, I'll put this on my todo list.

>> because I hope that most copy/paste style declarations can be handled
>> automatically (e.g. one could just write cinclude "header.h" and it
>> would find all the functions/constants, though this would not be near
>> as powerful).
>
> Nice! that's a nice feature. But then, in that case, the parser HAS to
> accept the void foo(void) inside a C header file, right?

Yes. Note that this is not yet implemented (and will be a fair amount of work) and will not necessarily be as good as a hand-written one.

>> BTW, any specific reason you took these discussions off-list (as I
>> think they would be of general interest?)
>
> Ups!
just because I simply pick 'reply' in Gmail, and you continued
> the thread writing to me and CC'ing to the list. In other mailing
> lists configuration, the 'reply-to' is by default set to the list.
>
> BTW, why cython-devel list does not specify a 'reply-to' by default
> being the list itself?

This has annoyed me too, but it just now hit me that I could probably go in and change it. Done.

- Robert

>>> On 4/11/08, Robert Bradshaw wrote:
>>>
>>>> On Apr 11, 2008, at 3:53 PM, Lisandro Dalcin wrote:
>>>>
>>>>> What's the rationale of the following being illegal in Cython?
>>>>>
>>>>> cdef extern from "bar.h":
>>>>>     void foo(void)
>>>>
>>>> This is only because foo() is the way to specify a function with no
>>>> arguments in Python. It is more useful to think of Cython as
>>>> Python (+some static declarations) than to try and think of it as
>>>> being a syntactically-different variant of C.
>>>>
>>>>> If I'm not wrong, in a C context (but not C++), the declarations
>>>>>
>>>>> void foo(void);
>>>>> void foo();
>>>>>
>>>>> are not equivalent, the second actually means something like
>>>>>
>>>>> void foo(...)
>>>>
>>>> It's even worse, it means "I'm just too lazy to tell you what the
>>>> arguments are, but if you use the wrong ones bad things could
>>>> happen." IIRC it's only around for historical reasons and
>>>> officially discouraged by ANSI C, and it would be bad (in my
>>>> opinion) to allow such archaic and confusing notions into Cython.
>>>>
>>>>> Anyway, as void foo(void) is a valid C (C++?) function
>>>>> declaration, I would ask for that form being legal in Cython,
>>>>> unless there is some
>>>>> technical issue like leading to an ambiguous grammar.
>>>>>
>>>>
>>>> If we allow foo() and foo(void) in Cython then people might wonder
>>>> if the two have different meanings (as they do in C), and such a
>>>> declaration might look confusing to a Python developer. On the
>>>> other hand, it will make it easier to do copy-pasting from header
>>>> files (which eventually, I hope, could be automated in many cases).
>>>>
>>>> - Robert
>>>>
>>>
>>> --
>>> Lisandro Dalcín
>>> ---------------
>>> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
>>> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
>>> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
>>> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
>>> Tel/Fax: +54-(0)342-451.1594
>>>
>>
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594

From robertwb at math.washington.edu Sat Apr 12 03:27:25 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 11 Apr 2008 18:27:25 -0700
Subject: [Cython] a = b = (cdef) cval
In-Reply-To:
References: <28E29F85-ECD8-46BC-8B17-ECC61AAE5EC0@math.washington.edu>
Message-ID: <045B3A32-CAFD-484A-B801-A522A68C84ED@math.washington.edu>

On Apr 11, 2008, at 5:33 PM, Lisandro Dalcin wrote:

> Ok and fine, but just to be sure, if I do
>
> cdef int val = 123456789 # large int
> a = b = val # 'a' and 'b' are then python integers
> assert a is b # test for object identity
>
> then the assert should or should not pass in Cython?
Should or should
> not Cython behave like Python in the following case::
>
> In [1]: a = b = 12345678
>
> In [2]: a is b
> Out[2]: True
>
> In [3]: a = 12345678
>
> In [4]: b = 12345678
>
> In [5]: a is b
> Out[5]: False

No, it should not be True in Cython--if conversion is needed it is executed at each assignment (as in my previous example). Of course one can't write that Cython test case in Python to try it out. As another example, consider:

cdef double a
cdef int b
a = b = float(1.5) # make it a Python object or otherwise the compiler will (fortunately) complain.
print a, b

1.5 1

Note that integer literals in Cython are not Python objects (for obvious speed reasons). In Python they are. This is a subtle point, but fortunately one rarely has to worry about it.

> On 4/11/08, Robert Bradshaw wrote:
>> On Apr 11, 2008, at 4:58 PM, Lisandro Dalcin wrote:
>>
>>> It seems the code generated by
>>>
>>> cdef MyClass cval = MyClass()
>>>
>>> a = b = cval
>>>
>>> is not equivalent in semantics to Python, 'a is b' should be true.
>>
>> Actually it follows Python semantics exactly. It does not follow C
>> semantics (which treats b = cval as an expression). There are many
>> fewer cases in Python where code is executed on assignment, but
>> unpacking is one of them. Here is a demonstration:
>>
>> class Unpackable:
>>     def __iter__(self):
>>         try:
>>             self.n += 1
>>             print "unpacking", self.n
>>             return iter([self.n])
>>         except AttributeError:
>>             self.n = 0
>>             return iter(self)
>>
>> (a,) = (b,) = (c,) = Unpackable(); print a, b, c
>>
>> unpacking 1
>> unpacking 2
>> unpacking 3
>> 1 2 3
>>
>>> I actually did this
>>>
>>> cdef int cval = 10
>>> a = b = cval
>>>
>>> then 'a is b' is true, because of the internal Python cache for
>>> small integers.
>>
>> Yep.
>>
>> I am grateful for your attention to details, I am sure it will
>> help make Cython better.
>>
>> - Robert
>>
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594

From stefan_ml at behnel.de Sat Apr 12 14:42:43 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 12 Apr 2008 14:42:43 +0200
Subject: [Cython] Cython circular cdef import patch
In-Reply-To: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com>
References: <8f8f8530803251117x4efeeafch7243af2b9b302f2e@mail.gmail.com>
Message-ID: <4800AE43.7080006@behnel.de>

Hi,

Gary Furnish wrote:
> This patch adds extra logic to code generation to sort dependencies to
> guarantee that C code is generated in the right order for circular cdef
> imports.

And another issue with the original patch:

https://bugs.launchpad.net/cython/+bug/215550

Stefan

From dagss at student.matnat.uio.no Sat Apr 12 15:42:27 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sat, 12 Apr 2008 15:42:27 +0200
Subject: [Cython] CEP 513 - Unified Python and Cython namespaces
In-Reply-To: <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu>
Message-ID: <4800BC43.6030208@student.matnat.uio.no>

Robert Bradshaw wrote:
> I added several comments. I'm not convinced it will make things
> clearer (or if it will muddy things up even more) but definitely food
> for thought.

Thanks, your comments definitely helped me separate and present the concepts better. I renamed the page and moved some stuff around; it can now be found here:

http://wiki.cython.org/enhancements/overlaypythonmodules
http://wiki.cython.org/enhancements/builtins

Clarity: It might be a matter of taste.
The latter link is about something I think makes the Cython language "less heavy" by removing a few special cases that are not allowed today. The idea in the former link I think will be a little complicated but as clear as the alternative I see, which is magical rewrites in the Cython compiler core and having to use separate names for the type "c_numpy.ndarray" and the constructor "numpy.ndarray". (Using c_numpy is clearer from a low-level perspective but I think it adds a significant learning curve.)

Most of this is not that important now though (though some of it is NumPy-relevant). Nailing a good type argument syntax is more important and can be considered separately from a usability perspective -- the very first response I got from the NumPy community on the specs was that "the ndarray constructor should take the same parameters as we are used to", when what they were looking at was really a parametrization of the ndarray type.

Dag Sverre

From robertwb at math.washington.edu Sat Apr 12 19:12:38 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 12 Apr 2008 10:12:38 -0700
Subject: [Cython] CEP 513 - Unified Python and Cython namespaces
In-Reply-To: <4800BC43.6030208@student.matnat.uio.no>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no>
Message-ID: <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu>

On Apr 12, 2008, at 6:42 AM, Dag Sverre Seljebotn wrote:

> Robert Bradshaw wrote:
>> I added several comments. I'm not convinced it will make things
>> clearer (or if it will muddy things up even more) but definitely
>> food for thought.
> Thanks, your comments definitely helped me separate and present the
> concepts better.
I renamed the page and moved some stuff around; it
> can now be found here:
>
> http://wiki.cython.org/enhancements/overlaypythonmodules
> http://wiki.cython.org/enhancements/builtins
>
> Clarity: It might be a matter of taste. The latter link is about
> something I think makes the Cython language "less heavy" by
> removing a few special cases that are not allowed today. The idea in
> the former link I think will be a little complicated but as clear
> as the alternative I see, which is magical rewrites in the Cython
> compiler core and having to use separate names for the type
> "c_numpy.ndarray" and the constructor "numpy.ndarray". (Using
> c_numpy is clearer from a low-level perspective but I think it adds
> a significant learning curve.)

Just to clarify where you're coming from, where in Cython (as it is now) would one ever have to use c_numpy.ndarray (rather than using numpy.ndarray everywhere)?

> Most of this is not that important now though (though some of it is
> NumPy-relevant). Nailing a good type argument syntax is more
> important and can be considered separately from a usability
> perspective -- the very first response I got from the NumPy
> community on the specs was that "the ndarray constructor should
> take the same parameters as we are used to", when what they were
> looking at was really a parametrization of the ndarray type.
>
> Dag Sverre

From dagss at student.matnat.uio.no Sat Apr 12 19:30:38 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sat, 12 Apr 2008 19:30:38 +0200
Subject: [Cython] CEP 513 - Unified Python and Cython namespaces
In-Reply-To: <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu>
Message-ID: <4800F1BE.5030401@student.matnat.uio.no>

>
> Just to clarify where you're coming from, where in Cython (as it is
> now) would one ever have to use c_numpy.ndarray (rather than using
> numpy.ndarray everywhere)?

From the land of big, bad unwarranted assumptions. I think this thought entered my head because of how all the numpy examples were written; but I really should have investigated it more.

So apparently, overlaying is already happening (defining C functions in the pxd currently overlays functions in the Python module correctly). I'll update the documents accordingly at some point (later); I still think there's a point in there though.

--
Dag Sverre

From dagss at student.matnat.uio.no Sat Apr 12 20:36:05 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sat, 12 Apr 2008 20:36:05 +0200
Subject: [Cython] Getting hands dirty -- phase refactoring
Message-ID: <48010115.3050809@student.matnat.uio.no>

I thought I'd continue on my inner function code; however I discovered that it cannot be made efficient (i.e., without packing/unpacking the closured variables into untyped Python objects) until some phase refactoring happens. So I'll leave it for when that is done (good to have a simple example to try it on though).

I've written a little bit about possible strategies:

http://wiki.cython.org/enhancements/phaseseperation

I could have done more here but some of you know the source so much much better...
so see if anything smells fishy about my assumptions, and perhaps comment and/or vote on strategies...

--
Dag Sverre

From dalcinl at gmail.com Sun Apr 13 00:34:21 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Sat, 12 Apr 2008 19:34:21 -0300
Subject: [Cython] automating inplace extension build with distutils
Message-ID:

Please take a look at the simple-minded script attached. It can build a Cython pyx file, producing an extension module in place.

Perhaps this can serve as a starting point for reusing standard Python distutils to implement the experimental '-C' and '-X' flags in a portable way.

The first part of the script plays with the environment just because I needed it in the process of developing the new mpi4py (I need to pass 'mpicc'/'mpicxx' as the compiler/linker command). So in general this is not needed.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cy2py
Type: application/octet-stream
Size: 1240 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20080412/8c0c261e/attachment.obj

From dalcinl at gmail.com Sun Apr 13 01:09:36 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Sat, 12 Apr 2008 20:09:36 -0300
Subject: [Cython] a warning on using string/buffer objects for getting tmp memory
Message-ID:

I've just realized that using a string or buffer object for automatic management of memory as I proposed has a pitfall: memory alignment is not guaranteed.

So perhaps the only way to go with this trick is to use a custom python object internally calling malloc/free.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From robertwb at math.washington.edu Sun Apr 13 02:33:43 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 12 Apr 2008 17:33:43 -0700
Subject: [Cython] a warning on using string/buffer objects for getting tmp memory
In-Reply-To:
References:
Message-ID:

On Apr 12, 2008, at 4:09 PM, Lisandro Dalcin wrote:

> I've just realized that using a string or buffer object for automatic
> management of memory as I proposed has a pitfall: memory alignment is
> not guaranteed.
>
> So perhaps the only way to go with this trick is to use a custom
> python object internally calling malloc/free.

It looks like strings are aligned on int boundaries (given their struct). What guarantee does one have about malloc? I would imagine we would have a custom object (which would be very simple) and even faster than a stringobject.

- Robert

From robertwb at math.washington.edu Sun Apr 13 02:41:50 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 12 Apr 2008 17:41:50 -0700
Subject: [Cython] Getting hands dirty -- phase refactoring
In-Reply-To: <48010115.3050809@student.matnat.uio.no>
References: <48010115.3050809@student.matnat.uio.no>
Message-ID: <61C0F9CD-8EC3-4DB5-B05D-957C4A2BF5F5@math.washington.edu>

I think the way to go about this is to make inner functions into classes, with the bound variables being class members (C or Python types). This will also allow us to use the framework to do yield statements as well.
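
To illustrate the idea of turning an inner function into a class, here is a small pure-Python sketch. It only assumes the general technique named above; the names make_adder_closure, AdderClosure and make_adder_class are hypothetical and do not come from the Cython source. The captured variable of the inner function becomes an attribute of an object, and calling the object plays the role of calling the inner function:

```python
def make_adder_closure(n):
    # Ordinary inner function: `n` is captured from the enclosing scope.
    def add(x):
        return x + n
    return add


class AdderClosure(object):
    """Hand-written equivalent: the captured variable becomes a member."""

    def __init__(self, n):
        self.n = n  # bound variable stored as an attribute

    def __call__(self, x):
        # Body of the former inner function, reading `n` from the instance.
        return x + self.n


def make_adder_class(n):
    # The outer function now just instantiates the closure object.
    return AdderClosure(n)


print(make_adder_closure(3)(4))
print(make_adder_class(3)(4))
```

In a compiled setting the attribute could presumably carry a static C type instead of being an untyped Python object, which is the efficiency concern raised in the original message.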

- Robert

On Apr 12, 2008, at 11:36 AM, Dag Sverre Seljebotn wrote:

> I thought I'd continue on my inner function code; however I discovered
> that it cannot be made efficient (i.e., without packing/unpacking the
> closured variables into untyped Python objects) until some phase
> refactoring happens. So I'll leave it for when that is done (good to
> have a simple example to try it on though).
>
> I've written a little bit about possible strategies:
>
> http://wiki.cython.org/enhancements/phaseseperation
>
> I could have done more here but some of you know the source so much
> much better... so see if anything smells fishy about my assumptions,
> and perhaps comment and/or vote on strategies...
>
> --
> Dag Sverre
>
> _______________________________________________
> Cython-dev mailing list
> Cython-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/cython-dev

From martin at martincmartin.com Sun Apr 13 03:09:12 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Sat, 12 Apr 2008 21:09:12 -0400
Subject: [Cython] a warning on using string/buffer objects for getting tmp memory
In-Reply-To:
References:
Message-ID: <48015D38.3070806@martincmartin.com>

Robert Bradshaw wrote:
> On Apr 12, 2008, at 4:09 PM, Lisandro Dalcin wrote:
>
>> I've just realized that using a string or buffer object for automatic
>> management of memory as I proposed has a pitfall: memory alignment is
>> not guaranteed.
>>
>> So perhaps the only way to go with this trick is to use a custom
>> python object internally calling malloc/free.
>
> It looks like strings are aligned on int boundaries (given their
> struct).

On 64 bit machines, gcc uses 32 bits for ints, so simply aligning on int boundaries wouldn't get you 64 bit aligned.

> What guarantee does one have about malloc?

It depends on the implementation.
For glibc:

http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks

"The address of a block returned by malloc or realloc in the GNU system is always a multiple of eight (or sixteen on 64-bit systems)."

You could always do what memalign() does: if e.g. you need something aligned on 8 byte boundaries, but Python's string allocation only uses 4 byte boundaries, then allocate an extra 4 bytes, and if the result isn't 8 byte aligned, return the address + 4.

> I would imagine
> we would have a custom object (which would be very simple) and even
> faster than a stringobject.

Best,
Martin

From robertwb at math.washington.edu Sun Apr 13 04:12:48 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 12 Apr 2008 19:12:48 -0700
Subject: [Cython] CEP 507/513
In-Reply-To: <4800F1BE.5030401@student.matnat.uio.no>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no>
Message-ID: <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu>

I am starting an email thread about these two Cython Enhancement Proposals because I think this is a better forum than embedding lots of comments in the wiki page.

The original proposal was that some Python objects (e.g. list, tuple, dict, str) would be known by the compiler and optimized accordingly (e.g. using the appropriate macros for indexing). One thing that is radical about this proposal (and why a pxd file wouldn't suffice) is that something declared "cdef list" would have to be *exactly* a list, not a subclass of list. In this case I think it is worth it because there are enormous gains to be made and 99% of Python lists are actually just lists. These would be exactly like C types which have no hierarchy.
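
The difference between "exactly a list" and "a list or any subclass" can be seen in plain Python. This is only an illustrative sketch: MyList, is_exactly_list and is_list_or_subclass are hypothetical names, and the comment about C-level indexing describes the general concern rather than any specific generated code.

```python
class MyList(list):
    """A user-defined subclass of list (hypothetical)."""

    def __getitem__(self, index):
        # Overridden indexing; direct C-level access to the list's
        # storage would not go through this method.
        return "overridden"


def is_exactly_list(obj):
    # Exact-type check: rejects subclasses.
    return type(obj) is list


def is_list_or_subclass(obj):
    # Conventional check: accepts subclasses too.
    return isinstance(obj, list)


plain = [1, 2, 3]
sub = MyList([1, 2, 3])

print(is_exactly_list(plain))
print(is_exactly_list(sub))
print(is_list_or_subclass(sub))
print(sub[0])
print(list.__getitem__(sub, 0))
```

The last two lines show why the exact-type requirement matters: bypassing the subclass's own __getitem__ gives a different answer than ordinary indexing does.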

In general type declarations of Python objects should accept subclasses of that object. Great pains are taken to make subclassing work well for extension types (vtables for cdef methods, all the magic that makes cpdef methods and optional arguments work). This is in fact one of the main tenets of object oriented programming. This is why statements like

> cdef T a = x
>
> * With the exception of object, subtype instances can not be
> assigned. x has to be exactly of the type T. Some extension syntax
> (descendants(T) or similar) can be added later, but it doesn't seem
> like a normal usecase.

make me quite hesitant.

In the question of being allowed to do

> cdef T a = x

for T a python class (not cimported, and not even necessarily a type) I am not sure this is a good thing. The *only* reason we declare types for python objects is to be able to do static binding. If T is not statically declared, then there is no advantage (other than perhaps type checking which can be done anyways). With no advantages, and given that it goes against the "duck typing" philosophy of Python (though one can always manually check the type if one needs it), I'm not convinced that we want to go this route.

I would like more feedback on this from the general community before rejecting it outright however.

- Robert

From martin at martincmartin.com Sun Apr 13 13:48:48 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Sun, 13 Apr 2008 07:48:48 -0400
Subject: [Cython] CEP 507/513
In-Reply-To: <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu>
Message-ID: <4801F320.80804@martincmartin.com>

I too would be worried about changing the semantics of cdef T a = x.
But what about: cdef T a = x assert a.__class__ == T This makes it valid Python, which pins down the type of "a" exactly. In fact, at this point you don't even need the "cdef T". It's slightly ugly, but with luck, that will discourage people from premature optimization. Best, Martin Robert Bradshaw wrote: > I am starting an email thread about these two Cython Enhancement > Proposals because I think this is a better forum than embedding lots > of comments in the wiki page. > > The original proposal was that some Python objects (e.g. list, tuple, > dict, str) would be known by the compiler and optimized accordingly > (e.g. using the appropriate macros for indexing). One thing that is > radical about this proposal (and why a pxd file wouldn't suffice) is > that something declared "cdef list" would have to be *exactly* a > list, not a subclass of list. In this case I think it is worth it > because there are enormous gains to be made and 99% of Python lists > are actually just lists. These would be exactly like C types which > have no hierarchy. > > In general type declarations of Python objects should accept > subclasses of that object. Great pains are taken to make subclassing > work well for extension types (vtables for cdef methods, all the > magic that makes cpdef methods and optional arguments work). This is > in fact one of the main tenants of object oriented programing. This > is why statements like > >> cdef T a = x >> >> * With the exception of object, subtype instances can not be >> assigned. x has to be exactly of the type T. Some extension syntax >> (descendants(T) or similar) can be added later, but it doesn't seem >> like a normal usecase. > > make me quite hesitant. > > In the question of being allowed to do > >> cdef T a = x > > > for T a python class (not cimported, and not even necessarily a type) > I am not sure this is a good thing. The *only* reason we declare > types for python objects is to be able to do static binding. 
If T is > not statically declared, then there is no advantage (other than > perhaps type checking which can be done anyways). With no advantages, > and it goes against the "duck typing" philosophy of Python (though > one can always manually check the type if one needs it), I'm not > convinced that we want to go this route. > > I would like more feedback on this from the general community before > rejecting it outright however. > > - Robert > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From dagss at student.matnat.uio.no Sun Apr 13 15:12:53 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 13 Apr 2008 15:12:53 +0200 Subject: [Cython] Getting hands dirty -- phase refactoring In-Reply-To: <48010115.3050809@student.matnat.uio.no> References: <48010115.3050809@student.matnat.uio.no> Message-ID: <480206D5.8020201@student.matnat.uio.no> > I've written a little bit about possible strategies: > > http://wiki.cython.org/enhancements/phaseseperation > > I could have done more here but some of you know the source so much much > better... so see if anything smells fishy about my assumptions, and > perhaps comment and/or vote on strategies... > OK, following the example of taking things on the mailing list: Robert added that the current approach of per-function phases "[...] is done because function bodies should have access to (fully typed) module-level symbols." My response is (better formatted on http://wiki.cython.org/enhancements/phaseseperation): === Problem 1: Functions needs type information of other functions prior to type analysis === The problem is with this code snippet: {{{ #!python def bar(): cdef int x = foo() print x cdef int foo(): return 42 }}} Here, the type analysis phase inside bar() should know that foo() returns an int, however analysis hasn't reached this point yet. 
Luckily, this problem doesn't "recurse": inner functions and classes have to be declared before they are used.

Also, if inner class support gets added to Cython (I don't think it is there now, but I'm not 100% sure), then this presents a possible problem:

{{{
#!python
cdef int foo():
    return A.B.something

class A:
    class B:
        cdef int field = foo()
}}}

However, Python specifies that code within class definitions is run at class definition time, meaning there isn't a problem here: it would be illegal to move the definition of foo() to below class A.

Still, this example points to how complicated this can be -- from 10 minutes of experimentation, a working ruleset for this seems to be:

 1. In module scope, handle functions breadth-first but recurse depth-first into classes.
 2. Deal with the functions, going depth-first into inner functions and classes.

'''Solution 1:''' The naive approach: have two type analysis passes, one at module level and one at function level. These can be separate phases, both controlled from the top level. Basically that is the same approach as today, but more explicit and with better control over phases.

'''Solution 2:''' (Strategy 2 or 3 below is mandatory for this): In the type analysis controller visitor, one can embed custom rules for the flow. So basically one starts off visiting the tree breadth-first rather than depth-first in module scope, while within functions one should go depth-first to properly process inner functions and classes.

dagss: +1 for 2. Since this is a bit complicated to get right, solution 2 seems to allow programming the right control flow for type analysis more explicitly, and doing the necessary tuning etc.
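The two-pass idea behind Solution 1 can be sketched in plain Python as follows. This is only a toy model under stated assumptions, not Cython's compiler code: the record layout (name, returns, calls) and the function names collect_signatures and analyse_bodies are made up for illustration, and untyped functions are assumed to return a generic "object".

```python
# Toy "module": bar() calls foo(), but foo() is declared after bar(),
# as in the example from Problem 1.
MODULE = [
    {"name": "bar", "returns": None, "calls": ["foo"]},
    {"name": "foo", "returns": "int", "calls": []},
]


def collect_signatures(module):
    """Pass 1: record the declared return type of every module-level function."""
    return {func["name"]: func["returns"] or "object" for func in module}


def analyse_bodies(module, signatures):
    """Pass 2: resolve the type of each call using the table from pass 1."""
    resolved = {}
    for func in module:
        resolved[func["name"]] = {
            callee: signatures[callee] for callee in func["calls"]
        }
    return resolved


signatures = collect_signatures(MODULE)
call_types = analyse_bodies(MODULE, signatures)

# bar() now knows that foo() returns an int, although foo() comes later.
print(call_types["bar"])
```

Because the signature table is complete before any body is analysed, the order of the functions in the module no longer matters for this lookup.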
-- Dag Sverre From dagss at student.matnat.uio.no Sun Apr 13 15:36:49 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 13 Apr 2008 15:36:49 +0200 Subject: [Cython] CEP 507/513 In-Reply-To: <4801F320.80804@martincmartin.com> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> Message-ID: <48020C71.6090100@student.matnat.uio.no> Martin C. Martin wrote: > I too would be worried about changing the semantics of cdef T a = x. > But what about: > > cdef T a = x > assert a.__class__ == T > > This makes it valid Python, which pins down the type of "a" exactly. > In fact, at this point you don't even need the "cdef T". > > It's slightly ugly, but with luck, that will discourage people from > premature optimization. It's still dangerous: a.myfunc = myoverride assert a.__class__ == T a.myfunc() (On the subject of optimizations, one could also augment the CEP by having the notation "exactly(MyClass)" mean "early-bind all calls and I don't care about the consequences". Using "exactly(list)" would then mean that builtins wouldn't need to be a special-case either. But I'm fine with "list" being a special-case here.) But at any rate, the point of the CEP was to have more consistency in Cython syntax, not necesarrily adding optimizations. Robert makes a very good case (and has fully convinced me) for needing to support descendants being assigned (that not allowing that wasn't one of my brightest ideas). But independtly of this, it looks like C types, Python builtins, and Python extension types declared in pxd files are going to be supported. Simply allowing Python types as well simply makes it more consistent, even if it isn't that useful... 
On the other hand, if it doesn't have a useful features, there definitely is a case for not bothering to implement it (well, it is useful for function overloading, but then again function overloading is less useful for Python objects :-) ). So I really wouldn't mind if this is rejected. CEP 513 should be independent of this though? -- Dag Sverre From dagss at student.matnat.uio.no Sun Apr 13 16:12:52 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 13 Apr 2008 16:12:52 +0200 Subject: [Cython] CEP 507/513 In-Reply-To: <48020C71.6090100@student.matnat.uio.no> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> Message-ID: <480214E4.7080503@student.matnat.uio.no> > (On the subject of optimizations, one could also augment the CEP by > having the notation "exactly(MyClass)" mean "early-bind all calls and I > don't care about the consequences". Using "exactly(list)" would then > mean that builtins wouldn't need to be a special-case either. But I'm > fine with "list" being a special-case here.) > A digression: While cdef exactly(MyClass) x = MyClass() x.foo() # compile-time bound isn't something that would be that much used by human developers, having it as a language feature could make some optimizations from type inference conceptually simpler, so that x = MyClass() x.foo() would be compile-time bound. Still, dict overrides makes knowing when it is safe to do so non-trivial. Like I said, a digression. 
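A minimal plain-Python sketch of why instance dict overrides make compile-time binding unsafe in general. The names MyClass, override and early_bound are hypothetical; early_bound stands in for what an early-bound call would invoke, it is not how Cython generates code.

```python
class MyClass(object):
    def foo(self):
        return "MyClass.foo"


def override():
    return "override"


x = MyClass()

# What a compile-time binding would call: the function found on the class.
early_bound = MyClass.foo

# An entry in the instance __dict__ shadows the method for normal lookups.
x.foo = override

late_result = x.foo()          # ordinary Python attribute lookup
early_result = early_bound(x)  # bypasses the instance __dict__

print(late_result, early_result)
```

The two calls disagree as soon as the instance attribute is set, so an optimizer has to prove that no such assignment can happen before it binds the call early.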
-- Dag Sverre From stefan_ml at behnel.de Sun Apr 13 17:13:08 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 13 Apr 2008 17:13:08 +0200 Subject: [Cython] CEP 507/513 In-Reply-To: <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> Message-ID: <48022304.9010801@behnel.de> Hi, Robert Bradshaw wrote: > The original proposal was that some Python objects (e.g. list, tuple, > dict, str) would be known by the compiler and optimized accordingly > (e.g. using the appropriate macros for indexing). One thing that is > radical about this proposal (and why a pxd file wouldn't suffice) is > that something declared "cdef list" would have to be *exactly* a > list, not a subclass of list. In this case I think it is worth it > because there are enormous gains to be made and 99% of Python lists > are actually just lists. These would be exactly like C types which > have no hierarchy. I think this is the right way to do it. Due to the usual naming convention of having extension types start with an upper-case letter, these declarations even look like C types - and the only reason I could imagine why you would declare a variable "cdef list" is to avoid having to write PyList_Append(l, x) print PyList_GetItem(l, i) instead of l.append(x) print l[i] You wouldn't gain *anything* by declaring a variable "cdef list" that holds a subtype of list. 
BTW, at least PyList_Append() seems to work just fine for subtypes:

------------------------------------
cdef extern from "Python.h":
    ctypedef class __builtin__.list [object PyListObject]:
        pass
    cdef int PyList_Append(object l, object obj) except -1

cdef class csublist(list):
    def append(l, x):
        print "APPEND", x

l = csublist()
PyList_Append(l, 1)
print l

import __builtin__
class pysublist(__builtin__.list):
    def append(l, x):
        print "APPEND", x

l = pysublist()
PyList_Append(l, 1)
print l
------------------------------------

The "APPEND"s are not printed (as expected), so the output is:

------------------------------------
[1]
[1]
------------------------------------

Stefan

From martin at martincmartin.com Sun Apr 13 17:34:46 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Sun, 13 Apr 2008 11:34:46 -0400
Subject: [Cython] CEP 507/513
In-Reply-To: <48020C71.6090100@student.matnat.uio.no>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no>
Message-ID: <48022816.2000104@martincmartin.com>

Dag Sverre Seljebotn wrote:
> Martin C. Martin wrote:
>> I too would be worried about changing the semantics of cdef T a = x.
>> But what about:
>>
>> cdef T a = x
>> assert a.__class__ == T
>>
>> This makes it valid Python, which pins down the type of "a" exactly.
>> In fact, at this point you don't even need the "cdef T".
>>
>> It's slightly ugly, but with luck, that will discourage people from
>> premature optimization.
> It's still dangerous:
>
> a.myfunc = myoverride
> assert a.__class__ == T
> a.myfunc()

True. It would be good to have a way to turn off the per-instance metaclass for instances of a given type, i.e.
not allow functions in the object's __dict__. It wouldn't be that hard: in the constructor you could replace the object's __dict__ with a subclass of __dict__ that overrides the setter. But then you have the same problem with the class, namely

a = T()
T.myfunc = myoverride
a.myfunc() # Calls myoverride.

Not sure how you'd disable it in the class.

> (On the subject of optimizations, one could also augment the CEP by
> having the notation "exactly(MyClass)" mean "early-bind all calls and I
> don't care about the consequences". Using "exactly(list)" would then
> mean that builtins wouldn't need to be a special-case either. But I'm
> fine with "list" being a special-case here.)

Well, special cases make things very confusing for the programmer, so I'm against them when possible.

Best,
Martin

From stefan_ml at behnel.de Sun Apr 13 18:06:01 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 13 Apr 2008 18:06:01 +0200
Subject: [Cython] CEP 507/513
In-Reply-To: <48020C71.6090100@student.matnat.uio.no>
References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no>
Message-ID: <48022F69.3070106@behnel.de>

Hi,

Dag Sverre Seljebotn wrote:
> a.myfunc = myoverride
> assert a.__class__ == T
> a.myfunc()

This doesn't work with extension types, they don't have a __dict__.

Stefan

From dalcinl at gmail.com Mon Apr 14 15:51:50 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 14 Apr 2008 10:51:50 -0300
Subject: [Cython] CyRe: a warning on using string/buffer objects for getting tmp memory
Message-ID:

One more point in favour of Cython handling this automatically in the future!
We have to malloc and perhaps memalign for generating temporary buffers. I forgot to mention the memalign issue because GNU systems automatically do that for you.

On 4/12/08, Martin C. Martin wrote: > > > Robert Bradshaw wrote: > > > On Apr 12, 2008, at 4:09 PM, Lisandro Dalcin wrote: > > > > > > > I've just realized that using a string or buffer object for automatic > > > management of memory as I proposed has a pitfall: memory alignement is > > > not guaranteed. > > > > > > So perhaps the only way to go is with this trick is to use a custom > > > python object internally calling malloc/free. > > > > > > > It looks like strings are aligned on int boundaries (given their struct). > > > > On 64 bit machines, gcc uses 32 bits for ints, so simply aligning on int > boundaries wouldn't get you 64 bit aligned. > > > > What guarantee does one have about malloc? > > > > It depends on the implementation. For glibc: > > http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks > > "The address of a block returned by malloc or realloc in the GNU system is > always a multiple of eight (or sixteen on 64-bit systems)." > > You could always do what memalign() does: if e.g. you need something > aligned on 8 byte boundaries, but Python's string allocation only uses 4 > byte boundaries, then allocate an extra 4 bytes, and if the result isn't 8 > byte aligned, return the address + 4. > > > > I would imagine we would have a custom object (which would be very > simple) and even faster than a stringobject.
> > > > Best, > Martin >

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From dalcinl at gmail.com Mon Apr 14 16:45:17 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 14 Apr 2008 11:45:17 -0300
Subject: [Cython] bitten by const
Message-ID:

Is there a way to handle a call to this guy, regarding current Cython limitations about const?

int PyObject_AsReadBuffer (PyObject*, const void**, Py_ssize_t*)

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From stefan_ml at behnel.de Mon Apr 14 17:10:33 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Apr 2008 17:10:33 +0200
Subject: [Cython] bitten by const
In-Reply-To:
References:
Message-ID: <480373E9.8000304@behnel.de>

Lisandro Dalcin wrote:
> Is there a way to handle a call to this guy, regarding current Cython
> limitations about const?
>
> int PyObject_AsReadBuffer (PyObject*, const void**, Py_ssize_t*)

You want to call that function? Then declare it

cdef extern from "Python.h":
    cdef int PyObject_AsReadBuffer (PyObject* a, void** b, Py_ssize_t* c)

Doesn't that work?
Stefan

From dalcinl at gmail.com Mon Apr 14 19:03:58 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 14 Apr 2008 14:03:58 -0300
Subject: [Cython] improving readability of generated sources
Message-ID:

In the attached script (please note I'm a complete beast using regular expressions), I've attempted to transform Cython output related to checking errors, saving file and lineno, and jumping to label. IMHO, the generated source is more readable. Additionally, the final source is slightly smaller in size. What do you think?

BTW, I noticed that an expression like 'raise Something' generates the corresponding C code (jumping to label) one line below the actual call to __Pyx_Raise. Perhaps the goto should be emitted on the same line (so that the __LINE__ macro is used on the right line).

Sorry if this is too much noise. Please ask me to stop with all this crap at any point.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cycompact
Type: application/octet-stream
Size: 954 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20080414/0f1d1a8e/attachment.obj

From stefan_ml at behnel.de Tue Apr 15 10:55:31 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Apr 2008 10:55:31 +0200
Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
Message-ID: <48046D83.2070806@behnel.de>

Hi,

one of the goals of Cython is to "compile Python code". I think we should be clearer here. I would opt for making Python 2.6 the target syntax and eventually write a separate/enhanced/whatever parser for Python 3.0 syntax and semantics (unicode/bytes literals, new keywords, etc.).
This has several advantages, the most important one being code compatibilty. While it will be work to migrate from Py2 to Py3, it shouldn't affect Cython users and the existing Cython code. Also, I really like the fact that "test" is a plain byte string in Cython that can directly be converted to a C char*, depending on its use. This shouldn't change, even if Py3 dictates that this literal becomes a Unicode string. Cython positions itself between Python and C, and that's a place where the plain string literal semantics make perfect sense. Supporting the b"test" bytes syntax *in addition* is ok with me, as is the u"unicode" syntax, which Python2 and Cython currently use. I think it makes sense to be explicit about unicode objects in the context of Cython. Having a separate Cython frontend (cython3? or a command line option "-3"?) and a distutils Extension option for compiling Python3 code with Python3 semantics might be a way to deal with the syntax issue. But I would actually prefer a different source file extension (.cy3) or a special comment in the first code line, or something like that. The language level is an integral part of the source file, not so much of the build system. Even from __future__ import python3 might work. ;) Any comments on this? Stefan From dagss at student.matnat.uio.no Tue Apr 15 11:18:40 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 Apr 2008 11:18:40 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48046D83.2070806@behnel.de> References: <48046D83.2070806@behnel.de> Message-ID: <480472F0.9030305@student.matnat.uio.no> > Also, I really like the fact that "test" is a plain byte string in Cython that > can directly be converted to a C char*, depending on its use. This shouldn't > change, even if Py3 dictates that this literal becomes a Unicode string. > What exactly are the consequences here... 
if it is just about the runtime object used then I suppose it can be inferred from context? (I.e., coercion to char* deals with it...) Or does it mean that string literals converted to char* should be UTF-8 strings or something? What is the current behaviour for string literals anyway..probably that the encoding of the Cython source gets carried through to the strings in C source? > Having a separate Cython frontend (cython3? or a command line option "-3"?) > and a distutils Extension option for compiling Python3 code with Python3 > semantics might be a way to deal with the syntax issue. But I would actually > prefer a different source file extension (.cy3) or a special comment in the > first code line, or something like that. The language level is an integral > part of the source file, not so much of the build system. Even > +1 for comment in first lines; Python already has "encoding:". Something like "lang:". This way, one could in the future also have different Cython syntax levels if one needs to break Cython backwards compatability (though let's hope not), or to specify that extensions to Cython++ (or whatever) is used. Also it could be nice (though not at all a priority!) to have a mode that actually gave a syntax error on "cdef", if one is strictly compiling Python code because one wants to be Python compatible and not "slip up" into writing some Cython. Probably this can be triggered by .py extension. (This is not as much a suggestion, as it is just brainstorming in order to see what kind of syntax levels one could have in order to sort Stefan's question out.) One could even load Cython plugins (that modify Cython syntax) this way. Perhaps a free-form comma-seperated list of keywords like this that operate along different "axes" of language choices: #lang: cython, py2, martins-macro-goodies I.e. 
"cython" would enable any Cython-specific syntax like "cdef" and types, pyX would set the Python level support for the Python part of the syntax, and anything else could load a plugin with the same name if found. Loading the plugin from the source file itself makes sense because the syntax in the source file is inherently linked to which plugins can parse it; it's a way of saying that "I'll use "deftrans" in this file". Using #lang: py2 it would still be compileable in Cython, but Cython extensions to syntax like "cdef" etc. would be disallowed. If we need to break Cython backwards-compatability, one could do it by only enabling the changes if writing "cython2" instead. The #lang statement would override anything, but without it one would look at the file extension (pyx could mean "cython, py2" while py would mean "py2"). Also, there's the question of input and output. This has all been about input, however it seems like code generation is partially independent of this. Probably least confusing will be outputting 2.6 input to 2.3+, while outputting 3 in 3; that might be an assumption already? Dag Sverre From stefan_ml at behnel.de Tue Apr 15 12:02:06 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 12:02:06 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <480472F0.9030305@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> Message-ID: <48047D1E.2050103@behnel.de> Hi, just answering the first part of your comments for now. Dag Sverre Seljebotn wrote: >> Also, I really like the fact that "test" is a plain byte string in Cython that >> can directly be converted to a C char*, depending on its use. This shouldn't >> change, even if Py3 dictates that this literal becomes a Unicode string. >> > What exactly are the consequences here... if it is just about the > runtime object used then I suppose it can be inferred from context? 
"In the face of ambiguity, refuse the temptation to guess." :) Somehow "inferring" the difference between str and unicode literals is the wrong thing to do. > (I.e., coercion to char* deals with it...) Or does it mean that string > literals converted to char* should be UTF-8 strings or something? You cannot automatically convert a unicode object to a char*, that's why I said that a byte string makes more sense in the Cython context. > What is the current behaviour for string literals anyway..probably that > the encoding of the Cython source gets carried through to the strings in > C source? Yes, they are passed through to the C compiler as they are - although that's not really what I'd call "well defined semantics". We can improve on this by supporting PEP 263. http://www.python.org/doc/2.3/whatsnew/section-encodings.html The current string literal semantics in Cython are: "text" is a literal byte sequence that translates directly to a Py2 str object or a C char*. u"text" is a unicode literal that is parsed as UTF-8 encoded byte sequence and converted into a Python unicode object (at runtime). Stefan From stefan_ml at behnel.de Tue Apr 15 12:13:44 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 12:13:44 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48046D83.2070806@behnel.de> References: <48046D83.2070806@behnel.de> Message-ID: <48047FD8.3060701@behnel.de> Stefan Behnel wrote: > from __future__ import python3 or rather from __future__ import unicode_literals for the specific purpose of string semantics. Stefan From dagss at student.matnat.uio.no Tue Apr 15 14:34:05 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 Apr 2008 14:34:05 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? 
In-Reply-To: <48047D1E.2050103@behnel.de> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> Message-ID: <4804A0BD.8040702@student.matnat.uio.no> > "In the face of ambiguity, refuse the temptation to guess." :) > > Somehow "inferring" the difference between str and unicode literals is the > wrong thing to do. > I don't think I explained my question well enough; I'll try again. The thing is, this kind of inferring already happens; you can do cdef char c = "c" and the string literal "c" becomes a single character value, while you can do cdef char* s = "hello" and you get a C string literal (which is passed through straight from Cython source), while py_s = "hello" gives a Python object. Somehow the "natural" thing to do for Py3 is to continue allowing "direct" assignments to char* of the type above; but generate unicode objects on coercion to Python object. (Hmm. So the problem is that one can no longer auto-coerce from Python string objects to char*...) Hmm. This might come from a wrong understanding of the problem, but from my limited knowledge, it looks like the reason we get this problem is because the current Cython behaviour is wrong, even in a Python 2.6 context. Suggestion: - Support PEP 263 as you say. This is for *input* from Cython source *only*; the whole point is that whether you edit your source files on a UTF-8 or BIG-5 system shouldn't impact anything about runtime behaviour as long as you declare the encoding of the source file. - Have a seperate mechanism for specifying what encoding should be used for conversion to C buffers. One solution is command-line options; however this is also a candidate for a Cython language extensions, as the "right" answer really depends on what encoding the C library you are calling is using! (char* is basically "encoding-less" in itself). One might even hard-code it to ASCII or latin1 for now. 
- String literals to buffers (cdef char* s = "hello") are reencoded in Cython compilation to the right target encoding, so that if latin1 is specified for the C library in question I can get correct results editing the Cython source in UTF-8. In fact, for maximum portability of C source, one can use the literal if only ASCII is used, and otherwise generate stuff like char* s = {-20, 54, 50, 0} . If there's a mismatch between input and output encoding (I defined the C library I'm calling as ASCII but try to use my native "?????") then it's a compile-time error. - On coercions from Python strings (unicode or whatever) to char*, the same reencoding is used (call s.encode(ENCODING) or similar). This will raise the appropiate exceptions. It would be good to solve this anyway and I fail to see the connection with Python 3, and I definitely don't think that Cython behaviuor needs to be different between the two (even if everything is unicode in Python 3 there should be functionality somewhere in the library to generate byte data in other encodings?) Dag Sverre From dagss at student.matnat.uio.no Tue Apr 15 15:01:46 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 Apr 2008 15:01:46 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804A0BD.8040702@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> Message-ID: <4804A73A.5050703@student.matnat.uio.no> > - Have a seperate mechanism for specifying what encoding should be used > for conversion to C buffers. One solution is command-line options; > however this is also a candidate for a Cython language extensions, as > the "right" answer really depends on what encoding the C library you are > calling is using! (char* is basically "encoding-less" in itself). One > might even hard-code it to ASCII or latin1 for now. 
>

I don't know much about this, but at least in the Linux world it looks like C libraries will usually use the encoding specified in the current locale (for instance, if you're on a UTF-8 system, like I am, then glibc fopen will expect UTF-8 character data).

This can be exemplified by a Cython program printing a text using libc:

# coding: utf-8
# declare libc printf...
def usage():
    printf("Usage: Å\n")

Keep in mind that there are three environments: the system of the Cython developer (developer's local workstation), the system for compilation (might be a big build-farm on a system with a different encoding), and the runtime system (end-user workstation, might have a third encoding).

- What will happen now: The character will be output on screen using the encoding of the developer who wrote the Cython program, no matter what the encoding is on the compilation system or runtime system.
- What should happen: Whatever is an "Å" should be output on the ta

One solution here is to detect string literals that contain non-ASCII characters and always make them into Python unicode objects (explicitly using the encoding of the source file upon the construction of the unicode object), and call encode (or any Py3 equivalent) at runtime (on module load, for instance) to generate the required char* buffer for the target system (the build system is then kept out of the loop). I.e. the above code will be generated to something like (very pseudo-code):

char* sourcefileencoding = "utf-8";
char* strcnst1_bytesbuf = { 0x55, 0x73, 0x61, 0x67, 0x65, 0x3a, 0x20, 0xc3, 0x85, 0x0a };
PyObject* strcnst1_pyobj = PyObjectNewUnicodeWhatever(strcnst1_bytesbuf, sourcefileencoding); // on module load...

function __pyx_..._usage(PyObject* self, PyObject* args) {
    ...
    char* __pyx_1;
    EncodeToCurrentSystemLocale(strcnst1_pyobj, __pyx_1);
    printf(__pyx_1);
}

...you get the idea. Note that the "Å" ends up as two hex bytes in UTF-8.
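The byte values in the pseudo-code above can be checked with a few lines of plain Python. This sketch assumes the non-ASCII character in the example is "Å" (U+00C5), which is what the bytes 0xc3 0x85 decode to in UTF-8; the variable names are made up for illustration.

```python
# The usage message, written with an explicit escape so that the result
# does not depend on the encoding of this snippet itself.
text = u"Usage: \u00c5\n"

# The same text encoded for two different target systems.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print([hex(b) for b in bytearray(utf8_bytes)])
print([hex(b) for b in bytearray(latin1_bytes)])

# A pure-ASCII target cannot represent the character at all, which is
# the case where an error has to be reported instead.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)
```

The UTF-8 form is one byte longer than the latin-1 form, so passing the developer's bytes through unchanged only gives the right output when the runtime system happens to use the same encoding.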
I suppose an alternative would be to standardize on utf-8 in C source files and use unicode escape sequences in strings for all non-ASCII characters, rather than the less readable hex sequence above. Dag Sverre From stefan_ml at behnel.de Tue Apr 15 15:03:59 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 15:03:59 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804A0BD.8040702@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> Message-ID: <4804A7BF.80004@behnel.de> Hi, Dag Sverre Seljebotn wrote: >> "In the face of ambiguity, refuse the temptation to guess." :) >> >> Somehow "inferring" the difference between str and unicode literals is the >> wrong thing to do. >> > I don't think I explained my question well enough; I'll try again. > > The thing is, this kind of inferring already happens; you can do > > cdef char c = "c" Isn't this illegal? > Somehow the "natural" thing to do for Py3 is to > continue allowing "direct" assignments to char* of the type above; but > generate unicode objects on coercion to Python object. Assuming we have a well-defined source code encoding (i.e. PEP 263). > (Hmm. So the > problem is that one can no longer auto-coerce from Python string objects > to char*...) Right. > Hmm. This might come from a wrong understanding of the problem, but from > my limited knowledge, it looks like the reason we get this problem is > because the current Cython behaviour is wrong, even in a Python 2.6 > context. Suggestion: > > - Support PEP 263 as you say. This is for *input* from Cython source > *only*; the whole point is that whether you edit your source files on a > UTF-8 or BIG-5 system shouldn't impact anything about runtime behaviour > as long as you declare the encoding of the source file. That would be required by the implementation, yes.
In practice, all that matters here are string literals, both bytes and unicode. > - Have a seperate mechanism for specifying what encoding should be used > for conversion to C buffers. I don't see a reason to go that route, given the existing PEP. > - String literals to buffers (cdef char* s = "hello") are reencoded in > Cython compilation to the right target encoding, so that if latin1 is > specified for the C library in question We do not know the target library at Cython compile time. > char* s = {-20, 54, 50, 0} or the respective "\xAB" escape sequences. But I would generally expect 8-bit values to pass cleanly - as long as they are correctly encoded *during* the code generation. > . If there's a mismatch between input and output encoding (I defined the > C library I'm calling as ASCII but try to use my native "?????") then > it's a compile-time error. Same as above, we don't know the C environment. I'm pretty sure the PEP is the right way to go. Stefan From stefan_ml at behnel.de Tue Apr 15 15:08:21 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 15:08:21 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804A73A.5050703@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A73A.5050703@student.matnat.uio.no> Message-ID: <4804A8C5.7040500@behnel.de> Hi, Dag Sverre Seljebotn wrote: > # coding: utf-8 > # declare libc printf... > def usage(): > printf("Usage: ?\n") > > Keep in mind that there are three environments: The system of the Cython > developer (developer's local workstation), the system for compilation > (might be a big build-farm on a system with a different encoding), and > the runtime system (end-user workstation, might have a third encoding). 
> > - What will happen now: The character will be output on screen using the > encoding of the developer who wrote the Cython program, no matter what > the encoding is on the compilation system or runtime system. > - What should happen: Whatever is an "?" should be output on the ta No. We are talking about byte strings here, not unicode strings. I don't want this to print anything but what the user's locale gives you. If, on the other hand, you write def usage(): printf(u"Usage: ?\n") under Py2, or def usage(): printf("Usage: ?\n") under Py3 semantics, you should get a compiler error that you can't convert a unicode string to a char*. Stefan From dagss at student.matnat.uio.no Tue Apr 15 15:08:25 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 Apr 2008 15:08:25 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804A7BF.80004@behnel.de> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A7BF.80004@behnel.de> Message-ID: <4804A8C9.6050205@student.matnat.uio.no> >> - Have a seperate mechanism for specifying what encoding should be used >> for conversion to C buffers. >> > > I don't see a reason to go that route, given the existing PEP. > I argued: The PEP is about **input** source code. It declares the encoding of your source file, which likely depends on the preferred programming environment of the Cython coder, and has absolutely nothing to do with the encoding of the runtime C library, which is likely on a different system with a potentially different encoding. The moment the compiled behaviour of your code depends on the encoding of the source file, you have big problems, and the exact reason for the PEP was to *avoid* the behaviour you seem to want. 
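To make the point about source encodings concrete, here is a small Python 3 sketch (an illustration only; the variable names are made up for the example, and the sample text is the "Usage" literal discussed earlier in the thread). It shows that the same text saved under two different source encodings yields two different byte sequences, so a char* built directly from the raw source bytes changes with the editor settings of whoever wrote the file.

```python
# The same literal text, as it would be stored in a source file saved
# as UTF-8 versus one saved as latin-1.
text = "Usage: \u00c5\n"
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# Same characters, different bytes: the non-ASCII character takes
# two bytes in UTF-8 and one byte in latin-1.
bytes_differ = as_utf8 != as_latin1

# What a latin-1 runtime would display if it were handed the UTF-8 bytes
# unchanged: the text no longer matches what the author typed.
misread = as_utf8.decode("latin-1")
```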
Dag Sverre From stefan_ml at behnel.de Tue Apr 15 15:38:07 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 15:38:07 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804A8C9.6050205@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A7BF.80004@behnel.de> <4804A8C9.6050205@student.matnat.uio.no> Message-ID: <4804AFBF.40608@behnel.de> Dag Sverre Seljebotn wrote: >>> - Have a seperate mechanism for specifying what encoding should be used >>> for conversion to C buffers. >>> >> I don't see a reason to go that route, given the existing PEP. >> > I argued: The PEP is about **input** source code. It declares the > encoding of your source file, which likely depends on the preferred > programming environment of the Cython coder, and has absolutely nothing > to do with the encoding of the runtime C library, which is likely on a > different system with a potentially different encoding. > > The moment the compiled behaviour of your code depends on the encoding > of the source file, you have big problems, and the exact reason for the > PEP was to *avoid* the behaviour you seem to want. I want to have distinct behaviour between byte sequences and unicode character sequences. If you use a byte (string) literal in your code, Cython must not alter it (except for PEP 263 input encoding) and must support any conversion from and to a char*. This works fine with current Cython as long as you use the same input encoding for Cython code and C code. If you use a unicode literal in your code, Cython must take care that it gets correctly converted from source code bytes to a unicode character sequence (PEP 263), which then behaves the same on all systems. Cython must raise a compiler error if you try to convert it to a char*. 
Both things work just fine with current Cython as long as your code is UTF-8 encoded. I don't see why anything beyond PEP 263 is needed here. Stefan From dagss at student.matnat.uio.no Tue Apr 15 16:30:47 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 Apr 2008 16:30:47 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804AFBF.40608@behnel.de> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A7BF.80004@behnel.de> <4804A8C9.6050205@student.matnat.uio.no> <4804AFBF.40608@behnel.de> Message-ID: <4804BC17.6090806@student.matnat.uio.no> Yes, cdef char c = "a" works, but it is rather new ("Allow single-character ascii strings to be treated as c character literals", Robert, Feb 28). > I want to have distinct behaviour between byte sequences and unicode character > sequences. > > If you use a byte (string) literal in your code, Cython must not alter it > (except for PEP 263 input encoding) and must support any conversion from and > to a char*. This works fine with current Cython as long as you use the same > input encoding for Cython code and C code. > Yes, you are absolutely right when it comes to Python 2, and Python 3 *does* come into it. Sorry. (My experiments indicate that with a non-unicode string, no PEP 263 conversion happens. What character set would there be to convert to?) Still I think I disagree about this though: == Also, I really like the fact that "test" is a plain byte string in Cython that can directly be converted to a C char*, depending on its use. This shouldn't change, even if Py3 dictates that this literal becomes a Unicode string. == Because in my mind this change in Python 3 changes what I consider a real deficiency in Python 2, which is that the source input encoding matter. 
There's a strong tendency already to let Python semantics play a strong role, and in this area Python 3 is a real improvement over how C and Python 2 handle things. (At least keep compatibility with Python 3 when compiling a pure Python 3 file -- what happens with C interfacing is less important, and I suppose you could do both.) Most recent C libraries will happily pass through char* buffers in the current runtime encoding as strings, and if one is crazy enough to write Python code like: # note: Python 3 code against libc in Cython handle = libc.stdlib.fopen("Fødselsår.txt", "r") ...then having automatic conversion to char*, dependent on the runtime platform default, will make this work on different systems. It will however break on different systems with your suggestion. One can always use the "b" literal if your behaviour is wanted. (When the Python community didn't make Python 3 source backwards compatible with Python 2 then I don't think we can do a better job of it..) (One could also parametrize the char type for C libraries that didn't use the platform default, i.e. something like ... external import header etc. cdef foo(char("iso-8859-1")* s) but I think I can see the Cython community recoiling in collective disgust already :-) Perhaps another word than "char"...). === As for using a C library on different encodings, consider the following example on my UTF-8 machine: $ touch åå $ ./checkfile åå C3 A5 C3 A5 -> fopen: 6295568 Contents of checkfile.c: int main(int argc, char* argv[]) { char* ch; for (ch = argv[1]; *ch != 0; ++ch) { printf("%hhX ", *ch); } printf(" -> fopen: %ld\n", (long)fopen(argv[1], "r")); } -- Dag Sverre From stefan_ml at behnel.de Tue Apr 15 16:41:58 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 16:41:58 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: <4804BC17.6090806@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A7BF.80004@behnel.de> <4804A8C9.6050205@student.matnat.uio.no> <4804AFBF.40608@behnel.de> <4804BC17.6090806@student.matnat.uio.no> Message-ID: <4804BEB6.3060005@behnel.de> Hi, Dag Sverre Seljebotn wrote: > Still I think I disagree about this though: > > == > Also, I really like the fact that "test" is a plain byte string in Cython that > can directly be converted to a C char*, depending on its use. This shouldn't > change, even if Py3 dictates that this literal becomes a Unicode string. including PEP 263 input conversion, obviously. > == > > Because in my mind this change in Python 3 changes what I consider a > real deficiency in Python 2, which is that the source input encoding > matter. Well, it does matter in both Py2 and Py3. See PEP 263. > Most recent C libraries will happily pass through char* buffers in the > current runtime encoding as strings, and if one is crazy enough to write > Python code like: > > # note: Python 3 code against libc in Cython > handle = libc.stdlib.fopen("Fødselsår.txt", "r") This is an entirely independent matter as it depends on the *file system encoding*, not the locale. I hope you do not want Cython to do this kind of magic for you. > ...then having automatic, runtime platform default dependant conversion > to char* will make this work on different systems. I would prefer the phrasing: "break" on different systems, in different ways. > As for using a C library on different encodings, consider the following > example on my UTF-8 machine: > > $ touch åå > $ ./checkfile åå
> C3 A5 C3 A5 -> fopen: 6295568 > > Contents of checkfile.c: > > int main(int argc, char* argv[]) { > char* ch; > for (ch = argv[1]; *ch != 0; ++ch) { > printf("%hhX ", *ch); > } > printf(" -> fopen: %ld\n", (long)fopen(argv[1], "r")); > } So you have a UTF-8 filesystem and a UTF-8 console, just as I do. Have you tried this on a latin1 filesystem, or a latin1 filesystem respectively? Stefan From stefan_ml at behnel.de Tue Apr 15 16:42:08 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 15 Apr 2008 16:42:08 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804BC17.6090806@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A7BF.80004@behnel.de> <4804A8C9.6050205@student.matnat.uio.no> <4804AFBF.40608@behnel.de> <4804BC17.6090806@student.matnat.uio.no> Message-ID: <4804BEC0.8060700@behnel.de> Hi, Dag Sverre Seljebotn wrote: > (My experiments indicate that with a non-unicode string, no PEP 263 > conversion happens. What character set would there be to convert to?) Have you actually *read* the PEP? Stefan From dalcinl at gmail.com Tue Apr 15 16:54:58 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 15 Apr 2008 11:54:58 -0300 Subject: [Cython] followup: gcc warning about printf 'z' modifier Message-ID: Dear All, I've just seen this comment in Py2.5 'pyport.h' * These "high level" Python format functions interpret "z" correctly on * all platforms (Python interprets the format string itself, and does whatever * the platform C requires to convert a size_t/Py_ssize_t argument): * * PyString_FromFormat * PyErr_Format * PyString_FromFormatV So using the 'z' modifier should work as expected with the above functions.
-- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dagss at student.matnat.uio.no Tue Apr 15 17:47:23 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 Apr 2008 17:47:23 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4804BEC0.8060700@behnel.de> References: <48046D83.2070806@behnel.de> <480472F0.9030305@student.matnat.uio.no> <48047D1E.2050103@behnel.de> <4804A0BD.8040702@student.matnat.uio.no> <4804A7BF.80004@behnel.de> <4804A8C9.6050205@student.matnat.uio.no> <4804AFBF.40608@behnel.de> <4804BC17.6090806@student.matnat.uio.no> <4804BEC0.8060700@behnel.de> Message-ID: <4804CE0B.9070908@student.matnat.uio.no> An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/cython-dev/attachments/20080415/669d0fe3/attachment.htm From languitar at semipol.de Tue Apr 15 19:05:44 2008 From: languitar at semipol.de (Johannes Wienke) Date: Tue, 15 Apr 2008 19:05:44 +0200 Subject: [Cython] Question about providing functions to existing C code Message-ID: <4804E068.9040109@semipol.de> Hi, I hope this is the right way to ask questions about Cython. I am currently wrapping an existing C plugin API into a Python project. The way from my plugin loader into the plugins is not the problem, but the original software provides a bunch of functions that the plugins can use. Is there a way to provide these functions with Cython, that is, reimplementing them with Cython but generating them as defined by the existing header files? Thanks Johannes -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://codespeak.net/pipermail/cython-dev/attachments/20080415/43c2168f/attachment.pgp From wstein at gmail.com Tue Apr 15 20:58:59 2008 From: wstein at gmail.com (William Stein) Date: Tue, 15 Apr 2008 11:58:59 -0700 Subject: [Cython] codespeak.net Message-ID: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> Hi Cython-devel, Since our mailing list is hosted at codespeak.net, can a link be added here to Cython: http://codespeak.net/ or at least to the list archive? -- William Stein Associate Professor of Mathematics University of Washington http://wstein.org From robertwb at math.washington.edu Wed Apr 16 02:48:47 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 15 Apr 2008 17:48:47 -0700 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48046D83.2070806@behnel.de> References: <48046D83.2070806@behnel.de> Message-ID: <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> On Apr 15, 2008, at 1:55 AM, Stefan Behnel wrote: > Hi, > > one of the goals of Cython is to "compile Python code". I think we > should be > clearer here. I would opt for making Python 2.6 the target syntax and > eventually write a separate/enhanced/whatever parser for Python 3.0 > syntax and > semantics (unicode/bytes literals, new keywords, etc.). > > This has several advantages, the most important one being code > compatibilty. > While it will be work to migrate from Py2 to Py3, it shouldn't > affect Cython > users and the existing Cython code. I would say the target language syntax of Python is certainly 2.6 in the near future. Python 3.0 hasn't even been finalized yet, and even when it is it will be quite a while (I anticipate) before the majority of projects migrate over. Hopefully Fabrizio's GSoC project gets approved and supporting another syntax will be as easy as reading in another grammar file. 
On the other end of things, I would really like to output .c files that can be compiled and linked into either 2.x or 3.x extensions without having to re-run Cython (modulo, perhaps, new builtins). > Also, I really like the fact that "test" is a plain byte string in > Cython that > can directly be converted to a C char*, depending on its use. This > shouldn't > change, even if Py3 dictates that this literal becomes a Unicode > string. > Cython positions itself between Python and C, and that's a place > where the > plain string literal semantics make perfect sense. Supporting the > b"test" > bytes syntax *in addition* is ok with me, as is the u"unicode" > syntax, which > Python2 and Cython currently use. I think it makes sense to be > explicit about > unicode objects in the context of Cython. Using PEP 263 to determine the encoding of string literals seems the right thing to do. I don't want to lose the ability to do cdef char* s = "test" (stored as an ASCII string), nor do I want to make the behavior dependent on the runtime system. Treating "xxx" as a char* if it is pure ASCII, and as a unicode object otherwise, seems like the obvious thing to do. What hasn't been resolved is conversions: cdef object o = s # s is a char* If s is not pure ASCII, should a runtime error be raised, or should an encoding be chosen (at compile time)? Could one specify an encoding, or do any decoding manually via a bytes object? Should it be a unicode or a str? Should that depend on whether or not it's compiled with 3k syntax, or linked against 3k to create the .so file? cdef char* s = o # o is a Python unicode object (or, equivalently, the result of str(o)) Should this raise a compile time error? (That would break a lot of code...including really nice code like declaring a function argument to be char*) A runtime error if o is not pure ASCII? Or what encoding should be used? Currently it gets a pointer to the data, which is very convenient, but wouldn't work for a unicode object.
Perhaps we should just choose an internal Cython encoding (preferably UTF-8, so ASCII strings are handled normally and everything is terminated with \0 as expected). Conversion to and from char* and unicode would always be via UTF-8. One could manually create a bytes object to use other conversions, but most of the time this probably wouldn't even be needed. The user experience from Python would not be impacted, and if one is interfacing with external C libraries using non-ASCII char* then one would probably be forced to think about things explicitly anyways. Whatever happens, I think o == o and s == s are important. > Having a separate Cython frontend (cython3? or a command line > option "-3"?) > and a distutils Extension option for compiling Python3 code with > Python3 > semantics might be a way to deal with the syntax issue. But I would > actually > prefer a different source file extension (.cy3) or a special > comment in the > first code line, or something like that. The language level is an > integral > part of the source file, not so much of the build system. Even > > from __future__ import python3 > > might work. ;) > > Any comments on this? I like Dag's "lang: ..." proposal, though I'm hesitant on the idea of "plugins" (in the sense that one would have to look at the contents of the files to determine dependencies, and I don't want it to fracture into multiple dialects depending on the exact set of lang parameters specified). I think the default language should be determined by the runtime environment of the compiler, i.e. (which can always be overridden, either globally or file-by-file, but probably won't need to be most of the time). - Robert From stefan_ml at behnel.de Wed Apr 16 13:25:22 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 Apr 2008 13:25:22 +0200 (CEST) Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> Message-ID: <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Robert Bradshaw wrote: > On Apr 15, 2008, at 1:55 AM, Stefan Behnel wrote: > Hopefully Fabrizio's GSoC project gets approved and supporting > another syntax will be as easy as reading in another grammar file. That would be cool, yes. > On > the other end of things, I would really like to output .c files that > can be compiled and linked into either 2.x or 3.x extensions without > having to re-run Cython (modulo, perhaps, new builtins). Even builtins that are known to be a builtin in *some* but not all versions of Python could be supported with some module load time checking code. If you use them in your code, you won't be able to load the module into the interpreter if the builtin is not available in the running version. That's just like Python handles it. > Using PEP 263 to determine the encoding of string literals seems the > right thing to do. I don't want to loose the ability to do cdef char* > s = "test" (stored as an ASCII string) although the exact byte sequence in the C file would depend on the source encoding of the Cython file. > Treating "xxx" as a char* > if it is pure ASCII, and as a unicode object otherwise, seems like > the obvious things to do. That's what I meant with "too much magic". Cython shouldn't distinguish between the two based on the *content*. The distinction should be explicit in the source and Cython should raise an error if it doesn't work out. Above all, this means: no automatic recoding behind the scenes. That's the main reason why Py3 has a well defined "bytes" type and a Unicode "str" type instead of a Unicode "unicode" type and an underdefined "str" type in Py2. 
> What hasn't been resolved is conversions > > cdef object o = s # s is a char* Sure, the semantics are clear: char* is a byte sequence in C, so the result is the equivalent of a byte sequence in Python: a byte string, i.e. a str object in Python2 and a bytes object in Py3. If you want a unicode string, use cdef object o = (s).decode('UTF-8') or whatever, maybe even the C-API Unicode decoding functions. But make sure the encoding you use is explicit. > cdef char* s = o # o is a python unicode object (or, > equivalently, the result of str(o)) That's not equivalent in Python 2, but it is in Py3. > Should this raise a compile time error? If the compiler knows that o *really* is of type "unicode", it can raise an error here. Otherwise, you'd get a runtime error from Python's string conversion functions. > (That would break a lot of > code...including really nice code like declaring a function argument > to be char*) That would still accept any kind of byte string or a bytes object in Py3, which is just fine IMHO. > Whatever happens, I think o == o and s > == s are important. This will continue to work as we are dealing with plain byte strings here. > I like Dag's "lang: ..." proposal. [...] > I think the default language should be > determined by the runtime environment of the compiler, i.e. (which > can always be overridden, ether globally or file-by-file, but > probably won't need to be most of the time). I actually prefer having it in the source file. Nothing keeps you from writing one source file in Py2 and another in Py3 and combining them into one module. 
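The explicit conversion described earlier in this message (a char* gives a byte string, which is then decoded with a named encoding) looks roughly like this in plain Python 3. It is a sketch rather than Cython output: the variable raw stands in for whatever a C function returning char* would hand back, and the sample bytes are made up for the example.

```python
# Stand-in for the byte string obtained from a char* (hypothetical data).
raw = b"caf\xc3\xa9"

# Explicit decoding: the encoding is named in the source, nothing is guessed.
text = raw.decode("utf-8")

# Explicit encoding on the way back to a byte string / char*.
back = text.encode("utf-8")


def implicit_mixing_fails():
    """Return True if combining bytes and unicode text raises TypeError."""
    try:
        raw + text  # Python 3 refuses to recode behind the scenes
    except TypeError:
        return True
    return False
```

Decoding with the wrong codec does not fail loudly in every case (latin-1 accepts any byte sequence), which is one more reason to keep the encoding visible in the source.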
:) Stefan From stefan_ml at behnel.de Wed Apr 16 18:47:04 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 Apr 2008 18:47:04 +0200 (CEST) Subject: [Cython] codespeak.net In-Reply-To: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> Message-ID: <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> William Stein wrote: > Since our mailing list is hosted at codespeak.net, can > a link be added here to Cython: > http://codespeak.net/ > or at least to the list archive? I asked the admins and the answer was "no, because that page only links to projects that are hosted on codespeak.net". Is there a specific reason why you would want a link from that page? Stefan From wstein at gmail.com Wed Apr 16 18:49:34 2008 From: wstein at gmail.com (William Stein) Date: Wed, 16 Apr 2008 09:49:34 -0700 Subject: [Cython] codespeak.net In-Reply-To: <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <85e81ba30804160949r2c6527calbfa18c1095b6673f@mail.gmail.com> On Wed, Apr 16, 2008 at 9:47 AM, Stefan Behnel wrote: > William Stein wrote: > > Since our mailing list is hosted at codespeak.net, can > > a link be added here to Cython: > > http://codespeak.net/ > > or at least to the list archive? > > I asked the admins and the answer was "no, because that page only links to > projects that are hosted on codespeak.net". > > Is there a specific reason why you would want a link from that page? I don't know. What the heck *is* codespeak anyways? I've never heard of it until the Cython mailing list was moved there. 
-- William From michael.abshoff at googlemail.com Wed Apr 16 18:24:50 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Wed, 16 Apr 2008 18:24:50 +0200 Subject: [Cython] codespeak.net In-Reply-To: <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <48062852.4000304@gmail.com> Stefan Behnel wrote: > William Stein wrote: >> Since our mailing list is hosted at codespeak.net, can >> a link be added here to Cython: >> http://codespeak.net/ >> or at least to the list archive? > > I asked the admins and the answer was "no, because that page only links to > projects that are hosted on codespeak.net". Well, just looking at the other projects hosted at codespeak lets one extrapolate where the Cython mailing list archive is, so it might be hidden but trivial to find. And that policy is just plain silly because I often like to search mailing list archives and having them indexed by Google would also help other users. > Is there a specific reason why you would want a link from that page?
> > Stefan > Cheers, Michael > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From stefan_ml at behnel.de Wed Apr 16 19:01:21 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 Apr 2008 19:01:21 +0200 (CEST) Subject: [Cython] codespeak.net In-Reply-To: <85e81ba30804160949r2c6527calbfa18c1095b6673f@mail.gmail.com> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <85e81ba30804160949r2c6527calbfa18c1095b6673f@mail.gmail.com> Message-ID: <46256.194.114.62.34.1208365281.squirrel@groupware.dvs.informatik.tu-darmstadt.de> William Stein wrote: > What the heck *is* codespeak anyways? I've never heard > of it until the Cython mailing list was moved there. It's mainly a Python project site hosted by a couple of Zope/Plone related people in Germany. It also hosts PyPy, amongst other things. Stefan From wstein at gmail.com Wed Apr 16 19:04:16 2008 From: wstein at gmail.com (William Stein) Date: Wed, 16 Apr 2008 10:04:16 -0700 Subject: [Cython] codespeak.net In-Reply-To: <46256.194.114.62.34.1208365281.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <85e81ba30804160949r2c6527calbfa18c1095b6673f@mail.gmail.com> <46256.194.114.62.34.1208365281.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <85e81ba30804161004y2176e434h3d87ad1401225ceb@mail.gmail.com> On Wed, Apr 16, 2008 at 10:01 AM, Stefan Behnel wrote: > William Stein wrote: > > What the heck *is* codespeak anyways? I've never heard > > of it until the Cython mailing list was moved there. > > It's mainly a Python project site hosted by a couple of Zope/Plone related > people in Germany. It also hosts PyPy, amongst other things. 
> Any chance that they would be interested in hosting a mirror of the Cython webpage? William From stefan_ml at behnel.de Wed Apr 16 19:47:11 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 Apr 2008 19:47:11 +0200 Subject: [Cython] codespeak.net In-Reply-To: <48062852.4000304@gmail.com> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48062852.4000304@gmail.com> Message-ID: <48063B9F.9040107@behnel.de> Michael.Abshoff wrote: > Stefan Behnel wrote: >> William Stein wrote: >>> Since our mailing list is hosted at codespeak.net, can >>> a link be added here to Cython: >>> http://codespeak.net/ >>> or at least to the list archive? >> I asked the admins and the answer was "no, because that page only links to >> projects that are hosted on codespeak.net". > > Well, just looking at the other projects hosted at codespeak lets on > extrapolate where the Cython mailing list archive is There is a link from both the ML subscription page (see the end of this e-mail) and the Wiki. http://blog.gmane.org/gmane.comp.python.cython.devel It would be good to have that link on *our* home page. Stefan From robertwb at math.washington.edu Wed Apr 16 20:49:14 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 16 Apr 2008 11:49:14 -0700 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? 
In-Reply-To: <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: On Apr 16, 2008, at 4:25 AM, Stefan Behnel wrote: > Robert Bradshaw wrote: >> On >> the other end of things, I would really like to output .c files that >> can be compiled and linked into either 2.x or 3.x extensions without >> having to re-run Cython (modulo, perhaps, new builtins). > > Even builtins that are known to be a builtin in *some* but not all > versions of Python could be supported with some module load time > checking > code. If you use them in your code, you won't be able to load the > module > into the interpreter if the builtin is not available in the running > version. That's just like Python handles it. Good idea. Actually, with our cached builtins, this might already happen (i.e. at load time it does a lookup on all the builtin names it uses). >> Using PEP 263 to determine the encoding of string literals seems the >> right thing to do. I don't want to lose the ability to do cdef char* >> s = "test" (stored as an ASCII string) > > although the exact byte sequence in the C file would depend on the > source > encoding of the Cython file. I think our C files should always be pure ASCII. >> Treating "xxx" as a char* >> if it is pure ASCII, and as a unicode object otherwise, seems like >> the obvious thing to do. > > That's what I meant with "too much magic". Cython shouldn't > distinguish > between the two based on the *content*. The distinction should be > explicit > in the source and Cython should raise an error if it doesn't work out. > Above all, this means: no automatic recoding behind the scenes. In light of my proposal to use UTF-8 everywhere, this could actually be turned into a char*.
> That's the main reason why Py3 has a well defined "bytes" type and a > Unicode "str" type instead of a Unicode "unicode" type and an > underdefined > "str" type in Py2. > >> What hasn't been resolved is conversions >> >> cdef object o = s # s is a char* > > Sure, the semantics are clear: char* is a byte sequence in C, so the > result is the equivalent of a byte sequence in Python: a byte > string, i.e. > a str object in Python2 and a bytes object in Py3. I understand this distinction. Technically a char* is a byte string. The problem is that people are going to want to implicitly handle unicode <-> char* much more often. > If you want a unicode string, use > > cdef object o = (s).decode('UTF-8') > > or whatever, maybe even the C-API Unicode decoding functions. But make > sure the encoding you use is explicit. > > >> cdef char* s = o # o is a python unicode object (or, >> equivalently, the result of str(o)) > > That's not equivalent in Python 2, but it is in Py3. > > >> Should this raise a compile time error? > > If the compiler knows that o *really* is of type "unicode", it can > raise > an error here. Otherwise, you'd get a runtime error from Python's > string > conversion functions. > > >> (That would break a lot of >> code...including really nice code like declaring a function argument >> to be char*) > > That would still accept any kind of byte string or a bytes object > in Py3, > which is just fine IMHO. I think this significantly impacts usability. For example, if I have a function def foo(char* x): ... then users of my module won't be able to write foo("eggs") anymore, they will have to write foo(b"eggs") or even foo(x.encode('UTF-8')) if x is given to them from elsewhere. I don't think the user wants to bother with that. Likewise, if I have def foo(): cdef char* s ... return s Then the user won't be able to write print "The answer is %s" % foo() or foo() + "eggs" You could say, well, do the conversion manually in the Cython file. 
But one of the huge benefits of Cython is that it handles C <-> Python conversions naturally for you. char* might technically be a bytes object, but conceptually it's equivalent to the default Python string type (which happens to be unicode in Python 3000). What is the disadvantage of simply using UTF-8 as the default encoding for conversion to and from char* objects? (I am assuming bytes(s) will be taken care of directly rather than attempting to encode s (assumed to be a char*) into a unicode first). >> Whatever happens, I think o == o and s >> == s are important. > > This will continue to work as we are dealing with plain byte > strings here. > > >> I like Dag's "lang: ..." proposal. [...] >> I think the default language should be >> determined by the runtime environment of the compiler, i.e. (which >> can always be overridden, either globally or file-by-file, but >> probably won't need to be most of the time). > > I actually prefer having it in the source file. Nothing keeps you from > writing one source file in Py2 and another in Py3 and combining > them into > one module. :) Yes, this should always be an option. But having it default to the target language of the compile-time environment lets the compiler transition when the user does.
- Robert From robertwb at math.washington.edu Wed Apr 16 20:56:24 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 16 Apr 2008 11:56:24 -0700 Subject: [Cython] codespeak.net In-Reply-To: <48063B9F.9040107@behnel.de> References: <85e81ba30804151158q574de0c0m562590d8672980bb@mail.gmail.com> <30819.194.114.62.34.1208364424.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48062852.4000304@gmail.com> <48063B9F.9040107@behnel.de> Message-ID: <784317F9-845C-4AB8-B3F2-EE29B79E3338@math.washington.edu> On Apr 16, 2008, at 10:47 AM, Stefan Behnel wrote: > > Michael.Abshoff wrote: >> Stefan Behnel wrote: >>> William Stein wrote: >>>> Since our mailing list is hosted at codespeak.net, can >>>> a link be added here to Cython: >>>> http://codespeak.net/ >>>> or at least to the list archive? >>> I asked the admins and the answer was "no, because that page only >>> links to >>> projects that are hosted on codespeak.net". >> >> Well, just looking at the other projects hosted at codespeak lets on >> extrapolate where the Cython mailing list archive is > > There is a link from both the ML subscription page (see the end of > this > e-mail) and the Wiki. > > http://blog.gmane.org/gmane.comp.python.cython.devel > > It would be good to have that link on *our* home page. Done. From dagss at student.matnat.uio.no Wed Apr 16 22:17:08 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 16 Apr 2008 22:17:08 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <48065EC4.4000900@student.matnat.uio.no> > I think this significantly impacts usability. For example, if I have > a function > > def foo(char* x): > ... > For this specific example, one could hypothetically do something like def foo(utf8charbuf x): ... 
(or more generally encodedcharbuf("utf-8"), but one could typedef). The caller shouldn't notice. I.e., such a new type would then automatically coerce unicode <-> char* using the encoding indicated by the type. It behaves exactly like char* in every situation, except that it can be assigned to/from a Python unicode object and knows what to do (compile-time). One could even define external functions using such a type, and it would be understood that the external function took a char buffer. I believe most use cases could be made practical and easy in this manner, while keeping what goes on very explicit. But I see that a) being able to use the name "char*" will be much more friendly to C users, and b) it doesn't help with backwards compatibility. (The class above could potentially be implemented in a pxd using some of the same (hypothetical/planned) features as in my NumPy project proposal. Though native Cython support wouldn't hurt either for such a feature.) -- Dag Sverre From robertwb at math.washington.edu Wed Apr 16 22:57:31 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 16 Apr 2008 13:57:31 -0700 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48065EC4.4000900@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48065EC4.4000900@student.matnat.uio.no> Message-ID: <76C9E2DD-DFCE-4323-AEDD-0B5BDC971E51@math.washington.edu> On Apr 16, 2008, at 1:17 PM, Dag Sverre Seljebotn wrote: >> I think this significantly impacts usability. For example, if I have >> a function >> >> def foo(char* x): >> ... >> > For this specific example, one could hypothetically do something like > > def foo(utf8charbuf x): > ... > > (or more generally encodedcharbuf("utf-8"), but one could typedef). > The > caller shouldn't notice.
> > I.e., such a new type would then automatically coerce unicode <-> > char* > using the encoding indicated by the type. Behaves exactly like > char* in > every situation, except that it can be assigned to/from a Python > unicode > object and knows what to do (compile-time). One could even be able to > define external functions using such a type, and it would be > understood > that the external function took a char buffer. > > I believe most usecases could be made practical and easy in this > manner; > while keeping what goes on very explicit. But I see that a) being able > to use the name "char*" will be much more friendly to C users, and > b) it > doesn't help with backwards compatability. I think both (a) and (b) are non-negligible issues, especially in the context of wrapping existing C libraries. Having to learn a new type like utf8charbuf, (which it masks the pointer nature of it as well, is its memory managed?) isn't desirable, especially if one is casting everywhere back between any object and char*. It also creates the expectation that all different kinds of encodings need to be supported with their own special type, and I don't think we want anything as heavy as a class. The more basic question is if we can transparently support unicode in char*, why not? Even for non-English speakers, the majority of strings being passed around will be ASCII. > (The class above could potentially be implemented in a pxd using > some of > the same (hypothetical/planned) features as in my NumPy project > proposal. Though native Cython support wouldn't hurt either for such a > feature.) 
> > -- > Dag Sverre > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From dagss at student.matnat.uio.no Wed Apr 16 23:25:36 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 16 Apr 2008 23:25:36 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <76C9E2DD-DFCE-4323-AEDD-0B5BDC971E51@math.washington.edu> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48065EC4.4000900@student.matnat.uio.no> <76C9E2DD-DFCE-4323-AEDD-0B5BDC971E51@math.washington.edu> Message-ID: <48066ED0.9060806@student.matnat.uio.no> > The more basic question is if we can transparently support unicode in > char*, why not? Even for non-English speakers, the majority of > strings being passed around will be ASCII. > Always defaulting to UTF-8 for this could be confusing in some contexts. For instance, if one has a Cython source file in latin1, and calls a spelling correction library that works exclusively in latin1 (I've worked with such a library once...), and in general don't touch UTF-8 anywhere, it might seem confusing that UTF-8 is passed to the library. All in all it seems to be the lesser of evils though. (In particular I like defaulting to UTF-8 a lot better than having the encoding of the Cython source matter, which is where Stefan would disagree if I understand correctly.) > I think both (a) and (b) are non-negligible issues, especially in the > context of wrapping existing C libraries. Having to learn a new type > like utf8charbuf, (which it masks the pointer nature of it as well, > is its memory managed?) isn't desirable, especially if one is casting > everywhere back between any object and char*. 
It also creates the > expectation that all different kinds of encodings need to be > supported with their own special type, and I don't think we want > anything as heavy as a class. > OK, I've polished it to deal with some of these. Your main points are still valid though so I'll consider it dismissed... It wouldn't be beyond the Cython compiler to do something like cdef uchar("utf-8")* buf = "my ????" Which would directly be translated to cdef char* buf = "my \some\escape\sequence" and have cdef uchar("utf-8")* buf = pyobj become cdef char* buf = unicode(pyobj).encode("utf-8") It wouldn't be complicated to support many encodings, they would just be passed on to CPython. No heavy class involved. -- Dag Sverre From dagss at student.matnat.uio.no Wed Apr 16 23:30:37 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 16 Apr 2008 23:30:37 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48066ED0.9060806@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48065EC4.4000900@student.matnat.uio.no> <76C9E2DD-DFCE-4323-AEDD-0B5BDC971E51@math.washington.edu> <48066ED0.9060806@student.matnat.uio.no> Message-ID: <48066FFD.7090704@student.matnat.uio.no> > and have > > cdef uchar("utf-8")* buf = pyobj > > become > > cdef char* buf = unicode(pyobj).encode("utf-8") > This is wrong. More like: cdef char* buf if not isinstance(pyobj, unicode): raise TypeError... tmp_buf = pyobj.encode("utf-8") buf = tmp_buf -- Dag Sverre From robertwb at math.washington.edu Thu Apr 17 01:11:27 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 16 Apr 2008 16:11:27 -0700 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? 
In-Reply-To: <48066ED0.9060806@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48065EC4.4000900@student.matnat.uio.no> <76C9E2DD-DFCE-4323-AEDD-0B5BDC971E51@math.washington.edu> <48066ED0.9060806@student.matnat.uio.no> Message-ID: On Apr 16, 2008, at 2:25 PM, Dag Sverre Seljebotn wrote: > >> The more basic question is if we can transparently support unicode in >> char*, why not? Even for non-English speakers, the majority of >> strings being passed around will be ASCII. >> > Always defaulting to UTF-8 for this could be confusing in some > contexts. > For instance, if one has a Cython source file in latin1, and calls a > spelling correction library that works exclusively in latin1 (I've > worked with such a library once...), and in general don't touch UTF-8 > anywhere, it might seem confusing that UTF-8 is passed to the library. True. If you're using an external library that takes non-ASCII strings and you don't bother to think about encoding, you should be surprised if things just work. My goals are to make it natural to go object -> char* -> object for unicode objects, and object -> char* -> C library for ASCII unicode objects. When string literals become unicode objects, people are going to have unicode objects floating around everywhere, not bytes objects. > All in all it seems to be the lesser of evils though. (In particular I > like defaulting to UTF-8 a lot better than having the encoding of the > Cython source matter, which is where Stefan would disagree if I > understand correctly.) Having the source files be transferable from computer to computer is a big plus, and UTF-8 plays nice with ASCII and most standard C string processing functions. At least we don't have people clamoring for EBCDIC support :).
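The claim that UTF-8 plays nicely with ASCII and with NUL-terminated C strings can be checked with a small sketch. This is plain Python 3 rather than Cython, it is not code from the thread, and the helper name `utf8_bytes` and the sample word are illustrative assumptions only.

```python
def utf8_bytes(text):
    """Encode a Python (unicode) string to the bytes a char* would point at."""
    return text.encode("utf-8")

# Pure-ASCII text has identical bytes in ASCII and in UTF-8.
ascii_matches = utf8_bytes("eggs") == "eggs".encode("ascii")

# A non-ASCII sample ("blåbær", written with escapes so the file stays ASCII).
sample = "bl\u00e5b\u00e6r"
encoded = utf8_bytes(sample)

# UTF-8 never produces a zero byte for a non-NUL character, so strlen()-style
# C functions would still find the real end of the string.
has_embedded_nul = 0 in encoded

# Every byte of a multi-byte sequence has the high bit set, so such bytes can
# never be mistaken for ASCII characters such as '/' or '"'.
high_bytes = [b for b in encoded if b >= 0x80]

# The byte length differs from the character count, which is where code that
# assumes 1 byte == 1 character goes wrong.
char_count, byte_count = len(sample), len(encoded)
```

The last line is the caveat: functions such as strlen count bytes, not characters, so "plays nice" holds for copying and searching for ASCII delimiters but not for per-character processing.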
>> I think both (a) and (b) are non-negligible issues, especially in the >> context of wrapping existing C libraries. Having to learn a new type >> like utf8charbuf, (which it masks the pointer nature of it as well, >> is its memory managed?) isn't desirable, especially if one is casting >> everywhere back between any object and char*. It also creates the >> expectation that all different kinds of encodings need to be >> supported with their own special type, and I don't think we want >> anything as heavy as a class. >> > OK, I've polished it to deal with some of these. Your main points are > still valid though so I'll consider it dismissed... > > It wouldn't be beyond the Cython compiler to do something like > > cdef uchar("utf-8")* buf = "my ????" > > Which would directly be translated to > > cdef char* buf = "my \some\escape\sequence" > > and have > > cdef uchar("utf-8")* buf = pyobj > > become > > cdef char* buf = unicode(pyobj).encode("utf-8") ^^^ I always want to support people being able to do this if they need to be explicit. As for the magic of uchar("gb5")* getting translated to the above, it would complicate the type system (both in terms of codebase and user's perspective). Would conversion be performed when assigning from a uchar("gb5")* to a uchar("utf-8")*, or to a char*? If we decide to support such automatic conversions, this seems like the best syntax I've seen, but I still think the default should be to accept unicode objects (via utf-8). > It wouldn't be complicated to support many encodings, they would > just be > passed on to CPython. No heavy class involved. OK, I wasn't sure if your utf8charbuf was a class or not. - Robert From stefan_ml at behnel.de Thu Apr 17 07:14:38 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 17 Apr 2008 07:14:38 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? 
In-Reply-To: References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <4806DCBE.2080400@behnel.de> Hi, Robert Bradshaw wrote: > On Apr 16, 2008, at 4:25 AM, Stefan Behnel wrote: > I think our C files should always be pure ascii. You mean with C string escapes? >> That's the main reason why Py3 has a well defined "bytes" type and a >> Unicode "str" type instead of a Unicode "unicode" type and an >> underdefined >> "str" type in Py2. >> >>> What hasn't been resolved is conversions >>> >>> cdef object o = s # s is a char* >> Sure, the semantics are clear: char* is a byte sequence in C, so the >> result is the equivalent of a byte sequence in Python: a byte >> string, i.e. >> a str object in Python2 and a bytes object in Py3. > > I understand this distinction. Technically a char* is a byte string. > The problem is that people are going to want to implicitly handle > unicode <-> char* much more often. But they shouldn't do that. Python3 is very strict here. There is no automatic conversion between bytes and str. You must be explicit about the way you want to convert it. And believe me, they didn't break it doing that, they fixed it. Doing magic in Cython would actually be unexpected in that light. >>> (That would break a lot of >>> code...including really nice code like declaring a function argument >>> to be char*) >> That would still accept any kind of byte string or a bytes object >> in Py3, which is just fine IMHO. > > I think this significantly impacts usability. For example, if I have > a function > > def foo(char* x): > ... > > then users of my module won't be able to write foo("eggs") anymore, > they will have to write foo(b"eggs") or even foo(x.encode('UTF-8')) > if x is given to them from elsewhere. That's fine, because your function expects a byte string. 
It cannot handle a unicode string, and it even says so in its signature. > Likewise, if I have > > def foo(): > cdef char* s > ... > return s > > Then the user won't be able to write > > print "The answer is %s" % foo() > > or > > foo() + "eggs" But again, that's because of Python semantics, not of Cython semantics. Stefan From stefan_ml at behnel.de Thu Apr 17 07:21:54 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 17 Apr 2008 07:21:54 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48065EC4.4000900@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48065EC4.4000900@student.matnat.uio.no> Message-ID: <4806DE72.80303@behnel.de> Hi, Dag Sverre Seljebotn wrote: >> I think this significantly impacts usability. For example, if I have >> a function >> >> def foo(char* x): >> ... >> > For this specific example, one could hypothetically do something like > > def foo(utf8charbuf x): > ... I could live with such syntactic sugar. UTF-8 is common enough to support it this way. But it has to be intuitively clear from the syntax that this is a) a real char* that is compatible with any other char*, and b) a UTF-8 encoded byte string that will only work when passing a unicode string from Python. If we can achieve both goals in one syntax, I'll be happy. Stefan From dagss at student.matnat.uio.no Thu Apr 17 10:05:16 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 17 Apr 2008 10:05:16 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? 
In-Reply-To: References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48065EC4.4000900@student.matnat.uio.no> <76C9E2DD-DFCE-4323-AEDD-0B5BDC971E51@math.washington.edu> <48066ED0.9060806@student.matnat.uio.no> Message-ID: <480704BC.8090309@student.matnat.uio.no> > The magic of uchar("gb5")* getting translated to the above, it would > complicate the type system (both in terms of codebase and user's > perspective). Would conversion be performed assigning from a uchar > ("gb5") to a uchar("utf-8"), or to a uchar*? If we decide to support > such automatic conversions, this seems like the best syntax I've > seen, but still think the default should be accept unicode objects > (via utf-8). > If the uchar way is convenient enough to use, then in situations where one simply "hasn't thought about encoding" one could use auto-conversion with ASCII encoding instead of UTF-8. One can then still do roundtrips for safe ASCII strings, but is warned in situations where one *should have* thought through encoding issues (though one loses the automatic unicode -> char* -> unicode roundtrip for non-ASCII strings, which is a case where one wouldn't have to think through it...one would have to do unicode -> uchar(utf-8)* -> unicode). I'll spell out the problems you mention with uchar. I think the most natural (but still not very good) behaviour would be: cdef uchar("utf-8")* x = ... 1) cdef uchar("gb5")* y = x print y # Should probably either disallow the coercion, or # do charset conversion (implementation could be coercing # to Python unicode and back). 2) cdef object o = x cdef uchar("gb5")* y = o print y # This is ok, conversion done 3) cdef char* c = x cdef uchar("gb5")* y = c print y # Here we have problems I don't think there's a way around this.
Still, case 3) is pretty specific; if one actively specifies an encoding and then assigns a char* buffer into it, it is kind of implied that one should somehow know that it actually has that encoding. -- Dag Sverre From robertwb at math.washington.edu Thu Apr 17 11:14:35 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 17 Apr 2008 02:14:35 -0700 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <4806DCBE.2080400@behnel.de> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> Message-ID: <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> On Apr 16, 2008, at 10:14 PM, Stefan Behnel wrote: > Hi, > > Robert Bradshaw wrote: >> On Apr 16, 2008, at 4:25 AM, Stefan Behnel wrote: >> I think our C files should always be pure ascii. > > You mean with C string escapes? Yes, that's what I mean. >>> That's the main reason why Py3 has a well defined "bytes" type and a >>> Unicode "str" type instead of a Unicode "unicode" type and an >>> underdefined >>> "str" type in Py2. >>> >>>> What hasn't been resolved is conversions >>>> >>>> cdef object o = s # s is a char* >>> Sure, the semantics are clear: char* is a byte sequence in C, so the >>> result is the equivalent of a byte sequence in Python: a byte >>> string, i.e. >>> a str object in Python2 and a bytes object in Py3. >> >> I understand this distinction. Technically a char* is a byte string. >> The problem is that people are going to want to implicitly handle >> unicode <-> char* much more often. > > But they shouldn't do that. Python3 is very strict here. There is > no automatic > conversion between bytes and str. You must be explicit about the > way you want > to convert it. And believe me, they didn't break it doing that, > they fixed it. I fully agree that Python is moving in the right direction here. 
There are two kinds of mistakes that languages can make with strings. The first is to assume 1 byte == 1 character, which is obviously bad and what Python is moving away from. The second, however, is to require explicit mention of the conversion for every trivial task. I don't want Cython to be like this. > Doing magic in Cython would actually be unexpected in that light. I don't think it's a question of magic, it's a question of the relationship between bytes, unicode, and char*. I'm not saying there should be conversion between Python bytes and unicode, I'm saying that the C type char* corresponds better to the Python unicode type than the Python bytes type. >>>> (That would break a lot of >>>> code...including really nice code like declaring a function >>>> argument >>>> to be char*) >>> That would still accept any kind of byte string or a bytes object >>> in Py3, which is just fine IMHO. >> >> I think this significantly impacts usability. For example, if I have >> a function >> >> def foo(char* x): >> ... >> >> then users of my module won't be able to write foo("eggs") anymore, >> they will have to write foo(b"eggs") or even foo(x.encode('UTF-8')) >> if x is given to them from elsewhere. > > That's fine, because your function expects a byte string. It cannot > handle a > unicode string, and it even says so in its signature. > >> Likewise, if I have >> >> def foo(): >> cdef char* s >> ... >> return s >> >> Then the user won't be able to write >> >> print "The answer is %s" % foo() >> >> or >> >> foo() + "eggs" > > But again, that's because of Python semantics, not of Cython > semantics. You say "that's fine" but my issue was one of usability, which hasn't been addressed. Technically, a char* is a pointer to a char. It doesn't even have a length, which is one thing that distinguishes it from a Python bytes object. But what char* means (in the conventional sense) is a c string. 
A C string should get converted into a Python string (which, in Python 3000 is a unicode object). Put another way, the type of "foo" in C should get converted to the type of "foo" in Python. We get to decide what the relationship is between a char* and a PyObject*. I am advocating that whenever there is an implicit conversion between the two, char* is treated as a null-terminated utf-8 string. This will allow maximum backwards compatibility and ease of use. - Robert From dagss at student.matnat.uio.no Thu Apr 17 12:43:59 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 17 Apr 2008 12:43:59 +0200 (CEST) Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> Message-ID: <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> > You say "that's fine" but my issue was one of usability, which hasn't > been addressed. I think this might be a good point for trying to list the actual use cases. I'll make a start and you can see if you find more, and how important each of them is. There seems to be a use case for every possible stance (which I'll enumerate as utf-8 auto-conversion, ascii auto-conversion failing on non-ascii data, and no automatic conversion), so it is about weighing the importance of each. Interfacing with C code/libs: - Language libraries (spell checking etc.). These will often work in one specific encoding or allow you to specify the encoding the data is in; typically, one would want to be specific about conversions in this case. - Passing filenames.
This seems to be a common case; open a file picker in a Python GUI lib and pass the resulting filename to a library taking a datafile parameter. Assuming the file picker returns a str/unicode (would be nice if it returned bytes though) then auto-conversion would be nice to have, however UTF-8 would be the wrong choice on many platforms (including Windows, I think? Not sure about Vista.) - Getting error messages. These are likely to either be in a hard-coded encoding or platform default, no guarantee for UTF-8 so require encoding consciousness. - Passing UI messages. Think writing a wrapper around a GUI lib. In that case it is again usually platform default that is wanted, which is not UTF-8 for very many users (not sure about newer Windows libs, in the old libs one had the choice between 8-bit and 16-bit Windows codepages IIRC). So encoding consciousness is needed. - En-/decryption and (de)compression libs, binary serialization libs, etc. Here, UTF-8 auto-conversion would be incredibly excellent (i.e. if one wants to encrypt or compress strings, and read them back again into the same environment they came from). - Text parsing/serialization libs: One would need to be conscious about encoding one way or another, likely encoding would have to be part of the API, or in some cases, one would deal with bytes in Cython. Internal Cython use cases: All in all, using Python strings seems better when not dealing with external C code and I've failed to find good use cases; perhaps anyone else has got one? - Using char* rather than unicode for optimization purposes. Early-binding unicode objects: typedef str s should deal with some of these cases, if something like this doesn't happen already like with list (will it be as efficient as copying between buffers with strcat and friends? I can imagine more efficient due to less copying potentially happening with a smarter string type...) - Then there are cases where one wants to do some string modification quickly, element by element.
But almost all cases I could think of would fail on a UTF-8 char* (string reversal, palindrome creation, merging strings character by character, alphabet-based ROT-13... all such things would fail with a naive UTF-8 char*, and if one is conscious about understanding UTF-8 in order to do these properly one should be able to explicitly convert as well). Dag Sverre From dagss at student.matnat.uio.no Thu Apr 17 12:46:47 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 17 Apr 2008 12:46:47 +0200 (CEST) Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> Message-ID: <55917.193.157.243.12.1208429207.squirrel@webmail.uio.no> > > - Using char* rather than unicode for optimization purposes. Early-binding > unicode objects: > > typedef str s I mean cdef, obviously. Dag Sverre From stefan_ml at behnel.de Thu Apr 17 13:44:15 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 17 Apr 2008 13:44:15 +0200 (CEST) Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> Message-ID: <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Hi, I think it is a good idea to start this. 
Dag Sverre Seljebotn wrote: > - Language libraries (spell checking etc.). These will often work in one > specific encoding or allow you to specify the encoding the data is in; > typically, one would want to be specific about conversions in this case. Right. No magic here. > - Passing filenames. This seems to be a common case; open a file picker in > a Python GUI lib and pass the resulting filename to a library taking a > datafile parameter. Assuming the file picker returns a str/unicode (would > be nice if it returned bytes though) then auto-conversion would be nice to > have, however UTF-8 would be the wrong choice on many platforms (including > Windows, I think? Not sure about Vista.) Correct again. Trying to do magic here is futile. > - Getting error messages. These are likely to either be in a hard-coded > encoding or platform default, no guarantee for UTF-8 so require encoding > consciousness. Right again. They are either locale dependent or language dependent - and they may even be translated in a function, as in raise TypeError(_("Wrong type")) or printf(_("Wrong type")) Cython shouldn't interfere here any step beyond getting the input string correctly decoded from the source input. > - Passing UI messages. Think writing a wrapper around a GUI lib. In that > case it is again usually platform default that is wanted, which is not > UTF-8 for very many users (not sure about newer Windows libs, in the old > libs one had the choice between 8-bit and 16-bit Windows codepages IIRC). > So encoding consciousness is needed. Same case as libraries in general, I'd say. > - En-/decryption and (de)compression libs, binary serialization libs, etc. > Here, UTF-8 auto-conversion would be incredibly excellent (ie if one wants > to encrypt or compress strings, and read them back again into the same > environment they came from). Two cases here. Most likely, you are dealing with binary data, not unicode strings, so there is not much to gain. 
Then, auto conversion here is dangerous, as it may come unexpected. You pass in a Python object and get a UTF-8 encoded byte sequence at the end??? Imagine you had Cython on the other end, too. Wouldn't you be surprised to pass in a unicode object on one side and have Cython return a bytes object on the other? Because it couldn't possibly know that the original input was a unicode object. > - Text parsing/serialization libs: One would need to be consciuos about > encoding one way or another, likely encoding would have to be part of the > API, or in some cases, one would deal with bytes in Cython. Yes, encodings are crucial here, so again, not a big gain, just a source for potential laziness bugs. > - Using char* rather than unicode for optimization purposes. Early-binding > unicode objects: > > cdef str s > > should deal with some of these cases, if something like this doesn't > happen already like with list (will it be as efficient as copying between > buffers with strcat and friends? I can imagine more efficient due to less > copying potentially happening with a smarter string type...) The most efficient way to deal with this is early or late conversion, i.e. at the API level. And it's good to be explicit here to avoid common bugs and potential API incompatibilities. > - Then there are cases where one wants to do some string modification > quickly, element by element. But almost all cases I could think of would > fail on a UTF-8 char* (string reversal, palindrome creation, merging > strings character by character, alphabet-based ROT-13... all such things > would fail with a naive UTF-8 char*, and if one is conscious about > understanding UTF-8 in order to do these properly one should be able to > explicitly convert as well). Again, I agree. If you want UTF-8, it's better to say so as the thing you will do with the result afterwards totally depends on the encoding in use. 
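The failure mode in the quoted paragraph (element-by-element edits on a UTF-8 char*) can be sketched in plain Python, with a bytes object standing in for the char* buffer. The sample word, the helper name is_valid_utf8 and the variable names are assumptions made for illustration; none of them come from the thread.

```python
# A bytes object stands in for a UTF-8 encoded char* buffer.
# The sample word (with two non-ASCII letters) is an illustrative assumption.
text = "bl\u00e5b\u00e6r"
utf8 = text.encode("utf-8")


def is_valid_utf8(data):
    """Hypothetical helper: report whether data decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Naive byte-wise reversal, the way C code walking a char* would do it.
# The two-byte sequences for the non-ASCII letters end up back to front.
naive = utf8[::-1]
naive_ok = is_valid_utf8(naive)

# Reversing the characters first and encoding explicitly afterwards
# keeps every multi-byte sequence intact.
correct = text[::-1].encode("utf-8")
correct_ok = is_valid_utf8(correct)
```

For pure ASCII input both variants agree, which is why such bugs tend to go unnoticed until non-ASCII data shows up.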
I find it easier to read

    def dostuff(str text):
        cdef char* s = text.encode("UTF-8")
        # do UTF-8 handling stuff
        return s.decode("UTF-8")

than anything you could do with internal magic.

In lxml, for example, I try to be very explicit about the point where stuff is converted to UTF-8. There is a utility function called "_utf8()" that takes a Python object and returns a UTF-8 encoded byte string or raises an exception if the input was neither an ASCII byte string nor a unicode string. You will find this function at the beginning of almost all API functions, and I am very happy to have it there. Because this makes it explicit what is happening and when, and it makes sure that whatever string we use in internal functions will be a UTF-8 encoded byte string. I do not want Cython to do that for me, as the conversion is part of lxml's API and this includes the semantics of its input checking.

Take this example again:

    def dostuff(char* input):
        # do some UTF-8 handling stuff
        return input

Now imagine you call it with a byte string like this:

    dostuff(u"?rgl?mpf".encode('iso-8859-1'))

This will not give you an API error, but it will most likely break the function in one place or another or might even return an incorrect result without error notice.

I seriously doubt that there are many applications that would be fine with a simple

    cdef char* s = some_unicode_string

and I'm quite confident that even the remaining applications would be better off using explicit conversion than *ignoring* the fact that there is a semantic difference between a sequence of characters and a sequence of bytes.

Making people aware of this difference is a good thing. Doing magic to support laziness is not a good thing.

Stefan

From dagss at student.matnat.uio.no Thu Apr 17 14:17:55 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Thu, 17 Apr 2008 14:17:55 +0200
Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <48073FF3.4050302@student.matnat.uio.no> > I find it easier to read > > def dostuff(str text): > cdef char* s = text.encode("UTF-8") > # do UTF-8 handling stuff > return s.decode("UTF-8") > > than anything you could do with internal magic. > Will this work though? (I'm ignorant in this area of Cython). I.e., when will the temporary returned from text.encode exit scope and be collected? So one would need two more lines (or magic support for keeping temporaries to the end of the function scope when assigning temporaries to char*). I forgot one important case in my list though: Passing string constants to C libraries. With no conversion, one cannot at the same time keep nice language consistency and also allow cdef char* s = "asdf" Robert's proposal has the advantage that it allows this notation in a more consistent way. Personally I'm now (forget about earlier opinions :-) ) ready to take the "b" at this point, rather than breaking consistency or doing undeclared magic. It's a nice reminder that using char* for strings is not trivial, and probably avoids more bugs. Also, nowadays it is rather seldom I think that char* is used directly for strings... C++ have std::string, Linux GUI apps use specific QT or GTK/GObject strings, and so on. Console applications sometimes use char* but are conscious about encoding matters at a low level. You suggested to use whatever encoding the source file is in in the case above? Or have you backtracked from that now? 
Dag Sverre From stefan_ml at behnel.de Thu Apr 17 15:17:52 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 17 Apr 2008 15:17:52 +0200 (CEST) Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <48073FF3.4050302@student.matnat.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48073FF3.4050302@student.matnat.uio.no> Message-ID: <29415.194.114.62.66.1208438272.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Dag Sverre Seljebotn wrote: > Stefan Behnel wrote: >> I find it easier to read >> >> def dostuff(str text): >> cdef char* s = text.encode("UTF-8") >> # do UTF-8 handling stuff >> return s.decode("UTF-8") >> >> than anything you could do with internal magic. >> > Will this work though? Sorry, my fault. As I described, what I use in lxml is a function that actually returns an encoded Python byte string, that's what I had in mind. The above will give you a compiler error about converting a temporary result. > I forgot one important case in my list though: Passing string constants > to C libraries. With no conversion, one cannot at the same time keep > nice language consistency and also allow > > cdef char* s = "asdf" What is the problem here? Py2 semantics give you a byte string which, given PEP 263, has a well defined byte sequence that is assigned to the char*. If you are referring to Py3 source code semantics, then yes, this will hopefully raise a compile error. The right thing to do in Py3 Cython code is to write b"asdf" (which could even be valid Cython code regardless of the target code version). 
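The Py3 distinction referred to above can be sketched in plain Python 3, outside of Cython. This only shows how the interpreter treats the two literal forms; it makes no claim about what the Cython compiler does with them, and the variable names are assumptions for illustration.

```python
# Under Python 3 semantics an unprefixed literal is text, a b"" literal is bytes.
text_literal = "asdf"
byte_literal = b"asdf"

text_is_str = isinstance(text_literal, str)
bytes_is_bytes = isinstance(byte_literal, bytes)

# There is no implicit conversion between the two types.
try:
    byte_literal + text_literal
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Going from text to bytes takes an explicit, named encoding.
explicit = text_literal.encode("ascii")
```

The explicit encode step is the point where the choice of encoding becomes visible in the source.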
Stefan

From dagss at student.matnat.uio.no Thu Apr 17 15:22:47 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Thu, 17 Apr 2008 15:22:47 +0200
Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: <29415.194.114.62.66.1208438272.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <48073FF3.4050302@student.matnat.uio.no> <29415.194.114.62.66.1208438272.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
Message-ID: <48074F27.4010903@student.matnat.uio.no>

> If you are referring to Py3 source code semantics, then yes, this will
> hopefully raise a compile error. The right thing to do in Py3 Cython code
> is to write b"asdf" (which could even be valid Cython code regardless of
> the target code version).
>
Yes, I was referring to Py3. And I agree.

Dag Sverre

From robertwb at math.washington.edu Thu Apr 17 21:33:26 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Thu, 17 Apr 2008 12:33:26 -0700
Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: On Apr 17, 2008, at 4:44 AM, Stefan Behnel wrote: > Hi, > > I think it is a good idea to start this. > > Dag Sverre Seljebotn wrote: >> - Language libraries (spell checking etc.). These will often work >> in one >> specific encoding or allow you to specify the encoding the data is >> in; >> typically, one would want to be specific about conversions in this >> case. > > Right. No magic here. > > >> - Passing filenames. This seems to be a common case; open a file >> picker in >> a Python GUI lib and pass the resulting filename to a library >> taking a >> datafile parameter. Assuming the file picker returns a str/unicode >> (would >> be nice if it returned bytes though) then auto-conversion would be >> nice to >> have, however UTF-8 would be the wrong choice on many platforms >> (including >> Windows, I think? Not sure about Vista.) > > Correct again. Trying to do magic here is futile. > > >> - Getting error messages. These are likely to either be in a hard- >> coded >> encoding or platform default, no guarantee for UTF-8 so require >> encoding >> consciousness. > > Right again. They are either locale dependent or language dependent > - and > they may even be translated in a function, as in > > raise TypeError(_("Wrong type")) > > or > > printf(_("Wrong type")) > > Cython shouldn't interfere here any step beyond getting the input > string > correctly decoded from the source input. Yes. This will not be interpreted as a C string anywhere along the way. 
>> - Passing UI messages. Think writing a wrapper around a GUI lib. >> In that >> case it is again usually platform default that is wanted, which is >> not >> UTF-8 for very many users (not sure about newer Windows libs, in >> the old >> libs one had the choice between 8-bit and 16-bit Windows codepages >> IIRC). >> So encoding consciousness is needed. > > Same case as libraries in general, I'd say. > > >> - En-/decryption and (de)compression libs, binary serialization >> libs, etc. >> Here, UTF-8 auto-conversion would be incredibly excellent (ie if >> one wants >> to encrypt or compress strings, and read them back again into the >> same >> environment they came from). > > Two cases here. Most likely, you are dealing with binary data, not > unicode > strings, so there is not much to gain. Then, auto conversion here is > dangerous, as it may come unexpected. You pass in a Python object > and get > a UTF-8 encoded byte sequence at the end??? No, if you try and turn any char* into an object, you get unicode. If you are assuming it is null-terminated, you are assuming it is a string. If it really is binary data, then one would need to specify the length. > Imagine you had Cython on the > other end, too. Wouldn't you be surprised to pass in a unicode > object on > one side and have Cython return a bytes object on the other? > Because it > couldn't possibly know that the original input was a unicode object. No, see above. >> - Text parsing/serialization libs: One would need to be consciuos >> about >> encoding one way or another, likely encoding would have to be part >> of the >> API, or in some cases, one would deal with bytes in Cython. > > Yes, encodings are crucial here, so again, not a big gain, just a > source > for potential laziness bugs. > > >> - Using char* rather than unicode for optimization purposes. 
Early- >> binding >> unicode objects: >> >> cdef str s >> >> should deal with some of these cases, if something like this doesn't >> happen already like with list (will it be as efficient as copying >> between >> buffers with strcat and friends? I can imagine more efficient due >> to less >> copying potentially happening with a smarter string type...) > > The most efficient way to deal with this is early or late > conversion, i.e. > at the API level. And it's good to be explicit here to avoid common > bugs > and potential API incompatibilities. > > >> - Then there are cases where one wants to do some string modification >> quickly, element by element. But almost all cases I could think of >> would >> fail on a UTF-8 char* (string reversal, palindrome creation, merging >> strings character by character, alphabet-based ROT-13... all such >> things >> would fail with a naive UTF-8 char*, and if one is conscious about >> understanding UTF-8 in order to do these properly one should be >> able to >> explicitly convert as well). Strings are supposed to be immutable. The only heavy string- processing I've done is a parser for mathematical expressions, and it would work just fine with UTF-8 (as all the "special" characters are ASCII, and it treats all other byte sequences as names). > Again, I agree. If you want UTF-8, it's better to say so as the > thing you > will do with the result afterwards totally depends on the encoding > in use. > > I find it easier to read > > def dostuff(str text): > cdef char* s = text.encode("UTF-8") > # do UTF-8 handling stuff > return s.decode("UTF-8") > > than anything you could do with internal magic. > > In lxml, for example, I try to be very explicit about the point where > stuff is converted to UTF-8. There is a utility function called > "_utf8()" > that takes a Python object and returns a UTF-8 encoded byte string or > raises an exception if the input was neither an ASCII byte string > nor a > unicode string. 
You will find this function at the beginning of > almost all > API functions, and I am very happy to have it there. Because this > makes it > explicit what is happening and when, and it makes sure that whatever > string we use in internal functions will be a UTF-8 encoded byte > string. I > do not want Cython to do that for me, as the conversion is part of > lxml's > API and this includes the semantics of its input checking. This is because you *want* to think about encoding when you're processing XML. > Take this example again: > > def dostuff(char* input): > # do some UTF-8 handling stuff > return input > > Now imagine you call it with a byte string like this: > > dostuff(u"?rgl?mpf".encode('iso-8859-1')) > > This will not give you an API error, but it will most likely break the > function in one place or another or might even return an incorrect > result > without error notice. No, it would work just fine, because it would specify utf-8 in the decoding phase. > I seriously doubt that there are many applications that would be > fine with > a simple > > cdef char* s = some_unicode_string I tried finding an example in the Sage codebase where this would cause problems, and wasn't able to do so (other than the fact that some libraries would barf on bad input, but then some of them would barf if some_unicode_string wasn't a decimal number.) > and I'm quite confident that even the remaining applications would be > better off using explicit conversion than *ignoring* the fact that > there > is a semantic difference between a sequence of characters and a > sequence > of bytes. There is a semantic difference between a pointer to a byte, a null- terminated sequence of bytes, and a sequence of bytes with specified length. We are ignoring this distinction. I primarily see the bytes object as binary data, not assumed to be null-terminated. 
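The difference between a null-terminated byte sequence and one with a specified length can be sketched with the standard ctypes module. The sample data and variable names are assumptions for illustration; the snippet shows C-string semantics as ctypes exposes them, not how Cython converts char*.

```python
import ctypes

# Binary data with an embedded NUL byte (illustrative sample).
data = b"ab\x00cd"

# create_string_buffer copies the bytes and appends a trailing NUL.
buf = ctypes.create_string_buffer(data)

# Read as a C string: everything after the first NUL is silently dropped.
as_c_string = buf.value

# Read with an explicit length: the full payload survives.
with_length = buf.raw[:len(data)]
```

Here as_c_string holds only the part before the embedded NUL, while with_length matches the original data.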
In any case, normal Python users are not going to want to pass/receive bytes when they really want to be manipulating strings, so the burden is on the Cython coder.

> Making people aware of this difference is a good thing.

I agree. Forcing the user to deal with it everywhere they want to use a string is, in my opinion, not.

> Doing magic to support laziness is not a good thing.

No, magic is a very good thing. That's what makes Cython so much better than writing against the C/API explicitly. Would you say all the magic that converts between C and Python ints is a bad thing?

I can see that you are both convinced that forcing the user to manually convert using an encoding via

    def dostuff(str text):
        cdef bytes tmp_text = text.encode("UTF-8")
        cdef char* s = tmp_text
        # do UTF-8 (often just ASCII) handling stuff
        cdef bytes another_tmp = s  # if one didn't use UTF-8 one may have to worry about specifying the length too.
        return another_tmp.decode("UTF-8")

is worth the price paid in usability, backwards compatibility, and efficiency. And since no one else has spoken up I guess there aren't any other strong opinions on the matter.

- Robert

From dagss at student.matnat.uio.no Thu Apr 17 23:11:34 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Thu, 17 Apr 2008 23:11:34 +0200 (CEST)
Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no> >> printf(_("Wrong type")) >> >> Cython shouldn't interfere here any step beyond getting the input >> string >> correctly decoded from the source input. > > Yes. This will not be interpreted as a C string anywhere along the way. Odds are that printf statement is within some C library, using char*, using standard libc translation, meaning that when a Chinese BIG-5 system translates this into something crazy then you *do* get a problem. Meanwhile, the original coder doesn't notice because he/she is not the one doing the chinese translation. I think this is partially a culture thing: Me and Stefan live using non-ASCII alphabets daily, and still (in 2008) have to live with lots of software that just doesn't handle things properly or have small nags. This is a real problem, and most coders don't bother with it. In all the library interface cases I listed, auto-conversion has the possibility to seriously bite non-experienced coders and UTF-8 is almost never what is wanted when wrapping C libraries. This does not mean that you "have a data buffer without known length" -- you know that you have a string, but the reality is that when wrapping C libraries you a) know it is a string, b) have no reason to assume anything about the encoding. (I did initially argue for conversion to platform run-time default -- because that would have a possibility of working when wrapping C libraries. (But I've gone away from that now, at least under the name of char*).) 
Meanwhile, in Cython code, there's no reason you have to call your utf-8 buffers "char*" except for the warm fuzzy C feeling. Make a "mutable_str" type if the purpose is speeding things up; that will have the possibility for infinitely more nice candy, and one can still generate char*. You can even treat UTF-8 properly then (i.e. have [] return potentially something >255).

> I can see that you are both convinced that forcing the user to
> manually convert using an encoding via
>
>      def dostuff(str text):
>          cdef bytes tmp_text = text.encode("UTF-8")
>          cdef char* s = tmp_text
>          # do UTF-8 (often just ASCII) handling stuff
>          cdef bytes another_tmp = s  # if one didn't use UTF-8 one may have to worry about specifying the length too.
>          return another_tmp.decode("UTF-8")
>
> is worth the price paid in usability, backwards compatibility, and
> efficiency. And since no one else has spoken up I guess there aren't
> any other strong opinions on the matter.

I think the price paid in usability and backwards compatibility (which Python 3 breaks anyway, and people will have scripts to add b"" to their strings...) is more than outweighed by the price paid for subtle bugs introduced by coders who didn't bother to learn about it properly when it worked perfectly on their US system. C is the language where you are allowed to shoot yourself in the foot, not Python.

Also your example is unfair -- you take an example of the current Cython compiler and compare it with a fully candied up alternative. For the UTF-8 autoconversion to work there must be some candy as well (basically something like the above must be generated by Cython, right?), so there shouldn't be any reason (or is there?)
that one can't stop adding candy just one layer before, i.e. waiting with releasing temporaries assigned to char* as one would have to with UTF-8 anyway:

    def dostuff(str text):
        cdef char* s = text.encode("UTF-8")
        # Do stuff
        return str(s, "UTF-8")

(OK, this is implying that char* auto-coerces to bytes using null-termination, but that hardly seems to be an argument against it when your candidate solution is _also_ assuming null-termination, it is just assuming encoding in addition. Assuming only null-termination is fine, it is a "middle ground".)

Finally some wisdom from the Zen of Python:

>>> import this
...
Explicit is better than implicit
...
>>>

Phew. I'll try to make this my last one, this should rest a bit :-)

Dag Sverre

From dagss at student.matnat.uio.no Thu Apr 17 23:22:06 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Thu, 17 Apr 2008 23:22:06 +0200 (CEST)
Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0?
In-Reply-To: <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no>
References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no>
Message-ID: <56245.193.157.243.12.1208467326.squirrel@webmail.uio.no>

>>>   printf(_("Wrong type"))
>>>
>>> Cython shouldn't interfere here any step beyond getting the input
>>> string correctly decoded from the source input.
>>
>> Yes. This will not be interpreted as a C string anywhere along the way.
> > Odds are that printf statement is within some C library, using char*, > using standard libc translation, meaning that when a Chinese BIG-5 system > translates this into something crazy then you *do* get a problem. > Meanwhile, the original coder doesn't notice because he/she is not the one Just one more: I realize that the above doesn't make sense, don't bother to correct it :-) Assume a function returning an error string using C _-translation instead... Dag Sverre From robertwb at math.washington.edu Fri Apr 18 00:52:44 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 17 Apr 2008 15:52:44 -0700 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no> Message-ID: <932949C4-3E42-4155-95CF-B9A63136E48F@math.washington.edu> On Apr 17, 2008, at 2:11 PM, Dag Sverre Seljebotn wrote: >>> printf(_("Wrong type")) >>> >>> Cython shouldn't interfere here any step beyond getting the input >>> string >>> correctly decoded from the source input. >> >> Yes. This will not be interpreted as a C string anywhere along the >> way. > > Odds are that printf statement is within some C library, using char*, > using standard libc translation, meaning that when a Chinese BIG-5 > system > translates this into something crazy then you *do* get a problem. > Meanwhile, the original coder doesn't notice because he/she is not > the one > doing the chinese translation. 
>
> I think this is partially a culture thing: Me and Stefan live using
> non-ASCII alphabets daily, and still (in 2008) have to live with lots of
> software that just doesn't handle things properly or have small nags. This
> is a real problem, and most coders don't bother with it.

I agree that it could be partially a cultural issue. I speak French and studied Chinese for several years, so it's not like I haven't dealt with these issues, but not on a daily basis (anymore). Even more significantly, the strings I deal with are almost all in a math context, which rarely uses unicode (tex is the de-facto standard for typeset output).

> In all the library interface cases I listed, auto-conversion has the
> possibility to seriously bite non-experienced coders and UTF-8 is almost
> never what is wanted when wrapping C libraries. This does not mean that
> you "have a data buffer without known length" -- you know that you have a
> string, but the reality is that when wrapping C libraries you a) know it
> is a string, b) have no reason to assume anything about the encoding.
>
> (I did initially argue for conversion to platform run-time default --
> because that would have a possibility of working when wrapping C
> libraries. (But I've gone away from that now, at least under the name of
> char*).)
>
> Meanwhile, in Cython code, there's no reason you have to call your utf-8
> buffers "char*" except for the warm fuzzy C feeling.

And the fact that that's what they really are (rather than making the user learn a new type).

> Make a "mutable_str"
> type if the purpose is speeding things up, that will have the possibility
> for infinitely more nice candy, and one can still generate char*. You can
> even treat UTF-8 properly then (ie have [] return potentially
> something >255).
> >> I can see that you are both convinced that forcing the user to >> manually convert using an encoding via >> >> def dostuff(str text): >> cdef bytes tmp_text = text.encode("UTF-8") >> cdef char* s = tmp_text >> # do UTF-8 (often just ASCII) handling stuff >> cdef bytes another_tmp = s # if one didn't use UTF-8 >> one may have to worry about specifying the length too. >> return another_tmp.decode("UTF-8") >> >> is worth the price paid in usability, backwards compatibility, and >> efficiency. And since no one else has spoken up I guess there aren't >> any other strong opinions on the matter. > > I think the price paid in usability and backwards compatability (which > Python 3 breaks anyway, and people will have scripts to add b"" to > their > strings...) is more than weighed up for by the price paid for > subtle bugs > introduced by coders who didn't bother to learn about it properly > when it > worked perfectly on their US system. C is the language where you are > allowed to shoot yourself in the foot, not Python. > > Also your example is unfair -- you take an example of the current > Cython > compiler and compare it with a fully candied up alternative. For > the UTF-8 > autoconversion to work there must be some candy as well (basically > something like the above must be generated by Cython, right?), so > there > shouldn't be any reason (or is there?) I think you underestimate how complicated it would be to figure out when it will be safe to release the temp, and if you're creating a copy whether or not you have to worry about freeing s. What you really want to do is use the buffer interface of unicode objects, and it is unclear how to do this without using magic or the C/API directly. 
> that one can't stop adding candy > just one layer before, ie waiting with releasing temporaries > assigned to > char* as one would have to with UTF-8 anyway: > > def dostuff(str text): > cdef char* s = text.encode("UTF-8") > # Do stuff > return str(s, "UTF-8") > > (OK, this is implying that char* auto-coerces to bytes using > null-termination, but that hardly seems to be an argument against > it when > your candidate solution is _also_ assuming null-termination, it is > just > assuming encoding in addition. Assuming only null-termination is > fine, it > is a "middle ground".) I was just saying that we don't want to take the no assumption route, so it's a question of what assumptions to make. The code above will still break for a lot of encodings (e.g. UCS-2 or UCS-4). > Finally some wisdom from the Zen of Python: >>>> import this > ... > Explicit is better than implicit This is a good point. It's a step backwards in terms of usability. Worth it? I'm unconvinced but outvoted. > ... >>>> > > > Pewh. I'll try to make this my last one, this should rest a bit :-) Same here. - Robert From robertwb at math.washington.edu Fri Apr 18 06:12:58 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 17 Apr 2008 21:12:58 -0700 Subject: [Cython] [Pyrex] The newbie's question about wrapping c++ class with pyrex In-Reply-To: <4807FDA7.1090109@Sun.COM> References: <4807FDA7.1090109@Sun.COM> Message-ID: On Apr 17, 2008, at 6:47 PM, Yong Sun wrote: > Hi, experts, > > I am following the tutorial at http://wiki.cython.org/ > WrappingCPlusPlus, and found a minor issue, > > cdef class Rectangle: > - c_Rectangle *thisptr # hold a C++ instance which we're wrapping > + cdef c_Rectangle *thisptr # hold a C++ instance which we're > wrapping Thanks. I've made this change on the Cython wiki. I also corrected an error about operator overloading. I'd like to add this example to the automated testing infrastructure, but it seems unclear how to do so with the current runtests.py. 
> > I need the above change to make it successfully compiled by cython. > While, when I tried to compile it with pyrexc, I met following errors, > > $ pyrexc --cplus rect.pyx > rect.pyx:4:21: Non-extern C function declared but not defined > rect.pyx:5:21: Non-extern C function declared but not defined > rect.pyx:6:19: Non-extern C function declared but not defined > rect.pyx:7:17: Non-extern C function declared but not defined > rect.pyx:18:27: Object of type 'c_Rectangle' has no attribute > 'getLength' > rect.pyx:20:27: Object of type 'c_Rectangle' has no attribute > 'getHeight' > rect.pyx:22:27: Object of type 'c_Rectangle' has no attribute > 'getArea' > rect.pyx:24:20: Object of type 'c_Rectangle' has no attribute 'move' > > BTW, I am using the latest stable version, > > $ pyrexc --version > Pyrex version 0.9.6.4 In Pyrex you need to declare functions as function pointers, i.e. cdef extern from "Rectangle.h": ctypedef struct c_Rectangle "Rectangle": int x0, y0, x1, y1 int (*getLength)() ... > > Thank you very much!
> > Regards, > > > > cdef extern from "Rectangle.h": > ctypedef struct c_Rectangle "Rectangle": > int x0, y0, x1, y1 > int getLength() > int getHeight() > int getArea() > void move(int dx, int dy) > c_Rectangle *new_Rectangle "new Rectangle" (int x0, int y0, int > x1, int y1) > void del_Rectangle "delete" (c_Rectangle *rect) > > cdef class Rectangle: > cdef c_Rectangle *thisptr # hold a C++ instance which > we're wrapping > def __cinit__(self, int x0, int y0, int x1, int y1): > self.thisptr = new_Rectangle(x0, y0, x1, y1) > def __dealloc__(self): > del_Rectangle(self.thisptr) > def getLength(self): > return self.thisptr.getLength() > def getHeight(self): > return self.thisptr.getHeight() > def getArea(self): > return self.thisptr.getArea() > def move(self, dx, dy): > self.thisptr.move(dx, dy) > _______________________________________________ > Pyrex mailing list > Pyrex at lists.copyleft.no > http://lists.copyleft.no/mailman/listinfo/pyrex From robertwb at math.washington.edu Fri Apr 18 08:34:57 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 17 Apr 2008 23:34:57 -0700 Subject: [Cython] CEP 507/513 In-Reply-To: <48020C71.6090100@student.matnat.uio.no> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> Message-ID: <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> On Apr 13, 2008, at 6:36 AM, Dag Sverre Seljebotn wrote: > Martin C. Martin wrote: >> I too would be worried about changing the semantics of cdef T a = x. >> But what about: >> >> cdef T a = x >> assert a.__class__ == T >> >> This makes it valid Python, which pins down the type of "a" exactly. 
>> In fact, at this point you don't even need the "cdef T". >> >> It's slightly ugly, but with luck, that will discourage people from >> premature optimization. > It's still dangerous: > > a.myfunc = myoverride > assert a.__class__ == T > a.myfunc() > > (On the subject of optimizations, one could also augment the CEP by > having the notation "exactly(MyClass)" mean "early-bind all calls > and I > don't care about the consequences". Using "exactly(list)" would then > mean that builtins wouldn't need to be a special-case either. But I'm > fine with "list" being a special-case here.) > > But at any rate, the point of the CEP was to have more consistency in > Cython syntax, not necessarily adding optimizations. > > Robert makes a very good case (and has fully convinced me) for needing > to support descendants being assigned (that not allowing that > wasn't one > of my brightest ideas). But independently of this, it looks like C > types, > Python builtins, and Python extension types declared in pxd files are > going to be supported. Simply allowing Python types as well simply > makes > it more consistent, even if it isn't that useful... > > On the other hand, if it doesn't have a useful feature, there > definitely is a case for not bothering to implement it (well, it is > useful for function overloading, but then again function > overloading is > less useful for Python objects :-) ). So I really wouldn't mind if > this > is rejected. > > CEP 513 should be independent of this though? They did seem very related, at least in my initial reading, and the whole mixin idea seemed to rely on declaring non-cdef'ed types. Reading CEP 513 again, I think the whole thing boils down to two things: 1) Automatic cimport on import, under certain conditions (e.g.
if a corresponding .pxd file is found in the internal cython path) 2) Function overloading The only reason Python didn't have (2) is that it had no way of declaring types, and hence of distinguishing between overloaded functions (well, perhaps parameter count, but that's not very compelling). This will change in Python 3000 http://www.python.org/dev/peps/pep-3124/ , and I think we should backport that to Cython (maybe not the whole PEP, but overloadable functions using @overload or type signatures). In math.pxd one would declare sqrt for several native types, and the object one would be the fallback. Def vs. cdef could be interchangeable at this level. Taking external types into consideration (see http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/extension_types.html#ExternalExtTypes ), I think the rest of the CEP could be greatly simplified. It is possible I missed something not covered by (1) and (2), but I believe this is the gist of the proposal. - Robert From robertwb at math.washington.edu Fri Apr 18 08:40:15 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 17 Apr 2008 23:40:15 -0700 Subject: [Cython] a warning on using string/buffer objects for getting tmp memory In-Reply-To: <48015D38.3070806@martincmartin.com> References: <48015D38.3070806@martincmartin.com> Message-ID: <8E1DCF8A-E0F3-4445-A5E7-60FC7239B643@math.washington.edu> On Apr 12, 2008, at 6:09 PM, Martin C. Martin wrote: > > > Robert Bradshaw wrote: >> On Apr 12, 2008, at 4:09 PM, Lisandro Dalcin wrote: >> >>> I've just realized that using a string or buffer object for >>> automatic >>> management of memory as I proposed has a pitfall: memory >>> alignment is >>> not guaranteed. >>> >>> So perhaps the only way to go with this trick is to use a custom >>> python object internally calling malloc/free. >> >> It looks like strings are aligned on int boundaries (given their >> struct).
> > On 64 bit machines, gcc uses 32 bits for ints, so simply aligning > on int > boundaries wouldn't get you 64 bit aligned. Yes. I was just noting that int was the smallest type in the string struct before the data. > >> What guarantee does one have about malloc? > > It depends on the implementation. For glibc: > > http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory- > Blocks.html#Aligned-Memory-Blocks > > "The address of a block returned by malloc or realloc in the GNU > system > is always a multiple of eight (or sixteen on 64-bit systems)." > > You could always do what memalign() does: if e.g. you need something > aligned on 8 byte boundaries, but Python's string allocation only > uses 4 > byte boundaries, then allocate an extra 4 bytes, and if the result > isn't > 8 byte aligned, return the address + 4. If we're going to provide something like this, I'd make our own object that's even simpler than a string. - Robert From stefan_ml at behnel.de Fri Apr 18 07:41:21 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 18 Apr 2008 07:41:21 +0200 Subject: [Cython] The newbie's question about wrapping c++ class with pyrex In-Reply-To: References: <4807FDA7.1090109@Sun.COM> Message-ID: <48083481.9050405@behnel.de> Hi, Robert Bradshaw wrote: > I'd like to add this example to the automated testing infrastructure, > but it seems unclear how to do so with the current runtests.py. Hmm, sure it's a C++ example, so Cython/distutils will have to know that in advance. I think that's the same problem as Py2/Py3 source code. It's actually a different language that you want to target with your source (as a backend like C/C++ or as a frontend like Py2/3), but there is no way to say so from within your source file. But at least for the test suite, we could add a comment like "#c++" in the first line, read the first few bytes of each test file and configure the distutils Extension language option accordingly. 
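A minimal sketch of the marker-comment idea above, assuming the marker is spelled exactly "#c++" on the first line of the test file. The helper name detect_language is hypothetical and is not part of runtests.py; its result is meant to be handed to the distutils Extension language option.

```python
import io


def detect_language(source, default="c"):
    """Return "c++" when the first line of ``source`` is a "#c++" marker.

    ``source`` is any text file object; only its first line is read.
    Both the marker spelling and this helper are assumptions.
    """
    first_line = source.readline()
    # Ignore whitespace and case so "# C++" is accepted as well.
    marker = "".join(first_line.split()).lower()
    return "c++" if marker == "#c++" else default


# Example: a test file that opts in to C++ and one that does not.
cpp_test = io.StringIO("#c++\ncdef class Rectangle:\n    pass\n")
c_test = io.StringIO("cdef int x = 1\n")
print(detect_language(cpp_test))  # c++
print(detect_language(c_test))    # c
```

Reading only the first line keeps the check cheap, and files without the marker keep building as plain C.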
Stefan From stefan_ml at behnel.de Fri Apr 18 07:32:28 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 18 Apr 2008 07:32:28 +0200 Subject: [Cython] target language syntax of Cython: Py2.6 or Py3.0? In-Reply-To: <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no> References: <48046D83.2070806@behnel.de> <0A33B165-C62A-4004-A895-BB3A95B272BD@math.washington.edu> <47074.194.114.62.39.1208345122.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <4806DCBE.2080400@behnel.de> <6DB25273-E0D8-49CE-9B60-6313DC6AA027@math.washington.edu> <55913.193.157.243.12.1208429039.squirrel@webmail.uio.no> <63691.194.114.62.66.1208432655.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <56242.193.157.243.12.1208466694.squirrel@webmail.uio.no> Message-ID: <4808326C.9070209@behnel.de> Hi, Dag Sverre Seljebotn wrote: > I think this is partially a culture thing: Me and Stefan live using > non-ASCII alphabets daily, and still (in 2008) have to live with lots of > software that just doesn't handle things properly or have small nags. This > is a real problem, and most coders don't bother with it. Yes, I think that's the main problem here. There just is no such thing as a default encoding. There is only Unicode and tons of different mappings to bytes that all have their respective corner. Giving Cython a default encoding would just ignore that fact. If you give inexperienced people tools that keep them from thinking about real problems, they will not think about them. And even if you generally know what you're doing, there may always be situations where you write a quick def f(char* s): ... without caring about the implications, and it just breaks in a bizarre way when a user from the other end of the world passes something really unexpected. It doesn't even have to be that obvious: think of a call chain of functions from the API to some C-level string treatment. It is good design to have a designated, explicit point in that chain where conversion is taking place.
And it's worth bothering with that. lxml, for example, and its ancestor ElementTree, allow unicode strings and byte strings in their interface wherever Unicode input makes sense. However, if you pass a byte string, it must be a plain 7-bit ASCII string or it will be explicitly rejected, very close to the API entry point. Allowing byte strings is just for convenience, as many, many users work with ISO encodings, ASCII or UTF-8, and most XML names in the world really are plain ASCII. However, as soon as you allow any 8-bit data, users will run into the trap of accidentally passing things as they receive them, without thinking. And they may not even notice until much later, when things get decoded on the way out again and break. And believe me, they will not say "oh, my bad", as it will take them days to debug these things to figure out where the broken string really came from. And then they will come to the mailing list and shout "why didn't your software tell me?!". One thing I learned is that explicit input checking is worth it. And this is definitely true in the string world. Stefan BTW, I might even decide to reject byte string input in lxml when it runs in Py3 (except for XML byte streams, obviously). I think that would match the way Py3 code works. 
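The explicit input check described above can be sketched roughly as follows. This is not lxml's actual code: the helper name to_text and the error messages are assumptions made for illustration, and the sketch targets Python 3, where unicode strings are str and byte strings are bytes.

```python
def to_text(value):
    """Accept unicode strings as-is and byte strings only if 7-bit ASCII.

    Hypothetical helper illustrating a check at the API entry point.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        try:
            # Plain ASCII decodes identically under ASCII, UTF-8 and the
            # ISO-8859 family, so no encoding has to be guessed.
            return value.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError(
                "byte strings must be plain 7-bit ASCII; "
                "pass a unicode string instead"
            ) from None
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_text(b"title"))   # title
print(to_text("Stra\u00dfe"))  # Straße
```

Rejecting 8-bit data at this one place means a broken string is reported where it enters, not later when it is decoded on the way out.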
From stefan_ml at behnel.de Fri Apr 18 09:17:58 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 18 Apr 2008 09:17:58 +0200 Subject: [Cython] CEP 507/513 In-Reply-To: <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> Message-ID: <48084B26.6020003@behnel.de> Hi, Robert Bradshaw wrote: > Reading CEP 513 again, I think the whole thing boils down to two things: > > 1) Automatic cimport on import, under certain conditions (e.g. if a > corresponding .pxd file is found in the internal cython path) > 2) Function overloading > > The only reason Python didn't have (2) is because it had no way of > declaring types, and hence distinguishing between overloaded > functions (well, perhaps parameter count, but that's not very > compelling)). This will change in Python 3000 http://www.python.org/ > dev/peps/pep-3124/ , and I think we should backport that to Cython > (maybe not the whole PEP, but overloadable functions using @overload > or type signatures. In math.pxd one would declare sqrt for several > native types, and the object one would be the fallback. Def vs. cdef > could be interchangeable at this level. The @overload decorator lives in a package called "overloading", according to the examples. Would it make sense to distinguish between from overloading import overload and from overloading cimport overload ? Mind the "cimport" in the second example. 
Cython could handle cimported decorators at compile time and move the evaluation of Python-imported decorators to runtime, thus not even caring about where the decorator comes from or what it does. Should we use different semantics here or would this just complicate things? Stefan From robertwb at math.washington.edu Fri Apr 18 09:29:03 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 18 Apr 2008 00:29:03 -0700 Subject: [Cython] The newbie's question about wrapping c++ class with pyrex In-Reply-To: <48083481.9050405@behnel.de> References: <4807FDA7.1090109@Sun.COM> <48083481.9050405@behnel.de> Message-ID: <85468C66-173A-440E-8C3F-A2E6A0DE1B01@math.washington.edu> On Apr 17, 2008, at 10:41 PM, Stefan Behnel wrote: > Hi, > > Robert Bradshaw wrote: >> I'd like to add this example to the automated testing infrastructure, >> but it seems unclear how to do so with the current runtests.py. > > Hmm, sure it's a C++ example, so Cython/distutils will have to know > that in > advance. > > I think that's the same problem as Py2/Py3 source code. It's > actually a > different language that you want to target with your source (as a > backend like > C/C++ or as a frontend like Py2/3), but there is no way to say so > from within > your source file. > > But at least for the test suite, we could add a comment like "#c++" > in the > first line, read the first few bytes of each test file and > configure the > distutils Extension language option accordingly. This is discussed a bit in http://wiki.cython.org/enhancements/build I think it would be nice to come up with a specification (like the encoding one proposed for Python) to specify several things, including c vs. c++, Python 2.x vs 3.0, libraries/extra c files needed, etc. rather than having to put all this logic into setup.py. 
For the Sage project, it seems that *every single person* who makes their first .pyx file in the Sage library wonders why it doesn't get compiled and loaded (because they didn't know they needed to add it to our (massive) setup.py). Not that we should do away with setup.py, but many of the options could be specified more locally. - Robert From robertwb at math.washington.edu Fri Apr 18 09:43:41 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 18 Apr 2008 00:43:41 -0700 Subject: [Cython] CEP 507/513 In-Reply-To: <48084B26.6020003@behnel.de> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> <48084B26.6020003@behnel.de> Message-ID: On Apr 18, 2008, at 12:17 AM, Stefan Behnel wrote: > Hi, > > Robert Bradshaw wrote: >> Reading CEP 513 again, I think the whole thing boils down to two >> things: >> >> 1) Automatic cimport on import, under certain conditions (e.g. if a >> corresponding .pxd file is found in the internal cython path) >> 2) Function overloading >> >> The only reason Python didn't have (2) is because it had no way of >> declaring types, and hence distinguishing between overloaded >> functions (well, perhaps parameter count, but that's not very >> compelling)). This will change in Python 3000 http://www.python.org/ >> dev/peps/pep-3124/ , and I think we should backport that to Cython >> (maybe not the whole PEP, but overloadable functions using @overload >> or type signatures. In math.pxd one would declare sqrt for several >> native types, and the object one would be the fallback. Def vs. cdef >> could be interchangeable at this level. 
> > The @overload decorator lives in a package called "overloading", > according to > the examples. > > Would it make sense to distinguish between > > from overloading import overload > > and > > from overloading cimport overload > > ? Mind the "cimport" in the second example. Cython could handle > cimported > decorators at compile time and move the evaluation of Python-imported > decorators to runtime, thus not even caring about where the > decorator comes > from or what it does. > > Should we use different semantics here or would this just > complicate things? I think this would just complicate things...the semantics should be identical, but if we can do static linking we should. Sometimes runtime dispatching will be needed though. cdef foo(int i): print "Got an int", i cdef foo(double d): print "Got a double", d It is clear overloading is intended, does one have to make it explicit? (I think probably so, just for consistency, even though I think it makes things unnecessarily more verbose.) - Robert From dagss at student.matnat.uio.no Fri Apr 18 09:53:52 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 Apr 2008 09:53:52 +0200 Subject: [Cython] CEP 507/513 In-Reply-To: <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> Message-ID: <48085390.9040706@student.matnat.uio.no> Yes, I realize that CEP 513 must be cleaned up quite a bit at this stage (these things happen -- concepts aren't clear in my mind until a week has passed and I can write it for the third time. 
I suppose branding it a CEP right away was the error.) I've decided to spend my Cython time on exploring phase refactoring this week though, so I don't want to switch my train of thought just yet (this lies much further ahead in time anyway). I'll work through the CEP in detail sometime later. > Reading CEP 513 again, I think the whole thing boils down to two things: > > 1) Automatic cimport on import, under certain conditions (e.g. if a > corresponding .pxd file is found in the internal cython path) > 2) Function overloading > Yeah. Likely it will end up as a small CEP advocating 1). Function overloading lives in CEP 502 (and is mentioned in my NumPy project proposal as well). Note that PEP 3124 is (unfortunately) deferred, as far as I can see? (However, it wouldn't hurt to converge syntax with it anyway.) Not sure in what detail you meant, but I do consider it a bit unwieldy to have to use two different type declaration syntaxes (one for typing and one for overloading); I think one should treat the issue of overloading orthogonally to the issue of where the type is declared. I.e.: - One issue is declaring variables as Python types (my addition to CEP 507), if only as a simple typecheck facility. If not, overloading can only distinguish between builtins and builtins vs. object. (And I am perfectly ok with that because the stuff that I propose to use overloading for mainly depends on C types or C types vs. object.) - Another issue is whether we add support for decorator-style type declarations. This would mainly have to do with the syntax and parser. In particular, I think there should be a 1:1 correspondence between the functionality of def foo(int bar) and def foo(bar: int) (or, alternatively, @cython.typed def foo(bar: int) in order to allow for other decorator use). The alternative (that there are subtly different meanings between "T x" and "x: T") seems very confusing to use.
With the @cython.typed decorator the contents of decorators can be treated in type context so it will be unambiguous and the same as today. But there are usability problems: @cython.typed def foo(arr: ndarray(2)) is going to look a lot more like calling the constructor ndarray, not using the parametrized type ndarray. (Which is why I ended up thinking we should make parametrized types look less like constructors, somehow). > Taking external types into consideration (see > http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/extension_types.html#ExternalExtTypes ), I think the rest of the CEP > could be greatly simplified. It is possible I missed something not > covered by (1) and (2), but I believe this is the gist of the proposal. > I certainly had extension types in the back of my mind when I wrote it, but that didn't cater for the CEP 507 addons and so.... I'll clean it up some day. -- Dag Sverre From michael.abshoff at googlemail.com Fri Apr 18 09:28:18 2008 From: michael.abshoff at googlemail.com (Michael.Abshoff) Date: Fri, 18 Apr 2008 09:28:18 +0200 Subject: [Cython] The newbie's question about wrapping c++ class with pyrex In-Reply-To: <85468C66-173A-440E-8C3F-A2E6A0DE1B01@math.washington.edu> References: <4807FDA7.1090109@Sun.COM> <48083481.9050405@behnel.de> <85468C66-173A-440E-8C3F-A2E6A0DE1B01@math.washington.edu> Message-ID: <48084D92.2080908@gmail.com> Robert Bradshaw wrote: > On Apr 17, 2008, at 10:41 PM, Stefan Behnel wrote: >> Hi, >> >> Robert Bradshaw wrote: >>> I'd like to add this example to the automated testing infrastructure, >>> but it seems unclear how to do so with the current runtests.py. >> Hmm, sure it's a C++ example, so Cython/distutils will have to know >> that in >> advance. >> >> I think that's the same problem as Py2/Py3 source code.
It's >> actually a >> different language that you want to target with your source (as a >> backend like >> C/C++ or as a frontend like Py2/3), but there is no way to say so >> from within >> your source file. >> >> But at least for the test suite, we could add a comment like "#c++" >> in the >> first line, read the first few bytes of each test file and >> configure the >> distutils Extension language option accordingly. > > This is discussed a bit in > > http://wiki.cython.org/enhancements/build > > I think it would be nice to come up with a specification (like the > encoding one proposed for Python) to specify several things, > including c vs. c++, Python 2.x vs 3.0, libraries/extra c files > needed, etc. rather than having to put all this logic into setup.py. > For the Sage project, it seems that *every single person* who makes > their first .pyx file in the Sage library wonders why it doesn't get > compiled and loaded (because they didn't know they needed to add it > to our (massive) setup.py). Not that we should do away with setup.py, > but many of the options could be specified more locally. Yes, I think that would be great since there are also some issues like that when using Cython from Sage's notebook interface. The wish to specify c99, for example, has come up, and moving all the build logic into pyx files would also make the coexistence of the new parallel build system in Sage by Gary with the old setup.py-based code much easier.
> - Robert Cheers, Michael > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > From robertwb at math.washington.edu Fri Apr 18 10:21:04 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 18 Apr 2008 01:21:04 -0700 Subject: [Cython] CEP 507/513 In-Reply-To: <48085390.9040706@student.matnat.uio.no> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> <48085390.9040706@student.matnat.uio.no> Message-ID: <7BC24727-E1AB-470F-9E8E-DF47667E0E99@math.washington.edu> On Apr 18, 2008, at 12:53 AM, Dag Sverre Seljebotn wrote: > Yes, I realize that CEP 513 must be cleaned up quite a bit at this > stage > (these things happen -- concepts aren't clear in my mind until a week > has passed and I can write it for the third time. I suppose > branding it > a CEP right away was the error.). I've decided to spend my Cython time > on exploring phase refactoring this week though, so I don't want to > switch my trail of thoughts just yet (this lies much further ahead in > time anyway). I'll work through the CEP in detail sometime later. That's fine, it's good to put stuff in writing. Lots of CEPs have popped up recently (lots due to old ideas that hadn't been put down before) but I think perhaps things should be nailed down more/ discussed on the mailing list before a formal CEP is created (otherwise, at this rate, three-digit CEPs won't be enough :). 
> >> Reading CEP 513 again, I think the whole thing boils down to two >> things: >> >> 1) Automatic cimport on import, under certain conditions (e.g. if a >> corresponding .pxd file is found in the internal cython path) >> 2) Function overloading >> > Yeah. Likely it will end up as a small CEP advocating 1). > > Function overloading lives in CEP 502 (and is mentioned in my NumPy > project proposal as well). > > Note that PEP 3124 is (unfortunately) deferred, as far as I can see? > (however it wouldn't hurt to converge syntax with it anyway) I agree. I think we want function overloading, as with static binding, non-python types, and external libraries we have a *lot* more to gain than plain vanilla Python. This makes me even more likely to accept function overloading without an explicit @overload decorator (for old-style typed functions where there can be no ambiguity). > Not sure in what detail you meant, but I do consider it a bit unwieldy > to have to use two different type declaration syntaxes (one for typing > and one for overloading); I think one should treat the issue of > overloading orthogonally to the issue of where the type is > declared. I.e.: Yes. This has always been how I thought of it too. > - One issue is declaring variables as Python types (my addition to CEP > 507), if only as a simple typecheck facility. If not, overloading can > only distinguish between builtins and builtins vs. object. (And I am > perfectly ok with that because the stuff that I propose to use > overloading for is mainly depending on C types or C types vs. object.) > > - Another issue is whether we add support for decorator-style type > declarations. This would mainly have to do with the syntax and parser. > > In particular, I think there should be a 1:1 correspondance between > the > functionality of > > def foo(int bar) > > and > > def foo(bar: int) > > (or, alternatively, > > @cython.typed > def too(bar: int) > > in order to allow for other decorator use). 
The alternative (that > there > are subtly different meanings between "T x" and "x: T") seems very > confusing to use. > > With the @cython.typed decorator the contents of decorators can be > treated in type context so it will be unambigious and the same as > today. I don't like the idea of requiring @cython.typed everywhere, way too verbose. The "x: T" notation will have the disadvantage that if T is not interpretable as a type, it will just ignore it (maybe with a warning in strict mode or something). > But there are usability problems; > > @cython.typed > def foo(arr: ndarray(2)) > > is going to look a lot more like calling the constructor ndarray, not > using the parametrized type ndarray. (Which is why I ended up thinking > we should make parametrized types look less like constructors, > somehow). I agree with this. >> Taking external types into consideration (see http:// >> www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/ >> extension_types.html#ExternalExtTypes ), I think the rest of the CEP >> could be greatly simplified. It is possible I missed something not >> covered by (1) and (2), but I believe this is the gist of the >> proposal. >> > I certainly has extension types in the back of my mind when I wrote > it, > but that didn't cater for the CEP 507 addons and so.... I'll clean > it up > some day. OK. I added a comment to the top to that effect. 
- Robert From dagss at student.matnat.uio.no Fri Apr 18 10:37:53 2008 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 Apr 2008 10:37:53 +0200 Subject: [Cython] CEP 507/513 In-Reply-To: <7BC24727-E1AB-470F-9E8E-DF47667E0E99@math.washington.edu> References: <47FF5F35.5040409@student.matnat.uio.no> <7AB3D0CC-E6B3-4C19-92F6-82E86CB86675@math.washington.edu> <4800BC43.6030208@student.matnat.uio.no> <614B9B60-4F31-49A0-A271-17F7F0E93325@math.washington.edu> <4800F1BE.5030401@student.matnat.uio.no> <3E04E7B5-884E-4177-8F5A-82D301F58333@math.washington.edu> <4801F320.80804@martincmartin.com> <48020C71.6090100@student.matnat.uio.no> <724DC697-D01D-4904-A54E-2F190A898BB7@math.washington.edu> <48085390.9040706@student.matnat.uio.no> <7BC24727-E1AB-470F-9E8E-DF47667E0E99@math.washington.edu> Message-ID: <48085DE1.2000901@student.matnat.uio.no> > I don't like the idea of requiring @cython.typed everywhere, way too > verbose. The "x: T" notation will have the disadvantage that if T is > not interpretable as a type, it will just ignore it (maybe with a > warning in strict mode or something). This is a whole different discussion, but just wanted to note an idea I had: For "cdef" functions, one could drop the decorator and only allow types (raising syntax errors otherwise -- runtime introspection won't be available to read the decorators anyway, so that's the only usecase). On "def" functions I'd go for decorator though, as a means of sanely compiling Python 3 code that is not written with Cython in mind at all. Also, I'm going a little back on the orthogonality: Within the "x: T" syntax one might want to think seriously about not allowing some of the type C syntax, ie require importing type names (without spaces) from cython.types and so on. Otherwise, compiling pure Python 3 code with "x: int" (if we find we can treat it as a Cython type declaration) will have problems as well. 
-- Dag Sverre From robertwb at math.washington.edu Fri Apr 18 10:58:45 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 18 Apr 2008 01:58:45 -0700 Subject: [Cython] Question about providing functions to existing C code In-Reply-To: <4804E068.9040109@semipol.de> References: <4804E068.9040109@semipol.de> Message-ID: <13A0F221-3C62-4255-958F-757EE06CC683@math.washington.edu> On Apr 15, 2008, at 10:05 AM, Johannes Wienke wrote: > Hi, > > I hope this is the right way to ask questions about cython. > > I am currently wrapping an existing C plugin API into a Python project. > The way from my plugin loader into the plugins is not the problem, but > the original software provides a bunch of functions that the plugins > can > use. Is there a way to provide these functions with cython, that means > reimplementing them with cython but generating them as defined by the > existing header files? I am having trouble understanding what you're asking for, but perhaps http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/external.html would help? - Robert From stefan_ml at behnel.de Fri Apr 18 11:12:52 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 18 Apr 2008 11:12:52 +0200 Subject: [Cython] Question about providing functions to existing C code In-Reply-To: <4804E068.9040109@semipol.de> References: <4804E068.9040109@semipol.de> Message-ID: <48086614.4060902@behnel.de> Hi, Johannes Wienke wrote: > I am currently wrapping an existing C plugin API into a Python project. > The way from my plugin loader into the plugins is not the problem, but > the original software provides a bunch of functions that the plugins can > use.
> Is there a way to provide these functions with cython, that means
> reimplementing them with cython but generating them as defined by the
> existing header files?

Are you asking about automated wrapper generation from header files?
Cython will not do that for you. There are tools like SWIG or sip (C++)
that are made for that purpose. Cython is better when there are
non-trivial things happening inside the wrapper, while the other two are
better when it comes to providing the same and potentially large API to
a different language.

Stefan

From stefan_ml at behnel.de Fri Apr 18 11:25:28 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 18 Apr 2008 11:25:28 +0200
Subject: [Cython] reply-to header
In-Reply-To:
References: <2D243B47-EE17-41BF-B354-B4DB8720FA21@math.washington.edu>
Message-ID: <48086908.4000800@behnel.de>

Hi,

Robert Bradshaw wrote:
>>> BTW, any specific reason you took these discussions off-list (as
>>> I think they would be general interest?)
>> Oops! just because I simply pick 'reply' in Gmail, and you continued
>> the thread writing to me and CC'ing to the list. In other mailing
>> lists configuration, the 'reply-to' is by default set to the list.
>
>> BTW, why cython-devel list does not specify a 'reply-to' by default
>> being the list itself?
>
> This has annoyed me too, but it just now hit me that I could probably
> go in and change it. Done.

I actually find the changed behaviour annoying. I'm quite capable of
deciding who I want to reply to. A "reply all" is the intuitively
obvious thing to do when replying to all members of the mailing list.
Currently, there is no difference between "reply"ing to the author of an
e-mail and replying to the list.

Normally, when I hit "reply all", my e-mail client (Thunderbird) will
reply to the author directly and to the list in CC. Mailman handles that
just fine, and it just feels right. When I hit "reply", I find it quite
obvious that I want to reply to the author.
So, when I want to take a discussion off-line, I have to hit "reply" (or
"reply-all"), delete the addressee copied from the "reply-to" header and
then copy the e-mail address of the author over to the e-mail. Now,
that's annoying.

It was definitely not the Cython list that was broken here. Any chance
we could change that back?

Stefan

From robertwb at math.washington.edu Fri Apr 18 11:43:25 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 18 Apr 2008 02:43:25 -0700
Subject: [Cython] reply-to header
In-Reply-To: <48086908.4000800@behnel.de>
References: <2D243B47-EE17-41BF-B354-B4DB8720FA21@math.washington.edu>
 <48086908.4000800@behnel.de>
Message-ID: <94F3A24D-BB43-4F67-AAF1-C67424050E7C@math.washington.edu>

On Apr 18, 2008, at 2:25 AM, Stefan Behnel wrote:

> Hi,
>
> Robert Bradshaw wrote:
>>>> BTW, any specific reason you took these discussions off-list (as
>>>> I think they would be general interest?)
>>> Oops! just because I simply pick 'reply' in Gmail, and you continued
>>> the thread writing to me and CC'ing to the list. In other mailing
>>> lists configuration, the 'reply-to' is by default set to the list.
>>
>>> BTW, why cython-devel list does not specify a 'reply-to' by default
>>> being the list itself?
>>
>> This has annoyed me too, but it just now hit me that I could probably
>> go in and change it. Done.
>
> I actually find the changed behaviour annoying. I'm quite capable of
> deciding who I want to reply to. A "reply all" is the intuitively
> obvious thing to do when replying to all members of the mailing list.
> Currently, there is no difference between "reply"ing to the author of
> an e-mail and replying to the list.
>
> Normally, when I hit "reply all", my e-mail client (Thunderbird) will
> reply to the author directly and to the list in CC. Mailman handles
> that just fine, and it just feels right. When I hit "reply", I find it
> quite obvious that I want to reply to the author.
> So, when I want to take a discussion off-line, I have to hit "reply"
> (or "reply-all"), delete the addressee copied from the "reply-to"
> header and then copy the e-mail address of the author over to the
> e-mail. Now, that's annoying.
>
> It was definitely not the Cython list that was broken here. Any chance
> we could change that back?

I like using the reply-to header because I want to avoid accidentally
taking discussions offline (and even if I'm really careful I can't
change the behavior of others). It is only in the (uncommon) cases of
taking it offline that I want to stop and think about who is in the
recipient list.

Something like this is clearly a matter of opinion. I think we should
have a poll among those who actually use the list. My vote is +1 to
using the reply-to header. Yours is obviously -1. I'm willing to concede
on a tie, but let's see if anyone else has an opinion.

- Robert

From dagss at student.matnat.uio.no Fri Apr 18 12:05:46 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Fri, 18 Apr 2008 12:05:46 +0200
Subject: [Cython] reply-to header
In-Reply-To: <94F3A24D-BB43-4F67-AAF1-C67424050E7C@math.washington.edu>
References: <2D243B47-EE17-41BF-B354-B4DB8720FA21@math.washington.edu>
 <48086908.4000800@behnel.de>
 <94F3A24D-BB43-4F67-AAF1-C67424050E7C@math.washington.edu>
Message-ID: <4808727A.3090902@student.matnat.uio.no>

> Something like this is clearly a matter of opinion. I think we should
> have a poll among those who actually use the list. My vote is +1 to
> using the reply-to header. Yours is obviously -1. I'm willing to
> concede on a tie but lets see if anyone else has an opinion.

+1. I never got around to modifying my mail filters to inspect CC as
well as List-id for inbox filtering, and now I don't have to :-)

And I find it is less tiring, I always proofread the To and CC earlier
for some reason.
(Though usually the argument is against it, because it can be confusing
to newcomers to mailing lists, who then post on-list something that
should have been off-list; that has potentially much, much worse
consequences than the opposite. But I think the cython-dev audience is
experienced enough...)

--
Dag Sverre

From michael.abshoff at googlemail.com Fri Apr 18 11:37:46 2008
From: michael.abshoff at googlemail.com (Michael.Abshoff)
Date: Fri, 18 Apr 2008 11:37:46 +0200
Subject: [Cython] reply-to header
In-Reply-To: <4808727A.3090902@student.matnat.uio.no>
References: <2D243B47-EE17-41BF-B354-B4DB8720FA21@math.washington.edu>
 <48086908.4000800@behnel.de>
 <94F3A24D-BB43-4F67-AAF1-C67424050E7C@math.washington.edu>
 <4808727A.3090902@student.matnat.uio.no>
Message-ID: <48086BEA.5080403@gmail.com>

Dag Sverre Seljebotn wrote:
>> Something like this is clearly a matter of opinion. I think we should
>> have a poll among those who actually use the list. My vote is +1 to
>> using the reply-to header. Yours is obviously -1. I'm willing to
>> concede on a tie but lets see if anyone else has an opinion.
>
> +1. I never got around to modifying my mail filters to inspect CC as
> well as List-id for inbox filtering, and now I don't have to :-)
>
> And I find it is less tiring, I always proofread the To and CC earlier
> for some reason.
>
> (Though usually the argument is against because it can be confusing to
> newcomer to mailing lists who then post something that should have been
> offlist onlist; which has potentially much much worse consequences than
> the opposite. But I think the cython-dev audience is experienced enough...)
>

I can find arguments for either way, so a 0 from me on this. It is
customary on mailing lists to go with Stefan's suggestion because most
mailers have a "reply to" and a "reply-to-all" mode. Google mail seems
to be an exception and I am surprised that Google hasn't fixed the
problem.
I have had a similar discussion about the same issue with Ondrej Certik,
for example, and at

http://www.unicom.com/pw/reply-to-harmful.html

you will find some arguments for the position Stefan is taking. In the
end it is all about convenience, and it happens to me too every once in
a while that I take things off-list when I shouldn't. On the other hand,
I have never replied to the list when I intended to take something
off-list, but that probably has to do with the fact that most of my
emails stay on-list.

Cheers,

Michael

From martin at martincmartin.com Fri Apr 18 12:49:26 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Fri, 18 Apr 2008 06:49:26 -0400
Subject: [Cython] reply-to header
In-Reply-To: <94F3A24D-BB43-4F67-AAF1-C67424050E7C@math.washington.edu>
References: <2D243B47-EE17-41BF-B354-B4DB8720FA21@math.washington.edu>
 <48086908.4000800@behnel.de>
 <94F3A24D-BB43-4F67-AAF1-C67424050E7C@math.washington.edu>
Message-ID: <48087CB6.5050208@martincmartin.com>

+1 for reply-to header, since it seems to be the standard these days.
Having Cython-dev be different than the vast majority of lists would be
confusing.

Best,
Martin

Robert Bradshaw wrote:
> On Apr 18, 2008, at 2:25 AM, Stefan Behnel wrote:
>
>> Hi,
>>
>> Robert Bradshaw wrote:
>>>>> BTW, any specific reason you took these discussions off-list (as
>>>>> I think they would be general interest?)
>>>> Oops! just because I simply pick 'reply' in Gmail, and you continued
>>>> the thread writing to me and CC'ing to the list. In other mailing
>>>> lists configuration, the 'reply-to' is by default set to the list.
>>>> BTW, why cython-devel list does not specify a 'reply-to' by default
>>>> being the list itself?
>>> This has annoyed me too, but it just now hit me that I could probably
>>> go in and change it. Done.
>> I actually find the changed behaviour annoying. I'm quite capable of
>> deciding who I want to reply to. A "reply all" is the intuitively
>> obvious thing to do
>> whe