From robertwb at math.washington.edu Thu Jan 24 21:10:17 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Thu, 24 Jan 2008 12:10:17 -0800
Subject: [Cython] [Pyrex] faster in / output from objects [long post + code!]
In-Reply-To: <47988379.6060300@zw9.nl>
References: <47988379.6060300@zw9.nl>
Message-ID: <0D3A9E02-F696-449B-ABB4-ACE9CE778299@math.washington.edu>

On Jan 24, 2008, at 4:24 AM, Martijn Meijers wrote:

> Dear list members,
>
> Currently I'm working in the geo-informatics field and I'm doing
> research on storage of vector data in a DBMS. For this my programming
> language of choice is Python. Although there are some vector libraries
> in C with Python bindings, I feel that those are not really comfortable
> to work with (due to their API). Therefore, I decided to roll my own
> library for educational and research purposes and I'm using Cython for
> this purpose (as I'm not really proficient in C or C++, and I'm not
> really willing to go that route, as it involves quite a steep learning
> curve).

Sounds like a good choice.

> Below, you'll find my library that I created. Creation of objects is
> fairly fast, compared with the C-lib-with-python bindings that I have
> available for comparison (my approach is around 1.5 times faster with
> object creation). However, I'm stuck with in/output of my objects: Two
> formats I'd like to support: a text based format and a binary format.
> Here, I have the feeling I don't understand how I can use Cython to push
> the throughput to the limits. My approach (with Visitors) is fairly
> slow. As I understand it, Cython is more geared towards (mathematical)
> computations, than to text processing...

Our Sage branch of Pyrex used to be called SageX, and we were all surprised after the first year how little our improvements were specific to the mathematics infrastructure we were supporting.
However, it is true that the Python/C API doesn't make it easy to naively do fast string processing without having to think about the underlying string representation.

> I'd like to know some things about my code:
> (a) Did I do things the right way, or can the code be optimized more
> (while staying in Cython)?

Lots. I didn't read all of your code, but here are some things that jumped out at me:

1) Use a more object-oriented style (this should clean up the code as well as optimize it). E.g.

    def is_empty(Geometry geom):
        if geom.type == __POINT:
            return False # Point cannot be empty, at the moment
        elif geom.type == __LINESTRING:
            return num_points(geom) == 0
        elif geom.type == __POLYGON:
            return num_rings(geom) == 0

would be better as a method of Point, LineString, and Polygon rather than branching on geom.type.

2) Store just the actual data, rather than a list of Python objects wrapping the data. E.g. in LineString, rather than points being a Python list, let it be a C array of Coordinate structs. Only construct the Point class for __getitem__ or other methods that expose it to the outside.

3) You're using def functions all over the place; consider using more cdef (or cpdef) functions.

> (b) Is it possible to speed up the in- and output of text and binary
> formats (here a lot of python functions are still used, but I can't seem
> to find examples of how to do text/binary stream processing with
> Cython)...?

See above, especially (3). If one's writing to a file, one can access the C FILE* pointer and operate on that directly. I notice you keep converting back and forth between strings and streams--this has got to be expensive. I had to write something that is very similar to what you're doing (but in 3d) and the fastest way I found was to output a (possibly) nested list of strings, which are then joined at the very end. See

http://www.sagemath.org/hg/sage-main/file/a66354d13708/sage/plot/plot3d/index_face_set.pyx

specifically [tachyon | obj | jmol]_repr().
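The "nested list of strings, joined at the very end" idea can be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: the names `flatten`, `point_repr`, `linestring_repr` and `to_text` are hypothetical and are not taken from the Sage sources linked above, and the output format is a simplified WKT-like string.

```python
def flatten(nested, out=None):
    """Collect the strings of a possibly nested list, in order."""
    if out is None:
        out = []
    for item in nested:
        if isinstance(item, str):
            out.append(item)
        else:
            flatten(item, out)  # descend into a nested list of fragments
    return out


def point_repr(x, y):
    # Hypothetical helper: one point as a list of small string fragments.
    return [repr(x), " ", repr(y)]


def linestring_repr(points):
    # Hypothetical helper: returns a nested list, never an intermediate string.
    parts = ["LINESTRING("]
    for i, (x, y) in enumerate(points):
        if i:
            parts.append(", ")
        parts.append(point_repr(x, y))
    parts.append(")")
    return parts


def to_text(parts):
    # The only place where strings are actually concatenated.
    return "".join(flatten(parts))


wkt = to_text(linestring_repr([(0, 0), (1, 2)]))
```

The point of the design is that each object only appends fragments to a list; the single `"".join(...)` at the end avoids building and copying many intermediate strings.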
The resulting nested list is passed to an extremely optimized "flatten_list" command at the end of

http://www.sagemath.org/hg/sage-main/file/a66354d13708/sage/plot/plot3d/base.pyx

Also relevant is

http://www.sagemath.org/hg/sage-main/file/a66354d13708/sage/plot/plot3d/point_c.pxi

(Note, the code doesn't need to be nearly as tightly written, or to use the Python/C API directly, to take advantage of the ideas illustrated.)

There have been several requests on this streaming/fast IO, but no examples of using buffers/stringio in Cython directly, so I hope the above is useful to lots of people.

- Robert

> Thanks very much for your time and advice in advance!
>
> Kind regards,
>
> Martijn Meijers
> Delft University of Technology, The Netherlands
> OTB, Section GIS-technology
> ...

From martin at martincmartin.com Mon Jan 28 03:08:33 2008
From: martin at martincmartin.com (Martin C. Martin)
Date: Sun, 27 Jan 2008 21:08:33 -0500
Subject: [Cython] Documentation link broken
Message-ID: <479D3921.9030006@martincmartin.com>

Hi,

On www.cython.org, the first link under "Documentation," to the Official Pyrex Language Overview, is broken.

Best,
Martin

From robertwb at math.washington.edu Mon Jan 28 19:01:56 2008
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 28 Jan 2008 10:01:56 -0800
Subject: [Cython] Documentation link broken
In-Reply-To: <479D3921.9030006@martincmartin.com>
References: <479D3921.9030006@martincmartin.com>
Message-ID: <9FC12853-9BC7-4E38-82EA-8002FBE9B195@math.washington.edu>

Thanks for the note--looks like he renamed the file. I just fixed it.

- Robert

On Jan 27, 2008, at 6:08 PM, Martin C. Martin wrote:

> Hi,
>
> On www.cython.org, the first link under "Documentation," to the
> Official Pyrex Language Overview, is broken.
> > Best, > Martin > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From stefan_ml at behnel.de Tue Jan 29 08:59:58 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 29 Jan 2008 08:59:58 +0100 Subject: [Cython] [Pyrex] [Cython-dev] cdef private class ... ? In-Reply-To: <479E85B3.1000505@canterbury.ac.nz> References: <478659C8.404@behnel.de> <4788D731.8010909@behnel.de> <479E85B3.1000505@canterbury.ac.nz> Message-ID: <479EDCFE.9070500@behnel.de> Hi Greg, Greg Ewing wrote: > Stefan Behnel wrote: >> I looked through the code to see how this would work. Apparently, all >> cdef >> types and cdef functions in Pyrex/Cython *are* "private" by default >> (entry.visibility). But only classes are exported to Python space. >> How's that for consistency! > > The public/private distinction isn't about visibility to Python, > it's about visibility to external C code. > > Visibility to Python is determined by what kind of thing it is. > C types and cdef functions only exist in the C world and are > therefore not visible to Python. Def functions and extension types > are Python objects, so they are visible to Python. But the difference is that there is a way to make functions local (by switching from 'def' to 'cdef'), but there isn't a way to make types local. I just find it a bit of a bad design that 'cdef' changes the visibility in the first case but not in the second. Basically, 'cdef' types behave like being 'cpdef'-ed, rather than 'cdef'-ed, while functions have a clean distinction here. > It would be better to find another word for this distinction, > such as 'hidden' or 'internal', if it's really desirable -- and > I'm not yet convinced that it is. I would really like to have something like this. In large modules, this helps in uncluttering the module namespace. 
There really are cases where a user of the module has no interest in being bothered with internal implementation details of the 'module type space'. Maybe we're back to the "del MyType" proposal then, which just removes the name from the current scope. I think that would be the least intrusive, after all, but also has an unclear interaction with Pyrex's global scoping rules. Stefan From robertwb at math.washington.edu Wed Jan 30 09:51:02 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 30 Jan 2008 00:51:02 -0800 Subject: [Cython] [Pyrex] [Cython-dev] cdef private class ... ? In-Reply-To: <47A0298F.8090300@behnel.de> References: <478659C8.404@behnel.de> <4788D731.8010909@behnel.de> <479E85B3.1000505@canterbury.ac.nz> <479EDCFE.9070500@behnel.de> <479FA595.3040703@canterbury.ac.nz> <47A0298F.8090300@behnel.de> Message-ID: On Jan 29, 2008, at 11:38 PM, Stefan Behnel wrote: > Hi, > > Greg Ewing wrote: >> Stefan Behnel wrote: >>> I would really like to have something like this. >> >> The main thing that worries me is the cimport issue -- >> if you remove the module dict entry, then other Pyrex >> modules (and any other C-implemented modules, for that >> matter) are prevented from accessing the type as well. >> This applies regardless of whether it's done by a keyword >> or by del. > > Not quite, if you use a keyword (or modifer), you could check at > compile time > that the type does not appear in the associated .pxd, i.e. that it > cannot be > cimported. What you're saying is that you would disallow cimporting a "private" type? Then this might work... While we're on the topic of visibility, what would you think about making any cdef function declared in a pxd file "cimportable" (i.e. public api, or whatever the keyword combination is--I'm never quite sure). This is the only reason one would put a cdef function in a .pxd file. 
- Robert From robertwb at math.washington.edu Wed Jan 30 10:02:13 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 30 Jan 2008 01:02:13 -0800 Subject: [Cython] Fwd: Cython and C++ exceptions References: <85e81ba30801291823u2f9e16ddt165549fefa754b92@mail.gmail.com> Message-ID: <10032034-D81D-4D54-9A3F-C0089987888A@math.washington.edu> > ---------- Forwarded message ---------- > From: Felix Wu > Date: Jan 29, 2008 8:37 PM > Subject: Cython enhancement suggestion > To: wstein at gmail.com > > > Hi William, > > I have been playing with Cython for a couple of weeks by now, and I > think it is a great tool for exposing C++ APIs to Python (much leaner > than SWIG :). However, our existing C++ API uses C++ exceptions a lot > and it would be a real pain to write many shims just for handling > exceptions. So I kind of hacked Cython compiler (0.9.6.11b) to handle > C++ exceptions directly via the following approach: > > (1) Extend the C (extern) function declaration syntax for the error > value with a fourth form: "except +" > (2) If a C++ function is declared as "R f(A a, B b) except +", the > code generated for its call looks like the following: > try { > __tmp = f(x, y); > } catch(...) { // catch all exceptions > CppExn2PyErr(); > ... set up error file/lineno then goto error_exit > } > (3) CppExn2PyErr() is a user provided function/macro that should > (re-throw and) catch the exception and convert it to proper Python > exception (e.g. via PyErr_SetString). > > I just did minimal hack to make it work for our purpose, which I don't > think it's enough to be included as the standard feature. But > something along this line might be very useful in supporting C++ > better. I'm wondering if you folks have already considered adding C++ > exception handling natively to Cython, if not, would this approach > make any sense for your consideration? 
> > Thanks a lot, > - Felix Wu > > > > -- > William Stein > Associate Professor of Mathematics > University of Washington > http://wstein.org > > > > -- > William Stein > Associate Professor of Mathematics > University of Washington > http://wstein.org From robertwb at math.washington.edu Wed Jan 30 10:22:59 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 30 Jan 2008 01:22:59 -0800 Subject: [Cython] [Pyrex] Subclassing a non-GC type In-Reply-To: <479EDC85.8060608@telus.net> References: <47855404.2010903@behnel.de> <4788887E.2090401@behnel.de> <4789001A.90106@telus.net> <479E779A.9020901@canterbury.ac.nz> <479EDC85.8060608@telus.net> Message-ID: <5524F92F-87B0-4154-BEA0-682C04E4397F@math.washington.edu> On Jan 28, 2008, at 11:57 PM, Lenard Lindstrom wrote: > Greg Ewing wrote: >> Lenard Lindstrom wrote: >> >> >>> Looking at the inherit_special function in typeobject.c I see that >>> PyType_Ready promotes tp_traverse, tp_clear and HAVE_GC. >>> >> >> In that case, it should be sufficient to just omit tp_traverse, >> tp_clear and HAVE_GC on any type that doesn't have Python >> valued C attributes. >> >> I think the reason I didn't do that initially was that >> PyType_Ready didn't do all the right things back then, and >> not fully understanding what was going on, I didn't want >> to be too clever. >> >> > PyType_Ready does plenty of clever things like fill tp slots when > special methods are found and add special methods when tp slots are > filled. It handles all the inheritance requirements to make an > extension > type look like a new-style class. The code now is a bit more clever than that. Say one has cdef class A: cdef object a cdef class B(A): pass cdef class C(B): pass cdef class D(C): cdef object d The tp_new/clear/traverse/dealloc slots for D need to be implemented to handle the member d, and recursively call up to handle A.a. 
But since B and C don't have any attributes, it calls directly to A's slots, and what's even better is that if A and D are in the same module, it will call by function name (rather than the pointer in the type object) which facilitates inlining by the C compiler (as these are all tiny functions). - Robert From stefan_ml at behnel.de Wed Jan 30 10:59:46 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 30 Jan 2008 10:59:46 +0100 Subject: [Cython] [Pyrex] [Cython-dev] cdef private class ... ? In-Reply-To: References: <478659C8.404@behnel.de> <4788D731.8010909@behnel.de> <479E85B3.1000505@canterbury.ac.nz> <479EDCFE.9070500@behnel.de> <479FA595.3040703@canterbury.ac.nz> <47A0298F.8090300@behnel.de> Message-ID: <47A04A92.1070806@behnel.de> Hi, Robert Bradshaw wrote: > On Jan 29, 2008, at 11:38 PM, Stefan Behnel wrote: >> Greg Ewing wrote: >>> Stefan Behnel wrote: >>>> I would really like to have something like this. >>> >>> The main thing that worries me is the cimport issue -- >>> if you remove the module dict entry, then other Pyrex >>> modules (and any other C-implemented modules, for that >>> matter) are prevented from accessing the type as well. >>> This applies regardless of whether it's done by a keyword >>> or by del. >> >> Not quite, if you use a keyword (or modifer), you could check at >> compile time >> that the type does not appear in the associated .pxd, i.e. that it >> cannot be >> cimported. > > What you're saying is that you would disallow cimporting a "private" > type? Then this might work... Actually, you could still allow that for the C-API, which could add public types to the "_pyx_capi" dict (which currently only contains C function pointers), and then have external modules import them from there instead of the module dict. But then, a 'public private' type looks kind of scary. Maybe a 'private api' type would match this use case... Anyway, I'd be fine with making 'private' types really internal to the module, including "no cimports". 
> While we're on the topic of visibility, what would you think about > making any cdef function declared in a pxd file "cimportable" (i.e. > public api, or whatever the keyword combination is--I'm never quite > sure). This is the only reason one would put a cdef function in a .pxd > file. I'll leave this question to Greg, I'm not familiar enough with the current mechanisms here. Stefan From robertwb at math.washington.edu Wed Jan 30 11:06:08 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 30 Jan 2008 02:06:08 -0800 Subject: [Cython] [Pyrex] [Cython-dev] cdef private class ... ? In-Reply-To: <47A04A92.1070806@behnel.de> References: <478659C8.404@behnel.de> <4788D731.8010909@behnel.de> <479E85B3.1000505@canterbury.ac.nz> <479EDCFE.9070500@behnel.de> <479FA595.3040703@canterbury.ac.nz> <47A0298F.8090300@behnel.de> <47A04A92.1070806@behnel.de> Message-ID: <506F02A3-760B-4C81-BAFF-94C44347979A@math.washington.edu> On Jan 30, 2008, at 1:59 AM, Stefan Behnel wrote: > Hi, > > Robert Bradshaw wrote: >> On Jan 29, 2008, at 11:38 PM, Stefan Behnel wrote: >>> Greg Ewing wrote: >>>> Stefan Behnel wrote: >>>>> I would really like to have something like this. >>>> >>>> The main thing that worries me is the cimport issue -- >>>> if you remove the module dict entry, then other Pyrex >>>> modules (and any other C-implemented modules, for that >>>> matter) are prevented from accessing the type as well. >>>> This applies regardless of whether it's done by a keyword >>>> or by del. >>> >>> Not quite, if you use a keyword (or modifer), you could check at >>> compile time >>> that the type does not appear in the associated .pxd, i.e. that it >>> cannot be >>> cimported. >> >> What you're saying is that you would disallow cimporting a "private" >> type? Then this might work... 
> > Actually, you could still allow that for the C-API, which could add > public > types to the "_pyx_capi" dict (which currently only contains C > function > pointers), and then have external modules import them from there > instead of > the module dict. But then, a 'public private' type looks kind of > scary. Maybe > a 'private api' type would match this use case... > > Anyway, I'd be fine with making 'private' types really internal to > the module, > including "no cimports". That'd certainly be much simpler. I don't think I'd ever personally use the functionality either way though. >> While we're on the topic of visibility, what would you think about >> making any cdef function declared in a pxd file "cimportable" (i.e. >> public api, or whatever the keyword combination is--I'm never quite >> sure). This is the only reason one would put a cdef function in >> a .pxd >> file. > > I'll leave this question to Greg, I'm not familiar enough with the > current > mechanisms here. I know how to do it (that's easy enough), it's a question of whether or not you think it's a good idea. I think it is. - Robert From martin at martincmartin.com Wed Jan 30 16:53:20 2008 From: martin at martincmartin.com (Martin C. Martin) Date: Wed, 30 Jan 2008 10:53:20 -0500 Subject: [Cython] Typing of Python objects? Message-ID: <47A09D70.900@martincmartin.com> Thanks a lot for Pyrex and Cython. They're going to save us a ton of work in speeding up the performance critical parts of our app. Are there any plans to add typing and type inference to Python objects in Cython? For example, it would be great if the following function could call PyDict_SetItemString: cdef foo(char *string, mydict): mydict[string] = 5 Of course, we'd need some way to indicate that mydict is a dict. 
WingIDE uses this:

    cdef foo(char *string, mydict):
        isinstance(mydict, dict)
        mydict[string] = 5

But in a pinch, this would do:

    cdef foo(char *string, mydict):
        if isinstance(mydict, dict):
            mydict[string] = 5

Or

    cdef foo(char *string, mydict):
        assert isinstance(mydict, dict)
        mydict[string] = 5

For now, I guess I need to do a cdef extern and call PyDict_SetItemString explicitly?

Thanks a lot,
Martin

From stefan_ml at behnel.de Wed Jan 30 17:26:00 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 30 Jan 2008 17:26:00 +0100
Subject: [Cython] Typing of Python objects?
In-Reply-To: <47A09D70.900@martincmartin.com>
References: <47A09D70.900@martincmartin.com>
Message-ID: <47A0A518.8030104@behnel.de>

Hi,

Martin C. Martin wrote:
> Are there any plans to add typing and type inference to Python objects
> in Cython? For example, it would be great if the following function
> could call PyDict_SetItemString:
>
> cdef foo(char *string, mydict):
>     mydict[string] = 5

Note that SetItemString() is not as fast as it might look, as it creates a Python string internally.

> Of course, we'd need some way to indicate that mydict is a dict.
> WingIDE uses this:
>
> cdef foo(char *string, mydict):
>     isinstance(mydict, dict)
>     mydict[string] = 5
>
> But in a pinch, this would do:
>
> cdef foo(char *string, mydict):
>     if isinstance(mydict, dict):
>         mydict[string] = 5
>
> Or
>
> cdef foo(char *string, mydict):
>     assert isinstance(mydict, dict)
>     mydict[string] = 5

None of these is enough. You might just as well have a subclass of a dictionary, in which case calling PyDict_SetItem() might not do the right thing. Cython could, however, add this optimisation based on "PyDict_CheckExact()" at runtime - although that would break compatibility with Python 2.3... (ok, minus #ifdef's)

I could imagine adding checks for exact types like this: for lists and dicts on item assignments, and additionally for tuples on item access. Cython already does things like these in a couple of places.
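A small pure-Python sketch may help show why an isinstance check is not sufficient. It is only an analogy under stated assumptions: the class `LoggingDict` and the function `set_item` are hypothetical, `dict.__setitem__` stands in for a direct PyDict_SetItem() call, and `type(d) is dict` plays the role of PyDict_CheckExact().

```python
class LoggingDict(dict):
    """Hypothetical dict subclass that records every key assigned to it."""

    def __init__(self):
        super().__init__()
        self.log = []

    def __setitem__(self, key, value):
        self.log.append(key)
        super().__setitem__(key, value)


def set_item(d, key, value):
    if type(d) is dict:
        # Exact dict: safe to take the direct route.
        dict.__setitem__(d, key, value)
    else:
        # Subclass or other mapping: use the generic protocol so that
        # an overridden __setitem__ is honoured.
        d[key] = value


plain = {}
sub = LoggingDict()
set_item(plain, "a", 5)
set_item(sub, "a", 5)

# What an isinstance-only check would allow: the subclass passes the
# check, but the direct call silently skips its __setitem__ override.
bypassed = LoggingDict()
assert isinstance(bypassed, dict)
dict.__setitem__(bypassed, "a", 5)
```

After running this, `sub.log` contains the key while `bypassed.log` stays empty, even though both dictionaries hold the same item.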
However, it can become expensive if we add too many conditional branches into simple instructions, both in terms of time (CPU pipeline and branch prediction) and space (code size, CPU cache). I do not know how this compares to the performance of the generic calls. > For now, I guess I need to do a cdef extern and call > PyDict_SetItemString explicitly? Yes, this is how it currently works (and how it definitely works best). Stefan From robertwb at math.washington.edu Wed Jan 30 21:40:04 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 30 Jan 2008 12:40:04 -0800 Subject: [Cython] Typing of Python objects? In-Reply-To: <47A0A518.8030104@behnel.de> References: <47A09D70.900@martincmartin.com> <47A0A518.8030104@behnel.de> Message-ID: We have added runtime type inference for some cases (e.g. tuple indexing) where the specialized function is significantly faster (for example a macro) compared to a generic call. However, too much of this can actually slow things down. What we definitely would like to provide in the future are special types for dict, list, tuple (and maybe some others), so one could write cdef foo(x, dict mydict): mydict[x] = 5 which would use the PyDict_SetItem method directly. This would raise a TypeError if mydict was not exactly a python dictionary object. - Robert On Jan 30, 2008, at 8:26 AM, Stefan Behnel wrote: > Hi, > > Martin C. Martin wrote: >> Are there any plans to add typing and type inference to Python >> objects >> in Cython? For example, it would be great if the following function >> could call PyDict_SetItemString: >> >> cdef foo(char *string, mydict): >> mydict[string] = 5 > > Note that SetItemString() is not as fast as it might look, as it > creates a > Python string internally. > > >> Of course, we'd need some way to indicate that mydict is a dict. 
>> WingIDE uses this: >> >> cdef foo(char *string, mydict): >> isinstance(mydict, dict) >> mydict[string] = 5 >> >> But in a pinch, this would do: >> >> cdef foo(char *string, mydict): >> if isinstance(mydict, dict): >> mydict[string] = 5 >> >> Or >> >> cdef foo(char *string, mydict): >> assert isinstance(mydict, dict) >> mydict[string] = 5 > > None of these is enough. You might just as well have a subclass of a > dictionary, in which case calling PyDict_SetItem() might not do the > right > thing. Cython could, however, add this optimisation based on > "PyDict_CheckExact()" at runtime - although that would break > compatibility to > Python 2.3... (ok, minus #ifdef's) > > I could imagine adding checks for exact types like this: for lists > and dicts > on item assignments, and additionally for tuples on item access. > Cython > already does things like these in a couple of places. However, it > can become > expensive if we add too many conditional branches into simple > instructions, > both in terms of time (CPU pipeline and branch prediction) and > space (code > size, CPU cache). I do not know how this compares to the > performance of the > generic calls. > > >> For now, I guess I need to do a cdef extern and call >> PyDict_SetItemString explicitly? > > Yes, this is how it currently works (and how it definitely works > best). 
> > Stefan > > _______________________________________________ > Cython-dev mailing list > Cython-dev at codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev From robertwb at math.washington.edu Thu Jan 31 01:31:59 2008 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 30 Jan 2008 16:31:59 -0800 Subject: [Cython] [Pyrex] Subclassing a non-GC type In-Reply-To: <47A10CD8.3030505@canterbury.ac.nz> References: <47855404.2010903@behnel.de> <4788887E.2090401@behnel.de> <4789001A.90106@telus.net> <479E779A.9020901@canterbury.ac.nz> <479EDC85.8060608@telus.net> <5524F92F-87B0-4154-BEA0-682C04E4397F@math.washington.edu> <47A10CD8.3030505@canterbury.ac.nz> Message-ID: <944DF382-77B4-4442-9E99-48092BBDF7D1@math.washington.edu> On Jan 30, 2008, at 3:48 PM, Greg Ewing wrote: > Robert Bradshaw wrote: >> if A and D are in the same >> module, it will call by function name (rather than the pointer in the >> type object) which facilitates inlining by the C compiler (as these >> are all tiny functions). > > I'm not convinced that such inlining would produce enough > of a performance gain to be worth the extra complexity in > the Pyrex compiler. The last thing it wants at the moment > is any more complexity than it really needs. For ephemeral objects deep down in the inheritance tree (for example, our integers in Sage) it does, but you're right that for many cases there would be virtually no gain. - Robert