[Cython] Profiling Cython
Robert Bradshaw
robertwb at math.washington.edu
Thu Jun 12 02:06:32 CEST 2008
On Jun 11, 2008, at 4:51 PM, Ondrej Certik wrote:
> On Sun, Jun 8, 2008 at 2:59 PM, Stefan Behnel <stefan_ml at behnel.de>
> wrote:
>> Hi,
>>
>> I did a tiny bit of profiling on Cython compiling lxml.etree. Here
>> are the
>> numbers I get:
>>
>> """
>> 3127018 function calls (2777951 primitive calls) in 25.128
>> CPU seconds
>>
>> Ordered by: internal time, call count
>> List reduced from 1035 to 20 due to restriction <20>
>>
>> ncalls tottime percall cumtime percall filename:lineno
>> (function)
>> 119144 4.579 0.000 4.671 0.000 Scanners.py:148
>> (run_machine_inlined)
>> 18316 1.586 0.000 1.586 0.000 codecs.py:371(read)
>> 119144 1.074 0.000 5.746 0.000 Scanners.py:109
>> (scan_a_token)
>> 88362 1.023 0.000 7.976 0.000 Scanners.py:88(read)
>> 29680 0.673 0.000 1.659 0.000 Code.py:103(mark_pos)
>> 77 0.588 0.008 0.588 0.008 posixpath.py:168(exists)
>> 88362 0.538 0.000 8.514 0.000 Scanning.py:397(next)
>> 23318/2985 0.495 0.000 0.517 0.000 Nodes.py:155(end_pos)
>> 70446 0.442 0.000 0.591 0.000 Code.py:62(put)
>> 90657/10562 0.351 0.000 4.549 0.000 Parsing.py:59
>> (p_binop_expr)
>> [...]
>> """
>>
>> So the major headache here is Scanners.py in Plex. The method at
>> the top-rank
>> is a huge function. According to the comments, it's the result of
>> inlining a
>> couple of method calls that originally lead to slow code, and it
>> looks heavily
>> profiled already.
>>
>> Assuming that further optimisation attempts were rather futile, I
>> just
>> compiled the module with Cython. The first (obvious) result is
>> that the
>> internal calls disappear from the profile log, as they are now
>> internal C
>> calls. The call that remains is Scanner.next(), which originally
>> took an
>> accumulated 8.5 seconds. In the compiled version, it's down to
>> just over 5
>> seconds, that's more than 40 percent faster.
>>
>> """
>> 2595627 function calls (2246560 primitive calls) in 18.681
>> CPU seconds
>>
>> Ordered by: internal time, call count
>> List reduced from 1028 to 20 due to restriction <20>
>>
>> ncalls tottime percall cumtime percall filename:lineno
>> (function)
>> 88362 4.246 0.000 5.041 0.000 Scanning.py:397(next)
>> 29680 0.673 0.000 1.632 0.000 Code.py:103(mark_pos)
>> 70446 0.439 0.000 0.586 0.000 Code.py:62(put)
>> 90657/10562 0.335 0.000 3.228 0.000 Parsing.py:59
>> (p_binop_expr)
>> 23318/2985 0.316 0.000 0.338 0.000 Nodes.py:155(end_pos)
>> 65791 0.295 0.000 0.903 0.000 Code.py:47(putln)
>> 29677 0.289 0.000 0.915 0.000 Code.py:93
>> (file_contents)
>> 4724 0.287 0.000 0.292 0.000 Symtab.py:532
>> (allocate_temp)
>> 88070 0.247 0.000 0.247 0.000 ExprNodes.py:192
>> (subexpr_nodes)
>> 52071 0.232 0.000 0.232 0.000 Nodes.py:82(__init__)
>> [...]
>> """
>>
>> In total, I get an improvement of 12% in compilation time. That
>> makes me think
>> that it's actually worth putting the compilation into Cython's own
>> setup.py,
>> and installing the compiled Scanning module next to the Python one
>> (Python
>> prefers C extensions on import). Here's a patch, what do you think?
I was looking at this patch a bit earlier and have to say this is
very neat.
> Maybe it'd be cool if one could instruct cython using comments, or
> strings, so that one has one file, that works in regular python, and
> you can use it to compile itself, but more optimized, than your patch
> does. Otherwise, how can you improve the cythonized code (not having
> to maintain the pure python and cython codes side by side)?
Hiding semantic information in comments and strings is something we
want to avoid. However we have discussed doing this via decorators
(which wouldn't do anything in Python, but would act as compiler
hints to Cython). Functoin annotations ( http://www.python.org/dev/
peps/pep-3107/ ) and type inference could make a huge difference here
as well.
- Robert
More information about the Cython-dev
mailing list