[Cython] Profiling Cython
Ondrej Certik
ondrej at certik.cz
Thu Jun 12 03:41:58 CEST 2008
On Thu, Jun 12, 2008 at 2:06 AM, Robert Bradshaw
<robertwb at math.washington.edu> wrote:
> On Jun 11, 2008, at 4:51 PM, Ondrej Certik wrote:
>
>> On Sun, Jun 8, 2008 at 2:59 PM, Stefan Behnel <stefan_ml at behnel.de>
>> wrote:
>>> Hi,
>>>
>>> I did a tiny bit of profiling on Cython compiling lxml.etree. Here
>>> are the
>>> numbers I get:
>>>
>>> """
>>> 3127018 function calls (2777951 primitive calls) in 25.128
>>> CPU seconds
>>>
>>> Ordered by: internal time, call count
>>> List reduced from 1035 to 20 due to restriction <20>
>>>
>>> ncalls tottime percall cumtime percall filename:lineno
>>> (function)
>>> 119144 4.579 0.000 4.671 0.000 Scanners.py:148
>>> (run_machine_inlined)
>>> 18316 1.586 0.000 1.586 0.000 codecs.py:371(read)
>>> 119144 1.074 0.000 5.746 0.000 Scanners.py:109
>>> (scan_a_token)
>>> 88362 1.023 0.000 7.976 0.000 Scanners.py:88(read)
>>> 29680 0.673 0.000 1.659 0.000 Code.py:103(mark_pos)
>>> 77 0.588 0.008 0.588 0.008 posixpath.py:168(exists)
>>> 88362 0.538 0.000 8.514 0.000 Scanning.py:397(next)
>>> 23318/2985 0.495 0.000 0.517 0.000 Nodes.py:155(end_pos)
>>> 70446 0.442 0.000 0.591 0.000 Code.py:62(put)
>>> 90657/10562 0.351 0.000 4.549 0.000 Parsing.py:59
>>> (p_binop_expr)
>>> [...]
>>> """
>>>
>>> So the major headache here is Scanners.py in Plex. The method at
>>> the top-rank
>>> is a huge function. According to the comments, it's the result of
>>> inlining a
>>> couple of method calls that originally lead to slow code, and it
>>> looks heavily
>>> profiled already.
>>>
>>> Assuming that further optimisation attempts were rather futile, I
>>> just
>>> compiled the module with Cython. The first (obvious) result is
>>> that the
>>> internal calls disappear from the profile log, as they are now
>>> internal C
>>> calls. The call that remains is Scanner.next(), which originally
>>> took an
>>> accumulated 8.5 seconds. In the compiled version, it's down to
>>> just over 5
>>> seconds, that's more than 40 percent faster.
>>>
>>> """
>>> 2595627 function calls (2246560 primitive calls) in 18.681
>>> CPU seconds
>>>
>>> Ordered by: internal time, call count
>>> List reduced from 1028 to 20 due to restriction <20>
>>>
>>> ncalls tottime percall cumtime percall filename:lineno
>>> (function)
>>> 88362 4.246 0.000 5.041 0.000 Scanning.py:397(next)
>>> 29680 0.673 0.000 1.632 0.000 Code.py:103(mark_pos)
>>> 70446 0.439 0.000 0.586 0.000 Code.py:62(put)
>>> 90657/10562 0.335 0.000 3.228 0.000 Parsing.py:59
>>> (p_binop_expr)
>>> 23318/2985 0.316 0.000 0.338 0.000 Nodes.py:155(end_pos)
>>> 65791 0.295 0.000 0.903 0.000 Code.py:47(putln)
>>> 29677 0.289 0.000 0.915 0.000 Code.py:93
>>> (file_contents)
>>> 4724 0.287 0.000 0.292 0.000 Symtab.py:532
>>> (allocate_temp)
>>> 88070 0.247 0.000 0.247 0.000 ExprNodes.py:192
>>> (subexpr_nodes)
>>> 52071 0.232 0.000 0.232 0.000 Nodes.py:82(__init__)
>>> [...]
>>> """
>>>
>>> In total, I get an improvement of 12% in compilation time. That
>>> makes me think
>>> that it's actually worth putting the compilation into Cython's own
>>> setup.py,
>>> and installing the compiled Scanning module next to the Python one
>>> (Python
>>> prefers C extensions on import). Here's a patch, what do you think?
>
> I was looking at this patch a bit earlier and have to say this is
> very neat.
>
>> Maybe it'd be cool if one could instruct cython using comments, or
>> strings, so that one has one file, that works in regular python, and
>> you can use it to compile itself, but more optimized, than your patch
>> does. Otherwise, how can you improve the cythonized code (not having
>> to maintain the pure python and cython codes side by side)?
>
>
> Hiding semantic information in comments and strings is something we
> want to avoid. However we have discussed doing this via decorators
> (which wouldn't do anything in Python, but would act as compiler
> hints to Cython). Functoin annotations ( http://www.python.org/dev/
> peps/pep-3107/ ) and type inference could make a huge difference here
> as well.
Yes, that's even a better solution.
Ondrej
More information about the Cython-dev
mailing list