[Cython] Profiling Cython

Stefan Behnel stefan_ml at behnel.de
Sun Jun 8 14:59:28 CEST 2008


Hi,

I did a tiny bit of profiling on Cython compiling lxml.etree. Here are the
numbers I get:

"""
         3127018 function calls (2777951 primitive calls) in 25.128 CPU seconds

   Ordered by: internal time, call count
   List reduced from 1035 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   119144    4.579    0.000    4.671    0.000 Scanners.py:148(run_machine_inlined)
    18316    1.586    0.000    1.586    0.000 codecs.py:371(read)
   119144    1.074    0.000    5.746    0.000 Scanners.py:109(scan_a_token)
    88362    1.023    0.000    7.976    0.000 Scanners.py:88(read)
    29680    0.673    0.000    1.659    0.000 Code.py:103(mark_pos)
       77    0.588    0.008    0.588    0.008 posixpath.py:168(exists)
    88362    0.538    0.000    8.514    0.000 Scanning.py:397(next)
23318/2985    0.495    0.000    0.517    0.000 Nodes.py:155(end_pos)
    70446    0.442    0.000    0.591    0.000 Code.py:62(put)
90657/10562    0.351    0.000    4.549    0.000 Parsing.py:59(p_binop_expr)
[...]
"""

So the major headache here is Scanners.py in Plex. The method at the top-rank
is a huge function. According to the comments, it's the result of inlining a
couple of method calls that originally lead to slow code, and it looks heavily
profiled already.

Assuming that further optimisation attempts were rather futile, I just
compiled the module with Cython. The first (obvious) result is that the
internal calls disappear from the profile log, as they are now internal C
calls. The call that remains is Scanner.next(), which originally took an
accumulated 8.5 seconds. In the compiled version, it's down to just over 5
seconds, that's more than 40 percent faster.

"""
         2595627 function calls (2246560 primitive calls) in 18.681 CPU seconds

   Ordered by: internal time, call count
   List reduced from 1028 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    88362    4.246    0.000    5.041    0.000 Scanning.py:397(next)
    29680    0.673    0.000    1.632    0.000 Code.py:103(mark_pos)
    70446    0.439    0.000    0.586    0.000 Code.py:62(put)
90657/10562    0.335    0.000    3.228    0.000 Parsing.py:59(p_binop_expr)
23318/2985    0.316    0.000    0.338    0.000 Nodes.py:155(end_pos)
    65791    0.295    0.000    0.903    0.000 Code.py:47(putln)
    29677    0.289    0.000    0.915    0.000 Code.py:93(file_contents)
     4724    0.287    0.000    0.292    0.000 Symtab.py:532(allocate_temp)
    88070    0.247    0.000    0.247    0.000 ExprNodes.py:192(subexpr_nodes)
    52071    0.232    0.000    0.232    0.000 Nodes.py:82(__init__)
[...]
"""

In total, I get an improvement of 12% in compilation time. That makes me think
that it's actually worth putting the compilation into Cython's own setup.py,
and installing the compiled Scanning module next to the Python one (Python
prefers C extensions on import). Here's a patch, what do you think?

Stefan


-------------- next part --------------
A non-text attachment was scrubbed...
Name: compile-scanner-on-install.patch
Type: text/x-patch
Size: 1744 bytes
Desc: not available
Url : http://codespeak.net/pipermail/cython-dev/attachments/20080608/c251a481/attachment.bin 


More information about the Cython-dev mailing list