[lxml-dev] special string subclasses for XPath string results

Stefan Behnel stefan_ml at behnel.de
Tue Jan 8 12:45:33 CET 2008


Hi again,

Stefan Behnel wrote:
> How do you instantiate a
> custom unicode subclass from a UTF-8 char*? You can't use the normal C-API
> functions, so I guess you'd have to instantiate a normal unicode object, then
> determine its length, and then build the custom subclass for the result length
> and copy the string over. That's ugly and it would certainly slow things down.

Ok, I looked at the Python source and found that this is partially special
cased already. All that is left to do is decode a unicode object from the
char* and instantiate the subclass with it. The copying will be done
internally. So here are some performance numbers for Py2.5.1.

At first sight, this looks like we have a clear winner:

$ python -m timeit -s 'unicode("testtest")'
10000000 loops, best of 3: 0.0464 usec per loop
$ python -m timeit -s 'class t(unicode): pass' 't("testtest")'
1000000 loops, best of 3: 1.67 usec per loop

Now, a little more on instantiating a subclass and copying the string:

$ python -m timeit -s 'class t(unicode): pass' -s 's=unicode("test" * 20)' 't(s)'
1000000 loops, best of 3: 0.794 usec per loop
$ python -m timeit -s 'class t(unicode): pass' -s 's=unicode("test" * 200)' 't(s)'
1000000 loops, best of 3: 1.09 usec per loop
$ python -m timeit -s 'class t(unicode): pass' -s 's=unicode("test" * 2000)' 't(s)'
100000 loops, best of 3: 6.22 usec per loop

Same for str:

$ python -m timeit -s 'class t(str): pass' -s 's="test" * 200' 't(s)'
1000000 loops, best of 3: 1.27 usec per loop
$ python -m timeit -s 'class t(str): pass' -s 's="test" * 2000' 't(s)'
100000 loops, best of 3: 7.03 usec per loop

Funnily enough, this is actually slower than unicode on my machine. As the
following numbers show, however, the task at hand is clearly dominated by
decoding:

$ python -m timeit -s 'class t(unicode): pass' -s 's="test" * 200' 't(unicode(s, "utf-8"))'
100000 loops, best of 3: 5.23 usec per loop
$ python -m timeit -s 'class t(unicode): pass' -s 's="test" * 2000' 't(unicode(s, "utf-8"))'
10000 loops, best of 3: 41.2 usec per loop

Decoding by itself:

$ python -m timeit -s 's="test" * 2000' 'unicode(s, "utf-8")'
10000 loops, best of 3: 34.9 usec per loop

Even going straight through the C-API doesn't help much - 'decode' is a little
test module written in Cython for that purpose:

$ python -m timeit -s 'from decode import decode' -s 's="test" * 2000' 'decode(s)'
10000 loops, best of 3: 34.4 usec per loop
$ python -m timeit -s 'class t(unicode): pass' -s 'from decode import decode' -s 's="test" * 2000' 't(decode(s))'
10000 loops, best of 3: 40.7 usec per loop
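
For anyone who wants to repeat the decode-versus-wrap comparison from a
script instead of the shell, something along these lines should do. It is
only a sketch and assumes Python 3, where str plays the role that unicode
plays in the Py2.5 runs above and the UTF-8 input is a bytes object. The
class name T and the helper bench() are made up for the example, and the
timings will of course differ from the ones quoted here.

```python
import timeit


class T(str):
    """Trivial string subclass, standing in for 'class t(unicode): pass'."""


# UTF-8 encoded input, same shape as the '"test" * 2000' payload above.
data = b"test" * 2000


def bench(stmt, number=1000):
    """Return the best-of-3 time per loop in microseconds for stmt."""
    runs = timeit.repeat(
        stmt,
        globals={"T": T, "data": data},
        repeat=3,
        number=number,
    )
    return min(runs) / number * 1e6


# Decoding alone versus decoding plus building the subclass instance.
decode_only = bench('data.decode("utf-8")')
decode_and_wrap = bench('T(data.decode("utf-8"))')

print("decode only:     %.3f usec per loop" % decode_only)
print("decode and wrap: %.3f usec per loop" % decode_and_wrap)
```

The difference between the two figures is the cost that the subclass adds
on top of decoding.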

We also shouldn't forget that we are talking microseconds here, so,
performance-wise, there is no reason why we shouldn't use a subclass,
especially after having just given the XPath engine a run.
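
To make the idea concrete, here is a rough pure-Python sketch of the two
steps described earlier: decode the UTF-8 data into a normal string, then
instantiate the subclass from it and let Python do the copying. It assumes
Python 3 (str instead of unicode). The names XPathString, make_result and
the parent attribute are invented for illustration and say nothing about
what the eventual lxml implementation will look like.

```python
class XPathString(str):
    """Hypothetical string subclass for XPath string results.

    Unlike plain str, instances of a subclass have a __dict__, so
    extra information can be attached to them.
    """


def make_result(utf8_bytes, parent=None):
    # Step 1: decode the UTF-8 data into an ordinary string.
    text = utf8_bytes.decode("utf-8")
    # Step 2: build the subclass instance; the copy happens internally.
    result = XPathString(text)
    # Hypothetical extra payload that a plain str could not carry.
    result.parent = parent
    return result


r = make_result(b"testtest", parent="some-element")
print(type(r).__name__, r, r.parent)
```

The result still compares and hashes like a plain string, so code that
expects normal strings should keep working.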

I'll give it a try. Maybe this can even still go in for 2.0.

Stefan
