[lxml-dev] [objectify] optimization issues

Holger Joukl Holger.Joukl at LBBW.de
Fri Oct 6 17:24:24 CEST 2006


Hi,
as a follow-up to my last post, here are some more strange observations.
I tried to find out why calling str(root), i.e. objectify.dump(root),
speeds things up:

python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
objectify.dump(root)
""" "n = root.i; n = root.f; n = root.s; n = root.d"
10 loops -> 0.000898 secs
100 loops -> 0.00887 secs
1000 loops -> 0.0885 secs
10000 loops -> 0.887 secs
raw times: 0.893 0.899 0.903
10000 loops, best of 3: 89.3 usec per loop
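The -v output lists the raw time of each of the 3 repeats; the headline figure is the fastest repeat divided by the loop count (0.893 s / 10000 = 89.3 usec). A minimal sketch of that arithmetic and of the equivalent in-process measurement with the standard timeit module; the helper names best_per_loop and measure are hypothetical, and repeat=3 assumes the old command-line default:

```python
import timeit


def best_per_loop(raw_times, loops):
    """Return the best (minimum) time per loop, in seconds.

    timeit reports min(raw_times) / loops because the fastest repeat is
    the one least disturbed by other activity on the machine.
    """
    return min(raw_times) / loops


def measure(stmt, setup="pass", loops=10000, repeat=3):
    """Time stmt like `python -m timeit -s setup stmt` with a fixed loop count."""
    timer = timeit.Timer(stmt, setup)
    raw = timer.repeat(repeat=repeat, number=loops)
    return raw, best_per_loop(raw, loops)


# Raw times copied from the run above.
print("%.1f usec" % (best_per_loop([0.893, 0.899, 0.903], 10000) * 1e6))
```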

I implemented a visit function that does
nothing more than visit every node:

def visit(_Element element not None):
    """Recursively visit every node below an element; return nothing.
    """
    _visit(element)

cdef object _visit(_Element element):
    for child in element.iterchildren():
        _visit(child)
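For comparison, the same depth-first walk can be sketched in plain Python. This uses the standard library's xml.etree.ElementTree as a stand-in (an assumption: it is not lxml, so it cannot reproduce the objectify timings); iterating an element yields its direct children, which plays the role of iterchildren(). Unlike the Pyrex version, this sketch returns a node count only so the walk can be checked:

```python
import xml.etree.ElementTree as ET


def visit(element):
    """Walk every descendant of element and return how many nodes were seen."""
    return _visit(element)


def _visit(element):
    count = 1  # count this element itself
    for child in element:  # direct children, like lxml's iterchildren()
        count += _visit(child)
    return count


# Build a tree shaped like the objectify example: root plus four children.
root = ET.Element("root")
for tag, text in [("i", "17"), ("f", "238.3343"), ("s", "what"), ("d", "2006-03-03")]:
    ET.SubElement(root, tag).text = text

print(visit(root))  # 5: root plus its four children
```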

But:

/apps/pydev/gcc/3.4.4/bin/python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
objectify.visit(root)
""" "n = root.i; n = root.f; n = root.s; n = root.d"
10 loops -> 0.0104 secs
100 loops -> 0.103 secs
1000 loops -> 1.04 secs
raw times: 1.04 1.02 1.03
1000 loops, best of 3: 1.02 msec per loop

This is much slower again: 1.02 msec per loop instead of 89.3 usec,
roughly eleven times slower.

Now if I change the visit code to:

def visit(_Element element not None):
    """Recursively visit every node below an element; return nothing.
    """
    _visit(element)

cdef object _visit(_Element element):
    element.items() # my only addition
    for child in element.iterchildren():
        _visit(child)
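One way to rule out a testing artefact is to time several setup variants side by side in one process, changing only the pre-walk and keeping the timed statement identical. The sketch below does that with timeit.Timer. The tree is built with xml.etree.ElementTree as a stand-in for the objectify tree, root.find() stands in for objectify attribute access, and the names SETUP, VARIANTS, STMT and compare are hypothetical. ElementTree has none of objectify's lookup machinery, so the variants are not expected to differ the way the runs above did; the harness only illustrates the method:

```python
import timeit

# Shared setup: a small tree shaped like the objectify example (stand-in only).
SETUP = """
import xml.etree.ElementTree as ET
root = ET.Element('root')
for tag, text in [('i', '17'), ('f', '238.3343'), ('s', 'what'), ('d', '2006-03-03')]:
    ET.SubElement(root, tag).text = text

def walk(element, touch_items):
    if touch_items:
        element.items()  # the only difference between the two walk variants
    for child in element:
        walk(child, touch_items)
"""

# Each variant appends one pre-walk line to the shared setup.
VARIANTS = {
    "no walk": "",
    "walk": "walk(root, False)",
    "walk + items()": "walk(root, True)",
}

# The timed statement is the same for every variant.
STMT = "n = root.find('i'); n = root.find('f'); n = root.find('s'); n = root.find('d')"


def compare(loops=10000, repeat=3):
    """Return the best time per loop (seconds) for each setup variant."""
    results = {}
    for name, extra in VARIANTS.items():
        timer = timeit.Timer(STMT, SETUP + extra)
        results[name] = min(timer.repeat(repeat=repeat, number=loops)) / loops
    return results


if __name__ == "__main__":
    for name, best in compare().items():
        print("%-15s %.2f usec per loop" % (name, best * 1e6))
```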


Now it's fast again:

python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
objectify.visit(root)
""" "n = root.i; n = root.f; n = root.s; n = root.d"
10 loops -> 0.000887 secs
100 loops -> 0.0087 secs
1000 loops -> 0.088 secs
10000 loops -> 0.874 secs
raw times: 0.876 0.865 0.87
10000 loops, best of 3: 86.5 usec per loop

Does all of this really come down to the additional element.items() call?
I'm lost. I hope somebody can point out a serious misunderstanding of mine
or a systematic error in my testing, or come up with an actual
explanation :)

As I'm abroad next week, I'll follow up on this on the Tuesday after.

Greetings,
Holger
