[lxml-dev] [objectify] optimization issues
Holger Joukl
Holger.Joukl at LBBW.de
Thu Oct 5 17:23:24 CEST 2006
Hi,
I'm currently running into some optimization issues. Be warned, this
post is rather lengthy...
First some background:
I'm experimenting with a custom objectified datetime class based on Python's
datetime that employs the dateutil.parser module to detect if some element
value is in a valid datetime format, i.e. the parse function from
dateutil.parser is used to implement the type_check for the PyType type
registry.
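For illustration, a type check of this kind could look roughly like the sketch below. It is not the actual class: datetime.strptime with a fixed list of formats stands in for dateutil.parser.parse so that the snippet needs only the standard library, and check_datetime and KNOWN_FORMATS are hypothetical names. The function raises ValueError when the text does not fit the type.

```python
from datetime import datetime

# Hypothetical stand-in for dateutil.parser.parse: a short list of accepted formats.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def check_datetime(text):
    """Return a datetime if text matches a known format, else raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError("not a datetime: %r" % (text,))
```

A function like this would be handed to the PyType registry as the type_check; every call walks the candidate formats, which is why calling it often gets expensive.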
1)
Invoking this parse function is quite expensive, so I want it to happen
rarely. As I am using "recursive element dumping" by default, I found that
every __str__ call accesses .pyval of all ObjectifiedDataElements in the
tree, which in turn triggers parsing for my custom datetime class.
As I don't really see a way to avoid this, I propose the introduction of
an additional property "_pyval_repr" that can be overridden in subclasses,
which makes it possible to simply return element.text if getting .pyval
is expensive. Something like:
*** ORIG/lxml-1.1/src/lxml/objectify.pyx Wed Sep 27 09:18:30 2006
--- src/lxml/objectify.pyx Wed Oct 4 11:00:09 2006
***************
*** 484,489 ****
--- 484,493 ----
          def __get__(self):
              return textOf(self._c_node)
+     property _pyval_repr:
+         def __get__(self):
+             return self.pyval
+
      def __str__(self):
          return textOf(self._c_node) or ''
***************
*** 931,938 ****
  cdef object _dump(_Element element, int indent):
      indentstr = " " * indent
!     if hasattr(element, "pyval"):
!         value = element.pyval
      else:
          value = textOf(element._c_node)
      if value and not value.strip():
--- 935,942 ----
  cdef object _dump(_Element element, int indent):
      indentstr = " " * indent
!     if hasattr(element, "_pyval_repr"):
!         value = element._pyval_repr
      else:
          value = textOf(element._c_node)
      if value and not value.strip():
This can speed things up substantially for complicated type_check
routines (in my use case :)
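To show how a subclass would use the proposed hook, here is a plain-Python sketch. DataElement, DateTimeElement and dump_value are hypothetical stand-ins for ObjectifiedDataElement, my custom class and the patched check in _dump; nothing here is lxml code, and strptime again replaces the dateutil parser.

```python
from datetime import datetime


class DataElement(object):
    """Plain-Python stand-in for ObjectifiedDataElement (not the lxml class)."""

    def __init__(self, text):
        self.text = text

    @property
    def pyval(self):
        return self.text

    @property
    def _pyval_repr(self):
        # Default from the patch: the dump shows whatever pyval returns.
        return self.pyval


class DateTimeElement(DataElement):
    parse_calls = 0  # counts how often the expensive path runs

    @property
    def pyval(self):
        DateTimeElement.parse_calls += 1
        return datetime.strptime(self.text, "%Y-%m-%d")

    @property
    def _pyval_repr(self):
        # Override: skip parsing and show the raw text in the dump.
        return self.text


def dump_value(element):
    """Mirrors the patched check in _dump for a single element."""
    if hasattr(element, "_pyval_repr"):
        return element._pyval_repr
    return element.text
```

With the override in place, dumping a DateTimeElement never touches pyval, so parse_calls stays at zero until the value is really needed.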
2)
Then I tried to reduce the calls to ObjectifiedElement.__str__ in general.
I am using a custom logging module with a function that converts its input
arguments to strings, concatenates them and then writes them out through
the logger (which substitutes stdout) if the loglevel of the caller meets
the loglevel set for the output file/stdout.
As the conversion to strings is performed before any loglevel checking,
reversing this order leads to far fewer str() calls on the objects. To my
astonishment, though, things actually slowed down massively.
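The reordering can be sketched as follows. This is not my logging module: log_convert_first, log_check_first, Counted and the numeric levels are hypothetical names used only to show the difference in the number of str() calls.

```python
DEBUG, INFO = 10, 20  # illustrative level values


class Counted(object):
    """Object that records how often it is converted to a string."""

    def __init__(self, value):
        self.value = value
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return str(self.value)


def log_convert_first(out, threshold, level, *args):
    # Original order: build the message, then decide whether to write it.
    message = "".join([str(a) for a in args])
    if level >= threshold:
        out.append(message)


def log_check_first(out, threshold, level, *args):
    # Reversed order: suppressed messages never trigger str() on the arguments.
    if level < threshold:
        return
    out.append("".join([str(a) for a in args]))
```

For a message below the threshold, log_convert_first still calls __str__ on every argument, while log_check_first returns before any conversion.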
I tried to come up with a minimal example of what seems to happen, using
only standard lxml:
Runs slow:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root.i
print root.f
print root.s
print root.d
""" "n = root.i; n = root.f; n = root.s; n = root.d"
17
238.3343
what
2006-03-03
10 loops -> 0.0102 secs
17
238.3343
what
2006-03-03
100 loops -> 0.101 secs
17
238.3343
what
2006-03-03
1000 loops -> 1.02 secs
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
raw times: 1.03 1.02 1.02
1000 loops, best of 3: 1.02 msec per loop
Runs fast:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root
""" "n = root.i; n = root.f; n = root.s; n = root.d"
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
10 loops -> 0.00109 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
100 loops -> 0.00928 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
1000 loops -> 0.0897 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
10000 loops -> 0.905 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
raw times: 0.893 0.911 0.911
10000 loops, best of 3: 89.3 usec per loop
Recursively outputting root before accessing its child elements
really speeds things up, even though I accessed all elements in
the slow example, too.
Why is this? I'm clueless.
Holger