[lxml-dev] [lxml][objectify] optimization questions
Holger Joukl
Holger.Joukl at LBBW.de
Mon Oct 23 13:37:28 CEST 2006
Hi,
sorry for the inconvenience; I have now moved this into a new thread.
I would have gotten back to this sooner, but I have been ill.
>Then: what you observe are most likely GC 'issues'. The thing is: if the
>element already exists as Python object, it is reused, which is much faster
>than creating a new one. So in the cases where your code runs faster, you can
>assume that the object survived a larger portion of your code without being
>re-instantiated.
I probably misunderstand how the reuse of elements works.
When I "visit" a node, like:
>>> from lxml import etree
>>> from lxml import objectify
>>> parser = etree.XMLParser(remove_blank_text=True)
>>> lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
>>> parser.setElementClassLookup(lookup)
>>> objectify.setDefaultParser(parser)
>>> objectify.enableRecursiveStr()
>>> root = objectify.Element('root')
>>> root.i = 17
>>> root.i
<Element i at 1e94b0>
>>>
the Python Element object for "i" is being created.
Will that Python Element be garbage-collected afterwards if I do not
explicitly delete "i" from the XML tree? I thought this element survived
in the element proxy.
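A toy model of the reuse described in the quote above may help frame the
question. It is only a sketch, not lxml's code: Node, Proxy and get_proxy are
made-up names, and the key assumption is that the tree holds nothing more than
a weak reference to the Python proxy, so the proxy lives only as long as some
Python code keeps a reference to it.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for a C-level tree node (not an lxml type)."""

    def __init__(self, tag):
        self.tag = tag
        self.children = {}
        self.proxy_ref = None  # weak reference to the current proxy, if any


class Proxy(object):
    """Hypothetical stand-in for a Python element proxy."""

    created = 0  # counts how many proxies have been instantiated

    def __init__(self, node):
        Proxy.created += 1
        self._node = node

    def child(self, tag):
        return get_proxy(self._node.children[tag])


def get_proxy(node):
    """Reuse the live proxy for node, or build a new one if it is gone."""
    proxy = node.proxy_ref() if node.proxy_ref is not None else None
    if proxy is None:
        proxy = Proxy(node)
        node.proxy_ref = weakref.ref(proxy)
    return proxy


def demo():
    root_node = Node('root')
    root_node.children['i'] = Node('i')
    root = get_proxy(root_node)

    # Nothing keeps the child proxy alive, so every access builds a new one.
    start = Proxy.created
    for _ in range(3):
        root.child('i')
        gc.collect()  # not needed on CPython, harmless elsewhere
    rebuilt = Proxy.created - start

    # Holding a reference keeps the proxy alive, so it is reused.
    keep = root.child('i')
    start = Proxy.created
    for _ in range(3):
        assert root.child('i') is keep
    built_while_held = Proxy.created - start

    return rebuilt, built_while_held


print(demo())
```

Under this assumption, a proxy that is visited once and then dropped would not
survive; whether lxml actually behaves this way is exactly what I am asking.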
>Especially recursive printing instantiates the entire tree, so if the objects
>are not deleted directly afterwards, this has a performance effect on code
>that runs afterwards.
I see, but why would "manual access" of the nodes not have the same effect?
Runs slow:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root.i
print root.f
print root.s
print root.d
""" "n = root.i; n = root.f; n = root.s; n = root.d"
17
238.3343
what
2006-03-03
10 loops -> 0.0102 secs
17
238.3343
what
2006-03-03
100 loops -> 0.101 secs
17
238.3343
what
2006-03-03
1000 loops -> 1.02 secs
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
raw times: 1.03 1.02 1.02
1000 loops, best of 3: 1.02 msec per loop
Runs fast:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root
""" "n = root.i; n = root.f; n = root.s; n = root.d"
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
10 loops -> 0.00109 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
100 loops -> 0.00928 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
1000 loops -> 0.0897 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
10000 loops -> 0.905 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
raw times: 0.893 0.911 0.911
10000 loops, best of 3: 89.3 usec per loop
Recursively outputting root before accessing its child elements
really speeds things up, even though I accessed all elements in
the slow example, too.
Why is this? I'm clueless.
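For anyone who wants to reproduce the comparison without the setup output
being repeated on every run, the same measurement can be driven from the
timeit module instead of the command line. This is a minimal sketch:
best_per_loop and compare are made-up helper names, and the two setup strings
below are stdlib-only placeholders, to be replaced by the two lxml setups from
above.

```python
import timeit


def best_per_loop(stmt, setup, number=1000, repeat=3):
    """Return the best per-loop time in seconds for stmt after running setup."""
    timer = timeit.Timer(stmt, setup)
    return min(timer.repeat(repeat, number)) / number


def compare(stmt, setups, number=1000, repeat=3):
    """Time the same stmt against several named setup snippets."""
    return dict((name, best_per_loop(stmt, setup, number, repeat))
                for name, setup in setups.items())


# Placeholder setups (assumption): swap in the "slow" and "fast" setups.
setups = {
    "fresh": "data = dict(i=17, f=238.3343, s='what', d='2006-03-03')",
    "touched": "data = dict(i=17, f=238.3343, s='what', d='2006-03-03'); sorted(data)",
}
stmt = "n = data['i']; n = data['f']; n = data['s']; n = data['d']"

results = compare(stmt, setups)
for name in sorted(results):
    print("%s: %.3f usec per loop" % (name, results[name] * 1e6))
```

Taking the minimum of the repeats matches the "best of 3" figure that the
command line prints.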
Holger