[lxml-dev] [lxml][objectify] optimization of recursive object dumping

Holger Joukl Holger.Joukl at LBBW.de
Wed Oct 25 09:38:05 CEST 2006


Hi,

I posted this before but broke the threading, so here it is again.
(Note: the patch line numbers might differ, since this was based on the
1.1 branch as of two weeks ago, but I can of course update it and send a
new patch.)

First some background:
I'm experimenting with a custom objectified datetime class based on
Python's datetime that employs the dateutil.parser module to detect
whether an element value is in a valid datetime format, i.e. the parse
function from dateutil.parser is used to implement the type_check for
the PyType type registry.
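
To make the background concrete, here is a minimal sketch of such a
type check. It assumes the usual convention that a type_check callable
receives the element text and raises ValueError when the text does not
match. To stay self-contained it uses datetime.strptime with a few fixed
formats as a stand-in for dateutil.parser; the names parse_datetime,
check_datetime and _FORMATS are made up for illustration.

```python
from datetime import datetime

# Hypothetical set of accepted formats; dateutil.parser is far more lenient.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_datetime(text):
    """Return a datetime for text, or raise ValueError if no format fits."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError("not a datetime: %r" % (text,))


def check_datetime(text):
    """type_check-style callable: returns nothing, raises on a mismatch."""
    parse_datetime(text)
```

The point is that a successful check costs a full parse, which is why
calling it on every access adds up.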

1)
Invoking this parse function is quite expensive, so I want it to happen
rarely. Since I use "recursive element dumping" by default, I found that
every __str__ call accesses .pyval of each ObjectifiedDataElement in the
tree, which in turn triggers parsing for my custom datetime class.

As I don't really see a way to avoid this, I propose introducing an
additional property "_pyval_repr" that can be overridden in subclasses.
This makes it possible to simply return element.text if getting .pyval
is expensive. Something like:

*** ORIG/lxml-1.1/src/lxml/objectify.pyx        Wed Sep 27 09:18:30 2006
--- src/lxml/objectify.pyx      Wed Oct  4 11:00:09 2006
***************
*** 484,489 ****
--- 484,493 ----
          def __get__(self):
              return textOf(self._c_node)

+     property _pyval_repr:
+         def __get__(self):
+             return self.pyval
+
      def __str__(self):
          return textOf(self._c_node) or ''

***************
*** 931,938 ****

  cdef object _dump(_Element element, int indent):
      indentstr = "    " * indent
!     if hasattr(element, "pyval"):
!         value = element.pyval
      else:
          value = textOf(element._c_node)
          if value and not value.strip():
--- 935,942 ----

  cdef object _dump(_Element element, int indent):
      indentstr = "    " * indent
!     if hasattr(element, "_pyval_repr"):
!         value = element._pyval_repr
      else:
          value = textOf(element._c_node)
          if value and not value.strip():

This can substantially speed things up for complicated type_check
routines (in my use case :)

Holger


