[lxml-dev] [lxml][objectify] optimization of recursive object dumping
Stefan Behnel
behnel_ml at gkec.informatik.tu-darmstadt.de
Wed Oct 25 18:20:48 CEST 2006
Hi Holger,
Holger Joukl wrote:
> I'm experimenting with a custom objectified datetime class based on Python's
> datetime that employs the dateutil.parser module to detect if some element
> value is in a valid datetime format, i.e. the parse function from
> dateutil.parser is used to implement the type_check for the PyType type
> registry.
>
> Invoking this parse method is quite expensive, so I want this to happen
> rarely. As I am using "recursive element dumping" as default I found that
> for every __str__ call .pyval of the ObjectifiedDataElements in a tree is
> accessed, which in turn triggers parsing for my custom datetime class.
But that should only happen for normal text content (well, and dates). Numbers
should always be parsed first.
> As I don't really see a way to avoid this I propose the introduction of
> an additional property "_pyval_repr" that can be overridden in subclasses,
> which makes it possible to simply return element.text, if getting .pyval
> is expensive.
Hmmm, I don't really like the idea of adding a new Python method only to
optimise the debug output (which is what dump() is essentially meant for). I
understand that you use this as default, but I don't think many people will
rely on the performance of this function...
Have you considered switching from "dump() by default" to "implement __str__()
for all data types by hand"? There are not that many standard types...
On the other hand, what if we did something like this:
    cdef object _dump(_Element element, int indent):
        indentstr = " " * indent
        if isinstance(element, ObjectifiedDataElement):
            value = str(element)
        else:
            ...
Would that help?
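To illustrate the idea in plain Python: the sketch below is not lxml code. DataElement, DateTimeElement, TreeElement and dump() are hypothetical stand-ins, and strptime() stands in for the expensive dateutil parse. It only shows that if the dump goes through str(element), a subclass can override __str__() to return the raw text, so dumping never touches .pyval.

```python
from datetime import datetime


class DataElement:
    """Hypothetical stand-in for a data element; not the lxml class."""

    def __init__(self, tag, text):
        self.tag = tag
        self.text = text

    @property
    def pyval(self):
        return self.text

    def __str__(self):
        # Default behaviour: string form is derived from the Python value.
        return str(self.pyval)


class DateTimeElement(DataElement):
    """Element whose .pyval is expensive to compute."""

    parse_calls = 0  # counts how often the expensive parse ran

    @property
    def pyval(self):
        type(self).parse_calls += 1
        # strptime is only a placeholder for the costly parser call.
        return datetime.strptime(self.text, "%Y-%m-%d %H:%M:%S")

    def __str__(self):
        # Cheap override: return the raw text, skip parsing entirely.
        return self.text


class TreeElement:
    """Hypothetical stand-in for a non-data element with children."""

    def __init__(self, tag, children):
        self.tag = tag
        self.children = children


def dump(element, indent=0):
    """Recursively dump a tree, using str() for data elements."""
    indentstr = "    " * indent
    if isinstance(element, DataElement):
        return "%s%s = %s\n" % (indentstr, element.tag, str(element))
    result = "%s%s\n" % (indentstr, element.tag)
    for child in element.children:
        result += dump(child, indent + 1)
    return result


root = TreeElement("root", [
    DataElement("name", "example"),
    DateTimeElement("when", "2006-10-25 18:20:48"),
])
output = dump(root)
print(output)
print("parse calls during dump:", DateTimeElement.parse_calls)
```

With this arrangement the cost of parsing is only paid when .pyval is requested explicitly, not when the tree is dumped.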
Stefan