[lxml-dev] objectify parses 't' and 'f' as BoolElement?
Stefan Behnel
stefan_ml at behnel.de
Wed Jun 17 18:00:45 CEST 2009
jholg at gmx.de wrote:
> I noticed some optimization wrt using _cstr() on the text and then
> comparing the resulting C string in the 2.0alpha I previously used.
>
> Curious about what _cstr() does, I checked: it is a shortcut to
> PyString_AS_STRING.
>
> Python docs say:
> char* PyString_AS_STRING(PyObject *string)
> Macro form of PyString_AsString but without error checking. Only string
> objects are supported; no Unicode objects should be passed.
>
> That basically means I *cannot* use _cstr() and C string comparison for
> __parseBoolAsInt(), as I would run into havoc with unicode, right?
Right. _cstr() is only used as an optimisation when we know we have a plain
byte string. It has quite a bit less overhead than the couple of checks that
a call to PyString_AsString() performs, plus there's no error handling code
generated by Cython.
The bool parsing code in lxml.objectify isn't very fast. Making it faster
requires some special casing and a call to utf8() - that may be doable
without too much code bloat. Not sure it's worth it, though. There's a lot
more overhead in other places of lxml.objectify (I'm not saying it's slow,
just that the convenient API is bought with some overhead).
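
To illustrate the "special casing plus utf8()" idea, here is a rough sketch in
plain Python rather than Cython. It is not the code in lxml.objectify: the
names to_utf8() and parse_bool_as_int() are made up for this example, and the
accepted literals ('true', 'false', '1', '0') are an assumption based on the
XML Schema boolean lexical forms. The point is only that the input is
normalised to a byte string first, so the comparison never sees unicode.

```python
# Hypothetical sketch, not lxml's actual implementation.
# Assumed literals: the XML Schema boolean lexical forms.
_BOOL_LITERALS = {b"true": 1, b"1": 1, b"false": 0, b"0": 0}


def to_utf8(text):
    """Return text as a UTF-8 byte string (stand-in for an internal utf8())."""
    if isinstance(text, bytes):
        return text  # special case: already a plain byte string
    if isinstance(text, str):
        return text.encode("utf-8")
    raise TypeError("expected bytes or str, got %s" % type(text).__name__)


def parse_bool_as_int(text):
    """Return 1 for true, 0 for false, -1 if text is not a boolean literal."""
    if text is None:
        return -1
    # Byte-level lookup is safe here because the input was normalised first.
    return _BOOL_LITERALS.get(to_utf8(text), -1)
```

With this sketch, parse_bool_as_int("true") and parse_bool_as_int(b"true")
both give 1, while 't' and 'f' are rejected with -1.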
Stefan