[lxml-dev] objectify parses 't' and 'f' as BoolElement?

Stefan Behnel stefan_ml at behnel.de
Wed Jun 17 18:00:45 CEST 2009


jholg at gmx.de wrote:
> I noticed some optimization wrt using _cstr() on the text and then
> comparing the resulting C string in the 2.0alpha I previously used.
> 
> Curious about what _cstr() does, I checked: it is a shortcut to
> PyString_AS_STRING.
> 
> Python docs say:
> char* PyString_AS_STRING(PyObject *string)
> Macro form of PyString_AsString but without error checking. Only string
> objects are supported; no Unicode objects should be passed.
> 
> That basically means I *cannot* use _cstr() and C string comparison for
> __parseBoolAsInt(), as I would run into havoc with unicode, right?

Right. _cstr() is only used as an optimisation when we know we have a plain
byte string. It's quite a bit less overhead than the couple of checks that
a call to PyString_AsString() does, plus there's no error handling code
generated by Cython.

The bool parsing code in lxml.objectify isn't very fast. Making it faster
requires some special casing and a call to utf8() - that may be doable
without too much code bloat. Not sure it's worth it, though. There's a lot
more overhead in other places of lxml.objectify (I'm not saying it's slow,
just that the convenient API is bought with some overhead).
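
To make the "special casing and a call to utf8()" idea concrete, here is a
rough pure-Python sketch of the shape such a function could take. It is not
the code in lxml.objectify: the name parse_bool_as_int, the -1 return value
for "not a boolean" and the accepted literals (the XML Schema boolean forms
"true", "false", "1", "0") are assumptions made for illustration. The point
is only that unicode input gets encoded to a plain byte string first, so that
the comparison itself always runs on bytes.

```python
# Hypothetical sketch, not lxml's actual implementation.
# Assumed literals: the XML Schema boolean lexical forms.
_TRUE_LITERALS = (b"true", b"1")
_FALSE_LITERALS = (b"false", b"0")


def parse_bool_as_int(text):
    """Return 1 or 0 for a boolean literal, -1 if text is not one."""
    if text is None:
        return -1
    if isinstance(text, str):
        # Special case for unicode: encode to UTF-8 so that everything
        # below only ever sees a plain byte string.
        try:
            text = text.encode("utf-8")
        except UnicodeEncodeError:
            # e.g. lone surrogates; certainly not a boolean literal
            return -1
    elif not isinstance(text, bytes):
        return -1
    # Byte-wise comparison, the part a C string compare would replace.
    if text in _TRUE_LITERALS:
        return 1
    if text in _FALSE_LITERALS:
        return 0
    return -1
```

With these assumed literals, 't' and 'f' fall through to -1. Whether they
should count as booleans is a separate decision from the bytes/unicode
handling shown here.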

Stefan

