[lxml-dev] Writing TargetParser in Cython

Stefan Behnel stefan_ml at behnel.de
Thu Oct 2 21:26:48 CEST 2008


Hi,

Max Ivanov wrote:
> I'm trying to write TargetParser in Cython just to compare performance.
> The problem is with data types. If I define data method as "def
> data(self, char *data):" I'm unable to use it as TargetParser. I get
> " def data(self, char *data):
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-4: ordinal not in range(128)"  error.

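That's because the data() method receives a unicode string as input, which is
not compatible with a char*. The implicit conversion has to encode the text
first, and as the traceback shows, it does so with the 'ascii' codec, which
fails on any non-ASCII character.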
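
The failure can be reproduced in plain Python. This is only a sketch, assuming
the implicit conversion amounts to encoding the text with the ASCII codec, as
the error message above suggests. The sample text is made up.

```python
# Hypothetical five-character, non-ASCII text of the kind a parser
# target might receive in its data() method.
text = u"\u043f\u0440\u0438\u0432\u0435"

try:
    # Encoding with the ASCII codec fails for characters above 127.
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Choosing an explicit encoding works and yields a byte string.
encoded = text.encode("utf-8")

print(type(error).__name__)
```

So a byte-level view of the text is only available after picking an encoding
explicitly; nothing does that automatically for a char* argument.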


>     def data(self, char *data):
>         self._data.append(data)

This is actually very inefficient. Cython will generate code here that
retrieves the char* from the Python input string and then creates a new Python
string from it to pass it into the .append() method. Leaving the argument
untyped, so that it stays a plain Python object, avoids both conversions.
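
A minimal sketch of a target that keeps the argument as a Python object. The
class name CollectingTarget is hypothetical, and the example uses the standard
library's xml.etree.ElementTree.XMLParser as a stand-in, on the assumption that
its target interface (data(), close()) matches the one lxml uses. The same
method body should also compile unchanged in Cython.

```python
from xml.etree.ElementTree import XMLParser


class CollectingTarget(object):
    """Hypothetical parser target that gathers all character data."""

    def __init__(self):
        self._data = []

    def data(self, data):
        # 'data' is left untyped, so it stays a Python string object.
        # Appending it stores a reference; nothing is encoded or copied.
        self._data.append(data)

    def close(self):
        # Join the collected chunks once, at the end of parsing.
        return u"".join(self._data)


parser = XMLParser(target=CollectingTarget())
parser.feed(u"<root>caf\u00e9 <b>au lait</b></root>")
result = parser.close()  # returns whatever the target's close() returns
```

The parser may deliver the text in several chunks, which is why the target
collects them in a list and joins them only in close().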

lxml uses a C interface internally, but AFAIR, it's not exposed at the C API
level. Check the sources in parser.pxi and parsertarget.pxi.

Stefan

