[lxml-dev] Writing TargetParser in Cython
Stefan Behnel
stefan_ml at behnel.de
Thu Oct 2 21:26:48 CEST 2008
Hi,
Max Ivanov wrote:
> I'm trying to write TargetParser in Cython just to compare performance.
> The problem is with data types. If I define data method as "def
> data(self, char *data):" I'm unable to use it as TargetParser. I get
> " def data(self, char *data):
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-4: ordinal not in range(128)" error.
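That's because the parser passes a unicode string to data(), which is not
compatible with a char*. The conversion to char* tries to encode the string as
ASCII and fails as soon as it contains non-ASCII characters.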
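To illustrate the failure outside of Cython, here is a minimal sketch in plain
Python. It assumes that the char* coercion amounts to an implicit ASCII
encoding of the unicode object, which is what the error message suggests. The
sample string is made up for illustration.

```python
# Hypothetical sample input: five non-ASCII (Cyrillic) characters.
text = u"\u0432\u0435\u0441\u043d\u0430"

# Encoding to ASCII fails, just like the implicit conversion to char*.
try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# An explicit encoding that can represent the characters works fine.
encoded = text.encode("utf-8")

print(error)
print(len(encoded))
```

So if you really need bytes, you have to encode the string explicitly with a
suitable encoding instead of relying on the automatic conversion.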
> def data(self, char *data):
> self._data.append(data)
This is also very inefficient: Cython will generate code here that retrieves
the char* from the Python input string and then creates a new Python string
from it, just to pass it into the .append() method.
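A minimal sketch of a target that avoids both problems by leaving the data
argument untyped, so the Python string is appended as is. The class name
CollectingTarget is made up for illustration, and the stdlib
xml.etree.ElementTree parser stands in for lxml here so that the example runs
without lxml installed. It accepts a target object in the same way. The code
is plain Python, which Cython can compile unchanged.

```python
from xml.etree.ElementTree import XMLParser


class CollectingTarget(object):
    """Hypothetical parser target that collects all text content."""

    def __init__(self):
        self._data = []

    def start(self, tag, attrib):
        pass  # element start events are ignored in this sketch

    def end(self, tag):
        pass  # element end events are ignored in this sketch

    def data(self, data):
        # 'data' stays a Python object: no char* conversion, no copy.
        self._data.append(data)

    def close(self):
        # The parser's close() returns whatever the target returns here.
        return u"".join(self._data)


parser = XMLParser(target=CollectingTarget())
parser.feed(u"<root>caf\u00e9 <b>au lait</b></root>")
text = parser.close()
print(text)
```

The text may arrive in several data() calls, which is why the chunks are
joined only in close().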
lxml uses a C interface internally, but AFAIR, it's not exposed at the C API
level. Check the sources in parser.pxi and parsertarget.pxi.
Stefan