[lxml-dev] Writing TargetParser in Cython

Max Ivanov ivanov.maxim at gmail.com
Mon Sep 29 16:56:41 CEST 2008


Hi all!
I'm trying to write a TargetParser in Cython, just to compare performance.
The problem is with data types. If I define the data method as "def
data(self, char *data):", I'm unable to use the class as a parser
target. I get this error:

"    def data(self, char *data):
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-4: ordinal not in range(128)"

I can instantiate the class and call its data() and close() methods
directly, and everything works fine, but it refuses to work with lxml.
A small test case follows:

----- _target.pyx -----------
cdef class Target:
    cdef list _data

    def __init__(self):
        self._data = []

    def data(self, char *data):
        self._data.append(data)

    def close(self):
        return ''.join(self._data)
---- end of _target.pyx ------

---- test.py -------
# -*- encoding: utf-8 -*-

import lxml.html
from lxml import etree
from _target import Target

res = etree.HTML(u"<span>ABCD</span>",
                 parser=lxml.html.HTMLParser(target=Target()))

------- end of test.py ------
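
For comparison, a plain-Python target with the same interface can be
sketched as below. This is only an illustration: it assumes the parser
hands text to data() as unicode objects rather than C byte strings, and
the class name PyTarget is hypothetical. Because the argument carries no
C type, no implicit encoding step is involved when it is called.

```python
class PyTarget(object):
    """Plain-Python counterpart of the Cython Target (hypothetical name)."""

    def __init__(self):
        self._data = []

    def data(self, data):
        # The argument is untyped, so the text object is stored exactly
        # as received and is never coerced to a C string.
        self._data.append(data)

    def close(self):
        # Join the collected text chunks into a single string.
        return u''.join(self._data)


# Direct calls, in the same way the Cython class can be exercised by hand.
target = PyTarget()
target.data(u"ABCD")
target.data(u"\u0430\u0431\u0432")  # non-ASCII text is accepted as well
result = target.close()
```

Whether an untyped argument keeps enough of the speed advantage to make
the Cython version worthwhile would still need to be measured.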

