<DIV><BR>2009-07-20, "Stefan Behnel" &lt;stefan_ml@behnel.de&gt;:<BR>><BR>>qhlonline wrote:<BR>>> I have tried to alter the libxml2 source to add a callback telling the <BR>>> current position when an element parsed.<BR>><BR>>Note that something that requires patching libxml2 will not make it into an<BR>>lxml release.<BR>><BR>>As you noted before, the parser context already provides this information<BR>>at any time, not only when parsing elements. So adding a callback for it is<BR>>not a sensible approach.<BR>><BR>>I'm not even sure what this position means exactly. Is it (1) the byte<BR>>position in the original (undecoded) data stream, (2) the byte position in<BR>>the UTF-8 encoded parse stream, or (3) the character position in the XML<BR>>stream?<BR>></DIV>
<DIV>My change is in the 'htmlParseStartTag' function in the HTMLparser.c source file, so I think the position is probably the one in the UTF-8 stream.</DIV>
<DIV> </DIV>
<DIV>>According to the libxml2 docs:<BR>><BR>>        long nbChars : number of xmlChar processed<BR>><BR>>This sounds like it's the second information. That would not be useful and<BR>>shouldn't get exposed in lxml's API as it's rather error prone to rely on<BR>>it: works for ASCII and UTF-8, obviously, may work for some other encodings<BR>>depending on the data, but fails for most other streams. OTOH, the first<BR>>and the third information /might/ be of interest, depending on your use<BR>>case, but are not easily recovered from the information that the parser<BR>>provides.</DIV>
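<DIV>To make the three kinds of position discussed above concrete, here is a minimal sketch in plain Python. It does not use libxml2 or lxml at all; the sample text and the choice of UTF-16-LE and Latin-1 as "original" encodings are only assumptions for illustration. It shows how the same location gets a different number depending on whether you count characters, UTF-8 bytes, or bytes of the original stream.</DIV>
<PRE>
```python
# Hypothetical sample data, not taken from any real document.
text = "na\u00efve caf\u00e9 tag"
target = "tag"

# (3) character position in the decoded text
char_pos = text.index(target)

# (2) byte position in the UTF-8 encoded parse stream
utf8_pos = len(text[:char_pos].encode("utf-8"))

# (1) byte position in the original stream, assuming it was UTF-16-LE
utf16_pos = len(text[:char_pos].encode("utf-16-le"))

# (1) again, assuming the original stream was Latin-1 instead
latin1_pos = len(text[:char_pos].encode("latin-1"))

print(char_pos, utf8_pos, utf16_pos, latin1_pos)
```
</PRE>
<DIV>For pure ASCII input all four numbers would be equal, which is why a UTF-8 byte count can look correct in simple tests and still be off for other input encodings.</DIV>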
<DIV>This position may not be precise once the input has been converted from another encoding to UTF-8, but I think it is good enough for our needs, according to my team leader's requirements.<BR>><BR>>> I have nerver compile cython source before. Can any body give me some<BR>>> suggestion?<BR>><BR>>If you just change lxml's sources, running setup.py will build it just as<BR>>before. All you need to do is install Cython 0.11 or later.<BR>></DIV>
<DIV>Thank you for your help! <BR>>http://codespeak.net/lxml/build.html<BR>><BR>>Stefan<BR></DIV>