<DIV><BR>2009-07-20,"Stefan Behnel" <A href="mailto:stefan_ml@behnel.de">stefan_ml@behnel.de</A>:<BR>><BR>>qhlonline wrote:<BR>>> I have tried to alter the libxml2 source to add a callback telling the <BR>>> current position when an element is parsed.<BR>><BR>>Note that something that requires patching libxml2 will not make it into an<BR>>lxml release.<BR>><BR>>As you noted before, the parser context already provides this information<BR>>at any time, not only when parsing elements. So adding a callback for it is<BR>>not a sensible approach.<BR>><BR>>I'm not even sure what this position means exactly. Is it (1) the byte<BR>>position in the original (undecoded) data stream, (2) the byte position in<BR>>the UTF-8 encoded parse stream, or (3) the character position in the XML<BR>>stream?<BR>><BR>>According to the libxml2 docs:<BR>><BR>>        long nbChars : number of xmlChar processed<BR>><BR>>This sounds like it's the second information. That would not be useful and<BR>>shouldn't get exposed in lxml's API as it's rather error prone to rely on<BR>>it: works for ASCII and UTF-8, obviously, may work for some other encodings<BR>>depending on the data, but fails for most other streams. OTOH, the first<BR>>and the third information /might/ be of interest, depending on your use<BR>>case, but are not easily recovered from the information that the parser<BR>>provides.<BR>><BR>><BR>>> I have never compiled Cython source before. Can anybody give me some<BR>>> suggestions?<BR>><BR>>If you just change lxml's sources, running setup.py will build it just as<BR>>before. All you need to do is install Cython 0.11 or later.<BR>><BR>>http://codespeak.net/lxml/build.html<BR>><BR>>Stefan<BR></DIV>
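<DIV>To make the three candidate meanings of "position" concrete, here is a minimal pure-Python sketch. It does not call libxml2 or lxml at all. The helper name candidate_positions, the sample document and the choice of GBK as the source encoding are assumptions made only for illustration. It simply locates the same marker in the original bytes, in the UTF-8 bytes and in the decoded text:</DIV>

```python
def candidate_positions(text, marker, source_encoding):
    """Return (raw_byte_pos, utf8_byte_pos, char_pos) of marker in text.

    Hypothetical helper for illustration only; it never touches the parser.
    """
    raw = text.encode(source_encoding)   # (1) original, undecoded stream
    utf8 = text.encode("utf-8")          # (2) UTF-8 encoded parse stream
    return (
        raw.index(marker.encode(source_encoding)),
        utf8.index(marker.encode("utf-8")),
        text.index(marker),              # (3) character position
    )


doc = "<r>中文<a/></r>"
raw_pos, utf8_pos, char_pos = candidate_positions(doc, "<a/>", "gbk")
print(raw_pos, utf8_pos, char_pos)  # 7 9 5
```

<DIV>For a pure ASCII document all three numbers coincide, which is why relying on any single one of them can appear to work until non-ASCII data shows up.</DIV>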
<DIV>Now the key problem for me is that I don't know whether, and how, I can change lxml's target parser definition so that it supports my new callback in libxml2. I suspect that the TreeBuilder class in saxparser.pxi is the base class of the target parser, because it supports methods like 'start', 'end', 'close' and 'data', just like a target parser. But I am not sure: judging from its name, this class seems to be used to build an ElementTree or DOM for the parsed HTML document and to have no relationship with the target parser. Am I looking in the wrong place?</DIV><br><!-- footer --><br><span title="neteasefooter"/><hr/>
</span>
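<DIV>For reference, the target interface mentioned above ('start', 'end', 'data', 'close') can be sketched as follows. To stay self-contained, this sketch uses the standard library's xml.etree.ElementTree.XMLParser rather than lxml, whose parsers accept a target object of the same shape. The class name EventRecorder and the helper parse_with_target are hypothetical names. The sketch does not show how lxml's saxparser.pxi dispatches to the target internally, and it does not add any position callback:</DIV>

```python
import xml.etree.ElementTree as ET


class EventRecorder:
    """Hypothetical parser target that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        # Called for each opening tag.
        self.events.append(("start", tag))

    def end(self, tag):
        # Called for each closing tag.
        self.events.append(("end", tag))

    def data(self, data):
        # Called for text content; text may arrive in several chunks.
        self.events.append(("data", data))

    def close(self):
        # The return value becomes the return value of parser.close().
        return self.events


def parse_with_target(xml_text):
    parser = ET.XMLParser(target=EventRecorder())
    parser.feed(xml_text)
    return parser.close()


events = parse_with_target("<root><a>hi</a><b/></root>")
for event in events:
    print(event)
```

<DIV>In the standard library, TreeBuilder is simply the default target that turns these same callbacks into an element tree, so a class exposing these four methods is not necessarily a base class for user-supplied targets.</DIV>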