<DIV>"Stefan Behnel" &lt;stefan_ml@behnel.de&gt;<BR>><BR>>qhlonline wrote:<BR>>> Hi all, I am parsing HTML files with the lxml target parser. When I<BR>>> reach a certain HTML tag, how can I find out the current position in<BR>>> the HTML document I am parsing?<BR>><BR>>These are two different requirements. Do you really need the line/character<BR>>information here? Isn't the structural position enough?<BR>></DIV>
<DIV>I need to know the actual parsing position when the target parser encounters certain special tags. Does 'structural position' mean line and column information, like that in a parse error report? I don't think that helps in computing the length of the stream parsed so far. In the libxml2 source file SAX2.c there is a callback interface (charactersSAXFunc) for the characters event:</DIV>
<DIV> hdlr->characters = xmlSAX2Characters </DIV>
<DIV>The event handler has a 'len' parameter, which I take to be the length of the HTML stream parsed so far. I also noticed that the lxml source file Saxparser.pxi contains this function definition:</DIV>
<DIV> cdef void _handleSaxData(void* ctxt, char* c_data, int data_len) with gil:</DIV>
<DIV>It acts as the handler for the SAX characters event. How can I change the lxml source code of the target parser so that it handles the SAX characters event together with the 'data_len' parameter? I do not mean the default 'data' method of the target parser, of course: it has no parameter like 'data_len', and its 'data' argument is only the text inside an element, not the whole parsed string.<BR>>> Are there any callbacks in the target parser<BR>>> that can tell me the total stream length I have parsed?<BR>><BR>>Not that I know of. Same as in ElementTree, I'd say.<BR>><BR>>Stefan<BR></DIV>
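<DIV>For illustration only, here is a minimal sketch of one possible workaround that needs no change to the lxml source: feed the document to the target parser in small chunks and count the bytes yourself. The names OffsetTarget, find_offsets, make_parser and chunk_size are hypothetical and not part of lxml. The recorded number is only an upper bound on the tag's position (the number of bytes handed to the parser when the start event fired), because the parser may buffer input before reporting an event. If lxml is not installed, the sketch falls back to xml.etree.ElementTree, whose XMLParser accepts the same kind of target object but requires well-formed XML.</DIV>

```python
try:
    from lxml import etree  # the library discussed in this thread

    def make_parser(target):
        # HTML parser that reports events to the given target object
        return etree.HTMLParser(target=target)
except ImportError:
    import xml.etree.ElementTree as etree  # fallback, XML input only

    def make_parser(target):
        return etree.XMLParser(target=target)


class OffsetTarget:
    """Hypothetical target: notes how many bytes were fed when a tag started."""

    def __init__(self, watched_tags):
        self.watched_tags = set(watched_tags)
        self.bytes_fed = 0  # maintained by the feeding loop, not by the parser
        self.hits = []      # (tag, bytes fed when the start event fired)

    def start(self, tag, attrib):
        if tag in self.watched_tags:
            self.hits.append((tag, self.bytes_fed))

    def end(self, tag):
        pass

    def data(self, text):
        pass

    def close(self):
        return self.hits


def find_offsets(document, watched_tags, chunk_size=4):
    """Feed `document` (bytes) in chunks; return (tag, upper-bound offset) pairs."""
    target = OffsetTarget(watched_tags)
    parser = make_parser(target)
    for i in range(0, len(document), chunk_size):
        chunk = document[i:i + chunk_size]
        target.bytes_fed += len(chunk)  # count first so callbacks see the total
        parser.feed(chunk)
    return parser.close()  # returns whatever target.close() returns


if __name__ == "__main__":
    doc = b"<html><body><p>Hello</p><b>text</b></body></html>"
    for tag, offset in find_offsets(doc, ["b"]):
        print(tag, "reported after", offset, "of", len(doc), "bytes were fed")
```

<DIV>A smaller chunk_size gives a tighter bound at the cost of more feed() calls. This does not expose the 'data_len' value from the SAX characters callback; it only approximates the stream position from the outside.</DIV>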