<DIV><BR>2009-07-17, "Stefan Behnel" &lt;stefan_ml@behnel.de&gt;:<BR>><BR>>qhlonline wrote:<BR>>> I need to know the actual parsing position when the target parser<BR>>> finds certain special tags.<BR>><BR>>Interesting requirement. I wonder who designs XML formats where you<BR>>need to know the stream position to read them. Do you actually mean the byte<BR>>position or the character position?</DIV>
<DIV>We are parsing HTML files. I agree that an HTML parser has no obligation to provide position information, but our team lead requires position information for target events. The position does not have to be exact, but it must be a character position, in the sense of how many bytes of the HTML file have been parsed so far.<BR>><BR>><BR>>> Does 'structural position' mean line and column information,<BR>>> like that in a parse error report?<BR>><BR>>No, with "structural position" I meant the position of the element within<BR>>the tree structure, such as the unique path from the root element to the<BR>>currently parsed element.<BR>><BR>><BR>>> In the libxml2 source file<BR>>> SAX2.c there is a callback interface (charactersSAXFunc) for the character event:<BR>>><BR>>> hdlr->characters = xmlSAX2Characters<BR>>><BR>>> The event handler has a 'len' parameter which gives the length of the HTML<BR>>> stream parsed so far, and I noticed that the lxml source file Saxparser.pxi has a<BR>>> function definition:<BR>>> <BR>>> cdef void _handleSaxData(void* ctxt, char* c_data, int data_len) with gil:<BR>>><BR>>> It works as the processor of the sax.character event. How can I change<BR>>> the lxml source code of the target parser to add sax.character event<BR>>> processing to it with the 'data_len' parameter?<BR>><BR>>You don't have to. Just take the string that you receive in .data(), encode<BR>>it as UTF-8, and take its len(). However, that doesn't help you with your<BR>>problem, as it is not the information you are looking for.<BR>></DIV>
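<DIV>The suggestion quoted above (encode what .data() receives as UTF-8 and take its len()) might be sketched roughly as follows. This is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in, since its parser target interface (start/end/data/close) has the same shape as lxml's, and the names ByteCountingTarget and text_byte_events are hypothetical. As the quoted reply warns, the counter covers text content only, not markup, so it is not a real stream offset.</DIV>

```python
import xml.etree.ElementTree as ET


class ByteCountingTarget:
    """Hypothetical parser target: sums the UTF-8 length of all text seen so far."""

    def __init__(self):
        self.text_bytes = 0   # UTF-8 bytes of text content only (markup is NOT counted)
        self.events = []      # (event, tag, text_bytes at that moment)

    def start(self, tag, attrib):
        self.events.append(("start", tag, self.text_bytes))

    def end(self, tag):
        self.events.append(("end", tag, self.text_bytes))

    def data(self, text):
        # The parser may deliver one text node in several pieces;
        # summing the encoded length of each piece gives the same total.
        self.text_bytes += len(text.encode("utf-8"))

    def close(self):
        # Whatever the target returns here is returned by parser.close().
        return self.events


def text_byte_events(markup):
    """Parse markup and return the recorded (event, tag, text_bytes) tuples."""
    parser = ET.XMLParser(target=ByteCountingTarget())
    parser.feed(markup)
    return parser.close()


events = text_byte_events("<p>caf\u00e9<b>ok</b></p>")
for event in events:
    print(event)
```

<DIV>In this sketch the count at the start of &lt;b&gt; is 5 rather than 4, because "é" takes two bytes in UTF-8. The tags themselves never reach .data(), which is why this number cannot stand in for the position within the file.</DIV>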
<DIV>Yes, I have tested this against the libxml2 library directly. I defined my own function matching the argument list of libxml2's character event handler. The result shows that its 'len' argument only gives the length of the text string between tags, so it does not solve my problem. Worse, I found that its "ctxt" parameter, whose type is "xmlParserCtxtPtr", is NULL, which means I cannot get at the parser context. The xmlParserCtxt structure has a member that holds its current parsing position. What I am trying to find out now is whether an lxml target parser (or a self-defined libxml2 SAX parser) creates a parser context of type "xmlParserCtxtPtr", and if so, how I can get it. I suspect I may have gone down the wrong path. I don't know which I should focus on (libxml2 or lxml) to solve my problem. Could you give me some suggestions?<BR>>Stefan<BR>><BR></DIV><br><!-- footer --><br><span title="neteasefooter"/><hr/>
</span>