<DIV><BR>On 2009-07-17, "Stefan Behnel" &lt;stefan_ml@behnel.de&gt; wrote:<BR>><BR>>qhlonline wrote:<BR>>> If there is some way for me to get the parsing context, and if I can<BR>>> access this structure directly, maybe this problem can be solved. In<BR>>> libxml2 there is a definition of "struct _xmlParserCtxt". This structure<BR>>> has a member "long nbChars;", which is just the "number of xmlChar<BR>>> processed".<BR>><BR>>You could subtype the XMLParser class in Cython. That's not trivial, since<BR>>it's not exported at the C-API level. You'll have to redefine the class<BR>>hierarchy in a separate lxml.etree.pxd file to do that. Note that you only<BR>>need to access the _parser_context and (maybe) _push_parser_context. The<BR>>other object type fields in the classes can be set to type "object" instead<BR>>of their real type.<BR>><BR>>But remember that the type isn't public. Future lxml versions may change<BR>>it, which means that you will have to adapt your code.<BR>></DIV>
<DIV>I have changed the libxml2 code to add a new callback that reports the current position when an element is seen. I think this avoids direct access to the parser context. But I am now wondering how to change the target parser so that it can use the newly defined callback at the Python level. I don't even know where to find the target-related lxml source code, nor do I know whether my idea is feasible. Is the target parser inherited from the TreeBuilder class? Can it be changed, and how? I urgently need more detailed information about the lxml source.<BR>>That said, I still do not understand why you need the character stream<BR>>position for parsing. Could you elaborate on that?</DIV>
<DIV>Well, the position information is useful. Some external resources of an HTML document are declared in separate files, like the 'css' file of a &lt;style&gt; tag. We may fetch the HTML document and its related resources from the network concurrently. But each external resource has to be inserted at the proper position in the HTML document in our application after parsing, so the position of the related tag is useful here.<BR>><BR>>Stefan<BR>><BR></DIV>
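<DIV>The use case described above (find where a tag sits in the source so that a separately fetched resource can be inserted there) can be sketched roughly as follows. This is only an illustration and does not use lxml or libxml2: it assumes Python's standard html.parser module, whose getpos() reports the line and column of the tag currently being handled. The names StylesheetLocator and inline_stylesheets are hypothetical, and the choice of &lt;link rel="stylesheet"&gt; as the tag that references the external file is an assumption made for the example.</DIV>

```python
from html.parser import HTMLParser


class StylesheetLocator(HTMLParser):
    """Record where each <link rel="stylesheet"> tag sits in the source.

    Hypothetical helper for illustration; not part of lxml or libxml2.
    """

    def __init__(self, source):
        super().__init__()
        # Offset of the first character of every line, so the (line, column)
        # pair returned by getpos() can be turned into an absolute offset.
        self._line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self._line_starts.append(i + 1)
        self.found = []  # list of (start, end, href) tuples
        self.feed(source)
        self.close()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            line, col = self.getpos()  # line is 1-based, col is 0-based
            start = self._line_starts[line - 1] + col
            end = start + len(self.get_starttag_text())
            self.found.append((start, end, attrs.get("href")))


def inline_stylesheets(source, css_by_href):
    """Replace each located <link> tag with the CSS fetched for its href."""
    out = source
    # Work from the end of the document so earlier offsets stay valid.
    for start, end, href in reversed(StylesheetLocator(source).found):
        if href in css_by_href:
            out = out[:start] + "<style>" + css_by_href[href] + "</style>" + out[end:]
    return out
```

<DIV>Here css_by_href stands for whatever mapping the application builds while downloading the stylesheets concurrently. A tag whose href is not in the mapping is left untouched.</DIV>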