<DIV>
<DIV>Hi, all</DIV>
<DIV> I am a new lxml user, still experimenting with it. From the FAQ at <A href="http://codespeak.net/lxml/FAQ.html" target="_blank">http://codespeak.net/lxml/FAQ.html</A> I understand that lxml can release the GIL while parsing, so parsing should be able to run in several threads at once. I wrote a multi-threaded parsing program and ran it on an eight-core machine, but total CPU usage was only 180%, i.e. only about 20% of each core. When I used libxml2 directly, it ran much faster and used more than 50% of each core.</DIV>
<DIV> My goal is to parse HTML files on disk and extract specific HTML tags together with their related data, such as attributes and text. I do not need to create, modify or delete a DOM tree, and I do not need XPath. How can I make my HTML parsing program run faster?</DIV>
<DIV> I have tried the SAX-like target parser on an HTML file, but it is not fast enough. iterparse cannot parse the HTML file either (I set the "html=True" parameter): the parser says the DOCTYPE declaration in my file is misplaced, although the page was fetched from a popular website and really does conform to the HTML standard.</DIV>
<DIV> I now have many more HTML files to process, so I want to speed up parsing by using multiple threads. My questions are: Does lxml release the GIL completely when parsing from memory and from disk files? How can I make my multi-threaded program run faster on a multi-core machine? Can I change the lxml source to skip operations I do not need, in order to speed up my program?</DIV>
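<DIV> A minimal sketch of the kind of setup described above, for reference only and not the actual program: a parser target that records selected tags without building a tree, run over several files from a thread pool. The names TagCollector, parse_one and parse_many, the default tag set and the worker count are hypothetical and chosen for illustration; only etree.HTMLParser(target=...) and etree.parse come from lxml.</DIV>

```python
from concurrent.futures import ThreadPoolExecutor

from lxml import etree


class TagCollector:
    """Parser target (hypothetical name): records wanted tags, builds no tree."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.found = []   # finished records, in closing order
        self._open = []   # records for wanted tags that are still open

    def start(self, tag, attrib):
        if tag in self.wanted:
            self._open.append({"tag": tag, "attrib": dict(attrib), "text": []})

    def data(self, text):
        # Text belongs to every wanted tag that is currently open.
        for record in self._open:
            record["text"].append(text)

    def end(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._finish(self._open.pop())

    def comment(self, text):
        pass

    def close(self):
        # Flush records whose end tag never arrived.
        while self._open:
            self._finish(self._open.pop())
        return self.found

    def _finish(self, record):
        record["text"] = "".join(record["text"])
        self.found.append(record)


def parse_one(path, wanted=("a",)):
    """Parse one HTML file from disk with its own parser and target."""
    collector = TagCollector(wanted)
    parser = etree.HTMLParser(target=collector)
    etree.parse(path, parser)
    return collector.found


def parse_many(paths, wanted=("a",), workers=8):
    """Parse several files in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: parse_one(p, wanted), paths))
```

<DIV> Calling parse_many with a list of file paths returns one list of records per file, each record holding the tag name, its attributes and its text. Each call creates its own parser and target, so nothing is shared between threads.</DIV>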
<DIV> Thanks a lot</DIV>
<DIV> Yours sincerely<BR><BR></DIV></DIV>