[lxml-dev] Some benchmarks

Ian Bicking ianb at colorstudy.com
Mon Mar 10 10:54:30 CET 2008


For the curious, I've attached some benchmarks.  These are preliminary;
I'm putting together the numbers for my HTML talk at PyCon.

One thing that I'd like to test is the memory use for documents.  To do
this I'm parsing about 4.5 MB of documents and keeping them in memory,
and comparing the VSZ/RSS sizes reported by ps before and after.  I
don't think this is the right/best way to do this.  For instance,
transient memory use by some parsers makes Python grab a bunch of
memory, but that memory might be freed after parsing and be usable for
other things.  Also, I don't know if VSZ/RSS is a valid measure at all;
I get the impression it isn't very reliable.  And the increases I'm
seeing for lxml don't seem large enough; the process should grow by at
least 4.5 MB, right?  lxml can't be that much more compact than the
serialized form of these files.
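
For reference, a minimal sketch of the before/after ps measurement
described above.  This is not the attached tester.py; the helper names
(parse_ps_line, ps_memory, measure_growth) are made up for illustration,
and it assumes a ps that accepts "-o vsz= -o rss=" and reports both
values in kilobytes, as the common Linux and BSD versions do.

```python
import os
import subprocess


def parse_ps_line(line):
    """Turn one 'VSZ RSS' line of ps output into a tuple of ints (KB)."""
    vsz_kb, rss_kb = line.split()
    return int(vsz_kb), int(rss_kb)


def ps_memory(pid=None):
    """Ask ps for the VSZ and RSS of a process (default: this one)."""
    pid = os.getpid() if pid is None else pid
    # "vsz=" / "rss=" with an empty header suppress the header line.
    out = subprocess.check_output(
        ["ps", "-o", "vsz=", "-o", "rss=", "-p", str(pid)])
    return parse_ps_line(out.decode())


def measure_growth(build):
    """Call build(), keep its result alive, and report (VSZ, RSS) growth.

    The growth still includes any transient memory the allocator has
    not handed back, which is exactly the caveat raised above.
    """
    before = ps_memory()
    kept = build()  # hold a reference so the documents stay in memory
    after = ps_memory()
    return kept, (after[0] - before[0], after[1] - before[1])
```

Something like measure_growth(lambda: [parse(f) for f in files]) would
then give the deltas for one parser, where parse and files stand in for
whatever the test script actually uses.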

Another clear indication that we're measuring transient stuff is that 
when using the BeautifulSoup or html5 parser with an lxml document the 
memory increases substantially.  So any ideas on how to test memory 
would be much appreciated.

(Maybe I could look at ps, and then start creating Python objects until 
the memory use increases, so that I know I've used up any extra 
allocated memory?)
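
A rough sketch of that idea, under the assumption that small Python
objects are enough to soak up the allocator's spare memory (which may
not hold for every allocator).  fill_slack and its read_rss_kb argument
are hypothetical names; read_rss_kb is any callable returning the
current RSS in kilobytes, for example a wrapper around ps.

```python
def fill_slack(read_rss_kb, chunk_objects=10000, max_chunks=1000):
    """Allocate small objects until the reported RSS rises.

    Returns the filler list; the caller must keep it alive while
    taking the "real" measurement, otherwise the slack opens up again.
    """
    baseline = read_rss_kb()
    filler = []
    for _ in range(max_chunks):
        if read_rss_kb() > baseline:
            break  # the process had to grow, so the slack is used up
        filler.append([object() for _ in range(chunk_objects)])
    return filler
```

Once this returns, any further growth seen while parsing should come
from new memory rather than from space Python had already grabbed,
though noise in RSS could still end the loop early.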

I've also attached the script, though you'll need to grab your own HTML 
files.  html_lxml is broken; I patched it locally to work 
(http://code.google.com/p/html5lib/issues/detail?id=65).

   Ian
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: performance-results.txt
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20080310/c8eb7ddd/attachment-0001.txt 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tester.py
Type: text/x-python
Size: 8794 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080310/c8eb7ddd/attachment-0001.py 

