[lxml-dev] gzip compression detection

Stefan Behnel stefan_ml at behnel.de
Mon Jul 28 20:27:26 CEST 2008


Hi,

Dr R. Sanderson wrote:
> The documentation says that lxml can automatically detect and
> process gzipped XML (.gz). I'm sure (though I haven't tried it) that
> this works when parsing from a file with the appropriate extension,
> but is it also possible from an in-memory string?
> 
> My situation: I have a Berkeley DB based storage system which stores
> gzipped XML. I currently just use Python's gzip module to decompress
> the data before passing it to lxml, but if I could skip this step I'm
> sure there'd be good performance benefits.
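
For reference, the workaround described above (decompress in Python, then hand
the result to lxml) can be done incrementally so that the full decompressed
document never has to sit in memory at once. This is only a rough sketch: the
function name parse_gzipped_bytes and the chunk size are made up for
illustration, and the fallback import merely keeps the snippet runnable where
lxml is not installed.

```python
import gzip
import zlib

try:
    from lxml import etree
except ImportError:
    # Fallback with the same feed()/close() parser interface (assumption:
    # used here only so the sketch runs without lxml).
    import xml.etree.ElementTree as etree


def parse_gzipped_bytes(data, chunk_size=65536):
    """Decompress gzipped bytes chunk by chunk and feed them to the parser.

    parse_gzipped_bytes is a hypothetical helper, not part of lxml.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parser = etree.XMLParser()
    for start in range(0, len(data), chunk_size):
        chunk = decompressor.decompress(data[start:start + chunk_size])
        if chunk:
            parser.feed(chunk)
    tail = decompressor.flush()
    if tail:
        parser.feed(tail)
    # close() returns the root element of the parsed document.
    return parser.close()


# Example: a gzipped in-memory document, as it might come out of the database.
payload = gzip.compress(b"<root><item>1</item><item>2</item></root>")
root = parse_gzipped_bytes(payload)
```

Whether this is actually faster than a plain gzip decompress followed by
etree.fromstring() would have to be measured; it mainly saves memory.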

Yes, I recently thought about that, too, mainly in the context of pickling.

http://comments.gmane.org/gmane.comp.gnome.lib.xml.general/14465

That would have to be implemented first, though, as the gzip support in
libxml2 is restricted to files. Supporting it for in-memory data would require
writing a callback-driven filter for a libxml2 I/O output buffer: buffer what
gets written, compress it, and write it out to the next output buffer. Not
hard, but not entirely trivial either.
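
To illustrate the shape of such a filter, here is a Python-level analogue. It
is not the libxml2 I/O callback API, just a sketch of the same idea: accept
writes, compress them, and pass the result on to the next sink. The class name
GzipWriteFilter and its parameters are invented for this example.

```python
import gzip
import io
import zlib


class GzipWriteFilter:
    """Hypothetical write-side filter: compress what is written, forward it.

    The sink is any object with a write(bytes) method.
    """

    def __init__(self, sink, level=6):
        self._sink = sink
        # wbits = 16 + MAX_WBITS makes zlib emit a gzip header and trailer.
        self._compressor = zlib.compressobj(
            level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)

    def write(self, data):
        # The compressor buffers internally, so output may be empty for now.
        compressed = self._compressor.compress(data)
        if compressed:
            self._sink.write(compressed)
        return len(data)

    def close(self):
        # Flush whatever is still buffered, including the gzip trailer.
        self._sink.write(self._compressor.flush())


# Example: serialised XML is written in pieces, the sink receives gzip data.
sink = io.BytesIO()
out = GzipWriteFilter(sink)
out.write(b"<root>")
out.write(b"<item>1</item>")
out.write(b"</root>")
out.close()
compressed_doc = sink.getvalue()
```

Since the filter only needs a write() method, an object like this could in
principle be handed to anything that serialises to a file-like object. Doing
the same inside libxml2 would mean implementing the equivalent write and close
callbacks in C.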

Stefan

