From john at nmt.edu  Tue Aug  9 04:01:12 2011
From: john at nmt.edu (John W. Shipman)
Date: Mon, 8 Aug 2011 20:01:12 -0600 (MDT)
Subject: [lxml-dev] Saving memory
Message-ID:

I have a Python CGI script that builds a pretty sizeable XHTML
table.  It works fine for most cases, but for larger tables it's
getting a MemoryError.  I have one case where it produces a table
of a bit under a megabyte in serialized form, and it works, but
if I add just a bit more data, MemoryError.

My choices are not great.  I may not be able to migrate to a
different Web server easily.  The server has about a 2004 version
of lxml installed.  The script relies extensively on the
builder.py module and the "E()" factory function to build the
entire tree before serialization.

I'm not terribly keen on rewriting my app to use clunkier
XHTML generation just to cut down on memory usage.

Any suggestions for reducing my memory usage?  There's not much
other memory outside the ElementTree; the bulk of the input is
straight out of a database engine.

Documentation for these scripts is here:

    http://www.nmt.edu/~shipman/z/cbc/cbchist/

Best regards,
John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (575) 835-5735, http://www.nmt.edu/~john
  ``Let's go outside and commiserate with nature.''  --Dave Farber

From sergio at sergiomb.no-ip.org  Tue Aug  9 21:05:49 2011
From: sergio at sergiomb.no-ip.org (Sergio Monteiro Basto)
Date: Tue, 09 Aug 2011 20:05:49 +0100
Subject: [lxml-dev] Saving memory
In-Reply-To:
References:
Message-ID: <1312916750.10284.0.camel@segulix>

Please note that this mailing list is closed down and going out of
service. Please subscribe to the new mailing list instead.

http://lxml.de/mailinglist/

On Mon, 2011-08-08 at 20:01 -0600, John W. Shipman wrote:
> I have a Python CGI script that builds a pretty sizeable XHTML
> table.  It works fine for most cases, but for larger tables it's
> getting a MemoryError.
> I have one case where it produces a table
> of a bit under a megabyte in serialized form, and it works, but
> if I add just a bit more data, MemoryError.
>
> My choices are not great.  I may not be able to migrate to a
> different Web server easily.  The server has about a 2004 version
> of lxml installed.  The script relies extensively on the
> builder.py module and the "E()" factory function to build the
> entire tree before serialization.
>
> I'm not terribly keen on rewriting my app to use clunkier
> XHTML generation just to cut down on memory usage.
>
> Any suggestions for reducing my memory usage?  There's not much
> other memory outside the ElementTree; the bulk of the input is
> straight out of a database engine.
>
> Documentation for these scripts is here:
>
>     http://www.nmt.edu/~shipman/z/cbc/cbchist/
>
> Best regards,
> John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
> Speare 119, Socorro, NM 87801, (575) 835-5735, http://www.nmt.edu/~john
>   ``Let's go outside and commiserate with nature.''  --Dave Farber
> _______________________________________________
> lxml-dev mailing list
> lxml-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/lxml-dev

-- 
Sérgio M. B.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20110809/ac41f653/attachment.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3309 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20110809/ac41f653/attachment.bin

From info at whywouldwe.com  Thu Aug 18 08:05:51 2011
From: info at whywouldwe.com (info at whywouldwe.com)
Date: Thu, 18 Aug 2011 13:05:51 +0700
Subject: [lxml-dev] lxml/html5lib errors
Message-ID: <4E4CABBF.50903@whywouldwe.com>

I'm using lxml 2.3 and html5lib 0.9.

This doesn't work:

    from lxml.html import html5parser
    html5parser.document_fromstring(''' t''')

The traceback:

    Traceback (most recent call last):
      File "/tmp/t.py", line 4, in <module>
        t''')
      File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/lxml/html/html5parser.py", line 54, in document_fromstring
        return parser.parse(html, useChardet=guess_charset).getroot()
      File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 211, in parse
        parseMeta=parseMeta, useChardet=useChardet)
      File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 111, in _parse
        self.mainLoop()
      File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 189, in mainLoop
        self.phase.processDoctype(token)
      File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 482, in processDoctype
        self.tree.insertDoctype(token)
    TypeError: insertDoctype() takes exactly 4 arguments (2 given)

Apparently lxml isn't compatible with html5lib 0.9
(http://stackoverflow.com/questions/5529857/parsing-html-using-lxml-and-html5lib-getting-typeerror-insertdoctype-takes-e).
Is this a known bug? Any plans to fix it? Where does the problem lie,
with lxml or html5lib?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20110818/6542b42b/attachment.htm
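
A note on reading the traceback above: the last html5lib frame calls
self.tree.insertDoctype(token), which is two arguments counting self,
while the tree builder in use expects four. That suggests a
calling-convention mismatch between the parser and the tree builder
rather than a problem with the input document. What follows is only a
minimal sketch of that kind of mismatch and of one possible adapter. The
classes OldStyleTreeBuilder and TokenAwareTreeBuilder, the function
process_doctype, and the token layout with "name", "publicId" and
"systemId" keys are all assumptions made for illustration; none of this
is the actual lxml or html5lib source.

```python
# Hypothetical stand-ins only: these are NOT the real lxml or html5lib
# classes. They reproduce the shape of the error in the traceback.


class OldStyleTreeBuilder(object):
    """Hypothetical builder that expects the doctype as three values."""

    def __init__(self):
        self.doctype = None

    def insertDoctype(self, name, publicId, systemId):
        self.doctype = (name, publicId, systemId)


class TokenAwareTreeBuilder(OldStyleTreeBuilder):
    """Hypothetical adapter that also accepts a single token dict."""

    def insertDoctype(self, name, publicId=None, systemId=None):
        if isinstance(name, dict):
            # Assumed token layout: a dict carrying the three doctype fields.
            token = name
            name = token.get("name")
            publicId = token.get("publicId")
            systemId = token.get("systemId")
        OldStyleTreeBuilder.insertDoctype(self, name, publicId, systemId)


def process_doctype(tree, token):
    """Mimics the call in the last traceback frame: one token argument."""
    tree.insertDoctype(token)


token = {"name": "html", "publicId": None, "systemId": None}

# The old-style builder rejects the single-token call with a TypeError.
try:
    process_doctype(OldStyleTreeBuilder(), token)
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

# The adapter unpacks the token and forwards the three fields.
adapted = TokenAwareTreeBuilder()
process_doctype(adapted, token)
```

The sketch only shows why a two-argument call fails against a
four-argument method. It does not settle whether the change belongs in
lxml's builder or in the html5lib version being installed.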