From paul at zeapartners.org Sat Dec 1 18:12:04 2007
From: paul at zeapartners.org (Paul Everitt)
Date: Sat, 01 Dec 2007 12:12:04 -0500
Subject: [lxml-dev] BUG: Trunk fails on Leopard (OS X 10.5) for missing schematron.h
Message-ID:

Hi all. I just tried compiling lxml trunk on Leopard and got:

building 'lxml.etree' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/usr/include/libxml2
-I/Users/paul/opt/include/python2.4 -c src/lxml/lxml.etree.c -o
build/temp.macosx-10.3-i386-2.4/src/lxml/lxml.etree.o -w
src/lxml/lxml.etree.c:69:31: error: libxml/schematron.h: No such file or
directory

This also happens with the current cheeseshop egg (2.0alpha5).

Here's the libxml2 version that ships with Leopard:

$ xmllint --version
xmllint: using libxml version 20616
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer
XInclude Unicode Regexps Automata Schemas

--Paul

From stefan_ml at behnel.de Sat Dec 1 19:35:12 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 01 Dec 2007 19:35:12 +0100
Subject: [lxml-dev] BUG: Trunk fails on Leopard (OS X 10.5) for missing schematron.h
In-Reply-To:
References:
Message-ID: <4751A960.609@behnel.de>

Paul Everitt wrote:
> build/temp.macosx-10.3-i386-2.4/src/lxml/lxml.etree.o -w
> src/lxml/lxml.etree.c:69:31: error: libxml/schematron.h: No such file or
> directory
>
> This also happens with the current cheeseshop egg (2.0alpha5).
>
> Here's the libxml2 version that ships with Leopard:
>
> $ xmllint --version
> xmllint: using libxml version 20616
>    compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer
> XInclude Unicode Regexps Automata Schemas

You are trying to build lxml 2.0 with libxml2 2.6.16. As the FAQ states, lxml
2.0 requires libxml2 2.6.20 or later.

http://codespeak.net/lxml/dev/FAQ.html#which-version-of-libxml2-and-libxslt-should-i-use-or-require

You can either try lxml 1.3.x or install a newer libxml2 version.
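
For anyone who misreads the number the same way: the integer that xmllint prints packs major, minor and patch into one value, so 20616 is 2.6.16 and not 2.6.26 or later. A minimal sketch of the decoding, assuming the usual two-digits-per-field layout; the helper name and the REQUIRED constant are made up for illustration and are not part of lxml or libxml2:

```python
def decode_libxml_version(number):
    """Split an integer such as 20616 into a (major, minor, patch) tuple."""
    major, rest = divmod(number, 10000)
    minor, patch = divmod(rest, 100)
    return (major, minor, patch)


# Minimum quoted in the reply above for lxml 2.0 (hypothetical constant name).
REQUIRED = (2, 6, 20)

found = decode_libxml_version(20616)
print(found, "is recent enough:", found >= REQUIRED)
```

From inside Python, lxml.etree also exposes the version it runs against as a tuple (etree.LIBXML_VERSION), which can be compared in the same way.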
Stefan From bkc at murkworks.com Mon Dec 3 01:13:41 2007 From: bkc at murkworks.com (Brad Clements) Date: Sun, 02 Dec 2007 19:13:41 -0500 Subject: [lxml-dev] broken document('') in xslt .. Message-ID: <47534A35.1070500@murkworks.com> I am having trouble with a transform on 2.0alpha5 (and 2.0alpha3). It works on a system with a newer libxml2/libxslt, but fails on a slightly older libxml2/libxslt. However when using xsltproc on the "older" system, the transform works. So I think maybe there is an issue with custom resolvers. basically the .xsl looks like this: And elsewhere in the xsl file is: manage_pending.css I am using a custom resolver. I don't see any requests to resolve '' (and I wouldn't expect any either). At first I thought this was a 2.0alpha5 issue, but I downgraded to 2.0alpha3 and I still have the problem on one system, but it works on the other. working system, centos 4, lxml 2.0alpha3 libxml2 (2, 6, 28) / libxslt (1, 1, 20) non-working system, centos 4, lxml 2.0alpha3 (or 5), libxml2 (2, 6, 23) / libxslt (1, 1, 15) On the non-working system, if I use xsltproc, the transform works correctly. [bkc at sch package_data]$ xsltproc -version Using libxml 20623, libxslt 10115 and libexslt 812 xsltproc was compiled against libxml 20623, libxslt 10115 and libexslt 812 libxslt 10115 was compiled against libxml 20623 libexslt 812 was compiled against libxml 20623 output (in part) I tried going through the libxslt changelog, but nothing appears obvious to me as being a candidate issue. Can anyone suggest something? (upgrading libxslt on this system will be difficult) -- Brad Clements, bkc at murkworks.com (315)268-1000 http://www.murkworks.com AOL-IM: BKClements From stefan_ml at behnel.de Mon Dec 3 13:02:22 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 03 Dec 2007 13:02:22 +0100 Subject: [lxml-dev] broken document('') in xslt .. 
In-Reply-To: <47534A35.1070500@murkworks.com> References: <47534A35.1070500@murkworks.com> Message-ID: <4753F04E.9040701@behnel.de> Hi, Brad Clements wrote: > I am having trouble with a transform on 2.0alpha5 (and 2.0alpha3). It > works on a system with a newer libxml2/libxslt, > but fails on a slightly older libxml2/libxslt. However when using > xsltproc on the "older" system, the transform works. We actually do have test cases for "document('')" in XSLT and my last test didn't show any problems here. Maybe it's only an issue when custom resolvers come into play. I'll look into this. > Can anyone suggest something? (upgrading libxslt on this system will be > difficult) It looks like you have a sane build environment available. You can try to build a recent libxml2 and libxslt by hand (without installing them) and link them statically into lxml to work around this. It will add some 4-5 megs to the size of the lxml install, though. This should get you going: http://codespeak.net/lxml/dev/build.html#static-linking-on-windows Stefan From dfedoruk at gmail.com Mon Dec 3 14:17:52 2007 From: dfedoruk at gmail.com (Dmitri Fedoruk) Date: Mon, 3 Dec 2007 16:17:52 +0300 Subject: [lxml-dev] Error with thread In-Reply-To: <4726EA45.2000406@behnel.de> References: <200710291702.01081.mantegazza@ill.fr> <200710300833.31078.mantegazza@ill.fr> <4726EA45.2000406@behnel.de> Message-ID: Hello, Looks like I've got the same problem. Sometimes I get the same error message: 'stylesheet is not usable in this thread'. As far as I could understand, that's because of an attempt to use in one thread the xslt obejct initiated in another thread. Sounds reasonable. > Correct. As I said, a work-around would be to either create them on the fly or > cache the XSLT objects in thread-local storage and reuse them from there. Nice. My application works under the same scheme. I'm using mod_python and several apache processes started in prefork mode. 
In every apache process I'm using a global general object that contains xslt objects inside. When a request comes to the next apache process, my general object is initialized (if it has not been done yet) and then is used inside this thread and this process. I cannot see the reason why one instance of mod_python should conflict with another. Nevertheless, I happen to get this error messages without any idea why. I can not see any dependency or rule yet. The only solution is to restart apache. My software is the following: Apache/2.0.61 Python 2.5.1 mod_python-3.3.1 lxml-1.3.4 libxslt-1.1.20 freebsd 6.2-20070330-SNAP Do you have any idea how can I fix this situation or at least how can I track the reasons? Maybe this is the question for some other mailing lists too? Dmitri From hjh at alterras.de Mon Dec 3 21:37:20 2007 From: hjh at alterras.de (=?ISO-8859-1?Q?Hans-J=FCrgen?= Hay) Date: Mon, 03 Dec 2007 21:37:20 +0100 Subject: [lxml-dev] Error with thread In-Reply-To: References: <200710291702.01081.mantegazza@ill.fr> <200710300833.31078.mantegazza@ill.fr> <4726EA45.2000406@behnel.de> Message-ID: <1196714240.12565.10.camel@matrix.local> Hi, I had the same problem with mod_python, a while ago, it seems mod_python does some trickery with threads in its internals. The only solution I found is setting PythonInterpreter "myoneandonlyinterpreter" in the apache config of each virtual host while running prefork servers. I coudn't find any other solution beside hoping that the threading problem in lxml will go away, sometime. Hans Am Montag, den 03.12.2007, 16:17 +0300 schrieb Dmitri Fedoruk: > Hello, > > Looks like I've got the same problem. > Sometimes I get the same error message: 'stylesheet is not usable in > this thread'. As far as I could understand, that's because of an > attempt to use in one thread the xslt obejct initiated in another > thread. Sounds reasonable. > > > Correct. 
As I said, a work-around would be to either create them on the fly or
> > cache the XSLT objects in thread-local storage and reuse them from there.
>
> Nice. My application works under the same scheme. I'm using mod_python
> and several apache processes started in prefork mode. In every apache
> process I'm using a global general object that contains xslt objects
> inside. When a request comes to the next apache process, my general
> object is initialized (if it has not been done yet) and then is used
> inside this thread and this process. I cannot see the reason why one
> instance of mod_python should conflict with another.
>
> Nevertheless, I happen to get this error messages without any idea
> why. I can not see any dependency or rule yet. The only solution is to
> restart apache.
>
> My software is the following:
> Apache/2.0.61
> Python 2.5.1
> mod_python-3.3.1
> lxml-1.3.4
> libxslt-1.1.20
> freebsd 6.2-20070330-SNAP
>
> Do you have any idea how can I fix this situation or at least how can
> I track the reasons? Maybe this is the question for some other mailing
> lists too?
>
> Dmitri
> _______________________________________________
> lxml-dev mailing list
> lxml-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/lxml-dev
--
-----------------------------------------------------------------------
Alterras GmbH, Allersbergerstr. 185-N, D-90461 Nürnberg
http://www.alterras.de/, info at alterras.de
Tel: (+49) 0911-480039-0
Handelsreg.: AG Nürnberg HRB 18488, Geschäftsführer: H.-J. Hay, H. Sivak

From paul at zeapartners.org Tue Dec 4 17:16:22 2007
From: paul at zeapartners.org (Paul Everitt)
Date: Tue, 04 Dec 2007 11:16:22 -0500
Subject: [lxml-dev] BUG: Trunk fails on Leopard (OS X 10.5) for missing schematron.h
In-Reply-To: <4751A960.609@behnel.de>
References: <4751A960.609@behnel.de>
Message-ID: <47557D56.9030308@zeapartners.org>

Total brainfart on my part.
I had read somewhere else that libxml2 was updated, then
looked at the version number *directly*, and *still* misread it.

Sorry sorry sorry.

--Paul

Stefan Behnel wrote:
> Paul Everitt wrote:
>> build/temp.macosx-10.3-i386-2.4/src/lxml/lxml.etree.o -w
>> src/lxml/lxml.etree.c:69:31: error: libxml/schematron.h: No such file or
>> directory
>>
>> This also happens with the current cheeseshop egg (2.0alpha5).
>>
>> Here's the libxml2 version that ships with Leopard:
>>
>> $ xmllint --version
>> xmllint: using libxml version 20616
>>    compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer
>> XInclude Unicode Regexps Automata Schemas
>
> You are trying to build lxml 2.0 with libxml2 2.6.16. As the FAQ states, lxml
> 2.0 requires libxml2 2.6.20 or later.
>
> http://codespeak.net/lxml/dev/FAQ.html#which-version-of-libxml2-and-libxslt-should-i-use-or-require
>
> You can either try lxml 1.3.x or install a newer libxml2 version.
>
> Stefan

From ebgssth at gmail.com Wed Dec 5 14:41:06 2007
From: ebgssth at gmail.com (js)
Date: Wed, 5 Dec 2007 22:41:06 +0900
Subject: [lxml-dev] Now lxml is available through MacPorts
Message-ID:

I just want to announce that lxml is accepted by MacPorts community
so now you can install lxml just by doing "port install py25-lxml".
IMHO, MacPorts is more flexible and easier to use than easy_install.

Please give it a try.
Thanks.

From cz at gocept.com Thu Dec 6 08:59:51 2007
From: cz at gocept.com (Christian Zagrodnick)
Date: Thu, 6 Dec 2007 08:59:51 +0100
Subject: [lxml-dev] Processing instruction roundtrip (again)
Message-ID:

Hi,

i've got a problem with Processing Instructions (2.0alpha5, Python2.4):

>>> obj = lxml.objectify.XML('')
>>> lxml.etree.tostring(obj)
''
>>> obj = lxml.objectify.XML('')
>>> lxml.etree.tostring(obj)
''
>>>

So a PI not contained in the root node gets lost. Any way to fix this?

--
Christian Zagrodnick

gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax.
+49 345 12298891 From artur.siekielski at gmail.com Fri Dec 7 01:56:28 2007 From: artur.siekielski at gmail.com (Artur Siekielski) Date: Fri, 07 Dec 2007 01:56:28 +0100 Subject: [lxml-dev] Huge memory leak in latest 2.0 Message-ID: <47589A3C.5070305@gmail.com> Hi. I'm using latest 2.0 version from trunk, rev. 49494 (because it supports 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in loop, 100-200kB each. I have noticed that memory used by my program increases about 1MB after each document processed, so after a few hundreds of passes system is about to hang. Running the same code with lxml 1.3.6 doesn't cause such memory usage increase. I'm using the following library calls: tree = etree.parse( , HTMLParser(encoding=...)) etree.tostring(tree) el.xpath(...) getting children and attributes of elements I'm using libxml2 version 2.6.28. If anyone knows about solution/workaround, please write. Regards, Artur From stefan_ml at behnel.de Mon Dec 3 22:24:44 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 03 Dec 2007 22:24:44 +0100 Subject: [lxml-dev] broken document('') in xslt .. In-Reply-To: <47534A35.1070500@murkworks.com> References: <47534A35.1070500@murkworks.com> Message-ID: <4754741C.9020107@behnel.de> Hi, Brad Clements wrote: > I am having trouble with a transform on 2.0alpha5 (and 2.0alpha3). It > works on a system with a newer libxml2/libxslt, > but fails on a slightly older libxml2/libxslt. However when using > xsltproc on the "older" system, the transform works. > So I think maybe there is an issue with custom resolvers. I added the following test case, which works for me on the current lxml 2.0 trunk and also on 2.0alpha5 using libxml2 2.6.20-30 and libxslt 1.1.15-22. Is there anything you do different? 
Stefan


    def test_xslt_document_XML_resolver(self):
        assertEquals = self.assertEquals
        called = {'count' : 0}
        class TestResolver(etree.Resolver):
            def resolve(self, url, id, context):
                assertEquals(url, 'file://ANYTHING')
                called['count'] += 1
                return self.resolve_string('', context)

        parser = etree.XMLParser()
        parser.resolvers.add(TestResolver())

        xslt = etree.XSLT(etree.XML("""\
 A B
""", parser))

        self.assertEquals(called['count'], 0)
        result = xslt(etree.XML(''))
        self.assertEquals(called['count'], 1)

        root = result.getroot()
        self.assertEquals(root.tag, 'test')
        self.assertEquals(len(root), 4)

        self.assertEquals(root[0].tag, 'CALLED')
        self.assertEquals(root[1].tag, '{local}entry')
        self.assertEquals(root[1].text, None)
        self.assertEquals(root[1].get("value"), 'A')
        self.assertEquals(root[2].tag, 'CALLED')
        self.assertEquals(root[3].tag, '{local}entry')
        self.assertEquals(root[3].text, None)
        self.assertEquals(root[3].get("value"), 'B')

From stefan_ml at behnel.de Tue Dec 4 10:03:40 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 04 Dec 2007 10:03:40 +0100
Subject: [lxml-dev] easy_install issues
In-Reply-To: <87zlxc674l.fsf@pfhawkins.com>
References: <87zlxc674l.fsf@pfhawkins.com>
Message-ID: <475517EC.5010502@behnel.de>

Hi,

just in case this is still an issue:

P.F. Hawkins wrote:
> I'm having trouble installing any version of lxml higher than 1.3.3. The error
> says that easy_install can't find a proper version of dateutil, even though I
> have a high enough version of "python-dateutil" installed on this system.

lxml doesn't need dateutil, so this looks like a setuptools problem. The
distutils-sig would be the right place to ask.

Are you sure the package is installed for the correct Python version? - just
in case you have, say, Py 2.4 and 2.5 installed in parallel. Have you tried
reinstalling the "python-dateutil" package?

> I assume that the issue has to do with "dateutil" vs. "python-dateutil", but I'm
> not sure.
That shouldn't be a problem as it won't change the name of the Python package. Stefan From stefan_ml at behnel.de Sat Dec 8 16:12:01 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 08 Dec 2007 16:12:01 +0100 Subject: [lxml-dev] Now lxml is available through MacPorts In-Reply-To: References: Message-ID: <475AB441.7040606@behnel.de> js wrote: > I just want to announce that lxml is accepted by MacPorts community > so now you can install lxml just by doing "port install py25-lxml". > IMHO, MacPorts is more flexible and easier to use than easy_install. Cool, thanks! I'll add a hint to the install file. Stefan From stefan_ml at behnel.de Sat Dec 8 16:17:11 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 08 Dec 2007 16:17:11 +0100 Subject: [lxml-dev] Processing instruction roundtrip (again) In-Reply-To: References: Message-ID: <475AB577.1060002@behnel.de> Christian Zagrodnick wrote: >>>> obj = lxml.objectify.XML('') >>>> lxml.etree.tostring(obj) > '' You are requesting a serialisation of the Element, so that's the expected result. This should work: etree.tostring(etree.ElementTree(obj)) or: etree.tostring(obj.getroottree()) Stefan From stefan_ml at behnel.de Sat Dec 8 17:20:58 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 08 Dec 2007 17:20:58 +0100 Subject: [lxml-dev] Huge memory leak in latest 2.0 In-Reply-To: <47589A3C.5070305@gmail.com> References: <47589A3C.5070305@gmail.com> Message-ID: <475AC46A.2090906@behnel.de> Hi, Artur Siekielski wrote: > I'm using latest 2.0 version from trunk, rev. 49494 (because it supports > 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in > loop, 100-200kB each. I have noticed that memory used by my program > increases about 1MB after each document processed, so after a few > hundreds of passes system is about to hang. Running the same code with > lxml 1.3.6 doesn't cause such memory usage increase. 
> > I'm using the following library calls: > tree = etree.parse( , HTMLParser(encoding=...)) > etree.tostring(tree) > el.xpath(...) > getting children and attributes of elements thanks for the report, I can reproduce this with a simple call to the parser. I'll look into it. Stefan From nslater at bytesexual.org Sat Dec 8 23:10:55 2007 From: nslater at bytesexual.org (Noah Slater) Date: Sat, 8 Dec 2007 22:10:55 +0000 Subject: [lxml-dev] Question about newlines Message-ID: <20071208221055.GI10157@bytesexual.org> Hey, When serialising a document there are two places that I would expect lxml to insert newlines and yet there are non. 1) When adding a PI via the element.addprevious method and PI has it's tail trimmed and so when serialising the PI runs into the root element. 2) At the very end of the document. POSIX states that all files must end in a newline so I consider this to be a bug. Perhaps I am missing something, help is much appreciated! :) Thanks, -- Noah Slater "Creativity can be a social contribution, but only in so far as society is free to use the results." - R. Stallman From nslater at bytesexual.org Sun Dec 9 02:04:54 2007 From: nslater at bytesexual.org (Noah Slater) Date: Sun, 9 Dec 2007 01:04:54 +0000 Subject: [lxml-dev] How to serialise with doctype Message-ID: <20071209010454.GD10613@bytesexual.org> Hello, How do you serialise a document with a doctype? Thanks, -- Noah Slater "Creativity can be a social contribution, but only in so far as society is free to use the results." - R. Stallman From stefan_ml at behnel.de Sun Dec 9 08:48:17 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 09 Dec 2007 08:48:17 +0100 Subject: [lxml-dev] Question about newlines In-Reply-To: <20071208221055.GI10157@bytesexual.org> References: <20071208221055.GI10157@bytesexual.org> Message-ID: <475B9DC1.8070702@behnel.de> Hi, Noah Slater wrote: > When serialising a document there are two places that I would expect > lxml to insert newlines and yet there are non. 
Serialisation will never alter content. That said, there is a separate serialisation API in libxml2 (the xmlSave* functions) that inserts a newline at the end, maybe also after PIs (don't know). But it will not be used in lxml for a while due to API stability issues. > 1) When adding a PI via the element.addprevious method and PI has > it's tail trimmed and so when serialising the PI runs into the > root element. > > 2) At the very end of the document. POSIX states that all files must > end in a newline so I consider this to be a bug. XML works on more systems than those that support POSIX. :) One reason these don't happen automatically is that ET doesn't insert newlines either. This is not a hard reason, and maybe we could even change this in 2.0. I'll think about it. Any other opinions on this? Stefan From stefan_ml at behnel.de Sun Dec 9 08:52:19 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 09 Dec 2007 08:52:19 +0100 Subject: [lxml-dev] How to serialise with doctype In-Reply-To: <20071209010454.GD10613@bytesexual.org> References: <20071209010454.GD10613@bytesexual.org> Message-ID: <475B9EB3.7080009@behnel.de> Noah Slater wrote: > How do you serialise a document with a doctype? Serialising an ElementTree object should do that. Stefan From nslater at bytesexual.org Sun Dec 9 13:49:34 2007 From: nslater at bytesexual.org (Noah Slater) Date: Sun, 9 Dec 2007 12:49:34 +0000 Subject: [lxml-dev] Question about newlines In-Reply-To: <475B9DC1.8070702@behnel.de> References: <20071208221055.GI10157@bytesexual.org> <475B9DC1.8070702@behnel.de> Message-ID: <20071209124934.GB13117@bytesexual.org> On Sun, Dec 09, 2007 at 08:48:17AM +0100, Stefan Behnel wrote: > Serialisation will never alter content. [snip] > > 1) When adding a PI via the element.addprevious method and PI has > > it's tail trimmed and so when serialising the PI runs into the > > root element. 
Well, this is well and good but lxml REMOVES the PI tail so I cannot
insert a newline even if I want to.

--
Noah Slater

"Creativity can be a social contribution, but only in so far as
society is free to use the results." - R. Stallman

From cz at gocept.com Mon Dec 10 09:33:06 2007
From: cz at gocept.com (Christian Zagrodnick)
Date: Mon, 10 Dec 2007 09:33:06 +0100
Subject: [lxml-dev] Processing instruction roundtrip (again)
In-Reply-To: <475AB577.1060002@behnel.de>
References: <475AB577.1060002@behnel.de>
Message-ID:

On 08.12.2007, at 16:17, Stefan Behnel wrote:

>
> Christian Zagrodnick wrote:
>>>>> obj = lxml.objectify.XML('')
>>>>> lxml.etree.tostring(obj)
>> ''
>
> You are requesting a serialisation of the Element, so that's the
> expected result.
>
> This should work:
>
> etree.tostring(etree.ElementTree(obj))
>
> or:
>
> etree.tostring(obj.getroottree())

Ah yes. This way it works. Rather strange that fromstring/tostring is
not symmetric. But from the usage patterns it probably makes sense.

I'm probably going to serialise the roottree whenever a node doesn't
have a parent. But thanks for the hint :)

--
Christian Zagrodnick

gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From pete.forman at westerngeco.com Tue Dec 11 16:06:33 2007
From: pete.forman at westerngeco.com (Pete Forman)
Date: Tue, 11 Dec 2007 15:06:33 +0000
Subject: [lxml-dev] Question about newlines
References: <20071208221055.GI10157@bytesexual.org>
Message-ID:

Noah Slater writes:

> 2) At the very end of the document. POSIX states that all files must
> end in a newline

I disagree. Its definition of Text File does strictly say that.
However the rationale implies that the special cases of empty file and
file ending in an incomplete line might also be added to the strict
definition.

The specifications for ex and sort explicitly allow for an incomplete
line, i.e. file not ending with newline.
The specs for other utilities that read text files are silent on the matter. That means that it is left to the quality of implementation whether such files are read correctly. In any case, XML files are probably better thought of as binary from a POSIX point of view. The line length must be restricted to LINE_MAX to qualify as text. -- Pete Forman -./\.- Disclaimer: This post is originated WesternGeco -./\.- by myself and does not represent pete.forman at westerngeco.com -./\.- the opinion of Schlumberger or http://petef.port5.com -./\.- WesternGeco. From stefan_ml at behnel.de Sun Dec 9 18:57:11 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 09 Dec 2007 18:57:11 +0100 Subject: [lxml-dev] Question about newlines In-Reply-To: <20071209124934.GB13117@bytesexual.org> References: <20071208221055.GI10157@bytesexual.org> <475B9DC1.8070702@behnel.de> <20071209124934.GB13117@bytesexual.org> Message-ID: <475C2C77.7080009@behnel.de> Noah Slater wrote: > On Sun, Dec 09, 2007 at 08:48:17AM +0100, Stefan Behnel wrote: >> Serialisation will never alter content. > [snip] >>> 1) When adding a PI via the element.addprevious method and PI has >>> it's tail trimmed and so when serialising the PI runs into the >>> root element. > > Well, this is well and good but lxml REMOVES the PI tail so I cannot > insert a newline even if I want to. Ah, got it. Thanks for insisting. :) lxml.etree does this on purpose. If you allow character data around the processing instructions that you add as siblings of the root node, you need to make sure it's only whitespace (not 'real' data) to keep the in-memory tree well-formed and to serialise well-formed XML. So the behaviour would be: strip the tail, but keep it if it's whitespace. Sounds a bit ugly to me... I also noted that libxml2's parser drops whitespace at the root level, which is perfectly fine, as it is the most definitely ignorable whitespace there is. 
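
To illustrate the well-formedness point about text next to root-level PIs: whitespace between a processing instruction and the root element parses fine, while any other character data there is rejected. A minimal sketch using the standard library's ElementTree parser rather than lxml, so it says nothing about how lxml handles the tail internally; the helper name and the sample PI are made up:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the XML parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A newline after a root-level PI is harmless ...
print(is_well_formed('<?demo data?>\n<root/>'))
# ... but real character data outside the root element is not allowed.
print(is_well_formed('<?demo data?>stray text<root/>'))
```

This is why a tail on a root-level PI could only ever be kept if it is pure whitespace.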
I personally prefer having lxml add a line break when serialising processing
instructions and comments at the root level, and consistently dropping all tail
text of PIs and comments appended/prepended to a root node. So the behaviour
for the root level would be: drop all whitespace when parsing, and add line
breaks around PIs and comments on serialisation.

There's also the document ending issue. The document serialiser of libxml2
does append a newline, and one day, lxml may switch to using it. So I added
this behaviour now - and had to adapt tons of test cases that compare
serialised XML between ET and lxml. But I don't mind having white-space
differences in the serialisation as long as it's well-formed, equivalent XML.

Stefan

From stefan_ml at behnel.de Mon Dec 10 00:11:18 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 10 Dec 2007 00:11:18 +0100
Subject: [lxml-dev] Huge memory leak in latest 2.0
In-Reply-To: <47589A3C.5070305@gmail.com>
References: <47589A3C.5070305@gmail.com>
Message-ID: <475C7616.2080708@behnel.de>

Artur Siekielski wrote:
> I'm using latest 2.0 version from trunk, rev. 49494 (because it supports
> 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in
> loop, 100-200kB each. I have noticed that memory used by my program
> increases about 1MB after each document processed, so after a few
> hundreds of passes system is about to hang. Running the same code with
> lxml 1.3.6 doesn't cause such memory usage increase.
>
> I'm using the following library calls:
> tree = etree.parse( , HTMLParser(encoding=...))
> etree.tostring(tree)
> el.xpath(...)
> getting children and attributes of elements
>
> I'm using libxml2 version 2.6.28.
>
> If anyone knows about solution/workaround, please write.

Hmmm, weird. The problem doesn't result from any change in lxml, just from the
switch to Cython 0.9.6.8+. And I don't even see any obvious problem in the
generated code.
Anyway, here's a patch that seems to make the leak go away on my side. Could you give it a try? Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: leak-fix.patch Type: text/x-patch Size: 460 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20071210/34f72308/attachment.bin From stefan_ml at behnel.de Fri Dec 14 09:25:43 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 14 Dec 2007 09:25:43 +0100 Subject: [lxml-dev] Question about newlines In-Reply-To: <475C2C77.7080009@behnel.de> References: <20071208221055.GI10157@bytesexual.org> <475B9DC1.8070702@behnel.de> <20071209124934.GB13117@bytesexual.org> <475C2C77.7080009@behnel.de> Message-ID: <47623E07.8080108@behnel.de> Stefan Behnel wrote: > Noah Slater wrote: >> On Sun, Dec 09, 2007 at 08:48:17AM +0100, Stefan Behnel wrote: >>> Serialisation will never alter content. >> [snip] >>>> 1) When adding a PI via the element.addprevious method and PI has >>>> it's tail trimmed and so when serialising the PI runs into the >>>> root element. >> Well, this is well and good but lxml REMOVES the PI tail so I cannot >> insert a newline even if I want to. > > Ah, got it. Thanks for insisting. :) > > So the behaviour > for the root level would be: drop all whitespace when parsing, and add line > breaks around PIs and comments on serialisation. ..., but only if pretty printing is requested. I think that's all that's needed here. Stefan From stefan_ml at behnel.de Wed Dec 19 11:57:37 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 19 Dec 2007 11:57:37 +0100 Subject: [lxml-dev] Huge memory leak in latest 2.0 In-Reply-To: <475C7616.2080708@behnel.de> References: <47589A3C.5070305@gmail.com> <475C7616.2080708@behnel.de> Message-ID: <4768F921.2020008@behnel.de> Stefan Behnel wrote: > Artur Siekielski wrote: >> I'm using latest 2.0 version from trunk, rev. 49494 (because it supports >> 'encoding' keyword in HTMLParser). 
I'm parsing many HTML documents in >> loop, 100-200kB each. I have noticed that memory used by my program >> increases about 1MB after each document processed, so after a few >> hundreds of passes system is about to hang. Running the same code with >> lxml 1.3.6 doesn't cause such memory usage increase. >> >> I'm using the following library calls: >> tree = etree.parse( , HTMLParser(encoding=...)) >> etree.tostring(tree) >> el.xpath(...) >> getting children and attributes of elements >> >> I'm using libxml2 version 2.6.28. >> >> If anyone knows about solution/workaround, please write. > > Hmmm, weird. The problem doesn't result from any change in lxml, just from the > switch to Cython 0.9.6.8+. And I don't even see any obvious problem in the > generated code. I fixed the problem in Cython (and Pyrex). It should work with the next release. I attached the patch that I used in case you want to build lxml yourself using Cython. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: kw-only-fixes.patch Type: text/x-patch Size: 1651 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20071219/2c5f77bc/attachment.bin From cardoso.pb at om.asahi-kasei.co.jp Wed Dec 19 12:18:28 2007 From: cardoso.pb at om.asahi-kasei.co.jp (Pedro Cardoso) Date: Wed, 19 Dec 2007 20:18:28 +0900 Subject: [lxml-dev] Python crash with lxml use with multi-thread Message-ID: <4768FE04.2060105@om.asahi-kasei.co.jp> Hi I am using lxml (just updated to version 1.3.6) on python 2.5, on a windows XP machine. I have a wrapper class for an XML file, and worked fine until I used it on several threads. At that moment it will crash. 
The piece of code I have it :

        projectNode = self.dom.xpath('//Project[@Code="%s"]' %projectCode)[0]
        logger.debug("here")
        previewsTask = self.dom.xpath('//Task[@Code="%s"]' % taskData.xml_Code)[0]
        logger.debug("here")
        projectNode.replace(previewsTask, taskData.dom)
        #projectNode.remove(previewsTask)
        #projectNode.append(taskData.dom)
        logger.debug("here")
        self.save()
        logger.debug("here")
        return None

The loggin is to know the last line to be called .

The last log msg is written, but the one right after the call to this
function is not, that makes me think the crash is on the return. There
should be no problem of thread safety, since I am using locks.

The crash seems to be on the return, and The calling sequence is :
thread 1 : append a node to xml and save
thread 2 : replaces node and save (the piece of code above)
At this moment it crash python !

Any idea of the problem ?

Thanks in advance
Pedro Cardoso

From stefan_ml at behnel.de Wed Dec 19 12:37:37 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 19 Dec 2007 12:37:37 +0100
Subject: [lxml-dev] Python crash with lxml use with multi-thread
In-Reply-To: <4768FE04.2060105@om.asahi-kasei.co.jp>
References: <4768FE04.2060105@om.asahi-kasei.co.jp>
Message-ID: <47690281.6050100@behnel.de>

Pedro Cardoso wrote:
> I am using lxml (just updated to version 1.3.6) on python 2.5, on a
> windows XP machine.
>
> I have a wrapper class for an XML file, and worked fine until I used it
> on several threads. At that moment it will crash.
>
> The piece of code I have it :
> projectNode = self.dom.xpath('//Project[@Code="%s"]'
> %projectCode)[0]
> logger.debug("here")
> previewsTask = self.dom.xpath('//Task[@Code="%s"]' %
> taskData.xml_Code)[0]
> logger.debug("here")
> projectNode.replace(previewsTask, taskData.dom)
> #projectNode.remove(previewsTask)
> #projectNode.append(taskData.dom)
> logger.debug("here")
> self.save()
> logger.debug("here")
> return None
> The loggin is to know the last line to be called .
> > The last log msg is written, but the one right after the call to this > function is not, that makes me think the crash is on the return. There > should be no problem of thread safety, since I am using locks. > > The crash seems to be on the return, and The calling sequence is : > thread 1 : append a node to xml and save > thread 2 : replaces node and save (the piece of code above) > At this moment it crash python ! > > Any idea of the problem ? Have you read the FAQ section on threading? http://codespeak.net/lxml/dev/FAQ.html#id1 Stefan From stefan_ml at behnel.de Wed Dec 19 13:09:07 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 19 Dec 2007 13:09:07 +0100 Subject: [lxml-dev] lxml 2.0alpha6 released Message-ID: <476909E3.70908@behnel.de> Hi, another 'last' alpha version of the 2.0 series was released to PyPI. The reason this is still not a beta version is that there were a couple of signature changes in methods for iteration, XPath and XSLT to remove some inconsistencies. I hope these do not cause too much hassle for maintainers of existing code. Backwards compatibility is provided through the use of keyword arguments (which are now enforced in some places where they make sense). I'm open to discussion about these changes, but would otherwise declare this release ready for the API freeze. From my point of view, the next release will only focus on bug fixing (if required :) before the final release (hopefully still this year). This release already features some major bug fixes including a memory leak that was introduced in 2.0alpha5 (by newer Cython versions). lxml now requires the not-yet-released Cython 0.9.6.10 to build, which will hopefully contain the bug fix. One thing to improve for 2.0 is the documentation (as usual), especially the generated API docs: http://codespeak.net/lxml/dev/api/index.html Any help is appreciated, as are bug reports and criticism. 
Have fun, Stefan 2.0alpha6 (2007-12-19) ====================== Features added -------------- * New properties ``position`` and ``code`` on ParseError exception (as in ET 1.3) Bugs fixed ---------- * Memory leak in the ``parse()`` function. * Minor bugs in XSLT error message formatting. * Result document memory leak in target parser. Other changes ------------- * Various places in the XPath, XSLT and iteration APIs now require keyword-only arguments. * The argument order in ``element.itersiblings()`` was changed to match the order used in all other iteration methods. The second argument ('preceding') is now a keyword-only argument. * The ``getiterator()`` method on Elements and ElementTrees was reverted to return an iterator as it did in lxml 1.x. The ET API specification allows it to return either a sequence or an iterator, and it traditionally returned a sequence in ET and an iterator in lxml. However, it is now deprecated in favour of the ``iter()`` method, which should be used in new code wherever possible. * The 'pretty printed' serialisation of ElementTree objects now inserts newlines at the root level between processing instructions, comments and the root tag. * A 'pretty printed' serialisation is now terminated with a newline. * Second argument to ``lxml.etree.Extension()`` helper is no longer required, third argument is now a keyword-only argument ``ns``. * ``lxml.html.tostring`` takes an ``encoding`` argument. From cardoso.pb at om.asahi-kasei.co.jp Thu Dec 20 05:10:47 2007 From: cardoso.pb at om.asahi-kasei.co.jp (Pedro Cardoso) Date: Thu, 20 Dec 2007 13:10:47 +0900 Subject: [lxml-dev] Python crash with lxml use with multi-thread In-Reply-To: <47690281.6050100@behnel.de> References: <4768FE04.2060105@om.asahi-kasei.co.jp> <47690281.6050100@behnel.de> Message-ID: <4769EB47.1000000@om.asahi-kasei.co.jp> Stefan Behnel wrote: > Pedro Cardoso wrote: > >> I am using lxml (just updated to version 1.3.6) on python 2.5, on a >> windows XP machine. 
>> >> I have a wrapper class for an XML file, and worked fine until I used it >> on several threads. At that moment it will crash. >> >> The piece of code I have it : >> projectNode = self.dom.xpath('//Project[@Code="%s"]' >> %projectCode)[0] >> logger.debug("here") >> previewsTask = self.dom.xpath('//Task[@Code="%s"]' % >> taskData.xml_Code)[0] >> logger.debug("here") >> projectNode.replace(previewsTask, taskData.dom) >> #projectNode.remove(previewsTask) >> #projectNode.append(taskData.dom) >> logger.debug("here") >> self.save() >> logger.debug("here") >> return None >> The loggin is to know the last line to be called . >> >> The last log msg is written, but the one right after the call to this >> function is not, that makes me think the crash is on the return. There >> should be no problem of thread safety, since I am using locks. >> >> The crash seems to be on the return, and The calling sequence is : >> thread 1 : append a node to xml and save >> thread 2 : replaces node and save (the piece of code above) >> At this moment it crash python ! >> >> Any idea of the problem ? >> > > Have you read the FAQ section on threading? > > http://codespeak.net/lxml/dev/FAQ.html#id1 > > Stefan > > > Ok, I feel stupid now ! Problem fixed. Thanks. Pedro From dfedoruk at gmail.com Wed Dec 26 19:01:01 2007 From: dfedoruk at gmail.com (Dmitri Fedoruk) Date: Wed, 26 Dec 2007 21:01:01 +0300 Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64 Message-ID: Greetings! First of all, happy holidays. Looks like it's not the perfect time to report a problem, but I can't find any solution by myself. More than that, I'm not sure this is a lxml problem. But I really hope for somebody to take a look at this long message :) My application is a web service written in Python. It is running inside of mod_python handled by Apache2 (precise versions follow). Apache runs in prefork mode. I use xml to collect data from sources. 
I use xslt for their processing and, eventually, forming the html output. I use lxml library for this. Xslt transformations are compiled once for every instance of apache fork and then they run in their own thread independently. Up to one moment everything went smoothly and painlessly. Then I decided to upgrade all the software and came to the following situation. When I try to serialize an etree object ... transform = etree.XSLT(xslt_doc) result_tree = transform(data, **variables ) return etree.tostring(result_tree, 'utf-8') ( or return str(result_tree) ) I have a core dump of apache. The point is that it happens during serialization. I tried to omit 'utf-8', I tried to use str() - did not help. What is strange - this error occurs with only several xsl templates, and not all the time. What is even more strange: this happens only on amd64 machines _within_ apache. On my i386 desktop I can not reproduce this bug. Also I can not reproduce this from the console. Now the configuration details. I tried almost all possible combinations of this software. FreeBSD 6.2-20070330-SNAP Apache/2.0.61, Apache/2.2.6 mod_python-3.3.1 lxml 1.3.5, 1.3.6 libxml2-2.6.27, libxml2-2.6.30 libxslt-1.1.20, libxslt-1.1.22 I have to mention that I do also use some modules written in C that have pythonic binding. We suggested that they could be the reason of memory corruption. We run the application without their usage and still got the same core dump, so they are not the reason. 
The error messages are httpd in free(): error: chunk is already free httpd in free(): error: modified (chunk-) pointer httpd in free(): error: pointer to wrong page And the backtrace: (gdb) bt #0 0x00000008013824bc in kill () from /lib/libc.so.6 #1 0x000000080138134d in abort () from /lib/libc.so.6 #2 0x000000080131a265 in _UTF8_init () from /lib/libc.so.6 #3 0x000000080131a29c in _UTF8_init () from /lib/libc.so.6 #4 0x000000080131b23d in _UTF8_init () from /lib/libc.so.6 #5 0x00000008055dc3f9 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #6 0x00000008055dc29d in xmlFreeProp () from /usr/local/lib/libxml2.so.5 #7 0x00000008055dc2dc in xmlFreePropList () from /usr/local/lib/libxml2.so.5 #8 0x00000008055dc4bb in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #9 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #10 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #11 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #12 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #13 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #14 0x00000008055dcad5 in xmlFreeDoc () from /usr/local/lib/libxml2.so.5 #15 0x00000008051a8399 in __pyx_tp_dealloc_5etree__Document () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so #16 0x00000008051be15b in __pyx_tp_dealloc_5etree__Element () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so #17 0x00000008051a9b8f in __pyx_tp_dealloc_5etree__ElementTree () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so #18 0x00000008036cbb4b in _PyFloat_Unpack8 () from /usr/local/lib/libpython2.5.so #19 0x0000000803723c03 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so #20 0x00000008037229eb in PyEval_EvalFrameEx () from 
/usr/local/lib/libpython2.5.so #21 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so #22 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #23 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #24 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #25 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #26 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #27 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #28 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #29 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so #30 0x00000008036cd8ae in PyFunction_SetClosure () from /usr/local/lib/libpython2.5.so #31 0x00000008036b3c73 in PyObject_Call () from /usr/local/lib/libpython2.5.so #32 0x0000000803721262 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #33 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so #34 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #35 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #36 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #37 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #38 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so #39 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so #40 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so #41 0x00000008036cd8ae in PyFunction_SetClosure () from /usr/local/lib/libpython2.5.so #42 0x00000008036b3c73 in PyObject_Call () from /usr/local/lib/libpython2.5.so #43 0x00000008036bbd64 in PyMethod_New () from /usr/local/lib/libpython2.5.so #44 0x00000008036b3c73 in PyObject_Call () from 
/usr/local/lib/libpython2.5.so #45 0x00000008036b3d09 in PyObject_Call () from /usr/local/lib/libpython2.5.so #46 0x00000008036b4024 in PyObject_CallMethod () from /usr/local/lib/libpython2.5.so #47 0x000000080356acf2 in python_handler () from /usr/local/libexec/apache2/mod_python.so #48 0x0000000000425e4a in ap_run_handler (r=0x190b090) at config.c:152 #49 0x0000000000426755 in ap_invoke_handler (r=0x190b090) at config.c:364 #50 0x0000000000422550 in ap_process_request (r=0x190b090) at http_request.c:249 #51 0x000000000041bbd7 in ap_process_http_connection (c=0x77f1b0) at http_core.c:253 #52 0x0000000000433c1a in ap_run_process_connection (c=0x77f1b0) at connection.c:43 #53 0x0000000000434075 in ap_process_connection (c=0x77f1b0, csd=0x77f090) at connection.c:176 #54 0x000000000042437f in child_main (child_num_arg=3) at prefork.c:610 #55 0x000000000042450b in make_child (s=0x5a94f8, slot=3) at prefork.c:704 #56 0x00000000004247b0 in perform_idle_server_maintenance (p=0x578028) at prefork.c:839 #57 0x0000000000424c37 in ap_mpm_run (_pconf=0x578028, plog=0x5a4028, s=0x5a94f8) at prefork.c:1040 #58 0x000000000042d27e in main (argc=1, argv=0x7fffffffea08) at main.c:656 The irony of it all is that I still can not downgrade all this packages to the version that did not have core dump... I've spent a week already trying to figure out the gist of the problem, combining different lxml, libxml2\xslt and apache versions. I start with this mailing list as the fatal call is made with the lxml function. I really hope there is an explanation and solution for this. I do not want to switch to another xslt processor or rebuild the architecture of the whole service in the worst case. I'd be grateful for any ideas and suggestions. 
Cheers, Dmitri From stefan_ml at behnel.de Wed Dec 26 20:47:10 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 26 Dec 2007 20:47:10 +0100 Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64 In-Reply-To: References: Message-ID: <4772AFBE.8020801@behnel.de> Hi, Dmitri Fedoruk wrote: > First of all, happy holidays. Looks like it's not the perfect time to > report a problem, but I can't find any solution by myself. More than > that, I'm not sure this is a lxml problem. But I really hope for > somebody to take a look at this long message :) > ... > transform = etree.XSLT(xslt_doc) > result_tree = transform(data, **variables ) > return etree.tostring(result_tree, 'utf-8') > ( or return str(result_tree) ) > > FreeBSD 6.2-20070330-SNAP > Apache/2.0.61, Apache/2.2.6 > mod_python-3.3.1 > lxml 1.3.5, 1.3.6 > libxml2-2.6.27, libxml2-2.6.30 > libxslt-1.1.20, libxslt-1.1.22 > > (gdb) bt > #0 0x00000008013824bc in kill () from /lib/libc.so.6 > #1 0x000000080138134d in abort () from /lib/libc.so.6 > #2 0x000000080131a265 in _UTF8_init () from /lib/libc.so.6 > #3 0x000000080131a29c in _UTF8_init () from /lib/libc.so.6 > #4 0x000000080131b23d in _UTF8_init () from /lib/libc.so.6 > #5 0x00000008055dc3f9 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #6 0x00000008055dc29d in xmlFreeProp () from /usr/local/lib/libxml2.so.5 > #7 0x00000008055dc2dc in xmlFreePropList () from /usr/local/lib/libxml2.so.5 > #8 0x00000008055dc4bb in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #9 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #10 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #11 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #12 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #13 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 > #14 0x00000008055dcad5 in xmlFreeDoc () from 
/usr/local/lib/libxml2.so.5
> #15 0x00000008051a8399 in __pyx_tp_dealloc_5etree__Document ()
> from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
> #16 0x00000008051be15b in __pyx_tp_dealloc_5etree__Element ()
> from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
> #17 0x00000008051a9b8f in __pyx_tp_dealloc_5etree__ElementTree ()
> from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so

Hmm, this looks like a deallocation problem - and there shouldn't be any left in lxml 1.3.6...

Also, the usage you describe sounds perfectly reasonable and shouldn't lead to any problems.

For a quick shot: could you try switching to lxml 2.0alpha6 to see if the problem persists?

Stefan

From dfedoruk at gmail.com Thu Dec 27 00:20:42 2007
From: dfedoruk at gmail.com (Dmitri Fedoruk)
Date: Thu, 27 Dec 2007 02:20:42 +0300
Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64
In-Reply-To: <4772AFBE.8020801@behnel.de>
References: <4772AFBE.8020801@behnel.de>
Message-ID:

Hello,

Thank you for your reply!

> For a quick shot: could you try switching to lxml 2.0alpha6 to see if the
> problem persists?

Hm, switching turned out not to be a piece of cake - in fact, some transformations do not run at all.

1) There is a template that was processed with lxml 1.3.x with no problem, but it cannot be processed in 2.0: "Entity 'mdash' not defined, line 67, column 43". All entities are defined in an included file. There are two similar templates that are processed with no problem in both 1.3.x and 2.0.

2) I get something like this:
function takes at most 1 positional arguments (2 given)
I'll try to figure it out in the morning.

I had hoped that downgrading to lxml 1.3.3[4] could solve my problem - it did not, at least with libxml2 2.6.30. I'll try with another libxml2 version tomorrow.
Dmitri From stefan_ml at behnel.de Thu Dec 27 09:43:40 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 27 Dec 2007 09:43:40 +0100 Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64 In-Reply-To: References: <4772AFBE.8020801@behnel.de> Message-ID: <477365BC.1070804@behnel.de> Hi, Dmitri Fedoruk wrote: >> For a quick shot: could you try switching to lxml 2.0alpha6 to see if the >> problem persists? > Hm, switching turned out not to be a pice of cake - in fact, some > transformations do not run at all. > 1) There is a template that was processed with lxml 1.3.x with no > problem, but it can not be processed in 2.0: "Entity 'mdash' not > defined, line 67, column 43". All entities are defined in an included > file. There are two similar templates that are processed with no > problem both in 1.3.x and 2.0 There were changes regarding entities in 2.0. They are now supported as real Element classes rather than requiring the parser to resolve them (which could cause problems in 1.3 if they were not resolved). This change is up to the parser configuration and therefore shouldn't normally touch old code. But as I do not know what exactly you are doing, I can't guess what the impact is in your case. > 2) I get something like this: > function takes at most 1 positional arguments (2 given) This comes from one of the API changes for requiring keyword-only arguments for optional parameters. You should be able to use keyword arguments here in both 2.0 and 1.3. 
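
For illustration, a minimal sketch of the keyword form of the serialisation call discussed in this thread. It assumes Python 3 and a current lxml; the fallback import to the standard library's ElementTree and the sample document are assumptions made only so that the snippet runs on its own.

```python
try:
    from lxml import etree  # preferred, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same call shape

# Made-up sample document, only used to have something to serialise.
root = etree.fromstring("<root><child>text</child></root>")

# Pass the encoding by keyword instead of as a second positional argument.
# Per this thread, lxml 2.0 answers tostring(root, 'utf-8') with
# "function takes at most 1 positional arguments (2 given)".
serialized = etree.tostring(root, encoding="utf-8")

print(serialized)
```

The keyword form is the one reported in this thread to be accepted by both the 1.3 and the 2.0 series, so it can be adopted before upgrading.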
Stefan

From jholg at gmx.de Thu Dec 27 09:48:33 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 27 Dec 2007 09:48:33 +0100
Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64
In-Reply-To:
References: <4772AFBE.8020801@behnel.de>
Message-ID: <20071227084833.7690@gmx.net>

Hi,

> 2) I get something like this:
> function takes at most 1 positional arguments (2 given)

The 2.0 series moved to keyword-only arguments in some signatures at some point during its alpha phase, so it might complain when you use certain positional args in legacy code. This should be fixed rather easily by using keyword arguments instead.

Cheers,
Holger

From dfedoruk at gmail.com Thu Dec 27 17:21:27 2007
From: dfedoruk at gmail.com (Dmitri Fedoruk)
Date: Thu, 27 Dec 2007 19:21:27 +0300
Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64
In-Reply-To: <20071227084833.7690@gmx.net>
References: <4772AFBE.8020801@behnel.de> <20071227084833.7690@gmx.net>
Message-ID:

Hello once again,

I've upgraded my code to be lxml 2.0-compatible.

> Entity 'hellip' not defined

Parsing of the incoming data fails when I have HTML entities in it.

Literally I have this code:

    xmlParser = etree.XMLParser( no_network = False, resolve_entities = False )
    storedDoc = etree.parse( StringIO.StringIO(reply['data']), xmlParser )

I tried setting resolve_entities = True, which did not help either. The point is that all entities are defined in the files included in the DTD file, and I do not want to validate the data at runtime - I have strict time limitations.

It worked fine with 1.3.x without a special parser, just with

    storedDoc = etree.parse( StringIO.StringIO(reply['data']) )

So, is there any chance to deal with entities in my incoming data without validating?
> function takes at most 1 positional arguments (2 given) That was the very string that leads to problems, I had to add the 'encoding' keyword. return etree.tostring(result_tree, encoding = 'utf-8') Nevertheless the upgrade did not help. (gdb) bt #0 0x00000008011464bc in kill () from /lib/libc.so.6 #1 0x0000000800f5261e in raise () from /lib/libpthread.so.2 #2 0x000000080114534d in abort () from /lib/libc.so.6 #3 0x00000008010de265 in _UTF8_init () from /lib/libc.so.6 #4 0x00000008010de29c in _UTF8_init () from /lib/libc.so.6 #5 0x00000008010df23d in _UTF8_init () from /lib/libc.so.6 #6 0x00000008069d7a19 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #7 0x00000008069d78bd in xmlFreeProp () from /usr/local/lib/libxml2.so.5 #8 0x00000008069d78fc in xmlFreePropList () from /usr/local/lib/libxml2.so.5 #9 0x00000008069d7adb in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #10 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #11 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #12 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #13 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #14 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #15 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #16 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #17 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #18 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5 #19 0x00000008069d80f5 in xmlFreeDoc () from /usr/local/lib/libxml2.so.5 #20 0x000000080656d589 in __pyx_tp_dealloc_4lxml_5etree__Document () from /usr/local/lib/python2.5/site-packages/lxml-2.0alpha6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so #21 0x000000080658f48b in __pyx_tp_dealloc_4lxml_5etree__Element () from 
/usr/local/lib/python2.5/site-packages/lxml-2.0alpha6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so #22 0x000000080656eacf in __pyx_tp_dealloc_4lxml_5etree__ElementTree () from /usr/local/lib/python2.5/site-packages/lxml-2.0alpha6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so #23 0x0000000804bb7b4b in _PyFloat_Unpack8 () from /usr/local/lib/libpython2.5.so As I have already said, this happens with only several given stylesheets. May this be the data\stylesheet problem? Cheers, Dmitri From peter at cornerstonenw.com Thu Dec 27 20:29:27 2007 From: peter at cornerstonenw.com (Peter Rust) Date: Thu, 27 Dec 2007 11:29:27 -0800 Subject: [lxml-dev] lxml homepage broken Message-ID: <003e01c848be$cc461430$64d23c90$@com> FYI: The "default document" of the lxml site appears to be broken. I can get to the page directly (http://codespeak.net/lxml/index.html), but the codespeak homepage link (and most external links) point to http://codespeak.net/lxml, which doesn't work. Peter Rust Developer, Cornerstone Systems PS: I started using lxml.etree yesterday as a drop-in replacement for elementtree and have been very pleased with the way it handles comments and doctypes, thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20071227/360ef2a5/attachment.htm From jlovell at esd189.org Thu Dec 27 20:36:54 2007 From: jlovell at esd189.org (John Lovell) Date: Thu, 27 Dec 2007 11:36:54 -0800 Subject: [lxml-dev] lxml homepage broken In-Reply-To: <003e01c848be$cc461430$64d23c90$@com> References: <003e01c848be$cc461430$64d23c90$@com> Message-ID: <3A49C88789256B4AB33AC603DB6AF49B08E14B@ZIRIA.esd189.org> Everyone: I have noticed that it works correctly with IE but not FireFox. It has been this way for quite a while. John W. 
Lovell Web Applications Engineer Northwest Educational Service District 1601 R Avenue Anacortes, WA 98221 (360) 299-4086 jlovell at esd189.org www.esd189.org Together We Can ... ________________________________ From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Peter Rust Sent: Thursday, December 27, 2007 11:29 AM To: lxml-dev at codespeak.net Subject: [lxml-dev] lxml homepage broken FYI: The "default document" of the lxml site appears to be broken. I can get to the page directly (http://codespeak.net/lxml/index.html), but the codespeak homepage link (and most external links) point to http://codespeak.net/lxml, which doesn't work. Peter Rust Developer, Cornerstone Systems PS: I started using lxml.etree yesterday as a drop-in replacement for elementtree and have been very pleased with the way it handles comments and doctypes, thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20071227/01d7da89/attachment-0001.htm From stefan_ml at behnel.de Fri Dec 28 09:28:55 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 28 Dec 2007 09:28:55 +0100 Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64 In-Reply-To: References: <4772AFBE.8020801@behnel.de> <20071227084833.7690@gmx.net> Message-ID: <4774B3C7.4030602@behnel.de> Hi, Dmitri Fedoruk wrote: > I've upgdaded my code to be lxml2.0-compatible. Cool. Hope it wasn't too hard. >> Entity 'hellip' not defined > Parsing of the incoming data fails when I have html entities in it. > > Literally I have this code: > > xmlParser = etree.XMLParser( no_network = False, resolve_entities = False ) > storedDoc = etree.parse( StringIO.StringIO(reply['data']), xmlParser ) > > I tried to turn resolve_entities = True, did not help either. 
The > point is that all entities are defined in the files included in the > DTD file, and I do not want to validate the data in the runtime - I > have strict time limitations. You can load the DTD without triggering validation by passing "load_dtd = True". I never tested the performance impact, though. The XML parser needs to read the DTD to learn about the entities (that's how it works). If you are dealing with HTML, you can also try the HTMLParser() - it's not only good for fixing HTML, it also knows a lot of HTML specifics. > As I have already said, this happens with only several given > stylesheets. May this be the data\stylesheet problem? Not sure what you mean here. Can you figure out what is different in the stylesheets that fail? Something like "only they call document() to read from other XML files" or "only they (or all of them) use stylesheet-local data" or "they were created at a different place in the code". As a quick fix, did you try changing the mod_python config as proposed in the FAQ? http://codespeak.net/lxml/dev/FAQ.html#my-program-crashes-when-run-with-mod-python-pyro-zope-plone Again, no idea about the performance impact here. Stefan From stefan_ml at behnel.de Fri Dec 28 09:37:19 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 28 Dec 2007 09:37:19 +0100 Subject: [lxml-dev] lxml homepage broken In-Reply-To: <3A49C88789256B4AB33AC603DB6AF49B08E14B@ZIRIA.esd189.org> References: <003e01c848be$cc461430$64d23c90$@com> <3A49C88789256B4AB33AC603DB6AF49B08E14B@ZIRIA.esd189.org> Message-ID: <4774B5BF.5000409@behnel.de> [top-posting fixed] > Peter Rust wrote: >> FYI: The "default document" of the lxml site appears to be broken. I can >> get to the page directly (http://codespeak.net/lxml/index.html), but the >> codespeak homepage link (and most external links) point to >> http://codespeak.net/lxml, which doesn't work. John Lovell wrote: > I have noticed that it works correctly with IE but not FireFox. 
Didn't try IE for a while, but Firefox works nicely for me. It redirects me to http://codespeak.net/lxml/ though (mind the slash at the end). Is that any different from your experience? > It has been this way for quite a while. We've had a couple of reports once in a while that there were problems with accessing the front page. I could never reproduce any of them. Philipp, could you please check if there's anything wrong with the web-server configuration? Thanks, Stefan From jlovell at esd189.org Fri Dec 28 17:18:27 2007 From: jlovell at esd189.org (John Lovell) Date: Fri, 28 Dec 2007 08:18:27 -0800 Subject: [lxml-dev] lxml homepage broken In-Reply-To: <4774B5BF.5000409@behnel.de> References: <003e01c848be$cc461430$64d23c90$@com> <3A49C88789256B4AB33AC603DB6AF49B08E14B@ZIRIA.esd189.org> <4774B5BF.5000409@behnel.de> Message-ID: <3A49C88789256B4AB33AC603DB6AF49B08E14F@ZIRIA.esd189.org> Weird, it is now working in both browsers. I did double check this before posting yesterday. When I started working with lxml in early September this drove me nuts. Sincerely, John W. Lovell Web Applications Engineer Northwest Educational Service District 1601 R Avenue Anacortes, WA 98221 (360) 299-4086 jlovell at esd189.org www.esd189.org Together We Can ... -----Original Message----- From: Stefan Behnel [mailto:stefan_ml at behnel.de] Sent: Friday, December 28, 2007 12:37 AM To: John Lovell; Philipp von Weitershausen Cc: lxml-dev at codespeak.net Subject: Re: [lxml-dev] lxml homepage broken [top-posting fixed] > Peter Rust wrote: >> FYI: The "default document" of the lxml site appears to be broken. I >> can get to the page directly (http://codespeak.net/lxml/index.html), >> but the codespeak homepage link (and most external links) point to >> http://codespeak.net/lxml, which doesn't work. John Lovell wrote: > I have noticed that it works correctly with IE but not FireFox. Didn't try IE for a while, but Firefox works nicely for me. 
It redirects me to http://codespeak.net/lxml/ though (mind the slash at the end). Is that any different from your experience?

> It has been this way for quite a while.

We've had a couple of reports once in a while that there were problems with accessing the front page. I could never reproduce any of them.

Philipp, could you please check if there's anything wrong with the web-server configuration?

Thanks,
Stefan

From dfedoruk at gmail.com Sat Dec 29 17:48:36 2007
From: dfedoruk at gmail.com (Dmitri Fedoruk)
Date: Sat, 29 Dec 2007 19:48:36 +0300
Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64
In-Reply-To: <4772AFBE.8020801@behnel.de>
References: <4772AFBE.8020801@behnel.de>
Message-ID:

Hi once again,

So, just to be more precise - it is truly a deallocation problem of libxml2 inside of Apache.

Here is the code with debugging traces:

    ...
    result = ''
    try:
        result_tree = transform(data, **variables )
        logging.debug('try')
        if isText:
            result = str(result_tree)
        else:
            result = etree.tostring(result_tree, encoding = 'utf-8')
        logging.debug('passed')
    except Exception, exc:
        inLogger.error( exc.__str__() )
        inLogger.error( "xslt error" )
        return ""
    logging.debug('fake object')
    result_tree = '1' # as well as None, etree.Element(), etc
    logging.debug('exiting')
    return result

Here is the log of the normal run:

    Sat, 29 Dec 2007 19:42:38 DEBUG try
    Sat, 29 Dec 2007 19:42:38 DEBUG passed
    Sat, 29 Dec 2007 19:42:38 DEBUG fake object
    Sat, 29 Dec 2007 19:42:38 DEBUG exiting

and here is the log of the crashing call:

    Sat, 29 Dec 2007 19:42:37 DEBUG try
    Sat, 29 Dec 2007 19:42:37 DEBUG passed
    Sat, 29 Dec 2007 19:42:37 DEBUG fake object
    httpd in free(): error: modified (chunk-) pointer

So, it happens when I try to replace my result_tree value.

Is it worth reporting this crash to the libxml2 / Apache mailing lists, what would you say?
Cheers, Dmitri From stefan_ml at behnel.de Sat Dec 29 19:51:17 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Dec 2007 19:51:17 +0100 Subject: [lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64 In-Reply-To: References: <4772AFBE.8020801@behnel.de> Message-ID: <47769725.9050802@behnel.de> Hi, Dmitri Fedoruk wrote: > So, just to me more precise - iit is truly a deallocation problem of > libxml2 inside of Apache. [example code stripped] > Is it worth of reporting this crash to libxml2 / apache mailing lists, > what would you say? I'm sure it's not a problem in libxml2. Since I do not have enough information, I do not know if the following explanation fits here, but I'll give it anyway. The way XSLT is implemented in lxml is a bit tricky, as libxslt makes some things hard to control that lxml uses in libxml2 for performance reasons. In particular, lxml uses a thread-local hash table for constant strings, which is much faster than a malloc() for each string that occurs in a document. However, libxslt doesn't honour this dictionary and creates its own one based on the stylesheet dictionary. The result is that the stylesheet can leak into the result document through string references that now point into the hash table of the stylesheet. There isn't a way in libxslt that would allow us to prevent this or to control the allocation. That's why I decided to restrict the execution of XSL transformations to threads that inherit the same hash table as the stylesheet, this should normally prevent any problems. As I said, this might or might not be the source of this particular problem. Threading is always hard to get right, so maybe there are constellations where the current restrictions are not enough. So far, I'm not aware of any. Redesigning the way XSLT interacts with threads is not a small change and quite risky, so I'd prefer considering that the last resort... Stefan
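
One possible way to stay inside the restriction described above, sketched under assumptions: compile the stylesheet once per thread and only ever run it in the thread that compiled it. `compile_stylesheet`, `get_transform` and the file name `style.xsl` are hypothetical names introduced for this sketch; in real code the stand-in would be something like `etree.XSLT(etree.parse(path))`. The snippet therefore only demonstrates the per-thread caching pattern, not lxml itself, and it is not confirmed to address the crash reported in this thread.

```python
import threading

_local = threading.local()  # each thread sees its own attributes on this object


def compile_stylesheet(path):
    # Hypothetical stand-in for etree.XSLT(etree.parse(path)). It records the
    # name of the compiling thread so the pattern can be checked without lxml.
    owner = threading.current_thread().name
    return lambda doc: (owner, path, doc)


def get_transform(path):
    # Return the stylesheet compiled by the *calling* thread, compiling it on
    # first use, so a transform never runs outside the thread that built it.
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if path not in cache:
        cache[path] = compile_stylesheet(path)
    return cache[path]


def worker(results, index):
    transform = get_transform("style.xsl")
    owner, _, _ = transform("<doc/>")
    # True if the transform was compiled by the thread that is running it.
    results[index] = (owner == threading.current_thread().name)


results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

The cost of this design is one compilation per thread instead of one per process, traded against never sharing a compiled stylesheet across threads.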