From ianb at colorstudy.com Fri Feb 1 02:50:52 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 31 Jan 2008 19:50:52 -0600
Subject: [lxml-dev] Compile error with trunk
Message-ID: <47A27AFC.7000500@colorstudy.com>

I tried rebuilding the trunk, installing cython and deleting the .c
files I had around. I got this:

~/src/lxml$ python setup.py develop
Building with Cython 0.9.6.11.
Building lxml version 2.0.beta2-51162.
running develop
running egg_info
writing src/lxml.egg-info/PKG-INFO
writing top-level names to src/lxml.egg-info/top_level.txt
writing dependency_links to src/lxml.egg-info/dependency_links.txt
reading manifest template 'MANIFEST.in'
warning: no files found matching 'lxml.objectify.c' under directory 'src/lxml'
warning: no files found matching 'lxml.pyclasslookup.c' under directory 'src/lxml'
warning: no files found matching '*.html' under directory 'doc'
warning: no previously-included files found matching 'src/lxml/etree.pxi'
writing manifest file 'src/lxml.egg-info/SOURCES.txt'
running build_ext
building 'lxml.etree' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/libxml2 -I/usr/include/python2.5 -c src/lxml/lxml.etree.c -o build/temp.linux-i686-2.5/src/lxml/lxml.etree.o -w
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__BaseParser’:
src/lxml/lxml.etree.c:91711: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:91714: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:91717: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:91720: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:91723: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__Document’:
src/lxml/lxml.etree.c:91930: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_DocInfo’:
src/lxml/lxml.etree.c:92086: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__Element’:
src/lxml/lxml.etree.c:92310: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ElementTree’:
src/lxml/lxml.etree.c:93275: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:93278: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__Attrib’:
src/lxml/lxml.etree.c:93467: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__AttribIterator’:
src/lxml/lxml.etree.c:93653: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ElementIterator’:
src/lxml/lxml.etree.c:93973: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_ElementDepthFirstIterator’:
src/lxml/lxml.etree.c:94539: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:94542: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_ElementTextIterator’:
src/lxml/lxml.etree.c:94708: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__BaseErrorLog’:
src/lxml/lxml.etree.c:95110: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_FallbackElementClassLookup’:
src/lxml/lxml.etree.c:96712: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ResolverRegistry’:
src/lxml/lxml.etree.c:98662: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ResolverContext’:
src/lxml/lxml.etree.c:98832: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:98835: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ParserDictionaryContext’:
src/lxml/lxml.etree.c:99002: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__FileReaderContext’:
src/lxml/lxml.etree.c:99192: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ParserContext’:
src/lxml/lxml.etree.c:99366: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:99369: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__SaxParserContext’:
src/lxml/lxml.etree.c:99529: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ParserSchemaValidationContext’:
src/lxml/lxml.etree.c:99856: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__Validator’:
src/lxml/lxml.etree.c:100012: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_TreeBuilder’:
src/lxml/lxml.etree.c:100641: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:100656: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__FilelikeWriter’:
src/lxml/lxml.etree.c:101436: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:101439: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__IterparseContext’:
src/lxml/lxml.etree.c:101634: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:101637: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__IDDict’:
src/lxml/lxml.etree.c:102200: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_XInclude’:
src/lxml/lxml.etree.c:102381: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__BaseContext’:
src/lxml/lxml.etree.c:102750: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:102771: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:102774: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__ElementUnicodeResult’:
src/lxml/lxml.etree.c:102965: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__XPathEvaluatorBase’:
src/lxml/lxml.etree.c:103314: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:103317: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_XPathElementEvaluator’:
src/lxml/lxml.etree.c:103487: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__XSLTResolverContext’:
src/lxml/lxml.etree.c:104093: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree_XSLT’:
src/lxml/lxml.etree.c:104554: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:104557: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:104560: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:104563: error: invalid lvalue in assignment
src/lxml/lxml.etree.c: In function ‘__pyx_tp_clear_4lxml_5etree__XSLTResultTree’:
src/lxml/lxml.etree.c:104742: error: invalid lvalue in assignment
src/lxml/lxml.etree.c:104745: error: invalid lvalue in assignment
error: command 'gcc' failed with exit status 1

From stefan_ml at behnel.de Fri Feb 1 07:12:56 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 07:12:56 +0100
Subject: [lxml-dev] Compile error with trunk
In-Reply-To: <47A27AFC.7000500@colorstudy.com>
References: <47A27AFC.7000500@colorstudy.com>
Message-ID: <47A2B868.6070202@behnel.de>

Hi Ian,

Ian Bicking wrote:
> I tried rebuilding the trunk, installing cython and deleting the .c
> files I had around.

Note that this now requires a "make realclean", "make clean" will keep the .c
files (mostly to keep people from breaking things by accident if they build
without Cython).

> I got this:
>
> ~/src/lxml$ python setup.py develop
> Building with Cython 0.9.6.11.
[...]
> building 'lxml.etree' extension
> gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O2 -Wall
> -Wstrict-prototypes -fPIC -I/usr/include/libxml2
> -I/usr/include/python2.5 -c src/lxml/lxml.etree.c -o
> build/temp.linux-i686-2.5/src/lxml/lxml.etree.o -w
> src/lxml/lxml.etree.c: In function
> ‘__pyx_tp_clear_4lxml_5etree__BaseParser’:
> src/lxml/lxml.etree.c:91711: error: invalid lvalue in assignment
> src/lxml/lxml.etree.c:91714: error: invalid lvalue in assignment
> src/lxml/lxml.etree.c:91717: error: invalid lvalue in assignment
> src/lxml/lxml.etree.c:91720: error: invalid lvalue in assignment
[...]

There is a bugfix release called Cython 0.9.6.11b. That should fix it.

BTW, I wasn't able to fix all lxml.html test cases, including the 'HTML
namespace' test which is now disabled. Please take a look at it once you've
fixed your setup. I also removed the element from the "html.clean" tests as it
seems to be problematic. Maybe there's a way to work around that?

Thanks,
Stefan

From cz at gocept.com Fri Feb 1 09:58:23 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Fri, 1 Feb 2008 09:58:23 +0100
Subject: [lxml-dev] requesting lxml testimonials?
References: <200801311551.48243.srichter@cosmos.phy.tufts.edu>
Message-ID:

On 2008-01-31 21:51:48 +0100, Stephan Richter said:
>
> lxml takes all the pain out of XML.

I couldn't agree more.

--
Christian Zagrodnick

gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From jholg at gmx.de Fri Feb 1 10:03:42 2008
From: jholg at gmx.de (jholg at gmx.de)
Date: Fri, 01 Feb 2008 10:03:42 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <47A1F7E5.1000203@behnel.de>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de>
Message-ID: <20080201090342.15000@gmx.net>

> thanks for the report.

I'm actually pretty late... sorry for that. Maybe I should set up something
like a nightly or weekly checkout/build/test/bench someday.

> Hmm, guess that's something to fix in Cython. The *LongLong() functions
> are a recent addition for safe type conversion.
>
> The line numbers above differ from mine, though. Could you send me the
> source code of the lines that failed here?

110149
110150 static INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
110151     if (x == Py_True) return 1;
110152     else if (x == Py_False) return 0;
110153     else return PyObject_IsTrue(x);
110154 }
110155
110156 static INLINE PY_LONG_LONG __pyx_PyInt_AsLongLong(PyObject* x) {
110157     if (PyInt_CheckExact(x)) {
110158         return PyInt_AS_LONG(x);
110159     }
110160     else if (PyLong_CheckExact(x)) {
110161         return PyLong_AsLongLong(x);
110162     }
110163     else {
110164         PyObject* tmp = PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1;
110165         PY_LONG_LONG val = __pyx_PyInt_AsLongLong(tmp);
110166         Py_DECREF(tmp);
110167         return val;
110168     }
110169 }
110170
110171 static INLINE unsigned PY_LONG_LONG __pyx_PyInt_AsUnsignedLongLong(PyObject* x) {
110172     if (PyInt_CheckExact(x)) {
110173         long val = PyInt_AS_LONG(x);
110174         if (unlikely(val < 0)) {
110175             PyErr_SetString(PyExc_TypeError, "Negative assignment to unsigned type.");
110176             return (unsigned PY_LONG_LONG)-1;
110177         }
110178         return val;
110179     }
110180     else if (PyLong_CheckExact(x)) {
110181         return PyLong_AsUnsignedLongLong(x);
110182     }
110183     else {
110184         PyObject* tmp = PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1;
110185         PY_LONG_LONG val = __pyx_PyInt_AsUnsignedLongLong(tmp);
110186         Py_DECREF(tmp);
110187         return val;
110188     }
110189 }
110190
110191

> It seems to lack catalog support. I thought about adding that test or not.
> Looks like it's better to leave it out.

Is that my libxml2 that's lacking catalog support? Or do these tests try to
access something from the web?

> I wouldn't dare to compare the numbers here, given a difference of 30
> tests (especially not knowing which ones are missing).
> If you get errors, it naturally takes (a bit) longer. Also, it seems to
> run far fewer tests, so I guess you either do not have ElementTree
> installed for the compat tests (though I actually think that's the case
> for both runs), or it just takes longer to search the (non-existing)
> catalogs, or ...
>
> If you want real numbers, you should rather run the benchmarks.

You're right, of course, I'm gonna do that. Got a little carried away as I
could actually see the test count going up ever so slowly compared to my
last build.

Btw I noticed that the file bench.py is missing, which the Makefile tries to
invoke.

Are there any existing tools to compare logs of different benchmark runs?
Guess I'm gonna hack up something rather than check differences manually...

Thanks,
Holger

From l.oluyede at gmail.com Fri Feb 1 10:16:53 2008
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Fri, 1 Feb 2008 10:16:53 +0100
Subject: [lxml-dev] requesting lxml testimonials?
In-Reply-To:
References:
Message-ID: <9eebf5740802010116u12656667l7384a79974b497c8@mail.gmail.com>

I started using lxml heavily for an internal project involving resolving
RELAX NG schemas and validating financial instruments and I'm really,
really happy to have it in my toolbox. The great thing about it is you can
do pretty much anything with an intuitive API. Heavy XPath queries,
transformations, HTML parsing. It's definitely a wonder library.
It really makes XML bearable, as Stephan said :-)

--
Lawrence, stacktrace.it - oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

From stefan_ml at behnel.de Fri Feb 1 10:38:23 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 10:38:23 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <20080201090342.15000@gmx.net>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net>
Message-ID: <47A2E88F.9090500@behnel.de>

Hi,

jholg at gmx.de wrote:
>> thanks for the report.
>
> I'm actually pretty late... sorry for that. Maybe I should set up something
> like a nightly or weekly checkout/build/test/bench someday.

Good idea. :)

> 110165 PY_LONG_LONG val = __pyx_PyInt_AsLongLong(tmp);

Ok, so this line failed with

    src/lxml/lxml.etree.c:110165: parse error before `long'

which lets me assume that PY_LONG_LONG is the usual "long long" on your
machine, which apparently fails in gcc 2.95.

Let me guess: your Python install was not compiled with gcc 2.95, was it?

>> It seems to lack catalog support. I thought about adding that test or
>> not. Looks like it's better to leave it out.
>
> Is that my libxml2 that's lacking catalog support?

Yes, or maybe just the catalog itself. I have the DocBook DTD locally in my
catalog under

    /usr/share/xml/docbook/schema/dtd/

> Or do these tests try to access something from the web?

They shouldn't. Though, maybe, ... Guess it's just best to switch them off. :)

> Btw I noticed that the file bench.py is missing, which the Makefile tries
> to invoke.

Ah, had forgotten about that target. Fixed.

> Are there any existing tools to compare logs of different benchmark runs?
> Guess I'm gonna hack up something rather than check differences manually...

That would be very helpful.
The benchmarks just output a whole bunch of numbers, but I never got around
to making anything more legible from them. Maybe we should have some kind of
an "ETstone" or something that would output a single number for ET
performance. Or maybe one for parsing, one for serialising and one for the
API, or something along those lines.

Stefan

From jholg at gmx.de Fri Feb 1 11:46:04 2008
From: jholg at gmx.de (jholg at gmx.de)
Date: Fri, 01 Feb 2008 11:46:04 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <47A2E88F.9090500@behnel.de>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net> <47A2E88F.9090500@behnel.de>
Message-ID: <20080201104604.14990@gmx.net>

Hi Stefan,

> which lets me assume that PY_LONG_LONG is the usual "long long" on your
> machine, which apparently fails in gcc 2.95.
>
> Let me guess: your Python install was not compiled with gcc 2.95, was it?
>

It was compiled with 2.95:

0 pytaf at adevp02 .../lxml-2.0beta2 $ gcc -dumpversion
2.95.2
0 pytaf at adevp02 .../lxml-2.0beta2 $ /apps/pydev/hjoukl/bin/python2.4 setup.py build
Building with Cython 0.9.6.11b.
Building lxml version 2.0.beta2-51091.
running build
running build_py
writing byte-compilation script '/tmp/tmp61b-gM.py'
/apps/pydev/hjoukl/bin/python2.4 -O /tmp/tmp61b-gM.py
removing /tmp/tmp61b-gM.py
running build_ext
building 'lxml.etree' extension
gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/apps/prod//include -I/apps/prod//include/libxml2 -I/apps/prod/include/libxml2 -I/apps/prod/include -I/apps/pydev/hjoukl/include/python2.4 -c src/lxml/lxml.etree.c -o build/temp.solaris-2.8-sun4u-2.4/src/lxml/lxml.etree.o -w
src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsLongLong':
src/lxml/lxml.etree.c:110165: parse error before `long'
src/lxml/lxml.etree.c:110167: `val' undeclared (first use in this function)
src/lxml/lxml.etree.c:110167: (Each undeclared identifier is reported only once
src/lxml/lxml.etree.c:110167: for each function it appears in.)
src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsUnsignedLongLong':
src/lxml/lxml.etree.c:110185: parse error before `long'
src/lxml/lxml.etree.c:110187: `val' undeclared (first use in this function)
error: command 'gcc' failed with exit status 1
1 pytaf at adevp02 .../lxml-2.0beta2 $
0 pytaf at adevp02 .../lxml-2.0beta2 $
0 pytaf at adevp02 .../lxml-2.0beta2 $
0 pytaf at adevp02 .../lxml-2.0beta2 $
0 pytaf at adevp02 .../lxml-2.0beta2 $
0 pytaf at adevp02 .../lxml-2.0beta2 $
0 pytaf at adevp02 .../lxml-2.0beta2 $/apps/pydev/hjoukl/bin/python2.4
Python 2.4.4 (#1, Mar 6 2007, 11:22:31)
[GCC 2.95.2 19991024 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>>
0 pytaf at adevp02 .../lxml-2.0beta2 $

So I don't *think* it has something to do with my setup, which is the same
one I used with the successful 2.0alpha builds (older cython version then,
of course).

Holger

From stefan_ml at behnel.de Fri Feb 1 12:07:34 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 12:07:34 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <20080201104604.14990@gmx.net>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net> <47A2E88F.9090500@behnel.de> <20080201104604.14990@gmx.net>
Message-ID: <47A2FD76.1080707@behnel.de>

Hi Holger,

jholg at gmx.de wrote:
> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsLongLong':
> src/lxml/lxml.etree.c:110165: parse error before `long'
> src/lxml/lxml.etree.c:110167: `val' undeclared (first use in this function)
> src/lxml/lxml.etree.c:110167: (Each undeclared identifier is reported only once
> src/lxml/lxml.etree.c:110167: for each function it appears in.)
> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsUnsignedLongLong':
> src/lxml/lxml.etree.c:110185: parse error before `long'
> src/lxml/lxml.etree.c:110187: `val' undeclared (first use in this function)
> error: command 'gcc' failed with exit status 1

I forwarded this to the Cython list, let's see what that gives.

Stefan

From gilles.lenfant at gmail.com Fri Feb 1 12:09:35 2008
From: gilles.lenfant at gmail.com (Gilles Lenfant)
Date: Fri, 1 Feb 2008 12:09:35 +0100
Subject: [lxml-dev] requesting lxml testimonials?
In-Reply-To: <200801311551.48243.srichter@cosmos.phy.tufts.edu>
References: <200801311551.48243.srichter@cosmos.phy.tufts.edu>
Message-ID:

On 31 Jan 08, at 21:51, Stephan Richter wrote:

> On Thursday 31 January 2008, Martijn Faassen wrote:
>> else wants to give a testimonial, together with permission for its
>> use?
> [...]
> lxml takes all the pain out of XML.

That's the key point of lxml. THE tool for Python and XML, for newbies
as well as for experienced developers.
> Regards

--
Gilles Lenfant
gilles.lenfant at gmail.com

From stefan_ml at behnel.de Fri Feb 1 14:35:35 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 14:35:35 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <47A2FD76.1080707@behnel.de>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net> <47A2E88F.9090500@behnel.de> <20080201104604.14990@gmx.net> <47A2FD76.1080707@behnel.de>
Message-ID: <47A32027.1000806@behnel.de>

Hi Holger,

Stefan Behnel wrote:
> jholg at gmx.de wrote:
>> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsLongLong':
>> src/lxml/lxml.etree.c:110165: parse error before `long'
>> src/lxml/lxml.etree.c:110167: `val' undeclared (first use in this function)
>> src/lxml/lxml.etree.c:110167: (Each undeclared identifier is reported only once
>> src/lxml/lxml.etree.c:110167: for each function it appears in.)
>> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsUnsignedLongLong':
>> src/lxml/lxml.etree.c:110185: parse error before `long'
>> src/lxml/lxml.etree.c:110187: `val' undeclared (first use in this function)
>> error: command 'gcc' failed with exit status 1
>
> I forwarded this to the Cython list, let's see what that gives.

And it helped! :)

http://comments.gmane.org/gmane.comp.python.cython.devel/588

Here's a fix for Cython.

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc295-fix.patch
Type: text/x-patch
Size: 1182 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080201/9607939d/attachment.bin

From jholg at gmx.de Fri Feb 1 16:06:29 2008
From: jholg at gmx.de (jholg at gmx.de)
Date: Fri, 01 Feb 2008 16:06:29 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <47A32027.1000806@behnel.de>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net> <47A2E88F.9090500@behnel.de> <20080201104604.14990@gmx.net> <47A2FD76.1080707@behnel.de> <47A32027.1000806@behnel.de>
Message-ID: <20080201150629.15020@gmx.net>

> >
> > I forwarded this to the Cython list, let's see what that gives.
>
> And it helped! :)
>
> http://comments.gmane.org/gmane.comp.python.cython.devel/588
>
> Here's a fix for Cython.
>
> Stefan

Thanks very much, I'll try that out. You guys are lightspeed, as ever.
That's another big point for lxml, btw: mailing-list responsiveness by its
maintainer and experienced users.

I tried to do a little bit of performance comparison. This is the stuff
that seems to be >20% slower for me since 2.0alpha:

0 lb54320 at adevp02 .../lxml-2.0beta2 $ /data/pydev/hjoukl/python/pysource/tools/lxml_benchcmp.py /data/tmp/pytaf/benchmarks/etree_2.0alpha.log /data/tmp/pytaf/benchmarks/etree_2.0beta2.log --tolerance 20 --loglevel MUCHSLOWER
lxe: index_slice_neg (--TR T1 ): 0.02000000 <<< 0.14900000 msec/pass (+6.450000) !!!
lxe: index_slice_neg (--TR T4 ): 0.00790000 <<< 0.10700000 msec/pass (+12.544304) !!!
lxe: replace_children (--TC T2 ): 0.27700000 <<< 0.39790000 msec/pass (+0.436462) !!!
lxe: replace_children (--TC T1 ): 0.03290000 <<< 0.04200000 msec/pass (+0.276596) !!!
lxe: index_slice (--TR T3 ): 0.01100000 <<< 0.01410000 msec/pass (+0.281818) !!!
lxe: replace_children (--TC T4 ): 0.03290000 <<< 0.04080000 msec/pass (+0.240122) !!!
0 lb54320 at adevp02 .../lxml-2.0beta2 $

I hacked up a little script to produce this (attached), not tested at all
yet.

I won't be able to check the patch until Monday, unfortunately (unless I
install a 2.95.2 on my Linux box at home, that is ;-)

Holger
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxml_benchcmp.py
Type: application/octet-stream
Size: 6233 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080201/15346b80/attachment.obj

From stefan_ml at behnel.de Fri Feb 1 16:22:49 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 16:22:49 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <20080131154553.289150@gmx.net>
References: <20080131154553.289150@gmx.net>
Message-ID: <47A33949.1010904@behnel.de>

Hi Holger,

jholg at gmx.de wrote:
> Ran 855 tests in 37.860s
>
> Compared to 2.0alpha (I rebuilt that also with gcc 3.4.4):
>
> Ran 824 tests in 2.698s
>
> So basically performance drops by factor >10 for me

Sorry, I had completely forgotten that I switched on garbage collection
between test runs somewhere during the alpha cycle. The test run you are
comparing to does not explicitly run GC after each test to catch
refcounting bugs. That's what makes the difference here, not lxml itself.
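
The effect described here, a full garbage-collection pass after every test, can be illustrated with a minimal sketch. It is not lxml's actual test runner: `run_tests` and `make_cycle` are hypothetical names, and the only assumption is that the runner calls the standard `gc.collect()` once per test.

```python
import gc
import time


def run_tests(tests, collect_after_each=False):
    """Run each callable; optionally force a full GC pass after every test.

    Hypothetical helper for illustration, not part of lxml's test suite.
    Returns (number of forced collections, elapsed seconds).
    """
    collections = 0
    start = time.perf_counter()
    for test in tests:
        test()
        if collect_after_each:
            gc.collect()  # full collection, its cost grows with live objects
            collections += 1
    return collections, time.perf_counter() - start


def make_cycle():
    """Stand-in 'test' that leaves a reference cycle behind."""
    a = []
    b = [a]
    a.append(b)


tests = [make_cycle] * 200
plain_collections, plain_time = run_tests(tests)
gc_collections, gc_time = run_tests(tests, collect_after_each=True)
print("without forced GC: %.4fs, with forced GC: %.4fs" % (plain_time, gc_time))
```

Each forced collection walks every tracked object, so the overhead scales with the number of tests and the size of the process rather than with the code under test, which is why wall-clock times of the two test runs are not comparable.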

Stefan

From stefan_ml at behnel.de Fri Feb 1 16:29:18 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 16:29:18 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <20080201150629.15020@gmx.net>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net> <47A2E88F.9090500@behnel.de> <20080201104604.14990@gmx.net> <47A2FD76.1080707@behnel.de> <47A32027.1000806@behnel.de> <20080201150629.15020@gmx.net>
Message-ID: <47A33ACE.7050109@behnel.de>

Hi Holger,

jholg at gmx.de wrote:
> I tried to do a little bit of performance comp. This is the stuff that
> seems to be >20% slower for me since 2.0alpha:
>
> 0 lb54320 at adevp02 .../lxml-2.0beta2 $ /data/pydev/hjoukl/python/pysource/tools/lxml_benchcmp.py /data/tmp/pytaf/benchmarks/etree_2.0alpha.log /data/tmp/pytaf/benchmarks/etree_2.0beta2.log --tolerance 20 --loglevel MUCHSLOWER
> lxe: index_slice_neg (--TR T1 ): 0.02000000 <<< 0.14900000 msec/pass (+6.450000) !!!
> lxe: index_slice_neg (--TR T4 ): 0.00790000 <<< 0.10700000 msec/pass (+12.544304) !!!
> lxe: replace_children (--TC T2 ): 0.27700000 <<< 0.39790000 msec/pass (+0.436462) !!!
> lxe: replace_children (--TC T1 ): 0.03290000 <<< 0.04200000 msec/pass (+0.276596) !!!
> lxe: index_slice (--TR T3 ): 0.01100000 <<< 0.01410000 msec/pass (+0.281818) !!!
> lxe: replace_children (--TC T4 ): 0.03290000 <<< 0.04080000 msec/pass (+0.240122) !!!
> 0 lb54320 at adevp02 .../lxml-2.0beta2 $

Hmm, interesting. I'll look over that when I find the time. This is not
release critical.

Thanks,
Stefan

From stefan_ml at behnel.de Fri Feb 1 19:25:59 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 19:25:59 +0100
Subject: [lxml-dev] lxml 2.0 is born!
Message-ID: <47A36437.8080907@behnel.de>

Hi everyone,

I'm very happy to announce the official release of lxml 2.0!

http://codespeak.net/lxml/
http://pypi.python.org/pypi/lxml/2.0

This release marks the end of a development effort of more than 6 months,
starting with the release of the last stable series lxml 1.3. The major
differences are explained on this page:

http://codespeak.net/lxml/lxml2.html

lxml 2.0 is not a revolution, it is a gradual move towards a cleaner API
with more things working together as expected. But it nevertheless comes
with a lot of new tools and features that make your XML life easier, and
your HTML life even more so. There are also a couple of minor things that
were deprecated, which will be removed for lxml 2.1. See the above link for
details.

The new release has already adopted a lot of changes from the upcoming
ElementTree 1.3 library, and implements a much broader set of compatible
features, such as the TreeBuilder interface for parser targets.

I appended the complete changelog, but let's start with the most important
things:

[ASCII art: birthday cake]
generously donated by Susie Oviatt

Happy birthday, lxml - and may the force be with you! :)

Have fun,
Stefan

** ChangeLog:

2.0 (2008-02-01)
================

Features added
--------------

* Passing the ``unicode`` type as ``encoding`` to ``tostring()`` will
  serialise to unicode. The ``tounicode()`` function is now officially
  deprecated.

* ``XMLSchema()`` and ``RelaxNG()`` can parse from StringIO.

* ``makeparser()`` function in ``lxml.objectify`` to create a new
  parser with the usual objectify setup.

Bugs fixed
----------

Other changes
-------------

2.0beta2 (2008-01-26)
=====================

Features added
--------------

* Plain ASCII XPath string results are no longer forced into unicode
  objects as in 2.0beta1, but are returned as plain strings as before.

* All XPath string results are 'smart' objects that have a
  ``getparent()`` method to retrieve their parent Element.

* ``with_tail`` option in serialiser functions.

* More accurate exception messages in validator creation.

Bugs fixed
----------

* Missing import in ``lxml.html.clean``.

* Some Python 2.4-isms prevented lxml from building/running under
  Python 2.3.

Other changes
-------------

* Exceptions carry only the part of the error log that is related to
  the operation that caused the error.

* ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source
  file/filename through the ``file`` keyword argument.
* The test suite now skips most doctests under Python 2.3. * ``make clean`` no longer removes the .c files (use ``make realclean`` instead) 2.0beta1 (2008-01-11) ===================== Features added -------------- * Parse-time XML schema validation (``schema`` parser keyword). * XPath string results of the ``text()`` function and attribute selection make their Element container accessible through a ``getparent()`` method. As a side-effect, they are now always unicode objects (even ASCII strings). * ``XSLT`` objects are usable in any thread - at the cost of a deep copy if they were not created in that thread. * Invalid entity names and character references will be rejected by the ``Entity()`` factory. * ``entity.text`` returns the textual representation of the entity, e.g. ``&``. Bugs fixed ---------- * XPath on ElementTrees could crash when selecting the virtual root node of the ElementTree. * Compilation ``--without-threading`` was buggy in alpha5/6. Other changes ------------- * Minor performance tweaks for Element instantiation and subelement creation 2.0alpha6 (2007-12-19) ====================== Features added -------------- * New properties ``position`` and ``code`` on ParseError exception (as in ET 1.3) Bugs fixed ---------- * Memory leak in the ``parse()`` function. * Minor bugs in XSLT error message formatting. * Result document memory leak in target parser. Other changes ------------- * Various places in the XPath, XSLT and iteration APIs now require keyword-only arguments. * The argument order in ``element.itersiblings()`` was changed to match the order used in all other iteration methods. The second argument ('preceding') is now a keyword-only argument. * The ``getiterator()`` method on Elements and ElementTrees was reverted to return an iterator as it did in lxml 1.x. The ET API specification allows it to return either a sequence or an iterator, and it traditionally returned a sequence in ET and an iterator in lxml. 
  However, it is now deprecated in favour of the ``iter()`` method, which
  should be used in new code wherever possible.

* The 'pretty printed' serialisation of ElementTree objects now inserts
  newlines at the root level between processing instructions, comments
  and the root tag.

* A 'pretty printed' serialisation is now terminated with a newline.

* Second argument to ``lxml.etree.Extension()`` helper is no longer
  required, third argument is now a keyword-only argument ``ns``.

* ``lxml.html.tostring`` takes an ``encoding`` argument.

2.0alpha5 (2007-11-24)
======================

Features added
--------------

* Rich comparison of ``element.attrib`` proxies.

* ElementTree compatible TreeBuilder class.

* Use default prefixes for some common XML namespaces.

* ``lxml.html.clean.Cleaner`` now allows for a ``host_whitelist``, and
  two overridable methods: ``allow_embedded_url(el, url)`` and the more
  general ``allow_element(el)``.

* Extended slicing of Elements as in ``element[1:-1:2]``, both in etree
  and in objectify

* Resolvers can now provide a ``base_url`` keyword argument when
  resolving a document as string data.

* When using ``lxml.doctestcompare`` you can give the doctest option
  ``NOPARSE_MARKUP`` (like ``# doctest: +NOPARSE_MARKUP``) to suppress
  the special checking for one test.

Bugs fixed
----------

* Target parser failed to report comments.

* In the ``lxml.html`` ``iter_links`` method, links in ```` tags weren't
  recognized. (Note: plugin-specific link parameters still aren't
  recognized.) Also, the ```` tag, though not standard, is now included
  in ``lxml.html.defs.special_inline_tags``.

* Using custom resolvers on XSLT stylesheets parsed from a string could
  request ill-formed URLs.

* With ``lxml.doctestcompare`` if you do ```` in your output, it will
  then be namespace-neutral (before the ellipsis was treated as a real
  namespace).

Other changes
-------------

* The module source files were renamed to "lxml.*.pyx", such as
  "lxml.etree.pyx".
  This was changed for consistency with the way Pyrex commonly handles
  package imports. The main effect is that classes now know about their
  fully qualified class name, including the package name of their module.

* Keyword-only arguments in some API functions, especially in the parsers
  and serialisers.

2.0alpha4 (2007-10-07)
======================

Features added
--------------

Bugs fixed
----------

* AttributeError in feed parser on parse errors

Other changes
-------------

* Tag name validation in lxml.etree (and lxml.html) now distinguishes
  between HTML tags and XML tags based on the parser that was used to
  parse or create them. HTML tags no longer reject any non-ASCII
  characters in tag names but only spaces and the special characters
  ``<>&/"'``.

2.0alpha3 (2007-09-26)
======================

Features added
--------------

* Separate ``feed_error_log`` property for the feed parser interface. The
  normal parser interface and ``iterparse`` continue to use
  ``error_log``.

* The normal parsers and the feed parser interface are now separated and
  can be used concurrently on the same parser instance.

* ``fromstringlist()`` and ``tostringlist()`` functions as in ElementTree
  1.3

* ``iterparse()`` accepts an ``html`` boolean keyword argument for
  parsing with the HTML parser (note that this interface may be subject
  to change)

* Parsers accept an ``encoding`` keyword argument that overrides the
  encoding of the parsed documents.

* New C-API function ``hasChild()`` to test for children

* ``annotate()`` function in objectify can annotate with Python types and
  XSI types in one step. Accompanied by ``xsiannotate()`` and
  ``pyannotate()``.
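
As a rough illustration of the feed parser interface and the
``fromstringlist()`` / ``tostringlist()`` functions listed above, here is a
minimal sketch. It assumes that the standard library's
``xml.etree.ElementTree`` (which provides the ElementTree 1.3 style API these
entries refer to) is an adequate stand-in. With lxml installed, swapping the
import for ``from lxml import etree as ET`` is expected to behave much the
same, but that is not verified here. The sample document is made up.

```python
# Assumption: stdlib ElementTree stands in for lxml.etree in this sketch.
import xml.etree.ElementTree as ET

# A small, made-up document that arrives in two pieces.
chunks = ["<root><a>one</a>", "<b>two</b></root>"]

# Feed parser interface: push data in pieces, then close() to get the root.
parser = ET.XMLParser()
for chunk in chunks:
    parser.feed(chunk)
root_from_feed = parser.close()

# fromstringlist(): parse the same document from a sequence of fragments.
root_from_list = ET.fromstringlist(chunks)

# tostringlist(): serialise into a list of string pieces; joining them
# yields the complete document.
parts = ET.tostringlist(root_from_list, encoding="unicode")
serialised = "".join(parts)

if __name__ == "__main__":
    print([child.tag for child in root_from_feed])
    print(serialised)
```

Both routes end up with the same tree. The feed interface is the one to reach
for when the data arrives incrementally, for example from a socket.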

Bugs fixed
----------

* XML feed parser setup problem

* Type annotation for unicode strings in ``DataElement()``

Other changes
-------------

* lxml.etree now emits a warning if you use XPath with libxml2 2.6.27
  (which can crash on certain XPath errors)

* Type annotation in objectify now preserves the already annotated type
  by default to prevent losing type information that is already there.

2.0alpha2 (2007-09-15)
======================

Features added
--------------

* ``ET.write()``, ``tostring()`` and ``tounicode()`` now accept a keyword
  argument ``method`` that can be one of 'xml' (or None), 'html' or
  'text' to serialise as XML, HTML or plain text content.

* ``iterfind()`` method on Elements returns an iterator equivalent to
  ``findall()``

* ``itertext()`` method on Elements

* Setting a QName object as value of the .text property or as an
  attribute will resolve its prefix in the respective context

* ElementTree-like parser target interface as described in
  http://effbot.org/elementtree/elementtree-xmlparser.htm

* ElementTree-like feed parser interface on XMLParser and HTMLParser
  (``feed()`` and ``close()`` methods)

Bugs fixed
----------

* lxml failed to serialise namespace declarations of elements other than
  the root node of a tree

* Race condition in XSLT where the resolver context leaked between
  concurrent XSLT calls

Other changes
-------------

* ``element.getiterator()`` returns a list, use ``element.iter()`` to
  retrieve an iterator (ElementTree 1.3 compatible behaviour)

2.0alpha1 (2007-09-02)
======================

Features added
--------------

* Reimplemented ``objectify.E`` for better performance and improved
  integration with objectify. Provides extended type support based on
  registered PyTypes.

* XSLT objects now support deep copying

* New ``makeSubElement()`` C-API function that allows creating a new
  subelement straight with text, tail and attributes.

* XPath extension functions can now access the current context node
  (``context.context_node``) and use a context dictionary
  (``context.eval_context``) from the context provided in their first
  parameter

* HTML tag soup parser based on BeautifulSoup in
  ``lxml.html.ElementSoup``

* New module ``lxml.doctestcompare`` by Ian Bicking for writing
  simplified doctests based on XML/HTML output. Use by importing
  ``lxml.usedoctest`` or ``lxml.html.usedoctest`` from within a doctest.

* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements
  with CSS selectors.

* New package ``lxml.html`` written by Ian Bicking for advanced HTML
  treatment.

* Namespace class setup is now local to the
  ``ElementNamespaceClassLookup`` instance and no longer global.

* Schematron validation (incomplete in libxml2)

* Additional ``stringify`` argument to ``objectify.PyType()`` takes a
  conversion function to strings to support setting text values from
  arbitrary types.

* Entity support through an ``Entity`` factory and element classes. XML
  parsers now have a ``resolve_entities`` keyword argument that can be
  set to False to keep entities in the document.

* ``column`` field on error log entries to accompany the ``line`` field

* Error specific messages in XPath parsing and evaluation
  NOTE: for evaluation errors, you will now get an XPathEvalError
  instead of an XPathSyntaxError.
  To catch both, you can except on ``XPathError``

* The regular expression functions in XPath now support passing a
  node-set instead of a string

* Extended type annotation in objectify: new ``xsiannotate()`` function

* EXSLT RegExp support in standard XPath (not only XSLT)

Bugs fixed
----------

* lxml.etree did not check tag/attribute names

* The XML parser did not report undefined entities as error

* The text in exceptions raised by XML parsers, validators and XPath
  evaluators now reports the first error that occurred instead of the
  last

* Passing '' as XPath namespace prefix did not raise an error

* Thread safety in XPath evaluators

Other changes
-------------

* objectify.PyType for None is now called "NoneType"

* ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree
  1.3 - original name is still available as alias

* In the public C-API, ``findOrBuildNodeNs()`` was replaced by the more
  generic ``findOrBuildNodeNsPrefix``

* Major refactoring in XPath/XSLT extension function code

* Network access in parsers disabled by default

From chairos at gmail.com Sat Feb 2 07:12:52 2008
From: chairos at gmail.com (Jon Rosebaugh)
Date: Sat, 2 Feb 2008 00:12:52 -0600
Subject: [lxml-dev] Segfault and bus error when importing lxml.html.clean after importing webbrowser
Message-ID:

I was trying to use lxml.html.clean to sanitize comments in my blog.
Unfortunately, although I can import and use it in a standalone
console session, it fails within the webapp. Sometimes it segfaults,
and sometimes it's a bus error instead.
After going through all the imports to see what _they_ imported, I
finally tracked down a minimal example that can cause the problem:

import webbrowser
import lxml.html.clean

If I reverse the order of imports, everything works fine, so for the
moment I've worked around it by making sure that lxml.html.clean is
imported the very first thing.
I have lxml compiled from the 2.0 tgz from the site, libxml2 2.6.31 and libxslt 1.1.22 installed via macports (both the latest versions macports has), Cython 0.9.6.11 installed, and I'm using Python 2.5.1 as downloaded from python.org for OS X. Here's the crash log Mac OS X provides: Process: Python [82702] Path: /Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python Identifier: Python Version: ??? (???) Code Type: X86 (Native) Parent Process: bash [188] Date/Time: 2008-02-02 00:05:49.472 -0600 OS Version: Mac OS X 10.5.1 (9B18) Report Version: 6 Exception Type: EXC_BAD_ACCESS (SIGBUS) Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000005 Crashed Thread: 0 Thread 0 Crashed: 0 libxml2.2.dylib 0x91cda419 xmlDictLookup + 360 1 libxml2.2.dylib 0x00d78728 xmlXPathCompExprAdd + 280 2 libxml2.2.dylib 0x00d84e0c xmlXPathCompStep + 1004 3 libxml2.2.dylib 0x00d85812 xmlXPathCompRelativeLocationPath + 98 4 libxml2.2.dylib 0x00d86235 xmlXPathCompPathExpr + 1973 5 libxml2.2.dylib 0x00d86c35 xmlXPathCompUnaryExpr + 213 6 libxml2.2.dylib 0x00d86e3f xmlXPathCompMultiplicativeExpr + 15 7 libxml2.2.dylib 0x00d8702f xmlXPathCompAdditiveExpr + 15 8 libxml2.2.dylib 0x00d8717f xmlXPathCompRelationalExpr + 15 9 libxml2.2.dylib 0x00d8732f xmlXPathCompEqualityExpr + 15 10 libxml2.2.dylib 0x00d8748f xmlXPathCompAndExpr + 15 11 libxml2.2.dylib 0x00d87602 xmlXPathCompileExpr + 18 12 libxml2.2.dylib 0x00d868ec xmlXPathCompPathExpr + 3692 13 libxml2.2.dylib 0x00d86c35 xmlXPathCompUnaryExpr + 213 14 libxml2.2.dylib 0x00d86e3f xmlXPathCompMultiplicativeExpr + 15 15 libxml2.2.dylib 0x00d8702f xmlXPathCompAdditiveExpr + 15 16 libxml2.2.dylib 0x00d8717f xmlXPathCompRelationalExpr + 15 17 libxml2.2.dylib 0x00d8732f xmlXPathCompEqualityExpr + 15 18 libxml2.2.dylib 0x00d8748f xmlXPathCompAndExpr + 15 19 libxml2.2.dylib 0x00d87602 xmlXPathCompileExpr + 18 20 libxml2.2.dylib 0x00d87872 xmlXPathCompPredicate + 194 21 libxml2.2.dylib 0x00d84d4e 
xmlXPathCompStep + 814 22 libxml2.2.dylib 0x00d85812 xmlXPathCompRelativeLocationPath + 98 23 libxml2.2.dylib 0x00d86235 xmlXPathCompPathExpr + 1973 24 libxml2.2.dylib 0x00d86c35 xmlXPathCompUnaryExpr + 213 25 libxml2.2.dylib 0x00d86e3f xmlXPathCompMultiplicativeExpr + 15 26 libxml2.2.dylib 0x00d8702f xmlXPathCompAdditiveExpr + 15 27 libxml2.2.dylib 0x00d8717f xmlXPathCompRelationalExpr + 15 28 libxml2.2.dylib 0x00d8732f xmlXPathCompEqualityExpr + 15 29 libxml2.2.dylib 0x00d8748f xmlXPathCompAndExpr + 15 30 libxml2.2.dylib 0x00d87602 xmlXPathCompileExpr + 18 31 libxml2.2.dylib 0x00d8c6ca xmlXPathCtxtCompile + 90 32 etree.so 0x00bdc1d7 __pyx_pf_4lxml_5etree_5XPath___init__ + 551 (lxml.etree.c:79217) 33 org.python.python 0x00448981 type_call + 166 (typeobject.c:436) 34 org.python.python 0x003f9278 PyObject_Call + 45 (abstract.c:1860) 35 org.python.python 0x00480851 PyEval_EvalFrameEx + 9242 (ceval.c:3775) 36 org.python.python 0x00484cdc PyEval_EvalCodeEx + 1819 (ceval.c:2831) 37 org.python.python 0x00484e90 PyEval_EvalCode + 87 (ceval.c:500) 38 org.python.python 0x0049bcbe PyImport_ExecCodeModuleEx + 193 (import.c:669) 39 org.python.python 0x0049c114 load_source_module + 726 (import.c:953) 40 org.python.python 0x0049cd9b import_submodule + 293 (import.c:2394) 41 org.python.python 0x0049cfe9 load_next + 195 (import.c:2214) 42 org.python.python 0x0049d4dd import_module_level + 213 (import.c:2002) 43 org.python.python 0x0049d98d PyImport_ImportModuleLevel + 45 (import.c:2066) 44 org.python.python 0x00478917 builtin___import__ + 156 (bltinmodule.c:49) 45 org.python.python 0x003f9278 PyObject_Call + 45 (abstract.c:1860) 46 org.python.python 0x0047d5b2 PyEval_CallObjectWithKeywords + 112 (ceval.c:3433) 47 org.python.python 0x00481464 PyEval_EvalFrameEx + 12333 (ceval.c:2063) 48 org.python.python 0x00484cdc PyEval_EvalCodeEx + 1819 (ceval.c:2831) 49 org.python.python 0x00484e90 PyEval_EvalCode + 87 (ceval.c:500) 50 org.python.python 0x004a7cf2 PyRun_InteractiveOneFlags + 
460 (pythonrun.c:1271) 51 org.python.python 0x004a7f2e PyRun_InteractiveLoopFlags + 85 (pythonrun.c:723) 52 org.python.python 0x004a86da PyRun_AnyFileExFlags + 155 (pythonrun.c:690) 53 org.python.python 0x004b5bb6 Py_Main + 3077 (main.c:523) 54 org.python.python 0x00001f8e 0x1000 + 3982 55 org.python.python 0x00001eb5 0x1000 + 3765 Thread 0 crashed with X86 Thread State (32-bit): eax: 0x009d0bd4 ebx: 0x91cda2c2 ecx: 0x00000004 edx: 0x0000e3d0 edi: 0x00000005 esi: 0x00000005 ebp: 0xbfffd428 esp: 0xbfffd3f0 ss: 0x0000001f efl: 0x00010202 eip: 0x91cda419 cs: 0x00000017 ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037 cr2: 0x00000005 Binary Images: 0x1000 - 0x1fff +org.python.python 2.5a0 (2.5alpha0) /Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python 0xea000 - 0xebfff +cStringIO.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/cStringIO.so 0x3f0000 - 0x4e4fc3 +org.python.python 2.5a0 (2.5) /Library/Frameworks/Python.framework/Versions/2.5/Python 0x7f4000 - 0x7f5ffb +select.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/select.so 0x900000 - 0x926fdf +readline.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/readline.so 0x93e000 - 0x96ffe7 +libncurses.5.dylib ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/libncurses.5.dylib 0x98b000 - 0x98dfff +collections.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/collections.so 0x99a000 - 0x99bfff +fcntl.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/fcntl.so 0x9e3000 - 0x9e6ffb +_struct.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/_struct.so 0x9f4000 - 0x9f5ff3 +time.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/time.so 0xb00000 - 0xb02fff +binascii.so ??? (???) 
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/binascii.so 0xb0d000 - 0xb0e073 +icglue.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/icglue.so 0xb1c000 - 0xb1ffff +strop.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/strop.so 0xb2c000 - 0xb2ffff +_Res.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/_Res.so 0xb3e000 - 0xb44fff +_File.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/_File.so 0xb63000 - 0xb64ffd +MacOS.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/MacOS.so 0xbb5000 - 0xc67ff9 +etree.so ??? (???) <9f2b581810d292f482356b73a83969f9> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml-2.0-py2.5-macosx-10.3-fat.egg/lxml/etree.so 0xcdd000 - 0xd09fff +libxslt.1.dylib ??? (???) /opt/local/lib/libxslt.1.dylib 0xd13000 - 0xd1ffff +libexslt.0.dylib ??? (???) /opt/local/lib/libexslt.0.dylib 0xd25000 - 0xe27fef +libxml2.2.dylib ??? (???) /opt/local/lib/libxml2.2.dylib 0xe59000 - 0xe69ffd +libz.1.dylib ??? (???) /opt/local/lib/libz.1.dylib 0xe6e000 - 0xf65ff0 +libiconv.2.dylib ??? (???) /opt/local/lib/libiconv.2.dylib 0xfd2000 - 0xfd4fff +operator.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/operator.so 0xfdf000 - 0xfe2fff +itertools.so ??? (???) /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/itertools.so 0x8fe00000 - 0x8fe2d883 dyld 95.3 (???) <81592e798780564b5d46b988f7ee1a6a> /usr/lib/dyld 0x9029d000 - 0x902f6fff libGLU.dylib ??? (???) /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLU.dylib 0x902f7000 - 0x906b5fea libLAPACK.dylib ??? (???) 
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib 0x906b6000 - 0x907fbff7 com.apple.ImageIO.framework 2.0.0 (2.0.0) <154d4d8cda2bd99518cbabc9f2d69833> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/ImageIO 0x907fc000 - 0x9080affd libz.1.dylib ??? (???) <5ddd8539ae2ebfd8e7cc1c57525385c7> /usr/lib/libz.1.dylib 0x908b7000 - 0x908b7fff com.apple.Carbon 136 (136) <98a5e3bc0c4fa44bbb09713bb88707fe> /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon 0x908c8000 - 0x908ccfff libGIF.dylib ??? (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/Resources/libGIF.dylib 0x908cd000 - 0x90b46fe7 com.apple.Foundation 6.5.1 (677.1) <85ac18c7cd454378db6122bea0c00965> /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation 0x90b47000 - 0x90e20fe7 com.apple.CoreServices.CarbonCore 783 (783) <8370e664eeb25edc98d5c1f5405b06ae> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore 0x90e21000 - 0x90e21ffd com.apple.Accelerate.vecLib 3.4 (vecLib 3.4) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib 0x90fde000 - 0x9101dfef libTIFF.dylib ??? (???) <6d0f80e9d4d81f3f64c876aca005bd53> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/Resources/libTIFF.dylib 0x9101e000 - 0x910aaff7 com.apple.LaunchServices 286 (286) <72b15e7a01e42d510f0339e90113d5d6> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/LaunchServices 0x910ab000 - 0x910caffa libJPEG.dylib ??? (???) 
<0cfb80109d624beb9ceb3c43b6c5ec10> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/Resources/libJPEG.dylib 0x910cb000 - 0x910e1fe7 com.apple.CoreVideo 1.5.0 (1.5.0) <7e010557527a0e6d49147c297d16850a> /System/Library/Frameworks/CoreVideo.framework/Versions/A/CoreVideo 0x910e2000 - 0x911c1fff libobjc.A.dylib ??? (???) <5eda47fec2d0e7853b3506aa1fd2dafa> /usr/lib/libobjc.A.dylib 0x911eb000 - 0x91345fe3 libSystem.B.dylib ??? (???) <8ecc83dc0399be3946f7a46e88cf4bbb> /usr/lib/libSystem.B.dylib 0x91346000 - 0x91350feb com.apple.audio.SoundManager 3.9.2 (3.9.2) <0f2ba6e891d3761212cf5a5e6134d683> /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/CarbonSound.framework/Versions/A/CarbonSound 0x91c34000 - 0x91c3afff com.apple.print.framework.Print 218 (220) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Print.framework/Versions/A/Print 0x91c3b000 - 0x91c51fff com.apple.DictionaryServices 1.0.0 (1.0.0) /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/DictionaryServices.framework/Versions/A/DictionaryServices 0x91c52000 - 0x91c6dffb libPng.dylib ??? (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/Resources/libPng.dylib 0x91c78000 - 0x91c78ffb com.apple.installserver.framework 1.0 (8) /System/Library/PrivateFrameworks/InstallServer.framework/Versions/A/InstallServer 0x91cd8000 - 0x91db9ff7 libxml2.2.dylib ??? (???) <450ec38b57fb46013847cce851001a2f> /usr/lib/libxml2.2.dylib 0x91e5c000 - 0x91f0cfff edu.mit.Kerberos 6.0.11 (6.0.11) <33c25789baedcd70a7e24881775dd9ad> /System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos 0x92527000 - 0x92533ff5 libGL.dylib ??? (???) /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGL.dylib 0x9253c000 - 0x925c6fff com.apple.framework.IOKit 1.5.1 (???) 
<5176a7383151a19c962334009fef2c6d> /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit 0x9260a000 - 0x926b1fff com.apple.QD 3.11.50 (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/QD.framework/Versions/A/QD 0x926b2000 - 0x92768fe3 com.apple.CoreServices.OSServices 210.2 (210.2) <4ed69f07fc0f211ab32d1ee96e281fc2> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/OSServices.framework/Versions/A/OSServices 0x92a21000 - 0x92a21ff8 com.apple.ApplicationServices 34 (34) <8f910fa65f01d401ad8d04cc933cf887> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/ApplicationServices 0x939aa000 - 0x93a75fff com.apple.ColorSync 4.5.0 (4.5.0) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ColorSync.framework/Versions/A/ColorSync 0x93a7c000 - 0x93abefef com.apple.NavigationServices 3.5.1 (161) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/NavigationServices.framework/Versions/A/NavigationServices 0x93c16000 - 0x93cc5fff com.apple.DesktopServices 1.4.3 (1.4.3) <66d5ed56111c43d234e235d365d02469> /System/Library/PrivateFrameworks/DesktopServicesPriv.framework/Versions/A/DesktopServicesPriv 0x93cc6000 - 0x93cf0fef libauto.dylib ??? (???) /usr/lib/libauto.dylib 0x93d61000 - 0x94067fff com.apple.HIToolbox 1.5.0 (???) 
<1b872a7151ee3f80c9c736a3e46d00d9> /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox 0x94080000 - 0x94147ff2 com.apple.vImage 3.0 (3.0) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage.framework/Versions/A/vImage 0x94196000 - 0x941a6ffc com.apple.LangAnalysis 1.6.4 (1.6.4) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/LangAnalysis.framework/Versions/A/LangAnalysis 0x941a7000 - 0x94226ff5 com.apple.SearchKit 1.2.0 (1.2.0) <277b460da86bc222785159fe77e2e2ed> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SearchKit.framework/Versions/A/SearchKit 0x94227000 - 0x942d9ffb libcrypto.0.9.7.dylib ??? (???) <330b0e48e67faffc8c22dfc069ca7a47> /usr/lib/libcrypto.0.9.7.dylib 0x942da000 - 0x942daffa com.apple.CoreServices 32 (32) <2fcc8f3bd5bbfc000b476cad8e6a3dd2> /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices 0x942db000 - 0x94352fe3 com.apple.CFNetwork 220 (221) <972a41911805859205b057a6f5b91e8d> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CFNetwork.framework/Versions/A/CFNetwork 0x943f2000 - 0x943f7fff com.apple.CommonPanels 1.2.4 (85) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/CommonPanels.framework/Versions/A/CommonPanels 0x943f8000 - 0x94427fe3 com.apple.AE 402 (402) <994ba8e884aefe7bf1fc5987df099e7b> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/AE.framework/Versions/A/AE 0x94428000 - 0x944afff7 libsqlite3.0.dylib ??? (???) <273efcb717e89c21207c851d7d33fda4> /usr/lib/libsqlite3.0.dylib 0x944b0000 - 0x944ddfeb libvDSP.dylib ??? (???) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib 0x944de000 - 0x944dfffc libffi.dylib ??? (???) 
/usr/lib/libffi.dylib 0x9455e000 - 0x9456efff com.apple.speech.synthesis.framework 3.6.59 (3.6.59) <4ffef145fad3d4d787e0c33eab26b336> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/SpeechSynthesis.framework/Versions/A/SpeechSynthesis 0x9456f000 - 0x9458dfff libresolv.9.dylib ??? (???) <54e6a08c2f108bdf5916fb483d51961b> /usr/lib/libresolv.9.dylib 0x945c8000 - 0x945dcff3 com.apple.ImageCapture 4.0 (5.0.0) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/ImageCapture.framework/Versions/A/ImageCapture 0x945dd000 - 0x949edfef libBLAS.dylib ??? (???) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 0x949ee000 - 0x94a4aff7 com.apple.htmlrendering 68 (1.1.3) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HTMLRendering.framework/Versions/A/HTMLRendering 0x94a51000 - 0x94f1dffe libGLProgrammability.dylib ??? (???) /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLProgrammability.dylib 0x94f1f000 - 0x94f79ff7 com.apple.CoreText 2.0.0 (???) <7fa39cd5bc847615ec02e7c7a37c0508> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreText.framework/Versions/A/CoreText 0x94f7a000 - 0x95143fef com.apple.security 5.0.1 (32736) <8c9eda0fcc1d8a571543025ac900715f> /System/Library/Frameworks/Security.framework/Versions/A/Security 0x95156000 - 0x957edfef com.apple.CoreGraphics 1.351.0 (???) <7a6f399039eed6dbe845c169f7d21a70> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/CoreGraphics 0x957ee000 - 0x957fbfe7 com.apple.opengl 1.5.5 (1.5.5) /System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL 0x95808000 - 0x95820fff com.apple.openscripting 1.2.6 (???) 
/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/OpenScripting.framework/Versions/A/OpenScripting 0x95835000 - 0x958b1feb com.apple.audio.CoreAudio 3.1.0 (3.1) <70bb7c657061631491029a61babe0b26> /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio 0x958b2000 - 0x958b5fff com.apple.help 1.1 (36) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Help.framework/Versions/A/Help 0x95ab6000 - 0x95ab8ff5 libRadiance.dylib ??? (???) <20eadb285da83df96c795c2c5fa20590> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/Resources/libRadiance.dylib 0x95ab9000 - 0x95abbfff com.apple.securityhi 3.0 (30817) <2b2854123fed609d1820d2779e2e0963> /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/SecurityHI.framework/Versions/A/SecurityHI 0x95b66000 - 0x95babfef com.apple.Metadata 10.5.0 (398) <4fd74fba0062c2e08ec4b1c10b40ff63> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata 0x95bac000 - 0x95c3eff3 com.apple.ApplicationServices.ATS 3.0 (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/ATS 0x95c3f000 - 0x95c9cffb libstdc++.6.dylib ??? (???) <04b812dcec670daa8b7d2852ab14be60> /usr/lib/libstdc++.6.dylib 0x95f3d000 - 0x95f8dff7 com.apple.HIServices 1.6.0 (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/HIServices.framework/Versions/A/HIServices 0x96083000 - 0x96116fff com.apple.ink.framework 101.3 (86) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Ink.framework/Versions/A/Ink 0x9642d000 - 0x96463fff com.apple.SystemConfiguration 1.9.0 (1.9.0) <7919d9588c3b0d556646e555b7193f1f> /System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration 0x96483000 - 0x965bbff7 libicucore.A.dylib ??? (???) 
/usr/lib/libicucore.A.dylib 0x96782000 - 0x967fcff8 com.apple.print.framework.PrintCore 5.5 (245) <9441d178f4b430cf92b67bf346646693> /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/PrintCore.framework/Versions/A/PrintCore 0x96835000 - 0x9685cfff libcups.2.dylib ??? (???) <5521498e8902ddd0b15cfaa7db384e29> /usr/lib/libcups.2.dylib 0x9685d000 - 0x96897ff7 com.apple.coreui 0.1 (60) /System/Library/PrivateFrameworks/CoreUI.framework/Versions/A/CoreUI 0x96898000 - 0x9689ffe9 libgcc_s.1.dylib ??? (???) /usr/lib/libgcc_s.1.dylib 0x968a4000 - 0x968abffe libbsm.dylib ??? (???) /usr/lib/libbsm.dylib 0x968ea000 - 0x96c80ff7 com.apple.QuartzCore 1.5.1 (1.5.1) /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore 0x96c81000 - 0x96ca5fff libxslt.1.dylib ??? (???) <4933ddc7f6618743197aadc85b33b5ab> /usr/lib/libxslt.1.dylib 0x96ca6000 - 0x96caefff com.apple.DiskArbitration 2.2 (2.2) <1551b2af557fdf6f368f93e093933852> /System/Library/Frameworks/DiskArbitration.framework/Versions/A/DiskArbitration 0x96caf000 - 0x96cafffd com.apple.Accelerate 1.4 (Accelerate 1.4) /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate 0x96cb0000 - 0x96d24fef libvMisc.dylib ??? (???) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvMisc.dylib 0x96d68000 - 0x96da5ff7 libGLImage.dylib ??? (???) <202d73e6a4688fc06ff11b71910c2ce7> /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLImage.dylib 0x96dd6000 - 0x96dd7fef libmathCommon.A.dylib ??? (???) 
/usr/lib/system/libmathCommon.A.dylib 0x96dd8000 - 0x96f0afe7 com.apple.CoreFoundation 6.5 (476) <8bfebc0dbad6fc33bea0fa00a1b9ec37> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation 0x96f0b000 - 0x96f14fff com.apple.speech.recognition.framework 3.7.24 (3.7.24) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/SpeechRecognition.framework/Versions/A/SpeechRecognition 0xfffe8000 - 0xfffebfff libobjc.A.dylib ??? (???) /usr/lib/libobjc.A.dylib 0xffff0000 - 0xffff1780 libSystem.B.dylib ??? (???) /usr/lib/libSystem.B.dylib From stefan_ml at behnel.de Sat Feb 2 07:34:57 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 02 Feb 2008 07:34:57 +0100 Subject: [lxml-dev] Segfault and bus error when importing lxml.html.clean after importing webbrowser In-Reply-To: References: Message-ID: <47A40F11.8070401@behnel.de> Hi, Jon Rosebaugh wrote: > I was trying to use lxml.html.clean to sanitize comments in my blog. > Unfortunately, although I can import and use it in a standalone > console session, it fails within the webapp. Sometimes it segfaults, > and sometimes it's a bus error instead. > After going through all the imports to see what _they_ imported, I > finally tracked down a minimal example that can cause the problem: > > import webbrowser > import lxml.html.clean > > If I reverse the order of imports, everything works fine, so for the > moment I've worked around it by making sure that lxml.html.clean is > imported the very first thing. > > I have lxml compiled from the 2.0 tgz from the site, libxml2 2.6.31 > and libxslt 1.1.22 installed via macports (both the latest versions > macports has), Cython 0.9.6.11 installed, and I'm using Python 2.5.1 > as downloaded from python.org for OS X. 

Could you please try two things:

- uninstall Cython (you do not need it to build from release sources) or make
  sure it is at least 0.9.6.11b (with a 'b'), not only 0.9.6.11, which has bugs

- set DYLD_LIBRARY_PATH as explained here:

  http://codespeak.net/lxml/build.html#providing-newer-library-versions-on-mac-os-x

If that doesn't work, try building statically to make sure MacOS-X gets your
libs right (which you definitely want for a production environment).

Stefan

From chairos at gmail.com Sat Feb 2 08:11:29 2008
From: chairos at gmail.com (Jon Rosebaugh)
Date: Sat, 2 Feb 2008 01:11:29 -0600
Subject: [lxml-dev] Segfault and bus error when importing lxml.html.clean after importing webbrowser
In-Reply-To: <47A40F11.8070401@behnel.de>
References: <47A40F11.8070401@behnel.de>
Message-ID:

On Feb 2, 2008 12:34 AM, Stefan Behnel wrote:
> Could you please try two things:
>
> - uninstall Cython (you do not need it to build from release sources) or make
> sure it is at least 0.9.6.11b (with a 'b'), not only 0.9.6.11, which has bugs
>
> - set DYLD_LIBRARY_PATH as explained here:
>
> http://codespeak.net/lxml/build.html#providing-newer-library-versions-on-mac-os-x

These did not work.

> If that doesn't work, try building statically to make sure MacOS-X gets your
> libs right (which you definitely want for a production environment).

My production deployment will actually be on Debian Linux; this is
just my development machine. That said, I tried looking at the static
build directions, but they're for Windows. I did download all the
source archives for the various libraries, but I believe I have to
configure them in a certain way for the static build? At any rate, the
directories aren't the same as in the Windows example, and I'm not
sure which directories are the right ones.
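
In the crash log earlier in this thread, frame 0 (xmlDictLookup) falls inside
the address range of /usr/lib/libxml2.2.dylib, while the other libxml2 frames
fall inside /opt/local/lib/libxml2.2.dylib. That suggests two copies of
libxml2 ended up in the same process. A minimal sketch for checking this from
Python follows. It assumes lxml imports at all and that ``lxml.etree`` exposes
the ``LIBXML_VERSION`` / ``LIBXML_COMPILED_VERSION`` and ``LIBXSLT_VERSION`` /
``LIBXSLT_COMPILED_VERSION`` tuples. The helper names are made up for
illustration. A difference between the two is only a hint, not proof of a
problem.

```python
def linked_library_versions():
    """Return {name: (compiled_version, loaded_version)} for lxml's C libs.

    Returns None if lxml cannot be imported in this interpreter.
    """
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "libxml2": (etree.LIBXML_COMPILED_VERSION, etree.LIBXML_VERSION),
        "libxslt": (etree.LIBXSLT_COMPILED_VERSION, etree.LIBXSLT_VERSION),
    }


def mismatches(versions):
    """Names of libraries whose compile-time and runtime versions differ."""
    return sorted(
        name
        for name, (compiled, loaded) in versions.items()
        if compiled != loaded
    )


if __name__ == "__main__":
    versions = linked_library_versions()
    if versions is None:
        print("lxml is not importable here")
    else:
        for name, (compiled, loaded) in sorted(versions.items()):
            print("%s: compiled against %r, running with %r"
                  % (name, compiled, loaded))
        print("differing:", mismatches(versions) or "none")
```

Running this once in a fresh interpreter and once after ``import webbrowser``
would show whether the import order changes which library gets picked up.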
From stefan_ml at behnel.de Sat Feb 2 10:50:16 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 02 Feb 2008 10:50:16 +0100
Subject: [lxml-dev] Segfault and bus error when importing lxml.html.clean after importing webbrowser
In-Reply-To:
References: <47A40F11.8070401@behnel.de>
Message-ID: <47A43CD8.1020308@behnel.de>

Hi,

Jon Rosebaugh wrote:
> My production deployment will actually be on Debian Linux; this is
> just my development machine. That said, I tried looking at the static
> build directions, but they're for Windows. I did download all the
> source archives for the various libraries, but I believe I have to
> configure them in a certain way for the static build?

No, normally they build everything you need for both static and dynamic linking.

I have no idea how MacOS-X works here, but you should already have the
libraries installed on your system, maybe you just need the development
packages with header files and static build libs.

But if you have the source directories, something like this might work:

  # cd /path/to/libxml2-src
  # ./configure CFLAGS="whatever you use" --without-python
  # make
  # cd /path/to/libxslt-src
  # ./configure CFLAGS="whatever you use" --without-python \
       --with-libxml-src=/path/to/libxml2-src
  # make

and then open lxml's setup.py and set the static include dirs to

  /path/to/libxml2-src/include
  /path/to/libxslt-src/libxslt
  /path/to/libxslt-src/libexslt
  + the normal system include dirs for zlib, iconv, etc.

and the static lib dirs to

  /path/to/libxml2-src/.libs
  /path/to/libxslt-src/libxslt/.libs
  /path/to/libxslt-src/libexslt/.libs

You may have to add something like "-liconv" and "-lz" to the static cflags to
compile against the other libs, or maybe you need to link those statically
also. Just try it out. The compiler calls from the standard build of lxml will
give you hints what else you need.
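
A minimal sketch of what those setup.py settings might look like, assuming the
variable names STATIC_INCLUDE_DIRS, STATIC_LIBRARY_DIRS and STATIC_CFLAGS that
appear later in this thread. The paths are the placeholders from above, and
missing_dirs() is a hypothetical helper that is not part of lxml; it only
checks that the configured directories exist before a build is attempted.

```python
import os

# Placeholder source trees, copied from the example paths in the message above.
LIBXML2_SRC = "/path/to/libxml2-src"
LIBXSLT_SRC = "/path/to/libxslt-src"

# Header directories of the locally built libraries. The system include
# directories for zlib, iconv, etc. would be appended to this list.
STATIC_INCLUDE_DIRS = [
    LIBXML2_SRC + "/include",
    LIBXSLT_SRC + "/libxslt",
    LIBXSLT_SRC + "/libexslt",
]

# libtool leaves the freshly built libraries in hidden ".libs" directories.
STATIC_LIBRARY_DIRS = [
    LIBXML2_SRC + "/.libs",
    LIBXSLT_SRC + "/libxslt/.libs",
    LIBXSLT_SRC + "/libexslt/.libs",
]

# Extra flags for the libraries that libxml2 itself depends on (assumption:
# whether these are needed depends on how the libraries were configured).
STATIC_CFLAGS = ["-liconv", "-lz"]


def missing_dirs(dirs):
    """Hypothetical helper: return the configured directories that do not exist."""
    return [d for d in dirs if not os.path.isdir(d)]


if __name__ == "__main__":
    for d in missing_dirs(STATIC_INCLUDE_DIRS + STATIC_LIBRARY_DIRS):
        print("missing: " + d)
```

Note that, as discussed further down in this thread, the --static option was
only honoured on win32 at the time, so on other platforms these settings may
have no effect on their own.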
Hope that helps,
Stefan

From chairos at gmail.com Sat Feb 2 12:31:45 2008
From: chairos at gmail.com (Jon Rosebaugh)
Date: Sat, 2 Feb 2008 05:31:45 -0600
Subject: [lxml-dev] Segfault and bus error when importing lxml.html.clean after importing webbrowser
In-Reply-To: <47A43CD8.1020308@behnel.de>
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de>
Message-ID:

No luck. Apparently I need a universal build of libxml2 and libxslt in
order for lxml to build, and I can't figure out how to compile those
manually. I tried telling lxml to statically link against the
universal builds provided by macports, but I got segfaulting again.

From stefan_ml at behnel.de Sat Feb 2 12:40:50 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 02 Feb 2008 12:40:50 +0100
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To:
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de>
Message-ID: <47A456C2.8020108@behnel.de>

Hi,

Jon Rosebaugh wrote:
> No luck. Apparently I need a universal build of libxml2 and libxslt in
> order for lxml to build, and I can't figure out how to compile those
> manually. I tried telling lxml to statically link against the
> universal builds provided by macports, but I got segfaulting again.

Hmm, too bad.

Ok, calling for help here. Are there any other Mac users on the list who could
give hints?

Stefan

From ebgssth at gmail.com Sat Feb 2 14:26:51 2008
From: ebgssth at gmail.com (js)
Date: Sat, 2 Feb 2008 22:26:51 +0900
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To: <47A456C2.8020108@behnel.de>
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de>
Message-ID:

Hi Stefan and Jon,

I've just installed lxml-2.0 on OS X Tiger by using MacPorts and it
worked without any problem.
lxml-2.0 port is not yet available so I had to patch the portfile.
I already sent the patch to MacPorts team so I hope
that will be available soon.
You can fetch the patch from below.
http://trac.macosforge.org/projects/macports/ticket/14137

Thanks.

From piet at cs.uu.nl Sat Feb 2 14:03:01 2008
From: piet at cs.uu.nl (Piet van Oostrum)
Date: Sat, 2 Feb 2008 14:03:01 +0100
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To: <47A456C2.8020108@behnel.de>
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de>
Message-ID: <18340.27141.491714.859261@cp453394-a.venlo1.lb.home.nl>

>>>>> Stefan Behnel (SB) wrote:

>SB> Hi,
>SB> Jon Rosebaugh wrote:
>>> No luck. Apparently I need a universal build of libxml2 and libxslt in
>>> order for lxml to build, and I can't figure out how to compile those
>>> manually. I tried telling lxml to statically link against the
>>> universal builds provided by macports, but I got segfaulting again.

>SB> Hmm, too bad.

>SB> Ok, calling for help here. Are there any other Mac users on the
>SB> list who could give hints?

I have the latest versions of libxml2 and libxslt installed with macports.
Macports has a possibility to specify that it should install universal
versions of the libraries but I haven't done that. When I compiled lxml 2.0
it did complain about something with ppc and intel architectures (I have an
Intel machine), but the resulting library did work (it did run the tests).
--
Piet van Oostrum
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org

From chairos at gmail.com Sat Feb 2 14:41:59 2008
From: chairos at gmail.com (Jon Rosebaugh)
Date: Sat, 2 Feb 2008 07:41:59 -0600
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To:
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de>
Message-ID:

On Feb 2, 2008 7:26 AM, js wrote:
> Hi Stefan and Jon,
>
> I've just installed lxml-2.0 on OS X Tiger by using MacPorts and it
> worked without any problem.
> lxml-2.0 port is not yet available so I had to patch the portfile.
> I already sent the patch to MacPorts team so I hope
> that will be available soon.
> You can fetch the patch from below.
> http://trac.macosforge.org/projects/macports/ticket/14137
>
> Thanks.
>

I use the framework Python from python.org rather than the python
available from macports, so it's my understanding that I can't use any
macports python packages. Am I wrong?

From ebgssth at gmail.com Sat Feb 2 14:50:38 2008
From: ebgssth at gmail.com (js)
Date: Sat, 2 Feb 2008 22:50:38 +0900
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To:
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de>
Message-ID:

Beats me.
I've never used the framework Python.
Not 100% sure but I *think* MacPorts python will work
even if you installed the framework python,
so if you don't have to stick with the framework Python, give it a try.

From etiffany at alum.mit.edu Sat Feb 2 15:19:14 2008
From: etiffany at alum.mit.edu (Eric Tiffany)
Date: Sat, 02 Feb 2008 09:19:14 -0500
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To:
Message-ID:

Sorry for the late reply.

I've been using lxml 1.3.x for a while on MacOS, using the Python 2.4
installed by MacPorts (along with the libxml2 and libxslt). This
configuration works well for me, but heed the DYLD_LIBRARY_PATH suggestion.
You need to make sure this is set to the correct value for both building and
(especially) runtime. If you build lxml with DYLD_LIBRARY_PATH set but don't
have it set at runtime, lxml will report that it is using the correct versions
of libraries, when it is actually not. I think I reported that on this list a
while ago.

I use lxml within a Zope/Plone installation, and also from within Eclipse.

However, I have not used lxml 2.0 in this config, so YMMV. But, I would
suggest getting everything from MacPorts (python, libxml, etc.).

Also, note that it seems like the recent MacPorts builds only create
python2.4 and python2.5 binaries -- it's up to you to either link "python" to
one of these, or make sure your config settings explicitly call
/usr/local/bin/python2.4

Good luck

ET

From chairos at gmail.com Sat Feb 2 20:19:23 2008
From: chairos at gmail.com (Jon Rosebaugh)
Date: Sat, 2 Feb 2008 13:19:23 -0600
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To: <47A456C2.8020108@behnel.de>
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de>
Message-ID:

I'm beginning to think that it's not actually compiling statically. I
set up a fresh install of OS X, installed Python, installed XCode,
installed macports, libiconv, libxml2, zlib, libxslt, and tried
building statically with the following options in the setup file and
the following command:

STATIC_INCLUDE_DIRS = ["/opt/local/include", "/opt/local/include/libxml2"]
STATIC_LIBRARY_DIRS = ["/opt/local/lib"]
STATIC_CFLAGS = ["-liconv", "-lz"]

euterpe:~/lxml-2.0 jon$ export DYLD_LIBRARY_PATH=/opt/local/lib
euterpe:~/lxml-2.0 jon$ python setup.py bdist_egg --static

However, I noticed (a) that the gcc line still says -dynamic and (b)
the file size of etree.so does not differ whether or not I build with
the --static option. And I still get the same segfaults.

Building lxml version 2.0.
NOTE: Trying to build without Cython, pre-generated 'src/lxml/etree.c' needs to be available.
running bdist_egg running egg_info writing src/lxml.egg-info/PKG-INFO writing top-level names to src/lxml.egg-info/top_level.txt writing dependency_links to src/lxml.egg-info/dependency_links.txt reading manifest file 'src/lxml.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no previously-included files found matching 'doc/pyrex.txt' writing manifest file 'src/lxml.egg-info/SOURCES.txt' installing library code to build/bdist.macosx-10.3-fat/egg running install_lib running build_py creating build creating build/lib.macosx-10.3-fat-2.5 creating build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/__init__.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/_elementpath.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/builder.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/cssselect.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/doctestcompare.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/ElementInclude.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/htmlbuilder.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/sax.py -> build/lib.macosx-10.3-fat-2.5/lxml copying src/lxml/usedoctest.py -> build/lib.macosx-10.3-fat-2.5/lxml creating build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/__init__.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/_dictmixin.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/_diffcommand.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/builder.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/clean.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/defs.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/diff.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/ElementSoup.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/formfill.py -> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/setmixin.py 
-> build/lib.macosx-10.3-fat-2.5/lxml/html copying src/lxml/html/usedoctest.py -> build/lib.macosx-10.3-fat-2.5/lxml/html running build_ext building 'lxml.etree' extension creating build/temp.macosx-10.3-fat-2.5 creating build/temp.macosx-10.3-fat-2.5/src creating build/temp.macosx-10.3-fat-2.5/src/lxml gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -I/opt/local/include -I/opt/local/include/libxml2 -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.3-fat-2.5/src/lxml/lxml.etree.o -w -liconv -lz i686-apple-darwin8-gcc-4.0.1: -liconv: linker input file unused because linking not done i686-apple-darwin8-gcc-4.0.1: -lz: linker input file unused because linking not done powerpc-apple-darwin8-gcc-4.0.1: -liconv: linker input file unused because linking not done powerpc-apple-darwin8-gcc-4.0.1: -lz: linker input file unused because linking not done gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-fat-2.5/src/lxml/lxml.etree.o -L/opt/local/lib -lxslt -lexslt -lxml2 -lz -lm -o build/lib.macosx-10.3-fat-2.5/lxml/etree.so building 'lxml.objectify' extension gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -I/opt/local/include -I/opt/local/include/libxml2 -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c src/lxml/lxml.objectify.c -o build/temp.macosx-10.3-fat-2.5/src/lxml/lxml.objectify.o -w -liconv -lz i686-apple-darwin8-gcc-4.0.1: -liconv: linker input file unused because linking not done i686-apple-darwin8-gcc-4.0.1: -lz: linker input file unused because linking not done powerpc-apple-darwin8-gcc-4.0.1: -liconv: linker input file unused because linking not done 
powerpc-apple-darwin8-gcc-4.0.1: -lz: linker input file unused because linking not done gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-fat-2.5/src/lxml/lxml.objectify.o -L/opt/local/lib -lxslt -lexslt -lxml2 -lz -lm -o build/lib.macosx-10.3-fat-2.5/lxml/objectify.so building 'lxml.pyclasslookup' extension gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -I/opt/local/include -I/opt/local/include/libxml2 -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c src/lxml/lxml.pyclasslookup.c -o build/temp.macosx-10.3-fat-2.5/src/lxml/lxml.pyclasslookup.o -w -liconv -lz i686-apple-darwin8-gcc-4.0.1: -liconv: linker input file unused because linking not done i686-apple-darwin8-gcc-4.0.1: -lz: linker input file unused because linking not done powerpc-apple-darwin8-gcc-4.0.1: -liconv: linker input file unused because linking not done powerpc-apple-darwin8-gcc-4.0.1: -lz: linker input file unused because linking not done gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-fat-2.5/src/lxml/lxml.pyclasslookup.o -L/opt/local/lib -lxslt -lexslt -lxml2 -lz -lm -o build/lib.macosx-10.3-fat-2.5/lxml/pyclasslookup.so creating build/bdist.macosx-10.3-fat creating build/bdist.macosx-10.3-fat/egg creating build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/__init__.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/_elementpath.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/builder.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/cssselect.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/doctestcompare.py -> 
build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/ElementInclude.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/etree.so -> build/bdist.macosx-10.3-fat/egg/lxml creating build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/__init__.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/_dictmixin.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/_diffcommand.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/builder.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/clean.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/defs.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/diff.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/ElementSoup.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/formfill.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/setmixin.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/html/usedoctest.py -> build/bdist.macosx-10.3-fat/egg/lxml/html copying build/lib.macosx-10.3-fat-2.5/lxml/htmlbuilder.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/objectify.so -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/pyclasslookup.so -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/sax.py -> build/bdist.macosx-10.3-fat/egg/lxml copying build/lib.macosx-10.3-fat-2.5/lxml/usedoctest.py -> build/bdist.macosx-10.3-fat/egg/lxml byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/__init__.py to __init__.pyc 
byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/_elementpath.py to _elementpath.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/builder.py to builder.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/cssselect.py to cssselect.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/doctestcompare.py to doctestcompare.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/ElementInclude.py to ElementInclude.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/__init__.py to __init__.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/_dictmixin.py to _dictmixin.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/_diffcommand.py to _diffcommand.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/builder.py to builder.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/clean.py to clean.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/defs.py to defs.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/diff.py to diff.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/ElementSoup.py to ElementSoup.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/formfill.py to formfill.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/setmixin.py to setmixin.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/html/usedoctest.py to usedoctest.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/htmlbuilder.py to htmlbuilder.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/sax.py to sax.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/usedoctest.py to usedoctest.pyc creating stub loader for lxml/etree.so creating stub loader for lxml/objectify.so creating stub loader for lxml/pyclasslookup.so byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/etree.py to etree.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/objectify.py to objectify.pyc byte-compiling build/bdist.macosx-10.3-fat/egg/lxml/pyclasslookup.py to pyclasslookup.pyc creating 
build/bdist.macosx-10.3-fat/egg/EGG-INFO writing src/lxml.egg-info/native_libs.txt copying src/lxml.egg-info/PKG-INFO -> build/bdist.macosx-10.3-fat/egg/EGG-INFO copying src/lxml.egg-info/SOURCES.txt -> build/bdist.macosx-10.3-fat/egg/EGG-INFO copying src/lxml.egg-info/dependency_links.txt -> build/bdist.macosx-10.3-fat/egg/EGG-INFO copying src/lxml.egg-info/native_libs.txt -> build/bdist.macosx-10.3-fat/egg/EGG-INFO copying src/lxml.egg-info/not-zip-safe -> build/bdist.macosx-10.3-fat/egg/EGG-INFO copying src/lxml.egg-info/top_level.txt -> build/bdist.macosx-10.3-fat/egg/EGG-INFO creating dist creating 'dist/lxml-2.0-py2.5-macosx-10.3-fat.egg' and adding 'build/bdist.macosx-10.3-fat/egg' to it removing 'build/bdist.macosx-10.3-fat/egg' (and everything under it) euterpe:~/lxml-2.0 jon$ ls -lh dist/ total 3616 -rw-r--r-- 1 jon jon 1M Feb 2 13:03 lxml-2.0-py2.5-macosx-10.3-fat.egg From stefan_ml at behnel.de Sun Feb 3 18:17:30 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 03 Feb 2008 18:17:30 +0100 Subject: [lxml-dev] Build problems under MacOS-X In-Reply-To: References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de> Message-ID: <47A5F72A.4000201@behnel.de> Hi, Jon Rosebaugh wrote: > I'm beginning to think that it's not actually compiling statically. 
I > set up a fresh install of OS X, installed Python, installed XCode, > installed macports, libiconv, libxml2, zlib, libxslt, and tried > building statically with the following options in the setup file and > the following command: > > STATIC_INCLUDE_DIRS = ["/opt/local/include", "/opt/local/include/libxml2"] > STATIC_LIBRARY_DIRS = ["/opt/local/lib"] > STATIC_CFLAGS = ["-liconv", "-lz"] > > euterpe:~/lxml-2.0 jon$ export DYLD_LIBRARY_PATH=/opt/local/lib > euterpe:~/lxml-2.0 jon$ python setup.py bdist_egg --static > > However, I noticed (a) that the gcc line still says -dynamic and (b) > the file size of etree.so does not differ whether or not I build with > the --static option. And I still get the same segfaults. Sorry for that, I just checked. "--static" will explicitly not work for other platforms than win32 (see the file setupinfo.py). I didn't write that code, so I wasn't aware how platform specific it is. Is there any chance you could figure out what you need to do for a static compile on MacOS? I.e., what binaries of the libs you need to include in the build, etc. If we can get them into setupinfo.py, I bet you wouldn't be the only happy Mac user. Otherwise, you'd have to stick with the normal build - but there's not much I can do myself to find out what's going wrong - I don't have a Mac (and when I see how difficult it seems to be to update a system library, I guess it won't become my favourite platform either...) Stefan From stefan_ml at behnel.de Sun Feb 3 21:56:37 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 03 Feb 2008 21:56:37 +0100 Subject: [lxml-dev] Build problems under MacOS-X In-Reply-To: References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de> Message-ID: <47A62A85.9090802@behnel.de> Hi, js wrote: > seg fault occured to me, too. > Repeating etree.parse causes that. > > Here's the crash.log > Command: python2.5 > Path: /opt/local/bin/python2.5 > Parent: zsh [12091] > > Version: ??? (???) 
>
> PID: 12274
> Thread: 0
>
> Exception: EXC_BAD_ACCESS (0x0001)
> Codes: KERN_INVALID_ADDRESS (0x0001) at 0x6f632e6f
>
> Thread 0 Crashed:
> 0 libxml2.2.dylib 0x01371fc0 xmlDictFree + 45
> 1 libxml2.2.dylib 0x0137200e xmlDictFree + 123
> 2 etree.so 0x010634ad
> __pyx_f_4lxml_5etree_24_ParserDictionaryContext_initThreadDictRef + 63
> (lxml.etree.c:45919)
> ....

Have you set DYLD... to the directory where the new lib versions are installed
when running this? (compile time is *not* enough)

If you did, could you try passing the "--without-threading" option to setup.py
and rebuild with that to see if the problem persists?

Stefan

From mike at it-loops.com Sun Feb 3 22:14:24 2008
From: mike at it-loops.com (Michael Guntsche)
Date: Sun, 3 Feb 2008 22:14:24 +0100
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To: <47A456C2.8020108@behnel.de>
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de>
Message-ID: <7638A5A1-E559-4035-A84A-142AECEAE27E@it-loops.com>

On Feb 2, 2008, at 12:40, Stefan Behnel wrote:

> Hmm, too bad.
>
> Ok, calling for help here. Are there any other Mac users on the
> list who could
> give hints?
>
> Stefan

Here's a quick howto on how I build dynamically linked lxml eggs under
Mac OS X 10.4.

Python is version 2.5 universal binary from python.org.
Via macports I installed libxml2 as a universal variant (+universal)
and also the other needed libraries.
I made sure that xml2-config from macports is found BEFORE the stock
xml2-config in /usr/bin.
Then I just called python setup.py bdist_egg to compile it.
make test shows that 2.6.31 is used from macports but shows a bus error
at the "test_inplace" test.

Apart from that I haven't noticed any problems.

Kind regards,
Michael

From stefan_ml at behnel.de Mon Feb 4 08:09:06 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 04 Feb 2008 08:09:06 +0100
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To: <7638A5A1-E559-4035-A84A-142AECEAE27E@it-loops.com>
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de> <7638A5A1-E559-4035-A84A-142AECEAE27E@it-loops.com>
Message-ID: <47A6BA12.5030706@behnel.de>

Hi,

Michael Guntsche wrote:
> Here's a quick howto on how I build dynamically linked lxml eggs under
> Mac OS X 10.4.
>
> Python is version 2.5 universal binary from python.org.
> Via macports I installed libxml2 as a universal variant (+universal)
> and also the other needed libraries.
> I made sure that xml2-config from macports is found BEFORE the stock
> xml2-config in /usr/bin.
> Then I just called python setup.py bdist_egg to compile it.
> make test shows that 2.6.31 is used from macports but shows a bus error
> at the "test_inplace" test.

Hmm, "test_inplace" doesn't tell me much, as it's just the make target for
running all tests, not an individual test.

However, it seems to be common for Mac users to have lxml crash, so we should
definitely do something about it...

Stefan

From jholg at gmx.de Mon Feb 4 09:36:05 2008
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 04 Feb 2008 09:36:05 +0100
Subject: [lxml-dev] build & performance issues with 2.0beta2
In-Reply-To: <47A32027.1000806@behnel.de>
References: <20080131154553.289150@gmx.net> <47A1F7E5.1000203@behnel.de> <20080201090342.15000@gmx.net> <47A2E88F.9090500@behnel.de> <20080201104604.14990@gmx.net> <47A2FD76.1080707@behnel.de> <47A32027.1000806@behnel.de>
Message-ID: <20080204083605.62950@gmx.net>

Hi Stefan,

> >> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsLongLong':
> >> src/lxml/lxml.etree.c:110165: parse error before `long'
> >> src/lxml/lxml.etree.c:110167: `val' undeclared (first use in this function)
> >> src/lxml/lxml.etree.c:110167: (Each undeclared identifier is reported only once
> >> src/lxml/lxml.etree.c:110167: for each function it appears in.)
> >> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsUnsignedLongLong':
> >> src/lxml/lxml.etree.c:110185: parse error before `long'
> >> src/lxml/lxml.etree.c:110187: `val' undeclared (first use in this function)
> >> error: command 'gcc' failed with exit status 1
> >
> > I forwarded this to the Cython list, let's see what that gives.
>
> And it helped! :)
>
> http://comments.gmane.org/gmane.comp.python.cython.devel/588
>
> Here's a fix for Cython.
>
> Stefan

As expected, fix works smoothly!

Thank you,
Holger

From stefan_ml at behnel.de Mon Feb 4 09:45:37 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 04 Feb 2008 09:45:37 +0100
Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X
Message-ID: <47A6D0B1.5020600@behnel.de>

Hi,

it looks like many Mac users have problems with lxml on their platform. This
usually involves installing up-to-date dependencies (libxml2/libxslt) in
addition to the system libraries.
I would like to get these problems resolved. To get a start on this, we must collect some information. We had a few reports, but I need to know in more detail what people did, what they tried, and to what avail. So here is a list of questions for Mac users. Please help us by answering them. Some instructions follow at the end. When building lxml, please move any installed Cython versions out of the way and run the build on the unpacked lxml-2.0.tar.gz release sources. It must say "trying to build without Cython" at the beginning. Please provide the following information: - what package management system (fink/macports) do you use? - are you using the stock Python or one that is installed separately? - what library versions are you using of libxml2, libxslt, zlib, libiconv? - which library versions are preinstalled on your platform? (I do not know how to find that out, can anyone provide instructions here?) - what library versions does lxml.etree find? (see below) Just in case there are people who actually have a working installation, - has anyone successfully built lxml statically against libxml2/libxslt? * does it work reliably? (see "Testing" below) * did you build with the --without-threads option? * does it work with *and* without that option? - has anyone managed to get lxml working reliably (see "Testing" below) with a dynamic build? * did you set DYLD_LIBRARY_PATH? * is DYLD_LIBRARY_PATH required for you or does it work without? * is there anything special you did to make it work? * if there are crashes, is the install unusable or are there things you can still do reliably? If lxml crashes for you, - does it work if you set DYLD_LIBRARY_PATH at runtime? (dynamic builds only) - does it work when building with the --without-threads option? - does it crash when running the normal tests? - if the tests pass fine, does it crash with the "ot-test" script below? Every reply is appreciated. 
You can reply in private e-mail, if you prefer, although it might be helpful to others to see your answer. Thanks in advance, Stefan Here are some instructions: * Checking library versions Please report the library versions that lxml.etree thinks it is using: http://codespeak.net/lxml/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do * Setting DYLD_LIBRARY_PATH When you run tests or an application with lxml, you can pass the environment variable DYLD_LIBRARY_PATH to your program. It needs to be set to the directory (or directories) where lxml's library dependencies are installed, i.e. libxml2 and libxslt. http://codespeak.net/lxml/build.html#providing-newer-library-versions-on-mac-os-x * Testing When I say "works reliably", I mean, without crashes. The first thing to verify that is to run "make test". If it already crashes here, there is no need to try the script below. Please report anything you can find out about the crash in this case. A debugger run and/or a stack trace might give us some useful hints here. If the normal tests pass, please try another test. Here is an XML document and a script that fires off a bunch of threads to parse the document and run multiple XSLTs over it. The archive is about 1MB. http://codespeak.net/lxml/ot-test.zip It's kind of a worst-case scenario that I use to find problems, so if you have a working installation of lxml 2.0, please run it to see if it crashes for you. Parsing may fail (usually for threads 7-9), that's fine, but it must not crash with your otherwise working setup. It normally starts up 16 threads, which should require a couple of hundred MB of memory. If you find that it runs out of memory, you can try running less threads instead by passing the number to the script at the command line. 
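
For anyone answering the "what library versions does lxml.etree find?" question, a minimal sketch along the lines of the FAQ entry linked above could look like this. It is written in Python 3 syntax, unlike the Python 2 sessions elsewhere in this thread, and it assumes the version constants that current lxml releases export (LXML_VERSION, LIBXML_VERSION and friends). The helper names fmt() and report() are made up for illustration.

```python
# Sketch: report the library versions lxml.etree believes it is using.
import sys


def fmt(version):
    """Render a version tuple such as (2, 6, 31) as '2.6.31'."""
    return ".".join(str(part) for part in version)


def report():
    """Return one text line per component; degrade gracefully without lxml."""
    lines = ["Python: " + fmt(sys.version_info[:3])]
    try:
        from lxml import etree
    except ImportError:
        # lxml is missing or its shared libraries could not be loaded.
        lines.append("lxml.etree: not importable")
        return lines
    lines.append("lxml.etree: " + fmt(etree.LXML_VERSION))
    lines.append("libxml used: " + fmt(etree.LIBXML_VERSION))
    lines.append("libxml compiled: " + fmt(etree.LIBXML_COMPILED_VERSION))
    lines.append("libxslt used: " + fmt(etree.LIBXSLT_VERSION))
    lines.append("libxslt compiled: " + fmt(etree.LIBXSLT_COMPILED_VERSION))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If the "used" and "compiled" versions differ, the module was probably built against one library and is loading another at runtime, which is the situation DYLD_LIBRARY_PATH is meant to work around.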
From chairos at gmail.com Mon Feb 4 09:52:46 2008 From: chairos at gmail.com (Jon Rosebaugh) Date: Mon, 4 Feb 2008 02:52:46 -0600 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X In-Reply-To: <47A6D0B1.5020600@behnel.de> References: <47A6D0B1.5020600@behnel.de> Message-ID: On Feb 4, 2008 2:45 AM, Stefan Behnel wrote: > Hi, > > it looks like many Mac users have problems with lxml on their platform. This > usually involves installing up-to-date dependencies (libxml2/libxslt) in > addition to the system libraries. I would like to get these problems resolved. For whatever it's worth, I could usually get lxml to work by itself, when it was the only thing imported in the python interpreter. The problem came when it was imported in a webapp with a bunch of other modules. I believe threading may also have been involved. I'll try to fill out a full report in the next few days, but I've actually obviated my need for lxml by porting lxml.html.clean to BeautifulSoup. It runs nicely and all the tests pass after adjusting for whitespace differences. From mike at it-loops.com Mon Feb 4 16:38:02 2008 From: mike at it-loops.com (Michael Guntsche) Date: Mon, 4 Feb 2008 16:38:02 +0100 Subject: [lxml-dev] Build problems under MacOS-X In-Reply-To: <47A6BA12.5030706@behnel.de> References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de> <7638A5A1-E559-4035-A84A-142AECEAE27E@it-loops.com> <47A6BA12.5030706@behnel.de> Message-ID: On Feb 4, 2008, at 8:09, Stefan Behnel wrote: > Hmm, "test_inplace" doesn't tell me much, as it's just the make > target for > running all tests, not an individual test. > > However, it seems to be common for Mac users to have lxml crash, so > we should > definitely do something about it... > I upgraded Cython to 0.9.6.11 and recompiled lxml. 
Now make test, no longer segfaults but I get several errors in test_elementtree.py Examples: ERROR: test_feed_parser (lxml.tests.test_elementtree.ElementTreeTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/unittest.py", line 260, in run testMethod() File "/Users/maru/source/cvs+svn/lxml/src/lxml/tests/ test_elementtree.py", line 3011, in test_feed_parser parser = self.etree.XMLParser() AttributeError: 'module' object has no attribute 'XMLParser' ERROR: test_tostring_method_text (lxml.tests.test_elementtree.ElementTreeTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/unittest.py", line 260, in run testMethod() File "/Users/maru/source/cvs+svn/lxml/src/lxml/tests/ test_elementtree.py", line 2420, in test_tostring_method_text tostring(a, method="text")) TypeError: tostring() got an unexpected keyword argument 'method' Other than that everything seems to work normally, at least the features I use (validation and xpath most often). /mike -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2417 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080204/c9cef774/attachment.bin From mike at it-loops.com Mon Feb 4 16:55:28 2008 From: mike at it-loops.com (Michael Guntsche) Date: Mon, 4 Feb 2008 16:55:28 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X In-Reply-To: <47A6D0B1.5020600@behnel.de> References: <47A6D0B1.5020600@behnel.de> Message-ID: <2C6DA784-C579-45F7-919B-159909D6325B@it-loops.com> On Feb 4, 2008, at 9:45, Stefan Behnel wrote: > When building lxml, please move any installed Cython versions out > of the way > and run the build on the unpacked lxml-2.0.tar.gz release sources. > It must say > "trying to build without Cython" at the beginning. > Slight difference here Cython 0.9.6.11 and lxml from current trunk. > Please provide the following information: > > - what package management system (fink/macports) do you use? > > - are you using the stock Python or one that is installed separately? > > - what library versions are you using of libxml2, libxslt, zlib, > libiconv? > > - which library versions are preinstalled on your platform? > (I do not know how to find that out, can anyone provide > instructions here?) > With tiger the preinstalled libraries are lxml2: 2.6.16 lxslt: 1.1.11 To get this information just run /usr/bin/xml2-config --version /usr/bin/xslt-config --version Leopard should have newer libraries but I have no machine right now to check that. > - what library versions does lxml.etree find? (see below) > The OS is Macosx Tiger 10.4.11 Python is version 2.5.1 universal binary from www.python.org package management for (libxml2, libxslt) is macports TESTED VERSION: 2.0.0-51232 Python: (2, 5, 1, 'final', 0) lxml.etree: (2, 0, 0, 51232) libxml used: (2, 6, 31) libxml compiled: (2, 6, 31) libxslt used: (1, 1, 22) libxslt compiled: (1, 1, 22) All libraries are universal built libraries from macports and linked dynamically. 
I have not tried static, since macports by default ONLY compiles dynamic libraries and everything seems to be working fine for me.

>
> Just in case there are people who actually have a working installation,
>
> - has anyone successfully built lxml statically against libxml2/libxslt?
>

As said before, dynamically linked only.

> * does it work reliably? (see "Testing" below)

ot-test runs through without any crashes. Threads 7-9 throw exceptions though, as you said.

> * did you build with the --without-threads option?
> * does it work with *and* without that option?
>

Dynamically linked only.

> - has anyone managed to get lxml working reliably (see "Testing" below) with
> a dynamic build?
>
> * did you set DYLD_LIBRARY_PATH?
> * is DYLD_LIBRARY_PATH required for you or does it work without?
> * is there anything special you did to make it work?
>

DYLD_LIBRARY_PATH is not set at all. All I had to do for lxml to find the correct libraries was to make sure that xml2-config and xslt-config from my macports installation are found BEFORE the stock versions in /usr/bin. This way etree links against the correct libraries.

> * if there are crashes, is the install unusable or are there things you can
> still do reliably?

No crashes at all.

If you need any more information just tell me.

Kind regards,
Michael

From stefan_ml at behnel.de Mon Feb 4 19:43:08 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 04 Feb 2008 19:43:08 +0100
Subject: [lxml-dev] Build problems under MacOS-X
In-Reply-To:
References: <47A40F11.8070401@behnel.de> <47A43CD8.1020308@behnel.de> <47A456C2.8020108@behnel.de> <7638A5A1-E559-4035-A84A-142AECEAE27E@it-loops.com> <47A6BA12.5030706@behnel.de>
Message-ID: <47A75CBC.201@behnel.de>

Hi,

Michael Guntsche wrote:
> I upgraded Cython to 0.9.6.11 and recompiled lxml. Now make test, no
> longer segfaults [...]
> everything seems to work normally, at least the features
> I use (validation and xpath most often).

Ah, that's good. You actually need Cython 0.9.6.11b(!) for a reliable install, but to keep these minor versions from biting people, I just updated the build page to make clear that you'd better not install Cython for a normal release build, and use the provided (and tested) C sources instead.

http://codespeak.net/lxml/build.html#cython

> but I get several errors in test_elementtree.py
>
> Examples:
>
> ERROR: test_feed_parser (lxml.tests.test_elementtree.ElementTreeTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
> testMethod()
> File "/Users/maru/source/cvs+svn/lxml/src/lxml/tests/test_elementtree.py", line 3011, in test_feed_parser
> parser = self.etree.XMLParser()
> AttributeError: 'module' object has no attribute 'XMLParser'

That's just fine. The compatibility tests require ET 1.3 (currently only in SVN) to run. I'll just have to check why these tests aren't disabled for older (c)ET versions yet.
Stefan From stefan_ml at behnel.de Mon Feb 4 19:55:32 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 04 Feb 2008 19:55:32 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X In-Reply-To: <2C6DA784-C579-45F7-919B-159909D6325B@it-loops.com> References: <47A6D0B1.5020600@behnel.de> <2C6DA784-C579-45F7-919B-159909D6325B@it-loops.com> Message-ID: <47A75FA4.6000406@behnel.de> Hi, Michael Guntsche wrote: > On Feb 4, 2008, at 9:45, Stefan Behnel wrote: > >> When building lxml, please move any installed Cython versions out of >> the way >> and run the build on the unpacked lxml-2.0.tar.gz release sources. It >> must say >> "trying to build without Cython" at the beginning. > > Slight difference here Cython 0.9.6.11 and lxml from current trunk. :) ... if even that works, I really don't know what ... > With tiger the preinstalled libraries are > > lxml2: 2.6.16 > lxslt: 1.1.11 > > To get this information just run > > /usr/bin/xml2-config --version > /usr/bin/xslt-config --version Ah, sure, thanks. Those libraries are definitely too old and too buggy. > ot-test runs through without any crashes. Threads 7-9 throw exceptions > though, as you said. Then that really looks like a safe setup. > DYLD_LIBRARY_PATH is not set at all. All I had to do for lxml to find > the correct libraries was to make sure > that xml2-config and xslt-config from my macports installation is found > BEFORE the stock versions in /usr/bin. > This way etree links against the correct libraries. Hmmm, could others comment on this? Does this make a difference? Especially for those, who must currently set the DYLD var? 
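
To check the PATH ordering Michael describes, something like the following sketch might help. It lists every xslt-config and xml2-config on the search path in lookup order, so the first hit is the one a build would run if it simply calls the tool by name. That last part is an assumption about how the build locates the tool. The function name find_all() is invented for this example, and the code is Python 3 on a POSIX system.

```python
# Sketch: show which *-config script wins on PATH and which ones it shadows.
import os


def find_all(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("xslt-config", "xml2-config"):
        hits = find_all(tool)
        print(tool, "->", hits[0] if hits else "not found")
        for shadowed in hits[1:]:
            print("    shadowed:", shadowed)
```

If /usr/bin/xslt-config shows up first, the build will most likely pick up the stock system libraries rather than the newer ones from macports or fink.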
Stefan

From mike at it-loops.com Mon Feb 4 20:09:35 2008
From: mike at it-loops.com (Michael Guntsche)
Date: Mon, 4 Feb 2008 20:09:35 +0100
Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X
In-Reply-To: <47A75FA4.6000406@behnel.de>
References: <47A6D0B1.5020600@behnel.de> <2C6DA784-C579-45F7-919B-159909D6325B@it-loops.com> <47A75FA4.6000406@behnel.de>
Message-ID: <038D47D3-0F6B-419F-9F9A-B9E1EB9E31F3@it-loops.com>

On Feb 4, 2008, at 19:55, Stefan Behnel wrote:

>> DYLD_LIBRARY_PATH is not set at all. All I had to do for lxml to find
>> the correct libraries was to make sure
>> that xml2-config and xslt-config from my macports installation is found
>> BEFORE the stock versions in /usr/bin.
>> This way etree links against the correct libraries.
>
> Hmmm, could others comment on this? Does this make a difference? Especially
> for those, who must currently set the DYLD var?
>

Just had a look at setupinfo.py and CHANGES.txt: only "xslt-config" is used to find the correct CFLAGS and libs, but since xml2-config and xslt-config should always be in the same directory, this should not be much of a problem. So just make sure the new xslt-config is found first and you should be safe.

/Michael

From cz at gocept.com Tue Feb 5 14:01:32 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Tue, 5 Feb 2008 14:01:32 +0100
Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X
References: <47A6D0B1.5020600@behnel.de>
Message-ID:

Hey,

On 2008-02-04 09:45:37 +0100, Stefan Behnel said:

> Hi,
>
> it looks like many Mac users have problems with lxml on their platform. This
> usually involves installing up-to-date dependencies (libxml2/libxslt) in
> addition to the system libraries.
> I would like to get these problems resolved.
>
> To get a start on this, we must collect some information. We had a few
> reports, but I need to know in more detail what people did, what they tried,
> and to what avail. So here is a list of questions for Mac users. Please help
> us by answering them. Some instructions follow at the end.
>
> When building lxml, please move any installed Cython versions out of the way
> and run the build on the unpacked lxml-2.0.tar.gz release sources. It must say
> "trying to build without Cython" at the beginning.
>
> Please provide the following information:
>
> - what package management system (fink/macports) do you use?

We use buildout for development and deployment. Via buildout we build basically everything, to be sure we get consistent results:

[libxml2]
recipe = zc.recipe.cmmi
url = http://ftp.gnome.org/pub/GNOME/sources/libxml2/2.6/libxml2-2.6.26.tar.gz
extra_options = --without-python

[libxslt]
recipe = zc.recipe.cmmi
url = http://ftp.gnome.org/pub/GNOME/sources/libxslt/1.1/libxslt-1.1.16.tar.bz2
extra_options = --with-libxml-prefix=${buildout:directory}/parts/libxml2/ --without-python

[lxml]
recipe = zc.recipe.egg:custom
egg = lxml
include-dirs = ${buildout:directory}/parts/libxml2/include/libxml2
               ${buildout:directory}/parts/libxslt/include
library-dirs = ${buildout:directory}/parts/libxml2/lib
               ${buildout:directory}/parts/libxslt/lib
rpath = ${buildout:directory}/parts/libxml2/lib
        ${buildout:directory}/parts/libxslt/lib

>
> - are you using the stock Python or one that is installed separately?

Custom-built Python with *nothing* else installed.

[....]

I'll look into the other things when I've got more time. But basically, since using buildout we're fine :)

--
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax.
+49 345 12298891 From sidnei at enfoldsystems.com Tue Feb 5 18:00:59 2008 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Tue, 5 Feb 2008 15:00:59 -0200 Subject: [lxml-dev] requesting lxml testimonials? In-Reply-To: <47A1D22F.8050702@behnel.de> References: <47A1D22F.8050702@behnel.de> Message-ID: On Jan 31, 2008 11:50 AM, Stefan Behnel wrote: > I started a FAQ entry "Who uses lxml?": > > http://codespeak.net/svn/lxml/trunk/doc/FAQ.txt > > It will go online with the release of lxml 2.0 tomorrow (although maybe I > should wait a little longer, this has been a suspiciously calm week on the > list). Anyway, I hope that people will start bugging me why their own link is > missing. ;) > > But I agree, there should be some quotes on the web site also, and maybe even > the FAQ entry should be placed (or referenced) more prominently... The upcoming Enfold Proxy 4 release will support applying XSLT transformations to proxied pages: http://www.enfoldsystems.com/Products/Proxy/4 -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Tue Feb 5 21:22:32 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 05 Feb 2008 21:22:32 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X In-Reply-To: References: <47A6D0B1.5020600@behnel.de> Message-ID: <47A8C588.9060701@behnel.de> Hi, Christian Zagrodnick wrote: > We use buildout for the development/deployment. Via buildout we build > basically everything to be sure we get consistent results Martijn also made a buildout script for lxml a while ago: http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0 I guess this is really helpful. Definitely for production environments, but it might also come in handy for Mac users. Can someone enlighten me how finding libxml2/libxslt works here at runtime? Martijn, you suggested adding this to lxml back then. 
I think we should have this in SVN so that people can use it straight away. Stefan From cz at gocept.com Thu Feb 7 08:52:40 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Thu, 7 Feb 2008 08:52:40 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X In-Reply-To: <47A8C588.9060701@behnel.de> References: <47A6D0B1.5020600@behnel.de> <47A8C588.9060701@behnel.de> Message-ID: On 05.02.2008, at 21:22, Stefan Behnel wrote: > Hi, > > Christian Zagrodnick wrote: >> We use buildout for the development/deployment. Via buildout we build >> basically everything to be sure we get consistent results > > Martijn also made a buildout script for lxml a while ago: > > http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0 > > I guess this is really helpful. Definitely for production > environments, but it > might also come in handy for Mac users. > > Can someone enlighten me how finding libxml2/libxslt works here at > runtime? Well, the generated scripts use the compiled lxml: % grep lxml bin/test '/Users/zagy/.../develop-eggs/lxml-2.0-py2.4-macosx-10.5-i386.egg', And actually I thought the `rpath` option was there to do that: rpath: A new-line separated list of directories to search for dynamic libraries at run time. But that doesn't exactly seem to work as it really seems lxml would use the system libraries at runtime. Gotta ask jim. -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 From cz at gocept.com Thu Feb 7 16:36:02 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Thu, 7 Feb 2008 16:36:02 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X References: <47A6D0B1.5020600@behnel.de> <47A8C588.9060701@behnel.de> Message-ID: On 2008-02-07 08:52:40 +0100, Christian Zagrodnick said: > > On 05.02.2008, at 21:22, Stefan Behnel wrote: > >> Hi, >> >> Christian Zagrodnick wrote: >>> We use buildout for the development/deployment. 
>>> Via buildout we build
>>> basically everything to be sure we get consistent results
>>
>> Martijn also made a buildout script for lxml a while ago:
>>
>> http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0
>>
>> I guess this is really helpful. Definitely for production environments, but it
>> might also come in handy for Mac users.
>>
>> Can someone enlighten me how finding libxml2/libxslt works here at runtime?
>
> Well, the generated scripts use the compiled lxml:
>
> % grep lxml bin/test
> '/Users/zagy/.../develop-eggs/lxml-2.0-py2.4-macosx-10.5-i386.egg',
>
> And actually I thought the `rpath` option was there to do that:
>
> rpath: A new-line separated list of directories to search for dynamic
> libraries at run time.
>
> But that doesn't exactly seem to work as it really seems lxml would
> use the system libraries at runtime. Gotta ask jim.

Right, so actually buildout does the right thing. The main problem is that lxml runs the wrong xslt-config. So I was basically building libxml2 and libxslt just for the fun of it.

The question is whether lxml really always needs to call xslt-config, or how one would set the path in the buildout so that the right xslt-config is called.

If I manually set the path it works like a charm:

% PATH=`pwd`/parts/libxslt/bin:$PATH bin/buildout

--
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From cz at gocept.com Thu Feb 7 17:16:54 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Thu, 7 Feb 2008 17:16:54 +0100
Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing
Message-ID:

Hi there,

given the following little script:

import lxml.etree
import lxml.objectify

xml = lxml.objectify.XML(
    ''
    'titfoobar')

foo = xml['b'][0]
bar = xml['b'][1]
foo['text'] = 'FOO!'
print lxml.etree.tostring(xml, pretty_print=True) foo = xml.xpath('b[@name="b1"]')[0] bar = xml.xpath('b[@name="b2"]')[0] xml['b'] = [bar, foo] print lxml.etree.tostring(xml, pretty_print=True) The first print statement prints, where everything is fine: tit FOO! bar The second prints: bar tit FOO! Note the last References: Message-ID: <47AB358E.40800@behnel.de> Hi, Christian Zagrodnick wrote: > Note the last > I think is has something todo with the xpath, but I cannot really > figure out what. > > That's with lxml 2.0 and python 2.4 it's not related to XPath, it's just that you change the order of the children in your test: ------------------------- import lxml.etree import lxml.objectify xml = lxml.objectify.XML( '' 'titfoo' 'bar') print lxml.etree.tostring(xml, pretty_print=True) foo = xml['b'][0] bar = xml['b'][1] foo['text'] = 'FOO!' print lxml.etree.tostring(xml, pretty_print=True) foo = xml['b'][0] bar = xml['b'][1] xml['b'] = [bar, foo] print lxml.etree.tostring(xml, pretty_print=True) ------------------------- gives the same result. I don't even think it's related to objectify. I'll look into it when I find the time. Stefan From jwashin at vt.edu Thu Feb 7 22:48:16 2008 From: jwashin at vt.edu (Jim Washington) Date: Thu, 07 Feb 2008 16:48:16 -0500 Subject: [lxml-dev] zif.sedna Message-ID: <47AB7CA0.8080901@vt.edu> Hi, all The Zif Collective has released to the cheese shop a python adapter to Sedna, an open source "native" XML database. Sedna is available at http://modis.ispras.ru/sedna/ in binary and source form under an Apache 2.0 license. Sedna stores XML in documents and collections, and can support multiple "databases" of related documents and collections. Sedna has many of the features of relational databases, but instead of SQL it uses XQuery. Sedna extends XQuery to support updates, inserts, deletes, etc. Sedna supports ACID transactions. zif.sedna provides communication with a Sedna server in python. 
It also provides a dbapi-like interface (connections and cursors). XQuery can be cool. Getting your results in object form can be as simple as parsing in lxml.objectify. Enjoy! -Jim Washington From cz at gocept.com Fri Feb 8 08:14:06 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Fri, 8 Feb 2008 08:14:06 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing References: <47AB358E.40800@behnel.de> Message-ID: On 2008-02-07 17:45:02 +0100, Stefan Behnel said: > Hi, > > Christian Zagrodnick wrote: >> Note the last > >> I think is has something todo with the xpath, but I cannot really >> figure out what. >> >> That's with lxml 2.0 and python 2.4 > > it's not related to XPath, it's just that you change the order of the children > in your test: > > ------------------------- > import lxml.etree > import lxml.objectify > > xml = lxml.objectify.XML( > '' > 'titfoo' > 'bar') > > print lxml.etree.tostring(xml, pretty_print=True) > > foo = xml['b'][0] > bar = xml['b'][1] > foo['text'] = 'FOO!' > > print lxml.etree.tostring(xml, pretty_print=True) > > foo = xml['b'][0] > bar = xml['b'][1] > xml['b'] = [bar, foo] > > print lxml.etree.tostring(xml, pretty_print=True) > ------------------------- > > gives the same result. I don't even think it's related to objectify. Ah. I tried to reproduce it w/o the XPath but probably did something else different then. > > I'll look into it when I find the time. Thanks! -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 From cz at gocept.com Fri Feb 8 08:54:41 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Fri, 8 Feb 2008 08:54:41 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing References: <47AB358E.40800@behnel.de> Message-ID: On 2008-02-08 08:14:06 +0100, Christian Zagrodnick said: > On 2008-02-07 17:45:02 +0100, Stefan Behnel said: > > >> gives the same result. 
>> I don't even think it's related to objectify.
>
> Ah. I tried to reproduce it w/o the XPath but probably did something
> else different then.

Hum. It seems to only affect elements... that's strange.

--
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From jholg at gmx.de Fri Feb 8 08:52:22 2008
From: jholg at gmx.de (jholg at gmx.de)
Date: Fri, 08 Feb 2008 08:52:22 +0100
Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing
In-Reply-To:
References: <47AB358E.40800@behnel.de>
Message-ID: <20080208080037.46210@gmx.net>

Hi,

> > it's not related to XPath, it's just that you change the order of the children
> > in your test:
> >
> > -------------------------
> > import lxml.etree
> > import lxml.objectify
> >
> > [...]
> >
> > gives the same result. I don't even think it's related to objectify.

Yep, right. Here's without any objectify-ism:

>>> from lxml import etree
>>>
>>> xml = etree.fromstring(
...     ''
...     'titfoo'
...     'bar')
>>>
>>> print etree.tostring(xml, pretty_print=True)
    tit
    foo
    bar
>>>
>>> foo = xml[0]
>>> bar = xml[1]
>>>
>>> print etree.tostring(xml, pretty_print=True)
    tit
    foo
    bar
>>>
>>> xml[0] = bar
>>> xml.append(foo)
>>>
>>> print etree.tostring(xml, pretty_print=True)
    bar
    tit
    foo
>>>
>>> print etree.__version__
2.0.alpha4
>>>

FWIW this also fails with older versions, but:

- 1.2.1 fails with the objectify version, but not the "pure" etree one
- 1.3beta fails with the objectify version, but not the "pure" etree one

Holger

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20080208/9259dd64/attachment.htm From stefan_ml at behnel.de Fri Feb 8 09:46:05 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Feb 2008 09:46:05 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: <20080208080037.46210@gmx.net> References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> Message-ID: <47AC16CD.9070309@behnel.de> Hi Holger, jholg at gmx.de wrote: > Here's without any objectify-ism: [...] > FWIW this also fails with older versions, but: > - 1.2.1 fails with the objectify version, but not the "pure" etree one > - 1.3beta fails with the objectify version, but not the "pure" etree one Hmm, that's interesting. I never thought we were *that* robust before. ;) Anyway, I started by making objectify a bit more robust here, as slice assignment tends to have more corner cases than you might think. The problems usually arise when you replace an Element's children with themselves or with their own siblings (as you did in your example). I changed the implementation to copy *all* source nodes first, and *then* start replacing them in the target slice. Previously, they were copied over one by one while doing the replacements, so now you're on the safe side here. @Christian, please check out the current SVN trunk to see if it works for you (and take care you use Cython 0.9.6.11*b* to build it). BTW, I also removed the support for things like this: el.a[:] = [[el1, el2], [el3, el4]] which behaved just like el.a[:] = [el1, el2, el3, el4] although you can still do el.a = [el1, el2] I don't think anyone will really miss the first one, especially as it only worked for one list level anyway. Regarding the same problem in etree: I'm not sure, but it looks like libxml2 is behaving in a strange way here. 
When we copy a node to a new document root (xmlDocCopyNode), it creates the necessary namespaces on the new node, but in the example, I seem to be getting something like this back: tit FOO! Note the missing namespace declaration on the second child. It was redeclared on the first child when the node left the scope of the parent, however, it doesn't seem to get declared on the second child. But maybe that's just the serialisation and it looks different internally. I'll have to take a deeper look into it... Stefan From jholg at gmx.de Fri Feb 8 14:26:30 2008 From: jholg at gmx.de (jholg at gmx.de) Date: Fri, 08 Feb 2008 14:26:30 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: <47AC16CD.9070309@behnel.de> References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> Message-ID: <20080208134031.179990@gmx.net> Hi Stefan, ? > > their own siblings (as you did in your example). I changed the > implementation > to copy *all* source nodes first, and *then* start replacing them in the > target slice. Previously, they were copied over one by one while doing > the > replacements, so now you're on the safe side here. > ??I can confirm the objectify case works for me now: ?>>> >>> import lxml.etree >>> import lxml.objectify >>> >>> xml = lxml.objectify.XML( ...???? '' ...???? 'titfoo' ...???? 'bar') >>> >>> print lxml.etree.tostring(xml, pretty_print=True) ? ??? tit ??? foo ? ? ??? bar ? ? >>> >>> foo = xml['b'][0] >>> bar = xml['b'][1] >>> foo['text'] = 'FOO!' >>> >>> print lxml.etree.tostring(xml, pretty_print=True) ? ??? tit ??? FOO! ? ? ??? bar ? ? >>> >>> foo = xml['b'][0] >>> bar = xml['b'][1] >>> xml['b'] = [bar, foo] >>> >>> print lxml.etree.tostring(xml, pretty_print=True) ? ??? bar ? ? ??? tit ??? FOO! ? ? >>> print etree.__version__ 2.0.0-51328 ? > although you can still do > > el.a = [el1, el2] > > ? 
>>> l = [1, 2, 3, 4]
>>> l[2:3] = ["a", "b", "c", "d"]
>>> l
[1, 2, 'a', 'b', 'c', 'd', 4]
>>> root = objectify.Element("root")
>>> root.l = [1, 2, 3, 4]
>>> root.l[2:3] = ["a", "b", "c", "d"]
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    l = 1 [IntElement]
      * py:pytype = 'int'
    l = 2 [IntElement]
      * py:pytype = 'int'
    l = 'a' [StringElement]
      * py:pytype = 'str'
    l = 4 [IntElement]
      * py:pytype = 'int'
    l = 'b' [StringElement]
      * py:pytype = 'str'
    l = 'c' [StringElement]
      * py:pytype = 'str'
    l = 'd' [StringElement]
      * py:pytype = 'str'
>>> etree.tostring(root, pretty_print=True)
'\n  1\n  2\n  a\n  4\n  b\n  c\n  d\n\n'
>>> print etree.tostring(root, pretty_print=True)
  1
  2
  a
  4
  b
  c
  d
>>>

So the correct slice gets substituted, but the order is a bit confused.

> I don't think anyone will really miss the first one, especially as it only
> worked for one list level anyway.

Right. I think the slice assignment stuff is basically corner cases, but it's nice somehow :-) But I won't miss the first one for sure.

Cheers,
Holger

From cz at gocept.com Fri Feb 8 14:55:30 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Fri, 8 Feb 2008 14:55:30 +0100
Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing
References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net>
Message-ID:

On 2008-02-08 14:26:30 +0100, jholg at gmx.de said:

> Hi Stefan,
>
>> their own siblings (as you did in your example).
I changed the=20 >> implementation >> to copy *all* source nodes first, and *then* start replacing them in th= > e >> target slice. Previously, they were copied over one by one while doing=20 >> the >> replacements, so now you're on the safe side here. >> =20 > > =A0=A0I can confirm the objectify case works for me now: Thats good. Because I had some hard time trying to get a buildout with cython and lxml.. stopping that. :) -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 From stefan_ml at behnel.de Fri Feb 8 15:31:47 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Feb 2008 15:31:47 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: <20080208134031.179990@gmx.net> References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> Message-ID: <47AC67D3.80502@behnel.de> Hi, jholg at gmx.de wrote: > I can confirm the objectify case works for me now: > >>> l = [1, 2, 3, 4] > > >>> l[2:3] = ["a", "b", "c", "d"] > >>> l > [1, 2, 'a', 'b', 'c', 'd', 4] > >>> root = objectify.Element("root") > >>> root.l = [1, 2, 3, 4] > >>> root.l[2:3] = ["a", "b", "c", "d"] > >>> print objectify.dump(root) > root = None [ObjectifiedElement] > l = 1 [IntElement] > * py:pytype = 'int' > l = 2 [IntElement] > * py:pytype = 'int' > l = 'a' [StringElement] > * py:pytype = 'str' > l = 4 [IntElement] > * py:pytype = 'int' > l = 'b' [StringElement] > * py:pytype = 'str' > l = 'c' [StringElement] > * py:pytype = 'str' > l = 'd' [StringElement] > * py:pytype = 'str' > > So the correct slice gets substituted, but the order is a bit confused. Right, we were definitely missing test cases here. I'll add some to see if I can fix it. The first bunch is in SVN, some of them failing (seems to be __getitem__() already, which I didn't change). 
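For reference, here is a minimal sketch of the ordering a regression test would expect from the slice assignment above. It is an illustration only: it uses the standard library's xml.etree.ElementTree instead of lxml.objectify (an assumption, so that it runs without lxml), and the helper name make_children is made up. It shows the expected list semantics, not the objectify code path itself.

```python
import xml.etree.ElementTree as ET


def make_children(values):
    """Build one <l> element per value (hypothetical helper for this sketch)."""
    children = []
    for value in values:
        child = ET.Element("l")
        child.text = str(value)
        children.append(child)
    return children


# Reference behaviour: a plain Python list.
reference = [1, 2, 3, 4]
reference[2:3] = ["a", "b", "c", "d"]

# Same operation on the children of an element.
root = ET.Element("root")
root.extend(make_children([1, 2, 3, 4]))
root[2:3] = make_children(["a", "b", "c", "d"])

# The replacement items must appear in place of the slice, in order,
# with the trailing item (4) after them.
slice_result = [child.text for child in root]
print(slice_result)
```

The output reported earlier in the thread had the 4 before 'b', 'c' and 'd', which is what such a test should catch.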
> I think the slice assignment stuff is basically corner cases totally... Stefan From stefan_ml at behnel.de Fri Feb 8 19:51:30 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Feb 2008 19:51:30 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: <47AC67D3.80502@behnel.de> References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> <47AC67D3.80502@behnel.de> Message-ID: <47ACA4B2.9090106@behnel.de> Hi, Stefan Behnel wrote: > jholg at gmx.de wrote: >> I can confirm the objectify case works for me now: >>>>> l = [1, 2, 3, 4] >>>>> l[2:3] = ["a", "b", "c", "d"] >>>>> l >> [1, 2, 'a', 'b', 'c', 'd', 4] >>>>> root = objectify.Element("root") >>>>> root.l = [1, 2, 3, 4] >>>>> root.l[2:3] = ["a", "b", "c", "d"] >>>>> print objectify.dump(root) >> root = None [ObjectifiedElement] >> l = 1 [IntElement] >> * py:pytype = 'int' >> l = 2 [IntElement] >> * py:pytype = 'int' >> l = 'a' [StringElement] >> * py:pytype = 'str' >> l = 4 [IntElement] >> * py:pytype = 'int' >> l = 'b' [StringElement] >> * py:pytype = 'str' >> l = 'c' [StringElement] >> * py:pytype = 'str' >> l = 'd' [StringElement] >> * py:pytype = 'str' >> >> So the correct slice gets substituted, but the order is a bit confused. > > Right, we were definitely missing test cases here. I'll add some to see if I > can fix it. The first bunch is in SVN, some of them failing (seems to be > __getitem__() already, which I didn't change). Should be fixed on the trunk now. Stefan From stefan_ml at behnel.de Fri Feb 8 20:12:30 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Feb 2008 20:12:30 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: <20080208080037.46210@gmx.net> References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> Message-ID: <47ACA99E.9040201@behnel.de> Hi Holger, jholg at gmx.de wrote: > Yep, right. 
Here's without any objectify-ism: [...] > >>> xml[0] = bar > >>> xml.append(foo) > >>> > >>> print etree.tostring(xml, pretty_print=True) > > > bar > > > tit > foo > > > >>> print etree.__version__ > 2.0.alpha4 Hmmm, I don't get that with 2.0 final and libxml2 2.6.30. Could you double check that on your platform? Stefan From stefan_ml at behnel.de Fri Feb 8 20:43:18 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Feb 2008 20:43:18 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X In-Reply-To: References: <47A6D0B1.5020600@behnel.de> <47A8C588.9060701@behnel.de> Message-ID: <47ACB0D6.4070704@behnel.de> Hi, Christian Zagrodnick wrote: > The main problem is, that lxml runs the wrong xslt-config. So I was > basically building libxml2 and libxslt just for the fun of it. > > The question is if lxml really always needs to call xslt-config. Or how > one would set the path in the buildout so that the right xslt-config is > called. > > If I manually set the path it works like charm: > > % PATH=`pwd`/parts/libxslt/bin:$PATH bin/buildout You can now pass "--with-xslt-config=XXX" to setup.py. Stefan From stefan_ml at behnel.de Sat Feb 9 09:05:56 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 09 Feb 2008 09:05:56 +0100 Subject: [lxml-dev] Fwd: News flash: Python possibly guilty in excessive DTD traffic In-Reply-To: References: <20080209040312.725218a2@dartworks.biz> Message-ID: <47AD5EE4.5080600@behnel.de> Hi Sidnei, Sidnei da Silva wrote: > http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic > Does any of that apply to lxml? I don't think so, the article relates to DTD loading through urllib. lxml leaves that to libxml2's parser. > I suppose lxml supports dtd catalogs? Yes, libxml2 has catalog support (although you can compile that out), so it will normally see network access as a last resort to resolve external entities. > Does it cache dtds in any way? 
There is no internal document caching (except for repeated access to the same document during a single operation, e.g. in XSLT). If you do not provide catalogs on your system, that's your own 'decision'. You can still write your own caching resolver in that case, but I would consider catalogs the best solution to this problem. Stefan > ---------- Forwarded message ---------- > From: Guido van Rossum > Date: 2008/2/9 > Subject: [Web-SIG] Fwd: [Baypiggies] News flash: Python possibly > guilty in excessive DTD traffic > To: Web SIG > > > ---------- Forwarded message ---------- > From: Keith Dart ? > Date: Feb 8, 2008 8:03 PM > Subject: [Baypiggies] News flash: Python possibly guilty in excessive > DTD traffic > To: baypiggies at python.org > > > http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic > > This is interesting. I've noticed that when you use Python's XML > package in validating mode it does try to fetch the DTD. Be careful > when you use that. From cz at gocept.com Sat Feb 9 14:33:43 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Sat, 9 Feb 2008 14:33:43 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X References: <47A6D0B1.5020600@behnel.de> <47A8C588.9060701@behnel.de> <47ACB0D6.4070704@behnel.de> Message-ID: On 2008-02-08 20:43:18 +0100, Stefan Behnel said: > Hi, > > Christian Zagrodnick wrote: >> The main problem is, that lxml runs the wrong xslt-config. So I was >> basically building libxml2 and libxslt just for the fun of it. >> >> The question is if lxml really always needs to call xslt-config. Or how >> one would set the path in the buildout so that the right xslt-config is >> called. >> >> If I manually set the path it works like charm: >> >> % PATH=`pwd`/parts/libxslt/bin:$PATH bin/buildout > > You can now pass "--with-xslt-config=XXX" to setup.py. Gotta check how we best pass that along in buildout. -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 
06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From ebgssth at gmail.com Sun Feb 10 04:58:19 2008
From: ebgssth at gmail.com (js)
Date: Sun, 10 Feb 2008 12:58:19 +0900
Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X
In-Reply-To: <47A6D0B1.5020600@behnel.de>
References: <47A6D0B1.5020600@behnel.de>
Message-ID:

> - what package management system (fink/macports) do you use?

MacPorts

> - are you using the stock Python or one that is installed separately?

Python2.5.1 from MacPorts

> - what library versions are you using of libxml2, libxslt, zlib, libiconv?

% port installed |egrep 'libxml2|xslt|zlib|iconv'
  libiconv @1.12_0+darwin_8 (active)
  libxml2 @2.6.31_0 (active)
  libxslt @1.1.22_0 (active)
  py25-zlib @2.5.1_0 (active)
  zlib @1.2.3_1 (active)

> - which library versions are preinstalled on your platform?
> (I do not know how to find that out, can anyone provide instructions here?)

What library?

> - what library versions does lxml.etree find? (see below)

from lxml import etree
print "lxml.etree: ", etree.LXML_VERSION
lxml.etree: (2, 0, 0, 0)
print "libxml used: ", etree.LIBXML_VERSION
libxml used: (2, 6, 31)
print "libxml compiled: ", etree.LIBXML_COMPILED_VERSION
libxml compiled: (2, 6, 31)
print "libxslt used: ", etree.LIBXSLT_VERSION
libxslt used: (1, 1, 22)
print "libxslt compiled: ", etree.LIBXSLT_COMPILED_VERSION
libxslt compiled: (1, 1, 22)

> - does it crash when running the normal tests?

Ran 1101 tests in 12.232s

FAILED (errors=14)
make: *** [test_inplace] Error 1

From pw_lists at slinkp.com Mon Feb 11 08:04:11 2008
From: pw_lists at slinkp.com (Paul Winkler)
Date: Mon, 11 Feb 2008 02:04:11 -0500
Subject: [lxml-dev] Bug in lxml.html with nameless form submit?
Message-ID: <20080211070411.GB11341@slinkp.com> This is a pretty common idiom in html - to have a submit button that has no name: But lxml.html barfs if you use the form's fields.items() or fields.values() method: >>> import lxml.html >>> tree = lxml.html.fromstring(''' ... ...
... ... ...
... ... ''') >>> tree >>> tree.forms tree.forms >>> tree.forms[0] >>> tree.forms[0].fields >>> tree.forms[0].fields.keys() [None, 'foo'] >>> tree.forms[0].fields.items() Traceback (most recent call last): File "", line 1, in ? File "/usr/lib/python2.4/UserDict.py", line 112, in items return list(self.iteritems()) File "/usr/lib/python2.4/UserDict.py", line 101, in iteritems yield (k, self[k]) File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 749, in __getitem__ return self.inputs[item].value File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 811, in __getitem__ raise KeyError( KeyError: 'No input element with the name None' >>> tree.forms[0].fields.values() Traceback (most recent call last): File "", line 1, in ? File "/usr/lib/python2.4/UserDict.py", line 110, in values return [v for _, v in self.iteritems()] File "/usr/lib/python2.4/UserDict.py", line 101, in iteritems yield (k, self[k]) File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 749, in __getitem__ return self.inputs[item].value File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 811, in __getitem__ raise KeyError( KeyError: 'No input element with the name None' If you give the submit element a name attribute, this doesn't happen. -- Paul Winkler http://www.slinkp.com From jholg at gmx.de Mon Feb 11 09:24:35 2008 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 11 Feb 2008 09:24:35 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: <47ACA99E.9040201@behnel.de> References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47ACA99E.9040201@behnel.de> Message-ID: <20080211082743.196110@gmx.net> Hi Stefan, > jholg at gmx.de wrote: > > Yep, right. Here's without any objectify-ism: > [...] 
> > >>> print etree.__version__ > > 2.0.alpha4 > > Hmmm, I don't get that with 2.0 final and libxml2 2.6.30. Could you > double > check that on your platform? ?You're right. I just checked 2.0 final and the current trunk, and I don't see that problem with neither of them (libxml2 2.6.27): ?>>> from lxml import etree >>> xml = etree.fromstring( ...????? '' ...????? 'titfoo' ...????? 'bar') >>> >>> print etree.tostring(xml, pretty_print=True) ? ??? tit ??? foo ? ? ??? bar ? ? >>> >>> foo = xml[0] >>> bar = xml[1] >>> print etree.tostring(xml, pretty_print=True) ? ??? tit ??? foo ? ? ??? bar ? ? >>> >>> xml[0] = bar >>> xml.append(foo) >>> >>> print etree.tostring(xml, pretty_print=True) ? ??? bar ? ? ??? tit ??? foo ? ? >>> print etree.__version__ 2.0.0-51192 >>> print etree.LIBXML_VERSION (2, 6, 27) >>> print etree.LIBXSLT_VERSION (1, 1, 20) >>> ?Holger -- Psssst! Schon vom neuen GMX MultiMessenger geh?rt? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20080211/a13ec095/attachment-0001.htm From cz at gocept.com Mon Feb 11 10:12:43 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Mon, 11 Feb 2008 10:12:43 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> Message-ID: On 2008-02-08 14:55:30 +0100, Christian Zagrodnick said: > On 2008-02-08 14:26:30 +0100, jholg at gmx.de said: > >> > >> > >> Hi Stefan, >> > >> =A0 >>> =20 >> > >>> their own siblings (as you did in your example). I changed the=20 >>> implementation >>> to copy *all* source nodes first, and *then* start replacing them in th > >> e >>> target slice. Previously, they were copied over one by one while doing > =20 >>> the >>> replacements, so now you're on the safe side here. 
>>> =20 >> > >> =A0=A0I can confirm the objectify case works for me now: > > Thats good. Because I had some hard time trying to get a buildout with > > cython and lxml.. stopping that. :) By the way, what is the release schedule. I'd need the fix rather soon in a released lxml :) -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 From stefan_ml at behnel.de Mon Feb 11 10:52:19 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 11 Feb 2008 10:52:19 +0100 Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing In-Reply-To: References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> Message-ID: <47B01AD3.2060308@behnel.de> Hi, Christian Zagrodnick wrote: > On 2008-02-08 14:55:30 +0100, Christian Zagrodnick said: > >> On 2008-02-08 14:26:30 +0100, jholg at gmx.de said: >> >>> Hi Stefan, >>> >>> =A0 >>>> =20 >>>> their own siblings (as you did in your example). I changed the=20 >>>> implementation >>>> to copy *all* source nodes first, and *then* start replacing them in th >>> e >>>> target slice. Previously, they were copied over one by one while doing >> =20 >>>> the >>>> replacements, so now you're on the safe side here. >>>> =20 >>> =A0=A0I can confirm the objectify case works for me now: >> Thats good. Because I had some hard time trying to get a buildout with >> >> cython and lxml.. stopping that. :) > > By the way, what is the release schedule. I'd need the fix rather soon > in a released lxml :) Does the xslt-config option work for you in buildout? (and how?) I do not currently have any other pending bugs, so I would release as soon as I know everything is fine so far. Stefan From stefan_ml at behnel.de Mon Feb 11 12:12:49 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 11 Feb 2008 12:12:49 +0100 Subject: [lxml-dev] Bug in lxml.html with nameless form submit? 
In-Reply-To: <20080211070411.GB11341@slinkp.com> References: <20080211070411.GB11341@slinkp.com> Message-ID: <47B02DB1.1050501@behnel.de> Hi, Paul Winkler wrote: > This is a pretty common idiom in html - to have a submit button that > has no name: > > Definitely valid HTML: http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.4 > But lxml.html barfs if you use the form's fields.items() or > fields.values() method: > >>>> import lxml.html >>>> tree = lxml.html.fromstring(''' > ... > ...
> ... > ... > ...
> ... > ... ''') >>>> tree > >>>> tree.forms > tree.forms >>>> tree.forms[0] > >>>> tree.forms[0].fields > >>>> tree.forms[0].fields.keys() > [None, 'foo'] >>>> tree.forms[0].fields.items() > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/lib/python2.4/UserDict.py", line 112, in items > return list(self.iteritems()) > File "/usr/lib/python2.4/UserDict.py", line 101, in iteritems > yield (k, self[k]) > File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 749, in __getitem__ > return self.inputs[item].value > File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 811, in __getitem__ > raise KeyError( > KeyError: 'No input element with the name None' >>>> tree.forms[0].fields.values() > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/lib/python2.4/UserDict.py", line 110, in values > return [v for _, v in self.iteritems()] > File "/usr/lib/python2.4/UserDict.py", line 101, in iteritems > yield (k, self[k]) > File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 749, in __getitem__ > return self.inputs[item].value > File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 811, in __getitem__ > raise KeyError( > KeyError: 'No input element with the name None' Looks like a bug to me. Ian? 
Stefan

From cz at gocept.com Mon Feb 11 15:12:41 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Mon, 11 Feb 2008 15:12:41 +0100
Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing
References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> <47B01AD3.2060308@behnel.de>
Message-ID:

On 2008-02-11 10:52:19 +0100, Stefan Behnel said:

> Hi,
>
> Christian Zagrodnick wrote:
>> On 2008-02-08 14:55:30 +0100, Christian Zagrodnick said:
>>
>>> On 2008-02-08 14:26:30 +0100, jholg at gmx.de said:
>>>
>>>> Hi Stefan,
>>>>
>>>>> their own siblings (as you did in your example). I changed the
>>>>> implementation to copy *all* source nodes first, and *then* start
>>>>> replacing them in the target slice. Previously, they were copied
>>>>> over one by one while doing the replacements, so now you're on the
>>>>> safe side here.
>>>>
>>>> I can confirm the objectify case works for me now:
>>>
>>> That's good. Because I had a hard time trying to get a buildout with
>>> cython and lxml... stopping that. :)
>>
>> By the way, what is the release schedule? I'd need the fix rather soon
>> in a released lxml :)
>
> Does the xslt-config option work for you in buildout? (and how?) I do not
> currently have any other pending bugs, so I would release as soon as I know
> everything is fine so far.

The xslt-config option isn't as straightforward as one might think
because buildout calls easy_install where passing arguments is not
possible (afaik, not in buildout anyway). Passing environment variables
is very well possible though. So I'm pondering about setting env-vars
and a recipe. Will discuss that on the distutils mailing list.

-- 
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax.
+49 345 12298891

From stefan_ml at behnel.de Mon Feb 11 15:24:38 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 11 Feb 2008 15:24:38 +0100
Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing
In-Reply-To:
References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> <47B01AD3.2060308@behnel.de>
Message-ID: <47B05AA6.6070907@behnel.de>

Hi,

Christian Zagrodnick wrote:
> The xslt-config option isn't as straightforward as one might think
> because buildout calls easy_install where passing arguments is not
> possible (afaik, not in buildout anyway). Passing environment variables
> is very well possible though. So I'm pondering about setting env-vars
> and a recipe.

Ok, but then it should be enough if you manage to set PATH in the recipe, no
need to change anything else in lxml.

> Will discuss that on the distutils mailing list.

I'd be happy to know their answer.

Stefan

From cz at gocept.com Mon Feb 11 15:29:41 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Mon, 11 Feb 2008 15:29:41 +0100
Subject: [lxml-dev] Bug: type annotation namespace-prefix goes missing
References: <47AB358E.40800@behnel.de> <20080208080037.46210@gmx.net> <47AC16CD.9070309@behnel.de> <20080208134031.179990@gmx.net> <47B01AD3.2060308@behnel.de> <47B05AA6.6070907@behnel.de>
Message-ID:

On 2008-02-11 15:24:38 +0100, Stefan Behnel said:

> Hi,
>
> Christian Zagrodnick wrote:
>> The xslt-config option isn't as straightforward as one might think
>> because buildout calls easy_install where passing arguments is not
>> possible (afaik, not in buildout anyway). Passing environment variables
>> is very well possible though. So I'm pondering about setting env-vars
>> and a recipe.
>
> Ok, but then it should be enough if you manage to set PATH in the recipe, no
> need to change anything else in lxml.
>
>> Will discuss that on the distutils mailing list.
>
> I'd be happy to know their answer.
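A minimal sketch of the "set PATH" idea, for anyone who wants to drive the build from Python. It assumes the private libxslt is installed under parts/libxslt/bin, as in the shell one-liner earlier in this thread. The helper name env_with_private_xslt is hypothetical and not part of lxml or buildout.

```python
import os


def env_with_private_xslt(bin_dir, base_env=None):
    """Return a copy of the environment with bin_dir first on PATH.

    Putting the private libxslt bin directory first means that a plain
    "xslt-config" lookup finds that copy before any system-wide one.
    """
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    return env


# Assumed layout, taken from the shell example in this thread.
build_env = env_with_private_xslt(os.path.join(os.getcwd(), "parts", "libxslt", "bin"))
print(build_env["PATH"].split(os.pathsep)[0])
```

The resulting dictionary could then be passed as the env argument of subprocess.call() when starting bin/buildout. Whether a buildout recipe can do the same thing internally is exactly the open question above.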
Sure, I'll keep you informed. It would be great to have the buildout
configuration snippet on codespeak to be able to include it in the
buildout like this:

    [buildout]
    extends = http://lxml.codespeak.net/lxml-buildout-config
    parts = libxml2 libxslt lxml

Once I've got a buildout running which does all that, I'll post the
necessary information.

-- 
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From Olivier.Collioud at wipo.int Mon Feb 11 19:07:56 2008
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Mon, 11 Feb 2008 19:07:56 +0100
Subject: [lxml-dev] c14n, pretty printing and diffing
Message-ID:

Hello,

I would like to use my favourite text diffing tool to compare XML
files.

Is there a way to produce a pretty printed canonical version of my XML
files using lxml?

Thanks,
Olivier.

------
World Intellectual Property Organization Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using.

From egrim at swri.org Mon Feb 11 19:39:10 2008
From: egrim at swri.org (Evan Grim)
Date: Mon, 11 Feb 2008 18:39:10 +0000 (UTC)
Subject: [lxml-dev] lxml 2.0 on Windows
Message-ID:

Greetings,

I'm a recent lxml convert and am developing a cross-platform application that
needs lxml on both Windows and Linux platforms. I notice that 2.0 was just
released and am wondering if whoever it is that so kindly makes the Windows
installers plans on releasing a 2.0 binary soon. If not, I'll just stick with
the latest 1.x series release so that I can be consistent on both platforms.
Does anyone know the plan?
Cheers, Evan From stefan_ml at behnel.de Mon Feb 11 19:56:28 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 11 Feb 2008 19:56:28 +0100 Subject: [lxml-dev] c14n, pretty printing and diffing In-Reply-To: References: Message-ID: <47B09A5C.1070308@behnel.de> Hi, Olivier Collioud wrote: > I would like to use my favourite text diffing tool to compare XML > files. Which is not lxml.html.diff, I assume? (I'm not sure how HTML specific that is, BTW). Also, for doctests, there is lxml.usedoctest that you can import (the lxml web pages use it for doctests). > Is their a way to produce a pretty printed canonical version of my XML > files using lxml ? Not using the c14n interface (libxml2 doesn't support it). Serialising by hand is not too hard, though. You can look at ElementTree._write() for an example: http://svn.effbot.org/public/elementtree/elementtree/ElementTree.py Stefan From sidnei at enfoldsystems.com Mon Feb 11 20:02:08 2008 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Mon, 11 Feb 2008 17:02:08 -0200 Subject: [lxml-dev] lxml 2.0 on Windows In-Reply-To: References: Message-ID: On Feb 11, 2008 4:39 PM, Evan Grim wrote: > Greetings, > > I'm a recent lxml convert and am developing a cross-platform application that > needs lxml on both windows and linux platforms. I notice that 2.0 was just > released and am wondering if whoever it is that so kindly makes the windows > installers plans on releasing a 2.0 binary soon. If not, I'll just stick with > the latest 1.x series release so that I can be consistent on both platforms. > Does anyone know the plan? Sorry, this fell off my radar. I am building a binary right now, should be up in the next few hours. 
-- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Mon Feb 11 20:44:30 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 11 Feb 2008 20:44:30 +0100 Subject: [lxml-dev] Fwd: News flash: Python possibly guilty in excessive DTD traffic In-Reply-To: <47AD5EE4.5080600@behnel.de> References: <20080209040312.725218a2@dartworks.biz> <47AD5EE4.5080600@behnel.de> Message-ID: <47B0A59E.9000502@behnel.de> Hi again, Stefan Behnel wrote: > Sidnei da Silva wrote: >> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic >> Does any of that apply to lxml? > > I don't think so, the article relates to DTD loading through urllib. lxml > leaves that to libxml2's parser. > > >> I suppose lxml supports dtd catalogs? > > Yes, libxml2 has catalog support (although you can compile that out), so it > will normally see network access as a last resort to resolve external entities. > > >> Does it cache dtds in any way? > > There is no internal document caching (except for repeated access to the same > document during a single operation, e.g. in XSLT). If you do not provide > catalogs on your system, that's your own 'decision'. You can still write your > own caching resolver in that case, but I would consider catalogs the best > solution to this problem. Two more things to add: lxml does not use validation by default, you have to explicitly enable it in a parser if you want to use it - in which case it should not be asked too much to make sure your catalogs are properly installed. :) Secondly, lxml 2.0 does not load referenced network resources by default. While it loads documents that you explicitly ask it to download by parsing from a URL, you will also have to explicitly tell it to enable network access for referenced resources like DTDs, schemas and the like, again, by configuring a parser. 
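To make the two points above concrete, here is a small sketch of the parser configuration involved. It assumes lxml 2.0 or later. The option names dtd_validation and no_network are keyword arguments of etree.XMLParser, and the variable names are only for illustration.

```python
from lxml import etree

# Default configuration: no DTD validation, and referenced resources
# such as DTDs are not fetched over the network.
offline_parser = etree.XMLParser()

# Explicit opt-in: validate against the DTD and allow the parser to
# fetch referenced resources from the network if nothing local
# (for example a catalog entry) resolves them.
validating_parser = etree.XMLParser(dtd_validation=True, no_network=False)

# Parsing a plain document with the default parser never touches the network.
doc = etree.fromstring("<root><child/></root>", offline_parser)
print(doc.tag)
```

The validating parser is only constructed here, not used, since the sample document carries no DTD to validate against.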
So, no, using lxml will not unexpectedly waste any network resources unless
you explicitly tell it to do so.

Stefan

From Olivier.Collioud at wipo.int Tue Feb 12 07:24:49 2008
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Tue, 12 Feb 2008 07:24:49 +0100
Subject: [lxml-dev] c14n, pretty printing and diffing
Message-ID:

Thanks Stefan.

I prefer visual diffing: the ones provided by Eclipse, TkDiff or
WinMerge.

I did not find any doc or usage example of lxml.usedoctest, could you
please give some pointer?

Let me share my simple (because I do not use any namespace, PI,
comment...) solution based on iterparse:

    depth = 0
    sourceTree = ElementTree.iterparse(open(inputFile, 'r'), events=("start", "end"))
    for event, elem in sourceTree:
        if event == "start":
            i = "\n" + depth*" "
            depth += 1
            outputFile.write('%s<%s' % (i,elem.tag))
            if len(elem.items()):
                attrs = elem.items()
                attrs.sort()
                outputFile.write(' ')
                outputFile.write(' '.join(['%s="%s"' % (a[0],a[1]) for a in attrs if a[0] != 'size']))
            if elem.text and elem.text.strip():
                outputFile.write('>%s' % elem.text.strip('\n').encode('utf-8'))
            elif len(elem):
                outputFile.write('>')
        if event == "end":
            if (elem.text and elem.text.strip()) or len(elem):
                outputFile.write('%s</%s>' % (i,elem.tag))
            else:
                outputFile.write('/>')
            if elem.tail and elem.tail.strip():
                outputFile.write(elem.tail.strip('\n').encode('utf-8'))
            depth -= 1
            elem.clear()

Olivier.

>>> Stefan Behnel 11/02/08 7:56 pm >>>
Hi,

Olivier Collioud wrote:
> I would like to use my favourite text diffing tool to compare XML
> files.

Which is not lxml.html.diff, I assume? (I'm not sure how HTML specific that
is, BTW). Also, for doctests, there is lxml.usedoctest that you can import
(the lxml web pages use it for doctests).

> Is there a way to produce a pretty printed canonical version of my XML
> files using lxml?

Not using the c14n interface (libxml2 doesn't support it). Serialising by hand
is not too hard, though.
You can look at ElementTree._write() for an example:

http://svn.effbot.org/public/elementtree/elementtree/ElementTree.py

Stefan

From Olivier.Collioud at wipo.int Tue Feb 12 07:26:20 2008
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Tue, 12 Feb 2008 07:26:20 +0100
Subject: [lxml-dev] c14n, pretty printing and diffing
Message-ID:

Thanks Stefan.

I prefer visual diffing: the ones provided by Eclipse, TkDiff or
WinMerge.

I did not find any doc or usage example of lxml.usedoctest, could you
please give some pointer?

Let me share my simple (because I do not use any namespace, PI,
comment...) solution based on iterparse:

    depth = 0
    sourceTree = ElementTree.iterparse(open(inputFile, 'r'), events=("start", "end"))
    for event, elem in sourceTree:
        if event == "start":
            i = "\n" + depth*" "
            depth += 1
            outputFile.write('%s<%s' % (i,elem.tag))
            if len(elem.items()):
                attrs = elem.items()
                attrs.sort()
                outputFile.write(' ')
                outputFile.write(' '.join(['%s="%s"' % (a[0],a[1]) for a in attrs if a[0] != 'size']))
            if elem.text and elem.text.strip():
                outputFile.write('>%s' % elem.text.strip('\n').encode('utf-8'))
            elif len(elem):
                outputFile.write('>')
        if event == "end":
            if (elem.text and elem.text.strip()) or len(elem):
                outputFile.write('%s</%s>' % (i,elem.tag))
            else:
                outputFile.write('/>')
            if elem.tail and elem.tail.strip():
                outputFile.write(elem.tail.strip('\n').encode('utf-8'))
            depth -= 1
            elem.clear()

Olivier.
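As an alternative to the streaming approach, here is a hedged sketch that serialises a fully parsed tree by hand, with sorted attributes and one element per line, so that two files can be compared with a text diff tool. It makes the same simplifying assumptions as the code above (no namespaces, PIs or comments), uses the standard library's xml.etree.ElementTree in Python 3 syntax, and strips surrounding whitespace from text, which changes whitespace-sensitive content. The function names canonical_lines and canonical_pretty are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def canonical_lines(elem, depth=0, indent="  "):
    """Yield one output line per tag or text chunk, attributes sorted by name."""
    pad = indent * depth
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value)) for name, value in sorted(elem.items())
    )
    text = (elem.text or "").strip()
    children = list(elem)
    if not children and not text:
        # The whole element is known here, so an empty tag can be emitted.
        yield "%s<%s%s/>" % (pad, elem.tag, attrs)
    elif not children:
        yield "%s<%s%s>%s</%s>" % (pad, elem.tag, attrs, escape(text), elem.tag)
    else:
        yield "%s<%s%s>" % (pad, elem.tag, attrs)
        if text:
            yield pad + indent + escape(text)
        for child in children:
            yield from canonical_lines(child, depth + 1, indent)
            tail = (child.tail or "").strip()
            if tail:
                yield pad + indent + escape(tail)
        yield "%s</%s>" % (pad, elem.tag)


def canonical_pretty(xml_text):
    """Parse xml_text and return the diff-friendly serialisation as a string."""
    return "\n".join(canonical_lines(ET.fromstring(xml_text)))


print(canonical_pretty('<r b="2" a="1"><x/><y>hi</y></r>'))
```

Working on the complete tree costs more memory than iterparse, but it avoids having to guess at "start" time whether an element will turn out to be empty.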
>>> Stefan Behnel 11/02/08 7:56 pm >>> Hi, Olivier Collioud wrote: > I would like to use my favourite text diffing tool to compare XML > files. Which is not lxml.html.diff, I assume? (I'm not sure how HTML specific that is, BTW). Also, for doctests, there is lxml.usedoctest that you can import (the lxml web pages use it for doctests). > Is their a way to produce a pretty printed canonical version of my XML > files using lxml ? Not using the c14n interface (libxml2 doesn't support it). Serialising by hand is not too hard, though. You can look at ElementTree._write() for an example: http://svn.effbot.org/public/elementtree/elementtree/ElementTree.py Stefan _______________________________________________ lxml-dev mailing list lxml-dev at codespeak.net http://codespeak.net/mailman/listinfo/lxml-dev ------ World Intellectual Property Organization Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using. From jholg at gmx.de Tue Feb 12 08:50:16 2008 From: jholg at gmx.de (jholg at gmx.de) Date: Tue, 12 Feb 2008 08:50:16 +0100 Subject: [lxml-dev] Fwd: News flash: Python possibly guilty in excessive DTD traffic In-Reply-To: <47B0A59E.9000502@behnel.de> References: <20080209040312.725218a2@dartworks.biz> <47AD5EE4.5080600@behnel.de> <47B0A59E.9000502@behnel.de> Message-ID: <20080212075913.239140@gmx.net> Hi, > Secondly, lxml 2.0 does not load referenced network resources by default. > While it loads documents that you explicitly ask it to download by > parsing > from a URL, you will also have to explicitly tell it to enable network > access > for referenced resources like DTDs, schemas and the like, again, by > configuring a parser. 
>

A question on this:

I don't see any problems when network-parsing a schema that includes other schemas:

>>> schema = etree.XMLSchema(root)
>>> print etree.__version__
2.0.0-51192
>>> root = objectify.parse("http://adevp02:8080/accountSummary-1.2.xsd").getroot()
>>> schema = etree.XMLSchema(root)
>>>

My simple http server says this:

adevp01.ae.hz.lbbw.sko.de - - [12/Feb/2008 08:49:19] "GET /accountSummary-1.2.xsd HTTP/1.0" 200 -
adevp01.ae.hz.lbbw.sko.de - - [12/Feb/2008 08:49:28] "GET /iso3currency.xsd HTTP/1.0" 200 -
adevp01.ae.hz.lbbw.sko.de - - [12/Feb/2008 08:49:28] "GET /iso3currency-1.0.xsd HTTP/1.0" 200 -

where the first GET is the parse operation and the 2nd & 3rd GET are the "schemafying" of the parsed doc.

Now, what I'm curious about is that I did never set no_network to False. Here's how I initialize lxml:

def _register():
    """Register lxml objectify module with pytaf standard settings.
    Needs not be explicitly called when importing xmsg from a pytaf installation
    as this is done on first xmsg module import.
    """
    # set a default parser that removes whitespace in mixed-content elements
    parser = etree.XMLParser(remove_blank_text=True)
    # enable ns/tag-based lookup that falls back on pytype/xsi:type/guess-lookup
    lookup = etree.ElementNamespaceClassLookup(
        objectify.ObjectifyElementClassLookup())
    parser.setElementClassLookup(lookup)
    # set our parser as objectify default parser
    objectify.setDefaultParser(parser)
    # Set our parser as etree default parser, too. Otherwise etree.Element()
    # returns etree._Element instead of ObjectifiedElements
    etree.setDefaultParser(parser)

    # enable recursive pretty-printing of ObjectifiedElements
    objectify.enableRecursiveStr()

Holger

--
GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20080212/1eeaaa81/attachment.htm

From Olivier.Collioud at wipo.int  Tue Feb 12 10:52:59 2008
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Tue, 12 Feb 2008 10:52:59 +0100
Subject: [lxml-dev] c14n, pretty printing and diffing
Message-ID:

For those interested in the iterparse method, the following is much better:

depth = 0
sourceTree = ElementTree.iterparse(open(inputDir+'/'+file, 'r'), events=("start", "end"))
for event, elem in sourceTree:
    if event == "start":
        i = "\n" + depth*" "
        depth += 1
        outputFile.write('%s<%s' % (i,elem.tag))
        if len(elem.items()):
            attrs = elem.items()
            attrs.sort()
            outputFile.write(' ')
            outputFile.write(' '.join(['%s="%s"' % (a[0],a[1]) for a in attrs if a[0] != 'size']))
        outputFile.write('>')
        if elem.text and elem.text.strip():
            outputFile.write(elem.text.strip('\n').encode('utf-8'))
    if event == "end":
        outputFile.write('%s</%s>' % (i,elem.tag))
        if elem.tail and elem.tail.strip():
            outputFile.write(elem.tail.strip('\n').encode('utf-8'))
        depth -= 1
        elem.clear()

because when event == 'start', len(elem) is always 0, and I don't know how to guess whether the element will have some content in order to produce an empty tag (or not). Therefore, the above code always produces an element end tag, even when there is no content.

>>> "Olivier Collioud" 12/02/08 7:26 am >>>
Thanks Stefan.

I prefer visual diffing: the ones provided by Eclipse, TkDiff or WinMerge.

I did not find any doc or usage example of lxml.usedoctest, could you please give some pointers?

Let me share my simple (because I do not use any namespace, PI, comment...)
solution based on iterparse:

depth = 0
sourceTree = ElementTree.iterparse(open(inputFile, 'r'), events=("start", "end"))
for event, elem in sourceTree:
    if event == "start":
        i = "\n" + depth*" "
        depth += 1
        outputFile.write('%s<%s' % (i,elem.tag))
        if len(elem.items()):
            attrs = elem.items()
            attrs.sort()
            outputFile.write(' ')
            outputFile.write(' '.join(['%s="%s"' % (a[0],a[1]) for a in attrs if a[0] != 'size']))
        if elem.text and elem.text.strip():
            outputFile.write('>%s' % elem.text.strip('\n').encode('utf-8'))
        elif len(elem):
            outputFile.write('>')
    if event == "end":
        if (elem.text and elem.text.strip()) or len(elem):
            outputFile.write('%s</%s>' % (i,elem.tag))
        else:
            outputFile.write('/>')
        if elem.tail and elem.tail.strip():
            outputFile.write(elem.tail.strip('\n').encode('utf-8'))
        depth -= 1
        elem.clear()

Olivier.

>>> Stefan Behnel 11/02/08 7:56 pm >>>
Hi,

Olivier Collioud wrote:
> I would like to use my favourite text diffing tool to compare XML
> files.

Which is not lxml.html.diff, I assume? (I'm not sure how HTML specific that is, BTW).

Also, for doctests, there is lxml.usedoctest that you can import (the lxml web pages use it for doctests).

> Is there a way to produce a pretty printed canonical version of my XML
> files using lxml ?

Not using the c14n interface (libxml2 doesn't support it). Serialising by hand is not too hard, though. You can look at ElementTree._write() for an example:

http://svn.effbot.org/public/elementtree/elementtree/ElementTree.py

Stefan
From stefan_ml at behnel.de Tue Feb 12 13:34:31 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 12 Feb 2008 13:34:31 +0100 (CET) Subject: [lxml-dev] c14n, pretty printing and diffing In-Reply-To: References: Message-ID: <23198.194.114.62.39.1202819671.squirrel@groupware.dvs.informatik.tu-darmstadt.de> >> "Olivier Collioud" wrote: > I did not fin any doc or usage example of lxml.usedoctest, > could you please give some pointer ? As I said, you can just import it, like all the doctests on the webpage do. http://codespeak.net/lxml/lxml2.html#new-modules Here is an example: http://codespeak.net/lxml/objectify.html Stefan From sidnei at enfoldsystems.com Tue Feb 12 14:24:45 2008 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Tue, 12 Feb 2008 11:24:45 -0200 Subject: [lxml-dev] lxml 2.0 on Windows In-Reply-To: References: Message-ID: Hi Evan, I've just uploaded the Windows Binaries for lxml 2.0. Thanks!
-- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Tue Feb 12 14:32:50 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 12 Feb 2008 14:32:50 +0100 (CET) Subject: [lxml-dev] network access in lxml API In-Reply-To: <20080212075913.239140@gmx.net> References: <20080209040312.725218a2@dartworks.biz> <47AD5EE4.5080600@behnel.de> <47B0A59E.9000502@behnel.de> <20080212075913.239140@gmx.net> Message-ID: <34637.194.114.62.39.1202823170.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Hi Holger, >> Secondly, lxml 2.0 does not load referenced network resources by >> default. >> While it loads documents that you explicitly ask it to download by >> parsing >> from a URL, you will also have to explicitly tell it to enable network >> access >> for referenced resources like DTDs, schemas and the like, again, by >> configuring a parser. > > A question on this: > > I don't see any problems when network-parsing a schema that includes other > schemas: > > >>> schema = etree.XMLSchema(root) >>>> print etree.__version__ > 2.0.0-51192 >>>> root = > objectify.parse("http://adevp02:8080/accountSummary-1.2.xsd").getroot() >>>> schema = etree.XMLSchema(root) Hmm, ok, I was refering to the parsers. Schema imports will always work - and I see no reason to disable them, as this would break schema handling. If you want to use a schema from the network that uses imports (or one that explicitly imports from a URL), be prepared for network access. Note that this is still different from *parsing* the schema document. You have to actually create an XMLSchema() object, which I find a pretty clear indication that you want to use the schema, thus requiring the imports to be resolved. But now that you mention it, I noticed that XSLT allows network access by default. This means that you can use imports, but also that you can do "document('http://evilsite.com')" in a stylesheet. 
I'm not sure into which category this falls, the schema-handling or the parsers, but it looks more like something that should not be restricted by default, as it's explicit in the stylesheet. In the parser case, you'd have to explicitly enable external loading anyway, so you can enable network access right in the same line of code. So, to sum it up, I think it's ok the way it is now, and it's also easy to use caching, just by not re-instantiating the schema/XSLT object too often. Stefan From egrim at swri.org Tue Feb 12 23:51:39 2008 From: egrim at swri.org (Evan Grim) Date: Tue, 12 Feb 2008 22:51:39 +0000 (UTC) Subject: [lxml-dev] lxml 2.0 on Windows References: Message-ID: Sidnei da Silva enfoldsystems.com> writes: > > Hi Evan, > > I've just uploaded the Windows Binaries for lxml 2.0. Thanks! > They worked perfectly. Thank you so much for the prompt reaction! From ianb at colorstudy.com Wed Feb 13 06:47:35 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 12 Feb 2008 23:47:35 -0600 Subject: [lxml-dev] Bug in lxml.html with nameless form submit? In-Reply-To: <47B02DB1.1050501@behnel.de> References: <20080211070411.GB11341@slinkp.com> <47B02DB1.1050501@behnel.de> Message-ID: <47B28477.3010102@colorstudy.com> I believe this is fixed in trunk (thanks for the test Paul); None just shouldn't have shown up in keys. Stefan Behnel wrote: > Hi, > > Paul Winkler wrote: >> This is a pretty common idiom in html - to have a submit button that >> has no name: >> >> > > Definitely valid HTML: > > http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.4 > > >> But lxml.html barfs if you use the form's fields.items() or >> fields.values() method: >> >>>>> import lxml.html >>>>> tree = lxml.html.fromstring(''' >> ... >> ...
>> ... >> ... >> ...
>> ... >> ... ''') >>>>> tree >> >>>>> tree.forms >> tree.forms >>>>> tree.forms[0] >> >>>>> tree.forms[0].fields >> >>>>> tree.forms[0].fields.keys() >> [None, 'foo'] >>>>> tree.forms[0].fields.items() >> Traceback (most recent call last): >> File "", line 1, in ? >> File "/usr/lib/python2.4/UserDict.py", line 112, in items >> return list(self.iteritems()) >> File "/usr/lib/python2.4/UserDict.py", line 101, in iteritems >> yield (k, self[k]) >> File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 749, in __getitem__ >> return self.inputs[item].value >> File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 811, in __getitem__ >> raise KeyError( >> KeyError: 'No input element with the name None' >>>>> tree.forms[0].fields.values() >> Traceback (most recent call last): >> File "", line 1, in ? >> File "/usr/lib/python2.4/UserDict.py", line 110, in values >> return [v for _, v in self.iteritems()] >> File "/usr/lib/python2.4/UserDict.py", line 101, in iteritems >> yield (k, self[k]) >> File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 749, in __getitem__ >> return self.inputs[item].value >> File "/usr/lib64/python2.4/site-packages/lxml-2.0-py2.4-linux-x86_64.egg/lxml/html/__init__.py", line 811, in __getitem__ >> raise KeyError( >> KeyError: 'No input element with the name None' > > Looks like a bug to me. Ian? > > Stefan > From ianb at colorstudy.com Wed Feb 13 06:53:13 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 12 Feb 2008 23:53:13 -0600 Subject: [lxml-dev] c14n, pretty printing and diffing In-Reply-To: <47B09A5C.1070308@behnel.de> References: <47B09A5C.1070308@behnel.de> Message-ID: <47B285C9.2030702@colorstudy.com> Stefan Behnel wrote: > Olivier Collioud wrote: >> I would like to use my favourite text diffing tool to compare XML >> files. > > Which is not lxml.html.diff, I assume? 
(I'm not sure how HTML specific that > is, BTW). lxml.html.diff is intended for presenting to readers, like a history diff of a page; it's content-focused, and ignores changes to markup (though it attempts to show the content inside the newest version of the markup). It's not all that HTML specific exactly; it would probably work okay with Docbook too. But it's very specific to presentational markup. Ian From stefan_ml at behnel.de Wed Feb 13 22:58:09 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 13 Feb 2008 22:58:09 +0100 Subject: [lxml-dev] lxml 2.0.1 released Message-ID: <47B367F1.1090903@behnel.de> Hi all, lxml 2.0.1 is out, changelog follows as usual. The most important changes may actually be the clean-up of the generated API documentation, including loads of added doc-strings all over the place. Have a look at it here: http://codespeak.net/lxml/api/index.html More documentation patches will definitely be appreciated! Have fun, Stefan 2.0.1 (2008-02-13) Features added * Child iteration in lxml.pyclasslookup. * Loads of new docstrings reflect the signature of functions and methods to make them visible in API docs and help() Bugs fixed * The module lxml.html.builder was duplicated as lxml.htmlbuilder * Form elements would return None for form.fields.keys() if there was an unnamed input field. Now unnamed input fields are completely ignored. * Setting an element slice in objectify could insert slice-overlapping elements at the wrong position. Other changes * The generated API documentation was cleaned up and disburdened from non-public classes etc. * The previously public module lxml.html.setmixin was renamed to lxml.html._setmixin as it is not an official part of lxml. If you want to use it, feel free to copy it over to your own source base. * Passing --with-xslt-config=/path/to/xslt-config to setup.py will override the xslt-config script that is used to determine the C compiler options. 
From marius at pov.lt Thu Feb 14 20:53:03 2008 From: marius at pov.lt (Marius Gedminas) Date: Thu, 14 Feb 2008 21:53:03 +0200 Subject: [lxml-dev] One-time memory leak? Message-ID: <20080214195303.GB24017@fridge.pov.lt> Hi! I've been using libxml2 (before lxml was even created) and I've built some infrastructure for catching libxml2 memory leaks in my unit tests. Recently I've started using lxml on a completely different project and noticed that my old leak watcher was hooked up -- because it reported a leak. This is most likely a false positive (the "leak" happens only once during the program's lifetime), but I'd like to understand what exactly happens. I'm attaching a short test program that produces this output on my machine: $ bin/python lxml-memleak.py test_libxml2_html: leaked 0 bytes test_libxml2_xml: leaked 0 bytes test_lxml_html: leaked 9423 bytes test_lxml_xml: leaked 9479 bytes This is in a virtualenv sandbox with lxml 2.0.1 from cheeseshop and system-wide libxml2 2.0.30 (plus a security patch or two) from Ubuntu Gutsy. Each of those tests was run in a separate Python process to avoid contamination. Note that if I run the same test more than once, I see no new leaks: $ bin/python lxml-memleak.py test_lxml_html 3 test_lxml_html: leaked 9423 bytes test_lxml_html: leaked 0 bytes test_lxml_html: leaked 0 bytes which leads me to think this "leak" is in fact harmless on-demand initialization of some sort. I would like to improve my leak detector to avoid false positives (it already does a funny dance with initParser/cleanupParser to do so). I've tried looking at the lxml source code but gave up in about 30 seconds. I don't know Cython. I can't tell which is generated code and which is the source for that. I cannot find the entry point that would let me trace how lxml.etree.HTML() is implemented ("HTML" is a pretty ungreppable string). ltrace'ing a Python process failed to notice any dynamic library calls to libxml2's functions. 
How can I translate the short lxml code snippet from lxml.etree import HTML doc = HTML(sample_document) del doc to low-level libxml2 library function calls and see where it allocates the extra memory? I could, of course, declare lxml to be leak-free and just disable my leak finder, but I cannot resist the opportunity to make sure of it (and for that I need a leak detector without false positives). Regards, Marius Gedminas -- Professionalism has no place in art, and hacking is art. Software Engineering might be science; but that's not what I do. I'm a hacker, not an engineer. -- jwz -------------- next part -------------- A non-text attachment was scrubbed... Name: lxml-memleak.py Type: text/x-python Size: 1372 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080214/dbf47b65/attachment.py -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://codespeak.net/pipermail/lxml-dev/attachments/20080214/dbf47b65/attachment.pgp From stefan_ml at behnel.de Thu Feb 14 22:02:49 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 14 Feb 2008 22:02:49 +0100 Subject: [lxml-dev] One-time memory leak? In-Reply-To: <20080214195303.GB24017@fridge.pov.lt> References: <20080214195303.GB24017@fridge.pov.lt> Message-ID: <47B4AC79.3090109@behnel.de> Hi, Marius Gedminas wrote: > I've been using libxml2 (before lxml was even created) and I've built > some infrastructure for catching libxml2 memory leaks in my unit tests. > Recently I've started using lxml on a completely different project and > noticed that my old leak watcher was hooked up -- because it reported a > leak. > > This is most likely a false positive (the "leak" happens only once during the > program's lifetime), but I'd like to understand what exactly happens. 
I'm > attaching a short test program that produces this output on my machine: > > $ bin/python lxml-memleak.py > test_libxml2_html: leaked 0 bytes > test_libxml2_xml: leaked 0 bytes > test_lxml_html: leaked 9423 bytes > test_lxml_xml: leaked 9479 bytes > > This is in a virtualenv sandbox with lxml 2.0.1 from cheeseshop and > system-wide libxml2 2.0.30 (plus a security patch or two) from Ubuntu > Gutsy. Each of those tests was run in a separate Python process to > avoid contamination. You're not testing the same thing, though. lxml does all sorts of stuff when you call etree.HTML(), not just a plain call to the parser. Also, I have no idea what happens when you use lxml and libxml2 together - which you still do here, as you call into libxml2 to enable leak debugging. > Note that if I run the same test more than once, I see no new leaks: > > $ bin/python lxml-memleak.py test_lxml_html 3 > test_lxml_html: leaked 9423 bytes > test_lxml_html: leaked 0 bytes > test_lxml_html: leaked 0 bytes > > which leads me to think this "leak" is in fact harmless on-demand > initialization of some sort. That may be so - but I wouldn't sign it without further investigation. :) > I've tried looking at the lxml source code but gave up in about 30 > seconds. I don't know Cython. I can't tell which is generated code and > which is the source for that. The .pyx and .pxi files are what you want to look at first (maybe I should write up some "how to read the source" docs...) And don't be afraid of Cython, it's a lot like Python, and there are some editors (and some ways of life like Emacs) that can display it with colourful syntax highlighting. > I cannot find the entry point that would > let me trace how lxml.etree.HTML() is implemented ("HTML" is a pretty > ungreppable string). There is a file called "lxml.etree.pyx", which is the main module. It contains the main API implementation. 
However, the HTML() function will quickly jump into "_parseMemoryDocument(...)", which is implemented at the end of the "parser.pxi" file. The call line then continues up to a call to _BaseParser._parseDoc(), where the actual parsing step is implemented. > ltrace'ing a Python process failed to notice any > dynamic library calls to libxml2's functions. Just guessing, but maybe that's because it needs to trace calls from a library dynamically loaded by Python? > How can I translate the short lxml code snippet > > from lxml.etree import HTML > doc = HTML(sample_document) > del doc > > to low-level libxml2 library function calls and see where it allocates > the extra memory? Hmmm, that's three lines of Python, but there really is a lot happening behind the scenes, so that's harder to answer than you might think. I don't know if you noticed, but parsers can do a lot of weird stuff in lxml, and whenever you parse a byte string in any of your threads, it will end up in that function in one way or another... I guess it would actually be easiest if you could get ltrace to work with a Python extension module... > I could, of course, declare lxml to be leak-free and just disable my > leak finder, but I cannot resist the opportunity to make sure of it (and > for that I need a leak detector without false positives). I totally find that a good idea. Normally, I use valgrind for leak debugging, but having something that you could switch on and off around a unit test would be just perfect. I'd say the best way would be to add a debugging module to lxml that would just call into the libxml2 debugging API. Stefan From cz at gocept.com Fri Feb 15 09:25:07 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Fri, 15 Feb 2008 09:25:07 +0100 Subject: [lxml-dev] lxml 2.0.1 released References: <47B367F1.1090903@behnel.de> Message-ID: On 2008-02-13 22:58:09 +0100, Stefan Behnel said: > Hi all, > > lxml 2.0.1 is out, changelog follows as usual. Works like charm. Thanks. 
--
Christian Zagrodnick

gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From stefan_ml at behnel.de  Fri Feb 15 10:40:08 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Feb 2008 10:40:08 +0100
Subject: [lxml-dev] Source code "meta" documentation
Message-ID: <47B55DF8.2030306@behnel.de>

Hi,

Stefan Behnel wrote (elsewhere):
> The .pyx and .pxi files are what you want to look at first (maybe I should
> write up some "how to read the source" docs...)

I started an initial document here:

http://codespeak.net/svn/lxml/trunk/doc/lxml-source-howto.txt

I'd be happy if others could contribute sections about the parts they know, such as lxml.objectify and lxml.html. Questions that help in focussing the document are also appreciated: what would you like to know for starters?

Stefan

From ebgssth at gmail.com  Fri Feb 15 13:41:38 2008
From: ebgssth at gmail.com (js)
Date: Fri, 15 Feb 2008 21:41:38 +0900
Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set
Message-ID:

HI,

I was trying to install lxml on FreeBSD 6.2 Xeon server with "easy_install lxml", but I got weird error "CPU you selected does not support x86-64 instruction set". I searched on the web with no luck.

Does anybody know this error?

From ebgssth at gmail.com  Fri Feb 15 14:30:00 2008
From: ebgssth at gmail.com (js)
Date: Fri, 15 Feb 2008 22:30:00 +0900
Subject: [lxml-dev] lxml 2.0.1 released
In-Reply-To:
References: <47B367F1.1090903@behnel.de>
Message-ID:

A quick question.
Does this version of lxml fixed segfault bug on Mac OSX?

make test passed all test on OSX 10.4.11.

$ make test
python setup.py build_ext -i
Building lxml version 2.0.1.
NOTE: Trying to build without Cython, pre-generated 'src/lxml/etree.c' needs to be available.
running build_ext
copying build/lib.macosx-10.3-i386-2.5/lxml/etree.so -> src/lxml
copying build/lib.macosx-10.3-i386-2.5/lxml/objectify.so -> src/lxml
copying build/lib.macosx-10.3-i386-2.5/lxml/pyclasslookup.so -> src/lxml
python test.py -p -v
TESTED VERSION: 2.0.1
Python: (2, 5, 1, 'final', 0)
lxml.etree: (2, 0, 1, 0)
libxml used: (2, 6, 31)
libxml compiled: (2, 6, 31)
libxslt used: (1, 1, 22)
libxslt compiled: (1, 1, 22)
881/881 (100.0%): Doctest: xpathxslt.txt
----------------------------------------------------------------------
Ran 881 tests in 9.743s

OK
PYTHONPATH=src python selftest.py
173 tests ok.
PYTHONPATH=src python selftest2.py
94 tests ok.

On Fri, Feb 15, 2008 at 5:25 PM, Christian Zagrodnick wrote:
> On 2008-02-13 22:58:09 +0100, Stefan Behnel said:
>
> > Hi all,
> >
> > lxml 2.0.1 is out, changelog follows as usual.
>
> Works like charm. Thanks.
>
> --
> Christian Zagrodnick
>
> gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
> www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891
>
> _______________________________________________
> lxml-dev mailing list
> lxml-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/lxml-dev
>

From stefan_ml at behnel.de  Fri Feb 15 15:29:01 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Feb 2008 15:29:01 +0100
Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set
In-Reply-To:
References:
Message-ID: <47B5A1AD.3090706@behnel.de>

Hi,

js wrote:
> I was trying to install lxml on FreeBSD 6.2 Xeon server with
> "easy_install lxml",
> but I got weird error "CPU you selected does not support x86-64
> instruction set".
> I searched on the web with no luck.
>
> Does anybody know this error?

You didn't give the complete error output, but check your CFLAGS (and the command line you see when the compiler starts up). Maybe there's something wrong with the compiler options that distutils use.
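One way to check those flags, sketched here under the assumption of a Unix-like build of Python 3 (on Windows the values are typically None), is to ask the interpreter which compiler settings it was built with, since those are the defaults that extension builds inherit. The helper name build_flags is made up for this illustration.

```python
import sysconfig


def build_flags(names=("CC", "CFLAGS", "OPT", "LDSHARED")):
    """Return the compiler-related settings recorded when Python was built."""
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    for name, value in build_flags().items():
        print("%-8s %s" % (name, value))
```

A CPU-specific option such as -march or -mtune showing up in CFLAGS or OPT would be the first thing to compare against the machine the build actually runs on.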
Stefan

From xphuture at gmail.com  Fri Feb 15 21:06:47 2008
From: xphuture at gmail.com (Fabien)
Date: Fri, 15 Feb 2008 21:06:47 +0100
Subject: [lxml-dev] Setting default namespace
Message-ID: <622afeaa0802151206s62b46a02j634ecfd2a49eb4a8@mail.gmail.com>

Hello,

I have a document with a namespace on its root element, and when I display an element's .tag, I get it with the namespace. Also, when I'm using xpath(), I need to pass the namespaces={..} parameter. Is there a way to define a default namespace in order to manipulate the document as if it had no namespace?

>>> xml = """
...
...
...
...
... """
>>> tree = etree.fromstring(xml)
>>> print tree
>>> for element in tree:
...     print element.tag
...
{http://www.example.com}a
{http://www.example.com}b

Thanks in advance.

--
Fabien SCHWOB
_____________________________________________________________
Derrière chaque bogue, il y a un développeur, un homme qui s'est trompé.
(Bon, OK, parfois ils s'y mettent à plusieurs).

From ebgssth at gmail.com  Sat Feb 16 06:35:56 2008
From: ebgssth at gmail.com (js)
Date: Sat, 16 Feb 2008 14:35:56 +0900
Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set
In-Reply-To: <47B5A1AD.3090706@behnel.de>
References: <47B5A1AD.3090706@behnel.de>
Message-ID:

I don't have access to the box right now, that's why I couldn't give you the complete error output. Who uses a Xeon server at home :)

I'll check what causes this, but wanted to know if someone's already having this issue.

On Feb 15, 2008 11:29 PM, Stefan Behnel wrote:
> Hi,
>
>
> js wrote:
> > I was trying to install lxml on FreeBSD 6.2 Xeon server with
> > "easy_install lxml",
> > but I got weird error "CPU you selected does not support x86-64
> > instruction set".
> > I searched on the web with no luck.
> >
> > Does anybody know this error?
>
> You didn't give the complete error output, but check your CFLAGS (and the
> command line you see when the compiler starts up).
Maybe there's something > wrong with the compiler options that distutils use. > > Stefan > From ebgssth at gmail.com Sat Feb 16 06:40:43 2008 From: ebgssth at gmail.com (js) Date: Sat, 16 Feb 2008 14:40:43 +0900 Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set In-Reply-To: References: <47B5A1AD.3090706@behnel.de> Message-ID: Oops, seems not fixed. Python 2.5.1 (r251:54863, Feb 2 2008, 17:46:03) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import webbrowser >>> from lxml import etree, html zsh: bus error python2.5 On Feb 16, 2008 2:35 PM, js wrote: > I don't have access to the box right now, > that why I couldn't give you the complete error output. > Who uses Xeon server at home :) > > I'll check what causes this, but wanted to know if someone's already > having this issue. > > > > On Feb 15, 2008 11:29 PM, Stefan Behnel wrote: > > Hi, > > > > > > js wrote: > > > I was trying to install lxml on FreeBSD 6.2 Xeon server with > > > "easy_install lxml", > > > but I got weird error "CPU you selected does not support x86-64 > > > instruction set". > > > I searched on the web with no luck. > > > > > > Does anybody know this error? > > > > You didn't give the complete error output, but check your CFLAGS (and the > > command line you see when the compiler starts up). Maybe there's something > > wrong with the compiler options that distutils use. 
> > > > Stefan > > > From stefan_ml at behnel.de Sat Feb 16 08:18:13 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 16 Feb 2008 08:18:13 +0100 Subject: [lxml-dev] Setting default namespace In-Reply-To: <622afeaa0802151206s62b46a02j634ecfd2a49eb4a8@mail.gmail.com> References: <622afeaa0802151206s62b46a02j634ecfd2a49eb4a8@mail.gmail.com> Message-ID: <47B68E35.3010504@behnel.de> Hi, Fabien wrote: > I've a document with a namespace of the root element of my document > and when I display an element .tag, I get it with the namespace. Also, > when I'm using xpath(), I need to use the namespaces={..}) parameters. > Is there a way to define a default namespace in order to manipulate > the document like if it hasn't namespace ? No. Because a namespaced name is different from a non-namespaced name. Take a look at the E factory and lxml.objectify, if you want some more magic at the API level. http://codespeak.net/lxml/tutorial.html#the-e-factory http://codespeak.net/lxml/objectify.html http://codespeak.net/lxml/objectify.html#tree-generation-with-the-e-factory Stefan From stefan_ml at behnel.de Sat Feb 16 08:27:04 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 16 Feb 2008 08:27:04 +0100 Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set In-Reply-To: References: <47B5A1AD.3090706@behnel.de> Message-ID: <47B69048.9000403@behnel.de> Hi, js wrote: > Oops, seems not fixed. > > Python 2.5.1 (r251:54863, Feb 2 2008, 17:46:03) > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> import webbrowser >>>> from lxml import etree, html > zsh: bus error python2.5 Works for me: Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32) [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import webbrowser >>> from lxml import etree,html >>> Please, fix your system. 
Stefan From ebgssth at gmail.com Sat Feb 16 09:15:53 2008 From: ebgssth at gmail.com (js) Date: Sat, 16 Feb 2008 17:15:53 +0900 Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set In-Reply-To: <47B69048.9000403@behnel.de> References: <47B5A1AD.3090706@behnel.de> <47B69048.9000403@behnel.de> Message-ID: Sorry, previous mail should be sent to another thread. Please ignore... On Feb 16, 2008 4:27 PM, Stefan Behnel wrote: > Hi, > > js wrote: > > Oops, seems not fixed. > > > > Python 2.5.1 (r251:54863, Feb 2 2008, 17:46:03) > > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > > Type "help", "copyright", "credits" or "license" for more information. > >>>> import webbrowser > >>>> from lxml import etree, html > > zsh: bus error python2.5 > > Works for me: > > Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32) > [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import webbrowser > >>> from lxml import etree,html > >>> > > Please, fix your system. > > Stefan > From ebgssth at gmail.com Sat Feb 16 09:17:01 2008 From: ebgssth at gmail.com (js) Date: Sat, 16 Feb 2008 17:17:01 +0900 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> Message-ID: Confirmed that the latest lxml still has some problem on OS X. Python 2.5.1 (r251:54863, Feb 2 2008, 17:46:03) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import webbrowser >>> from lxml import etree, html zsh: bus error python2.5 From ebgssth at gmail.com Sat Feb 16 09:19:40 2008 From: ebgssth at gmail.com (js) Date: Sat, 16 Feb 2008 17:19:40 +0900 Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set In-Reply-To: <47B69048.9000403@behnel.de> References: <47B5A1AD.3090706@behnel.de> <47B69048.9000403@behnel.de> Message-ID: And as for Xeon's problem, I'll investigate it a bit more within a few days. I think I'll be able to bring some more info on this next time. Thanks. On Feb 16, 2008 4:27 PM, Stefan Behnel wrote: > Hi, > > js wrote: > > Oops, seems not fixed. > > > > Python 2.5.1 (r251:54863, Feb 2 2008, 17:46:03) > > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > > Type "help", "copyright", "credits" or "license" for more information. > >>>> import webbrowser > >>>> from lxml import etree, html > > zsh: bus error python2.5 > > Works for me: > > Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32) > [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import webbrowser > >>> from lxml import etree,html > >>> > > Please, fix your system. > > Stefan > From cz at gocept.com Sat Feb 16 09:43:41 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Sat, 16 Feb 2008 09:43:41 +0100 Subject: [lxml-dev] lxml 2.0.1 released References: <47B367F1.1090903@behnel.de> Message-ID: On 2008-02-15 14:30:00 +0100, js said: > A quick question. > Does this version of lxml fixed segfault bug on Mac OSX? I don't get segfaults on OS X. Using Python 2.4 though. -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. 
+49 345 12298891 From ebgssth at gmail.com Sat Feb 16 10:03:38 2008 From: ebgssth at gmail.com (js) Date: Sat, 16 Feb 2008 18:03:38 +0900 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> Message-ID: > > A quick question. > > Does this version of lxml fixed segfault bug on Mac OSX? > > I don't get segfaults on OS X. Using Python 2.4 though. I'm using 10.4.11, Python 2.4.4 by MacPorts. Python 2.4.4 (#1, Feb 12 2008, 23:51:38) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import webbrowser >>> from lxml import etree, html >>> etree.LXML_VERSION (2, 0, 1, 0) >>> html.parse('http://example.com') Python(29901) malloc: *** Deallocation of a pointer not malloced: 0x80; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug From stefan_ml at behnel.de Sun Feb 17 07:54:36 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 17 Feb 2008 07:54:36 +0100 Subject: [lxml-dev] Validating cElementTrees with lxml In-Reply-To: References: Message-ID: <47B7DA2C.8080704@behnel.de> Hi, Ryan K wrote: > If I have a cElementTree.ElementTree (or the one from the Standard > Library), can I use lxml's validation features on it since it > implements the same ElementTree API? Not directly. lxml and cElementTree use different tree models internally, so you can't just apply C-implemented features of one to the other. Any reason you can't just use lxml /instead/ of cElementTree? In case you really want to combine both: to validate the tree, you only need to create an lxml tree from your ElementTree in a one-way operation, which is easy, as this can be done through the (identical) API. Just create a new lxml Element and then traverse the ElementTree recursively, create a new SubElement for each child you find, and set its text, tail and attrib properties. 
Something like this might work (though untested):

    class TreeMigrator(object):
        # Copies a tree from one ElementTree implementation into another
        # (Python 2 code, hence basestring).
        def __init__(self, ET_impl_from, ET_impl_to):
            self.Element = ET_impl_to.Element
            self.SubElement = ET_impl_to.SubElement

        def copyChildren(self, from_parent, to_parent):
            for from_child in from_parent:
                tag = from_child.tag
                if not isinstance(tag, basestring): # skip Comments etc.
                    continue
                to_child = self.SubElement(
                    to_parent, tag, **from_child.attrib)
                to_child.text = from_child.text
                to_child.tail = from_child.tail
                self.copyChildren(from_child, to_child)

        def __call__(self, from_root):
            tag = from_root.tag
            to_root = self.Element(tag, **from_root.attrib)
            to_root.text = from_root.text
            to_root.tail = from_root.tail
            if isinstance(tag, basestring): # skip Comments etc.
                self.copyChildren(from_root, to_root)
            return to_root

Feel free to finish it up and send it back to the list. :)

Stefan

From ianb at colorstudy.com Mon Feb 18 00:38:22 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 17 Feb 2008 17:38:22 -0600
Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc
Message-ID: <47B8C56E.3090106@colorstudy.com>

There doesn't seem to be any way to set a document's URL when parsing
the document. E.g.:

 >>> from lxml import html
 >>> tree = html.parse('http://www.python.org')
 >>> tree.docinfo.URL
 'http://www.python.org'

But the parse function doesn't really take any arguments, and the URL
attribute is write-only. Ideally you could do fromstring('...doc...',
URL='location').

(Also I'm not sure why the URL shouldn't be writable.)

Ian

From stefan_ml at behnel.de Mon Feb 18 09:33:17 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 18 Feb 2008 09:33:17 +0100
Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc
In-Reply-To: <47B8C56E.3090106@colorstudy.com>
References: <47B8C56E.3090106@colorstudy.com>
Message-ID: <47B942CD.5090501@behnel.de>

Hi Ian,

Ian Bicking wrote:
> There doesn't seem to be any way to set a document's URL when parsing
> the document.
E.g.: > > >>> from lxml import html > >>> tree = html.parse('http://www.python.org') > >>> tree.docinfo.URL > 'http://www.python.org' > > But the parse function doesn't really take any arguments, and the URL > attribute is write-only. Ideally you could do fromstring('...doc...', > URL='location'). All keyword arguments that you pass to the parse/fromstring functions are passed on to lxml.etree's corresponding functions. That means, you can pass the "base_url" keyword. (Maybe that should be mentioned in the docstrings). > Also I'm not sure why the URL shouldn't be writable. What would be the use case? The problem that arises is that the source URL of a document would no longer be an immutable identifier of the document. If it can change, it's less valuable for caching (for example). It's a different thing if you pass a URL to the parser because it can't know where the document came from, or if you change the 'source' of a document at will. Stefan From cz at gocept.com Mon Feb 18 14:04:45 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Mon, 18 Feb 2008 14:04:45 +0100 Subject: [lxml-dev] Explicit type checking and zope.security Message-ID: Hi, lxml does quite some explicit type checking. For instance: object_path.addattr(root, value) gives me an error if root is security proxied: File "objectpath.pxi", line 74, in lxml.objectify.ObjectPath.addattr TypeError: Argument 'root' has incorrect type (expected lxml.etree._Element, got zope.security._proxy._Proxy) Is this explicit checking really necessary? It's quite annoying to have to unwrap everything (and eventually check manually for security) before putting it into lxml's hands. Any other ideas? Regards, -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. 
+49 345 12298891

From cz at gocept.com Mon Feb 18 14:08:38 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Mon, 18 Feb 2008 14:08:38 +0100
Subject: [lxml-dev] lxml 2.0.1 released
References: <47B367F1.1090903@behnel.de>
Message-ID:

On 2008-02-16 10:03:38 +0100, js said:

>>> A quick question.
>>> Does this version of lxml fixed segfault bug on Mac OSX?
>>
>> I don't get segfaults on OS X. Using Python 2.4 though.
>
> I'm using 10.4.11, Python 2.4.4 by MacPorts.
>
> Python 2.4.4 (#1, Feb 12 2008, 23:51:38)
> [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import webbrowser
> >>> from lxml import etree, html
> >>> etree.LXML_VERSION
> (2, 0, 1, 0)
> >>> html.parse('http://example.com')
> Python(29901) malloc: *** Deallocation of a pointer not malloced:
> 0x80; This could be a double free(), or free() called with the middle
> of an allocated block; Try setting environment variable MallocHelp to
> see tools to help debug
>

% PYTHONPATH=develop-eggs/lxml-2.0.1-py2.4-macosx-10.5-i386.egg ~/development/python/bin/python2.4
Python 2.4.4 (#1, Dec 14 2007, 15:35:42)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml.html
>>> lxml.html.parse('http://example.com')

>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(2, 0, 1, 0)
>>> lxml.etree.LIBXML_VERSION
(2, 6, 30)
>>> lxml.etree.LIBXSLT_VERSION
(1, 1, 22)
>>>

That is with a manually built Python; libxml2, libxslt and lxml were built
via buildout.

--
Christian Zagrodnick

gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale
www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891

From ebgssth at gmail.com Mon Feb 18 14:13:25 2008
From: ebgssth at gmail.com (js)
Date: Mon, 18 Feb 2008 22:13:25 +0900
Subject: [lxml-dev] lxml 2.0.1 released
In-Reply-To:
References: <47B367F1.1090903@behnel.de>
Message-ID:

Try import webbrowser first, then use lxml.
I don't know why but this makes difference. On Feb 18, 2008 10:08 PM, Christian Zagrodnick wrote: > > On 2008-02-16 10:03:38 +0100, js said: > > >>> A quick question. > >>> Does this version of lxml fixed segfault bug on Mac OSX? > >> > >> I don't get segfaults on OS X. Using Python 2.4 though. > > > > I'm using 10.4.11, Python 2.4.4 by MacPorts. > > > > Python 2.4.4 (#1, Feb 12 2008, 23:51:38) > > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > > Type "help", "copyright", "credits" or "license" for more information. > >>>> import webbrowser > >>>> from lxml import etree, html > >>>> etree.LXML_VERSION > > (2, 0, 1, 0) > >>>> html.parse('http://example.com') > > Python(29901) malloc: *** Deallocation of a pointer not malloced: > > 0x80; This could be a double free(), or free() called with the middle > > of an allocated block; Try setting environment variable MallocHelp to > > see tools to help debug > > > > % PYTHONPATH=develop-eggs/lxml-2.0.1-py2.4-macosx-10.5-i386.egg > ~/development/python/bin/python2.4 > Python 2.4.4 (#1, Dec 14 2007, 15:35:42) > [GCC 4.0.1 (Apple Inc. build 5465)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import lxml.html > >>> lxml.html.parse('http://example.com') > > >>> import lxml.etree > >>> lxml.etree.LXML_VERSION > (2, 0, 1, 0) > >>> lxml.etree.LIBXML_VERSION > (2, 6, 30) > >>> lxml.etree.LIBXSLT_VERSION > (1, 1, 22) > >>> > > That is with a manually built python; ibxml2, libxslt and lxml built > via buildout. > > > > -- > Christian Zagrodnick > > gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale > www.gocept.com ? fon. +49 345 12298894 ? fax. 
+49 345 12298891 > > > > _______________________________________________ > lxml-dev mailing list > lxml-dev at codespeak.net > http://codespeak.net/mailman/listinfo/lxml-dev > From stefan_ml at behnel.de Mon Feb 18 14:15:27 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 18 Feb 2008 14:15:27 +0100 Subject: [lxml-dev] Explicit type checking and zope.security In-Reply-To: References: Message-ID: <47B984EF.5060105@behnel.de> Hi, Christian Zagrodnick wrote: > lxml does quite some explicit type checking. For instance: > > object_path.addattr(root, value) > > gives me an error if root is security proxied: > > File "objectpath.pxi", line 74, in lxml.objectify.ObjectPath.addattr > TypeError: Argument 'root' has incorrect type (expected > lxml.etree._Element, got zope.security._proxy._Proxy) > > Is this explicit checking really necessary? It's quite annoying to have > to unwrap everything (and eventually check manually for security) > before putting it into lxml's hands. You would get a crash if lxml didn't check the type. _Element objects are proxies that contain a pointer to the C struct of a libxml2 node, so passing a wrapper won't work. Stefan From cz at gocept.com Mon Feb 18 14:16:27 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Mon, 18 Feb 2008 14:16:27 +0100 Subject: [lxml-dev] lxml 2.0.1 released References: <47B367F1.1090903@behnel.de> Message-ID: On 2008-02-18 14:13:25 +0100, js said: > Try import webbrowser first, then use lxml. > I don't know why but this makes difference. Works as well: >>> import webbrowser >>> import lxml.html >>> lxml.html.parse('http://example.com') > > On Feb 18, 2008 10:08 PM, Christian Zagrodnick wrote: >> >> On 2008-02-16 10:03:38 +0100, js said: >> >>>>> A quick question. >>>>> Does this version of lxml fixed segfault bug on Mac OSX? >>>> >>>> I don't get segfaults on OS X. Using Python 2.4 though. >>> >>> I'm using 10.4.11, Python 2.4.4 by MacPorts. 
>>> >>> Python 2.4.4 (#1, Feb 12 2008, 23:51:38) >>> [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin >>> Type "help", "copyright", "credits" or "license" for more information. >>>>>> import webbrowser >>>>>> from lxml import etree, html >>>>>> etree.LXML_VERSION >>> (2, 0, 1, 0) >>>>>> html.parse('http://example.com') >>> Python(29901) malloc: *** Deallocation of a pointer not malloced: >>> 0x80; This could be a double free(), or free() called with the middle >>> of an allocated block; Try setting environment variable MallocHelp to >>> see tools to help debug >>> >> >> % PYTHONPATH=develop-eggs/lxml-2.0.1-py2.4-macosx-10.5-i386.egg >> ~/development/python/bin/python2.4 >> Python 2.4.4 (#1, Dec 14 2007, 15:35:42) >> [GCC 4.0.1 (Apple Inc. build 5465)] on darwin >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import lxml.html >>>>> lxml.html.parse('http://example.com') >> >>>>> import lxml.etree >>>>> lxml.etree.LXML_VERSION >> (2, 0, 1, 0) >>>>> lxml.etree.LIBXML_VERSION >> (2, 6, 30) >>>>> lxml.etree.LIBXSLT_VERSION >> (1, 1, 22) >>>>> >> >> That is with a manually built python; ibxml2, libxslt and lxml built >> via buildout. >> >> >> >> -- >> Christian Zagrodnick >> >> gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale >> www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 >> >> >> >> _______________________________________________ >> lxml-dev mailing list >> lxml-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/lxml-dev -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. 
+49 345 12298891 From cz at gocept.com Mon Feb 18 14:37:43 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Mon, 18 Feb 2008 14:37:43 +0100 Subject: [lxml-dev] Explicit type checking and zope.security References: <47B984EF.5060105@behnel.de> Message-ID: On 2008-02-18 14:15:27 +0100, Stefan Behnel said: > Hi, > > Christian Zagrodnick wrote: >> lxml does quite some explicit type checking. For instance: >> >> object_path.addattr(root, value) >> >> gives me an error if root is security proxied: >> >> File "objectpath.pxi", line 74, in lxml.objectify.ObjectPath.addattr >> TypeError: Argument 'root' has incorrect type (expected >> lxml.etree._Element, got zope.security._proxy._Proxy) >> >> Is this explicit checking really necessary? It's quite annoying to have >> to unwrap everything (and eventually check manually for security) >> before putting it into lxml's hands. > > You would get a crash if lxml didn't check the type. _Element objects are > proxies that contain a pointer to the C struct of a libxml2 node, so passing a > wrapper won't work. Sounds reasonable :) I'm not too familiar with C extensions. So never mind. -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 From ebgssth at gmail.com Mon Feb 18 14:57:28 2008 From: ebgssth at gmail.com (js) Date: Mon, 18 Feb 2008 22:57:28 +0900 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> Message-ID: Not lxml's problem but my box's? Could you run it with MacPorts python? Python 2.4.4 (#1, Feb 12 2008, 23:51:38) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import webbrowser >>> import lxml.html >>> lxml.html.parse('http://example.com') Python(21992) malloc: *** Deallocation of a pointer not malloced: 0x80; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug >>> import lxml.etree >>> lxml.etree.LXML_VERSION (2, 0, 1, 0) >>> lxml.etree.LIBXML_VERSION (2, 6, 31) >>> lxml.etree.LIBXSLT_VERSION (1, 1, 22) On Feb 18, 2008 10:16 PM, Christian Zagrodnick wrote: > On 2008-02-18 14:13:25 +0100, js said: > > > Try import webbrowser first, then use lxml. > > I don't know why but this makes difference. > > Works as well: > > >>> import webbrowser > >>> import lxml.html > >>> lxml.html.parse('http://example.com') > > > > > > > > > > On Feb 18, 2008 10:08 PM, Christian Zagrodnick wrote: > >> > >> On 2008-02-16 10:03:38 +0100, js said: > >> > >>>>> A quick question. > >>>>> Does this version of lxml fixed segfault bug on Mac OSX? > >>>> > >>>> I don't get segfaults on OS X. Using Python 2.4 though. > >>> > >>> I'm using 10.4.11, Python 2.4.4 by MacPorts. > >>> > >>> Python 2.4.4 (#1, Feb 12 2008, 23:51:38) > >>> [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > >>> Type "help", "copyright", "credits" or "license" for more information. > >>>>>> import webbrowser > >>>>>> from lxml import etree, html > >>>>>> etree.LXML_VERSION > >>> (2, 0, 1, 0) > >>>>>> html.parse('http://example.com') > >>> Python(29901) malloc: *** Deallocation of a pointer not malloced: > >>> 0x80; This could be a double free(), or free() called with the middle > >>> of an allocated block; Try setting environment variable MallocHelp to > >>> see tools to help debug > >>> > >> > >> % PYTHONPATH=develop-eggs/lxml-2.0.1-py2.4-macosx-10.5-i386.egg > >> ~/development/python/bin/python2.4 > >> Python 2.4.4 (#1, Dec 14 2007, 15:35:42) > >> [GCC 4.0.1 (Apple Inc. 
build 5465)] on darwin > >> Type "help", "copyright", "credits" or "license" for more information. > >>>>> import lxml.html > >>>>> lxml.html.parse('http://example.com') > >> > >>>>> import lxml.etree > >>>>> lxml.etree.LXML_VERSION > >> (2, 0, 1, 0) > >>>>> lxml.etree.LIBXML_VERSION > >> (2, 6, 30) > >>>>> lxml.etree.LIBXSLT_VERSION > >> (1, 1, 22) > >>>>> > >> > >> That is with a manually built python; ibxml2, libxslt and lxml built > >> via buildout. > >> > >> > >> > >> -- > >> Christian Zagrodnick > >> > >> gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale > >> www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 > >> > >> > >> > >> _______________________________________________ > >> lxml-dev mailing list > >> lxml-dev at codespeak.net > >> http://codespeak.net/mailman/listinfo/lxml-dev > > > -- > > Christian Zagrodnick > > gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale > www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 > > > > _______________________________________________ > lxml-dev mailing list > lxml-dev at codespeak.net > http://codespeak.net/mailman/listinfo/lxml-dev > From cz at gocept.com Mon Feb 18 15:02:43 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Mon, 18 Feb 2008 15:02:43 +0100 Subject: [lxml-dev] lxml 2.0.1 released References: <47B367F1.1090903@behnel.de> Message-ID: On 2008-02-18 14:57:28 +0100, js said: > Not lxml's problem but my box's? > Could you run it with MacPorts python? Sorry, I don't use MacPorts. Try with a self-compiled python. -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. 
+49 345 12298891 From stefan_ml at behnel.de Mon Feb 18 15:02:49 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 18 Feb 2008 15:02:49 +0100 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> Message-ID: <47B99009.2030503@behnel.de> Hi, js wrote: >>> A quick question. >>> Does this version of lxml fixed segfault bug on Mac OSX? >> I don't get segfaults on OS X. Using Python 2.4 though. > > I'm using 10.4.11, Python 2.4.4 by MacPorts. > > Python 2.4.4 (#1, Feb 12 2008, 23:51:38) > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> import webbrowser >>>> from lxml import etree, html >>>> etree.LXML_VERSION > (2, 0, 1, 0) >>>> html.parse('http://example.com') > Python(29901) malloc: *** Deallocation of a pointer not malloced: > 0x80; This could be a double free(), or free() called with the middle > of an allocated block; Try setting environment variable MallocHelp to > see tools to help debug > This information is definitely not enough for me to get an idea about what happens here. That said, could you try passing the "--auto-rpath" option to setup.py when building? I'd like to know if that makes a difference. You should then see an option "rpath" (or "install_name"??) in the gcc command line. And please also make sure that the "xslt-config" that setup.py finds is the one that comes with the updated libxml2/libxslt libraries, *not* the one pre-installed on your system. Stefan From ebgssth at gmail.com Mon Feb 18 15:36:01 2008 From: ebgssth at gmail.com (js) Date: Mon, 18 Feb 2008 23:36:01 +0900 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: <47B99009.2030503@behnel.de> References: <47B367F1.1090903@behnel.de> <47B99009.2030503@behnel.de> Message-ID: Hi, I tried passing --auto-rpath to setup.py, but rpath didn't came up in gcc line. xslt-config seems right one for me because this seemed using /opt/local's ones. Any clues? 
$ sudo python2.5 setup.py --auto-rpath build Building lxml version 2.0.1. NOTE: Trying to build without Cython, pre-generated 'src/lxml/etree.c' needs to be available. running build running build_py creating build creating build/lib.macosx-10.3-i386-2.5 creating build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/__init__.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/_elementpath.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/builder.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/cssselect.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/doctestcompare.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/ElementInclude.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/sax.py -> build/lib.macosx-10.3-i386-2.5/lxml copying src/lxml/usedoctest.py -> build/lib.macosx-10.3-i386-2.5/lxml creating build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/__init__.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/_dictmixin.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/_diffcommand.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/_setmixin.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/builder.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/clean.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/defs.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/diff.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/ElementSoup.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/formfill.py -> build/lib.macosx-10.3-i386-2.5/lxml/html copying src/lxml/html/usedoctest.py -> build/lib.macosx-10.3-i386-2.5/lxml/html running build_ext building 'lxml.etree' extension creating build/temp.macosx-10.3-i386-2.5 creating build/temp.macosx-10.3-i386-2.5/src creating build/temp.macosx-10.3-i386-2.5/src/lxml /usr/bin/gcc-4.0 -fno-strict-aliasing 
-Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/opt/local/include -I/opt/local/include/libxml2 -I/opt/local/include/python2.5 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.etree.o -w /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.etree.o -L/opt/local/lib -L/opt/local/lib -lxslt -lexslt -lxml2 -lz -lm -o build/lib.macosx-10.3-i386-2.5/lxml/etree.so building 'lxml.objectify' extension /usr/bin/gcc-4.0 -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/opt/local/include -I/opt/local/include/libxml2 -I/opt/local/include/python2.5 -c src/lxml/lxml.objectify.c -o build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.objectify.o -w /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.objectify.o -L/opt/local/lib -L/opt/local/lib -lxslt -lexslt -lxml2 -lz -lm -o build/lib.macosx-10.3-i386-2.5/lxml/objectify.so building 'lxml.pyclasslookup' extension /usr/bin/gcc-4.0 -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/opt/local/include -I/opt/local/include/libxml2 -I/opt/local/include/python2.5 -c src/lxml/lxml.pyclasslookup.c -o build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.pyclasslookup.o -w /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.pyclasslookup.o -L/opt/local/lib -L/opt/local/lib -lxslt -lexslt -lxml2 -lz -lm -o build/lib.macosx-10.3-i386-2.5/lxml/pyclasslookup.so On 2/18/08, Stefan Behnel wrote: > Hi, > > js wrote: > >>> A quick question. > >>> Does this version of lxml fixed segfault bug on Mac OSX? > >> I don't get segfaults on OS X. Using Python 2.4 though. > > > > I'm using 10.4.11, Python 2.4.4 by MacPorts. 
> > > > Python 2.4.4 (#1, Feb 12 2008, 23:51:38) > > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > > Type "help", "copyright", "credits" or "license" for more information. > >>>> import webbrowser > >>>> from lxml import etree, html > >>>> etree.LXML_VERSION > > (2, 0, 1, 0) > >>>> html.parse('http://example.com') > > Python(29901) malloc: *** Deallocation of a pointer not malloced: > > 0x80; This could be a double free(), or free() called with the middle > > of an allocated block; Try setting environment variable MallocHelp to > > see tools to help debug > > > > This information is definitely not enough for me to get an idea about what > happens here. > > That said, could you try passing the "--auto-rpath" option to setup.py when > building? I'd like to know if that makes a difference. You should then see > an > option "rpath" (or "install_name"??) in the gcc command line. > > And please also make sure that the "xslt-config" that setup.py finds is the > one that comes with the updated libxml2/libxslt libraries, *not* the one > pre-installed on your system. > > Stefan > From ianb at colorstudy.com Mon Feb 18 19:04:29 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 18 Feb 2008 12:04:29 -0600 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47B942CD.5090501@behnel.de> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> Message-ID: <47B9C8AD.1050502@colorstudy.com> Stefan Behnel wrote: > Ian Bicking wrote: >> There doesn't seem to be any way to set a document's URL when parsing >> the document. E.g.: >> >> >>> from lxml import html >> >>> tree = html.parse('http://www.python.org') >> >>> tree.docinfo.URL >> 'http://www.python.org' >> >> But the parse function doesn't really take any arguments, and the URL >> attribute is write-only. Ideally you could do fromstring('...doc...', >> URL='location'). 
> > All keyword arguments that you pass to the parse/fromstring functions are > passed on to lxml.etree's corresponding functions. That means, you can pass > the "base_url" keyword. (Maybe that should be mentioned in the docstrings). Yeah... it's hard to figure out what method is underlying these. I've added a note to the docstring and an explicit base_url argument to the functions, so you can see the presence of the parameter more easily. It does not appear that html.parse() takes a base_url argument (just as etree.parse does not). If you pass a URL or filename then I suppose that becomes the base. If you pass in a file-like object then I think it also works, if the file-like object has a geturl() method (like urllib's files do). >> Also I'm not sure why the URL shouldn't be writable. > > What would be the use case? The problem that arises is that the source URL of > a document would no longer be an immutable identifier of the document. If it > can change, it's less valuable for caching (for example). It's a different > thing if you pass a URL to the parser because it can't know where the document > came from, or if you change the 'source' of a document at will. If you can just get it right during parsing it should be fine. But there's things like xml:base (doesn't apply to HTML; not sure how it's handled in XML), or unusual headers like Content-Location, which you might want to handle at point in time that the document has already been parsed. Probably not a problem, but it doesn't seem that much like a problem to make it writable too. Especially since the document itself is writable. Once you've edited the document, it's not *the* document at that URL anyway. Maybe you get a page, edit it, and serve it at a new location. Deliverance does this by getting the theme page, then injecting the content into that page -- but the theme page is the originally-parsed object, though it will be served at a different location. I'd like to be able to fix up that data. 
And I'm not sure how I'd make a copy of a document with a new URL, if the URL/document link is immutable. (Right now I'm mostly ignoring the URL, but it would be nice if I could actually trust it.) Ian From howesteve at gmail.com Mon Feb 18 20:16:00 2008 From: howesteve at gmail.com (Steve Howe) Date: Mon, 18 Feb 2008 17:16:00 -0200 Subject: [lxml-dev] Is resolve_entities not working ?? Message-ID: <200802181616.01988.howesteve@gmail.com> Hello all, Is the resolve_entities XmlParser constructor attribute not working or what did I do wrong ? howe at yezda ~ $ python Python 2.5.1 (r251:54863, Jan 9 2008, 05:34:21) [GCC 4.2.2 (Gentoo 4.2.2 p1.0)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from lxml import etree >>> print etree.__version__ 2.0.1 >>> print etree.LIBXML_VERSION (2, 6, 30) >>> import StringIO >>> xml = StringIO.StringIO('

&copy;

')
>>> etree.parse(xml, etree.XMLParser(resolve_entities=False))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2515, in lxml.etree.parse
  File "parser.pxi", line 1743, in lxml.etree._parseDocument
  File "parser.pxi", line 1775, in lxml.etree._parseMemoryDocument
  File "parser.pxi", line 1676, in lxml.etree._parseDoc
  File "parser.pxi", line 793, in lxml.etree._BaseParser._parseDoc
  File "parser.pxi", line 450, in lxml.etree._ParserContext._handleParseResultDoc
  File "parser.pxi", line 534, in lxml.etree._handleParseResult
  File "parser.pxi", line 476, in lxml.etree._raiseParseError
lxml.etree.XMLSyntaxError: Entity 'copy' not defined, line 1, column 46

Thanks!

--
Best Regards,
Steve Howe

From stefan_ml at behnel.de Mon Feb 18 21:06:37 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 18 Feb 2008 21:06:37 +0100
Subject: [lxml-dev] Is resolve_entities not working ??
In-Reply-To: <200802181616.01988.howesteve@gmail.com>
References: <200802181616.01988.howesteve@gmail.com>
Message-ID: <47B9E54D.7040204@behnel.de>

Hi,

Steve Howe wrote:
> Is the resolve_entities XmlParser constructor attribute not working or what
> did I do wrong ?
>
> howe at yezda ~ $ python
> Python 2.5.1 (r251:54863, Jan 9 2008, 05:34:21)
> [GCC 4.2.2 (Gentoo 4.2.2 p1.0)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from lxml import etree
> >>> print etree.__version__
> 2.0.1
> >>> print etree.LIBXML_VERSION
> (2, 6, 30)
> >>> import StringIO
> >>> xml = StringIO.StringIO('
>

&copy;

') >>>> etree.parse(xml, etree.XMLParser(resolve_entities=False)) > Traceback (most recent call last): > File "", line 1, in > File "lxml.etree.pyx", line 2515, in lxml.etree.parse > File "parser.pxi", line 1743, in lxml.etree._parseDocument > File "parser.pxi", line 1775, in lxml.etree._parseMemoryDocument > File "parser.pxi", line 1676, in lxml.etree._parseDoc > File "parser.pxi", line 793, in lxml.etree._BaseParser._parseDoc > File "parser.pxi", line 450, in > lxml.etree._ParserContext._handleParseResultDoc > File "parser.pxi", line 534, in lxml.etree._handleParseResult > File "parser.pxi", line 476, in lxml.etree._raiseParseError > lxml.etree.XMLSyntaxError: Entity 'copy' not defined, line 1, column 46 As the document does not specify a DTD, the entity "copy" is undefined, which is an error if you instructed the parser to *resolve* the entities. Stefan From stefan_ml at behnel.de Mon Feb 18 21:29:34 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 18 Feb 2008 21:29:34 +0100 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47B9C8AD.1050502@colorstudy.com> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> Message-ID: <47B9EAAE.6070409@behnel.de> Hi Ian, Ian Bicking wrote: > Stefan Behnel wrote: >> Ian Bicking wrote: >>> There doesn't seem to be any way to set a document's URL when parsing >>> the document. E.g.: >>> >>> >>> from lxml import html >>> >>> tree = html.parse('http://www.python.org') >>> >>> tree.docinfo.URL >>> 'http://www.python.org' >>> >>> But the parse function doesn't really take any arguments, and the URL >>> attribute is write-only. Ideally you could do >>> fromstring('...doc...', URL='location'). >> >> All keyword arguments that you pass to the parse/fromstring functions are >> passed on to lxml.etree's corresponding functions. That means, you can >> pass >> the "base_url" keyword. (Maybe that should be mentioned in the >> docstrings). 
>
> Yeah... it's hard to figure out what method is underlying these. I've
> added a note to the docstring and an explicit base_url argument to the
> functions, so you can see the presence of the parameter more easily.

That's good, then epydoc can pick it up.

> It does not appear that html.parse() takes a base_url argument (just as
> etree.parse does not). If you pass a URL or filename then I suppose
> that becomes the base.

Yes. parse() is for parsing from files/URLs, so you'd normally have some
kind of source name/URL. StringIO is a different thing, but then, in most
cases where you could use parse(StringIO), it would be better to use
fromstring(), which supports the "base_url" keyword.

> If you pass in a file-like object then I think
> it also works, if the file-like object has a geturl() method (like
> urllib's files do).

The code we use is this:

    cdef _getFilenameForFile(source):
        # file instances have a name attribute
        try:
            return source.name
        except AttributeError:
            pass
        # gzip file instances have a filename attribute
        try:
            return source.filename
        except AttributeError:
            pass
        # urllib2 provides a geturl() method
        try:
            geturl = source.geturl
        except AttributeError:
            # can't determine filename
            return None
        else:
            return geturl()

>>> Also I'm not sure why the URL shouldn't be writable.
>>
>> What would be the use case? The problem that arises is that the source
>> URL of a document would no longer be an immutable identifier of the
>> document. If it can change, it's less valuable for caching (for
>> example). It's a different thing if you pass a URL to the parser
>> because it can't know where the document came from, or if you change
>> the 'source' of a document at will.
>
> If you can just get it right during parsing it should be fine. But
> there's things like xml:base (doesn't apply to HTML; not sure how it's
> handled in XML)

Not sure, but that should be handled in the parser. At least, it deals with
parse-time information.
> or unusual headers like Content-Location, which you > might want to handle at point in time that the document has already been > parsed. "Header" sounds more like something you'd also know in advance. > Probably not a problem, but it doesn't seem that much like a problem to > make it writable too. Especially since the document itself is writable. > Once you've edited the document, it's not *the* document at that URL > anyway. Maybe you get a page, edit it, and serve it at a new location. > Deliverance does this by getting the theme page, then injecting the > content into that page -- but the theme page is the originally-parsed > object, though it will be served at a different location. I'd like to > be able to fix up that data. And I'm not sure how I'd make a copy of a > document with a new URL, if the URL/document link is immutable. (Right > now I'm mostly ignoring the URL, but it would be nice if I could > actually trust it.) I see. The URL is currently retrieved through "tree.docinfo" (i.e. the DocInfo class), which is completely read-only. I'll have to figure out the implications first - feel free to inject some ideas. :) Stefan From cz at gocept.com Tue Feb 19 10:06:45 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Tue, 19 Feb 2008 10:06:45 +0100 Subject: [lxml-dev] lxml namespaces References: <987377390701070724ua506a6cpa2459777a90bfa70@mail.gmail.com> Message-ID: Hi, On 2007-01-07 16:24:15 +0100, "Maxim Sloyko" said: > Hi All! > > I have a little problem with XML namespaces. > In my application I have two XML processors, that process the same > document, one after the other. The first one looks for nodes in 'ns1' > namespace, and substitutes them, according to some algorithm. After > this processor is finished, it is guaranteed that there are no more > 'ns1' nodes left in the tree. 'ns1' namespace dclaration is still > there, in the root node (well, I put it there manually). 
Now, when
> this namespace is no longer needed, I want to get rid of it, because
> it confuses some other processors (namely, my browser)
>
> So, the question is, how do I do that?
> del tree.getroot().nsmap['ns1']
> does not seem to do the trick :(

Actually, I'm curious too how to remove namespaces. And I'm not sure at all
how to do that. Clueless right now.

--
Christian Zagrodnick
gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale
www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891

From ebgssth at gmail.com Tue Feb 19 13:45:29 2008
From: ebgssth at gmail.com (js)
Date: Tue, 19 Feb 2008 21:45:29 +0900
Subject: [lxml-dev] CPU you selected does not support x86-64 instruction set
In-Reply-To:
References: <47B5A1AD.3090706@behnel.de> <47B69048.9000403@behnel.de>
Message-ID:

Hi,

Quick update. Here's how the compiler was executed.

cc -fno-strict-aliasing -DNDEBUG -O -pipe -march=pentiumpro -D__wchar_t=wchar_t -D_THREAD_SAFE -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/local/include -I/usr/local/include/libxml2 -I/usr/local/include/python2.5 -c src/lxml/lxml.etree.c -o build/temp.freebsd-6.2-amd64-2.5/src/lxml/lxml.etree.o -w

As I said earlier, this ends up in the following error.

CPU you selected does not support x86-64 instruction set

I tried dropping "-march=pentiumpro" and re-ran setup.py build.
This time the above error did not occur, but another error came up.

/usr/local/include/python2.5/pyport.h:734:2: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?).

BTW, I use gcc 3.4.6.
I'm not sure how I should resolve this...

Can you see anything in the above report?

lxml users, is there anybody running lxml on a 64-bit system?

Thanks.

From howesteve at gmail.com Tue Feb 19 14:22:36 2008
From: howesteve at gmail.com (Steve Howe)
Date: Tue, 19 Feb 2008 11:22:36 -0200
Subject: [lxml-dev] Is resolve_entities not working ??
In-Reply-To: <47B9E54D.7040204@behnel.de> References: <200802181616.01988.howesteve@gmail.com> <47B9E54D.7040204@behnel.de> Message-ID: <200802191022.38393.howesteve@gmail.com> Hello Stefan Behnel, > As the document does not specify a DTD, the entity "copy" is undefined, > which is an error if you instructed the parser to *resolve* the entities. Agreed, but I set "resolve_entities=False" so it should not be resolving anything, right ? Or did I misunderstand something ? -- Best Regards, Steve Howe From cz at gocept.com Tue Feb 19 15:15:02 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Tue, 19 Feb 2008 15:15:02 +0100 Subject: [lxml-dev] Help getting lxml to work reliably on MacOS-X References: <47A6D0B1.5020600@behnel.de> <47A8C588.9060701@behnel.de> <47ACB0D6.4070704@behnel.de> Message-ID: On 2008-02-09 14:33:43 +0100, Christian Zagrodnick said: > On 2008-02-08 20:43:18 +0100, Stefan Behnel said: > >> Hi, >> > >> Christian Zagrodnick wrote: >>> The main problem is, that lxml runs the wrong xslt-config. So I was >>> basically building libxml2 and libxslt just for the fun of it. >>> > >>> The question is if lxml really always needs to call xslt-config. Or how >>> one would set the path in the buildout so that the right xslt-config is >>> called. >>> > >>> If I manually set the path it works like charm: >>> > >>> % PATH=`pwd`/parts/libxslt/bin:$PATH bin/buildout >> > >> You can now pass "--with-xslt-config=XXX" to setup.py. > > Gotta check how we best pass that along in buildout. So in the *next* version of zc.recipe.egg (i.e. >1.0.0), the following will probably work. 
It works with the trunk of zc.recipe.egg (but this is neither released nor
has it been reviewed by Jim Fulton, yet):

[libxml2]
recipe = zc.recipe.cmmi
url = http://ftp.gnome.org/pub/GNOME/sources/libxml2/2.6/libxml2-2.6.30.tar.gz
extra_options = --without-python

[libxslt]
recipe = zc.recipe.cmmi
url = http://ftp.gnome.org/pub/GNOME/sources/libxslt/1.1/libxslt-1.1.22.tar.bz2
extra_options = --with-libxml-prefix=${buildout:directory}/parts/libxml2/
                --without-python

[lxml-environment]
PATH=${buildout:directory}/parts/libxslt/bin:%(PATH)s

[lxml]
recipe = zc.recipe.egg:custom
egg = lxml
include-dirs = ${buildout:directory}/parts/libxml2/include/libxml2
               ${buildout:directory}/parts/libxslt/include
library-dirs = ${buildout:directory}/parts/libxml2/lib
               ${buildout:directory}/parts/libxslt/lib
rpath = ${buildout:directory}/parts/libxml2/lib
        ${buildout:directory}/parts/libxslt/lib
environment = lxml-environment

Regards,
--
Christian Zagrodnick
gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale
www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891

From stefan_ml at behnel.de Tue Feb 19 16:00:47 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 19 Feb 2008 16:00:47 +0100 (CET)
Subject: [lxml-dev] Is resolve_entities not working ??
In-Reply-To: <200802191022.38393.howesteve@gmail.com>
References: <200802181616.01988.howesteve@gmail.com> <47B9E54D.7040204@behnel.de> <200802191022.38393.howesteve@gmail.com>
Message-ID: <40845.194.114.62.67.1203433247.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

Steve Howe wrote:
>> As the document does not specify a DTD, the entity "copy" is undefined,
>> which is an error if you instructed the parser to *resolve* the
>> entities.
> Agreed, but I set "resolve_entities=False" so it should not be resolving
> anything, right ? Or did I misunderstand something ?

Ah, sorry, I misread your example as saying "=True" ...

Documents that do not declare their entities are not well-formed:

---------------------------
Well-formedness constraint: Entity Declared

In a document without any DTD, a document with only an internal DTD subset
which contains no parameter entity references, or a document with
"standalone='yes'", for an entity reference that does not occur within the
external subset or a parameter entity, the Name given in the entity
reference MUST match that in an entity declaration that does not occur
within the external subset or a parameter entity, except that well-formed
documents need not declare any of the following entities: amp, lt, gt,
apos, quot. The declaration of a general entity MUST precede any reference
to it which appears in a default value in an attribute-list declaration.
---------------------------

with one exception:

---------------------------
Note that non-validating processors are not obligated to read and process
entity declarations occurring in parameter entities or in the external
subset; for such documents, the rule that an entity must be declared is a
well-formedness constraint only if standalone='yes'.
---------------------------

But since your document does not define an external subset, the parser
knows that the entity is not defined and that the document is not
well-formed.

If you add a DOCTYPE, the parser will assume the entity to be defined in
the referenced DTD (even if it does not load it), and thus ignore the
missing declaration (you should still get a warning in the parser
"error_log", though).

Also, if you add "recover=True" to the parser, it will ignore the
(otherwise fatal) error.

Note that since lxml 2.0, entity references that are not resolved appear as
children of the element, not as text.
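
The "Entity Declared" rule quoted above is not specific to lxml, so it can be
tried with any conforming parser. A minimal sketch, assuming the standard
library's expat-based xml.etree.ElementTree as a stand-in: it is not the
libxml2 parser discussed in this thread and has no "resolve_entities" or
"recover" options. The sample documents and the parses() helper are made up
for illustration.

```python
import xml.etree.ElementTree as ET

# No DTD at all: the reference to "copy" has no declaration,
# so the document is not well-formed.
UNDECLARED = "<root>&copy;</root>"

# Internal DTD subset that declares the entity before it is used.
DECLARED = '<!DOCTYPE root [<!ENTITY copy "&#169;">]><root>&copy;</root>'

# The five predefined entities never need a declaration.
PREDEFINED = "<root>&amp;&lt;&gt;&apos;&quot;</root>"


def parses(document):
    """Return True if the document parses, False on a well-formedness error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Text content after the parser has expanded the declared references.
declared_text = ET.fromstring(DECLARED).text
predefined_text = ET.fromstring(PREDEFINED).text
```

Only the first document is rejected; the other two parse, and the declared
entity is expanded to the copyright sign.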

Stefan

From stefan_ml at behnel.de Tue Feb 19 17:24:14 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 19 Feb 2008 17:24:14 +0100 (CET)
Subject: [lxml-dev] lxml namespaces
In-Reply-To:
References: <987377390701070724ua506a6cpa2459777a90bfa70@mail.gmail.com>
Message-ID: <57841.194.114.62.67.1203438254.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

> On 2007-01-07 16:24:15 +0100, "Maxim Sloyko" said:
>> I have a little problem with XML namespaces.
>> In my application I have two XML processors, that process the same
>> document, one after the other. The first one looks for nodes in 'ns1'
>> namespace, and substitutes them, according to some algorithm. After
>> this processor is finished, it is guaranteed that there are no more
>> 'ns1' nodes left in the tree.

Sounds a bit like a case for XSLT to me.

>> 'ns1' namespace declaration is still
>> there, in the root node (well, I put it there manually). Now, when
>> this namespace is no longer needed, I want to get rid of it, because
>> it confuses some other processors (namely, my browser)
>>
>> So, the question is, how do I do that?
>> del tree.getroot().nsmap['ns1']
>> does not seem to do the trick :(

Hmmm, I think the easiest way to remove unused namespaces from a document is:

    # dict() needs (prefix, namespace) pairs, so the generator yields tuples
    new_nsmap = dict((p, n) for p, n in root.nsmap.items()
                     if n != NS_TO_REMOVE)
    new_root = etree.Element(root.tag, root.attrib, new_nsmap)
    new_root.text = root.text
    new_root.tail = root.tail
    new_root[:] = root[:]
    root = new_root

That's somewhat costly, but it's a rare use case anyway... or use XSLT.

Honestly, assuring tree correctness if ".nsmap" was writable is not at all
trivial. You'd have to

- check which namespaces are being added and which are removed (incl. parental inheritance, prefix override issues, ...)
- verify that removed namespaces are no longer used anywhere in the subtree - replace the namespace declarations on the node, keeping pointers to the old ones - fix all namespace references in the subtree - free the now-unused namespace declarations The "fix all namespaces" bit is easy (should just work with the usual moveNodeToDocument() dance), but I'm not feeling like implementing the first steps right now... Stefan From cz at gocept.com Wed Feb 20 08:01:50 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Wed, 20 Feb 2008 08:01:50 +0100 Subject: [lxml-dev] lxml namespaces References: <987377390701070724ua506a6cpa2459777a90bfa70@mail.gmail.com> <57841.194.114.62.67.1203438254.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: On 2008-02-19 17:24:14 +0100, "Stefan Behnel" said: >> On 2007-01-07 16:24:15 +0100, "Maxim Sloyko" said: >>> I have a little problem with XML namespaces. >>> In my application I have two XML processors, that process the same >>> document, one after the other. The first one looks for nodes in 'ns1' >>> namespace, and substitutes them, according to some algorithm. After >>> this processor is finished, it is guaranteed that there are no more >>> 'ns1' nodes left in the tree. > > Sounds a bit like a case for XSLT to me. Yeah. http://cocoon.apache.org/2.0/faq/faq-xslt.html#faq-5 :) > > >>> 'ns1' namespace dclaration is still >>> there, in the root node (well, I put it there manually). Now, when >>> this namespace is no longer needed, I want to get rid of it, because >>> it confuses some other processors (namely, my browser) >>> >>> So, the question is, how do I do that? 
>>> del tree.getroot().nsmap['ns1']
>>> does not seem to do the trick :(
>
> Hmmm, I think the easiest way to remove unused namespaces from a document is:
>
>     # dict() needs (prefix, namespace) pairs, so the generator yields tuples
>     new_nsmap = dict((p, n) for p, n in root.nsmap.items()
>                      if n != NS_TO_REMOVE)
>     new_root = etree.Element(root.tag, root.attrib, new_nsmap)
>     new_root.text = root.text
>     new_root.tail = root.tail
>     new_root[:] = root[:]
>     root = new_root
>
> That's somewhat costly, but it's a rare use case anyway... or use XSLT.
>
> Honestly, assuring tree correctness if ".nsmap" was writable is not at all
> trivial. You'd have to
>
> - check which namespaces are being added and which are removed (incl.
>   parental inheritance, prefix override issues, ...)
> - verify that removed namespaces are no longer used anywhere in the subtree
> - replace the namespace declarations on the node, keeping pointers to the
>   old ones
> - fix all namespace references in the subtree
> - free the now-unused namespace declarations
>
> The "fix all namespaces" bit is easy (should just work with the usual
> moveNodeToDocument() dance), but I'm not feeling like implementing the
> first steps right now...

Nono, the XSLT bit is fine. I don't quite understand why this works, but
then I'm not that much into XSLT.

Actually, seen that: http://www.patentstorm.us/patents/7120864.html :/

--
Christian Zagrodnick
gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale
www.gocept.com ? fon. +49 345 12298894 ? fax.
+49 345 12298891 From stefan_ml at behnel.de Wed Feb 20 08:32:50 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 20 Feb 2008 08:32:50 +0100 (CET) Subject: [lxml-dev] lxml namespaces In-Reply-To: References: <987377390701070724ua506a6cpa2459777a90bfa70@mail.gmail.com> <57841.194.114.62.67.1203438254.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <8330.194.114.62.67.1203492770.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Christian Zagrodnick wrote: > http://cocoon.apache.org/2.0/faq/faq-xslt.html#faq-5 > the XSLT bit is fine. I don't quite understand why this works, > but then I'm not that much into XSLT. XSLT is (mostly) about copying XML trees selectively. Things you do not copy will not appear in the result. In this case, you only copy plain (i.e. local) element names, not their namespaces (which may or may not be what you want). > Actually, seen that: http://www.patentstorm.us/patents/7120864.html :/ Hehe, read the title: "Eliminating superfluous namespace declarations and undeclaring default namespaces *in XML serialization processing*". This is about serialisation only. The (intermediate) result of an XSLT is a tree, not a byte stream, so the namespace fixing is not part of the serialisation process. :] (Not that I would agree that this is worth a patent...) Stefan From nefasus at gmail.com Wed Feb 20 16:42:57 2008 From: nefasus at gmail.com (Nef Asus) Date: Wed, 20 Feb 2008 15:42:57 +0000 (UTC) Subject: [lxml-dev] Can't load external DTD. Message-ID: Hello everyone, I've written this little program that refuses to work: from lxml import etree if __name__ == "__main__": xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml" parser = etree.XMLParser(load_dtd = True, dtd_validation = True, attribute_defaults = True) doc = etree.parse(xml_input, parser) Here's the traceback. 
Traceback (most recent call last): File "C:\Desarrollo\pythontests\lxml\dtd_loader.py", \ line 27, in doc = etree.parse(xml_input, parser) File "lxml.etree.pyx", line 2515, in lxml.etree.parse File "parser.pxi", line 1755, in lxml.etree._parseDocument File "parser.pxi", line 1759, in lxml.etree._parseDocumentFromURL File "parser.pxi", line 1681, in lxml.etree._parseDocFromFile File "parser.pxi", line 826 ,in lxml.etree._BaseParser._parseDocFromFile File "parser.pxi",line 450,in lxml.etree._ParserContext._handleParseResultDoc File "parser.pxi", line 534, in lxml.etree._handleParseResult File "parser.pxi", line 476, in lxml.etree._raiseParseError lxml.etree.XMLSyntaxError: failed to load external entity "NULL", line 9, column 83 This is a snippet of foo.xml : ... Then, I tried to write a custom resolver. from lxml import etree class DTDResolver(etree.Resolver): def resolve(self, url, id, context): print("Resolving (url, %s)(id, %s)"% (url,id)) self.resolve_filename("C:\Desarrollo\pythontests\lxml\JENSEN.dtd", \ context) if __name__ == "__main__": parser = etree.XMLParser(load_dtd = True, dtd_validation = True, attribute_defaults = True) parser.resolvers.add(DTDResolver()) xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml" doc = etree.parse(xml_input, parser) But it still fails (same traceback). What am I doing wrong? BTW, lxml version is 2.0.1. Thanks in advance. From piet at cs.uu.nl Wed Feb 20 17:41:09 2008 From: piet at cs.uu.nl (Piet van Oostrum) Date: Wed, 20 Feb 2008 17:41:09 +0100 Subject: [lxml-dev] Can't load external DTD. In-Reply-To: References: Message-ID: <18364.22565.494719.849969@cochabamba.cs.uu.nl> >>>>> Nef Asus (NA) wrote: >NA> Hello everyone, >NA> I've written this little program that refuses to work: >NA> from lxml import etree >NA> if __name__ == "__main__": >NA> xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml" Double your backslashes or use r"C:\Desarrollo\pythontests\lxml\foo.xml". 
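
Why the unescaped path misbehaves: in a normal string literal "\f" is the
form-feed escape, so the "\foo.xml" part of the file name is silently changed.
A minimal sketch using only the tail of the path from the original post (the
variable names are made up for illustration):

```python
broken = "lxml\foo.xml"      # "\f" collapses into one form-feed character
raw = r"lxml\foo.xml"        # raw string: the backslash is kept literally
doubled = "lxml\\foo.xml"    # escaped backslash: same value as the raw string

# repr() makes the difference visible
print(repr(broken))
print(repr(raw))
```

The raw and doubled spellings are equal, while the first one has lost its
backslash and is one character shorter.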
-- Piet van Oostrum URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: piet at vanoostrum.org From stefan_ml at behnel.de Tue Feb 19 13:06:05 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 19 Feb 2008 13:06:05 +0100 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47B9C8AD.1050502@colorstudy.com> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> Message-ID: <47BAC62D.4080200@behnel.de> Hi, Ian Bicking wrote: > It does not appear that html.parse() takes a base_url argument (just as > etree.parse does not). If you pass a URL or filename then I suppose > that becomes the base. If you pass in a file-like object then I think > it also works, if the file-like object has a geturl() method (like > urllib's files do). I added the base_url keyword to parse() for now, so that you can set the URL for file-like objects. Stefan From egrim at swri.org Wed Feb 20 23:47:55 2008 From: egrim at swri.org (Evan Grim) Date: Wed, 20 Feb 2008 22:47:55 +0000 (UTC) Subject: [lxml-dev] Instantiated ObjectifiedElement Message-ID: I've run into some curious behavior when trying to do something a bit non-standard with objectify. The code snippet below causes a hard crash. I know this isn't part of what is anticipated as normal use, but since it causes a crash I figure it's worth bringing up here to at least get some comments. from lxml import etree, objectify class parent(objectify.ObjectifiedElement): pass p = parent() print p.tag My use case here is that I'm trying to make a package that acts like objectify, but enforces schema restrictions. I'm trying my hand at writing custom classes that will do this, but maintain the nice interface that objectify provides. Where this fails is when I try to create an instance of one of my custom classes directly (as seen in the trivial code snippet above). 
I'm welcome to any suggestions, and at the very least wanted to bring the seg-fault producing condition to light so that the extension can be amended to prevent it. From stefan_ml at behnel.de Thu Feb 21 07:10:29 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Feb 2008 07:10:29 +0100 Subject: [lxml-dev] Instantiated ObjectifiedElement In-Reply-To: References: Message-ID: <47BD15D5.4020808@behnel.de> Hi, Evan Grim wrote: > I've run into some curious behavior when trying to do something a bit > non-standard with objectify. The code snippet below causes a hard crash. I > know this isn't part of what is anticipated as normal use, but since it causes a > crash I figure it's worth bringing up here to at least get some comments. > > > from lxml import etree, objectify > > class parent(objectify.ObjectifiedElement): > pass > > p = parent() > print p.tag > A crash is the expected result here. The docs say: """Note that you cannot (or rather must not) instantiate this class yourself. lxml.etree will do that for you through its normal ElementTree API. """ http://codespeak.net/lxml/element_classes.html > My use case here is that I'm trying to make a package that acts like objectify, > but enforces schema restrictions. Not sure what exactly you mean. Do you mean data types, structural validity, or both? How would you enforce restrictions during multi-step changes to the tree, where the first couple of steps do not result in a valid tree by themselves? Doesn't validation do what you want? > I'm trying my hand at writing custom classes > that will do this, but maintain the nice interface that objectify provides. That should be doable without instantiating any classes directly. Just override __setattr__ and __setitem__ and intercept the cases where attributes are assigned. 
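
The interception idea on its own, as a minimal sketch on a plain Python class.
This is deliberately not an lxml element class: CheckedRecord and ALLOWED are
hypothetical names, and a real objectify subclass would presumably delegate to
its lxml base class rather than to object, and would not be instantiated
directly.

```python
class CheckedRecord(object):
    """Hypothetical stand-in that rejects assignments outside a fixed set."""

    # Names this record accepts; everything else is refused.
    ALLOWED = frozenset(["title", "price"])

    def __setattr__(self, name, value):
        # Intercept attribute assignment before it takes effect.
        if name not in self.ALLOWED:
            raise AttributeError("%r is not allowed here" % name)
        object.__setattr__(self, name, value)

    def __setitem__(self, key, value):
        # Route item assignment through the same check.
        setattr(self, key, value)


record = CheckedRecord()
record.title = "lxml"     # accepted
record["price"] = 0       # accepted, goes through __setitem__

try:
    record.colour = "red"  # not in ALLOWED
except AttributeError:
    rejected = True
else:
    rejected = False
```

The check runs on every assignment, so a disallowed change never reaches the
object.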
Use your subclasses as described here: http://codespeak.net/lxml/objectify.html#advanced-element-class-lookup http://codespeak.net/lxml/element_classes.html#setting-up-a-class-lookup-scheme Stefan From stefan_ml at behnel.de Thu Feb 21 07:36:20 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Feb 2008 07:36:20 +0100 Subject: [lxml-dev] Can't load external DTD. In-Reply-To: References: Message-ID: <47BD1BE4.8030904@behnel.de> Hi, Nef Asus wrote: > I've written this little program that refuses to work: > > from lxml import etree > if __name__ == "__main__": > xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml" > parser = etree.XMLParser(load_dtd = True, dtd_validation = True, > attribute_defaults = True) > doc = etree.parse(xml_input, parser) > > > Here's the traceback. > Traceback (most recent call last): > File "C:\Desarrollo\pythontests\lxml\dtd_loader.py", \ > line 27, in doc = etree.parse(xml_input, parser) > File "lxml.etree.pyx", line 2515, in lxml.etree.parse > File "parser.pxi", line 1755, in lxml.etree._parseDocument > File "parser.pxi", line 1759, in lxml.etree._parseDocumentFromURL > File "parser.pxi", line 1681, in lxml.etree._parseDocFromFile > File "parser.pxi", line 826 ,in lxml.etree._BaseParser._parseDocFromFile > File "parser.pxi",line 450,in lxml.etree._ParserContext._handleParseResultDoc > File "parser.pxi", line 534, in lxml.etree._handleParseResult > File "parser.pxi", line 476, in lxml.etree._raiseParseError > lxml.etree.XMLSyntaxError: failed to load external entity "NULL", > line 9, column 83 > > This is a snippet of foo.xml : > > SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd"> > ... Could you show me what line 9 in your XML file looks like? Stefan From nefasus at gmail.com Thu Feb 21 10:19:36 2008 From: nefasus at gmail.com (Nef Asus) Date: Thu, 21 Feb 2008 10:19:36 +0100 Subject: [lxml-dev] Can't load external DTD. 
In-Reply-To: <47BD1BE4.8030904@behnel.de> References: <47BD1BE4.8030904@behnel.de> Message-ID: <4eff23900802210119k622088dyc5c7fe6fe8620d5a@mail.gmail.com> Hello Stefan, this is my xml up to line 9: I omitted the commented lines in my first snippet. PD: I'm sorry I sent the response straight to you and missed the mailing list. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20080221/d6389fe3/attachment.htm From cz at gocept.com Thu Feb 21 11:56:09 2008 From: cz at gocept.com (Christian Zagrodnick) Date: Thu, 21 Feb 2008 11:56:09 +0100 Subject: [lxml-dev] Default prefixes for common XML namespaces *sometimes* doesn't work Message-ID: Hi, in 2.0alpha5 the feature was added that there are defaults for some namespace prefixes (like xsi, xsd, py). Problem is, I sometimes still get things like bla And I don't know why. What are the conditions when LXML uses (in this case) the py: prefix and when not? Or is it a bug? In the case above I used an ObjectPath with setattr. Regards, -- Christian Zagrodnick gocept gmbh & co. kg ? forsterstrasse 29 ? 06112 halle/saale www.gocept.com ? fon. +49 345 12298894 ? fax. +49 345 12298891 From stefan_ml at behnel.de Fri Feb 22 07:49:10 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 22 Feb 2008 07:49:10 +0100 Subject: [lxml-dev] Can't load external DTD. In-Reply-To: <4eff23900802210119k622088dyc5c7fe6fe8620d5a@mail.gmail.com> References: <47BD1BE4.8030904@behnel.de> <4eff23900802210119k622088dyc5c7fe6fe8620d5a@mail.gmail.com> Message-ID: <47BE7066.2090604@behnel.de> Hi, Nef Asus wrote: > lxml.etree.XMLSyntaxError: failed to load external entity "NULL", > line 9, column 83 > > this is my xml up to line 9: > > > > > "C:\Desarrollo\pythontests\lxml\foo.dtd"> > > > I omitted the commented lines in my first snippet. Hmm, I think the "NULL" comes from the fact that you only use "SYSTEM", without providing an ID. libxml2 printf()'s the ID here. 
So, well, it can't find it on your system. Have you tried something like "file://C:/Desarrollo/pythontests/lxml/foo.dtd" ? And: have you made sure the file is actually there? Stefan From stefan_ml at behnel.de Fri Feb 22 07:58:17 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 22 Feb 2008 07:58:17 +0100 Subject: [lxml-dev] Default prefixes for common XML namespaces *sometimes* doesn't work In-Reply-To: References: Message-ID: <47BE7289.1090705@behnel.de> Hi, Christian Zagrodnick wrote: > in 2.0alpha5 the feature was added that there are defaults for some > namespace prefixes (like xsi, xsd, py). -py here. > Problem is, I sometimes still get things like > > ns0:pytype="str">bla > > And I don't know why. What are the conditions when LXML uses (in this > case) the py: prefix and when not? Or is it a bug? Just wasn't in the dict of default prefixes. I added it. Stefan From nefasus at gmail.com Fri Feb 22 10:14:18 2008 From: nefasus at gmail.com (Nef Asus) Date: Fri, 22 Feb 2008 10:14:18 +0100 Subject: [lxml-dev] Can't load external DTD. In-Reply-To: <47BE7066.2090604@behnel.de> References: <47BD1BE4.8030904@behnel.de> <4eff23900802210119k622088dyc5c7fe6fe8620d5a@mail.gmail.com> <47BE7066.2090604@behnel.de> Message-ID: <4eff23900802220114o73313914iead07987bbd653c2@mail.gmail.com> Hi, Hmm, I think the "NULL" comes from the fact that you only use "SYSTEM", > without providing an ID. libxml2 printf()'s the ID here. > > So, well, it can't find it on your system. Have you tried something like > > "file://C:/Desarrollo/pythontests/lxml/foo.dtd" > > ? > > And: have you made sure the file is actually there? All the files are in the right place, so that is not the problem. I tried "file://C:/Desarrollo/pythontests/lxml/foo.dtd", bang, same failure. But "C:/Desarrollo/pythontests/lxml/foo.dtd" does work!. "foo.dtd" works too (foo.xml is in the same location). 
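
For reference, a file URL for a drive-letter path is normally written with
three slashes ("file:///C:/..."); in "file://C:/..." the "C:" ends up in the
host position. A minimal sketch of deriving both spellings from the backslash
path, using only the standard library. Whether libxml2 accepts the three-slash
form for this DTD was not tested in this thread, and the variable names are
made up for illustration.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

dtd_path = PureWindowsPath(r"C:\Desarrollo\pythontests\lxml\foo.dtd")

# Forward-slash spelling of the same path (the form reported to work).
dtd_forward = dtd_path.as_posix()

# Hand-built file URL: empty host, then the percent-quoted path.
dtd_uri = "file:///" + quote(dtd_forward, safe="/:")

print(dtd_forward)
print(dtd_uri)
```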
I'm wondering why, in the unsuccessful cases, the resolver receives both url and id set to None, perhaps the parser doesn't like backward slashes. Any idea? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20080222/e9f2822e/attachment.htm From stefan_ml at behnel.de Fri Feb 22 17:10:28 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 22 Feb 2008 17:10:28 +0100 Subject: [lxml-dev] lxml 2.0.2 released Message-ID: <47BEF3F4.6060806@behnel.de> Hi all, lxml 2.0.2 is up on PyPI. This is a bug fix release for the stable 2.0 series. Changelog follows below. Have fun, Stefan 2.0.2 (2008-02-22) ================== Features added -------------- * Support passing ``base_url`` to file parser functions to override the filename of the file(-like) object. Bugs fixed ---------- * The prefix for objectify's pytype namespace was missing from the set of default prefixes. * Memory leak in Schematron (fixed only for libxml2 2.6.31+). * Error type names in RelaxNG were reported incorrectly. * Slice deletion bug fixed in objectify. Other changes ------------- * Enabled doctests for some Python modules (especially ``lxml.html``). * Add a ``method`` argument to ``lxml.html.tostring()`` (``method="xml"`` for XHTML output). * Make it clearer that methods like ``lxml.html.fromstring()`` take a ``base_url`` argument. From egrim at swri.org Fri Feb 22 22:46:16 2008 From: egrim at swri.org (Evan Grim) Date: Fri, 22 Feb 2008 21:46:16 +0000 (UTC) Subject: [lxml-dev] Instantiated ObjectifiedElement References: <47BD15D5.4020808@behnel.de> Message-ID: > First of all, thank you for your thorough reply. I'll attempt to explain myself better by addressing your input inline below. Stefan Behnel behnel.de> writes: > > Hi, > > Evan Grim wrote: > > I've run into some curious behavior when trying to do something a bit > > non-standard with objectify. The code snippet below causes a hard crash. 
I > > know this isn't part of what is anticipated as normal use, but since it causes a > > crash I figure it's worth bringing up here to at least get some comments. > > > > > > from lxml import etree, objectify > > > > class parent(objectify.ObjectifiedElement): > > pass > > > > p = parent() > > print p.tag > > > > A crash is the expected result here. The docs say: > > """Note that you cannot (or rather must not) instantiate this class yourself. > lxml.etree will do that for you through its normal ElementTree API. > """ > > http://codespeak.net/lxml/element_classes.html > I realized that instantiation is not the typical way to use these classes, but had forgotten that the documentation was so explicit. Thanks for jogging my memory. I haven't had a chance to look at the underlying code, but if it isn't too difficult wouldn't an AssertionError (or some other exception) be better than crashing? > > My use case here is that I'm trying to make a package that acts like objectify, > > but enforces schema restrictions. > > Not sure what exactly you mean. Do you mean data types, structural validity, > or both? How would you enforce restrictions during multi-step changes to the > tree, where the first couple of steps do not result in a valid tree by themselves? I'm shooting for all of the above. Elements would not accept any changes that would make them deviate from what is specified by their respective schema definition. Multi-step changes are definitely a tricky wicket when it comes to this approach, but (at least for the use-cases I'm encountering) can be handled by careful interface designs that make sure the library user has all the tools they need. > > Doesn't validation do what you want? lxml's validation capabilities are great for document level validation. 
But what I'm wanting deviates from the current etree/objectify in what I believe is currently an incompatible way: I want to be able to go beyond document level validation and instantiate the various types defined within the schema as individual objects that can be modified and added to other schema-type-based objects and all the while ensuring the schema rules are never violated. As an aside related to validation: I notice that xsd:unique tags aren't validated [ref: http://www.w3.org/TR/xmlschema-0/#specifyingUniqueness]. Is this planned for the future? Are there any other XMLSchema gotchas of which I should be aware? > > > I'm trying my hand at writing custom classes > > that will do this, but maintain the nice interface that objectify provides. > > That should be doable without instantiating any classes directly. Just > override __setattr__ and __setitem__ and intercept the cases where attributes > are assigned. Use your subclasses as described here: > > http://codespeak.net/lxml/objectify.html#advanced-element-class-lookup > http://codespeak.net/lxml/element_classes.html#setting-up-a-class-lookup-sch eme This is originally the tack I was taking, but quickly ran into the issue of not being able to instantiate the objects in the manner I discuss above. Since running into this problem I've taken to using a custom class for each type that acts an intermediary to an underlying lxml element. > > Stefan > Thanks again for your input on this. And while I've got the mic: thanks to all who've put together a great xml suite. I'm very grateful to no longer be strapped to minidom. Cheers, Evan From ebgssth at gmail.com Sat Feb 23 08:22:49 2008 From: ebgssth at gmail.com (js) Date: Sat, 23 Feb 2008 16:22:49 +0900 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> Message-ID: Tried with self-compiled python. (revision 60972, python 2.6) The result was the same. Could you tell me how you installed lxml? 
On Mon, Feb 18, 2008 at 11:02 PM, Christian Zagrodnick wrote:
> On 2008-02-18 14:57:28 +0100, js said:
>
> > Not lxml's problem but my box's?
> > Could you run it with MacPorts python?
>
> Sorry, I don't use MacPorts.
>
> Try with a self-compiled python.
>
> --
>
>
> Christian Zagrodnick
>
> gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
> www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891
>
>
>
> _______________________________________________
> lxml-dev mailing list
> lxml-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/lxml-dev
>

From cz at gocept.com  Sat Feb 23 12:01:23 2008
From: cz at gocept.com (Christian Zagrodnick)
Date: Sat, 23 Feb 2008 12:01:23 +0100
Subject: [lxml-dev] Default prefixes for common XML namespaces *sometimes* doesn't work
References: <47BE7289.1090705@behnel.de>
Message-ID:

On 2008-02-22 07:58:17 +0100, Stefan Behnel said:

> Hi,
>
> Christian Zagrodnick wrote:
>> in 2.0alpha5 the feature was added that there are defaults for some
>> namespace prefixes (like xsi, xsd, py).
>
> -py here.
>
>
>> Problem is, I sometimes still get things like
>>
>> > ns0:pytype="str">bla
>>
>> And I don't know why. What are the conditions when LXML uses (in this
>> case) the py: prefix and when not? Or is it a bug?
>
> Just wasn't in the dict of default prefixes. I added it.

Oh. So I must have seen ghosts... sorry :)

But thanks for adding!

--
Christian Zagrodnick

gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax.
+49 345 12298891 From stefan_ml at behnel.de Sat Feb 23 18:49:03 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 23 Feb 2008 18:49:03 +0100 Subject: [lxml-dev] Instantiated ObjectifiedElement In-Reply-To: References: <47BD15D5.4020808@behnel.de> Message-ID: <47C05C8F.4040006@behnel.de> Hi, Evan Grim wrote: > Stefan Behnel behnel.de> writes: >> Evan Grim wrote: >>> >>> from lxml import etree, objectify >>> >>> class parent(objectify.ObjectifiedElement): >>> pass >>> >>> p = parent() >>> print p.tag >>> > I haven't had a chance to look at the underlying code, > but if it isn't too difficult wouldn't an AssertionError (or some other > exception) be better than crashing? Sure, if you tell me how to do that without a performance impact? But the bigger problem is that you can't easily distinguish between an instantiation from within lxml and an instantiation by user code. > Multi-step changes are definitely a tricky wicket when > it comes to this approach, but (at least for the use-cases I'm encountering) > can be handled by careful interface designs that make sure the library user > has all the tools they need. Then I'd say validity checks should be in the 'tools' as well. > lxml's validation capabilities are great for document level validation. But > what I'm wanting deviates from the current etree/objectify in what I believe > is currently an incompatible way: I want to be able to go beyond document > level validation and instantiate the various types defined within the schema > as individual objects that can be modified and added to other > schema-type-based objects and all the while ensuring the schema rules are > never violated. I don't know your specific setup, but you can always apply a part of a schema to a specific Element. > As an aside related to validation: I notice that xsd:unique tags aren't > validated [ref: http://www.w3.org/TR/xmlschema-0/#specifyingUniqueness]. Is > this planned for the future? 
Are there any other XMLSchema gotchas of which > I should be aware? That's quite possible. libxml2's XML schema implementation is not 100% complete. But that's nothing that can be changed in lxml. >> http://codespeak.net/lxml/objectify.html#advanced-element-class-lookup >> http://codespeak.net/lxml/element_classes.html#setting-up-a-class-lookup-scheme > > This is originally the tack I was taking, but quickly ran into the issue of > not being able to instantiate the objects in the manner I discuss above. > Since running into this problem I've taken to using a custom class for each > type that acts an intermediary to an underlying lxml element. Mapping the types defined in a schema to the Elements in an XML tree is a topic that has been suggested a couple of times already. lxml's Element class lookup mechanisms allow for pretty sophisticated uses, but there isn't an automatic way to assign classes from schema types. You'll have to do that yourself. Stefan From stefan_ml at behnel.de Sun Feb 24 22:23:11 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Feb 2008 22:23:11 +0100 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> Message-ID: <47C1E03F.7040706@behnel.de> Hi, js wrote: > Tried with self-compiled python. (revision 60972, python 2.6) > The result was the same. > > Could you tell me how you installed lxml? He's using a buildout, as he said before. Stefan From stefan_ml at behnel.de Sun Feb 24 22:25:52 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Feb 2008 22:25:52 +0100 Subject: [lxml-dev] lxml 2.0.1 released In-Reply-To: References: <47B367F1.1090903@behnel.de> <47B99009.2030503@behnel.de> Message-ID: <47C1E0E0.1090808@behnel.de> Hi, js wrote: > I tried passing --auto-rpath to setup.py, but rpath didn't came up in gcc line. > xslt-config seems right one for me because this seemed using > /opt/local's ones. Sorry, I checked now: distutils do not support rpath on MacOS. 
So this path is not worth pursuing...

Stefan

From ianb at colorstudy.com  Mon Feb 25 23:34:32 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Feb 2008 16:34:32 -0600
Subject: [lxml-dev] PyUnicodeUCS2_Decode errors
Message-ID: <47C34278.1000602@colorstudy.com>

Lately we've been having problems with errors like "undefined symbol:
PyUnicodeUCS2_Decode" when we do "import lxml.etree". When we build
lxml from source the errors go away.

I'm guessing this is because of systems that use UCS4 instead of UCS2,
and lxml eggs that were compiled differently than the system. Is this
the case? Is this a sign something is missing from the platform
signature of egg files?

Ian

From alexander.kozlovsky at gmail.com  Tue Feb 26 14:30:10 2008
From: alexander.kozlovsky at gmail.com (Alexander Kozlovsky)
Date: Tue, 26 Feb 2008 16:30:10 +0300
Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries
Message-ID: <181320030.20080226163010@gmail.com>

Hello!

I'm trying to use EXSLT extension functions on Windows
with the standard lxml binary distribution (lxml-1.3.6.win32-py2.4.exe).

I'm trying the following, but it does not work as expected:

test

Is it possible to use EXSLT extensions on Windows without rebuilding of standard binaries? Thanks in advance -- Best regards, Alexander mailto:alexander.kozlovsky at gmail.com From stefan_ml at behnel.de Tue Feb 26 14:58:30 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 26 Feb 2008 14:58:30 +0100 (CET) Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <181320030.20080226163010@gmail.com> References: <181320030.20080226163010@gmail.com> Message-ID: <18674.194.114.62.34.1204034310.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Hi, Alexander Kozlovsky wrote: > I'm trying to use EXSLT extension functions on Windows > with standard lxml binary distribution (lxml-1.3.6.win32-py2.4.exe) > > I'm trying to do the next, but it is not work as expected: What result do you get? > > xmlns:xsl='http://www.w3.org/1999/XSL/Transform' > xmlns:str="http://exslt.org/strings" Try removing the following line (except for the '>', obviously): > extension-element-prefixes="str"> > > >

test

> >
> >
> > Is it possible to use EXSLT extensions on Windows without rebuilding > of standard binaries? Never tried, but I wouldn't know a reason why this should fail. Stefan From stefan_ml at behnel.de Tue Feb 26 15:03:00 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 26 Feb 2008 15:03:00 +0100 (CET) Subject: [lxml-dev] PyUnicodeUCS2_Decode errors In-Reply-To: <47C34278.1000602@colorstudy.com> References: <47C34278.1000602@colorstudy.com> Message-ID: <53970.194.114.62.34.1204034580.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Hi Ian, Ian Bicking wrote: > Lately we've been having problems with errors like "undefined symbol: > PyUnicodeUCS2_Decode" when we do "import lxml.etree". When we build > lxml from source the errors go away. > > I'm guessing this is because of systems that use UCS4 instead of UCS2, > and lxml eggs that were compiled differently then the system. Is this > the case? I guess so, yes. > Is this a sign something is missing from the platform > signature of egg files? Definitely. The right place to ask this is the distutils SIG mailing list. Stefan From alexander.kozlovsky at gmail.com Tue Feb 26 15:54:04 2008 From: alexander.kozlovsky at gmail.com (Alexander Kozlovsky) Date: Tue, 26 Feb 2008 17:54:04 +0300 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <18674.194.114.62.34.1204034310.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <181320030.20080226163010@gmail.com> <18674.194.114.62.34.1204034310.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <1665956445.20080226175404@gmail.com> > Alexander Kozlovsky wrote: >> I'm trying to use EXSLT extension functions on Windows >> with standard lxml binary distribution (lxml-1.3.6.win32-py2.4.exe) >> >> I'm trying to do the next, but it is not work as expected: > > What result do you get? 
This exception:

XSLTApplyError: Internal error: Failed to evaluate the AVT of attribute 'class'

> Try removing the following line (except for the '>', obviously):
>
>> extension-element-prefixes="str">

I got the same error.

>> Is it possible to use EXSLT extensions on Windows without rebuilding
>> of standard binaries?
>
> Never tried, but I wouldn't know a reason why this should fail.

Probably I misread this old message:
http://codespeak.net/pipermail/lxml-dev/2006-April/001098.html

"However, this requires linking against libexslt"

So I supposed (maybe incorrectly) that the standard Windows binaries
are not linked against libexslt.

--
Best regards,
 Alexander                          mailto:alexander.kozlovsky at gmail.com

From ianb at colorstudy.com  Tue Feb 26 19:08:24 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 26 Feb 2008 12:08:24 -0600
Subject: [lxml-dev] PyUnicodeUCS2_Decode errors
In-Reply-To: <53970.194.114.62.34.1204034580.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
References: <47C34278.1000602@colorstudy.com>
	<53970.194.114.62.34.1204034580.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
Message-ID: <47C45598.3090803@colorstudy.com>

Stefan Behnel wrote:
> Hi Ian,
>
> Ian Bicking wrote:
>> Lately we've been having problems with errors like "undefined symbol:
>> PyUnicodeUCS2_Decode" when we do "import lxml.etree". When we build
>> lxml from source the errors go away.
>>
>> I'm guessing this is because of systems that use UCS4 instead of UCS2,
>> and lxml eggs that were compiled differently than the system. Is this
>> the case?
>
> I guess so, yes.

I think I must be confusing something, as I realize now there aren't any
Linux eggs on PyPI, just the tarballs. (We have some private eggs we've
built, and maybe those are what's causing the problem, but we'll have to
investigate more.)
Ian From stefan_ml at behnel.de Mon Feb 25 16:38:57 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 25 Feb 2008 16:38:57 +0100 Subject: [lxml-dev] Instantiated ObjectifiedElement In-Reply-To: <47C05C8F.4040006@behnel.de> References: <47BD15D5.4020808@behnel.de> <47C05C8F.4040006@behnel.de> Message-ID: <47C2E111.7020100@behnel.de> Hi again, Stefan Behnel wrote: > Evan Grim wrote: >> As an aside related to validation: I notice that xsd:unique tags aren't >> validated [ref: http://www.w3.org/TR/xmlschema-0/#specifyingUniqueness]. Is >> this planned for the future? Are there any other XMLSchema gotchas of which >> I should be aware? > > That's quite possible. libxml2's XML schema implementation is not 100% > complete. But that's nothing that can be changed in lxml. In case I wasn't clear, this means: the best place to ask is the libxml2 mailing list. Send them a schema snippet that doesn't work with their xmllint processor (part of libxml2), describe the problem, and with a bit of luck it will be in the next release. Stefan From stefan_ml at behnel.de Mon Feb 25 17:19:39 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 25 Feb 2008 17:19:39 +0100 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47B9C8AD.1050502@colorstudy.com> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> Message-ID: <47C2EA9B.8030007@behnel.de> Hi, Ian Bicking wrote: > Probably not a problem, but it doesn't seem that much like a problem to > make it writable too. Especially since the document itself is writable. > Once you've edited the document, it's not *the* document at that URL > anyway. Maybe you get a page, edit it, and serve it at a new location. > Deliverance does this by getting the theme page, then injecting the > content into that page -- but the theme page is the originally-parsed > object, though it will be served at a different location. 
I'd like to > be able to fix up that data. And I'm not sure how I'd make a copy of a > document with a new URL, if the URL/document link is immutable. (Right > now I'm mostly ignoring the URL, but it would be nice if I could > actually trust it.) Setting the document URL works on the current trunk. I also added a "base" property to Elements that is based on the xml:base attribute (or the appropriate fallback to the document URL). Stefan From ianb at colorstudy.com Tue Feb 26 21:55:28 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 26 Feb 2008 14:55:28 -0600 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47C2EA9B.8030007@behnel.de> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> <47C2EA9B.8030007@behnel.de> Message-ID: <47C47CC0.7090904@colorstudy.com> Stefan Behnel wrote: > Setting the document URL works on the current trunk. Cool. > I also added a "base" property to Elements that is based on the xml:base > attribute (or the appropriate fallback to the document URL). Hmm... there's a property in lxml.html called .base_url, which previously just read docinfo.URL. Now it could read .base... but obviously that's silly, as it's just an alias. We could deprecate .base_url in lxml.html, or rename .base as .base_url, but having both ain't good. 
Ian From stefan_ml at behnel.de Tue Feb 26 21:56:28 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 26 Feb 2008 21:56:28 +0100 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <1665956445.20080226175404@gmail.com> References: <181320030.20080226163010@gmail.com> <18674.194.114.62.34.1204034310.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <1665956445.20080226175404@gmail.com> Message-ID: <47C47CFC.4070800@behnel.de> Hi, Alexander Kozlovsky wrote: >> Alexander Kozlovsky wrote: >>> I'm trying to use EXSLT extension functions on Windows >>> with standard lxml binary distribution (lxml-1.3.6.win32-py2.4.exe) >>> >>> I'm trying to do the next, but it is not work as expected: >> What result do you get? > > This exception: > > XSLTApplyError: Internal error: Failed to evaluate the AVT of attribute 'class' > >> Try removing the following line (except for the '>', obviously): >> >>> extension-element-prefixes="str"> > > I got the same error Interesting. Could you run the test suite from the source distribution? There are a couple of EXSLT tests in there. Just unpack the source tar.gz from PyPI and (having lxml installed) run "python test.py". >>> Is it possible to use EXSLT extensions on Windows without rebuilding >>> of standard binaries? >> Never tried, but I wouldn't know a reason why this should fail. > > Probably I misread this old message: > http://codespeak.net/pipermail/lxml-dev/2006-April/001098.html > > "However, this requires linking against libexslt" > > So, I have supposed (maybe incorrectly) standard widows binaries > does not linked against libexslt That shouldn't have anything to do with it. 
Stefan

From faassen at startifact.com  Wed Feb 27 01:44:21 2008
From: faassen at startifact.com (Martijn Faassen)
Date: Wed, 27 Feb 2008 01:44:21 +0100
Subject: [lxml-dev] PyUnicodeUCS2_Decode errors
In-Reply-To: <47C45598.3090803@colorstudy.com>
References: <47C34278.1000602@colorstudy.com>
	<53970.194.114.62.34.1204034580.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
	<47C45598.3090803@colorstudy.com>
Message-ID:

Ian Bicking wrote:
> Stefan Behnel wrote:
>> Hi Ian,
>>
>> Ian Bicking wrote:
>>> Lately we've been having problems with errors like "undefined symbol:
>>> PyUnicodeUCS2_Decode" when we do "import lxml.etree". When we build
>>> lxml from source the errors go away.
>>>
>>> I'm guessing this is because of systems that use UCS4 instead of UCS2,
>>> and lxml eggs that were compiled differently than the system. Is this
>>> the case?
>> I guess so, yes.
>
> I think I must be confusing something, as I realize now there aren't any
> Linux eggs on PyPI, just the tarballs. (We have some private eggs we've
> built, and maybe those are what's causing the problem, but we'll have to
> investigate more.)

Right - we actually introduced the policy here not to release eggs, just
tarballs, to avoid just this problem, so I was surprised to see you ran
into it again. The only platform we release eggs on is Windows, where
there should be fewer problems as there is an established binary Python
interpreter released by python.org.

As far as I'm aware this issue should entirely go away if you compile
yourself.

There is no obvious way to resolve this with distutils/setuptools. I
brought this up a long time ago when I first ran into it. The conclusion
was, if I recall correctly, to extend the software so it encodes the UCS
encoding of the Python version into the file name of the egg. Far from
ideal..
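For anyone wanting to tell which kind of interpreter they have before installing a prebuilt egg, here is a minimal sketch using only the standard library. The function name `unicode_build` is made up for illustration; the check itself relies on `sys.maxunicode`, which differs between narrow and wide builds.

```python
import sys


def unicode_build():
    """Return 'UCS2' for a narrow build, 'UCS4' for a wide one."""
    # Narrow (UCS2) builds cap sys.maxunicode at 0xFFFF; wide (UCS4)
    # builds, and every CPython from 3.3 on, report 0x10FFFF.
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"


print(unicode_build(), hex(sys.maxunicode))
```

An extension compiled against one build refers to symbols such as PyUnicodeUCS2_Decode that the other build does not export, which matches the import error quoted above.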
Regards, Martijn From stefan_ml at behnel.de Wed Feb 27 07:59:13 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 27 Feb 2008 07:59:13 +0100 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <181320030.20080226163010@gmail.com> References: <181320030.20080226163010@gmail.com> Message-ID: <47C50A41.2050507@behnel.de> Hi, Alexander Kozlovsky wrote: > I'm trying to use EXSLT extension functions on Windows > with standard lxml binary distribution (lxml-1.3.6.win32-py2.4.exe) > > I'm trying to do the next, but it is not work as expected: > > > xmlns:str="http://exslt.org/strings" extension-element-prefixes="str"> > > > >

test

> >
> >
This definitely works for me on Linux: ------------------------------ def test_exslt_str_attribute_replace(self): tree = self.parse('
BC') style = self.parse('''\

test

''') st = etree.XSLT(style) res = st(tree) self.assertEquals('''\ \n

test

\n''', str(res)) ------------------------------ May be a problem with the Windows build? Stefan From stefan_ml at behnel.de Thu Feb 28 11:23:29 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 28 Feb 2008 11:23:29 +0100 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47C47CC0.7090904@colorstudy.com> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> <47C2EA9B.8030007@behnel.de> <47C47CC0.7090904@colorstudy.com> Message-ID: <47C68BA1.3090902@behnel.de> Hi, Ian Bicking wrote: > Stefan Behnel wrote: >> I also added a "base" property to Elements that is based on the xml:base >> attribute (or the appropriate fallback to the document URL). > > Hmm... there's a property in lxml.html called .base_url, which > previously just read docinfo.URL. Now it could read .base... but > obviously that's silly, as it's just an alias. > > We could deprecate .base_url in lxml.html, or rename .base as .base_url, > but having both ain't good. I agree, wasn't aware of it. (Here, we are actually lucky that it wasn't writable already!) But 'base' is a better name for the XML environment given 'xml:base'. It feels weird to set '.base_url' and have it set an xml:base attribute on the Element. Also, it might just be a URI, although that's unlikely. Don't you think it should behave differently for XML and HTML? For XML, I'd expect it to depend on xml:base, while for HTML, it'd rather always depend on the document URL (and not set an xml:base attribute on assignment). 
Stefan From ianb at colorstudy.com Thu Feb 28 18:06:03 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 28 Feb 2008 11:06:03 -0600 Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc In-Reply-To: <47C68BA1.3090902@behnel.de> References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> <47C2EA9B.8030007@behnel.de> <47C47CC0.7090904@colorstudy.com> <47C68BA1.3090902@behnel.de> Message-ID: <47C6E9FB.1060903@colorstudy.com> Stefan Behnel wrote: > Hi, > > Ian Bicking wrote: >> Stefan Behnel wrote: >>> I also added a "base" property to Elements that is based on the xml:base >>> attribute (or the appropriate fallback to the document URL). >> Hmm... there's a property in lxml.html called .base_url, which >> previously just read docinfo.URL. Now it could read .base... but >> obviously that's silly, as it's just an alias. >> >> We could deprecate .base_url in lxml.html, or rename .base as .base_url, >> but having both ain't good. > > I agree, wasn't aware of it. (Here, we are actually lucky that it wasn't > writable already!) > > But 'base' is a better name for the XML environment given 'xml:base'. It feels > weird to set '.base_url' and have it set an xml:base attribute on the Element. > Also, it might just be a URI, although that's unlikely. > > Don't you think it should behave differently for XML and HTML? For XML, I'd > expect it to depend on xml:base, while for HTML, it'd rather always depend on > the document URL (and not set an xml:base attribute on assignment). Sure, they act somewhat differently, but does it make sense to use two different names? I think they mean similar things in both cases, though perhaps the per-element base attribute in HTML shouldn't be writable. (Though the tree is kind of this weird invisible thing that you wouldn't know is there except for things like docinfo.URL, but a little documentation can fix that of course.) 
Ian From stefan_ml at behnel.de Thu Feb 28 18:30:29 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 28 Feb 2008 18:30:29 +0100 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <181320030.20080226163010@gmail.com> References: <181320030.20080226163010@gmail.com> Message-ID: <47C6EFB5.70006@behnel.de> Hi, Alexander Kozlovsky wrote: > I'm trying to use EXSLT extension functions on Windows > with standard lxml binary distribution (lxml-1.3.6.win32-py2.4.exe) > > I'm trying to do the next, but it is not work as expected: > > > xmlns:str="http://exslt.org/strings" extension-element-prefixes="str"> > > > >

test

> >
> >
> > Is it possible to use EXSLT extensions on Windows without rebuilding > of standard binaries? I just checked the release notes of libxslt. They say that str:replace was "improved" in libxslt 1.1.20. However, my tests show that it does not work in any version before 1.1.21, so I assume the binary build uses 1.1.20 or an older version. http://xmlsoft.org/XSLT/news.html Stefan From sidnei at enfoldsystems.com Thu Feb 28 18:45:03 2008 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Thu, 28 Feb 2008 14:45:03 -0300 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <47C6EFB5.70006@behnel.de> References: <181320030.20080226163010@gmail.com> <47C6EFB5.70006@behnel.de> Message-ID: On Thu, Feb 28, 2008 at 2:30 PM, Stefan Behnel wrote: > I just checked the release notes of libxslt. They say that str:replace was > "improved" in libxslt 1.1.20. However, my tests show that it does not work in > any version before 1.1.21, so I assume the binary build uses 1.1.20 or an > older version. I am almost sure we are using 1.1.19 for building the binary. -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From alexander.kozlovsky at gmail.com Thu Feb 28 21:34:15 2008 From: alexander.kozlovsky at gmail.com (Alexander Kozlovsky) Date: Thu, 28 Feb 2008 23:34:15 +0300 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: References: <181320030.20080226163010@gmail.com> <47C6EFB5.70006@behnel.de> Message-ID: <151111847.20080228233415@gmail.com> Sidnei da Silva wrote: > On Thu, Feb 28, 2008 at 2:30 PM, Stefan Behnel wrote: >> I just checked the release notes of libxslt. They say that str:replace was >> "improved" in libxslt 1.1.20. However, my tests show that it does not work in >> any version before 1.1.21, so I assume the binary build uses 1.1.20 or an >> older version. > > I am almost sure we are using 1.1.19 for building the binary. 
Yes, I checked libxslt version of Windows build and it is 1.1.19 -- Best regards, Alexander mailto:alexander.kozlovsky at gmail.com From stefan_ml at behnel.de Thu Feb 28 21:51:56 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 28 Feb 2008 21:51:56 +0100 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: References: <181320030.20080226163010@gmail.com> <47C6EFB5.70006@behnel.de> Message-ID: <47C71EEC.20202@behnel.de> Hi Sidnei, Sidnei da Silva wrote: > On Thu, Feb 28, 2008 at 2:30 PM, Stefan Behnel wrote: >> I just checked the release notes of libxslt. They say that str:replace was >> "improved" in libxslt 1.1.20. However, my tests show that it does not work in >> any version before 1.1.21, so I assume the binary build uses 1.1.20 or an >> older version. > > I am almost sure we are using 1.1.19 for building the binary. "are using" means: also for the lxml 2.0.x builds? Stefan From sidnei at enfoldsystems.com Thu Feb 28 22:46:18 2008 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Thu, 28 Feb 2008 18:46:18 -0300 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: <47C71EEC.20202@behnel.de> References: <181320030.20080226163010@gmail.com> <47C6EFB5.70006@behnel.de> <47C71EEC.20202@behnel.de> Message-ID: On Thu, Feb 28, 2008 at 5:51 PM, Stefan Behnel wrote: > "are using" means: also for the lxml 2.0.x builds? Yes. I remember back at the time there were issues with .20 or .21 (can't recall) so you told me to stick around with .19. I completely missed that .22 was out. What should we do? Release new builds of 1.3.x with updated libxslt? I haven't built the latest 2.x yet, so that one should get the newer libxslt (as soon as zlatkovic.com is back). 
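To see which libxslt a given lxml binary actually runs against, the version tuple that lxml exposes can be compared with the 1.1.21 threshold mentioned earlier in this thread. This is a minimal sketch; the threshold comes from Stefan's tests as reported above, and the variable names are illustrative only.

```python
from lxml import etree

# Version of libxslt that lxml is running against, e.g. (1, 1, 19).
runtime_version = etree.LIBXSLT_VERSION

# Per this thread, str:replace did not work before libxslt 1.1.21.
str_replace_expected = runtime_version >= (1, 1, 21)

print("libxslt", ".".join(map(str, runtime_version)),
      "- str:replace expected to work:", str_replace_expected)
```

`etree.LIBXSLT_COMPILED_VERSION` gives the version lxml was built against, which can differ from the runtime one when the library is linked dynamically.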
-- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Fri Feb 29 09:32:43 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 29 Feb 2008 09:32:43 +0100 Subject: [lxml-dev] Using EXSLT extensions on Windows with standard lxml binaries In-Reply-To: References: <181320030.20080226163010@gmail.com> <47C6EFB5.70006@behnel.de> <47C71EEC.20202@behnel.de> Message-ID: <47C7C32B.8030704@behnel.de> Hi, Sidnei da Silva wrote: > On Thu, Feb 28, 2008 at 5:51 PM, Stefan Behnel wrote: >> "are using" means: also for the lxml 2.0.x builds? > > Yes. I remember back at the time there were issues with .20 or .21 > (can't recall) so you told me to stick around with .19. I completely > missed that .22 was out. > > What should we do? Release new builds of 1.3.x with updated libxslt? This is not a critical problem, so I wouldn't do a re-release. If you can build 2.0.2 with a newer libxslt, that's just fine. I currently don't have the time to backport fixes for a 1.3.7 release, but once that gets done, we'll have that problem sorted out as well. Is there a way you could document the libxml2/libxslt versions used when uploading binaries? Like, in the file comment on PyPI? > I haven't built the latest 2.x yet, so that one should get the newer > libxslt (as soon as zlatkovic.com is back). 
This works just fine for me, and it has libxslt 1.1.22:

ftp://ftp.zlatkovic.com/pub/libxml

Stefan

From stefan_ml at behnel.de Fri Feb 29 21:17:12 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 29 Feb 2008 21:17:12 +0100
Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc
In-Reply-To: <47C6E9FB.1060903@colorstudy.com>
References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> <47C2EA9B.8030007@behnel.de> <47C47CC0.7090904@colorstudy.com> <47C68BA1.3090902@behnel.de> <47C6E9FB.1060903@colorstudy.com>
Message-ID: <47C86848.80003@behnel.de>

Hi,

Ian Bicking wrote:
> Stefan Behnel wrote:
>> Don't you think it should behave differently for XML and HTML? For
>> XML, I'd
>> expect it to depend on xml:base, while for HTML, it'd rather always
>> depend on
>> the document URL (and not set an xml:base attribute on assignment).
>
> Sure, they act somewhat differently, but does it make sense to use two
> different names? I think they mean similar things in both cases, though
> perhaps the per-element base attribute in HTML shouldn't be writable.
> (Though the tree is kind of this weird invisible thing that you wouldn't
> know is there except for things like docinfo.URL, but a little
> documentation can fix that of course.)

ok, I do prefer 'base' then, though, as it matches xml:base. It also makes
less sense in the HTML area than in the XML area, where you actually /have/
something like a base URL of an element, rather than just a URL of a document
that the Element happens to be in. So, if you move an HTML Element from one
tree to another, it will change its base URL, while in the XML world, you
/can/ work around that if you need/want to.

I think we should deprecate 'base_url' in favour of 'base', and document the
respective behaviour in the doc strings of both properties.

Stefan

From ianb at colorstudy.com Fri Feb 29 23:35:52 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 29 Feb 2008 16:35:52 -0600
Subject: [lxml-dev] Setting URL from lxml.html.fromstring, etc
In-Reply-To: <47C86848.80003@behnel.de>
References: <47B8C56E.3090106@colorstudy.com> <47B942CD.5090501@behnel.de> <47B9C8AD.1050502@colorstudy.com> <47C2EA9B.8030007@behnel.de> <47C47CC0.7090904@colorstudy.com> <47C68BA1.3090902@behnel.de> <47C6E9FB.1060903@colorstudy.com> <47C86848.80003@behnel.de>
Message-ID: <47C888C8.50102@colorstudy.com>

Stefan Behnel wrote:
> Hi,
>
> Ian Bicking wrote:
>> Stefan Behnel wrote:
>>> Don't you think it should behave differently for XML and HTML? For
>>> XML, I'd
>>> expect it to depend on xml:base, while for HTML, it'd rather always
>>> depend on
>>> the document URL (and not set an xml:base attribute on assignment).
>> Sure, they act somewhat differently, but does it make sense to use two
>> different names? I think they mean similar things in both cases, though
>> perhaps the per-element base attribute in HTML shouldn't be writable.
>> (Though the tree is kind of this weird invisible thing that you wouldn't
>> know is there except for things like docinfo.URL, but a little
>> documentation can fix that of course.)
>
> ok, I do prefer 'base' then, though, as it matches xml:base. It also makes
> less sense in the HTML area than in the XML area, where you actually /have/
> something like a base URL of an element, rather than just a URL of a document
> that the Element happens to be in. So, if you move an HTML Element from one
> tree to another, it will change its base URL, while in the XML world, you
> /can/ work around that if you need/want to.
>
> I think we should deprecate 'base_url' in favour of 'base', and document the
> respective behaviour in the doc strings of both properties.

OK. Then would the html base attribute just be a read-only property then?
Like:

    def base(self):
        return super(HtmlElement, self).base
    base = property(base)

I'm not terribly concerned about whether it is read-only or not. It's a
little fuzzy, since HTML is parsed to the lxml representation, and
though it will probably be serialized to HTML again (if it is serialized
at all) and HTML doesn't have anything like xml:base, the lxml
representation is not itself exactly HTML. And if you serialize to
XHTML, then xml:base is available.

Also translating HTML to XHTML is kind of an outstanding issue for
lxml.html, and it seems reasonable to me that XHTML could be parsed into
the same classes as HTML. The only real caveat there is that XHTML uses
different (namespaced) tag names. If you remove the tag names, then the
classes and the lookup applies just fine. (Presumably the lookup could
be changed to support XHTML fairly easily.)

  Ian
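
For readers following the read-only property idea above, here is a minimal, self-contained sketch of the pattern in plain Python. It does not use lxml: the `Element` and `HtmlElement` classes below are hypothetical stand-ins, and the `_base` attribute is an assumption made only for illustration. It shows a subclass overriding a writable `base` property with a getter-only one that delegates to the parent class, so that reading still works while assignment fails.

```python
class Element(object):
    """Hypothetical stand-in for a base class with a writable ``base``."""

    def __init__(self, base=None):
        self._base = base  # illustrative storage, not lxml's real layout

    @property
    def base(self):
        return self._base

    @base.setter
    def base(self, value):
        self._base = value


class HtmlElement(Element):
    """Hypothetical subclass that exposes ``base`` as read-only."""

    @property
    def base(self):
        # Delegate to the parent's getter. No setter is defined on this
        # property, so assigning to ``base`` raises AttributeError.
        return super(HtmlElement, self).base


el = HtmlElement(base="http://example.com/page.html")

try:
    el.base = "http://example.org/"
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
```

The subclass property shadows the parent's property entirely, which is why the setter defined on `Element` no longer applies to `HtmlElement` instances. Whether lxml should take this route is the open question in the thread; the sketch only illustrates the mechanism.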