From sidnei at enfoldsystems.com Fri Jun 1 05:14:47 2007
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Fri, 1 Jun 2007 00:14:47 -0300
Subject: [lxml-dev] Crash on Win32 under heavy stress
Message-ID:

Hi there,

We're doing some stress testing before rolling some code based on lxml
into production and we've been able to reproduce a crash when reusing
the same XSLT object repeatedly. I will dump some information here in
the hope that someone can shed some light before I go trying to compile
debug versions of everything and the kitchen sink.

The test is composed of a rather small XML file, and a rather big and
complex XSLT with several etc.

We fire up 10 threads. Each one has its own parsed XML file and XSLT;
they are not being shared across threads. Each thread then goes into a
loop, applying the XSLT to the XML file and serializing the result to a
string.

With fewer than 1000 iterations the crash almost never happens. At
about 50000 iterations, the crash is pretty much guaranteed to happen.
There doesn't seem to be any memory leak or anything; memory usage is
quite stable.

This is using the 1.2.1 release.

This is a static build on Win32, against:

libxml2-2.6.26.win32
libxslt-1.1.17.win32
zlib-1.2.3.win32
iconv-1.9.2.win32

The crash information says it's an 'access violation' and it happens
somewhere in etree.pyd. When I bring up the debugger, only etree.pyd and
python24.dll are on the stack (which I guess is pretty useless
information). I am hoping to make a debug build to get more information.

I can try running the same code on Linux and seeing if it happens there
too, or I can try a debug build, or hopefully someone will come out and
say I'm doing something wrong here or that the version of libxml2 I'm
using has known issues.

I can also provide the test code + data upon request.
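A minimal sketch of the test setup described above, under stated assumptions: the tiny XML document and stylesheet below are stand-ins for the real (unposted) test data, and the names `worker` and `run_stress` are hypothetical. Each thread parses its own copies, so no lxml object is shared between threads.

```python
import threading

from lxml import etree

# Stand-in inputs (assumption): the real test used a small XML file
# and a large, complex XSLT that were not posted to the list.
XML_SOURCE = b"<root><item>1</item><item>2</item></root>"
XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="count(//item)"/></out>
  </xsl:template>
</xsl:stylesheet>
"""


def worker(iterations, results, index):
    # Each thread parses its own document and stylesheet.
    doc = etree.fromstring(XML_SOURCE)
    transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
    last = None
    for _ in range(iterations):
        # Apply the XSLT and serialize the result to a string.
        last = etree.tostring(transform(doc))
    results[index] = last


def run_stress(num_threads=10, iterations=1000):
    results = [None] * num_threads
    threads = [
        threading.Thread(target=worker, args=(iterations, results, i))
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_stress()[0])
```

Raising `iterations` towards 50000 corresponds to the range where the crash was reported; whether this toy stylesheet triggers anything is not known.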
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

From stefan_ml at behnel.de Fri Jun 1 06:36:34 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Jun 2007 06:36:34 +0200
Subject: [lxml-dev] Crash on Win32 under heavy stress
In-Reply-To:
References:
Message-ID: <465FA252.1010704@behnel.de>

Hi Sidnei,

I'll be away on vacation starting today, so I regret I can't provide too
much critical help at the moment. So, just a quick note here.

Sidnei da Silva wrote:
> This is using the 1.2.1 release.

For a quick test, try with 1.3beta or the current trunk. No guarantee, though,
especially the trunk is not necessarily in a perfect production state.

> This is a static build on Win32, against:
>
> libxml2-2.6.26.win32
> libxslt-1.1.17.win32
> zlib-1.2.3.win32
> iconv-1.9.2.win32

Ok, first thing I'd personally try is the latest libxml2 and libxslt. The guys
over there keep fixing bugs (even really old ones in recent versions), so
things tend to get better over time. For example, I get a reproducible XPath
crasher in the HTML module Ian is working on with libxml2 2.6.27. It's gone
with 2.6.28. libxslt is a good bet here, too.

Sorry if this doesn't help, but trying other versions is as much as I can
propose at the moment, especially if it's urgent.

Ah, one last thing: I know, you're not testing threads for fun but for
performance, but if that doesn't prove reliable for your application, you can
still switch them off with --without-threading. Reliability is usually more
important than performance for production.
Stefan

From ianb at colorstudy.com Fri Jun 1 06:43:30 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 31 May 2007 23:43:30 -0500
Subject: [lxml-dev] Crash on Win32 under heavy stress
In-Reply-To: <465FA252.1010704@behnel.de>
References: <465FA252.1010704@behnel.de>
Message-ID: <465FA3F2.9030003@colorstudy.com>

Stefan Behnel wrote:
> Ok, first thing I'd personally try is the latest libxml2 and libxslt. The guys
> over there keep fixing bugs (even really old ones in recent versions), so
> things tend to get better over time. For example, I get a reproducible XPath
> crasher in the HTML module Ian is working on with libxml2 2.6.27. It's gone
> with 2.6.28. libxslt is a good bet here, too.

Which XPath is that? I'd rather avoid it if I can, for those that might
have 2.6.27.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
            | Write code, do good | http://topp.openplans.org/careers

From ianb at colorstudy.com Fri Jun 1 06:46:46 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 31 May 2007 23:46:46 -0500
Subject: [lxml-dev] html branch
In-Reply-To: <465F3A17.9080508@behnel.de>
References: <465C48A0.8030808@colorstudy.com> <465C9E02.6030308@behnel.de>
 <465CA4C5.4050308@colorstudy.com> <465D14DF.9060304@behnel.de>
 <465F1683.4090804@colorstudy.com> <465F3A17.9080508@behnel.de>
Message-ID: <465FA4B6.8040302@colorstudy.com>

Stefan Behnel wrote:
>>>>>> lxml.[html.]clean: clean Javascript and other problem code from HTML
>>>>> That rather looks like an HtmlElement method to me: "cleanup(...)",
>>>>> and the
>>>>> clean_html() function would fit right into the top-level of the
>>>>> lxml.html module.
>>>> The long signature of the function made me reluctant to do this. Any
>>>> function with that many parameters feels non-authoritative to me.
>>>> And I would encourage people to actually write their own clean function
>>>> with the parameter defaults that are appropriate for their domain (e.g.,
>>>> clean_untrusted_comment, clean_wysiwyg_submission, etc). I just guessed
>>>> reasonable defaults for those keyword arguments.
>>> Ah, ok, good point. Still, I would like to keep the number of modules
>>> low.
>>> lxml.html should be as close to "one point for solving your HTML
>>> needs" as
>>> possible.
>> OK. *Actually* putting them all in one module would make the module
>> feel too big to me. I could import them all into __init__.py. That
>> might make the import unnecessarily slow, I'm not sure.
>
> Avoiding imports tends to be not worth the effort. It already takes a while to
> import etree, so importing some more Python modules doesn't add much.
>
>
>> For some reason I've never used lazy-loading functions, though the
>> implementation seems obvious enough; just something like:
>>
>> def clean(*args, **kw):
>>     from lxml.html import clean
>>     return clean(*args, **kw)
>>
>> It breaks documentation tools, I guess (though at least I can refer to
>> the real function in the docstring).
>
> I wouldn't do that. Calling things happens much more often than importing
> them, so adding overhead to the call that is usually done only once feels
> wrong to me.

The overhead of an import (if the module has already been imported)
isn't very significant, and could be cached easily enough. That said,
the clean module isn't particularly large and doesn't import much
itself. But htmldiff is the only module of substantial size. I've
integrated rewritelinks directly into __init__, which after refactoring
the algorithm a bit isn't very big anyway.

I dunno; I'm okay just requiring htmldiff to be imported directly, and
importing clean into __init__.

>> For a number of the methods I'd also like a function version that takes
>> a string and returns a string.
>> I think this makes it easier to convince
>> people to use the functions. Obviously this doesn't make sense for a
>> lot of the methods, but does for clean, htmldiff, make_links_absolute,
>> and maybe rewrite_links.
>
> I like that pattern, too.

So I made a generic wrapper for exposing methods as functions. It
parses the first argument if it's not already a parsed document, then
does something, and returns the result of the method or the serialized
form of the document if the method returns None. This might be a bit
too fancy/automatic.

But anyway, putting that aside, I was thinking that maybe the general
pattern should be like:

    def make_links_absolute(doc, base_href, fragment=False):
        if isinstance(doc, basestring):
            # String in, string out: parse first, serialize at the end.
            if fragment:
                doc = parse_element(doc)
            else:
                doc = HTML(doc)
            return_string = True
        else:
            # Element in, element out: work on a copy of the input.
            doc = copy.deepcopy(doc)
            return_string = False
        # Call the method on the element; base_href is its only argument.
        doc.make_links_absolute(base_href)
        if return_string:
            return tostring(doc)
        else:
            return doc

This makes the function also a handy way to do functional-style
transformations of elements. It bothers me a bit to change the return
type (which I generally dislike doing), except that it matches the
input type which seems like it might be okay. Does this seem okay?

Also, I'm wondering if (a) I should try to automatically determine
fragment unless it is explicitly given, and/or (b) if parse_element
doesn't work (raises an exception) I should use parse_element(doc,
create_parent=True) which will wrap the fragment in a parent element.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
            | Write code, do good | http://topp.openplans.org/careers

From sidnei at enfoldsystems.com Fri Jun 1 14:31:38 2007
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Fri, 1 Jun 2007 09:31:38 -0300
Subject: [lxml-dev] Crash on Win32 under heavy stress
In-Reply-To: <465FA252.1010704@behnel.de>
References: <465FA252.1010704@behnel.de>
Message-ID:

Hi Stefan,

On 6/1/07, Stefan Behnel wrote:
> Sidnei da Silva wrote:
> > This is using the 1.2.1 release.
>
> For a quick test, try with 1.3beta or the current trunk. No guarantee, though,
> especially the trunk is not necessarily in a perfect production state.

I wish I could compile trunk, but I haven't figured out a resolution
to that issue with MSVC. :(

> > This is a static build on Win32, against:
> >
> > libxml2-2.6.26.win32
> > libxslt-1.1.17.win32
> > zlib-1.2.3.win32
> > iconv-1.9.2.win32
>
> Ok, first thing I'd personally try is the latest libxml2 and libxslt. The guys
> over there keep fixing bugs (even really old ones in recent versions), so
> things tend to get better over time. For example, I get a reproducible XPath
> crasher in the HTML module Ian is working on with libxml2 2.6.27. It's gone
> with 2.6.28. libxslt is a good bet here, too.

I've updated to libxml2-2.6.27 and libxslt-1.1.19 and it hasn't
crashed so far. I've mailed Igor Zlatkovic to see if he will be
building binaries for the latest libxml2/libxslt anytime soon.

> Sorry if this doesn't help, but trying other versions is as much as I can
> propose at the moment, especially if it's urgent.

That was good advice, and it seems to have solved the immediate issue.
Thank you a lot!

> Ah, one last thing: I know, you're not testing threads for fun but for
> performance, but if that doesn't prove reliable for your application, you can
> still switch them off with --without-threading. Reliability is usually more
> important than performance for production.
Is there any document describing the effects of --without-threading? I
guess it's safe to enable that flag if you are sure you won't share
objects between threads?

--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

From itamar at itamarst.org Fri Jun 1 19:57:01 2007
From: itamar at itamarst.org (Itamar Shtull-Trauring)
Date: Fri, 01 Jun 2007 13:57:01 -0400
Subject: [lxml-dev] Network downloading of schemas should be off by default?
Message-ID: <1180720621.30394.24.camel@localhost.localdomain>

Right now, AFAICT, it is on by default in lxml.etree.XMLParser. Network
queries by library code are a bad idea: it's an unexpected behavior,
causing potential security risk and guaranteed performance problems.

--
Itamar Shtull-Trauring
http://itamarst.org

From fdrake at gmail.com Fri Jun 1 20:13:35 2007
From: fdrake at gmail.com (Fred Drake)
Date: Fri, 1 Jun 2007 14:13:35 -0400
Subject: [lxml-dev] Network downloading of schemas should be off by default?
In-Reply-To: <1180720621.30394.24.camel@localhost.localdomain>
References: <1180720621.30394.24.camel@localhost.localdomain>
Message-ID: <9cee7ab80706011113k21b4af85nab356d3a418b88af@mail.gmail.com>

On 6/1/07, Itamar Shtull-Trauring wrote:
> Right now, AFAICT, it is on by default in lxml.etree.XMLParser. Network
> queries by library code are a bad idea: it's an unexpected behavior,
> causing potential security risk and guaranteed performance problems.

I actually like the way the SAX interface handles this; you provide
something that resolves references however you want, and it uses that.

-Fred

--
Fred L. Drake, Jr.
"Chaos is the score upon which reality is written."
--Henry Miller

From fairwinds at eastlink.ca Sat Jun 2 18:08:14 2007
From: fairwinds at eastlink.ca (David Pratt)
Date: Sat, 02 Jun 2007 13:08:14 -0300
Subject: [lxml-dev] XMLSchema validation with XMLSchema.xsd failing
Message-ID: <466195EE.4000304@eastlink.ca>

Can someone advise whether XMLSchema is currently able to validate XML
schemas? I am on Mac 10.4.9 using lxml 1.3 beta with the Mac's default
libxml2. In my initial attempt, I am getting the following errors (after
catching the exception in the log). I am using XMLSchema.xsd to validate
against, as you can see. The error comes before my next step, which would
validate the schema I am interested in. Here is my session:

>>> import lxml.etree
>>> from am.xmlschema import xmlschema_path
>>> xmlschema_doc = lxml.etree.parse(xmlschema_path)
>>> try:
...     xmlschema = lxml.etree.XMLSchema(xmlschema_doc)
... except Exception, e:
...     print e.error_log
...
/Users/davidpratt/Desktop/xmlschemademo/dev/am.xmlschema/src/am/xmlschema/XMLSchema.xsd:655:ERROR:SCHEMASP:SCHEMAP_REDEFINED_ELEMENT: Element 'element': A global element declaration with the name 'element' does already exist.
/Users/davidpratt/Desktop/xmlschemademo/dev/am.xmlschema/src/am/xmlschema/XMLSchema.xsd:864:ERROR:SCHEMASP:SCHEMAP_REDEFINED_ELEMENT: Element 'element': A global element declaration with the name 'group' does already exist.
>>>

Many thanks

Regards
David

From stefan_ml at behnel.de Sat Jun 2 22:28:57 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 02 Jun 2007 22:28:57 +0200
Subject: [lxml-dev] Crash on Win32 under heavy stress
In-Reply-To:
References: <465FA252.1010704@behnel.de>
Message-ID: <4661D309.4020203@behnel.de>

Hi Sidnei,

Sidnei da Silva wrote:
> On 6/1/07, Stefan Behnel wrote:
>> Sidnei da Silva wrote:
>> > This is using the 1.2.1 release.
>>
>> For a quick test, try with 1.3beta or the current trunk. No guarantee,
>> though,
>> especially the trunk is not necessarily in a perfect production state.
>
> I wish I could compile trunk, but I haven't figured out a resolution
> to that issue with MSVC. :(

Ever tried MinGW? I read in a couple of posts that it works pretty well
with Python modules - not sure about Pyrex and setuptools, though.

> I've updated to libxml2-2.6.27 and libxslt-1.1.19 and it hasn't
> crashed so far. I've mailed Igor Zlatkovic to see if he will be
> building binaries for the latest libxml2/libxslt anytime soon.

Well, if 2.6.27 works for your code, then that's fine. I just said there
seem to be certain XPath expressions that make it crash. If you don't
hit that problem, you should be on the safe side.

>> Sorry if this doesn't help, but trying other versions is as much as I can
>> propose at the moment, especially if it's urgent.
>
> That was good advice, and it seems to have solved the immediate issue.
> Thank you a lot!

It's actually really hard to keep lxml working across various different
libxml2 and libxslt versions. We can only try to keep stuff that crashes
libxml2 out of lxml itself, but we can't always keep users from writing
stuff that crashes certain versions.

>> Ah, one last thing: I know, you're not testing threads for fun but for
>> performance, but if that doesn't prove reliable for your application,
>> you can
>> still switch them off with --without-threading. Reliability is usually
>> more important than performance for production.
>
> Is there any document describing the effects of --without-threading? I
> guess it's safe to enable that flag if you are sure you won't share
> objects between threads?

That's not documented. It just disables all threading code in lxml,
meaning, it never frees the GIL and thus prevents libxml2/libxslt code
from running concurrently.
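As a rough application-level analogue of the behaviour described above (only one thread inside libxml2/libxslt at a time), calls can be funnelled through a single lock without rebuilding lxml. This is only a sketch and an assumption about what might be useful; it is not how --without-threading is implemented, and `_xml_lock`, `serialized` and `transform_to_string` are hypothetical names.

```python
import functools
import threading

# Hypothetical module-level lock shared by all XML/XSLT work.
_xml_lock = threading.Lock()


def serialized(func):
    """Run `func` with the lock held, so decorated calls never overlap."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _xml_lock:
            return func(*args, **kwargs)
    return wrapper


@serialized
def transform_to_string(transform, doc):
    # `transform` is any callable (for example an lxml XSLT object);
    # `doc` is whatever input that callable expects.
    return str(transform(doc))
```

The trade-off is the one named in the thread: calls run one after another, so the reliability comes at the cost of concurrency.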
Stefan

From stefan_ml at behnel.de Sat Jun 2 22:40:46 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 02 Jun 2007 22:40:46 +0200
Subject: [lxml-dev] Crash on Win32 under heavy stress
In-Reply-To: <465FA3F2.9030003@colorstudy.com>
References: <465FA252.1010704@behnel.de> <465FA3F2.9030003@colorstudy.com>
Message-ID: <4661D5CE.4000502@behnel.de>

Hi Ian,

Ian Bicking wrote:
> Stefan Behnel wrote:
>> I get a reproducible XPath
>> crasher in the HTML module Ian is working on with libxml2 2.6.27. It's
>> gone with 2.6.28. libxslt is a good bet here, too.
>
> Which XPath is that? I'd rather avoid it if I can, for those that might
> have 2.6.27.

It happens in the "clean" doctests, in one of the ".xpath()" method calls. One
thing to try might be using XPath() instead.

However, looking at the way clean() is implemented, I would rather rewrite a
few places to use the already existing loop over getiterator() rather than
XPath. Most likely, users will end up requiring the loop to run anyway, so it
would be best to use it for more rather than additionally parsing and running
a C loop on different XPath expressions.

Stefan

From stefan_ml at behnel.de Sun Jun 3 18:13:58 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Jun 2007 18:13:58 +0200
Subject: [lxml-dev] XMLSchema validation with XMLSchema.xsd failing
In-Reply-To: <466195EE.4000304@eastlink.ca>
References: <466195EE.4000304@eastlink.ca>
Message-ID: <4662E8C6.7030103@behnel.de>

Hi,

David Pratt wrote:
> Can someone advise whether XMLSchema is currently able to validate XML
> schemas?

Yes, that's its purpose.

> I am on Mac 10.4.9 using lxml 1.3 beta with the Mac's default
> libxml2.

That *might* be a problem. XML Schema support is still being worked on in
libxml2 and there may still be schemas that don't work. However, MacOS-X is
known for not shipping with up-to-date versions of libxml2, so you might have
more luck with installing a newer version.
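Before installing anything, it may help to confirm which libxml2 and libxslt lxml is actually running against. A small sketch using the version tuples exposed by `lxml.etree`; the helper names `describe_versions` and `libxml2_at_least` are hypothetical.

```python
from lxml import etree


def describe_versions():
    """Return lxml, libxml2 and libxslt versions as dotted strings."""
    def dotted(version):
        return ".".join(str(part) for part in version)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2 (running)": dotted(etree.LIBXML_VERSION),
        "libxml2 (compiled)": dotted(etree.LIBXML_COMPILED_VERSION),
        "libxslt (running)": dotted(etree.LIBXSLT_VERSION),
        "libxslt (compiled)": dotted(etree.LIBXSLT_COMPILED_VERSION),
    }


def libxml2_at_least(minimum):
    """True if the libxml2 in use is at least `minimum`, e.g. (2, 6, 28)."""
    return etree.LIBXML_VERSION >= tuple(minimum)


if __name__ == "__main__":
    for name, version in describe_versions().items():
        print("%-20s %s" % (name, version))
```

A mismatch between the "running" and "compiled" entries is worth noting in a bug report, since it means lxml was built against one library version and loaded another.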
> >>> import lxml.etree
> >>> from am.xmlschema import xmlschema_path
> >>> xmlschema_doc = lxml.etree.parse(xmlschema_path)
> >>> try:
> ...     xmlschema = lxml.etree.XMLSchema(xmlschema_doc)
> ... except Exception, e:
> ...     print e.error_log
> ...
> /Users/davidpratt/Desktop/xmlschemademo/dev/am.xmlschema/src/am/xmlschema/XMLSchema.xsd:655:ERROR:SCHEMASP:SCHEMAP_REDEFINED_ELEMENT:
> Element 'element': A global element declaration with the name 'element'
> does already exist.
> /Users/davidpratt/Desktop/xmlschemademo/dev/am.xmlschema/src/am/xmlschema/XMLSchema.xsd:864:ERROR:SCHEMASP:SCHEMAP_REDEFINED_ELEMENT:
> Element 'element': A global element declaration with the name 'group'
> does already exist.

If you are sure the document is correct (and it's usually a good idea to
verify with a second tool), a more recent version of libxml2 might help.
Otherwise, please file a bug report on libxml2's XML Schema implementation.

Stefan

From stefan_ml at behnel.de Sun Jun 3 18:24:16 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Jun 2007 18:24:16 +0200
Subject: [lxml-dev] Network downloading of schemas should be off by default?
In-Reply-To: <1180720621.30394.24.camel@localhost.localdomain>
References: <1180720621.30394.24.camel@localhost.localdomain>
Message-ID: <4662EB30.6040505@behnel.de>

Hi,

Itamar Shtull-Trauring wrote:
> Right now, AFAICT, it is on by default in lxml.etree.XMLParser. Network
> queries by library code are a bad idea: it's an unexpected behavior,
> causing potential security risk and guaranteed performance problems.

It's straightforward to switch it off, but I agree that it would be good to
have it disabled by default. Loading DTDs is off by default also, so that
fits. We should change the default behaviour for 2.0.
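A minimal sketch of switching network access off explicitly, assuming the `no_network` keyword of `lxml.etree.XMLParser`. The resolver part is an optional illustration of the "provide something that resolves references" idea raised earlier in the thread; `LocalOnlyResolver` and `make_offline_parser` are hypothetical names, and answering every lookup with an empty document is just one possible policy.

```python
from lxml import etree


class LocalOnlyResolver(etree.Resolver):
    """Hypothetical resolver: answer every external lookup with an
    empty document instead of fetching anything."""

    def resolve(self, system_url, public_id, context):
        return self.resolve_empty(context)


def make_offline_parser():
    # Ask libxml2 not to touch the network while parsing.
    parser = etree.XMLParser(no_network=True)
    # Route any remaining external references through our own resolver.
    parser.resolvers.add(LocalOnlyResolver())
    return parser


if __name__ == "__main__":
    root = etree.fromstring(b"<doc><child/></doc>", make_offline_parser())
    print(root.tag)
```

Passing the parser explicitly at each call site keeps the policy visible, whatever the library default happens to be in a given release.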
Stefan

From etiffany at alum.mit.edu Sun Jun 3 18:35:12 2007
From: etiffany at alum.mit.edu (Eric Tiffany)
Date: Sun, 03 Jun 2007 12:35:12 -0400
Subject: [lxml-dev] python crashes in xmlDictFree inside Zope
In-Reply-To: <465ECDA2.70704@behnel.de>
Message-ID:

OK, I think I've sorted this out to a certain degree.

Using lxml 1.3 beta (including threading support) works to parse and
validate my XMLSchema stuff using the builtin MacOS libxml2 libs (2.6.16).

I think the failures I was seeing earlier had to do with my cluelessness
about how Zope coughs up file data from the database.

I will next try using it with the MacPorts version of libxml2 2.6.28, but I'll
turn on the threading switches in the configuration (or, maybe, turn them
off). I suspect that my earlier problems with the lxml - libxml2
configurations were due to mismatched threading expectations.

Anyway, thanks for the excellent product, and I'll report back on any
further insights.

ET

On 5/31/07 9:29 AM, "Stefan Behnel" wrote:

> Hi Eric,
>
> Eric Tiffany wrote:
>> OK, some more info. If I use the builtin libxml2 (2.6.16) libs from
>> Apple, rather than the 2.6.28 version from MacPorts, then I don't get
>> these errors crashing Python/Zope.
>
> That's expected, lxml switches off threading support for this version. It's
> the same as doing "--without-threading". That's why I was puzzled when you
> said --without-threading doesn't help you.
>
>
>> However, the lxml parsing/validation
>> doesn't seem to work correctly.
>
> That's expected, too, XMLSchema is still under development and definitely was
> at the time.
>
>
>> So, it seems that Apple has built their libs in a more friendly way, but
>> I'm now wondering whether there is some known issue with using lxml with
>> libxml2 2.6.16.
>>
>> I'm continuing to investigate.
>
> Please do. It's hard for me to come up with a solution without being able to
> reproduce the problem.
> > Stefan > > >> On 5/30/07 2:14 PM, "Eric Tiffany" wrote: >> >> Sorry for the delay in responding -- been on vacation in Italy. >> Responses inline. I am quite mystified at this point. >> >> On 5/21/07 2:21 AM, "Stefan Behnel" wrote: >> >>> Hi, >>> >>> Eric Tiffany wrote: >>>> I have been prototyping some XMLSchema parsing/validating using lxml >>>> 1.3beta. >>>> >>>> Everthing works great from python 2.4.4 started from the command >> line, or >>>> running from inside Eclipse. >>>> >>>> However, when I moved my code over to my Plone product, python >> crashes when >>>> Zope is initializing the product. I am creating my XMLSchema >> object there. >>>> [...] >>> >>> Is this the Python version? >> >> For some reason, the python reports its version incorrectly in the >> crashdump. It is actually 2.4.4. >> >>> >>> Is there any way to detect MacOS-X at the C level? In that case, we >> could try >>> to disable thread concurrency support completely for this platform >> - in case >>> that's the source of the segfault. You can try to see if this would >> fix the >>> problem by passing the option "--without-threading" to setup.py >> when building >>> lxml. Could you please try that with your current setup and report >> back to the >>> list? >> >> There are certainly ways to detect MacOS at compile-time, though I'm >> not sure of the details. I get this from the shell: >> >> $ uname -a >> Darwin etmac.local 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 >> 20:55:00 PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386 >> >> I have attempted to build from SVN using --without-threading using >> >> $ make clean >> $ python setup.py build --without-threading >> $ sudo python setup.py install >> >> (interestingly, it makes an egg called >> lxml-1.3beta-py2.4-macosx-10.3-i386.egg even though the OS version >> 10.4.9 not 10.3, but whatever). >> >> Also, I see a problem with the self tests regarding >> test_module_HTML_unicode, but I'll report that elsewhere. 
>> >>> >>> Another question: are you using a custom parser (i.e. passing a second >>> argument to the parse() function) here or is it the default parser that >>> crashes here? >> >> It is the default parser. And it is still crashing inside Zope even >> with the --without-threading. Here is my code: >> >> schemaPath = >> "/Applications/Plone-2.5.2/Instance/Products/xtend/xtend/schedules.xsd" >> print >> sys.stderr, "Loading schema doc from ", schemaPath >> schemaDoc = etree.parse(schemaPath) print >> sys.stderr, "creating >> XMLSchema ..." schemaTree = etree.XMLSchema(schemaDoc) print >> >> sys.stderr, "Trying validation" >> >> And here is the output (when running inside Zope): >> >> Loading schema doc from >> /Applications/Plone-2.5.2/Instance/Products/xtend/xtend/schedules.xsd >> creating XMLSchema ... >> Bus error >> >> So it seems pretty clear that it is croaking while trying to do the >> XMLSchema construction. >> >> Inside a python shell, that code runs fine. In both environments >> (zope and shell) I have >> >> lxml.etree: (1, 3, -1, 43887) >> libxml used: (2, 6, 28) >> libxml compiled: (2, 6, 28) >> libxslt used: (1, 1, 20) >> libxslt compiled: (1, 1, 20) >> >> >> Here is the thread backtrace for the thread that crashed: >> >> Thread 1 Crashed: >> 0 <<00000000>> 0xffff07c7 __memcpy + 39 (cpu_capabilities.h:228) >> 1 libSystem.B.dylib 0x9000b569 __sfvwrite + 409 >> 2 libSystem.B.dylib 0x9001063d __vfprintf + 19692 >> 3 libSystem.B.dylib 0x90011428 vfprintf + 91 >> 4 libxml2.2.dylib 0x91befd3b xmlGenericErrorDefaultFunc + 75 >> 5 libxml2.2.dylib 0x0354bad1 xmlSchemaCheckFacet + 709 >> 6 libxml2.2.dylib 0x0354c020 >> xmlSchemaFixupSimpleTypeStageTwo + 927 >> 7 libxml2.2.dylib 0x0355175b xmlSchemaFixupComponents + 4054 >> 8 libxml2.2.dylib 0x03552207 xmlSchemaParse + 290 >> 9 etree.so 0x06464039 >> __pyx_f_5etree_9XMLSchema___init__ + 980 (etree.c:38191) >> 10 org.python.python 0x0025283e type_call + 166 (typeobject.c:435) >> 11 org.python.python 0x0020d87f 
PyObject_Call + 45 >> (abstract.c:1795) >> 12 org.python.python 0x0027e397 PyEval_EvalFrame + 16838 >> (ceval.c:3776) >> 13 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 14 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 15 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 16 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 17 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 18 org.python.python 0x00214c90 instance_call + 90 >> (classobject.c:2087) >> 19 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 20 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 21 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 22 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 23 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 24 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 25 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 26 org.python.python 0x00214c90 instance_call + 90 >> (classobject.c:2087) >> 27 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 28 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 29 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 30 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 31 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 32 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 33 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 34 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 35 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 36 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 37 
org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 38 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 39 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 40 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 41 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 42 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 43 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >> (ceval.c:3651) >> 44 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 45 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 46 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 47 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 48 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 49 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 50 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 51 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 52 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 53 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 54 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 55 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 56 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 57 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 58 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 59 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 60 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 61 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 62 org.python.python 0x0020d87f PyObject_Call + 45 >> 
(abstract.c:1795) >> 63 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 64 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 65 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 66 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 67 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 68 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 69 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 70 _Acquisition.so 0x0151e9ac CallMethodO + 60 >> (_Acquisition.c:97) >> 71 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 72 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 73 org.python.python 0x002715a1 builtin_apply + 201 >> (bltinmodule.c:100) >> 74 org.python.python 0x0027faca PyEval_EvalFrame + 22777 >> (ceval.c:3568) >> 75 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >> (ceval.c:3651) >> 76 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 77 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 78 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 79 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 80 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 81 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 82 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 83 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >> (ceval.c:3845) >> 84 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 85 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 86 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 87 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 88 
org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 89 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 90 _Acquisition.so 0x0151e9ac CallMethodO + 60 >> (_Acquisition.c:97) >> 91 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 92 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 93 org.python.python 0x002715a1 builtin_apply + 201 >> (bltinmodule.c:100) >> 94 org.python.python 0x0027faca PyEval_EvalFrame + 22777 >> (ceval.c:3568) >> 95 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >> (ceval.c:3651) >> 96 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 97 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 98 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 99 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 100 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 101 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 102 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 103 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 104 _Acquisition.so 0x0151e9ac CallMethodO + 60 >> (_Acquisition.c:97) >> 105 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 106 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 107 org.python.python 0x002715a1 builtin_apply + 201 >> (bltinmodule.c:100) >> 108 org.python.python 0x0027faca PyEval_EvalFrame + 22777 >> (ceval.c:3568) >> 109 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >> (ceval.c:3651) >> 110 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 111 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 112 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 113 
org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 114 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 115 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 116 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 117 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 118 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 119 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >> (ceval.c:3661) >> 120 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >> (ceval.c:2741) >> 121 org.python.python 0x00228063 function_call + 320 >> (funcobject.c:548) >> 122 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 123 org.python.python 0x00215667 instancemethod_call + 401 >> (classobject.c:2532) >> 124 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 125 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 126 org.python.python 0x00217aaa PyInstance_New + 114 >> (classobject.c:588) >> 127 org.python.python 0x0020d87f PyObject_Call + 45 >> (abstract.c:1795) >> 128 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >> 112 (ceval.c:3435) >> 129 org.python.python 0x002b3126 t_bootstrap + 62 >> (threadmodule.c:434) >> 130 libSystem.B.dylib 0x90024987 _pthread_body + 84 >> >> _______________________________________________ >> lxml-dev mailing list >> lxml-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/lxml-dev >> >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> lxml-dev mailing list >> lxml-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/lxml-dev From stefan_ml at behnel.de Sun Jun 3 18:45:12 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 03 Jun 2007 18:45:12 +0200 Subject: [lxml-dev] python crashes in xmlDictFree 
inside Zope In-Reply-To: References: Message-ID: <4662F018.8060700@behnel.de>

Hi,

Eric Tiffany wrote:
> OK, I think I've sorted this out to a certain degree.
>
> Using lxml 1.3 beta (including threading support) works to parse and
> validate my XMLSchema stuff using the builtin MacOS libxml2 libs (2.6.16).

Ok, good to know. I was actually wrong in telling you that this disables
threading support. Even on 2.6.16, it is enabled by default.

> I think the failures I was seeing earlier had to do with my cluelessness
> about how Zope coughs up file data from the database.

Ok, so that means what? You got a validation error and it crashed because of
that?

> I will next try using with the MacPorts version of libxml2 2.6.28, but I'll
> turn on the threading switches in the configuration (or, maybe, turn them
> off). I suspect that my earlier problems with the lxml - libxml2
> configurations were due to mismatched threading expectations.
>
> Anyway, thanks for the excellent product, and I'll report back on any
> further insights.

Please do. We are always interested in ways to work around problems on
whatever platform. If you ran into such problems, others will, too.

Stefan

> On 5/31/07 9:29 AM, "Stefan Behnel" wrote:
>
>> Hi Eric,
>>
>> Eric Tiffany wrote:
>>> OK, some more info. If I use the builtin libxml2 (2.6.16) libs from
>>> Apple, rather than the 2.6.28 version from MacPorts, then I don't get
>>> these errors crashing Python/Zope.
>> That's expected, lxml switches off threading support for this version. It's
>> the same as doing "--without-threading". That's why I was puzzled when you
>> said --without-threading doesn't help you.
>>
>>
>>> However, the lxml parsing/validation
>>> doesn't seem to work correctly.
>> That's expected, too, XMLSchema is still under development and definitely was
>> at the time.
>>
>>
>>> So, it seems that Apple has built their libs in a more friendly way, but
>>> I'm now wondering whether there is some known issue with using lxml with
>>> libxml2 2.6.16.
>>>
>>> I'm continuing to investigate.
>> Please do. It's hard for me to come up with a solution without being able to
>> reproduce the problem.
>>
>> Stefan
>>
>>
>>> On 5/30/07 2:14 PM, "Eric Tiffany" wrote:
>>>
>>> Sorry for the delay in responding -- been on vacation in Italy.
>>> Responses inline. I am quite mystified at this point.
>>>
>>> On 5/21/07 2:21 AM, "Stefan Behnel" wrote:
>>>
>>>> Hi,
>>>>
>>>> Eric Tiffany wrote:
>>>>> I have been prototyping some XMLSchema parsing/validating using lxml
>>>>> 1.3beta.
>>>>>
>>>>> Everything works great from python 2.4.4 started from the command line, or
>>>>> running from inside Eclipse.
>>>>>
>>>>> However, when I moved my code over to my Plone product, python crashes when
>>>>> Zope is initializing the product. I am creating my XMLSchema object there.
>>>>> [...]
>>>> Is this the Python version?
>>> For some reason, the python reports its version incorrectly in the
>>> crashdump. It is actually 2.4.4.
>>>
>>>> Is there any way to detect MacOS-X at the C level? In that case, we could try
>>>> to disable thread concurrency support completely for this platform - in case
>>>> that's the source of the segfault. You can try to see if this would fix the
>>>> problem by passing the option "--without-threading" to setup.py when building
>>>> lxml. Could you please try that with your current setup and report back to the
>>>> list?
>>> There are certainly ways to detect MacOS at compile-time, though I'm
>>> not sure of the details.
I get this from the shell:
>>>
>>> $ uname -a
>>> Darwin etmac.local 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22
>>> 20:55:00 PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386
>>>
>>> I have attempted to build from SVN with --without-threading using
>>>
>>> $ make clean
>>> $ python setup.py build --without-threading
>>> $ sudo python setup.py install
>>>
>>> (interestingly, it makes an egg called
>>> lxml-1.3beta-py2.4-macosx-10.3-i386.egg even though the OS version
>>> is 10.4.9, not 10.3, but whatever).
>>>
>>> Also, I see a problem with the self tests regarding
>>> test_module_HTML_unicode, but I'll report that elsewhere.
>>>
>>>> Another question: are you using a custom parser (i.e. passing a second
>>>> argument to the parse() function) here or is it the default parser that
>>>> crashes here?
>>> It is the default parser. And it is still crashing inside Zope even
>>> with the --without-threading. Here is my code:
>>>
>>> schemaPath = "/Applications/Plone-2.5.2/Instance/Products/xtend/xtend/schedules.xsd"
>>> print >> sys.stderr, "Loading schema doc from ", schemaPath
>>> schemaDoc = etree.parse(schemaPath)
>>> print >> sys.stderr, "creating XMLSchema ..."
>>> schemaTree = etree.XMLSchema(schemaDoc)
>>> print >> sys.stderr, "Trying validation"
>>>
>>> And here is the output (when running inside Zope):
>>>
>>> Loading schema doc from /Applications/Plone-2.5.2/Instance/Products/xtend/xtend/schedules.xsd
>>> creating XMLSchema ...
>>> Bus error
>>>
>>> So it seems pretty clear that it is croaking while trying to do the
>>> XMLSchema construction.
>>>
>>> Inside a python shell, that code runs fine.
In both environments >>> (zope and shell) I have >>> >>> lxml.etree: (1, 3, -1, 43887) >>> libxml used: (2, 6, 28) >>> libxml compiled: (2, 6, 28) >>> libxslt used: (1, 1, 20) >>> libxslt compiled: (1, 1, 20) >>> >>> >>> Here is the thread backtrace for the thread that crashed: >>> >>> Thread 1 Crashed: >>> 0 <<00000000>> 0xffff07c7 __memcpy + 39 (cpu_capabilities.h:228) >>> 1 libSystem.B.dylib 0x9000b569 __sfvwrite + 409 >>> 2 libSystem.B.dylib 0x9001063d __vfprintf + 19692 >>> 3 libSystem.B.dylib 0x90011428 vfprintf + 91 >>> 4 libxml2.2.dylib 0x91befd3b xmlGenericErrorDefaultFunc + 75 >>> 5 libxml2.2.dylib 0x0354bad1 xmlSchemaCheckFacet + 709 >>> 6 libxml2.2.dylib 0x0354c020 >>> xmlSchemaFixupSimpleTypeStageTwo + 927 >>> 7 libxml2.2.dylib 0x0355175b xmlSchemaFixupComponents + 4054 >>> 8 libxml2.2.dylib 0x03552207 xmlSchemaParse + 290 >>> 9 etree.so 0x06464039 >>> __pyx_f_5etree_9XMLSchema___init__ + 980 (etree.c:38191) >>> 10 org.python.python 0x0025283e type_call + 166 (typeobject.c:435) >>> 11 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 12 org.python.python 0x0027e397 PyEval_EvalFrame + 16838 >>> (ceval.c:3776) >>> 13 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 14 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 15 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 16 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 17 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 18 org.python.python 0x00214c90 instance_call + 90 >>> (classobject.c:2087) >>> 19 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 20 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 21 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 22 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 23 org.python.python 0x0020d87f 
PyObject_Call + 45 >>> (abstract.c:1795) >>> 24 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 25 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 26 org.python.python 0x00214c90 instance_call + 90 >>> (classobject.c:2087) >>> 27 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 28 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 29 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 30 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 31 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 32 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 33 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 34 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 35 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 36 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 37 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 38 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 39 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 40 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 41 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 42 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 43 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >>> (ceval.c:3651) >>> 44 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 45 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 46 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 47 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 48 org.python.python 0x0020d87f 
PyObject_Call + 45 >>> (abstract.c:1795) >>> 49 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 50 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 51 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 52 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 53 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 54 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 55 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 56 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 57 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 58 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 59 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 60 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 61 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 62 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 63 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 64 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 65 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 66 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 67 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 68 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 69 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 70 _Acquisition.so 0x0151e9ac CallMethodO + 60 >>> (_Acquisition.c:97) >>> 71 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 72 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 73 
org.python.python 0x002715a1 builtin_apply + 201 >>> (bltinmodule.c:100) >>> 74 org.python.python 0x0027faca PyEval_EvalFrame + 22777 >>> (ceval.c:3568) >>> 75 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >>> (ceval.c:3651) >>> 76 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 77 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 78 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 79 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 80 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 81 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 82 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 83 org.python.python 0x0027e69c PyEval_EvalFrame + 17611 >>> (ceval.c:3845) >>> 84 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 85 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 86 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 87 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 88 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 89 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 90 _Acquisition.so 0x0151e9ac CallMethodO + 60 >>> (_Acquisition.c:97) >>> 91 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 92 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 93 org.python.python 0x002715a1 builtin_apply + 201 >>> (bltinmodule.c:100) >>> 94 org.python.python 0x0027faca PyEval_EvalFrame + 22777 >>> (ceval.c:3568) >>> 95 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >>> (ceval.c:3651) >>> 96 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 97 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> 
(ceval.c:3661) >>> 98 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 99 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 100 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 101 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 102 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 103 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 104 _Acquisition.so 0x0151e9ac CallMethodO + 60 >>> (_Acquisition.c:97) >>> 105 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 106 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 107 org.python.python 0x002715a1 builtin_apply + 201 >>> (bltinmodule.c:100) >>> 108 org.python.python 0x0027faca PyEval_EvalFrame + 22777 >>> (ceval.c:3568) >>> 109 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 >>> (ceval.c:3651) >>> 110 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 111 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 112 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 113 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 114 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 115 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 116 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 117 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 118 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 119 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 >>> (ceval.c:3661) >>> 120 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 >>> (ceval.c:2741) >>> 121 org.python.python 0x00228063 function_call + 320 >>> (funcobject.c:548) >>> 122 org.python.python 
0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 123 org.python.python 0x00215667 instancemethod_call + 401 >>> (classobject.c:2532) >>> 124 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 125 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 126 org.python.python 0x00217aaa PyInstance_New + 114 >>> (classobject.c:588) >>> 127 org.python.python 0x0020d87f PyObject_Call + 45 >>> (abstract.c:1795) >>> 128 org.python.python 0x0027944a PyEval_CallObjectWithKeywords + >>> 112 (ceval.c:3435) >>> 129 org.python.python 0x002b3126 t_bootstrap + 62 >>> (threadmodule.c:434) >>> 130 libSystem.B.dylib 0x90024987 _pthread_body + 84 >>> >>> _______________________________________________ >>> lxml-dev mailing list >>> lxml-dev at codespeak.net >>> http://codespeak.net/mailman/listinfo/lxml-dev >>> >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> lxml-dev mailing list >>> lxml-dev at codespeak.net >>> http://codespeak.net/mailman/listinfo/lxml-dev > > > > _______________________________________________ > lxml-dev mailing list > lxml-dev at codespeak.net > http://codespeak.net/mailman/listinfo/lxml-dev From etiffany at alum.mit.edu Sun Jun 3 18:57:09 2007 From: etiffany at alum.mit.edu (Eric Tiffany) Date: Sun, 03 Jun 2007 12:57:09 -0400 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: <4662F018.8060700@behnel.de> Message-ID: On 6/3/07 12:45 PM, "Stefan Behnel" wrote: > >> I think the failures I was seeing earlier had to do with my cluelessness >> about how Zope coughs up file data from the database. > > Ok, so that means what? You got a validation error and it crashed because of > that? Sorry, should be clearer. I had two problems using lxml within Zope. 1. 
Using libxml2 2.6.28 built from Macports (probably without threading, in
retrospect) I was getting crashes when running XMLSchema from within Zope,
but success when running from toplevel. This was fixed by reverting to the
Apple libxml2 2.6.16.

2. I erroneously reported that lxml was still not working with libxml2
2.6.16 (it wasn't crashing, but I thought it wasn't parsing correctly). In
fact, I was passing bogus data because there is precious little
documentation about how Zope makes file objects available. Once I fixed my
code, it all seems to work.

ET

--
____________________________________________________
Eric Tiffany | eric at projectliberty.org
Interop Tech Lead | +1 413-458-3743
Liberty Alliance | +1 413-627-1778 mobile

From stefan_ml at behnel.de Sun Jun 3 18:58:38 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Jun 2007 18:58:38 +0200
Subject: [lxml-dev] html branch
In-Reply-To: <465FA4B6.8040302@colorstudy.com>
References: <465C48A0.8030808@colorstudy.com> <465C9E02.6030308@behnel.de> <465CA4C5.4050308@colorstudy.com> <465D14DF.9060304@behnel.de> <465F1683.4090804@colorstudy.com> <465F3A17.9080508@behnel.de> <465FA4B6.8040302@colorstudy.com>
Message-ID: <4662F33E.3040206@behnel.de>

Hi,

Ian Bicking wrote:
> I'm okay just requiring htmldiff to be imported directly, and
> importing clean into __init__.

Makes sense to me. htmldiff is definitely a module in its own right.
Everything else just deals with different things to do with a tree.

> But anyway, putting that aside, I was thinking that maybe the general
> pattern should be like:
>
> def make_links_absolute(doc, base_href, fragment=False):
>     if isinstance(doc, basestring):
>         if fragment:
>             doc = parse_element(doc)
>         else:
>             doc = HTML(doc)
>         return_string = True
>     else:
>         doc = copy.deepcopy(doc)
>         return_string = False
>     doc.make_links_absolute(doc, base_href)
>     if return_string:
>         return tostring(doc)
>     else:
>         return doc

Ok.
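
[Editorial sketch] The pattern quoted above (string in, string out; tree in, copied tree out) can be tried out in a small self-contained form. This is only an illustration under stated assumptions, not the code from the html branch: it uses the standard library's xml.etree.ElementTree and urllib.parse.urljoin as stand-ins for the lxml parser and link rewriting, it targets Python 3 (so `str` replaces `basestring`), it only accepts well-formed markup, and it leaves out the `fragment` flag.

```python
import copy
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def make_links_absolute(doc, base_href):
    """Resolve every href in doc against base_href.

    Accepts serialized markup (str) or a parsed Element and returns
    the same kind of object it was given. The input is never modified.
    """
    return_string = isinstance(doc, str)
    if return_string:
        root = ET.fromstring(doc)      # stand-in for the HTML parser
    else:
        root = copy.deepcopy(doc)      # work on a copy of the caller's tree

    for element in root.iter():
        href = element.get("href")
        if href is not None:
            element.set("href", urljoin(base_href, href))

    if return_string:
        return ET.tostring(root, encoding="unicode")
    return root


if __name__ == "__main__":
    base = "http://example.com/base/"
    print(make_links_absolute('<p><a href="/x">x</a></p>', base))
    tree = ET.fromstring('<a href="y.html"/>')
    print(make_links_absolute(tree, base).get("href"))
```

Copying the tree in the non-string branch is what makes the function usable for functional-style transformations: the caller's element keeps its original relative links.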
> This makes the function also a handy way to do functional-style > transformations of elements. It bothers me a bit to change the return > type (which I generally dislike doing), except that it matches the input > type which seems like it might be okay. > > Does this seem okay? It looks Pythonic to me. You get out what you put in and whatever you put in, it does the same thing to it. So it's just a perfectly polymorphic function. > Also, I'm wondering if (a) I should try to automatically determine > fragment unless it is explicitly given, and/or (b) if parse_element > doesn't work (raises an exception) I should use parse_element(doc, > create_parent=True) which will wrap the fragment in a
<div>. Defaulting to a "wrap with <div>
" fallback means changing the input in a not really predictable way. That sounds like too much magic to me. In most cases, users will know what they are dealing with. Otherwise, they can well catch the exception and then fall back to an alternative *if they want*. I'm fine with having a function that can handle HTML trees or serialised HTML documents and requires users to parse things themselves if it's not a document. Stefan From stefan_ml at behnel.de Sun Jun 3 19:00:34 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 03 Jun 2007 19:00:34 +0200 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: References: Message-ID: <4662F3B2.50703@behnel.de> Hi, Eric Tiffany wrote: > On 6/3/07 12:45 PM, "Stefan Behnel" wrote: > >>> I think the failures I was seeing earlier had to do with my cluelessness >>> about how Zope coughs up file data from the database. >> Ok, so that means what? You got a validation error and it crashed because of >> that? > > > Sorry, should be clearer. I had two problems using lxml within Zope. > > 1. Using libxml2 2.6.28 built from Macports (probably without threading, in > retrospect) I was getting crashes when running XMLSchema from within Zope, > but success when running from toplevel. This was fixed by reverting to the > Apple libxml2 2.6.16. > > 2. I erroneously reported that lxml was still not working with libxml2 > 2.6.16 (it wasn't crashing, but I thought it wasn't parsing correctly). In > fact, I was passing bogus data because there is precious little > documentation about how Zope makes file objects available. Once I fixed my > code, it all seems to work. Ah, ok, so that's really good news then, thanks for clarifying. 
Stefan From fairwinds at eastlink.ca Mon Jun 4 14:55:42 2007 From: fairwinds at eastlink.ca (David Pratt) Date: Mon, 04 Jun 2007 09:55:42 -0300 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: References: Message-ID: <46640BCE.8080100@eastlink.ca> Hi Eric, let me know if you have any success with Macports libxml2. I could not get it to work a little while back. I am trying some binaries for mac from explain.com libxml2 2.6.26 at the moment to see how they will work out. Regards, David Eric Tiffany wrote: > On 6/3/07 12:45 PM, "Stefan Behnel" wrote: > >>> I think the failures I was seeing earlier had to do with my cluelessness >>> about how Zope coughs up file data from the database. >> Ok, so that means what? You got a validation error and it crashed because of >> that? > > > Sorry, should be clearer. I had two problems using lxml within Zope. > > 1. Using libxml2 2.6.28 built from Macports (probably without threading, in > retrospect) I was getting crashes when running XMLSchema from within Zope, > but success when running from toplevel. This was fixed by reverting to the > Apple libxml2 2.6.16. > > 2. I erroneously reported that lxml was still not working with libxml2 > 2.6.16 (it wasn't crashing, but I thought it wasn't parsing correctly). In > fact, I was passing bogus data because there is precious little > documentation about how Zope makes file objects available. Once I fixed my > code, it all seems to work. > > ET > From gary at zope.com Mon Jun 4 15:30:27 2007 From: gary at zope.com (Gary Poster) Date: Mon, 4 Jun 2007 09:30:27 -0400 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: <46640BCE.8080100@eastlink.ca> References: <46640BCE.8080100@eastlink.ca> Message-ID: On Jun 4, 2007, at 8:55 AM, David Pratt wrote: > Hi Eric, let me know if you have any success with Macports libxml2. I > could not get it to work a little while back. 
I am trying some binaries
> for mac from explain.com libxml2 2.6.26 at the moment to see how they
> will work out.

FWIW, I'm using zc.buildout to build a project-local
(non-system-installed) libxml2, libxslt, and lxml. This works out
reasonably well on the Mac (caveats below) and on other platforms (known
success on Centos/RHEL, Ubuntu). It's a bit of philosophy to not use
the system-installed libs for this, so you may not like this solution if
your philosophy differs on this point.

The big caveat on Mac is that you must start python with
DYLD_LIBRARY_PATH=/Users/gary/Dev/MYPROJECT/parts/libxml2/lib:/Users/gary/Dev/MYPROJECT/parts/libxslt/lib

for the project-installed libraries to be found (otherwise you still get
the system ones). You can get around this by having a wrapper script
that inserts the values in the environment for you; for instance,
zdaemon will do it for you if so configured.

My zc.buildout setup is similar to what Martijn Faassen blogged about
last year. If anyone is interested I'll dig up the
details--particularly the differences.

Gary

From fairwinds at eastlink.ca Mon Jun 4 15:39:54 2007
From: fairwinds at eastlink.ca (David Pratt)
Date: Mon, 04 Jun 2007 10:39:54 -0300
Subject: [lxml-dev] python crashes in xmlDictFree inside Zope
In-Reply-To:
References: <46640BCE.8080100@eastlink.ca>
Message-ID: <4664162A.9080609@eastlink.ca>

Hi Gary. I'd definitely be interested in more details about your
buildout for these tools. I am also using buildouts and would much
rather keep everything local to project. I am no longer using my
system python for anything and relying on mac's tools seems lame
since I want more control over versions being used, etc.

Regards,
David

Gary Poster wrote:
>
> On Jun 4, 2007, at 8:55 AM, David Pratt wrote:
>
>> Hi Eric, let me know if you have any success with Macports libxml2. I
>> could not get it to work a little while back.
I am trying some binaries >> for mac from explain.com libxml2 2.6.26 at the moment to see how they >> will work out. > > FWIW, I'm using zc.buildout to build a project-local > (non-system-installed) libxml2, libxslt, and lxml. This works out > reasonably well on the Mac (caveats below) and on other platforms (known > success on Centos/RHEL, Ubuntu). It's a bit of philosophy to not use > the system-installed libs for this, so you may not like this solution if > your philosophy differs on this point. > > The big caveat on Mac is that you must start python with > DYLD_LIBRARY_PATH=/Users/gary/Dev/MYPROJECT/parts/libxml2/lib:/Users/gary/Dev/MYPROJECT/parts/libxslt/lib > > for the project-installed libraries to be found (otherwise you still get > the system ones). You can get around this by having a wrapper script > that inserts the values in the environment for you; for instance, > zdaemon will do it for you if so configured. > > My zc.buildout setup is similar to what Martijn Faassen blogged about > last year. If anyone is interested I'll dig up the > details--particularly the differences. > > Gary > From gary at zope.com Mon Jun 4 16:17:06 2007 From: gary at zope.com (Gary Poster) Date: Mon, 4 Jun 2007 10:17:06 -0400 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: <4664162A.9080609@eastlink.ca> References: <46640BCE.8080100@eastlink.ca> <4664162A.9080609@eastlink.ca> Message-ID: <2E614EF3-36A4-4AA3-A6BB-9A826B02E1C1@zope.com> On Jun 4, 2007, at 9:39 AM, David Pratt wrote: > Hi Gary. I'd definitely be interested in more details about your > buildout for these tools. I am also using buildouts and would much > rather keep everything local to project. I am no longer using my > system python for anything and relying on mac's tools seems lame > since I want more control over versions being used, etc. > > Regards, > David ok in [buildout], make sure "parts" includes "libxml2 libxslt lxml". 
I also include 'lxml' in my install_requires in my setup.py, but that
probably is redundant; haven't bothered to find out.

I use these three sections:

[lxml]
recipe = zc.recipe.egg:custom
egg = lxml == 1.3beta
include-dirs = ${libxml2:location}/include/libxml2:${libxslt:location}/include
rpath = ${libxml2:location}/lib:${libxslt:location}/lib
library-dirs = ${libxml2:location}/lib:${libxslt:location}/lib

[libxml2]
recipe = zc.recipe.cmmi
url = XXX our private download cache of libxml2-2.6.28.tar.gz XXX
extra_options = --without-python

[libxslt]
recipe = zc.recipe.cmmi
url = XXX our private download cache of libxslt-1.1.20.tar.gz XXX
extra_options = --without-python --with-libxml-prefix=${libxml2:location}

When I run bin/test and bin/py I need to insert the DYLD_LIBRARY_PATH in
the environment; I've planned to write a quick recipe for a shell script
that would do that, but have not gotten around to it. If you are using
zdaemon, you can leverage it to do that for zopectl start and zopectl
fg. Using one of the Zope 3 instance recipes I specify the environment:

zdaemon.conf =

DYLD_LIBRARY_PATH ${libxml2:location}/lib:${libxslt:location}/lib
LD_LIBRARY_PATH ${libxml2:location}/lib:${libxslt:location}/lib

(Mac only needs the DYLD one)

Gary

From fairwinds at eastlink.ca Mon Jun 4 16:37:15 2007
From: fairwinds at eastlink.ca (David Pratt)
Date: Mon, 04 Jun 2007 11:37:15 -0300
Subject: [lxml-dev] python crashes in xmlDictFree inside Zope
In-Reply-To: <2E614EF3-36A4-4AA3-A6BB-9A826B02E1C1@zope.com>
References: <46640BCE.8080100@eastlink.ca> <4664162A.9080609@eastlink.ca> <2E614EF3-36A4-4AA3-A6BB-9A826B02E1C1@zope.com>
Message-ID: <4664239B.5080800@eastlink.ca>

Hi Gary. Many thanks for this. I'm definitely going to give this a try.

Regards,
David

Gary Poster wrote:
>
> On Jun 4, 2007, at 9:39 AM, David Pratt wrote:
>
>> Hi Gary. I'd definitely be interested in more details about your
>> buildout for these tools.
I am also using buildouts and would much >> rather keep everything local to project. I am no longer using my >> system python for anything and relying on mac's tools seems lame since >> I want more control over versions being used, etc. >> >> Regards, >> David > > ok > > in [buildout], make sure "parts" includes "libxml2 libxslt lxml". > > I also include 'lxml' in my install_requires in my setup.py, but that > probably is redundant; haven't bothered to find out. > > I use these three sections: > > [lxml] > recipe = zc.recipe.egg:custom > egg = lxml == 1.3beta > include-dirs = > ${libxml2:location}/include/libxml2:${libxslt:location}/include > rpath = ${libxml2:location}/lib:${libxslt:location}/lib > library-dirs = ${libxml2:location}/lib:${libxslt:location}/lib > > [libxml2] > recipe = zc.recipe.cmmi > url = XXX our private download cache of libxml2-2.6.28.tar.gz XXX > extra_options = --without-python > > [libxslt] > recipe = zc.recipe.cmmi > url = XXX our private download cache of libxslt-1.1.20.tar.gz XXX > extra_options = --without-python --with-libxml-prefix=${libxml2:location} > > When I run bin/test and bin/py I need to insert the DYLD_LIBRARY_PATH in > the environment; I've planned to write a quick recipe for a shell script > that would do that, but have not gotten around to it. If you are using > zdaemon, you can leverage it to do that for zopectl start and zopectl > fg. 
Using one of the Zope 3 instance recipes I specify the environment: > > zdaemon.conf = > > DYLD_LIBRARY_PATH ${libxml2:location}/lib:${libxslt:location}/lib > LD_LIBRARY_PATH ${libxml2:location}/lib:${libxslt:location}/lib > > > (Mac only needs the DYLD one) > > Gary > From etiffany at alum.mit.edu Mon Jun 4 17:48:57 2007 From: etiffany at alum.mit.edu (Eric Tiffany) Date: Mon, 04 Jun 2007 11:48:57 -0400 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: <4664239B.5080800@eastlink.ca> Message-ID: OK, after I added the DYLD_LIBRARY_PATH=/opt/local/lib to my environment for starting zope from the Eclipse debugger, my code works with the Macports 2.6.28 libxml2. Thanks for that hint. I'm a bit perplexed because, all along, lxml was reporting this information in my logs: lxml.etree: (1, 3, -1, 43887) libxml used: (2, 6, 28) libxml compiled: (2, 6, 28) So evidently lxml thought it was using 2.6.28, while python was somehow using the default Apple libxml2 libs. That happened whether I was running inside Eclipse or from the normal zopectl startup, but in both of these cases python would crash when I tried to create a XMLSchema. Strangely, it worked properly when running without zope (and without any special env settings). Anyway, my libxml2 was compiled --with-threads. No clue if that is required in my circumstances, but now that it works I'm not going to do any more experiments for the time being. ET On 6/4/07 10:37 AM, "David Pratt" wrote: > Hi Gary. Many thanks for this. I'm definitely going to give this a try. > > Regards, > David > > Gary Poster wrote: >> >> On Jun 4, 2007, at 9:39 AM, David Pratt wrote: >> >>> Hi Gary. I'd definitely be interested in more details about your >>> buildout for these tools. I am also using buildouts and would much >>> rather keep everything local to project. I am no longer using my >>> system python for anything and relying on mac's tools seems lame since >>> I want more control over versions being used, etc. 
>>> >>> Regards, >>> David >> >> ok >> >> in [buildout], make sure "parts" includes "libxml2 libxslt lxml". >> >> I also include 'lxml' in my install_requires in my setup.py, but that >> probably is redundant; haven't bothered to find out. >> >> I use these three sections: >> >> [lxml] >> recipe = zc.recipe.egg:custom >> egg = lxml == 1.3beta >> include-dirs = >> ${libxml2:location}/include/libxml2:${libxslt:location}/include >> rpath = ${libxml2:location}/lib:${libxslt:location}/lib >> library-dirs = ${libxml2:location}/lib:${libxslt:location}/lib >> >> [libxml2] >> recipe = zc.recipe.cmmi >> url = XXX our private download cache of libxml2-2.6.28.tar.gz XXX >> extra_options = --without-python >> >> [libxslt] >> recipe = zc.recipe.cmmi >> url = XXX our private download cache of libxslt-1.1.20.tar.gz XXX >> extra_options = --without-python --with-libxml-prefix=${libxml2:location} >> >> When I run bin/test and bin/py I need to insert the DYLD_LIBRARY_PATH in >> the environment; I've planned to write a quick recipe for a shell script >> that would do that, but have not gotten around to it. If you are using >> zdaemon, you can leverage it to do that for zopectl start and zopectl >> fg. 
Using one of the Zope 3 instance recipes I specify the environment: >> >> zdaemon.conf = >> >> DYLD_LIBRARY_PATH ${libxml2:location}/lib:${libxslt:location}/lib >> LD_LIBRARY_PATH ${libxml2:location}/lib:${libxslt:location}/lib >> >> >> (Mac only needs the DYLD one) >> >> Gary >> From ianb at colorstudy.com Mon Jun 4 18:27:24 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 04 Jun 2007 11:27:24 -0500 Subject: [lxml-dev] html branch In-Reply-To: <4662F33E.3040206@behnel.de> References: <465C48A0.8030808@colorstudy.com> <465C9E02.6030308@behnel.de> <465CA4C5.4050308@colorstudy.com> <465D14DF.9060304@behnel.de> <465F1683.4090804@colorstudy.com> <465F3A17.9080508@behnel.de> <465FA4B6.8040302@colorstudy.com> <4662F33E.3040206@behnel.de> Message-ID: <46643D6C.2060004@colorstudy.com> Stefan Behnel wrote: >> This makes the function also a handy way to do functional-style >> transformations of elements. It bothers me a bit to change the return >> type (which I generally dislike doing), except that it matches the input >> type which seems like it might be okay. >> >> Does this seem okay? > > It looks Pythonic to me. You get out what you put in and whatever you put in, > it does the same thing to it. So it's just a perfectly polymorphic function. > > >> Also, I'm wondering if (a) I should try to automatically determine >> fragment unless it is explicitly given, and/or (b) if parse_element >> doesn't work (raises an exception) I should use parse_element(doc, >> create_parent=True) which will wrap the fragment in a
<div>. > > Defaulting to a "wrap with <div>
" fallback means changing the input in a not > really predictable way. That sounds like too much magic to me. In most cases, > users will know what they are dealing with. Otherwise, they can well catch the > exception and then fall back to an alternative *if they want*. I'm fine with > having a function that can handle HTML trees or serialised HTML documents and > requires users to parse things themselves if it's not a document. I imported a bunch of HTML cleaning tests from other sources, and in the process I found "parse this somehow and give me an element" to be very convenient. Of course, HTML() *does* exactly that kind of parsing, but at least for cleaning you usually don't want a full document, you really just want a fragment. And that's not too uncommon. To make this easier I implemented a parse() function that does its best to parse your content. If your content is a full page, you get a full page back. If it's not a full page and it contains just one element, you get that element back. But if it's not a full page and it contains multiple elements, it gets wrapped in a
<div>. This seems less intrusive than wrapping it in <html><body>, which is effectively what the standard parser does. <div> is really a generic wrapper (though I suppose since it is block level, it's not *entirely* generic -- it might be more ideal to see if the content contains any block level elements, and if not just wrap in <span>). Dealing with ordered lists of elements with no parent isn't that easy or natural anywhere in the API. If there was some kind of anonymous container then that would be a nice container, but there isn't one. Is it possible to make something like that? It seems like a new kind of node could cause a lot of problems. Notably, with the HTML parser you frequently get something out with more elements than were in the original. It'll add

or

tags fairly liberally, rearrange tags, etc., to make the document valid. So adding a <div>
tag isn't that far from what can already happen. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From stefan_ml at behnel.de Mon Jun 4 19:34:55 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 04 Jun 2007 19:34:55 +0200 Subject: [lxml-dev] html branch In-Reply-To: <46643D6C.2060004@colorstudy.com> References: <465C48A0.8030808@colorstudy.com> <465C9E02.6030308@behnel.de> <465CA4C5.4050308@colorstudy.com> <465D14DF.9060304@behnel.de> <465F1683.4090804@colorstudy.com> <465F3A17.9080508@behnel.de> <465FA4B6.8040302@colorstudy.com> <4662F33E.3040206@behnel.de> <46643D6C.2060004@colorstudy.com> Message-ID: <46644D3F.5050708@behnel.de> Hi Ian, Ian Bicking wrote: > I imported a bunch of HTML cleaning tests from other sources, and in the > process I found "parse this somehow and give me an element" to be very > convenient. Of course, HTML() *does* exactly that kind of parsing, but > at least for cleaning you usually don't want a full document, you really > just want a fragment. And that's not too uncommon. Ok, that makes sense. > To make this easier I implemented a parse() function that does its best > to parse your content. If your content is a full page, you get a full > page back. If it's not a full page and it contains just one element, > you get that element back. But if it's not a full page and it contains > multiple elements, it gets wrapped in a
<div>. This seems less > intrusive than wrapping it in <html><body>, which is effectively what > the standard parser does. <div>
is really a generic wrapper (though I > suppose since it is block level, it's not *entirely* generic Adding block elements might break things like CSS. > -- it might > be more ideal to see if the content contains any block level elements, > and if not just wrap in <span>). That's a good idea. The parse() function could do that as it already aims to be smart about what it returns (otherwise, you could just use the normal etree.parse() with an HTMLParser). If you pass it something that can't be returned as a single element, I find it legitimate to wrap it in something that fits. And if we've already determined that we need to wrap it, we can also check what to wrap it in by traversing the tree(s). As a quick check, we can walk through the parsed root elements to check if there are any block elements and only if not, we can traverse each tree completely. If we find at least one block element (easy to check the tag against a positive set), we wrap with <div>, otherwise, we wrap with <span>. > Dealing with ordered lists of elements > with no parent isn't that easy or natural anywhere in the API. If there > was some kind of anonymous container then that would be a nice > container, but there isn't one. Is it possible to make something like > that? It seems like a new kind of node could cause a lot of problems. It definitely would. Adding such a beast would cause overhead in basically all API functions, in traversal code, etc. I'd be very happy to avoid that. > Notably, with the HTML parser you frequently get something out with more > elements than were in the original. It'll add

or

tags fairly > liberally, rearrange tags, etc., to make the document valid. So adding > a <div>
tag isn't that far from what can already happen. True. As I said, having a parse() function that accompanies etree.parse() and that deliberately says "I return *one* element and I do it the smart way" is definitely the way to go. Stefan From stefan_ml at behnel.de Mon Jun 4 19:04:41 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 04 Jun 2007 19:04:41 +0200 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: References: Message-ID: <46644629.9090201@behnel.de> Hi, Eric Tiffany wrote: > So evidently lxml thought it was using 2.6.28, while python was somehow > using the default Apple libxml2 libs. If you want to avoid problems like this, consider compiling in libxml2 and libxslt statically into lxml.etree and the other two Pyrex libraries. That adds a couple of MBs to each of the three Pyrex modules, but that's usually a low price for the comfort of always getting the versions you expect. We have a sort-of-recipe for doing that on Windows, but it should not be too hard to adapt that to MacOS-X. http://codespeak.net/lxml/dev/build.html Stefan From ianb at colorstudy.com Tue Jun 5 01:18:52 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 04 Jun 2007 18:18:52 -0500 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: <46644629.9090201@behnel.de> References: <46644629.9090201@behnel.de> Message-ID: <46649DDC.2090103@colorstudy.com> Stefan Behnel wrote: > Hi, > > Eric Tiffany wrote: >> So evidently lxml thought it was using 2.6.28, while python was somehow >> using the default Apple libxml2 libs. > > If you want to avoid problems like this, consider compiling in libxml2 and > libxslt statically into lxml.etree and the other two Pyrex libraries. That > adds a couple of MBs to each of the three Pyrex modules, but that's usually a > low price for the comfort of always getting the versions you expect. 
We have a > sort-of-recipe for doing that on Windows, but it should not be too hard to > adapt that to MacOS-X. > > http://codespeak.net/lxml/dev/build.html It would be great if someone could distribute a pre-built lxml egg for Mac with libxml2 linked in directly. Only a small number of Mac users currently seem to be able to use lxml without segfaults and bus errors. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Wed Jun 6 02:38:57 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 05 Jun 2007 19:38:57 -0500 Subject: [lxml-dev] lxml.doctestcompare Message-ID: <46660221.1010208@colorstudy.com> I figured out why the tests fail when run in series, when using lxml.(html.)usedoctest -- since it works on import, the import is only run once. After that the module isn't really imported again, so the code that executes on import isn't run. I don't know if there's any feasible way around this. Well, I could unload the module when the doctest is through, just like I unload my OutputChecker hack. That might be okay. Bah. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Wed Jun 6 18:43:25 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Wed, 06 Jun 2007 11:43:25 -0500 Subject: [lxml-dev] lxml.doctestcompare In-Reply-To: <46660221.1010208@colorstudy.com> References: <46660221.1010208@colorstudy.com> Message-ID: <4666E42D.50405@colorstudy.com> Ian Bicking wrote: > I figured out why the tests fail when run in series, when using > lxml.(html.)usedoctest -- since it works on import, the import is only > run once. After that the module isn't really imported again, so the > code that executes on import isn't run. > > I don't know if there's any feasible way around this. 
Well, I could > unload the module when the doctest is through, just like I unload my > OutputChecker hack. That might be okay. Bah. I put in a hack for this too. It's getting more and more ugly, but I place the blame firmly on doctest. I was trying to review some changes Stefan made, and got this: $ svn diff -r43989:44015 \ http://codespeak.net/svn/lxml/branch/html/src/lxml/html/clean.py svn: REPORT request failed on '/svn/!svn/bc/44056/lxml/branch/html/src/lxml/html/clean.py' svn: File not found: revision 43853, path '/lxml/trunk/src/lxml/html/clean.py' I don't understand that. That file hasn't been moved. What am I missing? -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From d.w.morriss at gmail.com Wed Jun 6 19:25:45 2007 From: d.w.morriss at gmail.com (whit) Date: Wed, 06 Jun 2007 12:25:45 -0500 Subject: [lxml-dev] helping debug osx segfaults Message-ID: been beating my head against getting lxml setup on osx. is there any sort of debug information I could give yall that would help or does anyone have any hot tips for getting things running? currently, I'm running trunk and having intermittent segfaults using etree. lxml is linked against most current fink libxml2 and libxslt2 (I think) I can dig out more info if it would help. -w -- ------ d. whit morriss ------ - senior engineer, opencore - - http://www.openplans.org - - m: 415-710-8975 - "If you don't know where you are, you don't know anything at all" Dr. Edgar Spencer, Ph.D., 1995 From etiffany at alum.mit.edu Wed Jun 6 21:45:32 2007 From: etiffany at alum.mit.edu (Eric Tiffany) Date: Wed, 06 Jun 2007 15:45:32 -0400 Subject: [lxml-dev] helping debug osx segfaults In-Reply-To: Message-ID: Have you added the fink lib directory to DYLD_LIBRARY_PATH in the environment where you run your python? This is what fixed things for me. ET On 6/6/07 1:25 PM, "whit" wrote: > been beating my head against getting lxml setup on osx. 
is there any > sort of debug information I could give yall that would help or does > anyone have any hot tips for getting things running? > > currently, I'm running trunk and having intermittent segfaults using > etree. lxml is linked against most current fink libxml2 and libxslt2 > (I think) > > I can dig out more info if it would help. > > -w From d.w.morriss at gmail.com Wed Jun 6 22:01:52 2007 From: d.w.morriss at gmail.com (whit) Date: Wed, 06 Jun 2007 15:01:52 -0500 Subject: [lxml-dev] helping debug osx segfaults In-Reply-To: References: Message-ID: thanks eric! that seemed to do it. no segfaults yet. -w -- ------ d. whit morriss ------ - senior engineer, opencore - - http://www.openplans.org - - m: 415-710-8975 - "If you don't know where you are, you don't know anything at all" Dr. Edgar Spencer, Ph.D., 1995 From ianb at colorstudy.com Thu Jun 7 05:45:24 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Wed, 06 Jun 2007 22:45:24 -0500 Subject: [lxml-dev] atom model Message-ID: <46677F54.8090502@colorstudy.com> After writing a object model for Atom that serialized to and parsed from XML, I realized Atom and its XML representation really shouldn't be separated, so I created some custom elements to make it a bit easier to handle, while still using the XML as the sole source of information. I'm a little unsure what to do about the namespaces. Everything is in a namespace, but it's tedious to put in everywhere. I've put in some little helper methods internally, and mostly you don't need to use the namespace globally, but I'm unsure about it all. I've created an Element() function that automatically adds the namespace if no namespace is given; helps a little I guess. The standard ways of creating elements is a bit tedious, really. I guess builder can help there a bit, though I don't see a way to give my own parser (which I need in this case). Any suggestions about any of it are welcome. 
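The namespace-defaulting Element() helper mentioned above could look roughly like the following. This is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, and the name atom_element is hypothetical, not taken from atom.py.

```python
import xml.etree.ElementTree as ET

# Atom namespace URI, written in the {uri}tag notation ElementTree uses.
ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_element(tag, attrib=None, **extra):
    """Hypothetical helper: create an element, defaulting to the Atom namespace.

    A tag that already carries a namespace ("{uri}name") is left untouched.
    """
    if not tag.startswith("{"):
        tag = "{%s}%s" % (ATOM_NS, tag)
    return ET.Element(tag, attrib or {}, **extra)


# Plain tag names end up in the Atom namespace.
entry = atom_element("entry")
title = atom_element("title", type="text")
title.text = "Hello"
entry.append(title)

# An explicitly namespaced tag is passed through unchanged.
other = atom_element("{http://example.org/ns}thing")
```

The point of the design is that callers only spell out a namespace when they leave the default one, which is the tedium the paragraph above complains about.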
The module is at: https://svn.openplans.org/svn/TaggerStore/trunk/taggerstore/atom.py I didn't really look around in the lxml source, but I used several descriptors that could be general purpose. Maybe there's a better way to do these, or maybe they could go in lxml somewhere. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From stefan_ml at behnel.de Thu Jun 7 09:46:46 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 07 Jun 2007 09:46:46 +0200 Subject: [lxml-dev] atom model In-Reply-To: <46677F54.8090502@colorstudy.com> References: <46677F54.8090502@colorstudy.com> Message-ID: <4667B7E6.9080509@behnel.de> Hi Ian, Ian Bicking wrote: > After writing a object model for Atom that serialized to and parsed from > XML, I realized Atom and its XML representation really shouldn't be > separated, so I created some custom elements to make it a bit easier to > handle, while still using the XML as the sole source of information. That's the way I would do it, too. I brought up the idea of having an "lxml.elementlib" package with this kind of Namespace implementations a couple of times, but I now think it would actually make sense. However, "lxml.ns" might be a better name (XIST uses the same package name, BTW). That way, we'd have "lxml.ns.html" and "lxml.ns.atom" and hopefully others in the future. > I'm a little unsure what to do about the namespaces. Everything is in a > namespace, but it's tedious to put in everywhere. I've put in some > little helper methods internally, and mostly you don't need to use the > namespace globally, but I'm unsure about it all. I've created an > Element() function that automatically adds the namespace if no namespace > is given; helps a little I guess. As you say below, builder.py would definitely make this more usable. 
RSS is even the example FL uses to present the factory: http://online.effbot.org/2006_11_01_archive.htm#et-builder-rss > The standard ways of creating elements is a bit tedious, really. I > guess builder can help there a bit, though I don't see a way to give my > own parser (which I need in this case). Any suggestions about any of it > are welcome. I added a "parser" keyword argument to the factory (trunk), which reuses the "makeelement" method of the parser for Element creation. > https://svn.openplans.org/svn/TaggerStore/trunk/taggerstore/atom.py A few remarks: - for the lookup, you can either use the Namespace registry mechanism of lxml, which makes it a global setup (I'm considering to make this parser local in lxml 2.0) http://codespeak.net/lxml/dev/element_classes.html#namespace-class-lookup http://codespeak.net/lxml/dev/element_classes.html#id1 or just a dictionary (.get) instead of the many ifs. - please avoid "@property" as lxml wants to stay compatible with Python 2.3. Otherwise: looks like you're on the right track. Stefan From stefan_ml at behnel.de Thu Jun 7 12:33:51 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 07 Jun 2007 12:33:51 +0200 Subject: [lxml-dev] lxml.doctestcompare In-Reply-To: <4666E42D.50405@colorstudy.com> References: <46660221.1010208@colorstudy.com> <4666E42D.50405@colorstudy.com> Message-ID: <4667DF0F.2020801@behnel.de> Hi Ian, Ian Bicking wrote: > $ svn diff -r43989:44015 \ > http://codespeak.net/svn/lxml/branch/html/src/lxml/html/clean.py > > svn: REPORT request failed on > '/svn/!svn/bc/44056/lxml/branch/html/src/lxml/html/clean.py' > svn: File not found: revision 43853, path > '/lxml/trunk/src/lxml/html/clean.py' I get the same, and I'm puzzled, too. Philipp, can you look into this? There seems to be something strange going on with the SVN repository. 
Stefan From stefan_ml at behnel.de Thu Jun 7 14:15:11 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 07 Jun 2007 14:15:11 +0200 Subject: [lxml-dev] lxml.doctestcompare In-Reply-To: <4666E42D.50405@colorstudy.com> References: <46660221.1010208@colorstudy.com> <4666E42D.50405@colorstudy.com> Message-ID: <4667F6CF.8090608@behnel.de> Hi Ian, Ian Bicking wrote: > I was trying to review some changes Stefan made I was just refactoring. It's generally better to avoid isolated calls to * el.xpath() -> use XPath() or, even better, el.getiterator() if possible * el.attrib -> use el.set() and el.get() Obviously, that's only in the cases where the replacement is equivalent. But, for example, looping over "el.getiterator(tag)" is much faster than iterating over "el.xpath('descendant-or-self::tag')". I actually think that code even becomes simpler and thus more readable that way. For more advanced expressions, XPath() allows you to use readable function names instead of the less readable XPath expression. Stefan From ianb at colorstudy.com Fri Jun 8 19:03:33 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 08 Jun 2007 12:03:33 -0500 Subject: [lxml-dev] atom model In-Reply-To: <4667B7E6.9080509@behnel.de> References: <46677F54.8090502@colorstudy.com> <4667B7E6.9080509@behnel.de> Message-ID: <46698BE5.7010406@colorstudy.com> Stefan Behnel wrote: > Ian Bicking wrote: >> After writing a object model for Atom that serialized to and parsed from >> XML, I realized Atom and its XML representation really shouldn't be >> separated, so I created some custom elements to make it a bit easier to >> handle, while still using the XML as the sole source of information. > > That's the way I would do it, too. > > I brought up the idea of having an "lxml.elementlib" package with this kind of > Namespace implementations a couple of times, but I now think it would actually > make sense. However, "lxml.ns" might be a better name (XIST uses the same > package name, BTW). 
> > That way, we'd have "lxml.ns.html" and "lxml.ns.atom" and hopefully others in > the future. I'm not a fan of deep hierarchies myself; lxml.html and lxml.atom seem self-explanatory to me. Maybe with more obscure formats it would be less so. >> I'm a little unsure what to do about the namespaces. Everything is in a >> namespace, but it's tedious to put in everywhere. I've put in some >> little helper methods internally, and mostly you don't need to use the >> namespace globally, but I'm unsure about it all. I've created an >> Element() function that automatically adds the namespace if no namespace >> is given; helps a little I guess. > > As you say below, builder.py would definitely make this more usable. RSS is > even the example FL uses to present the factory: > > http://online.effbot.org/2006_11_01_archive.htm#et-builder-rss > > >> The standard ways of creating elements is a bit tedious, really. I >> guess builder can help there a bit, though I don't see a way to give my >> own parser (which I need in this case). Any suggestions about any of it >> are welcome. > > I added a "parser" keyword argument to the factory (trunk), which reuses the > "makeelement" method of the parser for Element creation. OK; I suppose the pattern would then be that atom.E would be ElementMaker(parser=atom_parser)? I don't really understand what typemap is in ElementMaker, but it looks like it's not important here. Except looking at the RSS example, I suppose it could use the native Atom format for dates. Incidentally, something that builder doesn't do but many other XML builders do, it check for any keyword attributes that end in _, and then strip the _. This lets you do class_=something, for_=something, etc. It's handy. 
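The trailing-underscore convention described above can be sketched as follows. This is an illustration only, assuming the standard library's xml.etree.ElementTree; the helpers strip_trailing_underscore and make are hypothetical names and are not part of lxml's builder module.

```python
import xml.etree.ElementTree as ET


def strip_trailing_underscore(attrs):
    """Map keyword names such as class_ or for_ to class and for.

    Python reserves words like "class" and "for", so builders commonly
    accept them with a trailing underscore and strip it again here.
    """
    cleaned = {}
    for name, value in attrs.items():
        if len(name) > 1 and name.endswith("_"):
            name = name[:-1]
        cleaned[name] = value
    return cleaned


def make(tag, text=None, **attrs):
    # Hypothetical builder-style helper, only for illustration.
    el = ET.Element(tag, strip_trailing_underscore(attrs))
    el.text = text
    return el


# for_ and class_ become the HTML attributes "for" and "class".
label = make("label", "Name", for_="name-field", class_="required")
```

One caveat of this approach: an attribute whose real name ends in an underscore can no longer be passed as a keyword and would have to go through a separate attribute dictionary.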
>> https://svn.openplans.org/svn/TaggerStore/trunk/taggerstore/atom.py > > A few remarks: > > - for the lookup, you can either use the Namespace registry mechanism of lxml, > which makes it a global setup (I'm considering to make this parser local in > lxml 2.0) > > http://codespeak.net/lxml/dev/element_classes.html#namespace-class-lookup > http://codespeak.net/lxml/dev/element_classes.html#id1 I don't quite get what's going on there. Does this mean that you would globally say that, say, {http://www.w3.org/2005/Atom}feed maps to lxml.atom.Feed? That's not so bad, I guess, but I'm pretty comfortable with just using the parser in the atom module. Feels a bit less surprising. One of the things this got me to thinking about was augmenting HTML with specific microformat-related attributes. I'm not sure how to do this at all. For instance, imagine: doc.findall_vcards() Returns things fitting the hCard microformat (elements with a class of "vcard"). The object really *is* the element with that vcard class (there's a weird ambiguity between the name vcard and hCard in this particular microformat). And it would have attributes like fn, url, etc. One of the funny parts of microformats is that a bit of HTML can be multiple microformats at the same time, by adding more classes. So if you have an review of a business, you use hReview and the item you are reviewing is an hCard. Often the item and the hCard will be on the same element. Which is handy and unambiguous, but an element can't be both kinds of objects at once. So maybe in the microformat case really something like findall_vcards() should return an object that wraps the HTML; a kind of hCard view on the HTML. It could still be stateless (ideally it would be, like with Atom). > or just a dictionary (.get) instead of the many ifs. Sure. > - please avoid "@property" as lxml wants to stay compatible with Python 2.3. 
Sure -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Fri Jun 8 19:09:30 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 08 Jun 2007 12:09:30 -0500 Subject: [lxml-dev] is tostring() confusing? Message-ID: <46698D4A.6070302@colorstudy.com> Having tostring() as a function and not a method seems a bit odd to me. I know it's from ElementTree, but at least for HTML it's awkward -- using lxml.etree.tostring on HTML is almost certain to create bad output; the output won't be real XHTML (lacking namespaces and it'll probably be invalid), and it will parse quite badly as HTML (