From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Jul 1 14:53:03 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sat, 01 Jul 2006 14:53:03 +0200 Subject: [lxml-dev] let lxml write the ?xml pi In-Reply-To: <20060629102825.GA16494@tttech.com> References: <20060619101402.GA11174@morpheus.apaku.dnsalias.org> <20060619103555.GA11768@morpheus.apaku.dnsalias.org> <44968105.1060307@infrae.com> <20060629102825.GA16494@tttech.com> Message-ID: <44A6702E.3050800@gkec.informatik.tu-darmstadt.de> Hi Albert, Albert Brandl wrote: > I started using lxml some weeks ago, and have been lurking on the > mailing list for some time now. Recently I had the problem that the xml > prologue is not included by default, and stumbled over the following > mail: > > On Mon, Jun 19, 2006 at 12:48:37PM +0200, Martijn Faassen wrote: >> I.e., try the following: >> >> etree.tostring(t, 'utf-8', xml_declaration=True) > > Is there any reason that the method write_c14n() does not support this > flag? The canonical form is a bit more readable, therefore I'd prefer > to use this method. As the documentation of the write_c14n() method states, it always writes UTF-8 encoded byte streams, so there is no real need for the prologue. I wouldn't mind adding this, though. Things like 'standalone' and the XML version would otherwise not be available in the output. BTW, if it's about the readability, pretty printing might be closer to what you want anyway. 
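
For reference, a minimal sketch of the two options discussed above (an XML declaration in the output, and pretty printing). It is not taken from the thread: it uses the standard library's xml.etree.ElementTree (Python 3.9+) so that it runs without lxml, on the assumption that lxml.etree.tostring() accepts the same encoding and xml_declaration arguments, as the quoted call suggests. The element names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-element tree, just to have something to serialise.
root = ET.Element("root")
ET.SubElement(root, "child").text = "text"

# Serialise with an explicit XML declaration, as in the quoted tostring() call.
with_decl = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Pretty printing: the stdlib needs an explicit indent() pass;
# lxml offers a pretty_print=True argument to tostring() instead.
ET.indent(root)
pretty = ET.tostring(root, encoding="unicode")

print(with_decl.decode("utf-8"))
print(pretty)
```

The declaration only appears when it is requested; the pretty-printed form changes whitespace only.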
Stefan From albert.brandl at tttech.com Tue Jul 11 17:57:20 2006 From: albert.brandl at tttech.com (Albert Brandl) Date: Tue, 11 Jul 2006 17:57:20 +0200 Subject: [lxml-dev] let lxml write the ?xml pi In-Reply-To: <44A6702E.3050800@gkec.informatik.tu-darmstadt.de> References: <20060619101402.GA11174@morpheus.apaku.dnsalias.org> <20060619103555.GA11768@morpheus.apaku.dnsalias.org> <44968105.1060307@infrae.com> <20060629102825.GA16494@tttech.com> <44A6702E.3050800@gkec.informatik.tu-darmstadt.de> Message-ID: <20060711155720.GA2018@tttech.com> Hi Stefan, On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote: > As the documentation of the write_c14n() method states, it always writes UTF-8 > encoded byte streams, so there is no real need for the prologue. I wouldn't > mind adding this, though. Things like 'standalone' and the XML version would > otherwise not be available in the output. I recently learned about section 4.1 of the C14N recommendation, http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical form does not contain a prologue. Therefore, write_c14n() is ok - sorry for the request. > BTW, if it's about the readability, pretty printing might be closer to what > you want anyway. Thanks for the hint. In lxml 1.0.1, the pretty printed version adds information about the namespace to every tag. Unfortunately, this decreases the readibility, since in my case, almost all tags have a namespace. 
A "pretty_print" flag for write_c14n() would be a perfect workaround, though :-) Best regards, Albert From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 11 18:34:37 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue, 11 Jul 2006 18:34:37 +0200 Subject: [lxml-dev] let lxml write the ?xml pi In-Reply-To: <20060711155720.GA2018@tttech.com> References: <20060619101402.GA11174@morpheus.apaku.dnsalias.org> <20060619103555.GA11768@morpheus.apaku.dnsalias.org> <44968105.1060307@infrae.com> <20060629102825.GA16494@tttech.com> <44A6702E.3050800@gkec.informatik.tu-darmstadt.de> <20060711155720.GA2018@tttech.com> Message-ID: <44B3D31D.9030009@gkec.informatik.tu-darmstadt.de> Hi Albert, Albert Brandl wrote: > On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote: >> As the documentation of the write_c14n() method states, it always writes UTF-8 >> encoded byte streams, so there is no real need for the prologue. I wouldn't >> mind adding this, though. Things like 'standalone' and the XML version would >> otherwise not be available in the output. > > I recently learned about section 4.1 of the C14N recommendation, > http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical > form does not contain a prologue. Therefore, write_c14n() is ok - sorry > for the request. Thought so. Thanks for checking. >> BTW, if it's about the readability, pretty printing might be closer to what >> you want anyway. > > Thanks for the hint. In lxml 1.0.1, the pretty printed version adds > information about the namespace to every tag. Not on my side. How do you build the tree? > Unfortunately, this > decreases the readibility, since in my case, almost all tags have a > namespace. A "pretty_print" flag for write_c14n() would be a > perfect workaround, though :-) I don't think that's gonna happen. C14N is meant to be a well-defined XML formatting style, and pretty printing is not part of the standard. 
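
A small sketch of the point about canonical output, under stated assumptions: it uses xml.etree.ElementTree.canonicalize() from the standard library (Python 3.8+), which implements C14N 2.0, whereas lxml's write_c14n() follows the Canonical XML 1.0 recommendation cited above. Both omit the XML declaration. The input document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Input with an XML declaration, unsorted attributes and an empty-element tag.
source = "<?xml version='1.0'?><doc b='2' a='1'><empty/></doc>"

# canonicalize() returns the canonical form as a string.
canonical = ET.canonicalize(source)
print(canonical)
```

The canonical form has no declaration and a fixed attribute order, and no pretty-printing option is involved, which is why a "pretty_print" flag does not fit the C14N writer.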
Stefan

From Geraldjohn.M.Manipon at jpl.nasa.gov Thu Jul 13 08:26:09 2006
From: Geraldjohn.M.Manipon at jpl.nasa.gov (Gerald John M. Manipon)
Date: Wed, 12 Jul 2006 23:26:09 -0700
Subject: [lxml-dev] tostring() escapes and adding cdata section
Message-ID: <44B5E781.6010305@jpl.nasa.gov>

Hi,

Quick question: How can I prevent the escaping (specifically '&' into
'&amp;') that occurs when I use tostring()? i.e.

>>> from lxml.etree import *
>>> r = Element('root')
>>> s = SubElement(r,'sub')
>>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>>> s.text
'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>>> tostring(s)
'<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'

I'm currently just doing a .replace('&amp;','&') on the string I get
back.

Also, is there a way to specify that an element's text should be
enclosed as a CDATA?

Thanks for any help.

Gerald

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 13 08:35:28 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 13 Jul 2006 08:35:28 +0200
Subject: [lxml-dev] tostring() escapes and adding cdata section
In-Reply-To: <44B5E781.6010305@jpl.nasa.gov>
References: <44B5E781.6010305@jpl.nasa.gov>
Message-ID: <44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>

Hi Gerald,

Gerald John M. Manipon wrote:
> Quick question: How can I prevent the escaping (specifically '&' into
> '&amp;') that occurs when I use tostring()?

You (obviously) can't. The output would not be (well-formed) XML.

> I'm currently just doing a .replace('&amp;','&') on the string I get
> back.

You can use 'unescape' from the xml.sax.saxutils module.
http://docs.python.org/lib/module-xml.sax.saxutils.html

But why don't you use lxml itself? Unescaping is done automatically when you
parse the string.
> >>> from lxml.etree import *
> >>> r = Element('root')
> >>> s = SubElement(r,'sub')
> >>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
> >>> s.text
> 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
> >>> tostring(s)
> '<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'

So? What's the problem? That's perfect XML. Any XML parser will be able to
handle that.

> Also, is there a way to specify that an element's text should be
> enclosed as a CDATA?

No. What would be the use case?

Stefan

From Geraldjohn.M.Manipon at jpl.nasa.gov Thu Jul 13 10:09:32 2006
From: Geraldjohn.M.Manipon at jpl.nasa.gov (Gerald John M. Manipon)
Date: Thu, 13 Jul 2006 01:09:32 -0700
Subject: [lxml-dev] tostring() escapes and adding cdata section
In-Reply-To: <44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>
References: <44B5E781.6010305@jpl.nasa.gov> <44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>
Message-ID: <44B5FFBC.5000607@jpl.nasa.gov>

Stefan Behnel wrote:
> Hi Gerald,
>
> Gerald John M. Manipon wrote:
>> Quick question: How can I prevent the escaping (specifically '&' into
>> '&amp;') that occurs when I use tostring()?
>
> You (obviously) can't. The output would not be (well-formed) XML.

Okay.

>> I'm currently just doing a .replace('&amp;','&') on the string I get
>> back.
>
> You can use 'unescape' from the xml.sax.saxutils module.
> http://docs.python.org/lib/module-xml.sax.saxutils.html

I'll look into that.

> But why don't you use lxml itself? Unescaping is done automatically when you
> parse the string.
>
>> >>> from lxml.etree import *
>> >>> r = Element('root')
>> >>> s = SubElement(r,'sub')
>> >>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>> >>> s.text
>> 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>> >>> tostring(s)
>> '<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'
>
> So? What's the problem? That's perfect XML. Any XML parser will be able to
> handle that.

Yes, I understand.
We're posting our xml that we get from tostring() to
one of our partner's web services (I don't know the exact backend but
it looks Java-based) and their services do not like the '&amp;'. I
guess it's a problem on their end.

>> Also, is there a way to specify that an element's text should be
>> enclosed as a CDATA?
>
> No. What would be the use case?

Getting around the invalid xml with '&' in an element's text:

I'm guessing that since serialization replaces the '&' anyway, the above
would be impossible to produce via lxml.

Thanks for your response,
Gerald

>
> Stefan
>

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 13 10:45:20 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 13 Jul 2006 10:45:20 +0200
Subject: [lxml-dev] tostring() escapes and adding cdata section
In-Reply-To: <44B5FFBC.5000607@jpl.nasa.gov>
References: <44B5E781.6010305@jpl.nasa.gov> <44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de> <44B5FFBC.5000607@jpl.nasa.gov>
Message-ID: <44B60820.60802@gkec.informatik.tu-darmstadt.de>

Hi Gerald,

Gerald John M. Manipon wrote:
> We're posting our xml that we get from tostring() to
> one of our partner's web services (I don't know the exact backend but
> it looks Java-based) and their services do not like the '&amp;'. I
> guess it's a problem on their end.

Oh, definitely:

http://www.w3.org/TR/2004/REC-xml-20040204/#syntax

"""
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in
their literal form, except when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they MUST be escaped using either numeric character references or
the strings "&amp;" and "&lt;" respectively.
"""

If it doesn't work for them, they should start using an XML parser (which is
the best choice for parsing XML anyway...)
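
The round trip described in this thread can be sketched as follows. This is an illustration, not code from the thread: it uses the standard library's xml.etree.ElementTree so that it runs without lxml, on the assumption that lxml.etree escapes and unescapes '&' in element text the same way. The URL is the one from Gerald's example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

url = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'

root = ET.Element("root")
sub = ET.SubElement(root, "sub")
sub.text = url

# Serialising escapes '&' so that the result is well-formed XML.
serialized = ET.tostring(sub, encoding="unicode")

# Any XML parser undoes the escaping when reading the text back.
parsed_text = ET.fromstring(serialized).text

# The string-level alternative mentioned above: unescape() on the raw content.
inner = serialized[len("<sub>"):-len("</sub>")]
unescaped = unescape(inner)

print(serialized)
print(parsed_text == url, unescaped == url)
```

The escaped form only exists on the wire; a consumer that parses the XML never sees '&amp;'.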
Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Jul 14 21:24:44 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Fri, 14 Jul 2006 21:24:44 +0200 Subject: [lxml-dev] a C-level API for lxml Message-ID: <44B7EF7C.2010308@gkec.informatik.tu-darmstadt.de> Hi all, as part of a project on lxml, I'm building an external element API module (objectify style) as a Pyrex extension. To make this independent of lxml itself, I decided to add an external C-level API that allows external modules to efficiently interface with the lxml module. Usage in other modules will be as easy as including a header file or cimporting a .pxd file in Pyrex, and then calling an init function from the external module. The match is done by comparing char* strings for the function names at initialisation time, so this is pretty future proof (no missing symbols when the C API changes etc.). This requires some changes in Pyrex, so lxml 1.1 will depend on a patched version (again), until (one day) my patches are accepted upstream. I also published some Python 2.5 related fixes, BTW, to make lxml 1.1 run nicely on Python 2.5. I can't currently test that since I can't get the 2.5 beta versions to work on my machine (broken compiled-in PYTHONPATH). Anyway, at least I got positive feedback that the exception stuff seems to be fixed. The Py_ssize_t fixes are not verified on 2.5, but should also work. A preliminary version of the patched Pyrex is here: http://codespeak.net/svn/lxml/pyrex/ So, if someone could test lxml with it under 2.5 (preferably on a 64-bit machine) ... When the lxml C-API is in place, it will be easy to add new functions to it (basically by adding the "public" keyword to a Pyrex C function). So I'd be glad if everyone who thinks this API would be useful for them could propose more functions to be made public. 
I know specifically that Andreas had a problem with extending the XPath implementation, so maybe there are ways to get this solved at the C level. This thread is the right place to discuss these things. Regards, Stefan From buro at petr.com Sun Jul 16 00:34:56 2006 From: buro at petr.com (Petr van Blokland) Date: Sun, 16 Jul 2006 00:34:56 +0200 Subject: [lxml-dev] Python values in xpath functions In-Reply-To: <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com> References: <44985C5D.60900@gkec.informatik.tu-darmstadt.de> <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com> Message-ID: <2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com> Hi, may be someone can get me out. I am returning an etree from a Python function in XPath. But it does not seem to work stepping through the result as in ... where ...works fine for the current node. What do I do wrong. Should the function answer something different from an etree, as in: def myfunction(dummy, *args): ... # create etree from args return etree Kind regards, Petr van Blokland ---------------------------------------------- Petr van Blokland buro at petr.com | www.petr.com | +31 15 219 10 40 ---------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060716/b43e93c7/attachment.htm From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Jul 16 09:47:38 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sun, 16 Jul 2006 09:47:38 +0200 Subject: [lxml-dev] Python values in xpath functions In-Reply-To: <2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com> References: <44985C5D.60900@gkec.informatik.tu-darmstadt.de> <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com> <2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com> Message-ID: <44B9EF1A.1060405@gkec.informatik.tu-darmstadt.de> Hi Petr, Petr van Blokland wrote: > I am returning an etree from a Python function in XPath. "etree" is the name of the module. 
I guess you mean an ElementTree object? > But it does not seem to work stepping through the result > as in ... > where ...works fine > for the current node. What do I do wrong. Don't return an ElementTree (don't you get an exception for that anyway?). Return an Element or a list of Elements. Stefan From buro at petr.com Sun Jul 16 09:57:02 2006 From: buro at petr.com (Petr van Blokland) Date: Sun, 16 Jul 2006 09:57:02 +0200 Subject: [lxml-dev] Python values in xpath functions In-Reply-To: <44B9EF1A.1060405@gkec.informatik.tu-darmstadt.de> References: <44985C5D.60900@gkec.informatik.tu-darmstadt.de> <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com> <2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com> <44B9EF1A.1060405@gkec.informatik.tu-darmstadt.de> Message-ID: <5D85BAE4-ACF6-4D4C-AFFC-920553A0DA94@petr.com> On Jul 16, 2006, at 9:47 AM, Stefan Behnel wrote: > Hi Petr, > > Petr van Blokland wrote: >> I am returning an etree from a Python function in XPath. > > "etree" is the name of the module. I guess you mean an ElementTree > object? > Yes. >> But it does not seem to work stepping through the result >> as in ... >> where ...works fine >> for the current node. What do I do wrong. > > Don't return an ElementTree (don't you get an exception for that > anyway?). I do. > Return an Element or a list of Elements. Ok, I'll try. Thanks. Petr ---------------------------------------------- Petr van Blokland buro at petr.com | www.petr.com | +31 15 219 10 40 ---------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060716/5e6b60d4/attachment.htm From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 17 10:57:46 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon, 17 Jul 2006 10:57:46 +0200 Subject: [lxml-dev] a C-level API for lxml In-Reply-To: <44B7EF7C.2010308@gkec.informatik.tu-darmstadt.de> References: <44B7EF7C.2010308@gkec.informatik.tu-darmstadt.de> Message-ID: <44BB510A.9080505@gkec.informatik.tu-darmstadt.de> Hi again, Stefan Behnel wrote: > I decided to add an external C-level API that allows external modules > to efficiently interface with the lxml module. Usage in other modules will be > as easy as including a header file or cimporting a .pxd file in Pyrex, and > then calling an init function from the external module. The match is done by > comparing char* strings for the function names at initialisation time, so this > is pretty future proof (no missing symbols when the C API changes etc.). > > [...] I'd be > glad if everyone who thinks this API would be useful for them could propose > more functions to be made public. I know specifically that Andreas had a > problem with extending the XPath implementation, so maybe there are ways to > get this solved at the C level. This thread is the right place to discuss > these things. There is now some documentation on the C-API and its usage online: http://codespeak.net/svn/lxml/branch/capi/doc/capi.txt The current state of the API is described here: http://codespeak.net/svn/lxml/branch/capi/src/lxml/etreepublic.pxd Stefan From luto at myrealbox.com Wed Jul 19 01:59:47 2006 From: luto at myrealbox.com (Andrew Lutomirski) Date: Tue, 18 Jul 2006 16:59:47 -0700 Subject: [lxml-dev] segfault in iterparse Message-ID: Thanks for iterparse -- it (mostly) rocks. However, I can segfault it on large files when I try to clear out the tree to avoid unbounded memory use. See attached code. 
I'm _guessing_ that the problem is that iterparse doesn't like the deletion of the current node. Thanks, Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060718/c9fff54f/attachment.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: lxmlbug.py Type: application/octet-stream Size: 1140 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060718/c9fff54f/attachment.obj From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 19 08:10:48 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed, 19 Jul 2006 08:10:48 +0200 Subject: [lxml-dev] segfault in iterparse In-Reply-To: References: Message-ID: <44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de> Hi Andrew, Andrew Lutomirski wrote: > Thanks for iterparse -- it (mostly) rocks. However, I can segfault it > on large files when I try to clear out the tree to avoid unbounded > memory use. See attached code. Thanks for the test script - perfect bug report! > I'm _guessing_ that the problem is that iterparse doesn't like the > deletion of the current node. Yup. Even worse, what you do is delete the entire tree, including all parents: for i in etree.iterparse(StringIO(xml)): i[1].getroottree().getroot().clear() While lxml prevents the tree from being garbage collected immediately (it uses a parent stack), the above code still unlinks the root node from rest of the tree - looks like libxml2 doesn't like that... The above usage is not currently prevented, only forbidden: http://codespeak.net/svn/lxml/trunk/doc/api.txt """ Note that you should not modify or move the ancestors or siblings of the element during either of the two events [start/end]. You should also avoid moving the element itself. """ I do not know if it is worth taking special measures against accessing the parent (or root tree) of the current element. 
On the one hand, lxml should not segfault. On the other hand, /blocking/ the
access to parents requires some kind of additional flag on each element
(AFAICT), so elements returned by iterparse() would have to behave differently
from normal elements (i.e. be different classes). In any case, it would not
give you what you want - clearing the entire tree would then simply raise an
exception instead of segfaulting.

So, maybe it's enough to rephrase the above quote to *must not* to 'fix' this
bug...

As for your problem, what you /can/ do, is remove the preceding siblings of
the element:

  for event, element in etree.iterparse(StringIO(xml)):
      # do something with element
      element.clear()                        # clean up children
      if element.getprevious() is not None:  # clean up preceding siblings
          del element.getparent()[0]

This cleans up *after* the element. If you decided to skip elements ('tag'
argument), you can use "while" instead of "if" to remove all siblings you
might not have seen. Note the explicit comparison with None: an element
without children is false in a boolean context, so a bare
"if element.getprevious():" would skip siblings that were already cleared.

Admittedly, there's a little more consideration required than for the
ElementTree library, but I guess that's the price we pay for lxml being based
on libxml2.

Hope it helps.
Stefan

From luto at myrealbox.com Wed Jul 19 08:34:33 2006
From: luto at myrealbox.com (Andrew Lutomirski)
Date: Tue, 18 Jul 2006 23:34:33 -0700
Subject: [lxml-dev] segfault in iterparse
In-Reply-To: <44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de>
References: <44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de>
Message-ID:

(sorry for resend -- I borked it the first time)

On 7/18/06, Stefan Behnel wrote:
> Hi Andrew,
>
> Andrew Lutomirski wrote:
> > Thanks for iterparse -- it (mostly) rocks. However, I can segfault it
> > on large files when I try to clear out the tree to avoid unbounded
> > memory use. See attached code.
>
> Thanks for the test script - perfect bug report!
>
>
> > I'm _guessing_ that the problem is that iterparse doesn't like the
> > deletion of the current node.
>
> Yup.
Even worse, what you do is delete the entire tree, including all > parents: > > for i in etree.iterparse(StringIO(xml)): > i[1].getroottree().getroot().clear() > > While lxml prevents the tree from being garbage collected immediately (it > uses > a parent stack), the above code still unlinks the root node from rest of > the > tree - looks like libxml2 doesn't like that... > > The above usage is not currently prevented, only forbidden: > > http://codespeak.net/svn/lxml/trunk/doc/api.txt > > """ > Note that you should not modify or move the ancestors or siblings > of the element during either of the two events [start/end]. You should > also > avoid moving the element itself. > """ > Phooey. I didn't read that -- I read http://effbot.org/zone/element-iterparse.htm, which suggests: for event, elem in context: if event == "end" and elem.tag == "record": ... process record elements ... root.clear() (Apologies if gmail butchered that.) > > As for your problem, what you /can/ do, is remove the preceding siblings > of > the element: Didn't the doc just say that you're _not_ supposed to modify siblings of the current element? Perhaps the doc should give some canonical way to do the huge document parsing? (For reference, 100k children of the root element is probably an underestimate for my application, unfortunately.) Probably the rule is "don't do anything that'll make the next event have trouble linking itself into the tree." Removing siblings on an "end" is safe, I guess? Anyway, I'll keep fiddling around. Thanks, Andy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060718/d5b2842a/attachment.htm From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 19 08:47:48 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed, 19 Jul 2006 08:47:48 +0200 Subject: [lxml-dev] segfault in iterparse In-Reply-To: References: <44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de> Message-ID: <44BDD594.8030905@gkec.informatik.tu-darmstadt.de> Hi Andrew, Andrew Lutomirski wrote: > Phooey. I didn't read that -- I read > http://effbot.org/zone/element-iterparse.htm, which suggests: > > for event, elem in context: > > if event == "end" and elem.tag == "record": > ... process record elements ... > root.clear() Sure, I though so. I now updated the docs to make that clearer and also added a FAQ section so that it is easier to find. http://codespeak.net/svn/lxml/trunk/doc/api.txt http://codespeak.net/svn/lxml/trunk/doc/FAQ.txt > Didn't the doc just say that you're _not_ supposed to modify siblings of > the current element? That was not phrased correctly. What was meant was "following siblings" (which might already be available due to internal implementation details). > Perhaps the doc should give some canonical way to > do the huge document parsing? (For reference, 100k children of the root > element is probably an underestimate for my application, unfortunately.) As I said, clearing the element and deleting the preceding siblings should do the trick. > Probably the rule is "don't do anything that'll make the next event have > trouble linking itself into the tree." I guess that's a good way of putting it. Everything that has to be touched again *after* the current element is a no-no for modification. > Removing siblings on an "end" is safe, I guess? Preceding siblings, yes. 
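
The advice above (clear the current element, then delete only the preceding siblings) can be sketched like this. It is a sketch under stated assumptions rather than the code from the thread: it is written for a current lxml on Python 3, the helper name count_records and the "record" tag are hypothetical, and the counter stands in for real per-record processing.

```python
from io import BytesIO
from lxml import etree


def count_records(source, tag="record"):
    """Stream over `source`, handling each matching element and freeing it."""
    count = 0
    for event, element in etree.iterparse(source, events=("end",), tag=tag):
        count += 1        # stand-in for real per-record processing
        element.clear()   # drop the element's own children and text
        # Delete siblings that were parsed before this element. Compare with
        # None explicitly, since an element without children is falsy.
        while element.getprevious() is not None:
            del element.getparent()[0]
    return count


# Hypothetical input: many small children directly under the root element.
xml = b"<root>" + b"<record><v>1</v></record>" * 1000 + b"</root>"
total = count_records(BytesIO(xml))
print(total)
```

Nothing that the parser still has to touch (ancestors, following siblings) is modified, so memory use stays bounded by roughly one record plus the chain of open ancestors.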
Stefan

From luto at myrealbox.com Fri Jul 21 00:31:56 2006
From: luto at myrealbox.com (Andrew Lutomirski)
Date: Thu, 20 Jul 2006 15:31:56 -0700
Subject: [lxml-dev] another iterparse segfault
Message-ID:

This one mystifies me completely -- three line testcase attached.

This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as
Python 2.4 on Gentoo with lxml trunk as of yesterday.

-------------- next part --------------
#!/usr/bin/env python

from cStringIO import StringIO
from lxml import etree

xml = '' + ''.join(['' for b in xrange(10000)]) + ''

# Uncomment these and it will crash instead of failing an assertion.
#class myelement(etree.ElementBase):
#    pass
#etree.setDefaultElementClass(None)

iter = etree.iterparse(StringIO(xml))

# The following variant will not crash:
#iter = etree.iterwalk(etree.parse(StringIO(xml)))

for x in iter:
    elem = x[1]
    #print elem, type(elem)

    # If you uncomment the setDefaultElementClass stuff, you may need to
    # uncomment this to make it crash.
    #if len(dir(type(elem).__dict__)) == 0:
    #    print 'This happens sometimes.'
    #dir(type(elem))

From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Jul 21 07:06:12 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 21 Jul 2006 07:06:12 +0200
Subject: [lxml-dev] another iterparse segfault
In-Reply-To:
References:
Message-ID: <44C060C4.2030907@gkec.informatik.tu-darmstadt.de>

Hi Andrew,

Andrew Lutomirski wrote:
> This one mystifies me completely -- three line testcase attached.
>
> This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as
> Python 2.4 on Gentoo with lxml trunk as of yesterday.

Again, thanks for the bug report. This one really is a bug and I can reproduce
it with your test.
It is related to the __ITERPARSE_CHUNK_SIZE (iterparse.pxi) that is used internally to read the data in small chunks and hand it to the parser to generate events. If you reduce the value, the chunk size is passed earlier (after less than the 10000 elements you needed for your test) and the bug occurs after a smaller number of parsed elements. I'll have to take a closer look at it to figure out what's going wrong here. Thanks again, Stefan From Olivier.Collioud at wipo.int Fri Jul 21 16:51:29 2006 From: Olivier.Collioud at wipo.int (Olivier Collioud) Date: Fri, 21 Jul 2006 16:51:29 +0200 Subject: [lxml-dev] exceptions.TypeError: 'Argument must be string or unicode.' Message-ID: Hello, I'm having this error: Exception exceptions.TypeError: 'Argument must be string or unicode.' in 'etree._setAttributeValue' ignored I suspect that I'm setting an attribute value to None but I don't know where. Is there a way to figure out where in my code the error occure ? Olivier. ------ World Intellectual Property Organization Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using. From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Jul 21 21:53:17 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Fri, 21 Jul 2006 21:53:17 +0200 Subject: [lxml-dev] exceptions.TypeError: 'Argument must be string or unicode.' In-Reply-To: References: Message-ID: <44C130AD.9080608@gkec.informatik.tu-darmstadt.de> Hi Olivier, Olivier Collioud wrote: > I'm having this error: > Exception exceptions.TypeError: 'Argument must be string or unicode.' > in 'etree._setAttributeValue' ignored > > I suspect that I'm setting an attribute value to None but I don't know > where. Sorry, my fault. That's a bug in lxml. 
You didn't tell us which version you are using, but it's both in 1.0.2 and 1.1alpha. The fix is attached. With this applied, you will get an exception including a normal traceback. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: attribute-value-fix.patch Type: text/x-patch Size: 2098 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060721/617c4f44/attachment.bin From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Jul 22 22:15:35 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sat, 22 Jul 2006 22:15:35 +0200 Subject: [lxml-dev] another iterparse segfault In-Reply-To: <44C060C4.2030907@gkec.informatik.tu-darmstadt.de> References: <44C060C4.2030907@gkec.informatik.tu-darmstadt.de> Message-ID: <44C28767.4020400@gkec.informatik.tu-darmstadt.de> Stefan Behnel wrote: > Andrew Lutomirski wrote: >> This one mystifies me competely -- three line testcase attached. >> >> This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as >> Python 2.4 on Gentoo with lxml trunk as of yesterday. > > Again, thanks for the bug report. This one really is a bug and I can reproduce > it with your test. It is related to the __ITERPARSE_CHUNK_SIZE (iterparse.pxi) > that is used internally to read the data in small chunks and hand it to the > parser to generate events. If you reduce the value, the chunk size is passed > earlier (after less than the 10000 elements you needed for your test) and the > bug occurs after a smaller number of parsed elements. > > I'll have to take a closer look at it to figure out what's going wrong here. ... and so I did. It was a bug in the iterparse.next() method. The events and corresponding elements are stored in a 'queue' (a Python list) and retrieved by a call to PyList_GET_ITEM(). That funtion (or macro) returns a so-called "borrowed reference" that must be INCREF'd by hand (Pyrex does not know about it). 
Otherwise, the refcount is too low and will be garbage collected before the last reference is gone. Here's the patch. Thanks for the report, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: iterparse-next-incref.patch Type: text/x-patch Size: 1730 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060722/63d1a7eb/attachment.bin From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 09:55:58 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon, 24 Jul 2006 09:55:58 +0200 Subject: [lxml-dev] exceptions.TypeError: 'Argument must be string or unicode.' In-Reply-To: References: Message-ID: <44C47D0E.2080706@gkec.informatik.tu-darmstadt.de> Hi Olivier, Olivier Collioud wrote: > I'm using 1.0.2 installed with lxml-1.0.2.win32-static-py2.4.exe. > > I guess that I need to pick the source package before applying you > patch and then compile. Right. There are build instructions on the lxml page: http://codespeak.net/lxml/build.html#static-linking-on-windows > When do you think the next version will be provided ? 1.1 beta will be out early next month. As for 1.0.3: there are currently no critical bugs in 1.0.2, so I can't tell if it will be available any earlier - but not later either. Stefan From faassen at infrae.com Mon Jul 24 16:37:24 2006 From: faassen at infrae.com (Martijn Faassen) Date: Mon, 24 Jul 2006 16:37:24 +0200 Subject: [lxml-dev] missing ElementTree API? Message-ID: <44C4DB24.206@infrae.com> Hi there, Reading the "What changed in Python 2.5" document I ran into the following bit of text concerning ElementTree: """ Comments and processing instructions are also represented as Element nodes. To check if a node is a comment or processing instructions: if elem.tag is ET.Comment: ... elif elem.tag is ET.ProcessingInstruction: ... 
""" As far as I can determine with a simple 'grep' on the source code, this isn't supported yet by lxml, at least the ProcessingInstruction bit. Since this now appears to be documented in a rather central document, perhaps we should. :) I'm a bit surprised about this use of ET.Comment - this would imply elem.tag returns the Comment class when its element is representing a comment? Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 16:52:02 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon, 24 Jul 2006 16:52:02 +0200 Subject: [lxml-dev] missing ElementTree API? In-Reply-To: <44C4DB24.206@infrae.com> References: <44C4DB24.206@infrae.com> Message-ID: <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de> Hi Martijn, Martijn Faassen wrote: > Reading the "What changed in Python 2.5" document I ran into the > following bit of text concerning ElementTree: > > """ > Comments and processing instructions are also represented as Element > nodes. To check if a node is a comment or processing instructions: > > if elem.tag is ET.Comment: > ... > elif elem.tag is ET.ProcessingInstruction: > ... > """ Interesting. I don't think I've ever seen something like that before. > As far as I can determine with a simple 'grep' on the source code, this > isn't supported yet by lxml, at least the ProcessingInstruction bit. > Since this now appears to be documented in a rather central document, > perhaps we should. :) Perhaps, yes. Guess we should also start supporting PIs, just like normal Elements and Comments (i.e. _isElement would get a third value to check). > I'm a bit surprised about this use of ET.Comment - this would imply > elem.tag returns the Comment class when its element is representing a > comment? Looks like it. 
ET does this:

---------------------------------
def ProcessingInstruction(target, text=None):
    element = Element(ProcessingInstruction)
    element.text = target
    if text:
        element.text = element.text + " " + text
    return element
PI = ProcessingInstruction
---------------------------------

Same for Comment. Funny, huh? So we'd have to return the Comment factory
function from _Comment.tag then instead of the current None.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 17:07:51 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 17:07:51 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C4E247.60506@gkec.informatik.tu-darmstadt.de>

Stefan Behnel wrote:
> ET does this:
>
> ---------------------------------
> def ProcessingInstruction(target, text=None):
>     element = Element(ProcessingInstruction)
>     element.text = target
>     if text:
>         element.text = element.text + " " + text
>     return element
> PI = ProcessingInstruction
> ---------------------------------

One thing that bothers me is that this prevents us from returning the target
as tag. libxml2 uses the 'name' for this. Maybe we should add a property
"target" to PIs?

Even ET (1.3?) could easily emulate that by setting

    element.target = target

before returning it.

Stefan

From faassen at infrae.com Mon Jul 24 18:24:20 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 24 Jul 2006 18:24:20 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4E247.60506@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de> <44C4E247.60506@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C4F434.2090606@infrae.com>

Stefan Behnel wrote:
> Stefan Behnel wrote:
>> ET does this:
>>
>> ---------------------------------
>> def ProcessingInstruction(target, text=None):
>>     element = Element(ProcessingInstruction)
>>     element.text = target
>>     if text:
>>         element.text = element.text + " " + text
>>     return element
>> PI = ProcessingInstruction
>> ---------------------------------
>
> One thing that bothers me is that this prevents us from returning the target
> as tag. libxml2 uses the 'name' for this. Maybe we should add a property
> "target" to PIs?
>
> Even ET (1.3?) could easily emulate that by setting
>
>     element.target = target
>
> before returning it.

I'm fine with extending the API that way to return this information if ET
doesn't do it.

Heh, all this makes me think we should start working on an ElementTree
community standard with accompanying testsuite. Now the only thing we
need is time. :)

Regards,

Martijn

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 18:27:23 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 18:27:23 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C4F4EB.5020700@gkec.informatik.tu-darmstadt.de>

Stefan Behnel wrote:
> ET does this:
>
> ---------------------------------
> def ProcessingInstruction(target, text=None):
>     element = Element(ProcessingInstruction)
>     element.text = target
>     if text:
>         element.text = element.text + " " + text
>     return element
> PI = ProcessingInstruction
> ---------------------------------

Ok, I think that the target of a PI has a sufficiently high importance to give
it an API, so I decided not to go the ET way here. lxml.etree will have a
".target" property that returns the PI target and the ".text" property will
not contain the target. This means that

    ......

will give this in lxml.etree:

    pi.target == "test"
    pi.text == "my test PI "

and this in ET:

    pi.text == "test my test PI"

I'm also considering making PIs and comments subject to custom Element class
selection, which would simplify the re-implementation of PHP over lxml. :]

But maybe that can wait until lxml 1.2... :)

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 20:09:13 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 20:09:13 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4F434.2090606@infrae.com>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de> <44C4E247.60506@gkec.informatik.tu-darmstadt.de> <44C4F434.2090606@infrae.com>
Message-ID: <44C50CC9.1000609@gkec.informatik.tu-darmstadt.de>

Martijn Faassen wrote:
> Heh, all this makes me think we should start working on an ElementTree
> community standard with accompanying testsuite. Now the only thing we
> need is time. :)

Well, we have tests/test_elementtree.py to contribute for now. It's not quite
what I'd call a 'complete' test suite, but it's a good point to start.
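
As a quick check of the ElementTree side of this thread, here is a minimal sketch using only the standard library's xml.etree.ElementTree (the package that Python 2.5 added). It shows that the tag of a comment or PI node is the factory function itself, and that ET folds the PI target into ".text". The helper name describe() is made up for illustration; nothing here uses lxml, so the ".target" property proposed above is not shown.

```python
import xml.etree.ElementTree as ET

# Build one node of each kind with ET's factory functions.
comment = ET.Comment("just a note")
pi = ET.ProcessingInstruction("test", "my test PI")
plain = ET.Element("root")


def describe(elem):
    """Classify a node the way the "What's new" text suggests (hypothetical helper)."""
    if elem.tag is ET.Comment:
        return "comment"
    elif elem.tag is ET.ProcessingInstruction:
        return "pi"
    return "element"


kinds = [describe(node) for node in (comment, pi, plain)]

# In ET the target is simply the first word of .text.
pi_text = pi.text

print(kinds)
print(repr(pi_text))
```

Code that has to work with both libraries would probably need to avoid relying on the exact content of ".text" for PIs, since the two are described above as differing on that point.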
Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 21:06:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 21:06:02 +0200
Subject: [lxml-dev] Custom element class lookup mechanisms
Message-ID: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de>

Hi all,

as I was working on the C-API anyway (capi branch), I decided to add a little
external module with different ways of determining the Python element class
for a libxml2 node. The "lxml.elements.classlookup" module currently
implements three different ways of doing this:

* ElementDefaultClassLookup always uses the default class
* ElementNamespaceClassLookup is the default namespace lookup mechanism
* AttributeBasedElementClassLookup determines the class by looking up the
  value of a specific attribute in a dict. It falls back to the default
  classes.

Other ways are of course possible, so if anyone has an idea what to add, I'm
open for suggestions.

An example usage is this:

    from lxml.elements import classlookup
    classlookup.setElementClassLookup(
        classlookup.ElementDefaultClassLookup())

It registers the mechanism that always uses the default class for elements,
comments and PIs (yes, I implemented that, too). This disables the namespace
class lookup and thus speeds up the plain element object creation by up to
10%.

Example usage for attribute based lookup:

    mydict = {'int' : IntElement, 'str' : StrElement}
    classlookup.setElementClassLookup(
        classlookup.AttributeBasedElementClassLookup('pytype', mydict))
    root = etree.XML('5test')

Internally, the lookup function is registered using the public C-API function
"setElementClassLookupFunction()" and must be implemented in Pyrex (or C). It
takes an object and the xmlNode* as arguments. The object can be used to keep
some state, such as the attribute name and class dict in the
AttributeBasedElementClassLookup case.
It is registered together with the lookup function, passed as first argument on each call and otherwise ignored by lxml. The return value of the lookup function is a callable Python object (typically a subtype of _Element) that returns an element instance. The C API itself is briefly described here: http://codespeak.net/svn/lxml/branch/capi/doc/capi.txt Hope this is useful, Stefan From luto at myrealbox.com Mon Jul 24 21:13:16 2006 From: luto at myrealbox.com (Andrew Lutomirski) Date: Mon, 24 Jul 2006 12:13:16 -0700 Subject: [lxml-dev] Custom element class lookup mechanisms In-Reply-To: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de> References: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de> Message-ID: On 7/24/06, Stefan Behnel wrote: > > Hi all, > > as I was working on the C-API anyway (capi branch), I decided to add a > little > external module with different ways of determining the Python element > class > for a libxml2 node. The "lxml.elements.classlookup" module currently > implements three different ways of doing this: > > * ElementDefaultClassLookup always uses the default class > * ElementNamespaceClassLookup is the default namespace lookup mechanism > * AttributeBasedElementClassLookup determines the class by looking up the > value of a specific attribute in a dict. It falls back to the default > classes. > > Other ways are of cause possible, so if anyone has an idea what to add, > I'm > open for suggestions. How about a way to make this setting per-parser instead of global? --Andy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060724/e8d07f7d/attachment.htm From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 21:26:05 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon, 24 Jul 2006 21:26:05 +0200 Subject: [lxml-dev] Custom element class lookup mechanisms In-Reply-To: References: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de> Message-ID: <44C51ECD.8000101@gkec.informatik.tu-darmstadt.de> Andrew Lutomirski wrote: > On 7/24/06, *Stefan Behnel* wrote: > > Hi all, > > as I was working on the C-API anyway (capi branch), I decided to add > a little > external module with different ways of determining the Python > element class > for a libxml2 node. The "lxml.elements.classlookup " module currently > implements three different ways of doing this: > > * ElementDefaultClassLookup always uses the default class > * ElementNamespaceClassLookup is the default namespace lookup mechanism > * AttributeBasedElementClassLookup determines the class by looking > up the > value of a specific attribute in a dict. It falls back to the > default classes. > > Other ways are of cause possible, so if anyone has an idea what to > add, I'm > open for suggestions. > > > How about a way to make this setting per-parser instead of global? Sure, I thought about that, too (although rather at a per-document level). But that would require changing the signature of the lookup function to pass also the document (which, in turn, keeps a reference to its parser). I think that makes sense, so I'll pass the document also. You can then use a weak-dict to map documents (or parsers) to element classes. Stefan From vadud3 at gmail.com Tue Jul 25 11:26:31 2006 From: vadud3 at gmail.com (Asif Iqbal) Date: Tue, 25 Jul 2006 05:26:31 -0400 Subject: [lxml-dev] xml2 and xslt libraries Message-ID: Hi All How do I compile lxml with xml2 and xslt libraries from /usr/local/lib dir? I have two separate version of them. 
One is in /usr/lib and comes with SUN, and one is in /usr/local/lib.

I did run `python setup.py build_ext -L/usr/local/lib -R/usr/local/lib'.
But that only takes care of the -lxml. The xslt library was still being
picked up from /usr/lib. The xslt library of SUN that sits in /usr/lib is an
integral part of SUN Sol 10 and is used by SMF, so I cannot get rid of it.

I need `xsltDocDefaultLoader', which is missing from /usr/lib/libxslt.so.

Details:

nm /usr/lib/libxslt.so | grep xsltDoc
[431] | 116043| 297|FUNC |GLOB |0 |11 |xsltDocumentComp
[495] | 133376| 3395|FUNC |GLOB |0 |11 |xsltDocumentElem
[319] | 101736| 908|FUNC |GLOB |0 |11 |xsltDocumentFunction
[156] | 101059| 677|FUNC |LOCL |0 |11 |xsltDocumentFunctionLoadDocument
[277] | 48263| 127|FUNC |GLOB |0 |11 |xsltDocumentSortFunction

nm /usr/local/lib/libxslt.so | grep xsltDoc
[1251] | 254192| 4|OBJT |GLOB |0 |25 |xsltDocDefaultLoader
[803] | 119356| 290|FUNC |LOCL |0 |10 |xsltDocDefaultLoaderFunc
[1241] | 122376| 309|FUNC |GLOB |0 |10 |xsltDocumentComp
[1527] | 139932| 3817|FUNC |GLOB |0 |10 |xsltDocumentElem
[1601] | 105248| 1471|FUNC |GLOB |0 |10 |xsltDocumentFunction
[1504] | 49964| 160|FUNC |GLOB |0 |10 |xsltDocumentSortFunction

As you can see, `xsltDocDefaultLoader' is only available in
/usr/local/lib/libxslt.so

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060725/c7757c84/attachment-0001.htm

From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 12:11:03 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 25 Jul 2006 12:11:03 +0200
Subject: [lxml-dev] xml2 and xslt libraries
In-Reply-To:
References:
Message-ID: <44C5EE37.9010500@gkec.informatik.tu-darmstadt.de>

Hi Asif,

Asif Iqbal wrote:
> How do I compile lxml with xml2 and xslt libraries from /usr/local/lib
> dir? I have two separate versions of them.
One is /usr/lib that comes > with SUN and one in /usr/local/lib. > > I did run `python setup.py build_ext -L/usr/local/lib -R/usr/local/lib'. > But only takes care of the -lxml. The xslt library was still being > picked up from /usr/lib. The xslt library of SUN that sits on /usr/lib > is integral part of SUN Sol 10 and used by SMF so I cannot get rid it. > > I need `xsltDocDefaultLoader' which is missing on /usr/lib/libxslt.so. Guess that library is too old, then. You have a number of options, here are two. * Try setting LD_LIBRARY_PATH to "/usr/local/lib". * You can build etree statically, which completely avoids this kind of problems: http://codespeak.net/lxml/build.html#static-linking-on-windows (hope you don't mind the 'windows' bit in the text, you can also use that on other systems.) Hope it helps, Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 12:42:52 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue, 25 Jul 2006 12:42:52 +0200 Subject: [lxml-dev] xml2 and xslt libraries In-Reply-To: References: <44C5EE37.9010500@gkec.informatik.tu-darmstadt.de> Message-ID: <44C5F5AC.6070908@gkec.informatik.tu-darmstadt.de> Asif Iqbal wrote: > On 7/25/06, Stefan Behnel wrote: >> You can build etree statically, which completely avoids this kind >> of problems: >> >> http://codespeak.net/lxml/build.html#static-linking-on-windows > > I did see this since it comes with doc/build.txt file. Do I have to get > all the iconv and zlib all the other static libraries? No. It just means you have to build your flags by hand. I just happened to compile lxml statically today, against a Python 2.4 installed in my work directory. 
Apart from that, I only used libxml2 and libxslt for static compilation: cflags = [ "-I/path/to/libxml2-2.6.26/include", "-I/path/to/libxslt-1.1.17", "-I/path/to/PYTHON/include/python2.4", "-I/usr/include" ] xslt_libs = [ "-L/path/to/PYTHON/lib/python2.4", "/path/to/libxslt-1.1.17/libexslt/.libs/libexslt.a", "/path/to/libxslt-1.1.17/libxslt/.libs/libxslt.a", "/path/to/libxml2-2.6.26/.libs/libxml2.a", "-lz", "-lm", ] Worked for me, you'll likely have to adapt it. BTW, in case you use 1.1alpha, its setup.py has a bug regarding the static setup. Feel free to apply this patch: -------------------------------------- Index: setup.py =================================================================== --- setup.py (Revision 30429) +++ setup.py (Arbeitskopie) @@ -109,6 +109,7 @@ # use the static setup as configured in setupStaticBuild sys.argv.remove('--static') cflags, xslt_libs = setupStaticBuild() + ext_args['extra_link_args'] = xslt_libs else: cflags = flags('xslt-config --cflags') xslt_libs = flags('xslt-config --libs') -------------------------------------- Stefan From faassen at infrae.com Tue Jul 25 14:29:44 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue, 25 Jul 2006 14:29:44 +0200 Subject: [lxml-dev] missing ElementTree API? In-Reply-To: <44C4F4EB.5020700@gkec.informatik.tu-darmstadt.de> References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de> <44C4F4EB.5020700@gkec.informatik.tu-darmstadt.de> Message-ID: <44C60EB8.7010806@infrae.com> Stefan Behnel wrote: [snip] > I'm also considering making PIs and comments subject to custom Element class > selection, which would simplify the re-implementation of PHP over lxml. :] Arrgh! :) Anyway, it's likely PHP can have combinations of processing instructions that aren't valid XML. 
I don't know anything about PHP but I recall having such issues with the ClearSilver templating issue - it turned out to be impossible to generate it using XSLT (unless I was masochistic enough to generate it in text mode). Regards, Martijn From Olivier.Collioud at wipo.int Tue Jul 25 15:54:34 2006 From: Olivier.Collioud at wipo.int (Olivier Collioud) Date: Tue, 25 Jul 2006 15:54:34 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime Message-ID: Hello, following these instructions: http://codespeak.net/lxml/build.html#static-linking-on-windows Running these commands: C:\Download\lxml-source\lxml-1.0.2>set PATH="C:\Program Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\bin";"C:\Program Files\Microsoft Visual Studio\VC98\Bin" C:\Download\lxml-source\lxml-1.0.2>set PYTHONPATH="C:\Program Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib" C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static Building lxml version 1.0.2 *NOTE*: Trying to build without Pyrex, needs pre-generated 'src/lxml/etree.c' ! running bdist_wininst running build running build_py running build_ext Traceback (most recent call last): File "setup.py", line 170, in ? 
ext_modules = ext_modules, File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\core.py", line 149, in setup dist.run_commands() File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 907, in run_commands self.run_command(cmd) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 927, in run_command cmd_obj.run() File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\command\bdist_wininst.py", line 101, in run self.run_command('build') File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 927, in run_command cmd_obj.run() File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\command\build.py", line 107, in run self.run_command(cmd_name) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 927, in run_command cmd_obj.run() File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\command\build_ext.py", line 243, in run force=self.force) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\ccompiler.py", line 1173, in new_compiler return klass (None, dry_run, force) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 206, in __init__ self.__macros = MacroExpander(self.__version) File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 112, in __init__ self.load_macros(version) File "C:\Program Files\OpenOffice.org 
2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 128, in load_macros self.set_macro("FrameworkSDKDir", net, "sdkinstallrootv1.1") File "C:\Program Files\OpenOffice.org 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 118, in set_macro self.macros["$(%s)" % macro] = d[key] KeyError: 'sdkinstallrootv1.1' My guess is that it is related to my MS-VS installation. I have no experience with MS compilation. (I don't even know why it is installed on my PC :-). I would be grateful if anybody can help me to build this or tell me where I can download an lxml 1.0.2 or more recent (or not too old) build for win32 and Python 2.3.4. Olivier. ------ World Intellectual Property Organization Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using. From vadud3 at gmail.com Tue Jul 25 16:08:54 2006 From: vadud3 at gmail.com (Asif Iqbal) Date: Tue, 25 Jul 2006 10:08:54 -0400 Subject: [lxml-dev] xml2 and xslt libraries In-Reply-To: References: <44C5EE37.9010500@gkec.informatik.tu-darmstadt.de> <44C5F5AC.6070908@gkec.informatik.tu-darmstadt.de> Message-ID: On 7/25/06, Asif Iqbal wrote: > > On 7/25/06, Stefan Behnel > wrote: > > > > > Asif Iqbal wrote: > > > On 7/25/06, Stefan Behnel wrote: > > >> You can build etree statically, which completely avoids this kind > > >> of problems: > > >> > > >> http://codespeak.net/lxml/build.html#static-linking-on-windows > > > > > > I did see this since it comes with doc/build.txt file. Do I have to > > get > > > all the iconv and zlib all the other static libraries? > > > > No. It just means you have to build your flags by hand. I just happened > > to > > compile lxml statically today, against a Python 2.4 installed in my work > > directory. 
Apart from that, I only used libxml2 and libxslt for static > > compilation: > > > > cflags = [ > > "-I/path/to/libxml2-2.6.26/include", > > "-I/path/to/libxslt-1.1.17", > > "-I/path/to/PYTHON/include/python2.4", > > "-I/usr/include" > > ] > > xslt_libs = [ > > "-L/path/to/PYTHON/lib/python2.4", > > "/path/to/libxslt-1.1.17/libexslt/.libs/libexslt.a", > > "/path/to/libxslt-1.1.17/libxslt/.libs/libxslt.a", > > "/path/to/libxml2-2.6.26/.libs/libxml2.a", > > "-lz", "-lm", > > ] > > > > Worked for me, you'll likely have to adapt it. > > > I could not compile . I attached the compile output called > lxml.compile.out > Here is the attachment again lot smaller than last time. Hopefully it wont bounce this time BTW, in case you use 1.1alpha, its setup.py has a bug regarding the static > > setup. Feel free to apply this patch: > > > > Looks like 1.0.2 needed a patch to > > > --- setup.py.orig Tue Jul 25 08:03:15 2006 > +++ setup.py Tue Jul 25 08:03:29 2006 > @@ -18,7 +18,7 @@ > xslt_libs = [ > ] > result = (cflags, xslt_libs) > - # return result > + return result > raise NotImplementedError, \ > "Static build not configured, see doc/build.txt" > > > -- > Asif Iqbal > PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu > > > > -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060725/992a5811/attachment.htm -------------- next part -------------- A non-text attachment was scrubbed... 
Name: lxml.compile.out2 Type: application/octet-stream Size: 2542 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060725/992a5811/attachment.obj From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 16:19:37 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue, 25 Jul 2006 16:19:37 +0200 Subject: [lxml-dev] xml2 and xslt libraries In-Reply-To: References: <44C5EE37.9010500@gkec.informatik.tu-darmstadt.de> <44C5F5AC.6070908@gkec.informatik.tu-darmstadt.de> Message-ID: <44C62879.7050206@gkec.informatik.tu-darmstadt.de> Hi Asif, Asif Iqbal wrote: > I could not compile . I attached the compile output called lxml.compile.out Hmm, ok, first things first: 3MB log files are not quite the thing you should send to a mailing list. That's something for a) private e-mail and b) compressors like bzip2, which, BTW, compresses it to some 100K. That said, here's the relevant portions: ------------------------------------- Building lxml version 1.0.2 running build_ext building 'lxml.etree' extension gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python2.4 -c src/lxml/etree.c -o build/temp.solaris-2.10-i86pc-2.4/src/lxml/etree.o -w -I/usr/local/include/libxml2 -I/usr/local/include/libxslt -I/usr/loc al/include/libexslt -I/usr/local/include/python2.4 -I/usr/include gcc -shared build/temp.solaris-2.10-i86pc-2.4/src/lxml/etree.o -o build/lib.solaris-2.10-i86pc-2.4/lxml/etree.so -L/usr/local/lib/python2.4 /usr/local/lib/libexslt.a /usr/local/lib/libxslt.a /usr/local/lib/libxml2.a -lz -lpthread -lsocket -lnsl -lm Text relocation remains referenced against symbol offset in file 0x17 /usr/local/lib/libexslt.a(common.o) 0xb6 /usr/local/lib/libexslt.a(common.o) 0xe3 /usr/local/lib/libexslt.a(common.o) 0x11c /usr/local/lib/libexslt.a(common.o) [...] 
fmod                0x7f3b      /usr/local/lib/libxml2.a(xpath.o)
fmod                0x8023      /usr/local/lib/libxml2.a(xpath.o)
fmod                0x8c6c      /usr/local/lib/libxml2.a(xpath.o)
ld: fatal: relocations remain against allocatable but non-writable sections
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
-------------------------------------

Hmmm, don't know where that comes from. I can see that you added quite a
number of libraries and I don't think it's because there is one missing. Maybe
someone else who has more experience with Solaris compilation than I do could
help you here...

> Looks like 1.0.2 needed a patch to
>
> --- setup.py.orig Tue Jul 25 08:03:15 2006
> +++ setup.py Tue Jul 25 08:03:29 2006
> @@ -18,7 +18,7 @@
>      xslt_libs = [
>          ]
>      result = (cflags, xslt_libs)
> -    # return result
> +    return result
>      raise NotImplementedError, \
>            "Static build not configured, see doc/build.txt"

No, why? There is no standard configuration for static compilation, so you get
an exception when you pass "--static" until you have a) read the docs and
b) changed the static setup function accordingly.

Stefan

From faassen at infrae.com Tue Jul 25 19:48:18 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Tue, 25 Jul 2006 19:48:18 +0200
Subject: [lxml-dev] objectify feedback
Message-ID: <44C65962.8030105@infrae.com>

Hi there,

I just read through the objectify.txt documentation/doctest. Quite
interesting and impressive stuff!

One thing that worries me is that it does introduce quite a new API with
different behavior than ElementTree in some fundamental ways. How close
is the behavior of the new API to Amara? It'd be nice if we weren't
inventing too much that's new here somehow. Anyway, this is the "are we
inventing too many new wheels?" worry - one of the ideas of lxml is not to
invent too many, though on the other hand we shouldn't stop people from
building cool stuff on top of the core, which is what this is.
The other thing that worries me from a more practical perspective is that this is, as far as I can see, controlled globally. The beginning of the document says "Don't mix!" and this sounds like sensible advice, but as far as I understand if you use objectify in your application you cannot use ElementTree anymore in the same application, unless you do a lot of registration and reset work. It'd be *very* nice if this were not so - create an objectified tree separately not affecting the way normal trees are created. This way, you could have one module of your application using objectify but another module still sticking to normal ElementTree. What to do when someone tries to mix bits of one tree with another? Perhaps there's an efficient way to compare baseclass between the two lxml objects that are combined, and bail out with some reasonably clear exception in case of illegal combinations. Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 20:24:09 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue, 25 Jul 2006 20:24:09 +0200 Subject: [lxml-dev] objectify feedback In-Reply-To: <44C65962.8030105@infrae.com> References: <44C65962.8030105@infrae.com> Message-ID: <44C661C9.4060401@gkec.informatik.tu-darmstadt.de> Hi Martijn, Martijn Faassen wrote: > I just read through the objectify.txt documentation/doctest. Quite > interesting and impressive stuff! Thanks. Wasn't even my idea (not alone, at least). > One thing that worries me is that it does introduce quite a new API with > different behavior than ElementTree in some fundamental ways. It is fundamentally different in some aspects (e.g. slicing works on siblings, not children), but I'm trying to keep it close enough to take as much advantage of the ET API as possible. What helps is that most parts of etree already work directly on the C tree, which does /not/ change its API, so there are few places that actually break. 
> How close
> is the behavior of the new API to Amara? It'd be nice if we weren't
> inventing too much that's new here somehow.
> Anyway, this is "are we inventing too many new wheels?" worry - one of the
> ideas of lxml is not to invent too many

There are some things to invent, some things to keep. For example, Amara has
all sorts of functionality that ET already provides, so I leave that out as
much as possible (attribute access, for example). Things like the attribute
access and the behaviour of slicing/indexing are directly borrowed from Amara.

Another thing is XSD type support. When you add an xsi:type attribute to your
elements, objectify will pick it up and look for a corresponding Python data
type. So it's even somewhat standards compliant here. :)

In a way it's a new API with many ideas borrowed from a few good places.

> The other thing that worries me from a more practical perspective is
> that this is, as far as I can see, controlled globally.
>
> The beginning of the document says "Don't mix!" and this sounds like
> sensible advice,

It is. You can get things totally messed up by mixing elements from different
APIs in the same tree. This will break some parts of the API in a non-obvious
way. One prominent example is _elementpath, which traverses the tree level by
level. Now think of one element iterating over its children, the other one
yielding its siblings. Great. It's only OK as long as you can control which
API you use where, but that can be hard enough to control already.

> but as far as I understand if you use objectify in your
> application you cannot use ElementTree anymore in the same application,
> unless you do a lot of registration and reset work. It'd be *very* nice
> if this were not so - create an objectified tree separately not
> affecting the way normal trees are created.

This can be done and I already started providing the infrastructure. Look at
the lxml.elements.classlookup module (elements.txt).
It allows you to change the way nodes are mapped to element classes. I managed to let it support lookup chains by now so that you can define fallbacks if the selected strategy does not find a suitable class. One of the lookup schemes delegates to the parsers, so when you set that one globally, each parser can have its own lookup mechanism (with a fallback to the default lookup). I will soon integrate the objectify class lookup into this framework, which should answer your question. :) > This way, you could have one module of your application using objectify > but another module still sticking to normal ElementTree. I will try to make sure in the docs that that is the main intention and that mixing elements from different sources is A Bad Idea. > What to do when someone tries to mix bits of one tree with another? > Perhaps there's an efficient way to compare baseclass between the two > lxml objects that are combined, and bail out with some reasonably clear > exception in case of illegal combinations. Well, even worse, the problem will go away when element classes are garbage collected. Which can lead to nicely surprising effects like a function working in one run and failing in the next - without obvious changes and without an easily visible difference between the elements that were passed in (except for their type, that is). Even better, debugging then means that you have to figure out where the wrong element came from, or where the last reference to the element was stored that prevented garbage collection. Cool. Guess I'll have to make the respective warnings *very* clear in the docs ... 
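To make the lookup-chain idea above a bit more concrete, here is a rough sketch in plain Python. It is only an illustration of the concept (a strategy that delegates to a fallback, with each parser carrying its own lookup). None of the class names below are lxml's API; they are all made up for this example.

```python
# Conceptual sketch only: these classes are hypothetical and are NOT lxml's API.

class PlainElement(object):
    """Stand-in for the default element class."""


class TitleElement(PlainElement):
    """Stand-in for a custom element class."""


class DefaultLookup(object):
    """End of the chain: always answers with the same element class."""

    def __init__(self, element_class):
        self.element_class = element_class

    def lookup(self, tag):
        return self.element_class


class TagLookup(object):
    """Maps tag names to classes and asks its fallback when nothing matches."""

    def __init__(self, mapping, fallback):
        self.mapping = mapping
        self.fallback = fallback

    def lookup(self, tag):
        cls = self.mapping.get(tag)
        if cls is None:
            # No suitable class found by this strategy: delegate to the fallback.
            return self.fallback.lookup(tag)
        return cls


class Parser(object):
    """Hypothetical parser that owns its lookup, so nothing is set globally."""

    def __init__(self, lookup):
        self.lookup = lookup

    def class_for(self, tag):
        return self.lookup.lookup(tag)


default = DefaultLookup(PlainElement)
plain_parser = Parser(default)
custom_parser = Parser(TagLookup({"title": TitleElement}, fallback=default))
```

With this layout, custom_parser maps "title" to TitleElement and everything else to PlainElement, while plain_parser is not affected by the custom mapping at all.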
Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 26 06:07:35 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed, 26 Jul 2006 06:07:35 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime In-Reply-To: References: Message-ID: <44C6EA87.5080409@gkec.informatik.tu-darmstadt.de> Hi Olivier, Olivier Collioud wrote: > following these instructions: > http://codespeak.net/lxml/build.html#static-linking-on-windows > > Running these commands: > > C:\Download\lxml-source\lxml-1.0.2>set PATH="C:\Program > Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org > 2.0\program\python-core-2.3.4\bin";"C:\Program Files\Microsoft Visual > Studio\VC98\Bin" > > C:\Download\lxml-source\lxml-1.0.2>set PYTHONPATH="C:\Program > Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org > 2.0\program\python-core-2.3.4\lib" > > C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst > --static > Building lxml version 1.0.2 [...] > File "C:\Program Files\OpenOffice.org > 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 118, > in set_macro > self.macros["$(%s)" % macro] = d[key] > KeyError: 'sdkinstallrootv1.1' > > My guess is that it is related to my MS-VS installation. Looks like it. Though I'm not sure which 'sdk' is referred to here. Might also be the OOo SDK. > I have no experience with MS compilation. (I don't even know why it is > installed on my PC :-). Guess you accidentally installed MS-Windows, that tends to install a lot of Microsoft stuff. ;) > I would be grateful if anybody can help me to build this or tell me > where I can download an lxml 1.0.2 or more recent (or not too old) build > for win32 and Python 2.3.4. We usually don't have Windows binaries for Python 2.3. Martijn keeps building Linux eggs for it, mainly to support Web-Servers, but most people are on 2.4 by now. 
What about installing a normal Python 2.3 (from python.org), building lxml against that and then just /installing/ it into the OOo Python directory? Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 26 08:50:23 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed, 26 Jul 2006 08:50:23 +0200 Subject: [lxml-dev] objectify feedback In-Reply-To: <44C661C9.4060401@gkec.informatik.tu-darmstadt.de> References: <44C65962.8030105@infrae.com> <44C661C9.4060401@gkec.informatik.tu-darmstadt.de> Message-ID: <44C710AF.8070204@gkec.informatik.tu-darmstadt.de> Stefan Behnel wrote: > There are some things to invent, some things to keep. For example, Amara has > all sorts of functionality that ET already provides, so I leave that out as > much as possible (attribute access, for example). > > Things like the attribute access and the behaviour of slicing/indexing are > directly borrowed from Amara. "attribute access" meaning "XML attributes" in the first paragraph (i.e. set/get/attrib), "object attributes" in the second (i.e. "parent.child"). Stefan From Olivier.Collioud at wipo.int Wed Jul 26 10:17:31 2006 From: Olivier.Collioud at wipo.int (Olivier Collioud) Date: Wed, 26 Jul 2006 10:17:31 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime Message-ID: Here is the result of the compilation after installing Python 2.3.5: C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static Building lxml version 1.0.2 *NOTE*: Trying to build without Pyrex, needs pre-generated 'src/lxml/etree.c' ! running bdist_wininst running build running build_py running build_ext building 'lxml.etree' extension C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\soft\python23\include -IC:\soft\python23\PC /Tcsrc/l xml/etree.c /Fobuild\temp.win32-2.3\Release\src/lxml/etree.obj -w -I..\libxml2-2.6.26.win32\include -I..\libxslt-1.1.17.win32\include -I..\zlib-1.2.3.
win32\include -I..\iconv-1.9.2.win32\include Command line warning D4025 : overriding '/W3' with '/w' etree.c C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\soft\python23\libs /LIBPATH:C:\soft\python23\PCBui ld /EXPORT:initetree build\temp.win32-2.3\Release\src/lxml/etree.obj /OUT:build\lib.win32-2.3\lxml\etree.pyd /IMPLIB:build\temp.win32-2.3\Release\src/ lxml\etree.lib ..\libxml2-2.6.26.win32\lib\libxml2_a.lib ..\libxslt-1.1.17.win32\lib\libxslt_a.lib ..\libxslt-1.1.17.win32\lib\libexslt_a.lib ..\zlib- 1.2.3.win32\lib\zlib.lib ..\iconv-1.9.2.win32\lib\iconv_a.lib Creating library build\temp.win32-2.3\Release\src/lxml\etree.lib and object build\temp.win32-2.3\Release\src/lxml\etree.exp LINK : warning LNK4049: locally defined symbol "_xmlFree" imported LINK : warning LNK4049: locally defined symbol "_xsltDocDefaultLoader" imported LINK : warning LNK4049: locally defined symbol "_xsltLibxsltVersion" imported libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2 libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2 libexslt_a.lib(strings.obj) : error LNK2001: unresolved external symbol __ftol2 libexslt_a.lib(math.obj) : error LNK2001: unresolved external symbol __ftol2 libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2 libxml2_a.lib(xpointer.obj) : error LNK2001: unresolved external symbol __ftol2 libxml2_a.lib(xmlschemastypes.obj) : error LNK2001: unresolved external symbol __ftol2 libxslt_a.lib(xsltutils.obj) : error LNK2001: unresolved external symbol __ftol2 build\lib.win32-2.3\lxml\etree.pyd : fatal error LNK1120: 1 unresolved externals error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1120 >>> Stefan Behnel 26/07/06 9:02 AM >>> Hi Olivier, Olivier Collioud wrote: > I have no experience with Python C extension compilation too. 
> > Do you mean that I do not need any C compiler to build this kind of > beast. No, actually, I was rather thinking that it is worth trying with a 'real' Python. If the mysterious "sdkinstallrootv1.1" comes from OOo, you may well be able to use an lxml compiled for a python.org Python inside the OOo Python. So: install a 'normal' Python 2.3.4, compile and install lxml with that, and afterwards copy the installed directory (site-packages/lxml) into the OOo Python. Stefan ------ World Intellectual Property Organization Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using. From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 26 14:19:38 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed, 26 Jul 2006 14:19:38 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime In-Reply-To: References: Message-ID: <44C75DDA.1040001@gkec.informatik.tu-darmstadt.de> Hi Olivier, Olivier Collioud wrote: > here is what I get after installing Python 2.3.5 > > C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static > Building lxml version 1.0.2 > *NOTE*: Trying to build without Pyrex, needs pre-generated 'src/lxml/etree.c' !
> running bdist_wininst > running build > running build_py > running build_ext > building 'lxml.etree' extension > C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\soft\python23\include -IC:\soft\python23\PC /Tcsrc/l > xml/etree.c /Fobuild\temp.win32-2.3\Release\src/lxml/etree.obj -w -I..\libxml2-2.6.26.win32\include -I..\libxslt-1.1.17.win32\include -I..\zlib-1.2.3. > win32\include -I..\iconv-1.9.2.win32\include > Command line warning D4025 : overriding '/W3' with '/w' > etree.c > C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\soft\python23\libs /LIBPATH:C:\soft\python23\PCBui > ld /EXPORT:initetree build\temp.win32-2.3\Release\src/lxml/etree.obj /OUT:build\lib.win32-2.3\lxml\etree.pyd /IMPLIB:build\temp.win32-2.3\Release\src/ > lxml\etree.lib ..\libxml2-2.6.26.win32\lib\libxml2_a.lib ..\libxslt-1.1.17.win32\lib\libxslt_a.lib ..\libxslt-1.1.17.win32\lib\libexslt_a.lib ..\zlib- > 1.2.3.win32\lib\zlib.lib ..\iconv-1.9.2.win32\lib\iconv_a.lib > Creating library build\temp.win32-2.3\Release\src/lxml\etree.lib and object build\temp.win32-2.3\Release\src/lxml\etree.exp > LINK : warning LNK4049: locally defined symbol "_xmlFree" imported > LINK : warning LNK4049: locally defined symbol "_xsltDocDefaultLoader" imported > LINK : warning LNK4049: locally defined symbol "_xsltLibxsltVersion" imported > libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2 > libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2 > libexslt_a.lib(strings.obj) : error LNK2001: unresolved external symbol __ftol2 > libexslt_a.lib(math.obj) : error LNK2001: unresolved external symbol __ftol2 > libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2 > libxml2_a.lib(xpointer.obj) : error LNK2001: unresolved external symbol __ftol2 > libxml2_a.lib(xmlschemastypes.obj) : error LNK2001: unresolved external symbol __ftol2 > 
libxslt_a.lib(xsltutils.obj) : error LNK2001: unresolved external symbol __ftol2 > build\lib.win32-2.3\lxml\etree.pyd : fatal error LNK1120: 1 unresolved externals > error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1120 Hmm, ok, lxml commonly links against libm, so my guess is that you need to supply that one also. Another possibility could be that libxml2/xslt/exslt were compiled with a newer compiler than the one you use. If you want to try, you can compile libxml2 and libxslt also from source. Since you're building lxml statically anyway, that should be ok for your purpose. Stefan PS: It's either English for the list or French for me. Not too clever to mix the two... From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 27 07:11:27 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu, 27 Jul 2006 07:11:27 +0200 Subject: [lxml-dev] Custom element class lookup mechanisms In-Reply-To: References: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de> Message-ID: <44C84AFF.6090702@gkec.informatik.tu-darmstadt.de> Andrew Lutomirski wrote: > On 7/24/06, Stefan Behnel wrote: > a little external module with different ways of determining the Python > element class for a libxml2 node. The "lxml.elements.classlookup" > module currently implements three different ways of doing this: [...] > Other ways are of course possible, so if anyone has an idea what to > add, I'm open for suggestions. > > How about a way to make this setting per-parser instead of global?
Here is how to do it: http://codespeak.net/svn/lxml/branch/capi/doc/elements.txt Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 27 09:37:08 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu, 27 Jul 2006 09:37:08 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime In-Reply-To: <44C75DDA.1040001@gkec.informatik.tu-darmstadt.de> References: <44C75DDA.1040001@gkec.informatik.tu-darmstadt.de> Message-ID: <44C86D24.6010608@gkec.informatik.tu-darmstadt.de> Stefan Behnel wrote: > Olivier Collioud wrote: >> C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static >> Building lxml version 1.0.2 >> C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\soft\python23\libs /LIBPATH:C:\soft\python23\PCBui >> ld /EXPORT:initetree build\temp.win32-2.3\Release\src/lxml/etree.obj /OUT:build\lib.win32-2.3\lxml\etree.pyd /IMPLIB:build\temp.win32-2.3\Release\src/ >> lxml\etree.lib ..\libxml2-2.6.26.win32\lib\libxml2_a.lib ..\libxslt-1.1.17.win32\lib\libxslt_a.lib ..\libxslt-1.1.17.win32\lib\libexslt_a.lib ..\zlib- >> 1.2.3.win32\lib\zlib.lib ..\iconv-1.9.2.win32\lib\iconv_a.lib >> Creating library build\temp.win32-2.3\Release\src/lxml\etree.lib and object build\temp.win32-2.3\Release\src/lxml\etree.exp >> LINK : warning LNK4049: locally defined symbol "_xmlFree" imported >> LINK : warning LNK4049: locally defined symbol "_xsltDocDefaultLoader" imported >> LINK : warning LNK4049: locally defined symbol "_xsltLibxsltVersion" imported >> libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2 >> libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2 >> libexslt_a.lib(strings.obj) : error LNK2001: unresolved external symbol __ftol2 >> libexslt_a.lib(math.obj) : error LNK2001: unresolved external symbol __ftol2 >> libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2 >> 
libxml2_a.lib(xpointer.obj) : error LNK2001: unresolved external symbol __ftol2 >> libxml2_a.lib(xmlschemastypes.obj) : error LNK2001: unresolved external symbol __ftol2 >> libxslt_a.lib(xsltutils.obj) : error LNK2001: unresolved external symbol __ftol2 >> build\lib.win32-2.3\lxml\etree.pyd : fatal error LNK1120: 1 unresolved externals >> error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1120 > could be that libxml2/xslt/exslt were compiled with a > newer compiler than the one you use. Yup, that's it: http://www.mail-archive.com/openssl-users at openssl.org/msg31551.html http://www.issociate.de/board/post/9216/Compiling_with_VC6.html So, try this: > If you want to try, you can compile > libxml2 and libxslt also from source. Since you're building lxml statically > anyway, that should be ok for your purpose. Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 27 09:41:33 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu, 27 Jul 2006 09:41:33 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime In-Reply-To: References: Message-ID: <44C86E2D.6040406@gkec.informatik.tu-darmstadt.de> Olivier Collioud wrote: > following these instructions: > http://codespeak.net/lxml/build.html#static-linking-on-windows > > Running these commands: [...] > C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static > Building lxml version 1.0.2 [...] > File "C:\Program Files\OpenOffice.org > 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 118, > in set_macro > self.macros["$(%s)" % macro] = d[key] > KeyError: 'sdkinstallrootv1.1' > > My guess is that it is related to my MS-VS installation. It's the OOo SDK. 
You might have to install it: http://www.openoffice.org/dev_docs/source/sdk/ Stefan From Olivier.Collioud at wipo.int Thu Jul 27 14:44:04 2006 From: Olivier.Collioud at wipo.int (Olivier Collioud) Date: Thu, 27 Jul 2006 14:44:04 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime Message-ID: Stefan, Thanks a lot for your help. Maybe I will try later to compile lxml for OO1.0.3/Py2.3.5, but I suspect that OO needs to be compiled as well with the same compiler. I would then prefer to make the effort to compile OO1.0.3 with Py2.4.3 (and then I would by far prefer someone to do it for me with a recent MS compiler ;-p). Now it has been decided to port my app to Java because all of our business apps run on this platform. Anyway, they have been impressed by the result, the performance and how fast I wrote this app, which will be a key component of our business software environment. I did not succeed in convincing my colleagues to use Python, but we agreed on a compromise by using Jython and Dom4j. It has been easy to switch so far because lxml and Dom4j have quite similar APIs. Anyway, I hope I will have other opportunities to build some apps based on Python/lxml, which is my favourite XML dev toolkit (using also EclipseWTP and PyDev). And I would like to congratulate and thank you for the work done so far and your help. Kind regards, Olivier. From faassen at infrae.com Fri Jul 28 19:30:06 2006 From: faassen at infrae.com (Martijn Faassen) Date: Fri, 28 Jul 2006 19:30:06 +0200 Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4 runtime In-Reply-To: References: Message-ID: <44CA499E.20100@infrae.com> Hey Olivier, Olivier Collioud wrote: > Thanks a lot for your help. [kind note and a peek into Olivier's development environment] > And I would like to congratulate and thank you for the work done so far > and your help. I'll presume to speak for Stefan to thank you for the nice thank you! I was quite interested to hear about your development environment - both your personal preferred platform and the Java environment at work. It's always nice to get a peek into the way people and organisations approach software development, and what affects the choice for Python and lxml. Regards, Martijn From faassen at infrae.com Mon Jul 31 11:41:53 2006 From: faassen at infrae.com (Martijn Faassen) Date: Mon, 31 Jul 2006 11:41:53 +0200 Subject: [lxml-dev] lxml eggs and unicode strings Message-ID: <44CDD061.1020106@infrae.com> Hi there, I just found out that there is a hidden incompatibility in the compiled versions of lxml eggs we provide, at least on Linux. Our provided versions are compiled with a Python that has 4-byte unicode support (probably the default on Ubuntu, on which I built the 2.4 extension).
If you try to install such an egg on a machine where unicode support is compiled with 2 bytes only, it'll fail with errors such as: ImportError: /usr/local/lib/python2.4/site-packages/lxml-1.0.2-py2.4-linux-i686.egg/lxml/etree.so: undefined symbol: PyUnicodeUCS4_FromEncodedObject I wonder whether there's anything within the egg distribution mechanism that lets us distinguish between such platforms. If not, I wonder what to do instead -- the simplest would be to add a FAQ entry and tell people to recompile from the sources. By the way, does Pyrex generate different C code depending on whether 4 or 2 byte unicode is used? If so, then that would mean an installation of pyrex as well for these people... Regards, Martijn From gracinet at nuxeo.com Mon Jul 31 11:45:18 2006 From: gracinet at nuxeo.com (Georges Racinet) Date: Mon, 31 Jul 2006 11:45:18 +0200 Subject: [lxml-dev] lxml eggs and unicode strings In-Reply-To: <44CDD061.1020106@infrae.com> References: <44CDD061.1020106@infrae.com> Message-ID: <436357DF-BAE8-4E96-BF77-8A089BABCD54@nuxeo.com> On Jul 31, 2006, at 11:41 AM, Martijn Faassen wrote: > Hi there, > > I just found out that there is a hidden incompatibility in the > compiled > versions of lxml eggs we provide, at least in linux. Our provided > versions are compiled with a Python that has 4 bytes unicode support > (probably the default on ubuntu on which I built the 2.4 extension). Noticed that last week, too. Sorry I forgot to mention it over there. > > If you try to install such an egg on a machine where unicode > support is > compiled with 2 bytes only, it'll fail with errors such as: > > ImportError: > /usr/local/lib/python2.4/site-packages/lxml-1.0.2-py2.4-linux- > i686.egg/lxml/etree.so: > undefined symbol: PyUnicodeUCS4_FromEncodedObject > > I wonder whether there's anything within the egg distribution > mechanism > that lets us distinguish between such platforms. 
If not, I wonder what > to do instead -- the simplest would be to add a FAQ entry and tell > people to recompile from the sources. As far as I know, this is typical of the Ubuntu distribution, and I'm 100% sure this egg was laid from Ubuntu. If the egg system could make a difference between distributions, it would be ok, imho. Charset problems are a plague. > > By the way, does Pyrex generate different C code depending on > whether 4 > or 2 byte unicode is used? If so, then that would mean an installation > of pyrex as well for these people... I tried to compile from source on Mandriva, and it failed. I had no time to investigate (low priority for the task I was working on), it could very well have been something very trivial. Yours, --------- Georges Racinet Nuxeo SAS gracinet at nuxeo.com http://nuxeo.com Tel: +33 (0) 1 40 33 71 73 From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 12:05:51 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon, 31 Jul 2006 12:05:51 +0200 Subject: [lxml-dev] lxml eggs and unicode strings In-Reply-To: <44CDD061.1020106@infrae.com> References: <44CDD061.1020106@infrae.com> Message-ID: <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> Hi Martijn, Martijn Faassen wrote: > I just found out that there is a hidden incompatibility in the compiled > versions of lxml eggs we provide, at least in linux. Our provided > versions are compiled with a Python that has 4 bytes unicode support > (probably the default on ubuntu on which I built the 2.4 extension). AFAIK, UCS4 is the default on most (though maybe not all) Python desktop/server installations under Linux, including SuSE, Redhat and (apparently) Debian/Ubuntu. Distributors tend to care more about broad support for all possible use cases than about memory requirements. 
> If you try to install such an egg on a machine where unicode support is > compiled with 2 bytes only, it'll fail with errors such as: > > ImportError: > /usr/local/lib/python2.4/site-packages/lxml-1.0.2-py2.4-linux-i686.egg/lxml/etree.so: > undefined symbol: PyUnicodeUCS4_FromEncodedObject Sure. These cannot be compatible in current CPython (and that's highly unlikely to change). > I wonder whether there's anything within the egg distribution mechanism > that lets us distinguish between such platforms. If not, I wonder what > to do instead -- the simplest would be to add a FAQ entry and tell > people to recompile from the sources. I wouldn't know any way egg naming could help here. Google yields some discussions about this topic on the distutils list, but it seems they have not made their way into either distutils or setuptools. http://mail.python.org/pipermail/distutils-sig/2005-October/005222.html Anyway, if you have to recompile your Python version to get UCS2 strings, there's no reason not to require the same for the C extensions. Given the fact that all major distributions seem to use UCS4, a FAQ entry should be enough. > By the way, does Pyrex generate different C code depending on whether 4 > or 2 byte unicode is used? If so, then that would mean an installation > of pyrex as well for these people... No, the distinction between different unicode encodings is handled completely inside the Python interpreter. The C code is not affected and Pyrex does not rely on it. To support parsing from unicode, lxml even has generic run-time support code to detect the internal unicode encoding, which should work for any encoding supported by libxml2/libiconv. 
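For anyone unsure which kind of build they are running, the interpreter can be asked directly. A minimal sketch, assuming that sys.maxunicode is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build; the function name is just made up for this example. Note that Python 3.3 and later always report the wide value, so the distinction only matters for older interpreters.

```python
import sys


def unicode_build_flag(maxunicode=None):
    """Return 'ucs4' for a wide unicode build and 'ucs2' for a narrow one."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow builds cannot represent code points above U+FFFF in one unit.
    if maxunicode > 0xFFFF:
        return 'ucs4'
    return 'ucs2'


print(unicode_build_flag())
```

If this prints 'ucs2' on the target machine, a binary egg built on a UCS4 Python will fail at import time with the undefined PyUnicodeUCS4_* symbol shown above, and lxml has to be rebuilt from the sources there.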
Stefan From faassen at infrae.com Mon Jul 31 12:59:54 2006 From: faassen at infrae.com (Martijn Faassen) Date: Mon, 31 Jul 2006 12:59:54 +0200 Subject: [lxml-dev] lxml eggs and unicode strings In-Reply-To: <436357DF-BAE8-4E96-BF77-8A089BABCD54@nuxeo.com> References: <44CDD061.1020106@infrae.com> <436357DF-BAE8-4E96-BF77-8A089BABCD54@nuxeo.com> Message-ID: <44CDE2AA.6040701@infrae.com> Georges Racinet wrote: > > On Jul 31, 2006, at 11:41 AM, Martijn Faassen wrote: > >> Hi there, >> >> I just found out that there is a hidden incompatibility in the compiled >> versions of lxml eggs we provide, at least in linux. Our provided >> versions are compiled with a Python that has 4 bytes unicode support >> (probably the default on ubuntu on which I built the 2.4 extension). > > Noticed that last week, too. Sorry I forgot to mention it over there. What platform were you on when you noticed this? Mandriva (as you mention below)? [snip] > As far as I know, this is typical of the Ubuntu distribution, and I'm > 100% sure this egg was laid from Ubuntu. If the egg system could make a > difference between distributions, it would be ok, imho. I think Red Hat has been compiling Python with 4-byte characters for ages too, so while this was Ubuntu (I did it), I'm pretty sure it's also the case on Fedora. > Charset problems are a plague. This is not your common charset problem. Mostly one can avoid the plague by just using unicode, but that's what we're doing here. >> By the way, does Pyrex generate different C code depending on whether 4 >> or 2 byte unicode is used? If so, then that would mean an installation >> of pyrex as well for these people... > > I tried to compile from source on Mandriva, and it failed. I had no time > to investigate (low priority for the task I was working on), it could > very well have been something very trivial. Interesting; let us know if you find out more.
It's important to have the lxml C sources compile on all platforms, as otherwise people will be forced to use Pyrex, possibly even the forked version of Pyrex Stefan is maintaining. Regards, Martijn From faassen at infrae.com Mon Jul 31 13:03:03 2006 From: faassen at infrae.com (Martijn Faassen) Date: Mon, 31 Jul 2006 13:03:03 +0200 Subject: [lxml-dev] lxml eggs and unicode strings In-Reply-To: <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> Message-ID: <44CDE367.50109@infrae.com> Stefan Behnel wrote: [snip] > Anyway, if you have to recompile your Python version to get UCS2 strings, > there's no reason not to require the same for the C extensions. Ah, so the current CPython sources build with 4-byte unicode by default? If this is for sure, then we're fairly safe. If not, then I wonder what to do - you'd like lxml to work with hand-compiled Pythons... > Given the fact that all major distributions seem to use UCS4, a FAQ entry > should be enough. It definitely is encouraging. >> By the way, does Pyrex generate different C code depending on whether 4 >> or 2 byte unicode is used? If so, then that would mean an installation >> of pyrex as well for these people... > > No, the distinction between different unicode encodings is handled completely > inside the Python interpreter. The C code is not affected and Pyrex does not > rely on it. Good, that's what I was hoping for. That at least means people should be able to recompile without installing Pyrex first. > To support parsing from unicode, lxml even has generic run-time support code > to detect the internal unicode encoding, which should work for any encoding > supported by libxml2/libiconv. Cool!
Regards, Martijn From tseaver at palladion.com Mon Jul 31 16:37:07 2006 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 31 Jul 2006 10:37:07 -0400 Subject: [lxml-dev] lxml eggs and unicode strings In-Reply-To: <44CDE367.50109@infrae.com> References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> Message-ID: <44CE1593.70902@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Martijn Faassen wrote: > Ah, so current CPython sources builds with 4 byte unicode by default? If > this is for sure, then we're fairly safe. If not, then I wonder what to > do - you'd like lxml to work with hand-compiled Pythons.. Nope. The distros all pass the '--enable-unicode=ucs4' to configure. The default value for that option is 'yes', which maps to 'ucs2' unless you also have a usc4-enabled TCL. Tres. - -- =================================================================== Tres Seaver +1 202-558-7113 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFEzhWT+gerLs4ltQ4RAprdAKDatD1WDYW+wKzJlLZYra0OGXcxLACeJSDs CGNZmUnpCDiYbPuF9lwNO00= =8iYC -----END PGP SIGNATURE----- From tseaver at palladion.com Mon Jul 31 16:44:24 2006 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 31 Jul 2006 10:44:24 -0400 Subject: [lxml-dev] lxml eggs and unicode strings In-Reply-To: <44CE1593.70902@palladion.com> References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tres Seaver wrote: > Martijn Faassen wrote: > >>> Ah, so current CPython sources builds with 4 byte unicode by default? If >>> this is for sure, then we're fairly safe. 
If not, then I wonder what to
>>> do - you'd like lxml to work with hand-compiled Pythons..
>
> Nope. The distros all pass the '--enable-unicode=ucs4' to configure.
> The default value for that option is 'yes', which maps to 'ucs2' unless
> you also have a ucs4-enabled TCL.
>
>
> Tres.
> --
> ===================================================================
> Tres Seaver +1 202-558-7113 tseaver at palladion.com
> Palladion Software "Excellence by Design" http://palladion.com

Perhaps we could use the following test inside 'setup.py', and modify the name of the binary egg to include the 'ucs2' vs. 'ucs4' flag?::

    ucs_flag = sys.maxunicode > 65536 and 'ucs4' or 'ucs2'

Tres.
--
===================================================================
Tres Seaver +1 202-558-7113 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 18:15:57 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 18:15:57 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To:
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com>
Message-ID: <44CE2CBD.103@gkec.informatik.tu-darmstadt.de>

Tres Seaver wrote:
> Tres Seaver wrote:
>>> Martijn Faassen wrote:
>>>
>>>>> Ah, so current CPython sources build with 4 byte unicode by default? If
>>>>> this is for sure, then we're fairly safe. If not, then I wonder what to
>>>>> do - you'd like lxml to work with hand-compiled Pythons..
>>>
>>> Nope. The distros all pass the '--enable-unicode=ucs4' to configure.
>>> The default value for that option is 'yes', which maps to 'ucs2' unless
>>> you also have a ucs4-enabled TCL.

Right, that's what I witness, too.

> Perhaps we could use the following test inside 'setup.py', and modify
> the name of the binary egg to include the 'ucs2' vs. 'ucs4' flag?::
>
>     ucs_flag = sys.maxunicode > 65536 and 'ucs4' or 'ucs2'

While that's nice to have, it doesn't really help us as a) we'd still have to build and ship both eggs (while the current UCS4 eggs seem to fit most users) and b) easy_install doesn't currently handle these extensions, so it would most likely just stop finding the eggs on cheeseshop if we added additional sections to the egg name.

I still think it's enough to add a FAQ entry (which I already did) and otherwise ignore the problem for now. That way, the major distros are supported out-of-the-box. And for those who happen to use a UCS2 system, it's really not a big deal to build lxml from sources on a fairly recent and well installed Linux system.

Stefan

From faassen at infrae.com Mon Jul 31 20:09:20 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 31 Jul 2006 20:09:20 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CE2CBD.103@gkec.informatik.tu-darmstadt.de>
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com> <44CE2CBD.103@gkec.informatik.tu-darmstadt.de>
Message-ID: <44CE4750.9010607@infrae.com>

Stefan Behnel wrote:
> Tres Seaver wrote:
[snip]
>> Perhaps we could use the following test inside 'setup.py', and modify
>> the name of the binary egg to include the 'ucs2' vs.
'ucs4' flag?::
>>
>>     ucs_flag = sys.maxunicode > 65536 and 'ucs4' or 'ucs2'
>
> While that's nice to have, it doesn't really help us as a) we'd still have to
> build and ship both eggs (while the current UCS4 eggs seem to fit most users)

There'd be a significant number of people who just build Python by hand though, and they can't use our eggs...

[snip]
> I still think it's enough to add a FAQ entry (which I already did) and
> otherwise ignore the problem for now. That way, the major distros are
> supported out-of-the-box. And for those who happen to use a UCS2 system, it's
> really not a big deal to build lxml from sources on a fairly recent and well
> installed Linux system.

I agree that's all we can do on the lxml side.

Apart from that, we can also talk to the distutils/setuptools people and raise this issue again. It's a fundamental problem with binary eggs that use unicode as long as Python ships with this configuration option. I'll send off a mail on this to the distutils SIG.

Regards,

Martijn

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 20:10:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 20:10:02 +0200
Subject: [lxml-dev] lxml - exslt - regexp:match()
In-Reply-To: <149473834@web.de>
References: <149473834@web.de>
Message-ID: <44CE477A.1030000@gkec.informatik.tu-darmstadt.de>

Michael Zeidler wrote:
> regexp:match('123abc567','([0-9]+)([a-z]+)([0-9]+)') gibt kein Array mit den gematchten
> Gruppen zurück:
> Wenn ich also mit
>
> die variable $test setzte, müsste ich mit $test[0], $test[1], usw. auf die gematchten gruppen
> zugreifen können. Siehe http://www.exslt.org/regexp/functions/match/index.html
>
> [translation]:
>
> regexp:match('123abc567','([0-9]+)([a-z]+)([0-9]+)')
>
> does not return an array containing the matched groups. Something like this
>
> should allow me to ask for "$test[0]" etc.
>
> See http://www.exslt.org/regexp/functions/match/index.html

Hmm, interesting.
The page doesn't actually say that this is supposed to work. All they provide is an example with a /single/ group. The result of your test case is not defined.

For comparison, I now implemented the examples from the page as unit tests, which sadly showed that Python's regexps are incompatible with what EXSLT requires. The Python RE "([a-z])+ " does not match "test " as in EXSLT, only the last "t" is returned for the group by re.findall(). So we can't claim compatibility with EXSLT at this point. -- Note, though, that I never really said it was compatible, it just builds on Python's re module. I still think that's enough for a Python XML library.

That said, I fixed your use case in the current trunk, as I think it makes sense to expect the result above from such a call. Note, however, that EXSLT dictates that the first element in a non-global RE result (without 'g' flag) must be the entire string that matched, which even fits the semantics of the group() method in Python's MatchObjects. So your $test[0] will contain "123abc567", $test[1] is "123" etc.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 20:15:31 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 20:15:31 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CE4750.9010607@infrae.com>
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com> <44CE2CBD.103@gkec.informatik.tu-darmstadt.de> <44CE4750.9010607@infrae.com>
Message-ID: <44CE48C3.9030305@gkec.informatik.tu-darmstadt.de>

Martijn Faassen wrote:
> Apart from that, we can also talk to the distutils/setuptools people and
> raise this issue again. It's a fundamental problem with binary eggs that
> use unicode as long as Python ships with this configuration option. I'll
> send off a mail on this to the distutils SIG.

Good idea. Thanks for taking care of it.
It may no longer fit into the 2.5 time frame, but it's still a problem that needs to be solved some time...

Stefan

From agustin.villena at gmail.com Mon Jul 31 18:53:24 2006
From: agustin.villena at gmail.com (Agustín Villena)
Date: Mon, 31 Jul 2006 12:53:24 -0400
Subject: [lxml-dev] An intriguing behaviour of xpath in lxml
Message-ID:

Hi!

Just a question. Consider the following code:

    from lxml import etree
    from StringIO import StringIO

    xmlText = "This is a test"
    doc = etree.parse(StringIO(xmlText))
    root = doc.xpath("/")

The last line raises the following exception:

Not yet implemented result node type: 9
Traceback (most recent call last):
  File "C:\Archivos de programa\ActiveState Komodo 3.5\lib\support\dbgp\pythonlib\dbgp\client.py", line 1843, in runMain
    self.dbg.runfile(debug_args[0], debug_args)
  File "C:\Archivos de programa\ActiveState Komodo 3.5\lib\support\dbgp\pythonlib\dbgp\client.py", line 1538, in runfile
    h_execfile(file, args, module=main, tracer=self)
  File "C:\Archivos de programa\ActiveState Komodo 3.5\lib\support\dbgp\pythonlib\dbgp\client.py", line 596, in __init__
    execfile(file, globals, locals)
  File "C:\dev\projects\python\python-xpath-example.py", line 7, in __main__
    root = doc.xpath("/")
  File "c:\python24\lib\site-packages\lxml\etree.pyx", line 485, in etree._ElementTree.xpath
  File "c:\python24\lib\site-packages\lxml\xpath.pxi", line 75, in etree._XPathEvaluatorBase.evaluate
  File "c:\python24\lib\site-packages\lxml\xpath.pxi", line 212, in etree.XPathDocumentEvaluator.__call__
  File "c:\python24\lib\site-packages\lxml\xpath.pxi", line 108, in etree._XPathEvaluatorBase._handle_result
  File "c:\python24\lib\site-packages\lxml\extensions.pxi", line 269, in etree._unwrapXPathObject
  File "c:\python24\lib\site-packages\lxml\extensions.pxi", line 317, in etree._createNodeSetResult
NotImplementedError

My question is: what is the reason behind this behaviour (if there is one)?
(I already know that xpath(".") on the document node works, but it is beyond my understanding why xpath("/") is not implemented.)

Cheers

Agustin
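
A minimal sketch of one way around the error above, assuming a reasonably current lxml release. Node type 9 in the message is libxml2's document node, which is what "/" selects; the sketch asks for the top-level element instead, either with the XPath expression "/*" or through getroot(). The sample markup is made up for illustration, since the tags of the original string did not survive in the archive.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample document; the markup in the original post was lost.
xml_bytes = b"<root><child>This is a test</child></root>"
doc = etree.parse(BytesIO(xml_bytes))

# "/*" selects the top-level element instead of the document node,
# so the result is an ordinary list of elements.
top_level = doc.xpath("/*")

# getroot() reaches the same element without going through XPath.
root = doc.getroot()

print([el.tag for el in top_level], root.tag)
```

Both routes hand back a regular element, so further relative queries such as root.xpath("child") can start from there.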