From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Jul 1 14:53:03 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 01 Jul 2006 14:53:03 +0200
Subject: [lxml-dev] let lxml write the ?xml pi
In-Reply-To: <20060629102825.GA16494@tttech.com>
References: <20060619101402.GA11174@morpheus.apaku.dnsalias.org> <20060619103555.GA11768@morpheus.apaku.dnsalias.org> <44968105.1060307@infrae.com>
<20060629102825.GA16494@tttech.com>
Message-ID: <44A6702E.3050800@gkec.informatik.tu-darmstadt.de>
Hi Albert,
Albert Brandl wrote:
> I started using lxml some weeks ago, and have been lurking on the
> mailing list for some time now. Recently I had the problem that the xml
> prologue is not included by default, and stumbled over the following
> mail:
>
> On Mon, Jun 19, 2006 at 12:48:37PM +0200, Martijn Faassen wrote:
>> I.e., try the following:
>>
>> etree.tostring(t, 'utf-8', xml_declaration=True)
>
> Is there any reason that the method write_c14n() does not support this
> flag? The canonical form is a bit more readable, therefore I'd prefer
> to use this method.
As the documentation of the write_c14n() method states, it always writes UTF-8
encoded byte streams, so there is no real need for the prologue. I wouldn't
mind adding this, though. Things like 'standalone' and the XML version would
otherwise not be available in the output.
BTW, if it's about the readability, pretty printing might be closer to what
you want anyway.
Stefan
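A minimal sketch of the two serialisation options mentioned in this message, assuming a current lxml release on Python 3. The sample document is made up for illustration; xml_declaration and pretty_print are the tostring() keyword arguments discussed in the thread.

```python
from lxml import etree

# Hypothetical sample document, only used for illustration.
root = etree.XML("<root><child>text</child></root>")

# Serialise with an XML declaration, as suggested earlier in the thread.
with_decl = etree.tostring(root, encoding="utf-8", xml_declaration=True)

# Pretty printing is a separate flag and can be combined with the declaration.
pretty = etree.tostring(root, encoding="utf-8", xml_declaration=True,
                        pretty_print=True)

print(with_decl.decode("utf-8"))
print(pretty.decode("utf-8"))
```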
From albert.brandl at tttech.com Tue Jul 11 17:57:20 2006
From: albert.brandl at tttech.com (Albert Brandl)
Date: Tue, 11 Jul 2006 17:57:20 +0200
Subject: [lxml-dev] let lxml write the ?xml pi
In-Reply-To: <44A6702E.3050800@gkec.informatik.tu-darmstadt.de>
References: <20060619101402.GA11174@morpheus.apaku.dnsalias.org>
<20060619103555.GA11768@morpheus.apaku.dnsalias.org>
<44968105.1060307@infrae.com> <20060629102825.GA16494@tttech.com>
<44A6702E.3050800@gkec.informatik.tu-darmstadt.de>
Message-ID: <20060711155720.GA2018@tttech.com>
Hi Stefan,
On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
> As the documentation of the write_c14n() method states, it always writes UTF-8
> encoded byte streams, so there is no real need for the prologue. I wouldn't
> mind adding this, though. Things like 'standalone' and the XML version would
> otherwise not be available in the output.
I recently learned about section 4.1 of the C14N recommendation,
http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical
form does not contain a prologue. Therefore, write_c14n() is ok - sorry
for the request.
> BTW, if it's about the readability, pretty printing might be closer to what
> you want anyway.
Thanks for the hint. In lxml 1.0.1, the pretty printed version adds
information about the namespace to every tag. Unfortunately, this
decreases the readability, since in my case, almost all tags have a
namespace. A "pretty_print" flag for write_c14n() would be a
perfect workaround, though :-)
Best regards,
Albert
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 11 18:34:37 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 11 Jul 2006 18:34:37 +0200
Subject: [lxml-dev] let lxml write the ?xml pi
In-Reply-To: <20060711155720.GA2018@tttech.com>
References: <20060619101402.GA11174@morpheus.apaku.dnsalias.org> <20060619103555.GA11768@morpheus.apaku.dnsalias.org> <44968105.1060307@infrae.com>
<20060629102825.GA16494@tttech.com> <44A6702E.3050800@gkec.informatik.tu-darmstadt.de>
<20060711155720.GA2018@tttech.com>
Message-ID: <44B3D31D.9030009@gkec.informatik.tu-darmstadt.de>
Hi Albert,
Albert Brandl wrote:
> On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
>> As the documentation of the write_c14n() method states, it always writes UTF-8
>> encoded byte streams, so there is no real need for the prologue. I wouldn't
>> mind adding this, though. Things like 'standalone' and the XML version would
>> otherwise not be available in the output.
>
> I recently learned about section 4.1 of the C14N recommendation,
> http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical
> form does not contain a prologue. Therefore, write_c14n() is ok - sorry
> for the request.
Thought so. Thanks for checking.
>> BTW, if it's about the readability, pretty printing might be closer to what
>> you want anyway.
>
> Thanks for the hint. In lxml 1.0.1, the pretty printed version adds
> information about the namespace to every tag.
Not on my side. How do you build the tree?
> Unfortunately, this
> decreases the readability, since in my case, almost all tags have a
> namespace. A "pretty_print" flag for write_c14n() would be a
> perfect workaround, though :-)
I don't think that's gonna happen. C14N is meant to be a well-defined XML
formatting style, and pretty printing is not part of the standard.
Stefan
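The point about the canonical form can be checked with a short sketch. It assumes a current lxml release, where tostring() accepts method="c14n"; the thread itself only talks about write_c14n(). The input document is invented for the example.

```python
from lxml import etree

# Hypothetical input that carries an XML declaration and unsorted attributes.
doc = etree.fromstring(b'<?xml version="1.0"?><root b="2" a="1"><child/></root>')

# Canonical XML is always UTF-8 and never includes the XML declaration.
canonical = etree.tostring(doc, method="c14n")

print(canonical.decode("utf-8"))
```

The canonical output also expands empty elements into start/end tag pairs, which is part of why it is not meant as a pretty-printing format.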
From Geraldjohn.M.Manipon at jpl.nasa.gov Thu Jul 13 08:26:09 2006
From: Geraldjohn.M.Manipon at jpl.nasa.gov (Gerald John M. Manipon)
Date: Wed, 12 Jul 2006 23:26:09 -0700
Subject: [lxml-dev] tostring() escapes and adding cdata section
Message-ID: <44B5E781.6010305@jpl.nasa.gov>
Hi,
Quick question: How can I prevent the escaping (specifically '&' into
'&amp;') that occurs when I use tostring()?
i.e.
>>> from lxml.etree import *
>>> r = Element('root')
>>> s = SubElement(r,'sub')
>>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>>> s.text
'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>>> tostring(s)
'<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'
I'm currently just doing a .replace('&amp;','&') on the string I get
back.
Also, is there a way to specify that an element's text should be
enclosed as a CDATA?
Thanks for any help.
Gerald
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 13 08:35:28 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 13 Jul 2006 08:35:28 +0200
Subject: [lxml-dev] tostring() escapes and adding cdata section
In-Reply-To: <44B5E781.6010305@jpl.nasa.gov>
References: <44B5E781.6010305@jpl.nasa.gov>
Message-ID: <44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>
Hi Gerald,
Gerald John M. Manipon wrote:
> Quick question: How can I prevent the escaping (specifically '&' into
> '&amp;') that occurs when I use tostring()?
You (obviously) can't. The output would not be (well-formed) XML.
> I'm currently just doing a .replace('&amp;','&') on the string I get
> back.
You can use 'unescape' from the xml.sax.saxutils module.
http://docs.python.org/lib/module-xml.sax.saxutils.html
But why don't you use lxml itself? Unescaping is done automatically when you
parse the string.
> >>> from lxml.etree import *
> >>> r = Element('root')
> >>> s = SubElement(r,'sub')
> >>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
> >>> s.text
> 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
> >>> tostring(s)
> '<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'
So? What's the problem? That's perfect XML. Any XML parser will be able to
handle that.
> Also, is there a way to specify that an element's text should be
> enclosed as a CDATA?
No. What would be the use case?
Stefan
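A small sketch of both suggestions, assuming Python 3. The standard library's xml.etree.ElementTree stands in for lxml.etree here so the example runs without lxml installed; the Element, tostring() and fromstring() calls used below have the same names in lxml.etree.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

url = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'

# Option 1: undo the escaping by hand with saxutils.unescape().
escaped = escape(url)          # '&' becomes '&amp;'
restored = unescape(escaped)   # and back again

# Option 2: let the parser do it. Serialising escapes the text,
# parsing the result unescapes it automatically.
sub = ET.Element("sub")
sub.text = url
serialized = ET.tostring(sub)
reparsed = ET.fromstring(serialized)

print(serialized)
print(reparsed.text == url)
```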
From Geraldjohn.M.Manipon at jpl.nasa.gov Thu Jul 13 10:09:32 2006
From: Geraldjohn.M.Manipon at jpl.nasa.gov (Gerald John M. Manipon)
Date: Thu, 13 Jul 2006 01:09:32 -0700
Subject: [lxml-dev] tostring() escapes and adding cdata section
In-Reply-To: <44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>
References: <44B5E781.6010305@jpl.nasa.gov>
<44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>
Message-ID: <44B5FFBC.5000607@jpl.nasa.gov>
Stefan Behnel wrote:
> Hi Gerald,
>
> Gerald John M. Manipon wrote:
>> Quick question: How can I prevent the escaping (specifically '&' into
>> '&amp;') that occurs when I use tostring()?
>
> You (obviously) can't. The output would not be (well-formed) XML.
Okay.
>
>
>> I'm currently just doing a .replace('&amp;','&') on the string I get
>> back.
>
> You can use 'unescape' from the xml.sax.saxutils module.
> http://docs.python.org/lib/module-xml.sax.saxutils.html
I'll look into that.
>
> But why don't you use lxml itself? Unescaping is done automatically when you
> parse the string.
>
>
>> >>> from lxml.etree import *
>> >>> r = Element('root')
>> >>> s = SubElement(r,'sub')
>> >>> s.text = 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>> >>> s.text
>> 'http://test/cgi-bin/test.cgi?a=123.2&b=asdfe&b="3asd"'
>> >>> tostring(s)
>> '<sub>http://test/cgi-bin/test.cgi?a=123.2&amp;b=asdfe&amp;b="3asd"</sub>'
>
> So? What's the problem? That's perfect XML. Any XML parser will be able to
> handle that.
Yes, I understand. We're posting our xml that we get from tostring() to
one of our partner's web services (I don't know the exact backend but
it looks Java-based) and their services do not like the '&amp;'. I
guess it's a problem on their end.
>
>
>> Also, is there a way to specify that an element's text should be
>> enclosed as a CDATA?
>
> No. What would be the use case?
Getting around the invalid xml with '&' in an element's text:
I'm guessing that since serialization replaces the '&' anyway, the above
would be impossible to produce via lxml.
Thanks for your response,
Gerald
>
> Stefan
>
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 13 10:45:20 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 13 Jul 2006 10:45:20 +0200
Subject: [lxml-dev] tostring() escapes and adding cdata section
In-Reply-To: <44B5FFBC.5000607@jpl.nasa.gov>
References: <44B5E781.6010305@jpl.nasa.gov>
<44B5E9B0.1020809@gkec.informatik.tu-darmstadt.de>
<44B5FFBC.5000607@jpl.nasa.gov>
Message-ID: <44B60820.60802@gkec.informatik.tu-darmstadt.de>
Hi Gerald,
Gerald John M. Manipon wrote:
> We're posting our xml that we get from tostring() to
> one of our partner's web services (I don't know the exact backend but
> it looks Java-based) and their services do not like the '&amp;'. I
> guess it's a problem on their end.
Oh, definitely:
http://www.w3.org/TR/2004/REC-xml-20040204/#syntax
"""
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in
their literal form, except when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they MUST be escaped using either numeric character references or
the strings "&amp;" and "&lt;" respectively.
"""
If it doesn't work for them, they should start using an XML parser (which is
the best choice for parsing XML anyway...)
Stefan
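The rule quoted from the XML recommendation can be demonstrated with any conforming parser. The following sketch uses Python 3's standard library parser; is_well_formed() is a hypothetical helper written for this example, and the three sample documents are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


raw = "<sub>a=1&b=2</sub>"                 # literal '&' in text: rejected
escaped = "<sub>a=1&amp;b=2</sub>"         # escaped form: accepted
cdata = "<sub><![CDATA[a=1&b=2]]></sub>"   # inside a CDATA section: accepted

for name, document in [("raw", raw), ("escaped", escaped), ("cdata", cdata)]:
    print(name, is_well_formed(document))
```

After parsing, the escaped and the CDATA variants carry the same text content, so a receiver that uses a real XML parser cannot tell them apart.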
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Jul 14 21:24:44 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 14 Jul 2006 21:24:44 +0200
Subject: [lxml-dev] a C-level API for lxml
Message-ID: <44B7EF7C.2010308@gkec.informatik.tu-darmstadt.de>
Hi all,
as part of a project on lxml, I'm building an external element API module
(objectify style) as a Pyrex extension. To make this independent of lxml
itself, I decided to add an external C-level API that allows external modules
to efficiently interface with the lxml module. Usage in other modules will be
as easy as including a header file or cimporting a .pxd file in Pyrex, and
then calling an init function from the external module. The match is done by
comparing char* strings for the function names at initialisation time, so this
is pretty future proof (no missing symbols when the C API changes etc.).
This requires some changes in Pyrex, so lxml 1.1 will depend on a patched
version (again), until (one day) my patches are accepted upstream. I also
published some Python 2.5 related fixes, BTW, to make lxml 1.1 run nicely on
Python 2.5. I can't currently test that since I can't get the 2.5 beta
versions to work on my machine (broken compiled-in PYTHONPATH). Anyway, at
least I got positive feedback that the exception stuff seems to be fixed. The
Py_ssize_t fixes are not verified on 2.5, but should also work.
A preliminary version of the patched Pyrex is here:
http://codespeak.net/svn/lxml/pyrex/
So, if someone could test lxml with it under 2.5 (preferably on a 64-bit
machine) ...
When the lxml C-API is in place, it will be easy to add new functions to it
(basically by adding the "public" keyword to a Pyrex C function). So I'd be
glad if everyone who thinks this API would be useful for them could propose
more functions to be made public. I know specifically that Andreas had a
problem with extending the XPath implementation, so maybe there are ways to
get this solved at the C level. This thread is the right place to discuss
these things.
Regards,
Stefan
From buro at petr.com Sun Jul 16 00:34:56 2006
From: buro at petr.com (Petr van Blokland)
Date: Sun, 16 Jul 2006 00:34:56 +0200
Subject: [lxml-dev] Python values in xpath functions
In-Reply-To: <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com>
References:
<44985C5D.60900@gkec.informatik.tu-darmstadt.de>
<916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com>
Message-ID: <2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com>
Hi,
maybe someone can help me out.
I am returning an etree from a Python function in XPath.
But it does not seem to work stepping through the result
as in ...
where ...works fine
for the current node. What do I do wrong. Should the function
answer something different from an etree, as in:
def myfunction(dummy, *args):
... # create etree from args
return etree
Kind regards,
Petr van Blokland
----------------------------------------------
Petr van Blokland
buro at petr.com | www.petr.com | +31 15 219 10 40
----------------------------------------------
From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Jul 16 09:47:38 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sun, 16 Jul 2006 09:47:38 +0200
Subject: [lxml-dev] Python values in xpath functions
In-Reply-To: <2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com>
References: <44985C5D.60900@gkec.informatik.tu-darmstadt.de> <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com>
<2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com>
Message-ID: <44B9EF1A.1060405@gkec.informatik.tu-darmstadt.de>
Hi Petr,
Petr van Blokland wrote:
> I am returning an etree from a Python function in XPath.
"etree" is the name of the module. I guess you mean an ElementTree object?
> But it does not seem to work stepping through the result
> as in ...
> where ...works fine
> for the current node. What do I do wrong.
Don't return an ElementTree (don't you get an exception for that anyway?).
Return an Element or a list of Elements.
Stefan
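A sketch of an XPath extension function that returns an Element and is then stepped into from the expression, assuming a current lxml release. The function name make_items, the element names and the arguments are hypothetical; FunctionNamespace is the lxml registry for extension functions.

```python
from lxml import etree


def make_items(context, *values):
    """Build a small result tree and return its root Element."""
    container = etree.Element("items")
    for value in values:
        etree.SubElement(container, "item").text = str(value)
    # Return an Element (or a list of Elements), not an ElementTree.
    return container


# Register the function in the default (empty) function namespace.
functions = etree.FunctionNamespace(None)
functions["make_items"] = make_items

root = etree.XML("<root/>")

# The function result can be used as the start of a location path.
texts = root.xpath("make_items('a', 'b')/item/text()")
print([str(text) for text in texts])
```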
From buro at petr.com Sun Jul 16 09:57:02 2006
From: buro at petr.com (Petr van Blokland)
Date: Sun, 16 Jul 2006 09:57:02 +0200
Subject: [lxml-dev] Python values in xpath functions
In-Reply-To: <44B9EF1A.1060405@gkec.informatik.tu-darmstadt.de>
References: <44985C5D.60900@gkec.informatik.tu-darmstadt.de> <916E60BB-6C4E-4104-B03D-D98B799E3BF0@petr.com>
<2C4E4E13-856B-49DA-B4AD-A856424EDE8C@petr.com>
<44B9EF1A.1060405@gkec.informatik.tu-darmstadt.de>
Message-ID: <5D85BAE4-ACF6-4D4C-AFFC-920553A0DA94@petr.com>
On Jul 16, 2006, at 9:47 AM, Stefan Behnel wrote:
> Hi Petr,
>
> Petr van Blokland wrote:
>> I am returning an etree from a Python function in XPath.
>
> "etree" is the name of the module. I guess you mean an ElementTree
> object?
>
Yes.
>> But it does not seem to work stepping through the result
>> as in ...
>> where ...works fine
>> for the current node. What do I do wrong.
>
> Don't return an ElementTree (don't you get an exception for that
> anyway?).
I do.
> Return an Element or a list of Elements.
Ok, I'll try.
Thanks.
Petr
----------------------------------------------
Petr van Blokland
buro at petr.com | www.petr.com | +31 15 219 10 40
----------------------------------------------
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 17 10:57:46 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 17 Jul 2006 10:57:46 +0200
Subject: [lxml-dev] a C-level API for lxml
In-Reply-To: <44B7EF7C.2010308@gkec.informatik.tu-darmstadt.de>
References: <44B7EF7C.2010308@gkec.informatik.tu-darmstadt.de>
Message-ID: <44BB510A.9080505@gkec.informatik.tu-darmstadt.de>
Hi again,
Stefan Behnel wrote:
> I decided to add an external C-level API that allows external modules
> to efficiently interface with the lxml module. Usage in other modules will be
> as easy as including a header file or cimporting a .pxd file in Pyrex, and
> then calling an init function from the external module. The match is done by
> comparing char* strings for the function names at initialisation time, so this
> is pretty future proof (no missing symbols when the C API changes etc.).
>
> [...] I'd be
> glad if everyone who thinks this API would be useful for them could propose
> more functions to be made public. I know specifically that Andreas had a
> problem with extending the XPath implementation, so maybe there are ways to
> get this solved at the C level. This thread is the right place to discuss
> these things.
There is now some documentation on the C-API and its usage online:
http://codespeak.net/svn/lxml/branch/capi/doc/capi.txt
The current state of the API is described here:
http://codespeak.net/svn/lxml/branch/capi/src/lxml/etreepublic.pxd
Stefan
From luto at myrealbox.com Wed Jul 19 01:59:47 2006
From: luto at myrealbox.com (Andrew Lutomirski)
Date: Tue, 18 Jul 2006 16:59:47 -0700
Subject: [lxml-dev] segfault in iterparse
Message-ID:
Thanks for iterparse -- it (mostly) rocks. However, I can segfault it on
large files when I try to clear out the tree to avoid unbounded memory use.
See attached code.
I'm _guessing_ that the problem is that iterparse doesn't like the deletion
of the current node.
Thanks,
Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxmlbug.py
Type: application/octet-stream
Size: 1140 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060718/c9fff54f/attachment.obj
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 19 08:10:48 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 19 Jul 2006 08:10:48 +0200
Subject: [lxml-dev] segfault in iterparse
In-Reply-To:
References:
Message-ID: <44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de>
Hi Andrew,
Andrew Lutomirski wrote:
> Thanks for iterparse -- it (mostly) rocks. However, I can segfault it
> on large files when I try to clear out the tree to avoid unbounded
> memory use. See attached code.
Thanks for the test script - perfect bug report!
> I'm _guessing_ that the problem is that iterparse doesn't like the
> deletion of the current node.
Yup. Even worse, what you do is delete the entire tree, including all parents:
    for i in etree.iterparse(StringIO(xml)):
        i[1].getroottree().getroot().clear()
While lxml prevents the tree from being garbage collected immediately (it uses
a parent stack), the above code still unlinks the root node from the rest of the
tree - looks like libxml2 doesn't like that...
The above usage is not currently prevented, only forbidden:
http://codespeak.net/svn/lxml/trunk/doc/api.txt
"""
Note that you should not modify or move the ancestors or siblings
of the element during either of the two events [start/end]. You should also
avoid moving the element itself.
"""
I do not know if it is worth taking special measures against accessing the
parent (or root tree) of the current element. On the one hand, lxml should not
segfault. On the other hand, /blocking/ the access to parents requires some
kind of additional flag on each element (AFAICT), so elements returned by
iterparse() would have to behave differently from normal elements (i.e. be
different classes). In any case, it would not give you what you want -
clearing the entire tree would then simply raise an exception instead of
segfaulting.
So, maybe it's enough to rephrase the above quote to *must not* to 'fix' this
bug...
As for your problem, what you /can/ do, is remove the preceding siblings of
the element:
    for event, element in etree.iterparse(StringIO(xml)):
        # do something with element
        element.clear()                        # clean up children
        # compare with None: an Element without children tests as false
        if element.getprevious() is not None:  # clean up preceding siblings
            del element.getparent()[0]
This cleans up *after* the element. If you decided to skip elements ('tag'
argument), you can use "while" instead of "if" to remove all siblings you
might not have seen.
Admittedly, there's a little more consideration required than for the
ElementTree library, but I guess that's the price we pay for lxml being based
on libxml2.
Hope it helps.
Stefan
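The clean-up pattern described in this message, written out as a self-contained sketch. It assumes a current lxml release on Python 3; the document layout, the "record" tag and the summing are invented for illustration.

```python
from io import BytesIO
from lxml import etree

# Hypothetical input: one root element with many small children.
xml = (b"<root>"
       + b"".join(b"<record>%d</record>" % i for i in range(1000))
       + b"</root>")

total = 0
for event, element in etree.iterparse(BytesIO(xml), tag="record"):
    # Do something with the element ("end" events are the default).
    total += int(element.text)

    # Clean up the element itself ...
    element.clear()
    # ... and all preceding siblings. "while" rather than "if", because
    # the tag filter may have skipped siblings that were never seen.
    while element.getprevious() is not None:
        del element.getparent()[0]

print(total)
```

Only the current element and what precedes it are touched. The ancestors and any following siblings are left alone, since the parser still has to link new nodes into them.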
From luto at myrealbox.com Wed Jul 19 08:34:33 2006
From: luto at myrealbox.com (Andrew Lutomirski)
Date: Tue, 18 Jul 2006 23:34:33 -0700
Subject: [lxml-dev] segfault in iterparse
In-Reply-To: <44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de>
References:
<44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de>
Message-ID:
(sorry for resend -- I borked it the first time)
On 7/18/06, Stefan Behnel wrote:
> Hi Andrew,
>
> Andrew Lutomirski wrote:
> > Thanks for iterparse -- it (mostly) rocks. However, I can segfault it
> > on large files when I try to clear out the tree to avoid unbounded
> > memory use. See attached code.
>
> Thanks for the test script - perfect bug report!
>
>
> > I'm _guessing_ that the problem is that iterparse doesn't like the
> > deletion of the current node.
>
> Yup. Even worse, what you do is delete the entire tree, including all
> parents:
>
>     for i in etree.iterparse(StringIO(xml)):
>         i[1].getroottree().getroot().clear()
>
> While lxml prevents the tree from being garbage collected immediately (it
> uses
> a parent stack), the above code still unlinks the root node from rest of
> the
> tree - looks like libxml2 doesn't like that...
>
> The above usage is not currently prevented, only forbidden:
>
> http://codespeak.net/svn/lxml/trunk/doc/api.txt
>
> """
> Note that you should not modify or move the ancestors or siblings
> of the element during either of the two events [start/end]. You should
> also
> avoid moving the element itself.
> """
>
Phooey. I didn't read that -- I read
http://effbot.org/zone/element-iterparse.htm, which suggests:
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            ... process record elements ...
            root.clear()
(Apologies if gmail butchered that.)
>
> As for your problem, what you /can/ do, is remove the preceding siblings
> of
> the element:
Didn't the doc just say that you're _not_ supposed to modify siblings of the
current element? Perhaps the doc should give some canonical way to do the
huge document parsing? (For reference, 100k children of the root element is
probably an underestimate for my application, unfortunately.)
Probably the rule is "don't do anything that'll make the next event have
trouble linking itself into the tree." Removing siblings on an "end" is
safe, I guess?
Anyway, I'll keep fiddling around.
Thanks,
Andy
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 19 08:47:48 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 19 Jul 2006 08:47:48 +0200
Subject: [lxml-dev] segfault in iterparse
In-Reply-To:
References:
<44BDCCE8.6050701@gkec.informatik.tu-darmstadt.de>
Message-ID: <44BDD594.8030905@gkec.informatik.tu-darmstadt.de>
Hi Andrew,
Andrew Lutomirski wrote:
> Phooey. I didn't read that -- I read
> http://effbot.org/zone/element-iterparse.htm, which suggests:
>
>     for event, elem in context:
>         if event == "end" and elem.tag == "record":
>             ... process record elements ...
>             root.clear()
Sure, I thought so. I now updated the docs to make that clearer and also added
a FAQ section so that it is easier to find.
http://codespeak.net/svn/lxml/trunk/doc/api.txt
http://codespeak.net/svn/lxml/trunk/doc/FAQ.txt
> Didn't the doc just say that you're _not_ supposed to modify siblings of
> the current element?
That was not phrased correctly. What was meant was "following siblings" (which
might already be available due to internal implementation details).
> Perhaps the doc should give some canonical way to
> do the huge document parsing? (For reference, 100k children of the root
> element is probably an underestimate for my application, unfortunately.)
As I said, clearing the element and deleting the preceding siblings should do
the trick.
> Probably the rule is "don't do anything that'll make the next event have
> trouble linking itself into the tree."
I guess that's a good way of putting it. Everything that has to be touched
again *after* the current element is a no-no for modification.
> Removing siblings on an "end" is safe, I guess?
Preceding siblings, yes.
Stefan
From luto at myrealbox.com Fri Jul 21 00:31:56 2006
From: luto at myrealbox.com (Andrew Lutomirski)
Date: Thu, 20 Jul 2006 15:31:56 -0700
Subject: [lxml-dev] another iterparse segfault
Message-ID:
This one mystifies me completely -- three line testcase attached.
This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as
Python 2.4 on Gentoo with lxml trunk as of yesterday.
-------------- next part --------------
#!/usr/bin/env python
from cStringIO import StringIO
from lxml import etree
xml = '' + ''.join(['' for b in xrange(10000)]) + ''
# Uncomment these and it will crash instead of failing an assertion.
#class myelement(etree.ElementBase):
#    pass
#etree.setDefaultElementClass(None)
iter = etree.iterparse(StringIO(xml))
# The following variant will not crash:
#iter = etree.iterwalk(etree.parse(StringIO(xml)))
for x in iter:
    elem = x[1]
    #print elem, type(elem)
    # If you uncomment the setDefaultElementClass stuff, you may need to
    # uncomment this to make it crash.
    #if len(dir(type(elem).__dict__)) == 0:
    #    print 'This happens sometimes.'
    #dir(type(elem))
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Jul 21 07:06:12 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 21 Jul 2006 07:06:12 +0200
Subject: [lxml-dev] another iterparse segfault
In-Reply-To:
References:
Message-ID: <44C060C4.2030907@gkec.informatik.tu-darmstadt.de>
Hi Andrew,
Andrew Lutomirski wrote:
> This one mystifies me completely -- three line testcase attached.
>
> This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as
> Python 2.4 on Gentoo with lxml trunk as of yesterday.
Again, thanks for the bug report. This one really is a bug and I can reproduce
it with your test. It is related to the __ITERPARSE_CHUNK_SIZE (iterparse.pxi)
that is used internally to read the data in small chunks and hand it to the
parser to generate events. If you reduce the value, the chunk size is passed
earlier (after less than the 10000 elements you needed for your test) and the
bug occurs after a smaller number of parsed elements.
I'll have to take a closer look at it to figure out what's going wrong here.
Thanks again,
Stefan
From Olivier.Collioud at wipo.int Fri Jul 21 16:51:29 2006
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Fri, 21 Jul 2006 16:51:29 +0200
Subject: [lxml-dev] exceptions.TypeError: 'Argument must be string or
unicode.'
Message-ID:
Hello,
I'm having this error:
Exception exceptions.TypeError: 'Argument must be string or unicode.'
in 'etree._setAttributeValue' ignored
I suspect that I'm setting an attribute value to None but I don't know
where.
Is there a way to figure out where in my code the error occurs?
Olivier.
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Jul 21 21:53:17 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 21 Jul 2006 21:53:17 +0200
Subject: [lxml-dev] exceptions.TypeError: 'Argument must be string or
unicode.'
In-Reply-To:
References:
Message-ID: <44C130AD.9080608@gkec.informatik.tu-darmstadt.de>
Hi Olivier,
Olivier Collioud wrote:
> I'm having this error:
> Exception exceptions.TypeError: 'Argument must be string or unicode.'
> in 'etree._setAttributeValue' ignored
>
> I suspect that I'm setting an attribute value to None but I don't know
> where.
Sorry, my fault. That's a bug in lxml. You didn't tell us which version you
are using, but it's both in 1.0.2 and 1.1alpha. The fix is attached. With this
applied, you will get an exception including a normal traceback.
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: attribute-value-fix.patch
Type: text/x-patch
Size: 2098 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060721/617c4f44/attachment.bin
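For comparison, a sketch of the behaviour with the fix in place, assuming a current lxml release: setting an attribute to None raises an ordinary TypeError at the call site, so the traceback points at the offending line. The element and attribute names are hypothetical.

```python
from lxml import etree

element = etree.Element("record")
value = None  # stands in for a value that was never filled in

try:
    element.set("id", value)
except TypeError as error:
    # A regular exception that can be caught or traced back normally.
    message = str(error)
else:
    message = None

print(message)
```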
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Jul 22 22:15:35 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 22 Jul 2006 22:15:35 +0200
Subject: [lxml-dev] another iterparse segfault
In-Reply-To: <44C060C4.2030907@gkec.informatik.tu-darmstadt.de>
References:
<44C060C4.2030907@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C28767.4020400@gkec.informatik.tu-darmstadt.de>
Stefan Behnel wrote:
> Andrew Lutomirski wrote:
>> This one mystifies me completely -- three line testcase attached.
>>
>> This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as
>> Python 2.4 on Gentoo with lxml trunk as of yesterday.
>
> Again, thanks for the bug report. This one really is a bug and I can reproduce
> it with your test. It is related to the __ITERPARSE_CHUNK_SIZE (iterparse.pxi)
> that is used internally to read the data in small chunks and hand it to the
> parser to generate events. If you reduce the value, the chunk size is passed
> earlier (after less than the 10000 elements you needed for your test) and the
> bug occurs after a smaller number of parsed elements.
>
> I'll have to take a closer look at it to figure out what's going wrong here.
... and so I did. It was a bug in the iterparse.next() method. The events and
corresponding elements are stored in a 'queue' (a Python list) and retrieved
by a call to PyList_GET_ITEM(). That function (or macro) returns a so-called
"borrowed reference" that must be INCREF'd by hand (Pyrex does not know about
it). Otherwise, the refcount is too low and the object will be garbage
collected before the last reference is gone.
Here's the patch. Thanks for the report,
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iterparse-next-incref.patch
Type: text/x-patch
Size: 1730 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060722/63d1a7eb/attachment.bin
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 09:55:58 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 09:55:58 +0200
Subject: [lxml-dev] exceptions.TypeError: 'Argument must be string or
unicode.'
In-Reply-To:
References:
Message-ID: <44C47D0E.2080706@gkec.informatik.tu-darmstadt.de>
Hi Olivier,
Olivier Collioud wrote:
> I'm using 1.0.2 installed with lxml-1.0.2.win32-static-py2.4.exe.
>
> I guess that I need to pick the source package before applying your
> patch and then compile.
Right. There are build instructions on the lxml page:
http://codespeak.net/lxml/build.html#static-linking-on-windows
> When do you think the next version will be provided ?
1.1 beta will be out early next month.
As for 1.0.3: there are currently no critical bugs in 1.0.2, so I can't tell
if it will be available any earlier - but not later either.
Stefan
From faassen at infrae.com Mon Jul 24 16:37:24 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 24 Jul 2006 16:37:24 +0200
Subject: [lxml-dev] missing ElementTree API?
Message-ID: <44C4DB24.206@infrae.com>
Hi there,
Reading the "What changed in Python 2.5" document I ran into the
following bit of text concerning ElementTree:
"""
Comments and processing instructions are also represented as Element
nodes. To check if a node is a comment or processing instructions:
    if elem.tag is ET.Comment:
        ...
    elif elem.tag is ET.ProcessingInstruction:
        ...
"""
As far as I can determine with a simple 'grep' on the source code, this
isn't supported yet by lxml, at least the ProcessingInstruction bit.
Since this now appears to be documented in a rather central document,
perhaps we should. :)
I'm a bit surprised about this use of ET.Comment - this would imply
elem.tag returns the Comment class when its element is representing a
comment?
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 16:52:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 16:52:02 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4DB24.206@infrae.com>
References: <44C4DB24.206@infrae.com>
Message-ID: <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
Hi Martijn,
Martijn Faassen wrote:
> Reading the "What changed in Python 2.5" document I ran into the
> following bit of text concerning ElementTree:
>
> """
> Comments and processing instructions are also represented as Element
> nodes. To check if a node is a comment or processing instructions:
>
>     if elem.tag is ET.Comment:
>         ...
>     elif elem.tag is ET.ProcessingInstruction:
>         ...
> """
Interesting. I don't think I've ever seen something like that before.
> As far as I can determine with a simple 'grep' on the source code, this
> isn't supported yet by lxml, at least the ProcessingInstruction bit.
> Since this now appears to be documented in a rather central document,
> perhaps we should. :)
Perhaps, yes. Guess we should also start supporting PIs, just like normal
Elements and Comments (i.e. _isElement would get a third value to check).
> I'm a bit surprised about this use of ET.Comment - this would imply
> elem.tag returns the Comment class when its element is representing a
> comment?
Looks like it. ET does this:
---------------------------------
def ProcessingInstruction(target, text=None):
    element = Element(ProcessingInstruction)
    element.text = target
    if text:
        element.text = element.text + " " + text
    return element
PI = ProcessingInstruction
---------------------------------
Same for Comment.
Funny, huh?
So we'd have to return the Comment factory function from _Comment.tag then
instead of the current None.
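A minimal sketch of that check, assuming the xml.etree.ElementTree module from the standard library (Python 2.5 and later); kind_of is a made-up helper name:

```python
import xml.etree.ElementTree as ET


def kind_of(elem):
    # The factory function itself is stored as the tag of special nodes.
    if elem.tag is ET.Comment:
        return "comment"
    elif elem.tag is ET.ProcessingInstruction:
        return "pi"
    return "element"


kinds = [
    kind_of(ET.Comment("a comment")),
    kind_of(ET.ProcessingInstruction("target", "some text")),
    kind_of(ET.Element("root")),
]
```

The identity test ("is") works because every comment or PI created by the factory carries the very same function object as its tag.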
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 17:07:51 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 17:07:51 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com>
<44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C4E247.60506@gkec.informatik.tu-darmstadt.de>
Stefan Behnel wrote:
> ET does this:
>
> ---------------------------------
> def ProcessingInstruction(target, text=None):
>     element = Element(ProcessingInstruction)
>     element.text = target
>     if text:
>         element.text = element.text + " " + text
>     return element
> PI = ProcessingInstruction
> ---------------------------------
One thing that bothers me is that this prevents us from returning the target
as tag. libxml2 uses the 'name' for this. Maybe we should add a property
"target" to PIs?
Even ET (1.3?) could easily emulate that by setting
element.target = target
before returning it.
Stefan
From faassen at infrae.com Mon Jul 24 18:24:20 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 24 Jul 2006 18:24:20 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4E247.60506@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
<44C4E247.60506@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C4F434.2090606@infrae.com>
Stefan Behnel wrote:
> Stefan Behnel wrote:
>> ET does this:
>>
>> ---------------------------------
>> def ProcessingInstruction(target, text=None):
>>     element = Element(ProcessingInstruction)
>>     element.text = target
>>     if text:
>>         element.text = element.text + " " + text
>>     return element
>> PI = ProcessingInstruction
>> ---------------------------------
>
> One thing that bothers me is that this prevents us from returning the target
> as tag. libxml2 uses the 'name' for this. Maybe we should add a property
> "target" to PIs?
>
> Even ET (1.3?) could easily emulate that by setting
>
> element.target = target
>
> before returning it.
I'm fine with extending the API that way to return this information if
ET doesn't do it.
Heh, all this makes me think we should start working on a ElementTree
community standard with accompanying testsuite. Now the only thing we
need is time. :)
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 18:27:23 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 18:27:23 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com>
<44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C4F4EB.5020700@gkec.informatik.tu-darmstadt.de>
Stefan Behnel wrote:
> ET does this:
>
> ---------------------------------
> def ProcessingInstruction(target, text=None):
>     element = Element(ProcessingInstruction)
>     element.text = target
>     if text:
>         element.text = element.text + " " + text
>     return element
> PI = ProcessingInstruction
> ---------------------------------
Ok, I think that the target of a PI has a sufficiently high importance to give
it an API, so I decided not to go the ET way here. lxml.etree will have a
".target" property that returns the PI target and the ".text" property will
not contain the target. This means that
......
will give this in lxml.etree:
pi.target == "test"
pi.text == "my test PI "
and this in ET:
pi.text == "test my test PI"
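A small sketch of the difference, using only the standard library's xml.etree.ElementTree. The split_pi helper is hypothetical; it merely emulates the proposed split into target and text and is not lxml's implementation.

```python
import xml.etree.ElementTree as ET


def split_pi(pi):
    # ET stores "target text" in .text; split at the first space only.
    target, _, text = pi.text.partition(" ")
    return target, text


pi = ET.ProcessingInstruction("test", "my test PI")
et_text = pi.text
target, text = split_pi(pi)
```

With a dedicated ".target" property, user code would not need this kind of string splitting.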
I'm also considering making PIs and comments subject to custom Element class
selection, which would simplify the re-implementation of PHP over lxml. :] But
maybe that can wait until lxml 1.2... :)
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 20:09:13 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 20:09:13 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4F434.2090606@infrae.com>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
<44C4E247.60506@gkec.informatik.tu-darmstadt.de>
<44C4F434.2090606@infrae.com>
Message-ID: <44C50CC9.1000609@gkec.informatik.tu-darmstadt.de>
Martijn Faassen wrote:
> Heh, all this makes me think we should start working on a ElementTree
> community standard with accompanying testsuite. Now the only thing we
> need is time. :)
Well, we have tests/test_elementtree.py to contribute for now. It's not quite
what I'd call a 'complete' test suite, but it's a good point to start.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 21:06:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 21:06:02 +0200
Subject: [lxml-dev] Custom element class lookup mechanisms
Message-ID: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de>
Hi all,
as I was working on the C-API anyway (capi branch), I decided to add a little
external module with different ways of determining the Python element class
for a libxml2 node. The "lxml.elements.classlookup" module currently
implements three different ways of doing this:
* ElementDefaultClassLookup always uses the default class
* ElementNamespaceClassLookup is the default namespace lookup mechanism
* AttributeBasedElementClassLookup determines the class by looking up the
value of a specific attribute in a dict. It falls back to the default classes.
Other ways are of course possible, so if anyone has an idea what to add, I'm
open for suggestions.
An example usage is this:
from lxml.elements import classlookup
classlookup.setElementClassLookup(
classlookup.ElementDefaultClassLookup())
It registers the mechanism that always uses the default class for elements,
comments and PIs (yes, I implemented that, too). This disables the namespace
class lookup and thus speeds up the plain element object creation by up to 10%.
Example usage for attribute based lookup:
mydict = {'int' : IntElement, 'str' : StrElement}
classlookup.setElementClassLookup(
classlookup.AttributeBasedElementClassLookup('pytype', mydict))
root = etree.XML('5test')
Internally, the lookup function is registered using the public C-API function
"setElementClassLookupFunction()" and must be implemented in Pyrex (or C). It
takes an object and the xmlNode* as arguments. The object can be used to keep
some status, such as the attribute name and class dict in the
AttributeBasedElementClassLookup case. It is registered together with the
lookup function, passed as first argument on each call and otherwise ignored
by lxml.
The return value of the lookup function is a callable Python object (typically
a subtype of _Element) that returns an element instance.
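The contract described above can be sketched in pure Python. Everything here is an assumption for illustration: the real lookup function lives in Pyrex or C and receives an xmlNode*, whereas this sketch uses standard-library ElementTree nodes as stand-ins, and the class names and the sample document are made up.

```python
import xml.etree.ElementTree as ET


class DefaultElement(object):
    def __init__(self, node):
        self.node = node


class IntElement(DefaultElement):
    @property
    def value(self):
        return int(self.node.text)


class StrElement(DefaultElement):
    @property
    def value(self):
        return self.node.text


def attribute_based_lookup(state, node):
    # 'state' is the object registered together with the function; here it
    # carries the attribute name, the class dict and the fallback class.
    attribute_name, class_map, default_class = state
    return class_map.get(node.get(attribute_name), default_class)


state = ("pytype", {"int": IntElement, "str": StrElement}, DefaultElement)
root = ET.fromstring(
    '<root><a pytype="int">5</a><b pytype="str">test</b><c>plain</c></root>'
)
wrapped = [attribute_based_lookup(state, node)(node) for node in root]
```

The lookup returns a callable (a class), and calling it with the node produces the element instance, which mirrors the two-step protocol described above.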
The C API itself is briefly described here:
http://codespeak.net/svn/lxml/branch/capi/doc/capi.txt
Hope this is useful,
Stefan
From luto at myrealbox.com Mon Jul 24 21:13:16 2006
From: luto at myrealbox.com (Andrew Lutomirski)
Date: Mon, 24 Jul 2006 12:13:16 -0700
Subject: [lxml-dev] Custom element class lookup mechanisms
In-Reply-To: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de>
References: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de>
Message-ID:
On 7/24/06, Stefan Behnel wrote:
>
> Hi all,
>
> as I was working on the C-API anyway (capi branch), I decided to add a
> little
> external module with different ways of determining the Python element
> class
> for a libxml2 node. The "lxml.elements.classlookup" module currently
> implements three different ways of doing this:
>
> * ElementDefaultClassLookup always uses the default class
> * ElementNamespaceClassLookup is the default namespace lookup mechanism
> * AttributeBasedElementClassLookup determines the class by looking up the
> value of a specific attribute in a dict. It falls back to the default
> classes.
>
> Other ways are of course possible, so if anyone has an idea what to add,
> I'm
> open for suggestions.
How about a way to make this setting per-parser instead of global?
--Andy
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 24 21:26:05 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 24 Jul 2006 21:26:05 +0200
Subject: [lxml-dev] Custom element class lookup mechanisms
In-Reply-To:
References: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C51ECD.8000101@gkec.informatik.tu-darmstadt.de>
Andrew Lutomirski wrote:
> On 7/24/06, *Stefan Behnel* wrote:
>
> Hi all,
>
> as I was working on the C-API anyway (capi branch), I decided to add
> a little
> external module with different ways of determining the Python
> element class
> for a libxml2 node. The "lxml.elements.classlookup " module currently
> implements three different ways of doing this:
>
> * ElementDefaultClassLookup always uses the default class
> * ElementNamespaceClassLookup is the default namespace lookup mechanism
> * AttributeBasedElementClassLookup determines the class by looking
> up the
> value of a specific attribute in a dict. It falls back to the
> default classes.
>
> Other ways are of course possible, so if anyone has an idea what to
> add, I'm
> open for suggestions.
>
>
> How about a way to make this setting per-parser instead of global?
Sure, I thought about that, too (although rather at a per-document level). But
that would require changing the signature of the lookup function to pass also
the document (which, in turn, keeps a reference to its parser).
I think that makes sense, so I'll pass the document also. You can then use a
weak-dict to map documents (or parsers) to element classes.
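A rough sketch of the weak-dict idea, assuming the lookup function gets the document as an extra argument. Document and the element classes are stand-ins invented for this example; the final signature in lxml may well look different.

```python
import gc
import weakref


class Document(object):
    """Stand-in for lxml's internal document object."""


class DefaultElement(object):
    pass


class SpecialElement(DefaultElement):
    pass


# Weak keys: an entry disappears together with its document.
class_by_document = weakref.WeakKeyDictionary()


def lookup(state, document, node):
    # 'state' and 'node' are unused here; a real lookup would inspect them.
    return class_by_document.get(document, DefaultElement)


doc_a, doc_b = Document(), Document()
class_by_document[doc_a] = SpecialElement

class_for_a = lookup(None, doc_a, None)
class_for_b = lookup(None, doc_b, None)

del doc_a
gc.collect()
remaining = len(class_by_document)
```

Because the dict holds its keys weakly, registering a class for a document does not keep that document alive.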
Stefan
From vadud3 at gmail.com Tue Jul 25 11:26:31 2006
From: vadud3 at gmail.com (Asif Iqbal)
Date: Tue, 25 Jul 2006 05:26:31 -0400
Subject: [lxml-dev] xml2 and xslt libraries
Message-ID:
Hi All
How do I compile lxml against the xml2 and xslt libraries in /usr/local/lib?
I have two separate versions of them: one in /usr/lib that comes with SUN and
one in /usr/local/lib.
I ran `python setup.py build_ext -L/usr/local/lib -R/usr/local/lib', but that
only takes care of the -lxml. The xslt library was still being picked up
from /usr/lib. The SUN xslt library in /usr/lib is an integral part of
SUN Sol 10 and is used by SMF, so I cannot get rid of it.
I need `xsltDocDefaultLoader', which is missing from /usr/lib/libxslt.so.
Details:
nm /usr/lib/libxslt.so | grep xsltDoc
[431] | 116043| 297|FUNC |GLOB |0 |11 |xsltDocumentComp
[495] | 133376| 3395|FUNC |GLOB |0 |11 |xsltDocumentElem
[319] | 101736| 908|FUNC |GLOB |0 |11 |xsltDocumentFunction
[156] | 101059| 677|FUNC |LOCL |0 |11 |xsltDocumentFunctionLoadDocument
[277] | 48263| 127|FUNC |GLOB |0 |11 |xsltDocumentSortFunction
nm /usr/local/lib/libxslt.so | grep xsltDoc
[1251] | 254192| 4|OBJT |GLOB |0 |25 |xsltDocDefaultLoader
[803] | 119356| 290|FUNC |LOCL |0 |10 |xsltDocDefaultLoaderFunc
[1241] | 122376| 309|FUNC |GLOB |0 |10 |xsltDocumentComp
[1527] | 139932| 3817|FUNC |GLOB |0 |10 |xsltDocumentElem
[1601] | 105248| 1471|FUNC |GLOB |0 |10 |xsltDocumentFunction
[1504] | 49964| 160|FUNC |GLOB |0 |10 |xsltDocumentSortFunction
As you can see `xsltDocDefaultLoader' is only available on
/usr/local/lib/libxslt.so
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 12:11:03 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 25 Jul 2006 12:11:03 +0200
Subject: [lxml-dev] xml2 and xslt libraries
In-Reply-To:
References:
Message-ID: <44C5EE37.9010500@gkec.informatik.tu-darmstadt.de>
Hi Asif,
Asif Iqbal wrote:
> How do I compile lxml with xml2 and xslt libraries from /usr/local/lib
> dir? I have two separate version of them. One is /usr/lib that comes
> with SUN and one in /usr/local/lib.
>
> I did run `python setup.py build_ext -L/usr/local/lib -R/usr/local/lib'.
> But only takes care of the -lxml. The xslt library was still being
> picked up from /usr/lib. The xslt library of SUN that sits on /usr/lib
> is integral part of SUN Sol 10 and used by SMF so I cannot get rid it.
>
> I need `xsltDocDefaultLoader' which is missing on /usr/lib/libxslt.so.
Guess that library is too old, then.
You have a number of options, here are two.
* Try setting LD_LIBRARY_PATH to "/usr/local/lib".
* You can build etree statically, which completely avoids this kind of problem:
http://codespeak.net/lxml/build.html#static-linking-on-windows
(hope you don't mind the 'windows' bit in the text, you can also use that on
other systems.)
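Whichever option is used, it may help to check from Python which library file actually exports the symbol. The following is only a sketch using ctypes from the standard library; has_symbol is a made-up helper, and the two paths are simply the ones mentioned in this thread.

```python
import ctypes


def has_symbol(library, symbol):
    # Try to load the library and ask for the symbol; any failure to load
    # counts as "not available".
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return False
    return hasattr(handle, symbol)


if __name__ == "__main__":
    for path in ("/usr/lib/libxslt.so", "/usr/local/lib/libxslt.so"):
        print("%s: %s" % (path, has_symbol(path, "xsltDocDefaultLoader")))
```

Passing a bare name such as "libxslt.so" instead of a full path leaves the search to the dynamic linker, so the result should then reflect the LD_LIBRARY_PATH setting.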
Hope it helps,
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 12:42:52 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 25 Jul 2006 12:42:52 +0200
Subject: [lxml-dev] xml2 and xslt libraries
In-Reply-To:
References:
<44C5EE37.9010500@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C5F5AC.6070908@gkec.informatik.tu-darmstadt.de>
Asif Iqbal wrote:
> On 7/25/06, Stefan Behnel wrote:
>> You can build etree statically, which completely avoids this kind
>> of problems:
>>
>> http://codespeak.net/lxml/build.html#static-linking-on-windows
>
> I did see this since it comes with doc/build.txt file. Do I have to get
> all the iconv and zlib all the other static libraries?
No. It just means you have to build your flags by hand. I just happened to
compile lxml statically today, against a Python 2.4 installed in my work
directory. Apart from that, I only used libxml2 and libxslt for static
compilation:
cflags = [
    "-I/path/to/libxml2-2.6.26/include",
    "-I/path/to/libxslt-1.1.17",
    "-I/path/to/PYTHON/include/python2.4",
    "-I/usr/include"
    ]
xslt_libs = [
    "-L/path/to/PYTHON/lib/python2.4",
    "/path/to/libxslt-1.1.17/libexslt/.libs/libexslt.a",
    "/path/to/libxslt-1.1.17/libxslt/.libs/libxslt.a",
    "/path/to/libxml2-2.6.26/.libs/libxml2.a",
    "-lz", "-lm",
    ]
Worked for me, you'll likely have to adapt it.
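To make the adapting a bit less error-prone, the two lists can be derived from the build directories. This is a sketch only: static_build_flags is a made-up helper, not part of lxml's setup.py, and the directory layout is assumed to match the one shown above.

```python
def static_build_flags(libxml2_dir, libxslt_dir, python_include):
    # Header search paths for the compiler.
    cflags = [
        "-I%s/include" % libxml2_dir,
        "-I%s" % libxslt_dir,
        "-I%s" % python_include,
    ]
    # Static archives are passed by full path, so the linker cannot pick
    # up a shared library of the same name from somewhere else.
    xslt_libs = [
        "%s/libexslt/.libs/libexslt.a" % libxslt_dir,
        "%s/libxslt/.libs/libxslt.a" % libxslt_dir,
        "%s/.libs/libxml2.a" % libxml2_dir,
        "-lz", "-lm",
    ]
    return cflags, xslt_libs


cflags, xslt_libs = static_build_flags(
    "/path/to/libxml2-2.6.26",
    "/path/to/libxslt-1.1.17",
    "/path/to/PYTHON/include/python2.4",
)
```

The order of the archives matters for static linking: libexslt depends on libxslt, which depends on libxml2, so they are listed in that order.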
BTW, in case you use 1.1alpha, its setup.py has a bug regarding the static
setup. Feel free to apply this patch:
--------------------------------------
Index: setup.py
===================================================================
--- setup.py (Revision 30429)
+++ setup.py (Arbeitskopie)
@@ -109,6 +109,7 @@
# use the static setup as configured in setupStaticBuild
sys.argv.remove('--static')
cflags, xslt_libs = setupStaticBuild()
+ ext_args['extra_link_args'] = xslt_libs
else:
cflags = flags('xslt-config --cflags')
xslt_libs = flags('xslt-config --libs')
--------------------------------------
Stefan
From faassen at infrae.com Tue Jul 25 14:29:44 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Tue, 25 Jul 2006 14:29:44 +0200
Subject: [lxml-dev] missing ElementTree API?
In-Reply-To: <44C4F4EB.5020700@gkec.informatik.tu-darmstadt.de>
References: <44C4DB24.206@infrae.com> <44C4DE92.1090803@gkec.informatik.tu-darmstadt.de>
<44C4F4EB.5020700@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C60EB8.7010806@infrae.com>
Stefan Behnel wrote:
[snip]
> I'm also considering making PIs and comments subject to custom Element class
> selection, which would simplify the re-implementation of PHP over lxml. :]
Arrgh! :) Anyway, it's likely PHP can have combinations of processing
instructions that aren't valid XML. I don't know anything about PHP but
I recall having such issues with the ClearSilver templating system - it
turned out to be impossible to generate it using XSLT (unless I was
masochistic enough to generate it in text mode).
Regards,
Martijn
From Olivier.Collioud at wipo.int Tue Jul 25 15:54:34 2006
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Tue, 25 Jul 2006 15:54:34 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
Message-ID:
Hello,
following these instructions:
http://codespeak.net/lxml/build.html#static-linking-on-windows
Running these commands:
C:\Download\lxml-source\lxml-1.0.2>set PATH="C:\Program
Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\bin";"C:\Program Files\Microsoft Visual
Studio\VC98\Bin"
C:\Download\lxml-source\lxml-1.0.2>set PYTHONPATH="C:\Program
Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib"
C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst
--static
Building lxml version 1.0.2
*NOTE*: Trying to build without Pyrex, needs pre-generated
'src/lxml/etree.c' !
running bdist_wininst
running build
running build_py
running build_ext
Traceback (most recent call last):
File "setup.py", line 170, in ?
ext_modules = ext_modules,
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\core.py", line 149, in
setup
dist.run_commands()
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 907, in
run_commands
self.run_command(cmd)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 927, in
run_command
cmd_obj.run()
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\command\bdist_wininst.py",
line 101, in run
self.run_command('build')
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\cmd.py", line 333, in
run_command
self.distribution.run_command(command)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 927, in
run_command
cmd_obj.run()
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\command\build.py", line 107,
in run
self.run_command(cmd_name)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\cmd.py", line 333, in
run_command
self.distribution.run_command(command)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\dist.py", line 927, in
run_command
cmd_obj.run()
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\command\build_ext.py", line
243, in run
force=self.force)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\ccompiler.py", line 1173, in
new_compiler
return klass (None, dry_run, force)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 206,
in __init__
self.__macros = MacroExpander(self.__version)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 112,
in __init__
self.load_macros(version)
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 128,
in load_macros
self.set_macro("FrameworkSDKDir", net, "sdkinstallrootv1.1")
File "C:\Program Files\OpenOffice.org
2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 118,
in set_macro
self.macros["$(%s)" % macro] = d[key]
KeyError: 'sdkinstallrootv1.1'
My guess is that it is related to my MS-VS installation.
I have no experience with MS compilation. (I don't even know why it is
installed on my PC :-).
I would be grateful if anybody can help me to build this or tell me
where I can download an lxml 1.0.2 or more recent (or not too old) build
for win32 and Python 2.3.4.
Olivier.
From vadud3 at gmail.com Tue Jul 25 16:08:54 2006
From: vadud3 at gmail.com (Asif Iqbal)
Date: Tue, 25 Jul 2006 10:08:54 -0400
Subject: [lxml-dev] xml2 and xslt libraries
In-Reply-To:
References:
<44C5EE37.9010500@gkec.informatik.tu-darmstadt.de>
<44C5F5AC.6070908@gkec.informatik.tu-darmstadt.de>
Message-ID:
On 7/25/06, Asif Iqbal wrote:
>
> On 7/25/06, Stefan Behnel
> wrote:
>
> >
> > Asif Iqbal wrote:
> > > On 7/25/06, Stefan Behnel wrote:
> > >> You can build etree statically, which completely avoids this kind
> > >> of problems:
> > >>
> > >> http://codespeak.net/lxml/build.html#static-linking-on-windows
> > >
> > > I did see this since it comes with doc/build.txt file. Do I have to
> > get
> > > all the iconv and zlib all the other static libraries?
> >
> > No. It just means you have to build your flags by hand. I just happened
> > to
> > compile lxml statically today, against a Python 2.4 installed in my work
> > directory. Apart from that, I only used libxml2 and libxslt for static
> > compilation:
> >
> > cflags = [
> > "-I/path/to/libxml2-2.6.26/include",
> > "-I/path/to/libxslt-1.1.17",
> > "-I/path/to/PYTHON/include/python2.4",
> > "-I/usr/include"
> > ]
> > xslt_libs = [
> > "-L/path/to/PYTHON/lib/python2.4",
> > "/path/to/libxslt-1.1.17/libexslt/.libs/libexslt.a",
> > "/path/to/libxslt-1.1.17/libxslt/.libs/libxslt.a",
> > "/path/to/libxml2-2.6.26/.libs/libxml2.a",
> > "-lz", "-lm",
> > ]
> >
> > Worked for me, you'll likely have to adapt it.
>
>
> I could not compile . I attached the compile output called
> lxml.compile.out
>
Here is the attachment again, a lot smaller than last time. Hopefully it won't
bounce this time.
> > BTW, in case you use 1.1alpha, its setup.py has a bug regarding the static
> > setup. Feel free to apply this patch:
>
>
>
> Looks like 1.0.2 needed a patch to
>
>
> --- setup.py.orig Tue Jul 25 08:03:15 2006
> +++ setup.py Tue Jul 25 08:03:29 2006
> @@ -18,7 +18,7 @@
> xslt_libs = [
> ]
> result = (cflags, xslt_libs)
> - # return result
> + return result
> raise NotImplementedError, \
> "Static build not configured, see doc/build.txt"
>
>
> --
> Asif Iqbal
> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
>
>
>
>
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxml.compile.out2
Type: application/octet-stream
Size: 2542 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060725/992a5811/attachment.obj
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 16:19:37 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 25 Jul 2006 16:19:37 +0200
Subject: [lxml-dev] xml2 and xslt libraries
In-Reply-To:
References:
<44C5EE37.9010500@gkec.informatik.tu-darmstadt.de>
<44C5F5AC.6070908@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C62879.7050206@gkec.informatik.tu-darmstadt.de>
Hi Asif,
Asif Iqbal wrote:
> I could not compile . I attached the compile output called lxml.compile.out
Hmm, ok, first things first: 3MB log files are not quite the thing you should
send to a mailing list. That's something for a) private e-mail and b)
compressors like bzip2, which, BTW, compresses it to some 100K.
That said, here are the relevant portions:
-------------------------------------
Building lxml version 1.0.2
running build_ext
building 'lxml.etree' extension
gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC
-I/usr/local/include/python2.4 -c src/lxml/etree.c -o
build/temp.solaris-2.10-i86pc-2.4/src/lxml/etree.o -w
-I/usr/local/include/libxml2 -I/usr/local/include/libxslt
-I/usr/local/include/libexslt -I/usr/local/include/python2.4 -I/usr/include
gcc -shared build/temp.solaris-2.10-i86pc-2.4/src/lxml/etree.o -o
build/lib.solaris-2.10-i86pc-2.4/lxml/etree.so -L/usr/local/lib/python2.4
/usr/local/lib/libexslt.a /usr/local/lib/libxslt.a /usr/local/lib/libxml2.a
-lz -lpthread -lsocket -lnsl -lm
Text relocation remains referenced
against symbol offset in file
0x17
/usr/local/lib/libexslt.a(common.o)
0xb6
/usr/local/lib/libexslt.a(common.o)
0xe3
/usr/local/lib/libexslt.a(common.o)
0x11c
/usr/local/lib/libexslt.a(common.o)
[...]
fmod 0x7f3b /usr/local/lib/libxml2.a(xpath.o)
fmod 0x8023 /usr/local/lib/libxml2.a(xpath.o)
fmod 0x8c6c /usr/local/lib/libxml2.a(xpath.o)
ld: fatal: relocations remain against allocatable but non-writable sections
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
-------------------------------------
Hmmm, don't know where that comes from. I can see that you added quite a
number of libraries and I don't think it's because there is one missing.
Maybe someone else who has more experience with Solaris compilation than I do
could help you here...
> Looks like 1.0.2 needed a patch to
>
> --- setup.py.orig Tue Jul 25 08:03:15 2006
> +++ setup.py Tue Jul 25 08:03:29 2006
> @@ -18,7 +18,7 @@
> xslt_libs = [
> ]
> result = (cflags, xslt_libs)
> - # return result
> + return result
> raise NotImplementedError, \
> "Static build not configured, see doc/build.txt"
No, why? There is no standard configuration for static compilation, so you get
an exception when you pass "--static" until you have a) read the docs
and b) changed the static setup function accordingly.
Stefan
From faassen at infrae.com Tue Jul 25 19:48:18 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Tue, 25 Jul 2006 19:48:18 +0200
Subject: [lxml-dev] objectify feedback
Message-ID: <44C65962.8030105@infrae.com>
Hi there,
I just read through the objectify.txt documentation/doctest. Quite
interesting and impressive stuff!
One thing that worries me is that it does introduce quite a new API with
different behavior than ElementTree in some fundamental ways. How close
is the behavior of the new API to Amara? It'd be nice if we weren't
inventing too much that's new here somehow.. Anyway, this is "are we
inventing too many new wheels?" worry - one of the ideas of lxml is not
to invent too many, though on the other hand we shouldn't stop people
from building cool stuff on top of the core, which is what this is.
The other thing that worries me from a more practical perspective is
that this is, as far as I can see, controlled globally.
The beginning of the document says "Don't mix!" and this sounds like
sensible advice, but as far as I understand if you use objectify in your
application you cannot use ElementTree anymore in the same application,
unless you do a lot of registration and reset work. It'd be *very* nice
if this were not so - create an objectified tree separately not
affecting the way normal trees are created.
This way, you could have one module of your application using objectify
but another module still sticking to normal ElementTree.
What to do when someone tries to mix bits of one tree with another?
Perhaps there's an efficient way to compare baseclass between the two
lxml objects that are combined, and bail out with some reasonably clear
exception in case of illegal combinations.
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Jul 25 20:24:09 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 25 Jul 2006 20:24:09 +0200
Subject: [lxml-dev] objectify feedback
In-Reply-To: <44C65962.8030105@infrae.com>
References: <44C65962.8030105@infrae.com>
Message-ID: <44C661C9.4060401@gkec.informatik.tu-darmstadt.de>
Hi Martijn,
Martijn Faassen wrote:
> I just read through the objectify.txt documentation/doctest. Quite
> interesting and impressive stuff!
Thanks. Wasn't even my idea (not alone, at least).
> One thing that worries me is that it does introduce quite a new API with
> different behavior than ElementTree in some fundamental ways.
It is fundamentally different in some aspects (e.g. slicing works on siblings,
not children), but I'm trying to keep it close enough to take as much
advantage of the ET API as possible. What helps is that most parts of etree
already work directly on the C tree, which does /not/ change its API, so there
are few places that actually break.
> How close
> is the behavior of the new API to Amara? It'd be nice if we weren't
> inventing too much that's new here somehow.
> Anyway, this is "are we inventing too many new wheels?" worry - one of the
> ideas of lxml is not to invent too many
There are some things to invent, some things to keep. For example, Amara has
all sorts of functionality that ET already provides, so I leave that out as
much as possible (attribute access, for example).
Things like the attribute access and the behaviour of slicing/indexing are
directly borrowed from Amara.
Another thing is XSD type support. When you add an xsi:type attribute to your
elements, objectify will pick it up and look for a corresponding Python data
type. So it's even somewhat standards compliant here. :)
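The general idea of the xsi:type lookup can be sketched with the standard library alone. The type table and typed_value are assumptions made up for this example; objectify's actual mapping and API may differ.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Assumed mapping for this sketch; objectify's real table may differ.
PYTHON_TYPES = {"int": int, "integer": int, "double": float, "string": str}


def typed_value(elem):
    type_name = elem.get(XSI_TYPE, "string")
    # Drop a namespace prefix such as "xsd:" if there is one.
    local_name = type_name.split(":")[-1]
    return PYTHON_TYPES.get(local_name, str)(elem.text)


root = ET.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<a xsi:type="xsd:int">5</a>'
    '<b xsi:type="xsd:double">1.5</b>'
    '<c>test</c>'
    '</root>'
)
values = [typed_value(child) for child in root]
```

Elements without an xsi:type attribute fall back to plain strings in this sketch.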
In a way it's a new API with many ideas borrowed from a few good places.
> The other thing that worries me from a more practical perspective is
> that this is, as far as I can see, controlled globally.
>
> The beginning of the document says "Don't mix!" and this sounds like
> sensible advice,
It is. You can get things totally messed up by mixing elements from different
APIs in the same tree. This will break some parts of the API in a non-obvious
way. One prominent example is _elementpath, which traverses the tree level by
level. Now think of one element iterating over its children, the other one
yielding its siblings. Great.
It's only OK as long as you can control which API you use where, but that can
be hard enough to control already.
> but as far as I understand if you use objectify in your
> application you cannot use ElementTree anymore in the same application,
> unless you do a lot of registration and reset work. It'd be *very* nice
> if this were not so - create an objectified tree separately not
> affecting the way normal trees are created.
This can be done and I already started providing the infrastructure. Look at
the lxml.elements.classlookup module (elements.txt). It allows you to change
the way nodes are mapped to element classes. I managed to let it support
lookup chains by now so that you can define fallbacks if the selected strategy
does not find a suitable class. One of the lookup schemes delegates to the
parsers, so when you set that one globally, each parser can have its own
lookup mechanism (with a fallback to the default lookup). I will soon
integrate the objectify class lookup into this framework, which should answer
your question. :)
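
The lookup chain with a per-parser scheme can be pictured with a small pure-Python sketch. All class names here (`DefaultLookup`, `TagLookup`, `ParserLookup`, `Parser`) are hypothetical stand-ins and not the API of the classlookup module; the sketch only shows how delegation and fallbacks fit together.

```python
class BaseElement:
    """Stand-in for the default element class."""


class ObjectifiedElement(BaseElement):
    """Stand-in for an objectify-style element class."""


class DefaultLookup:
    """End of every chain: always answers with one fixed class."""

    def __init__(self, element_class=BaseElement):
        self.element_class = element_class

    def lookup(self, parser, tag):
        return self.element_class


class TagLookup:
    """Picks a class by tag name, otherwise asks its fallback."""

    def __init__(self, classes_by_tag, fallback):
        self.classes_by_tag = classes_by_tag
        self.fallback = fallback

    def lookup(self, parser, tag):
        found = self.classes_by_tag.get(tag)
        if found is not None:
            return found
        return self.fallback.lookup(parser, tag)


class ParserLookup:
    """Delegates to the parser's own lookup, if the parser has one."""

    def __init__(self, fallback):
        self.fallback = fallback

    def lookup(self, parser, tag):
        own = getattr(parser, "element_lookup", None)
        if own is not None:
            return own.lookup(parser, tag)
        return self.fallback.lookup(parser, tag)


class Parser:
    """Hypothetical parser that may carry its own lookup scheme."""

    def __init__(self, element_lookup=None):
        self.element_lookup = element_lookup


# Set once globally; each parser can still choose its own scheme.
global_lookup = ParserLookup(fallback=DefaultLookup())
plain_parser = Parser()
objectify_parser = Parser(TagLookup({"item": ObjectifiedElement}, DefaultLookup()))
```

With this arrangement, trees built by `plain_parser` keep the default classes while trees built by `objectify_parser` get the specialised ones, so two modules of one application need not interfere with each other.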
> This way, you could have one module of your application using objectify
> but another module still sticking to normal ElementTree.
I will try to make sure in the docs that that is the main intention and that
mixing elements from different sources is A Bad Idea.
> What to do when someone tries to mix bits of one tree with another?
> Perhaps there's an efficient way to compare baseclass between the two
> lxml objects that are combined, and bail out with some reasonably clear
> exception in case of illegal combinations.
Well, even worse, the problem will go away when element classes are garbage
collected. Which can lead to nicely surprising effects like a function working
in one run and failing in the next - without obvious changes and without an
easily visible difference between the elements that were passed in (except for
their type, that is). Even better, debugging then means that you have to
figure out where the wrong element came from, or where the last reference to
the element was stored that prevented garbage collection. Cool.
Guess I'll have to make the respective warnings *very* clear in the docs ...
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 26 06:07:35 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 26 Jul 2006 06:07:35 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
In-Reply-To:
References:
Message-ID: <44C6EA87.5080409@gkec.informatik.tu-darmstadt.de>
Hi Olivier,
Olivier Collioud wrote:
> following these instructions:
> http://codespeak.net/lxml/build.html#static-linking-on-windows
>
> Running these commands:
>
> C:\Download\lxml-source\lxml-1.0.2>set PATH="C:\Program
> Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org
> 2.0\program\python-core-2.3.4\bin";"C:\Program Files\Microsoft Visual
> Studio\VC98\Bin"
>
> C:\Download\lxml-source\lxml-1.0.2>set PYTHONPATH="C:\Program
> Files\OpenOffice.org 2.0\program";"C:\Program Files\OpenOffice.org
> 2.0\program\python-core-2.3.4\lib"
>
> C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst
> --static
> Building lxml version 1.0.2
[...]
> File "C:\Program Files\OpenOffice.org
> 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 118,
> in set_macro
> self.macros["$(%s)" % macro] = d[key]
> KeyError: 'sdkinstallrootv1.1'
>
> My guess is that it is related to my MS-VS installation.
Looks like it. Though I'm not sure which 'sdk' is referred to here. Might also
be the OOo SDK.
> I have no experience with MS compilation. (I don't even know why it is
> installed on my PC :-).
Guess you accidentally installed MS-Windows, that tends to install a lot of
Microsoft stuff. ;)
> I would be grateful if anybody can help me to build this or tell me
> where I can download an lxml 1.0.2 or more recent (or not too old) build
> for win32 and Python 2.3.4.
We usually don't have Windows binaries for Python 2.3. Martijn keeps building
Linux eggs for it, mainly to support Web-Servers, but most people are on 2.4
by now.
What about installing a normal Python 2.3 (from python.org), building lxml
against that and then just /installing/ it to the OOo Python directory?
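
A minimal sketch of that last step, assuming lxml was already built and installed into the regular interpreter's site-packages. The function name `copy_package` and the example paths in the comment are assumptions, not something taken from the build instructions.

```python
import os
import shutil


def copy_package(src_site_packages, dst_site_packages, name="lxml"):
    """Copy an installed package directory from one Python to another."""
    source = os.path.join(src_site_packages, name)
    target = os.path.join(dst_site_packages, name)
    if not os.path.isdir(source):
        raise FileNotFoundError("no built package at %s" % source)
    if os.path.isdir(target):
        shutil.rmtree(target)  # replace an older copy
    shutil.copytree(source, target)
    return target


# Example call with assumed locations (adjust to the real installation):
# copy_package(r"C:\Python23\Lib\site-packages",
#              r"C:\Program Files\OpenOffice.org 2.0\program"
#              r"\python-core-2.3.4\lib\site-packages")
```

This only works if both interpreters are binary compatible, that is, the same Python minor version built with a compatible compiler.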
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 26 08:50:23 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 26 Jul 2006 08:50:23 +0200
Subject: [lxml-dev] objectify feedback
In-Reply-To: <44C661C9.4060401@gkec.informatik.tu-darmstadt.de>
References: <44C65962.8030105@infrae.com>
<44C661C9.4060401@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C710AF.8070204@gkec.informatik.tu-darmstadt.de>
Stefan Behnel wrote:
> There are some things to invent, some things to keep. For example, Amara has
> all sorts of functionality that ET already provides, so I leave that out as
> much as possible (attribute access, for example).
>
> Things like the attribute access and the behaviour of slicing/indexing are
> directly borrowed from Amara.
"attribute access" meaning "XML attributes" in the first paragraph (i.e.
set/get/attrib), "object attributes" in the second (i.e. "parent.child").
Stefan
From Olivier.Collioud at wipo.int Wed Jul 26 10:17:31 2006
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Wed, 26 Jul 2006 10:17:31 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
Message-ID:
Here is the result of the compilation after installing Python 2.3.5:
C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static
Building lxml version 1.0.2
*NOTE*: Trying to build without Pyrex, needs pre-generated 'src/lxml/etree.c' !
running bdist_wininst
running build
running build_py
running build_ext
building 'lxml.etree' extension
C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\soft\python23\include -IC:\soft\python23\PC /Tcsrc/lxml/etree.c /Fobuild\temp.win32-2.3\Release\src/lxml/etree.obj -w -I..\libxml2-2.6.26.win32\include -I..\libxslt-1.1.17.win32\include -I..\zlib-1.2.3.win32\include -I..\iconv-1.9.2.win32\include
Command line warning D4025 : overriding '/W3' with '/w'
etree.c
C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\soft\python23\libs /LIBPATH:C:\soft\python23\PCBuild /EXPORT:initetree build\temp.win32-2.3\Release\src/lxml/etree.obj /OUT:build\lib.win32-2.3\lxml\etree.pyd /IMPLIB:build\temp.win32-2.3\Release\src/lxml\etree.lib ..\libxml2-2.6.26.win32\lib\libxml2_a.lib ..\libxslt-1.1.17.win32\lib\libxslt_a.lib ..\libxslt-1.1.17.win32\lib\libexslt_a.lib ..\zlib-1.2.3.win32\lib\zlib.lib ..\iconv-1.9.2.win32\lib\iconv_a.lib
Creating library build\temp.win32-2.3\Release\src/lxml\etree.lib and object build\temp.win32-2.3\Release\src/lxml\etree.exp
LINK : warning LNK4049: locally defined symbol "_xmlFree" imported
LINK : warning LNK4049: locally defined symbol "_xsltDocDefaultLoader" imported
LINK : warning LNK4049: locally defined symbol "_xsltLibxsltVersion" imported
libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2
libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2
libexslt_a.lib(strings.obj) : error LNK2001: unresolved external symbol __ftol2
libexslt_a.lib(math.obj) : error LNK2001: unresolved external symbol __ftol2
libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2
libxml2_a.lib(xpointer.obj) : error LNK2001: unresolved external symbol __ftol2
libxml2_a.lib(xmlschemastypes.obj) : error LNK2001: unresolved external symbol __ftol2
libxslt_a.lib(xsltutils.obj) : error LNK2001: unresolved external symbol __ftol2
build\lib.win32-2.3\lxml\etree.pyd : fatal error LNK1120: 1 unresolved externals
error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1120
>>> Stefan Behnel 26/07/06 9:02 AM >>>
Hi Olivier,
Olivier Collioud wrote:
> I have no experience with Python C extension compilation either.
>
> Do you mean that I do not need any C compiler to build this kind of
> beast?
No, actually, I was rather thinking that it is worth a try with a 'real'
Python. If the mysterious "sdkinstallrootv1.1" comes from OOo, you may well be
able to use an lxml compiled for a python.org Python in the OOo Python.
So: install a 'normal' Python 2.3.4, compile and install lxml with it, and
then copy the installed directory (site-packages/lxml) into the OOo Python.
Stefan
------
World Intellectual Property Organization Disclaimer:
This electronic message may contain privileged, confidential and
copyright protected information. If you have received this e-mail
by mistake, please immediately notify the sender and delete this
e-mail and all its attachments. Please ensure all e-mail attachments
are scanned for viruses prior to opening or using.
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Jul 26 14:19:38 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 26 Jul 2006 14:19:38 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
In-Reply-To:
References:
Message-ID: <44C75DDA.1040001@gkec.informatik.tu-darmstadt.de>
Hi Olivier,
Olivier Collioud wrote:
> here is what I get after installing Python 2.3.5
>
> C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static
> Building lxml version 1.0.2
> *NOTE*: Trying to build without Pyrex, needs pre-generated 'src/lxml/etree.c' !
> running bdist_wininst
> running build
> running build_py
> running build_ext
> building 'lxml.etree' extension
> C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\soft\python23\include -IC:\soft\python23\PC /Tcsrc/lxml/etree.c /Fobuild\temp.win32-2.3\Release\src/lxml/etree.obj -w -I..\libxml2-2.6.26.win32\include -I..\libxslt-1.1.17.win32\include -I..\zlib-1.2.3.win32\include -I..\iconv-1.9.2.win32\include
> Command line warning D4025 : overriding '/W3' with '/w'
> etree.c
> C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\soft\python23\libs /LIBPATH:C:\soft\python23\PCBuild /EXPORT:initetree build\temp.win32-2.3\Release\src/lxml/etree.obj /OUT:build\lib.win32-2.3\lxml\etree.pyd /IMPLIB:build\temp.win32-2.3\Release\src/lxml\etree.lib ..\libxml2-2.6.26.win32\lib\libxml2_a.lib ..\libxslt-1.1.17.win32\lib\libxslt_a.lib ..\libxslt-1.1.17.win32\lib\libexslt_a.lib ..\zlib-1.2.3.win32\lib\zlib.lib ..\iconv-1.9.2.win32\lib\iconv_a.lib
> Creating library build\temp.win32-2.3\Release\src/lxml\etree.lib and object build\temp.win32-2.3\Release\src/lxml\etree.exp
> LINK : warning LNK4049: locally defined symbol "_xmlFree" imported
> LINK : warning LNK4049: locally defined symbol "_xsltDocDefaultLoader" imported
> LINK : warning LNK4049: locally defined symbol "_xsltLibxsltVersion" imported
> libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2
> libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2
> libexslt_a.lib(strings.obj) : error LNK2001: unresolved external symbol __ftol2
> libexslt_a.lib(math.obj) : error LNK2001: unresolved external symbol __ftol2
> libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2
> libxml2_a.lib(xpointer.obj) : error LNK2001: unresolved external symbol __ftol2
> libxml2_a.lib(xmlschemastypes.obj) : error LNK2001: unresolved external symbol __ftol2
> libxslt_a.lib(xsltutils.obj) : error LNK2001: unresolved external symbol __ftol2
> build\lib.win32-2.3\lxml\etree.pyd : fatal error LNK1120: 1 unresolved externals
> error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1120
Hmm, ok, lxml commonly links against libm, so my guess is that you need to
supply that one also.
Another possibility could be that libxml2/xslt/exslt were compiled with a
newer compiler than the one you use. If you want to try, you can compile
libxml2 and libxslt also from source. Since you're building lxml statically
anyway, that should be ok for your purpose.
Stefan
PS: It's either English for the list or French for me. Mixing the two is not
such a good idea...
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 27 07:11:27 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 27 Jul 2006 07:11:27 +0200
Subject: [lxml-dev] Custom element class lookup mechanisms
In-Reply-To:
References: <44C51A1A.4090203@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C84AFF.6090702@gkec.informatik.tu-darmstadt.de>
Andrew Lutomirski wrote:
> On 7/24/06, Stefan Behnel wrote:
> a little external module with different ways of determining the Python
> element class for a libxml2 node. The "lxml.elements.classlookup"
> module currently implements three different ways of doing this:
[...]
> Other ways are of course possible, so if anyone has an idea what to
> add, I'm open for suggestions.
>
> How about a way to make this setting per-parser instead of global?
Here is how to do it:
http://codespeak.net/svn/lxml/branch/capi/doc/elements.txt
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 27 09:37:08 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 27 Jul 2006 09:37:08 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
In-Reply-To: <44C75DDA.1040001@gkec.informatik.tu-darmstadt.de>
References:
<44C75DDA.1040001@gkec.informatik.tu-darmstadt.de>
Message-ID: <44C86D24.6010608@gkec.informatik.tu-darmstadt.de>
Stefan Behnel wrote:
> Olivier Collioud wrote:
>> C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static
>> Building lxml version 1.0.2
>> C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\soft\python23\libs /LIBPATH:C:\soft\python23\PCBuild /EXPORT:initetree build\temp.win32-2.3\Release\src/lxml/etree.obj /OUT:build\lib.win32-2.3\lxml\etree.pyd /IMPLIB:build\temp.win32-2.3\Release\src/lxml\etree.lib ..\libxml2-2.6.26.win32\lib\libxml2_a.lib ..\libxslt-1.1.17.win32\lib\libxslt_a.lib ..\libxslt-1.1.17.win32\lib\libexslt_a.lib ..\zlib-1.2.3.win32\lib\zlib.lib ..\iconv-1.9.2.win32\lib\iconv_a.lib
>> Creating library build\temp.win32-2.3\Release\src/lxml\etree.lib and object build\temp.win32-2.3\Release\src/lxml\etree.exp
>> LINK : warning LNK4049: locally defined symbol "_xmlFree" imported
>> LINK : warning LNK4049: locally defined symbol "_xsltDocDefaultLoader" imported
>> LINK : warning LNK4049: locally defined symbol "_xsltLibxsltVersion" imported
>> libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2
>> libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2
>> libexslt_a.lib(strings.obj) : error LNK2001: unresolved external symbol __ftol2
>> libexslt_a.lib(math.obj) : error LNK2001: unresolved external symbol __ftol2
>> libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2
>> libxml2_a.lib(xpointer.obj) : error LNK2001: unresolved external symbol __ftol2
>> libxml2_a.lib(xmlschemastypes.obj) : error LNK2001: unresolved external symbol __ftol2
>> libxslt_a.lib(xsltutils.obj) : error LNK2001: unresolved external symbol __ftol2
>> build\lib.win32-2.3\lxml\etree.pyd : fatal error LNK1120: 1 unresolved externals
>> error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1120
> could be that libxml2/xslt/exslt were compiled with a
> newer compiler than the one you use.
Yup, that's it:
http://www.mail-archive.com/openssl-users@openssl.org/msg31551.html
http://www.issociate.de/board/post/9216/Compiling_with_VC6.html
So, try this:
> If you want to try, you can compile
> libxml2 and libxslt also from source. Since you're building lxml statically
> anyway, that should be ok for your purpose.
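
To see which prebuilt static libraries are affected before rebuilding them, the linker output quoted above can be summarised with a short script. This is only a sketch; the function name `unresolved_by_symbol` is made up here and the regular expression assumes the LNK2001 line format shown in the log.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2
LNK2001 = re.compile(
    r"(?P<lib>[\w.\-]+)\((?P<obj>[^)]+)\) : error LNK2001: "
    r"unresolved external symbol (?P<symbol>\S+)"
)


def unresolved_by_symbol(log_text):
    """Map each unresolved symbol to the sorted list of libraries needing it."""
    result = defaultdict(set)
    for line in log_text.splitlines():
        match = LNK2001.search(line)
        if match:
            result[match.group("symbol")].add(match.group("lib"))
    return {symbol: sorted(libs) for symbol, libs in result.items()}


log = """\
libxslt_a.lib(numbers.obj) : error LNK2001: unresolved external symbol __ftol2
libexslt_a.lib(date.obj) : error LNK2001: unresolved external symbol __ftol2
libxml2_a.lib(xpath.obj) : error LNK2001: unresolved external symbol __ftol2
"""
summary = unresolved_by_symbol(log)
print(summary)
```

Every library listed for `__ftol2` would have to be recompiled with the same compiler that builds lxml.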
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Jul 27 09:41:33 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 27 Jul 2006 09:41:33 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
In-Reply-To:
References:
Message-ID: <44C86E2D.6040406@gkec.informatik.tu-darmstadt.de>
Olivier Collioud wrote:
> following these instructions:
> http://codespeak.net/lxml/build.html#static-linking-on-windows
>
> Running these commands:
[...]
> C:\Download\lxml-source\lxml-1.0.2>python setup.py bdist_wininst --static
> Building lxml version 1.0.2
[...]
> File "C:\Program Files\OpenOffice.org
> 2.0\program\python-core-2.3.4\lib\distutils\msvccompiler.py", line 118,
> in set_macro
> self.macros["$(%s)" % macro] = d[key]
> KeyError: 'sdkinstallrootv1.1'
>
> My guess is that it is related to my MS-VS installation.
It's the OOo SDK. You might have to install it:
http://www.openoffice.org/dev_docs/source/sdk/
Stefan
From Olivier.Collioud at wipo.int Thu Jul 27 14:44:04 2006
From: Olivier.Collioud at wipo.int (Olivier Collioud)
Date: Thu, 27 Jul 2006 14:44:04 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
Message-ID:
Stefan,
Thanks a lot for your help.
Maybe I will try later to compile lxml for OO1.0.3/Py2.3.5, but I
suspect that OO needs to be compiled with the same compiler as well.
I would then prefer to make the effort to compile OO1.0.3 with Py2.4.3
(and then I would prefer by far someone to do it for me with a recent MS
compiler ;-p).
Now it has been decided to port my app to Java because all of our
business apps run on this platform.
Anyway, they have been impressed by the result, the performance and how
fast I wrote this app which will be a key component of our business sw
env.
I did not succeed in convincing my colleagues to use Python but we
agreed on a compromise by using Jython and Dom4j.
It has been easy to switch so far because lxml and Dom4j have quite
similar APIs.
Anyway, I hope I will have other opportunities to build some app based
on Python/lxml which is my favourite XML dev toolkit (using also
EclipseWTP and PyDev).
And I would like to congratulate and thank you for the work done so far
and your help.
Kind regards,
Olivier.
From faassen at infrae.com Fri Jul 28 19:30:06 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Fri, 28 Jul 2006 19:30:06 +0200
Subject: [lxml-dev] Compiling lxml with OpenOffice embeded python 2.3.4
runtime
In-Reply-To:
References:
Message-ID: <44CA499E.20100@infrae.com>
Hey Olivier,
Olivier Collioud wrote:
> Thanks a lot for your help.
[kind note and a peek into Olivier's development environment]
> And I would like to congratulate and thank you for the work done so far
> and your help.
I'll presume to speak for Stefan to thank you for the nice thank you!
I was quite interested to hear about your development environment - both
your personal preferred platform and the Java environment at work. It's
always nice to get a peek into the way people and organisations approach
software development, and what affects the choice for Python and lxml.
Regards,
Martijn
From faassen at infrae.com Mon Jul 31 11:41:53 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 31 Jul 2006 11:41:53 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
Message-ID: <44CDD061.1020106@infrae.com>
Hi there,
I just found out that there is a hidden incompatibility in the compiled
versions of lxml eggs we provide, at least in linux. Our provided
versions are compiled with a Python that has 4 bytes unicode support
(probably the default on ubuntu on which I built the 2.4 extension).
If you try to install such an egg on a machine where unicode support is
compiled with 2 bytes only, it'll fail with errors such as:
ImportError:
/usr/local/lib/python2.4/site-packages/lxml-1.0.2-py2.4-linux-i686.egg/lxml/etree.so:
undefined symbol: PyUnicodeUCS4_FromEncodedObject
I wonder whether there's anything within the egg distribution mechanism
that lets us distinguish between such platforms. If not, I wonder what
to do instead -- the simplest would be to add a FAQ entry and tell
people to recompile from the sources.
By the way, does Pyrex generate different C code depending on whether 4
or 2 byte unicode is used? If so, then that would mean an installation
of pyrex as well for these people...
Regards,
Martijn
From gracinet at nuxeo.com Mon Jul 31 11:45:18 2006
From: gracinet at nuxeo.com (Georges Racinet)
Date: Mon, 31 Jul 2006 11:45:18 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CDD061.1020106@infrae.com>
References: <44CDD061.1020106@infrae.com>
Message-ID: <436357DF-BAE8-4E96-BF77-8A089BABCD54@nuxeo.com>
On Jul 31, 2006, at 11:41 AM, Martijn Faassen wrote:
> Hi there,
>
> I just found out that there is a hidden incompatibility in the
> compiled
> versions of lxml eggs we provide, at least in linux. Our provided
> versions are compiled with a Python that has 4 bytes unicode support
> (probably the default on ubuntu on which I built the 2.4 extension).
Noticed that last week, too. Sorry I forgot to mention it over there.
>
> If you try to install such an egg on a machine where unicode
> support is
> compiled with 2 bytes only, it'll fail with errors such as:
>
> ImportError:
> /usr/local/lib/python2.4/site-packages/lxml-1.0.2-py2.4-linux-
> i686.egg/lxml/etree.so:
> undefined symbol: PyUnicodeUCS4_FromEncodedObject
>
> I wonder whether there's anything within the egg distribution
> mechanism
> that lets us distinguish between such platforms. If not, I wonder what
> to do instead -- the simplest would be to add a FAQ entry and tell
> people to recompile from the sources.
As far as I know, this is typical of the Ubuntu distribution, and I'm
100% sure this egg was laid from Ubuntu. If the egg system could make
a difference between distributions, it would be ok, imho.
Charset problems are a plague.
>
> By the way, does Pyrex generate different C code depending on
> whether 4
> or 2 byte unicode is used? If so, then that would mean an installation
> of pyrex as well for these people...
I tried to compile from source on Mandriva, and it failed. I had no
time to investigate (low priority for the task I was working on), it
could very well have been something very trivial.
Yours,
---------
Georges Racinet Nuxeo SAS
gracinet at nuxeo.com http://nuxeo.com
Tel: +33 (0) 1 40 33 71 73
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 12:05:51 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 12:05:51 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CDD061.1020106@infrae.com>
References: <44CDD061.1020106@infrae.com>
Message-ID: <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de>
Hi Martijn,
Martijn Faassen wrote:
> I just found out that there is a hidden incompatibility in the compiled
> versions of lxml eggs we provide, at least in linux. Our provided
> versions are compiled with a Python that has 4 bytes unicode support
> (probably the default on ubuntu on which I built the 2.4 extension).
AFAIK, UCS4 is the default on most (though maybe not all) Python
desktop/server installations under Linux, including SuSE, Redhat and
(apparently) Debian/Ubuntu. Distributors tend to care more about broad support
for all possible use cases than about memory requirements.
> If you try to install such an egg on a machine where unicode support is
> compiled with 2 bytes only, it'll fail with errors such as:
>
> ImportError:
> /usr/local/lib/python2.4/site-packages/lxml-1.0.2-py2.4-linux-i686.egg/lxml/etree.so:
> undefined symbol: PyUnicodeUCS4_FromEncodedObject
Sure. These cannot be compatible in current CPython (and that's highly
unlikely to change).
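
For a FAQ entry, the mismatch can be diagnosed from the error text alone. The sketch below extracts the unicode width an extension was built for from the missing symbol name and compares it with the running interpreter; the function names are made up for this example. Note that `sys.maxunicode` distinguishes narrow and wide builds only on interpreters of that era; from Python 3.3 on it is always 0x10FFFF.

```python
import re
import sys


def abi_from_symbol(message):
    """Return 'UCS2' or 'UCS4' if the message names a PyUnicodeUCS* symbol."""
    match = re.search(r"PyUnicode(UCS[24])_", message)
    return match.group(1) if match else None


def interpreter_abi():
    """Unicode width of the running interpreter, judged by sys.maxunicode."""
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


def explain(message):
    """Turn an ImportError message into a short diagnosis."""
    wanted = abi_from_symbol(message)
    if wanted is None:
        return "not a unicode ABI problem"
    have = interpreter_abi()
    if wanted == have:
        return "extension and interpreter both use %s" % have
    return "extension needs %s, interpreter provides %s: rebuild from source" % (
        wanted,
        have,
    )


print(explain("undefined symbol: PyUnicodeUCS4_FromEncodedObject"))
```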
> I wonder whether there's anything within the egg distribution mechanism
> that lets us distinguish between such platforms. If not, I wonder what
> to do instead -- the simplest would be to add a FAQ entry and tell
> people to recompile from the sources.
I wouldn't know any way egg naming could help here. Google yields some
discussions about this topic on the distutils list, but it seems they have not
made their way into either distutils or setuptools.
http://mail.python.org/pipermail/distutils-sig/2005-October/005222.html
Anyway, if you have to recompile your Python version to get UCS2 strings,
there's no reason not to require the same for the C extensions.
Given the fact that all major distributions seem to use UCS4, a FAQ entry
should be enough.
> By the way, does Pyrex generate different C code depending on whether 4
> or 2 byte unicode is used? If so, then that would mean an installation
> of pyrex as well for these people...
No, the distinction between different unicode encodings is handled completely
inside the Python interpreter. The C code is not affected and Pyrex does not
rely on it.
To support parsing from unicode, lxml even has generic run-time support code
to detect the internal unicode encoding, which should work for any encoding
supported by libxml2/libiconv.
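
The general idea of such a run-time check can be sketched in Python as follows. This is not lxml's actual code, only an assumption about how code unit width and byte order combine into an encoding name; `internal_unicode_codec` is a made-up name, and on Python 3.3 and later the result reflects only the nominal width, not the real in-memory layout.

```python
import sys


def internal_unicode_codec():
    """Guess a codec name and code unit width for the interpreter's unicode."""
    width = 4 if sys.maxunicode > 0xFFFF else 2
    family = "utf-32" if width == 4 else "utf-16"
    order = "le" if sys.byteorder == "little" else "be"
    return "%s-%s" % (family, order), width


codec, width = internal_unicode_codec()
encoded = "abc".encode(codec)
print(codec, len(encoded))
```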
Stefan
From faassen at infrae.com Mon Jul 31 12:59:54 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 31 Jul 2006 12:59:54 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <436357DF-BAE8-4E96-BF77-8A089BABCD54@nuxeo.com>
References: <44CDD061.1020106@infrae.com>
<436357DF-BAE8-4E96-BF77-8A089BABCD54@nuxeo.com>
Message-ID: <44CDE2AA.6040701@infrae.com>
Georges Racinet wrote:
>
> On Jul 31, 2006, at 11:41 AM, Martijn Faassen wrote:
>
>> Hi there,
>>
>> I just found out that there is a hidden incompatibility in the compiled
>> versions of lxml eggs we provide, at least in linux. Our provided
>> versions are compiled with a Python that has 4 bytes unicode support
>> (probably the default on ubuntu on which I built the 2.4 extension).
>
> Noticed that last week, too. Sorry I forgot to mention it over there.
What platform were you on when you noticed this? Mandriva (as you
mention below)?
[snip]
> As far as I know, this is typical of the Ubuntu distribution, and I'm
> 100% sure this egg was laid from Ubuntu. If the egg system could make a
> difference between distributions, it would be ok, imho.
I think Red Hat has been compiling Python with 4 bytes characters for
ages too, so while this was Ubuntu (I did it), I'm also pretty sure it's
also the case on Fedora.
> Charset problems are a plague.
This is not your common charset problem. Mostly one can avoid the
plague by just using unicode, but that's what we're doing here...
>> By the way, does Pyrex generate different C code depending on whether 4
>> or 2 byte unicode is used? If so, then that would mean an installation
>> of pyrex as well for these people...
>
> I tried to compile from source on Mandriva, and it failed. I had no time
> to investigate (low priority for the task I was working on), it could
> very well have been something very trivial.
Interesting; let us know if you find out more.
It's important to have the lxml C sources compile on all platforms, as
otherwise people will be forced to use Pyrex, possibly even the forked
version of Pyrex Stefan is maintaining.
Regards,
Martijn
From faassen at infrae.com Mon Jul 31 13:03:03 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 31 Jul 2006 13:03:03 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de>
References: <44CDD061.1020106@infrae.com>
<44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de>
Message-ID: <44CDE367.50109@infrae.com>
Stefan Behnel wrote:
[snip]
> Anyway, if you have to recompile your Python version to get UCS2 strings,
> there's no reason not to require the same for the C extensions.
Ah, so current CPython sources build with 4 byte unicode by default? If
this is for sure, then we're fairly safe. If not, then I wonder what to
do - you'd like lxml to work with hand-compiled Pythons..
> Given the fact that all major distributions seem to use UCS4, a FAQ entry
> should be enough.
It definitely is encouraging.
>> By the way, does Pyrex generate different C code depending on whether 4
>> or 2 byte unicode is used? If so, then that would mean an installation
>> of pyrex as well for these people...
>
> No, the distinction between different unicode encodings is handled completely
> inside the Python interpreter. The C code is not affected and Pyrex does not
> rely on it.
Good, that's what I was hoping for. That at least means people should be
able to recompile without installing Pyrex first.
> To support parsing from unicode, lxml even has generic run-time support code
> to detect the internal unicode encoding, which should work for any encoding
> supported by libxml2/libiconv.
Cool!
Regards,
Martijn
From tseaver at palladion.com Mon Jul 31 16:37:07 2006
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 31 Jul 2006 10:37:07 -0400
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CDE367.50109@infrae.com>
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de>
<44CDE367.50109@infrae.com>
Message-ID: <44CE1593.70902@palladion.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martijn Faassen wrote:
> Ah, so current CPython sources build with 4-byte unicode by default? If
> this is for sure, then we're fairly safe. If not, then I wonder what to
> do - you'd like lxml to work with hand-compiled Pythons.
Nope. The distros all pass '--enable-unicode=ucs4' to configure.
The default value for that option is 'yes', which maps to 'ucs2' unless
you also have a ucs4-enabled TCL.
Tres.
- --
===================================================================
Tres Seaver +1 202-558-7113 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEzhWT+gerLs4ltQ4RAprdAKDatD1WDYW+wKzJlLZYra0OGXcxLACeJSDs
CGNZmUnpCDiYbPuF9lwNO00=
=8iYC
-----END PGP SIGNATURE-----
From tseaver at palladion.com Mon Jul 31 16:44:24 2006
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 31 Jul 2006 10:44:24 -0400
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CE1593.70902@palladion.com>
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com>
<44CE1593.70902@palladion.com>
Message-ID:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Tres Seaver wrote:
> Martijn Faassen wrote:
>
>>> Ah, so current CPython sources build with 4-byte unicode by default? If
>>> this is for sure, then we're fairly safe. If not, then I wonder what to
>>> do - you'd like lxml to work with hand-compiled Pythons.
>
> Nope. The distros all pass '--enable-unicode=ucs4' to configure.
> The default value for that option is 'yes', which maps to 'ucs2' unless
> you also have a ucs4-enabled TCL.
>
>
> Tres.
> --
> ===================================================================
> Tres Seaver +1 202-558-7113 tseaver at palladion.com
> Palladion Software "Excellence by Design" http://palladion.com
Perhaps we could use the following test inside 'setup.py', and modify
the name of the binary egg to include the 'ucs2' vs. 'ucs4' flag?::
ucs_flag = sys.maxunicode > 65536 and 'ucs4' or 'ucs2'
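
For readers who want to try the check, here is a minimal sketch of the same test written as a small function. The function name and its parameter are hypothetical and only there to make the logic testable; how the flag would be worked into the egg name is not shown, since that depends on setuptools behaviour not covered here. Note that on Python 3.3 and later sys.maxunicode is always 0x10FFFF, so the distinction only matters on older interpreters.

```python
import sys


def ucs_flag(maxunicode=sys.maxunicode):
    """Return 'ucs2' for narrow unicode builds and 'ucs4' for wide ones.

    Hypothetical helper: the parameter defaults to the running
    interpreter's value but can be overridden for testing.
    """
    # Narrow builds report 0xFFFF, wide builds report 0x10FFFF.
    return "ucs4" if maxunicode > 0xFFFF else "ucs2"


print(ucs_flag())
```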
Tres.
- --
===================================================================
Tres Seaver +1 202-558-7113 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEzhdI+gerLs4ltQ4RAtQHAKDaSQm9mvJDj+oGUQJZOgHjdENnagCgh0gZ
qQ9dwzju5C7s9KIlJVOJsVs=
=qiPy
-----END PGP SIGNATURE-----
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 18:15:57 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 18:15:57 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To:
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com>
Message-ID: <44CE2CBD.103@gkec.informatik.tu-darmstadt.de>
Tres Seaver wrote:
> Tres Seaver wrote:
>>> Martijn Faassen wrote:
>>>
>>>>> Ah, so current CPython sources build with 4-byte unicode by default? If
>>>>> this is for sure, then we're fairly safe. If not, then I wonder what to
>>>>> do - you'd like lxml to work with hand-compiled Pythons.
>>>
>>> Nope. The distros all pass '--enable-unicode=ucs4' to configure.
>>> The default value for that option is 'yes', which maps to 'ucs2' unless
>>> you also have a ucs4-enabled TCL.
Right, that's what I witness, too.
> Perhaps we could use the following test inside 'setup.py', and modify
> the name of the binary egg to include the 'ucs2' vs. 'ucs4' flag?::
>
> ucs_flag = sys.maxunicode > 65536 and 'ucs4' or 'ucs2'
While that's nice to have, it doesn't really help us as a) we'd still have to
build and ship both eggs (while the current UCS4 eggs seem to fit most users)
and b) easy_install doesn't currently handle these extensions, so it would
most likely just stop finding the eggs on cheeseshop if we added additional
sections to the egg name.
I still think it's enough to add a FAQ entry (which I already did) and
otherwise ignore the problem for now. That way, the major distros are
supported out-of-the-box. And for those who happen to use a UCS2 system, it's
really not a big deal to build lxml from sources on a fairly recent and well
installed Linux system.
Stefan
From faassen at infrae.com Mon Jul 31 20:09:20 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Mon, 31 Jul 2006 20:09:20 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CE2CBD.103@gkec.informatik.tu-darmstadt.de>
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com>
<44CE2CBD.103@gkec.informatik.tu-darmstadt.de>
Message-ID: <44CE4750.9010607@infrae.com>
Stefan Behnel wrote:
> Tres Seaver wrote:
[snip]
>> Perhaps we could use the following test inside 'setup.py', and modify
>> the name of the binary egg to include the 'ucs2' vs. 'ucs4' flag?::
>>
>> ucs_flag = sys.maxunicode > 65536 and 'ucs4' or 'ucs2'
>
> While that's nice to have, it doesn't really help us as a) we'd still have to
> build and ship both eggs (while the current UCS4 eggs seem to fit most users)
There'd be a significant number of people who just build Python by hand
though, and they can't use our eggs...
[snip]
> I still think it's enough to add a FAQ entry (which I already did) and
> otherwise ignore the problem for now. That way, the major distros are
> supported out-of-the-box. And for those who happen to use a UCS2 system, it's
> really not a big deal to build lxml from sources on a fairly recent and well
> installed Linux system.
I agree that's all we can do on the lxml side.
Apart from that, we can also talk to the distutils/setuptools people and
raise this issue again. It's a fundamental problem with binary eggs that
use unicode as long as Python ships with this configuration option. I'll
send off a mail on this to the distutils SIG.
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 20:10:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 20:10:02 +0200
Subject: [lxml-dev] lxml - exslt - regexp:match()
In-Reply-To: <149473834@web.de>
References: <149473834@web.de>
Message-ID: <44CE477A.1030000@gkec.informatik.tu-darmstadt.de>
Michael Zeidler wrote:
> regexp:match('123abc567','([0-9]+)([a-z]+)([0-9]+)') gibt kein Array mit den gematchten
> Gruppen zurück:
> Wenn ich also mit
>
> die Variable $test setze, müsste ich mit $test[0], $test[1], usw. auf die gematchten Gruppen
> zugreifen können. Siehe http://www.exslt.org/regexp/functions/match/index.html
>
> [translation]:
>
> regexp:match('123abc567','([0-9]+)([a-z]+)([0-9]+)')
>
> does not return an array containing the matched groups. Something like this
>
> should allow me to ask for "$test[0]" etc.
>
> See http://www.exslt.org/regexp/functions/match/index.html
Hmm, interesting. The page doesn't actually say that this is supposed to work.
All they provide is an example with a /single/ group. The result of your test
case is not defined.
For comparison, I now implemented the examples from the page as unit tests,
which sadly showed that Python's regexps are incompatible with what EXSLT
requires. The Python RE "([a-z])+ " does not capture "test " as EXSLT does; only
the last "t" is returned for the group by re.findall(). So we can't claim
compatibility with EXSLT at this point. -- Note, though, that I never really
said it was compatible, it just builds on Python's re module. I still think
that's enough for a Python XML library.
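
The difference can be reproduced with Python's re module alone. The following is a minimal sketch (the variable names are only illustrative, and it says nothing about how lxml wires this into XPath): a repeated group keeps only its last repetition, so re.findall() reports the final letter rather than the whole word.

```python
import re

pattern = re.compile(r"([a-z])+ ")

# With exactly one group in the pattern, findall() returns that group's
# content for each match. A repeated group only remembers its last
# repetition, so the word itself is not returned.
groups = pattern.findall("test ")
print(groups)

# The overall match still covers the whole word plus the trailing space.
whole = pattern.match("test ").group(0)
print(repr(whole))
```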
That said, I fixed your use case in the current trunk, as I think it makes
sense to expect the result above from such a call. Note, however, that EXSLT
dictates that the first element in a non-global RE result (without 'g' flag)
must be the entire string that matched, which even fits the semantics of the
group() method in Python's MatchObjects. So your $test[0] will contain
"123abc567", $test[1] is "123" etc.
Stefan
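
The result ordering described in this reply, with the entire match first and the sub-groups after it, mirrors what Python's match objects provide. A small sketch in plain Python, assuming only the standard re module and not lxml's actual implementation:

```python
import re

m = re.match(r"([0-9]+)([a-z]+)([0-9]+)", "123abc567")

# group(0) is the entire matched string; group(1) .. group(n) are the
# parenthesised sub-groups in order of their opening parenthesis.
result = [m.group(i) for i in range(m.lastindex + 1)]
print(result)
```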
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Jul 31 20:15:31 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 31 Jul 2006 20:15:31 +0200
Subject: [lxml-dev] lxml eggs and unicode strings
In-Reply-To: <44CE4750.9010607@infrae.com>
References: <44CDD061.1020106@infrae.com> <44CDD5FF.9060908@gkec.informatik.tu-darmstadt.de> <44CDE367.50109@infrae.com> <44CE1593.70902@palladion.com>
<44CE2CBD.103@gkec.informatik.tu-darmstadt.de>
<44CE4750.9010607@infrae.com>
Message-ID: <44CE48C3.9030305@gkec.informatik.tu-darmstadt.de>
Martijn Faassen wrote:
> Apart from that, we can also talk to the distutils/setuptools people and
> raise this issue again. It's a fundamental problem with binary eggs that
> use unicode as long as Python ships with this configuration option. I'll
> send off a mail on this to the distutils SIG.
Good idea. Thanks for taking care of it.
It may no longer fit into the 2.5 time frame, but it's still a problem that
needs to be solved some time...
Stefan
From agustin.villena at gmail.com Mon Jul 31 18:53:24 2006
From: agustin.villena at gmail.com (=?ISO-8859-1?Q?Agust=EDn_Villena?=)
Date: Mon, 31 Jul 2006 12:53:24 -0400
Subject: [lxml-dev] An intriguing behaviour of xpath in lxml
Message-ID:
Hi!
Just a question.
Assume the following code:
from lxml import etree
from StringIO import StringIO
xmlText = "This is a test"
doc = etree.parse(StringIO(xmlText))
root = doc.xpath("/")
The last line throws the following exception:
Not yet implemented result node type: 9
Traceback (most recent call last):
  File "C:\Archivos de programa\ActiveState Komodo 3.5\lib\support\dbgp\pythonlib\dbgp\client.py", line 1843, in runMain
    self.dbg.runfile(debug_args[0], debug_args)
  File "C:\Archivos de programa\ActiveState Komodo 3.5\lib\support\dbgp\pythonlib\dbgp\client.py", line 1538, in runfile
    h_execfile(file, args, module=main, tracer=self)
  File "C:\Archivos de programa\ActiveState Komodo 3.5\lib\support\dbgp\pythonlib\dbgp\client.py", line 596, in __init__
    execfile(file, globals, locals)
  File "C:\dev\projects\python\python-xpath-example.py", line 7, in __main__
    root = doc.xpath("/")
  File "c:\python24\lib\site-packages\lxml\etree.pyx", line 485, in etree._ElementTree.xpath
  File "c:\python24\lib\site-packages\lxml\xpath.pxi", line 75, in etree._XPathEvaluatorBase.evaluate
  File "c:\python24\lib\site-packages\lxml\xpath.pxi", line 212, in etree.XPathDocumentEvaluator.__call__
  File "c:\python24\lib\site-packages\lxml\xpath.pxi", line 108, in etree._XPathEvaluatorBase._handle_result
  File "c:\python24\lib\site-packages\lxml\extensions.pxi", line 269, in etree._unwrapXPathObject
  File "c:\python24\lib\site-packages\lxml\extensions.pxi", line 317, in etree._createNodeSetResult
NotImplementedError
My question is: what is the reason behind this behaviour (if there is
one)?
(I already know that xpath(".") on the document node works, but it is
beyond my understanding why xpath("/") is not implemented.)
Cheers
Agustin