From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Oct 2 11:35:43 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 02 Oct 2006 11:35:43 +0200
Subject: [lxml-dev] lxml replace() deletes tail
In-Reply-To: <451D35D7.4040306@openplans.org>
References: <451D35D7.4040306@openplans.org>
Message-ID: <4520DD6F.7090206@gkec.informatik.tu-darmstadt.de>
Hi,
Chris Abraham wrote:
> We have a question about the etree.replace() function. We found that it
> doesn't preserve the tail text from the replaced node when inserting a
> new node. Perhaps this is the intended behavior, but, to us, it was
> unexpected. In the example below, notice how the "tail" text is deleted
> when the
is replaced:
>
>>>> tree = etree.HTML("
text
> beforetextin
tail")
>>>> newel = etree.HTML("new
")
>>>> tree[0].replace(tree[0][0], newel[0][0])
>>>> etree.tostring(tree)
> 'text beforenew
'
That *is* the expected behaviour. :)
When you replace the element "textin
tail" with the element
"new
" you get "new
".
Note that the tail is a property of the element, so it would rather be
unexpected if the replaced element copied its own tail over to the new element.
You can always copy the tail from the original element by hand, in case you
need to.
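A minimal sketch of copying the tail by hand, written against a current lxml release. The markup is made up for illustration (an assumption, not the poster's original document), and the new element is built with etree.Element rather than parsed from HTML:

```python
from lxml import etree

# Illustrative markup only; not the document from the original post.
root = etree.fromstring("<div>text before<p>textin</p>tail</div>")
old = root[0]

new = etree.Element("p")
new.text = "new"

# The tail belongs to the element being replaced, so carry it over by hand
# before swapping the elements.
new.tail = old.tail
root.replace(old, new)

result = etree.tostring(root)
print(result)
```

Setting the tail before calling replace() keeps the text that follows the element in the parent; leaving that line out reproduces the behaviour described above.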
Stefan
From faassen at infrae.com Tue Oct 3 15:22:21 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Tue, 03 Oct 2006 15:22:21 +0200
Subject: [lxml-dev] a buildout for lxml
In-Reply-To: <451BAEF8.1020101@infrae.com>
References: <451BAEF8.1020101@infrae.com>
Message-ID: <4522640D.1040105@infrae.com>
Hey,
I've just expanded on this and added some more explanation on my weblog,
here:
http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Oct 4 09:14:38 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 04 Oct 2006 09:14:38 +0200
Subject: [lxml-dev] a buildout for lxml
In-Reply-To: <4522640D.1040105@infrae.com>
References: <451BAEF8.1020101@infrae.com> <4522640D.1040105@infrae.com>
Message-ID: <45235F5E.20307@gkec.informatik.tu-darmstadt.de>
Martijn Faassen wrote:
> I've just expanded on this and added some more explanation on my weblog,
> here:
>
> http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0
Thanks, Martijn.
Sounds pretty cool, but I'll have to see when I find the time to look into it.
Stefan
From sidnei at enfoldsystems.com Wed Oct 4 15:02:12 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Wed, 4 Oct 2006 10:02:12 -0300
Subject: [lxml-dev] a buildout for lxml
In-Reply-To: <45235F5E.20307@gkec.informatik.tu-darmstadt.de>
References: <451BAEF8.1020101@infrae.com> <4522640D.1040105@infrae.com>
<45235F5E.20307@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061004130212.GH4262@cotia>
On Wed, Oct 04, 2006 at 09:14:38AM +0200, Stefan Behnel wrote:
| Martijn Faassen wrote:
| > I've just expanded on this and added some more explanation on my weblog,
| > here:
| >
| > http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0
Hey Martijn,
Would you like to add the buildout to the buildbot so it gets tested too?
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From faassen at infrae.com Wed Oct 4 17:33:03 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Wed, 04 Oct 2006 17:33:03 +0200
Subject: [lxml-dev] a buildout for lxml
In-Reply-To: <20061004130212.GH4262@cotia>
References: <451BAEF8.1020101@infrae.com>
<4522640D.1040105@infrae.com> <45235F5E.20307@gkec.informatik.tu-darmstadt.de>
<20061004130212.GH4262@cotia>
Message-ID: <4523D42F.6030401@infrae.com>
Sidnei da Silva wrote:
> On Wed, Oct 04, 2006 at 09:14:38AM +0200, Stefan Behnel wrote:
> | Martijn Faassen wrote:
> | > I've just expanded on this and added some more explanation on my weblog,
> | > here:
> | >
> | > http://faassen.n--tree.net/blog/view/weblog/2006/10/03/0
>
> Hey Martijn,
>
> Would you like to add the buildout to the buildbot so it gets tested too?
I've no idea how to do that and don't really have time to learn
buildbot, but if someone can do that then it might be a nice way to test
a number of versions of lxml against a number of versions of
libxml2/libxslt automatically. It'd essentially just take a bunch of
different buildout.cfg files with some download URLs and version numbers
changed.
Regards,
Martijn
From sidnei at enfoldsystems.com Wed Oct 4 20:31:57 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Wed, 4 Oct 2006 15:31:57 -0300
Subject: [lxml-dev] a buildout for lxml
In-Reply-To: <4523D42F.6030401@infrae.com>
References: <451BAEF8.1020101@infrae.com> <4522640D.1040105@infrae.com>
<45235F5E.20307@gkec.informatik.tu-darmstadt.de>
<20061004130212.GH4262@cotia> <4523D42F.6030401@infrae.com>
Message-ID: <20061004183157.GG4164@cotia>
On Wed, Oct 04, 2006 at 05:33:03PM +0200, Martijn Faassen wrote:
| I've no idea how to do that and don't really have time to learn
| buildbot, but if someone can do that then it might be a nice way to test
| a number of versions of lxml against a number of versions of
| libxml2/libxslt automatically.
I'm not asking you to do that. I'm asking if you want to see it
done. :)
| It'd essentially just take a bunch of
| different buildout.cfg files with some download URLs and version numbers
| changed.
Sounds good. I will take a look at that.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From Holger.Joukl at LBBW.de Thu Oct 5 14:03:50 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Thu, 5 Oct 2006 14:03:50 +0200
Subject: [lxml-dev] [objectify] StringElement: Implementing string
methods, revisited
In-Reply-To: <451AA12F.2040801@gkec.informatik.tu-darmstadt.de>
Message-ID:
Hi,
Stefan Behnel wrote on 27.09.2006 18:05:03:
>
> Maintenance is not my main concern. The problem is that we provide an
> incomplete interface here, so it's "kinda compatible, but not quite", which I
> consider worse than "no string methods there". I fear that the choice of
> methods may look too arbitrary to understand.
>
> But as I said, feel free to convince me.
>
> Stefan
I've experimented with that some more and came to think you're right.
It's more of a documentation problem than maintenance and it is a lot
more concise to have "wanna use string methods, use .pyval" than
having a bunch of supported and some unsupported string methods.
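As a small illustration of the ".pyval" route, here is a sketch against a current lxml release. The element names are invented for the example:

```python
from lxml import objectify

# Hypothetical document; only the <s> child matters here.
root = objectify.fromstring("<root><s>what</s></root>")

# root.s is an objectify element wrapping the text; .pyval is the plain
# Python value, which carries the complete string interface.
value = root.s.pyval
shouted = value.upper()
print(type(value).__name__, shouted)
```

Going through .pyval means every string method is available, with no question about which ones the element class itself happens to support.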
Greetings,
Holger
The contents of this e-mail are confidential. If you are not the named
addressee or if this transmission has been addressed to you in error,
please notify the sender immediately and then delete this e-mail. Any
unauthorized copying and transmission is forbidden. E-Mail transmission
cannot be guaranteed to be secure. If verification is required, please
request a hard copy version.
From Holger.Joukl at LBBW.de Thu Oct 5 17:23:24 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Thu, 5 Oct 2006 17:23:24 +0200
Subject: [lxml-dev] [objectify] optimization issues
In-Reply-To:
Message-ID:
Hi,
I'm currently running into some optimization issues. Be warned: this
post is rather lengthy...
First some background:
I'm experimenting with a custom objectified datetime class based on Python's
datetime that employs the dateutil.parser module to detect whether some element
value is in a valid datetime format, i.e. the parse function from
dateutil.parser is used to implement the type_check for the PyType type
registry.
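A rough sketch of such a setup, written against a current lxml release rather than lxml 1.1. It is not the poster's code: datetime.strptime stands in for dateutil.parser.parse, and the names parse_iso_date, DateElement and "isodate" are invented for the example:

```python
from datetime import datetime
from lxml import objectify

def parse_iso_date(text):
    # Stand-in for dateutil.parser.parse (an assumption for this sketch).
    # Raises ValueError for anything that is not YYYY-MM-DD.
    return datetime.strptime(text, "%Y-%m-%d")

class DateElement(objectify.ObjectifiedDataElement):
    @property
    def pyval(self):
        # Parsed again on every access, which is the cost discussed here.
        return parse_iso_date(self.text)

# The parse function doubles as the type check: if it raises ValueError,
# objectify moves on to the next registered type.
date_type = objectify.PyType("isodate", parse_iso_date, DateElement)
date_type.register()

root = objectify.fromstring("<root><d>2006-03-03</d><s>what</s></root>")
parsed = root.d.pyval
print(type(root.d).__name__, parsed)
```

Because the check runs whenever objectify has to guess an element's class, and again for each .pyval access, an expensive parser is paid for many times over.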
1)
Invoking this parse method is quite expensive, so I want this to happen
rarely. As I am using "recursive element dumping" as default I found that
for every __str__ call .pyval of the ObjectifiedDataElements in a tree is
accessed, which in turn triggers parsing for my custom datetime class.
As I don't really see a way to avoid this, I propose the introduction of
an additional property "_pyval_repr" that can be overridden in subclasses,
which makes it possible to simply return element.text if getting .pyval
is expensive. Something like:
*** ORIG/lxml-1.1/src/lxml/objectify.pyx Wed Sep 27 09:18:30 2006
--- src/lxml/objectify.pyx Wed Oct 4 11:00:09 2006
***************
*** 484,489 ****
--- 484,493 ----
          def __get__(self):
              return textOf(self._c_node)
+     property _pyval_repr:
+         def __get__(self):
+             return self.pyval
+
      def __str__(self):
          return textOf(self._c_node) or ''
***************
*** 931,938 ****
  cdef object _dump(_Element element, int indent):
      indentstr = " " * indent
!     if hasattr(element, "pyval"):
!         value = element.pyval
      else:
          value = textOf(element._c_node)
      if value and not value.strip():
--- 935,942 ----
  cdef object _dump(_Element element, int indent):
      indentstr = " " * indent
!     if hasattr(element, "_pyval_repr"):
!         value = element._pyval_repr
      else:
          value = textOf(element._c_node)
      if value and not value.strip():
This can substantially speed things up for complicated type_check
routines (in my use case :)
2)
Then, I figured I would reduce the calls to ObjectifiedElement.__str__ in
general. I am using a custom logging module that involves a function that
converts its input arguments to strings, concatenates them and then writes
them out through the logger (which substitutes stdout) if the loglevel of the
caller meets the loglevel set for the output file/stdout.
As the conversion to strings is performed before any loglevel checking,
reversing this order leads to a lot fewer str() calls on the objects. To my
astonishment, things actually slowed down massively, though.
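The difference between the two orderings can be sketched in plain Python. This is not the poster's logging module; Payload, THRESHOLD, log_eager and log_lazy are hypothetical names used only to count str() calls:

```python
calls = []  # one entry per __str__ invocation

class Payload:
    """Stand-in for an object whose __str__ is expensive."""
    def __str__(self):
        calls.append(1)
        return "payload"

THRESHOLD = 20  # hypothetical minimum level that gets written out
written = []

def log_eager(level, *args):
    # Converts first, checks the level afterwards.
    message = "".join(str(a) for a in args)
    if level >= THRESHOLD:
        written.append(message)

def log_lazy(level, *args):
    # Checks the level first and only then pays for str().
    if level >= THRESHOLD:
        written.append("".join(str(a) for a in args))

log_eager(10, "value: ", Payload())
eager_calls = len(calls)
log_lazy(10, "value: ", Payload())
lazy_calls = len(calls) - eager_calls
print(eager_calls, lazy_calls)
```

For a message below the threshold, the eager variant still converts its arguments while the lazy one never touches them.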
I tried to come up with a minimal example of what seems to happen, using
only lxml standard:
Runs slow:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root.i
print root.f
print root.s
print root.d
""" "n = root.i; n = root.f; n = root.s; n = root.d"
17
238.3343
what
2006-03-03
10 loops -> 0.0102 secs
17
238.3343
what
2006-03-03
100 loops -> 0.101 secs
17
238.3343
what
2006-03-03
1000 loops -> 1.02 secs
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
raw times: 1.03 1.02 1.02
1000 loops, best of 3: 1.02 msec per loop
Runs fast:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root
""" "n = root.i; n = root.f; n = root.s; n = root.d"
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
10 loops -> 0.00109 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
100 loops -> 0.00928 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
1000 loops -> 0.0897 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
10000 loops -> 0.905 secs
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
i = 17 [IntElement]
f = 238.33430000000001 [FloatElement]
s = 'what' [StringElement]
d = '2006-03-03' [StringElement]
raw times: 0.893 0.911 0.911
10000 loops, best of 3: 89.3 usec per loop
Recursively outputting root before accessing its child elements
really speeds things up, even though I accessed all elements in
the slow example, too.
Why is this? I'm clueless.
Holger
From faassen at infrae.com Thu Oct 5 18:30:41 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Thu, 05 Oct 2006 18:30:41 +0200
Subject: [lxml-dev] a buildout for lxml
In-Reply-To: <20061004183157.GG4164@cotia>
References: <451BAEF8.1020101@infrae.com>
<4522640D.1040105@infrae.com> <45235F5E.20307@gkec.informatik.tu-darmstadt.de> <20061004130212.GH4262@cotia>
<4523D42F.6030401@infrae.com> <20061004183157.GG4164@cotia>
Message-ID: <45253331.5040706@infrae.com>
Sidnei da Silva wrote:
> On Wed, Oct 04, 2006 at 05:33:03PM +0200, Martijn Faassen wrote:
> | I've no idea how to do that and don't really have time to learn
> | buildbot, but if someone can do that then it might be a nice way to test
> | a number of versions of lxml against a number of versions of
> | libxml2/libxslt automatically.
>
> I'm not asking you to do that. I'm asking if you want to see it
> done. :)
Oh, I misread "would you like to add the buildout to the buildbot?" in
your mail as me actually doing something. :) Of course I'd like it done! :)
> | It'd essentially just take a bunch of
> | different buildout.cfg files with some download URLs and version numbers
> | changed.
>
> Sounds good. I will take a look at that.
Great! Thanks.
Martijn
From achimkern at hirschmanngmbh.com Fri Oct 6 10:40:04 2006
From: achimkern at hirschmanngmbh.com (Achim Kern)
Date: Fri, 06 Oct 2006 10:40:04 +0200
Subject: [lxml-dev] Building Problems
In-Reply-To: <451A9FF6.3010108@gkec.informatik.tu-darmstadt.de> (behnel ml's
message of "27 Sep 2006 15:59:50 UT")
References: <451A9FF6.3010108@gkec.informatik.tu-darmstadt.de>
Message-ID: <87vemxet23.fsf@hirschmanngmbh.com>
Hi Stefan,
thanks for your rapid answer. As I wasn't in the office until today,
I wasn't able to answer. Sorry for this.
behnel_ml at gkec.informatik.tu-darmstadt.de writes:
> Hi Achim,
>
> Achim Kern wrote:
>> while googling for how to write easier XML datastores with Python I
>> just found your project. Especially the objectify modules impressed
>> me. So to test things I wanted to install it. Unfortunately I cannot
>> use the provided Debian package, as there is only one for version 1.03,
>> not including the objectify extension. So I downloaded the source of
>> 1.1.1 from codespeak.net, extracted it, and this is what it did.
>>
>> # tar -xzf lxml-1.1.1.tgz
>> # ce lxml-1.1.1
>> # make clean test.
>> python setup.py build_ext -i
>> Building lxml version 1.1
>
> 1.1? Not 1.1.1?
I tried both, but neither seemed to work for me.
>
>
>> running build_ext
>> python test.py -p -v
>
> You did build it, right? I assume this is a second try after already having
> built it once.
>
I assume not. :-(
> Did you do "make clean" in between? That removes the ".c" files, which means
> you need a special Pyrex version to rebuild it. See "doc/build.txt". If you
> only unpack the tgz and build from that, you should not need Pyrex as the ".c"
> files are included.
>
> Please retry the above with a clean setup and if that still fails, send a
> complete copy of your attempted commands and the resulting output to the list.
>
I tested it with a clean version which I downloaded and it builds like
a dream. I'm really not clear what happened the first time. Maybe it was
because I messed something up with the Debian package which I had
installed.
Sorry for wasting your time.
Regards
Achim
From Holger.Joukl at LBBW.de Fri Oct 6 11:51:16 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Fri, 6 Oct 2006 11:51:16 +0200
Subject: [lxml-dev] [objectify] DataElement factory problem
In-Reply-To:
Message-ID:
Hi,
I ran into a problem using the objectify DataElement factory function.
When implementing an _init method in a derived ObjectifiedDataElement
class, it is impossible to access the element.text in _init because
this has not yet been set when _init gets called by _elementFactory.
Don't see a nice clean way to solve that. Maybe instrument
_elementFactory with an optional skip_init argument that allows for a
delayed manual call of _init in corner cases?
Holger
From Holger.Joukl at LBBW.de Fri Oct 6 17:24:24 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Fri, 6 Oct 2006 17:24:24 +0200
Subject: [lxml-dev] [objectify] optimization issues
In-Reply-To:
Message-ID:
Hi,
as a followup to my last post some more strange observations.
To find out why the call to str(root) aka objectify.dump(root)
speeds up things:
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
objectify.dump(root)
""" "n = root.i; n = root.f; n = root.s; n = root.d"
10 loops -> 0.000898 secs
100 loops -> 0.00887 secs
1000 loops -> 0.0885 secs
10000 loops -> 0.887 secs
raw times: 0.893 0.899 0.903
10000 loops, best of 3: 89.3 usec per loop
I implemented a visit function that does
nothing more than visit every node:
def visit(_Element element not None):
    """Recursively visit every node below an element, doing nothing else.
    """
    _visit(element)

cdef object _visit(_Element element):
    for child in element.iterchildren():
        _visit(child)
But:
/apps/pydev/gcc/3.4.4/bin/python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
objectify.visit(root)
""" "n = root.i; n = root.f; n = root.s; n = root.d"
10 loops -> 0.0104 secs
100 loops -> 0.103 secs
1000 loops -> 1.04 secs
raw times: 1.04 1.02 1.03
1000 loops, best of 3: 1.02 msec per loop
This is actually much slower, again.
Now if I change the visit code to:
def visit(_Element element not None):
    """Recursively visit every node below an element, doing nothing else.
    """
    _visit(element)

cdef object _visit(_Element element):
    element.items() # my only addition
    for child in element.iterchildren():
        _visit(child)
Now it's fast, again:
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
objectify.visit(root)
""" "n = root.i; n = root.f; n = root.s; n = root.d"
10 loops -> 0.000887 secs
100 loops -> 0.0087 secs
1000 loops -> 0.088 secs
10000 loops -> 0.874 secs
raw times: 0.876 0.865 0.87
10000 loops, best of 3: 86.5 usec per loop
All of this because of the additional element.items()???
I'm lost. Hope somebody can point out a serious misunderstanding of mine,
where my systematic testing error lies or come up with an actual
explanation :)
As I'm abroad next week I'll follow up on this Tuesday in a week.
Greetings,
Holger
From sidnei at enfoldsystems.com Fri Oct 6 17:57:33 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Fri, 6 Oct 2006 12:57:33 -0300
Subject: [lxml-dev] 'dist' directory in Pyrex
Message-ID: <20061006155733.GD4491@cotia>
Is it intentional that there's a 'dist' directory in the lxml copy of
Pyrex? I suspect that it shouldn't be there (makes checkout needlessly
long).
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From sidnei at enfoldsystems.com Fri Oct 6 18:10:32 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Fri, 6 Oct 2006 13:10:32 -0300
Subject: [lxml-dev] setup.py won't work with svn 1.4
Message-ID: <20061006161031.GE4491@cotia>
Subversion 1.4 changed the .svn/entries format. setup.py will break.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Oct 9 11:04:00 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 09 Oct 2006 11:04:00 +0200
Subject: [lxml-dev] 'dist' directory in Pyrex
In-Reply-To: <20061006155733.GD4491@cotia>
References: <20061006155733.GD4491@cotia>
Message-ID: <452A1080.1040702@gkec.informatik.tu-darmstadt.de>
Sidnei da Silva wrote:
> Is it intentional that there's a 'dist' directory in the lxml copy of
> Pyrex? I suspect that it shouldn't be there (makes checkout needlessly
> long).
True, just removed the content.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Oct 9 19:21:04 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 09 Oct 2006 19:21:04 +0200
Subject: [lxml-dev] [objectify] optimization issues
In-Reply-To:
References:
Message-ID: <452A8500.4000905@gkec.informatik.tu-darmstadt.de>
Hi Holger,
first of all: please create a new thread for a new topic instead of responding
to an existing message. Most mail clients honour the "in reply to" hint in the
header and sort them into the old thread.
Then: what you observe are most likely GC 'issues'. The thing is: if the
element already exists as a Python object, it is reused, which is much faster
than creating a new one. So in the cases where your code runs faster, you can
assume that the object survived a larger portion of your code without being
re-instantiated.
Especially recursive printing instantiates the entire tree, so if the objects
are not deleted directly afterwards, this has a performance effect on code
that runs afterwards.
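The reuse of existing Python objects can be observed directly. The following sketch uses plain etree from a current lxml release, with a made-up document, and only checks object identity; it does not measure timings:

```python
from lxml import etree

# Made-up document for illustration.
root = etree.fromstring("<root><i>17</i><f>238.3343</f></root>")

# While a reference to the proxy is alive, lxml hands back the same object.
first = root[0]
same_proxy = root[0] is first

# Holding references to every element keeps all proxies alive, much like
# a recursive dump that has just touched the whole tree.
keep_alive = list(root.iter())
all_reused = all(a is b for a, b in zip(keep_alive, root.iter()))
print(same_proxy, all_reused)
```

Once the references are dropped, later accesses may have to build fresh proxy objects, which is the slower path described above.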
Stefan
From cabraham at openplans.org Mon Oct 9 23:41:19 2006
From: cabraham at openplans.org (Chris Abraham)
Date: Mon, 09 Oct 2006 17:41:19 -0400
Subject: [lxml-dev] lxml and html encodings
Message-ID: <452AC1FF.3030403@openplans.org>
Hello,
We are getting some unexpected behavior when processing documents with a
Shift_JIS encoding.
We are trying to serialize an HTML document using an XSLT transform.
Our results don't agree with the FAQ:
http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings.
Please see the comments in the attached demo.py which reads in home.html
and demonstrates our problem.
Any ideas about this? Thanks.
Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo.py
Type: text/x-python
Size: 857 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20061009/40a644a3/attachment.py
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20061009/40a644a3/attachment.html
From ianb at colorstudy.com Wed Oct 11 00:16:51 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 10 Oct 2006 17:16:51 -0500
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <452AC1FF.3030403@openplans.org>
References: <452AC1FF.3030403@openplans.org>
Message-ID: <452C1BD3.3050804@colorstudy.com>
Chris Abraham wrote:
> Hello,
> We are getting some unexpected behavior when processing documents with a
> Shift_JIS encoding.
> We are trying to serialize an HTML document using an XSLT transform.
> Our results don't agree with the FAQ:
> http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings.
> Please see the comments in the attached demo.py which reads in home.html
> and demonstrates our problem.
Does etree.HTML() pay any attention to <meta ... content="text/html;
charset=Shift_JIS">? I notice it generates that tag (through the XSL I
assume), but the parser doesn't necessarily have the same logic.
I think for HTML it is better if the encoding is determined before
parsing, as there's several types of information that come into play. I
think the FAQ entry doesn't really apply here, since it isn't really
XML. This library probably has the best rules for determining encoding:
http://chardet.feedparser.org/
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
From ianb at colorstudy.com Wed Oct 11 00:22:29 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 10 Oct 2006 17:22:29 -0500
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <452C1BD3.3050804@colorstudy.com>
References: <452AC1FF.3030403@openplans.org> <452C1BD3.3050804@colorstudy.com>
Message-ID: <452C1D25.4050403@colorstudy.com>
Ian Bicking wrote:
> I think for HTML it is better if the encoding is determined before
> parsing, as there's several types of information that come into play. I
> think the FAQ entry doesn't really apply here, since it isn't really
> XML. This library probably has the best rules for determining encoding:
> http://chardet.feedparser.org/
Actually, now that I look at this library it's probably more clever than
necessary. Generally there should be good encoding information already
present in the request, and you don't need heuristics like this to
figure it out. Nevertheless, you should probably figure out decoding
early, before parsing. To figure out the encoding specified in the
<meta> tag, you should probably just use a regular expression (since you
can't very well parse it to figure out how to decode it before you pass
it to the parser).
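A minimal sketch of that regular-expression approach, using only the standard library. The pattern, the sniff_charset helper and the sample page are assumptions made for illustration; it is a heuristic, not a full implementation of any encoding-detection rules:

```python
import re

# Looks for charset=... inside a <meta ...> tag; a heuristic, not an HTML parser.
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)

def sniff_charset(raw, default="utf-8"):
    """Return the charset named in a <meta> tag, or the default."""
    match = META_CHARSET.search(raw[:4096])
    return match.group(1).decode("ascii") if match else default

# Sample page (made up for illustration), encoded as Shift_JIS.
page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=Shift_JIS"></head>'
    "<body>\u65e5\u672c\u8a9e</body></html>"
)
raw = page.encode("shift_jis")

charset = sniff_charset(raw)
text = raw.decode(charset)
print(charset, text == page)
```

The decoded text can then be handed to the parser. An encoding supplied by the transport (for example an HTTP header) would normally take precedence over the sniffed value.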
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
From ltucker at openplans.org Wed Oct 11 16:10:21 2006
From: ltucker at openplans.org (Luke Tucker)
Date: Wed, 11 Oct 2006 10:10:21 -0400
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <452C1BD3.3050804@colorstudy.com>
References: <452AC1FF.3030403@openplans.org> <452C1BD3.3050804@colorstudy.com>
Message-ID: <1160575821.19594.65.camel@ltucker.openplans.org>
> Does etree.HTML() pay any attention to <meta ... content="text/html; charset=Shift_JIS"> ?
[...]
> I think for HTML it is better if the encoding is determined before
> parsing, as there's several types of information that come into play. I
> think the FAQ entry doesn't really apply here, since it isn't really
> XML.
[...]
I'm not certain. The FAQ entry says that using HTML unicode strings
with charset meta tags also does not work. I thought that meant parsing
via etree.HTML(). We can certainly extract the encoding and decode to a
unicode string before calling the parser, but it seemed like we ought to
get some clarification on the intended behavior as well.
- Luke
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Oct 12 18:48:54 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 12 Oct 2006 18:48:54 +0200
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <452AC1FF.3030403@openplans.org>
References: <452AC1FF.3030403@openplans.org>
Message-ID: <452E71F6.6080803@gkec.informatik.tu-darmstadt.de>
Hi,
Chris Abraham wrote:
> We are getting some unexpected behavior when processing documents with a
> Shift_JIS encoding.
> We are trying to serialize an HTML document using an XSLT transform.
> Our results don't agree with the FAQ:
> http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings.
> Please see the comments in the attached demo.py which reads in home.html
> and demonstrates our problem.
I looked into it and found that the behaviour of the libxml2 parser depends on
the position of the <meta> tag. Your HTML is pretty broken in many regards.
However, when you move the <meta> tag within <head> and before any text
(especially before the <title> tag), it is treated correctly.
I attached a modified HTML file that parses nicely and serialises into UTF-8.
So, the right place to ask this question is on the libxml2 mailing list, not
on the lxml mailing list.
Stefan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20061012/cc9f832d/attachment.html
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Oct 12 19:10:31 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 12 Oct 2006 19:10:31 +0200
Subject: [lxml-dev] [objectify] DataElement factory problem
In-Reply-To:
References:
Message-ID: <452E7707.6020804@gkec.informatik.tu-darmstadt.de>
Hi Holger,
Holger Joukl wrote:
> I ran into a problem using the objectify DataElement factory function.
> When implementing an _init method in a derived ObjectifiedDataElement
> class, it is impossible to access the element.text in _init because
> this has not yet been set when _init gets called by _elementFactory.
True, that's a problem.
> Don't see a nice clean way to solve that. Maybe instrument
> _elementFactory with an optional skip_init argument that allows for a
> delayed manual call of _init in corner cases?
Not a good idea, as it is rarely used.
A while ago I already thought about adding a public C-API function for
creating elements that takes all necessary parameters, including the text
content. I think that's the cleanest solution.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Oct 13 14:45:35 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 13 Oct 2006 14:45:35 +0200
Subject: [lxml-dev] [objectify] DataElement factory problem
In-Reply-To:
References:
Message-ID: <452F8A6F.6090507@gkec.informatik.tu-darmstadt.de>
Hi again,
Holger Joukl wrote:
> I ran into a problem using the objectify DataElement factory function.
> When implementing an _init method in a derived ObjectifiedDataElement
> class, it is impossible to access the element.text in _init because
> this has not yet been set when _init gets called by _elementFactory.
etree's C-API now has a new makeElement() function that creates an _Element
in one step with everything it can carry: attributes, text, tail and a
prefix mapping, either within an existing _Document or in a newly created
one. Objectify uses it to overcome the above problem.
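Seen from Python, the effect is that a data element comes back from the
factory with its text already set. A minimal sketch, assuming a current lxml
on Python 3 (the value 17 is only an illustration; makeElement() itself is a
C-level function and is not called directly here):

```python
from lxml import objectify

# DataElement builds a typed data element from a Python value.
el = objectify.DataElement(17)

# The text content is in place as soon as the factory returns.
text = el.text
value = el.pyval
```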
Stefan
From cabraham at openplans.org Tue Oct 17 16:38:59 2006
From: cabraham at openplans.org (Chris Abraham)
Date: Tue, 17 Oct 2006 10:38:59 -0400
Subject: [lxml-dev] problem with lxml and copy.deepcopy
Message-ID: <4534EB03.6030402@openplans.org>
Hi,
I'm having a problem with performing a copy.deepcopy on a list of
elements. I'm finding the etree._Comment elements get turned into
None. Please see my test case:
>>> a = etree.HTML('hi
')
>>> b = a.xpath('//body/child::node()')
>>> b
[, ' ', , ' ']
>>> import copy
>>> c = copy.deepcopy(b)
>>> c
[, ' ', None, ' ']
BTW, I'm using the CVS HEAD version of libxml2.
Any ideas? Thanks,
Chris
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Oct 17 19:12:43 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 17 Oct 2006 19:12:43 +0200
Subject: [lxml-dev] problem with lxml and copy.deepcopy
In-Reply-To: <4534EB03.6030402@openplans.org>
References: <4534EB03.6030402@openplans.org>
Message-ID: <45350F0B.4020000@gkec.informatik.tu-darmstadt.de>
Hi,
Chris Abraham wrote:
> I'm having a problem with performing a copy.deepcopy on a list of
> elements. I'm finding the etree._Comment elements get turned into
> None.
Verified, thanks for reporting this. It's easy to reproduce like this:
a = Comment("ONE")
b = copy.deepcopy(a)
The reason is that we create a new document internally and make the new
element the root node. If it's a comment (or PI), however, libxml2 can't look
it up right away with the normal call for the document root node, so we have
to special case this (rare) use case.
Fixed for 1.1 and trunk.
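A quick check of the fixed behaviour might look like this; a sketch assuming
a current lxml on Python 3, not the test that went into the code base:

```python
import copy
from lxml import etree

a = etree.Comment("ONE")
b = copy.deepcopy(a)

# With the fix in place, the copy is a separate comment node, not None.
copied_text = b.text
```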
Stefan
From cabraham at openplans.org Tue Oct 17 20:46:39 2006
From: cabraham at openplans.org (Chris Abraham)
Date: Tue, 17 Oct 2006 14:46:39 -0400
Subject: [lxml-dev] problem with lxml and copy.deepcopy
In-Reply-To: <45350F0B.4020000@gkec.informatik.tu-darmstadt.de>
References: <4534EB03.6030402@openplans.org>
<45350F0B.4020000@gkec.informatik.tu-darmstadt.de>
Message-ID: <4535250F.6060802@openplans.org>
Stefan,
Thanks. This works well (and seems to have solved other problems I was
having.)
Chris
Stefan Behnel wrote:
> Hi,
>
> Chris Abraham wrote:
>
>> I'm having a problem with performing a copy.deepcopy on a list of
>> elements. I'm finding the etree._Comment elements get turned into
>> None.
>>
>
> Verified, thanks for reporting this. It's easy to reproduce like this:
>
> a = Comment("ONE")
> b = copy.deepcopy(a)
>
>
> The reason is that we create a new document internally and make the new
> element the root node. If it's a comment (or PI), however, libxml2 can't look
> it up right away with the normal call for the document root node, so we have
> to special case this (rare) use case.
>
> Fixed for 1.1 and trunk.
>
> Stefan
>
From cabraham at openplans.org Tue Oct 17 21:47:10 2006
From: cabraham at openplans.org (Chris Abraham)
Date: Tue, 17 Oct 2006 15:47:10 -0400
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <452E71F6.6080803@gkec.informatik.tu-darmstadt.de>
References: <452AC1FF.3030403@openplans.org>
<452E71F6.6080803@gkec.informatik.tu-darmstadt.de>
Message-ID: <4535333E.4080807@openplans.org>
Stefan,
Thanks for this. Who should I contact to get the FAQ updated?
http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings
It states that lxml "will not parse" Python unicode strings that carry
encoding info. But here we see that it does.
Also in the API's specific to lxml:
http://codespeak.net/lxml/api.html
"Similarly, you will get errors when you try the same with HTML data in
a unicode string that specifies a charset in a meta tag of the header.
You should generally avoid converting XML/HTML data to unicode before
passing it into the parsers. It is both slower and error prone."
...just a minor detail but thought it was worth following up on.
Chris
Stefan Behnel wrote:
> Hi,
>
> Chris Abraham wrote:
>
>> We are getting some unexpected behavior when processing documents with a
>> Shift_JIS encoding.
>> We are trying to serialize an HTML document using an XSLT transform.
>> Our results don't agree with the FAQ:
>> http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings.
>> Please see the comments in the attached demo.py which reads in home.html
>> and demonstrates our problem.
>>
>
> I looked into it and found that the behaviour of the libxml2 parser depends on
> the position of the tag. Your HTML is pretty broken in many regards.
> However, when you move the tag within and before any text
> (especially before the tag), it is treated correctly.
>
> I attached a modified HTML file that parses nicely and serialises into UTF-8.
>
> So, the right place to ask this question is on the libxml2 mailing list, not
> on the lxml mailing list.
>
> Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Oct 18 08:51:20 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 18 Oct 2006 08:51:20 +0200
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <4535333E.4080807@openplans.org>
References: <452AC1FF.3030403@openplans.org> <452E71F6.6080803@gkec.informatik.tu-darmstadt.de>
<4535333E.4080807@openplans.org>
Message-ID: <4535CEE8.6090107@gkec.informatik.tu-darmstadt.de>
Hi Chris,
Chris Abraham wrote:
> Thanks for this. Who should I contact to get the FAQ updated?
> http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings
Well, the FAQ isn't really wrong in what it says. In your case, the encoding
information is simply not taken into account as it is in a totally wrong
position. So it's more like the document did not contain any encoding
information at all.
Note that the HTML parser is not guaranteed to create correct HTML that is
'equivalent' to the broken HTML. It just tries its best, which may mean that
some of the original content may get lost. And in this case, it's meta data
that gets lost.
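For comparison, a minimal sketch of the case that does work: the document is
passed as a byte string and the charset declaration sits inside the header,
ahead of any text. It assumes a current lxml on Python 3; the markup and the
explicit encoding="shift_jis" parser argument are illustrative and are not
taken from the attached files.

```python
from lxml import etree

text = "\u30b3\u30df"

# Build a small, well-formed page and encode it as Shift_JIS bytes.
raw = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">'
    '<title>t</title></head>'
    '<body><p>' + text + '</p></body></html>'
).encode("shift_jis")

# Hand the parser bytes (not a decoded string) and name the encoding explicitly.
parser = etree.HTMLParser(encoding="shift_jis")
root = etree.HTML(raw, parser)
para = root.find(".//p")
```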
Stefan
From ltucker at openplans.org Wed Oct 18 17:00:33 2006
From: ltucker at openplans.org (Luke Tucker)
Date: Wed, 18 Oct 2006 11:00:33 -0400
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <4535CEE8.6090107@gkec.informatik.tu-darmstadt.de>
References: <452AC1FF.3030403@openplans.org>
<452E71F6.6080803@gkec.informatik.tu-darmstadt.de>
<4535333E.4080807@openplans.org>
<4535CEE8.6090107@gkec.informatik.tu-darmstadt.de>
Message-ID: <1161183633.19594.103.camel@ltucker.openplans.org>
Hey,
I could be confused, but I think the issue chris is referring
to here might be clouded by the bad HTML in the original
message. Here's some behavior that, to me, doesn't appear to
match up entirely with the FAQ (as far as where errors are
produced) using fixed up HTML.
>>> html = open('home2.html').read()
>>> unicode = html.decode('Shift_JIS')
>>> from lxml import etree
>>> rh = etree.HTML(html)
>>> uh = etree.HTML(unicode)
>>> rh[0][1].text
Traceback (most recent call last):
File "", line 1, in ?
File "etree.pyx", line 859, in etree._Element.text.__get__
File "apihelpers.pxi", line 291, in etree._collectText
File "apihelpers.pxi", line 552, in etree.funicode
UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0:
unexpected code byte
>>> uh[0][1].text
u'\u30b3\u30df'
It looked to me like uh = etree.HTML(unicode) in this case should
produce errors (since it is unicode and contains a proper meta
charset entry) and that rh should behave normally. Apologies if I'm
simply confusing the issue further :)
- Luke
On Wed, 2006-10-18 at 08:51 +0200, Stefan Behnel wrote:
> Hi Chris,
>
> Chris Abraham wrote:
> > Thanks for this. Who should I contact to get the FAQ updated?
> > http://codespeak.net/lxml/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings
>
> Well, the FAQ isn't really wrong in what it says. In your case, the encoding
> information is simply not taken into account as it is in a totally wrong
> position. So it's more like the document did not contain any encoding
> information at all.
>
> Note that the HTML parser is not guaranteed to create correct HTML that is
> 'equivalent' to the broken HTML. It just tries its best, which may mean that
> some of the original content may get lost. And in this case, it's meta data
> that gets lost.
>
> Stefan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20061018/62ea6596/attachment.html
From ianb at colorstudy.com Wed Oct 18 20:31:10 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 18 Oct 2006 13:31:10 -0500
Subject: [lxml-dev] Cheese Shop link
Message-ID: <453672EE.90509@colorstudy.com>
Can you guys add some text like this to the Cheese Shop description:
The in-development version of lxml can be found in the subversion
repository at `http://codespeak.net/svn/lxml/trunk
`_ or installed with
``easy_install lxml==dev``
Thanks.
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
From ianb at colorstudy.com Wed Oct 18 20:41:13 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 18 Oct 2006 13:41:13 -0500
Subject: [lxml-dev] Cheese Shop link
In-Reply-To: <453672EE.90509@colorstudy.com>
References: <453672EE.90509@colorstudy.com>
Message-ID: <45367549.3040608@colorstudy.com>
Ian Bicking wrote:
> Can you guys add some text like this to the Cheese Shop description:
>
> The in-development version of lxml can be found in the subversion
> repository at `http://codespeak.net/svn/lxml/trunk
> `_ or installed with
> ``easy_install lxml==dev``
Maybe a link to
http://codespeak.net/svn/lxml/branch/lxml-1.1#egg=lxml-1.1bugfix would
also be useful here.
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Oct 19 08:59:17 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 19 Oct 2006 08:59:17 +0200
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <1161183633.19594.103.camel@ltucker.openplans.org>
References: <452AC1FF.3030403@openplans.org> <452E71F6.6080803@gkec.informatik.tu-darmstadt.de> <4535333E.4080807@openplans.org> <4535CEE8.6090107@gkec.informatik.tu-darmstadt.de>
<1161183633.19594.103.camel@ltucker.openplans.org>
Message-ID: <45372245.5060502@gkec.informatik.tu-darmstadt.de>
Hi,
Luke Tucker wrote:
> I could be confused, but I think the issue chris is referring
> to here might be clouded by the bad HTML in the original
> message.
Sure, that's why I was referring him to the libxml2 mailing list.
> Here's some behavior that, to me, doesn't appear to
> match up entirely with the FAQ (as far as where errors are
> produced) using fixed up HTML.
>
>>>> html = open('home2.html').read()
>>>> unicode = html.decode('Shift_JIS')
>>>> from lxml import etree
>>>> rh = etree.HTML(html)
>>>> uh = etree.HTML(unicode)
>>>> rh[0][1].text
> Traceback (most recent call last):
> File "", line 1, in ?
> File "etree.pyx", line 859, in etree._Element.text.__get__
> File "apihelpers.pxi", line 291, in etree._collectText
> File "apihelpers.pxi", line 552, in etree.funicode
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0:
> unexpected code byte
>>>> uh[0][1].text
> u'\u30b3\u30df'
>
> It looked to me like uh = etree.HTML(unicode) in this case should
> produce errors (since it is unicode and contains a proper meta
> charset entry) and that rh should behave normally. Apologies if I'm
> simply confusing the issue further :)
Sorry, but your HTML is very broken, too. It has two tags and two
contradictory tags (saying both "us-ascii" and "shift_jis"), so don't
expect libxml2's HTML parser to magically know what you really meant when you
wrote it. That's like saying: Ok, I know this function only works for values
from 1-5, so I'll put in a 99 and complain if it breaks.
If you parse broken HTML and the parser doesn't handle it correctly, the
reason is your broken HTML, really.
If you think libxml2 should be able to parse this kind of non-HTML, please
file a bug on the libxml2 parser. There is nothing lxml can do about it.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Oct 19 09:30:01 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 19 Oct 2006 09:30:01 +0200
Subject: [lxml-dev] Cheese Shop link
In-Reply-To: <45367549.3040608@colorstudy.com>
References: <453672EE.90509@colorstudy.com> <45367549.3040608@colorstudy.com>
Message-ID: <45372979.1020601@gkec.informatik.tu-darmstadt.de>
Hi Ian,
Ian Bicking wrote:
> Ian Bicking wrote:
>> Can you guys add some text like this to the Cheese Shop description:
>>
>> The in-development version of lxml can be found in the subversion
>> repository at `http://codespeak.net/svn/lxml/trunk
>> `_ or installed with
>> ``easy_install lxml==dev``
>
> Maybe a link to
> http://codespeak.net/svn/lxml/branch/lxml-1.1#egg=lxml-1.1bugfix would
> also be useful here.
I just added both, please check out if it works for you.
Stefan
From rwiker at gmail.com Thu Oct 19 10:53:20 2006
From: rwiker at gmail.com (Raymond Wiker)
Date: Thu, 19 Oct 2006 10:53:20 +0200
Subject: [lxml-dev] doctype declarations in XML output
Message-ID: <9cd322050610190153x2289d5f9h30ed870e0583f6c2@mail.gmail.com>
Hi. I've just started using Python 2.5 and lxml.etree, and am very
happy so far - with this combination, I have an easy-to-use XML
solution that includes good XPath and XSLT support.
One question: is it possible to add doctype definitions to the output?
I have not been able to find anything about this in the documentation,
and I cannot see any references to xmlCreateIntSubset or xmlNewDtd in
the source code. These should be trivial to add, as far as I can see.
From ltucker at openplans.org Thu Oct 19 16:48:35 2006
From: ltucker at openplans.org (Luke Tucker)
Date: Thu, 19 Oct 2006 10:48:35 -0400
Subject: [lxml-dev] lxml and html encodings
In-Reply-To: <45372245.5060502@gkec.informatik.tu-darmstadt.de>
References: <452AC1FF.3030403@openplans.org>
<452E71F6.6080803@gkec.informatik.tu-darmstadt.de>
<4535333E.4080807@openplans.org>
<4535CEE8.6090107@gkec.informatik.tu-darmstadt.de>
<1161183633.19594.103.camel@ltucker.openplans.org>
<45372245.5060502@gkec.informatik.tu-darmstadt.de>
Message-ID: <1161269315.19594.123.camel@ltucker.openplans.org>
hah jeez,
erg. sorry to waste your time and thanks for your patience. Wasn't
intending to suggest it should handle malformed stuff, just a
mistake, but I can definitely understand what you're saying all the
same.
- Luke
> Sorry, but your HTML is very broken, too. It has two tags and two
> contradictory tags (saying both "us-ascii" and "shift_jis"), so don't
> expect libxml2's HTML parser to magically know what you really meant when you
> wrote it. That's like saying: Ok, I know this function only works for values
> from 1-5, so I'll put in a 99 and complain if it breaks.
>
> If you parse broken HTML and the parser doesn't handle it correctly, the
> reason is your broken HTML, really.
>
> If you think libxml2 should be able to parse this kind of non-HTML, please
> file a bug on the libxml2 parser. There is nothing lxml can do about it.
>
> Stefan
From ianb at colorstudy.com Thu Oct 19 18:03:09 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 19 Oct 2006 11:03:09 -0500
Subject: [lxml-dev] Cheese Shop link
In-Reply-To: <45372979.1020601@gkec.informatik.tu-darmstadt.de>
References: <453672EE.90509@colorstudy.com> <45367549.3040608@colorstudy.com>
<45372979.1020601@gkec.informatik.tu-darmstadt.de>
Message-ID: <4537A1BD.3040401@colorstudy.com>
Stefan Behnel wrote:
> I just added both, please check out if it works for you.
Thanks, they both work. For people wanting to use these in
requirements, you have to be a little tricky. For instance:
lxml==dev,1.2dev
since lxml==dev will install 1.2dev, which doesn't actually satisfy the
first requirement ("dev").
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Oct 20 09:49:08 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 20 Oct 2006 09:49:08 +0200
Subject: [lxml-dev] doctype declarations in XML output
In-Reply-To: <9cd322050610190153x2289d5f9h30ed870e0583f6c2@mail.gmail.com>
References: <9cd322050610190153x2289d5f9h30ed870e0583f6c2@mail.gmail.com>
Message-ID: <45387F74.9040101@gkec.informatik.tu-darmstadt.de>
Hi,
Raymond Wiker wrote:
> Hi. I've just started using Python 2.5 and lxml.etree, and am very
> happy so far - with this combination, I have an easy-to-use XML
> solution that includes good XPath and XSLT support.
Always happy to hear that.
> One question: is it possible to add doctype definitions to the output?
> I have not been able to find anything about this in the documentation,
> and I cannot see any references to xmlCreateIntSubset or xmlNewDtd in
> the source code. These should be trivial to add, as far as I can see.
There's not currently any support for that. However, we might consider making
_Element.docinfo writable to achieve this.
Any patches are welcome.
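Later lxml releases let the serialiser emit a doctype line directly. A minimal
sketch, assuming an lxml version whose tostring() accepts the doctype keyword
(this was not available when the message above was written; the root element
and the DTD name are made up):

```python
from lxml import etree

root = etree.Element("root")

# Hypothetical DTD reference, written out ahead of the root element.
doctype = '<!DOCTYPE root SYSTEM "root.dtd">'
out = etree.tostring(root, doctype=doctype)
```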
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Oct 20 14:22:26 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 20 Oct 2006 14:22:26 +0200
Subject: [lxml-dev] Failure to Compile on Windows
In-Reply-To: <20060926025606.GP4356@cotia>
References: <20060925220913.GL4356@cotia> <20060926025606.GP4356@cotia>
Message-ID: <4538BF82.2060700@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
finally coming back to this.
Sidnei da Silva wrote:
> On Mon, Sep 25, 2006 at 07:09:13PM -0300, Sidnei da Silva wrote:
> | I've got pretty far with setting up the dependencies and all but after
> | that the compilation failed with this error. Anyone have a clue? Looks
> | like it's missing some .h that 'nano http' depends on.
>
> FWIW, I've got past that. It was missing Ws2_32.lib.
>
> I've managed to build lxml with python trunk, but when running the
> tests I've got a fatal error due to a negative refcount to a tuple
> (!).
>
> I think I will disable running lxml tests with python trunk, unless
> someone wants to dig into this.
Thanks for setting this up. The fact that it's a tuple does not necessarily
mean it's a Python problem. Could you come up with a stack trace or at least
the test name that triggered it? Try running test.py -v.
Stefan
From sidnei at awkly.org Sat Oct 21 02:12:20 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Fri, 20 Oct 2006 21:12:20 -0300
Subject: [lxml-dev] Failure to Compile on Windows
In-Reply-To: <4538BF82.2060700@gkec.informatik.tu-darmstadt.de>
References: <20060925220913.GL4356@cotia> <20060926025606.GP4356@cotia>
<4538BF82.2060700@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061021001220.GA4350@cotia>
On Fri, Oct 20, 2006 at 02:22:26PM +0200, Stefan Behnel wrote:
| Thanks for setting this up. The fact that it's a tuple does not necessarily
| mean it's a Python problem. Could you come up with a stack trace or at least
| the test name that triggered it? Try running test.py -v.
There you have it:
test_empty_parse (lxml.tests.test_errors.ErrorTestCase) ... ok
test_XMLDTDID (lxml.tests.test_etree.ETreeOnlyTestCase) ... Fatal Python error:
\pybots\slave\2.5.dasilva-x86\build\Objects\tupleobject.c:169 object at 013A91B8
has negative ref count -606348326
Stack Trace:
-------------------------------------------------------------------------------
ntdll.dll!7c822583()
> python25_d.dll!Py_FatalError(const char * msg=0x0021d274) Line 1552 C
python25_d.dll!_Py_NegativeRefcount(const char * fname=0x1e2b7da8, int lineno=0x000000a9, _object * op=0x013a91b8) Line 193 + 0xc C
python25_d.dll!tupledealloc(PyTupleObject * op=0x0144e478) Line 169 + 0x75 C
python25_d.dll!_Py_Dealloc(_object * op=0x0144e478) Line 1928 + 0x7 C
python25_d.dll!tupledealloc(PyTupleObject * op=0x01451ab8) Line 169 + 0x8a C
python25_d.dll!_Py_Dealloc(_object * op=0x01451ab8) Line 1928 + 0x7 C
etree_d.pyd!__pyx_tp_dealloc_5etree__IDDict(_object * o=0x013a94f8) Line 45102 + 0x73 C
python25_d.dll!_Py_Dealloc(_object * op=0x013a94f8) Line 1928 + 0x7 C
python25_d.dll!frame_dealloc(_frame * f=0x01470b68) Line 416 + 0x6a C
python25_d.dll!_Py_Dealloc(_object * op=0x01470b68) Line 1928 + 0x7 C
python25_d.dll!fast_function(_object * func=0x00922690, _object * * * pp_stack=0x0021d4dc, int n=0x00000001, int na=0x00000001, int nk=0x00000000) Line 3654 + 0x6 C
python25_d.dll!call_function(_object * * * pp_stack=0x0021d4dc, int oparg=0x00000001) Line 3587 + 0x12 C
python25_d.dll!PyEval_EvalFrameEx(_frame * f=0x01337c68, int throwflag=0x00000000) Line 2271 C
python25_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x00a70808, _object * globals=0x01337c68, _object * locals=0x00000000, _object * * args=0x013af60c, int argcount=0x00000002, _object * * kws=0x01449038, int kwcount=0x00000000, _object * * defs=0x00a7e7bc, int defcount=0x00000001, _object * closure=0x00000000) Line 2833 + 0xb C
python25_d.dll!function_call(_object * func=0x00b04508, _object * arg=0x013af5f8, _object * kw=0x0144a380) Line 522 + 0x40 C
python25_d.dll!PyObject_Call(_object * func=0x00b04508, _object * arg=0x013af5f8, _object * kw=0x0144a380) Line 1858 + 0xf C
python25_d.dll!ext_do_call(_object * func=0x00b04508, _object * * * pp_stack=0x0021da3c, int flags=0x00000003, int na=0x00000001, int nk=0x00000000) Line 3848 C
python25_d.dll!PyEval_EvalFrameEx(_frame * f=0x01337ac0, int throwflag=0x00000000) Line 2312 C
python25_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x00a70868, _object * globals=0x01337ac0, _object * locals=0x00000000, _object * * args=0x013afd8c, int argcount=0x00000002, _object * * kws=0x00000000, int kwcount=0x00000000, _object * * defs=0x00000000, int defcount=0x00000000, _object * closure=0x00000000) Line 2833 + 0xb C
python25_d.dll!function_call(_object * func=0x00b04560, _object * arg=0x013afd78, _object * kw=0x00000000) Line 522 + 0x40 C
python25_d.dll!PyObject_Call(_object * func=0x00b04560, _object * arg=0x013afd78, _object * kw=0x00000000) Line 1858 + 0xf C
python25_d.dll!instancemethod_call(_object * func=0x00b04560, _object * arg=0x013afd78, _object * kw=0x00000000) Line 2495 + 0x11 C
python25_d.dll!PyObject_Call(_object * func=0x00b8a2b8, _object * arg=0x014e02a0, _object * kw=0x00000000) Line 1858 + 0xf C
python25_d.dll!slot_tp_call(_object * self=0x016fe738, _object * args=0x014e02a0, _object * kwds=0x00000000) Line 4581 + 0x11 C
python25_d.dll!PyObject_Call(_object * func=0x016fe738, _object * arg=0x014e02a0, _object * kw=0x00000000) Line 1858 + 0xf C
python25_d.dll!do_call(_object * func=0x016fe738, _object * * * pp_stack=0x0021e250, int na=0x00000001, int nk=0x00000000) Line 3779 C
python25_d.dll!call_function(_object * * * pp_stack=0x0021e250, int oparg=0x00000001) Line 3589 + 0xa C
python25_d.dll!PyEval_EvalFrameEx(_frame * f=0x01337918, int throwflag=0x00000000) Line 2271 C
python25_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x00a75148, _object * globals=0x01337918, _object * locals=0x00000000, _object * * args=0x013d638c, int argcount=0x00000002, _object * * kws=0x01449028, int kwcount=0x00000000, _object * * defs=0x00000000, int defcount=0x00000000, _object * closure=0x00000000) Line 2833 + 0xb C
python25_d.dll!function_call(_object * func=0x00b04b38, _object * arg=0x013d6378, _object * kw=0x0144a230) Line 522 + 0x40 C
python25_d.dll!PyObject_Call(_object * func=0x00b04b38, _object * arg=0x013d6378, _object * kw=0x0144a23
python25_d.dll!PyObject_Call(_object * func=0x00b04b90, _object * arg=0x0176a538, _object * kw=0x00000000) Line 1858 + 0xf C
python25_d.dll!instancemethod_call(_object * func=0x00b04b90, _object * arg=0x0176a538, _object * kw=0x00000000) Line 2495 + 0x11 C
python25_d.dll!PyObject_Call(_object * func=0x00b8a138, _object * arg=0x0142c690, _object * kw=0x00000000) Line 1858 + 0xf C
python25_d.dll!slot_tp_call(_object * self=0x01503dc8, _object * args=0x0142c690, _object * kwds=0x00000000) Line 4581 + 0x11 C
python25_d.dll!PyObject_Call(_object * func=0x01503dc8, _object * arg=0x0142c690, _object * kw=0x00000000) Line 1858 + 0xf C
python25_d.dll!do_call(_object * func=0x01503dc8, _object * * * pp_stack=0x0021efc4, int na=0x00000001, int nk=0x00000000) Line 3779 C
python25_d.dll!call_function(_object * * * pp_stack=0x0021efc4, int oparg=0x00000001) Line 3589 + 0xa C
python25_d.dll!PyEval_EvalFrameEx(_frame * f=0x013369b8, int throwflag=0x00000000) Line 2271 C
python25_d.dll!fast_function(_object * func=0x00922690, _object * * * pp_stack=0x0021f498, int n=0x00000002, int na=0x00000002, int nk=0x00000000) Line 3653 C
python25_d.dll!call_function(_object * * * pp_stack=0x0021f498, int oparg=0x00000002) Line 3587 + 0x12 C
python25_d.dll!PyEval_EvalFrameEx(_frame * f=0x00a85120, int throwflag=0x00000000) Line 2271 C
python25_d.dll!fast_function(_object * func=0x00922690, _object * * * pp_stack=0x0021f96c, int n=0x00000001, int na=0x00000001, int nk=0x00000000) Line 3653 C
python25_d.dll!call_function(_object * * * pp_stack=0x0021f96c, int oparg=0x00000001) Line 3587 + 0x12 C
python25_d.dll!PyEval_EvalFrameEx(_frame * f=0x009f6648, int throwflag=0x00000000) Line 2271 C
python25_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x00a698c8, _object * globals=0x009f6648, _object * locals=0x009674d0, _object * * args=0x00000000, int argcount=0x00000000, _object * * kws=0x00000000, int kwcount=0x00000000, _object * * defs=0x00000000, int defcount=0x00000000, _object * closure=0x00000000) Line 2833 + 0xb C
python25_d.dll!PyEval_EvalCode(PyCodeObject * co=0x00a698c8, _object * globals=0x009674d0, _object * locals=0x009674d0) Line 499 + 0x1f C
python25_d.dll!run_mod(_mod * mod=0x00a3cc30, const char * filename=0x00923fe1, _object * globals=0x009674d0, _object * locals=0x009674d0, PyCompilerFlags * flags=0x0021ff2c, _arena * arena=0x00987e20) Line 1264 + 0x11 C
python25_d.dll!PyRun_FileExFlags(_iobuf * fp=0x1027c898, const char * filename=0x00923fe1, int start=0x00000101, _object * globals=0x009674d0, _object * locals=0x009674d0, int closeit=0x00000001, PyCompilerFlags * flags=0x0021ff2c) Line 1250 + 0x1d C
python25_d.dll!PyRun_SimpleFileExFlags(_iobuf * fp=0x1027c898, const char * filename=0x00923fe1, int closeit=0x00000001, PyCompilerFlags * flags=0x0021ff2c) Line 871 + 0x22 C
python25_d.dll!PyRun_AnyFileExFlags(_iobuf * fp=0x1027c898, const char * filename=0x00923fe1, int closeit=0x00000001, PyCompilerFlags * flags=0x0021ff2c) Line 689 + 0x15 C
python25_d.dll!Py_Main(int argc=0x00000003, char * * argv=0x00923fa0) Line 499 + 0x30 C
python_d.exe!main(int argc=0x00000003, char * * argv=0x00923fa0) Line 23 + 0xe C
python_d.exe!mainCRTStartup() Line 398 + 0x11 C
kernel32.dll!77e523e5()
-------------------------------------------------------------------------------
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Oct 21 22:14:56 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 21 Oct 2006 22:14:56 +0200
Subject: [lxml-dev] Failure to Compile on Windows
In-Reply-To: <20061021001220.GA4350@cotia>
References: <20060925220913.GL4356@cotia> <20060926025606.GP4356@cotia>
<4538BF82.2060700@gkec.informatik.tu-darmstadt.de>
<20061021001220.GA4350@cotia>
Message-ID: <453A7FC0.9080802@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> On Fri, Oct 20, 2006 at 02:22:26PM +0200, Stefan Behnel wrote:
> | Thanks for setting this up. The fact that it's a tuple does not necessarily
> | mean it's a Python problem. Could you come up with a stack trace or at least
> | the test name that triggered it? Try running test.py -v.
>
> There you have it:
>
> test_empty_parse (lxml.tests.test_errors.ErrorTestCase) ... ok
> test_XMLDTDID (lxml.tests.test_etree.ETreeOnlyTestCase) ... Fatal Python error:
> \pybots\slave\2.5.dasilva-x86\build\Objects\tupleobject.c:169 object at 013A91B8
> has negative ref count -606348326
>
>
> Stack Trace:
> -------------------------------------------------------------------------------
> ntdll.dll!7c822583()
>> python25_d.dll!Py_FatalError(const char * msg=0x0021d274) Line 1552 C
> python25_d.dll!_Py_NegativeRefcount(const char * fname=0x1e2b7da8, int lineno=0x000000a9, _object * op=0x013a91b8) Line 193 + 0xc C
> python25_d.dll!tupledealloc(PyTupleObject * op=0x0144e478) Line 169 + 0x75 C
> python25_d.dll!_Py_Dealloc(_object * op=0x0144e478) Line 1928 + 0x7 C
> python25_d.dll!tupledealloc(PyTupleObject * op=0x01451ab8) Line 169 + 0x8a C
> python25_d.dll!_Py_Dealloc(_object * op=0x01451ab8) Line 1928 + 0x7 C
> etree_d.pyd!__pyx_tp_dealloc_5etree__IDDict(_object * o=0x013a94f8) Line 45102 + 0x73 C
[...]
Thanks. That wasn't the greatest code anyway, so thanks for pointing me at it.
I couldn't reproduce the bug and didn't find anything suspicious under
valgrind, so I just committed a cleaned-up version of some code parts that may
have led to the problem, and I hope that also changes the refcount behaviour.
Could you retry with the current trunk version?
Thanks again,
Stefan
From Holger.Joukl at LBBW.de Mon Oct 23 13:37:28 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Mon, 23 Oct 2006 13:37:28 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
Message-ID:
Hi,
sorry for the inconvenience, I now put this into a new thread.
And I'd have gotten back to that sooner but have been ill.
>Then: what you observe are most likely GC 'issues'. The thing is: if the
>element already exists as Python object, it is reused, which is much faster
>than creating a new one. So in the cases where your code runs faster, you can
>assume that the object survived a larger portion of your code without being
>re-instantiated.
I probably misunderstand how the reuse of elements works.
When I "visit" a node, like:
>>> from lxml import etree
>>> from lxml import objectify
>>> parser = etree.XMLParser(remove_blank_text=True)
>>> lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
>>> parser.setElementClassLookup(lookup)
>>> objectify.setDefaultParser(parser)
>>> objectify.enableRecursiveStr()
>>> root = objectify.Element('root')
>>> root.i = 17
>>> root.i
>>>
the Python Element object for "i" is being created.
Will that Python Element be garbage-collected afterwards if I do not
explicitly delete "i" from the XML tree? I thought this element survived
in the element proxy.
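The reuse mentioned in the quoted explanation can be observed directly. A
minimal sketch, assuming a current lxml on Python 3 and the default objectify
setup: as long as one reference to the proxy is held, a second access returns
the very same Python object.

```python
from lxml import objectify

root = objectify.Element("root")
root.i = 17

first = root.i   # creates the Python proxy for <i>
second = root.i  # the proxy is still referenced, so it is reused

same_object = first is second
```

If no reference is kept, the proxy may be freed and has to be rebuilt on the
next access, which is the slower path.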
>Especially recursive printing instantiates the entire tree, so if the objects
>are not deleted directly afterwards, this has a performance effect on code
>that runs afterwards.
I see, but why would "manual access" of the nodes not have the same effect:
Runs slow:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root.i
print root.f
print root.s
print root.d
""" "n = root.i; n = root.f; n = root.s; n = root.d"
17
238.3343
what
2006-03-03
10 loops -> 0.0102 secs
17
238.3343
what
2006-03-03
100 loops -> 0.101 secs
17
238.3343
what
2006-03-03
1000 loops -> 1.02 secs
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
raw times: 1.03 1.02 1.02
1000 loops, best of 3: 1.02 msec per loop
Runs fast:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root
""" "n = root.i; n = root.f; n = root.s; n = root.d"
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
10 loops -> 0.00109 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
100 loops -> 0.00928 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
1000 loops -> 0.0897 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
10000 loops -> 0.905 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
raw times: 0.893 0.911 0.911
10000 loops, best of 3: 89.3 usec per loop
Recursively outputting root before accessing its child elements
really speeds things up, even though I accessed all elements in
the slow example, too.
Why is this? I'm clueless.
Holger
Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene
Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde,
verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail
sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht
gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht
garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte
den Inhalt der E-Mail als Hardcopy an.
The contents of this e-mail are confidential. If you are not the named
addressee or if this transmission has been addressed to you in error,
please notify the sender immediately and then delete this e-mail. Any
unauthorized copying and transmission is forbidden. E-Mail transmission
cannot be guaranteed to be secure. If verification is required, please
request a hard copy version.
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Oct 23 22:53:51 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 23 Oct 2006 22:53:51 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To:
References:
Message-ID: <453D2BDF.6040804@gkec.informatik.tu-darmstadt.de>
Hi Holger,
Holger Joukl wrote:
>> Then: what you observe are most likely GC 'issues'. The thing is: if the
>> element already exists as Python object, it is reused, which is much faster
>> than creating a new one. So in the cases where your code runs faster, you can
>> assume that the object survived a larger portion of your code without being
>> re-instantiated.
>
> I probably have some misunderstandings how the reuse of elements works.
> When I "visit" a node, like:
>
>>>> root = objectify.Element('root')
>>>> root.i = 17
>>>> root.i
>
>
> the Python Element object for "i" is being created.
> Will that Python Element be garbage-collected afterwards, if I do not
> explicitly delete "i"
> from the xml tree? I thought this element survived in the element proxy.
>
>> Especially recursive printing instantiates the entire tree, so if the objects
>> are not deleted directly afterwards, this has a performance effect on code
>> that runs afterwards.
>
> I see, but why would "manual access" of the nodes not have the same effect.
> Recursively outputting root before accessing its child elements
> really speeds things up, even though I accessed all elements in
> the slow example, too.
> Why is this? I'm clueless.
I think I can give an answer here. The difference lies in the two cleanup
modes in the Python interpreter: GC and ref counting. Ref-counted objects
disappear immediately after losing the last reference. However, when there
are circular references between elements, the GC is required to clean them up.
These objects can be garbage collected at any time, but they are usually kept
until there is a good opportunity to clean them up, i.e. enough time has
passed to merit the GC overhead or memory is filling up so it has to run.
Now, the way recursive dumping is currently implemented instantiates an
additional object for each element: _Attrib. This generates circular
references between the element and its attribute proxy, which forces the use of
the GC instead of the normal ref-count algorithm. So elements that were
recursively printed stay alive until the next run of the GC. Elements that do
not have an _Attrib dictionary proxy can be deleted when ref-counting them out.
You should be able to reproduce the behaviour observed after recursive
printing with elements for which you called ".attrib".
You should not rely on either behaviour as this deals with implementation
details in both lxml and Python. However, I would not object to patches that
make the behaviour more predictable.
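The difference between the two cleanup modes can be reproduced without lxml. The following is only a minimal sketch under stated assumptions: `Node` and `AttribProxy` are hypothetical stand-ins for an element and its attribute proxy, not lxml classes, and the result depends on CPython's reference counting.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for an element proxy; not an lxml class."""
    def __init__(self):
        self.attrib_proxy = None


class AttribProxy(object):
    """Hypothetical stand-in for an attribute proxy that points back at its element."""
    def __init__(self, element):
        self.element = element


def alive_after_del(make_cycle):
    """Create a Node, drop the only name for it, and report whether it still exists."""
    node = Node()
    if make_cycle:
        # node -> proxy -> node: reference counting alone can no longer free it
        node.attrib_proxy = AttribProxy(node)
    probe = weakref.ref(node)
    del node
    return probe() is not None


gc.disable()  # keep the cyclic collector from running at an arbitrary moment
try:
    plain_survives = alive_after_del(make_cycle=False)
    cyclic_survives = alive_after_del(make_cycle=True)
finally:
    gc.enable()

gc.collect()  # an explicit run reclaims the leftover cycle
print("plain object survived: %s" % plain_survives)
print("object in a cycle survived: %s" % cyclic_survives)
```

Under CPython the plain object is gone as soon as its last reference is dropped, while the object that is part of a cycle stays around until the collector runs.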
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Oct 24 00:23:54 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 24 Oct 2006 00:23:54 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To:
References:
Message-ID: <453D40FA.7090802@gkec.informatik.tu-darmstadt.de>
Hi again,
I rewrote the current recursive string printing implementation to use a real
iterator for attribute access, which also led to much shorter code in _Attrib
(after a cleanup). This should remove the difference you see, although moving
towards the slower variant. However, if it worked, this means that the
elements are immediately garbage collected, which is the right thing to do
from a memory perspective.
Please test on your machine a) whether the two code snippets still differ in
performance and b) whether the new implementation resulted in any noticeable
slowdown.
If you feel ambitious, take a look at the benchmark directory and try to come
up with a new benchmark suite "bench_objectify.py". The benchmark framework
makes new benchmarks extremely easy to write and the four test XML trees
should be well suited for objectify already.
Thanks,
Stefan
From Holger.Joukl at LBBW.de Tue Oct 24 14:13:37 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Tue, 24 Oct 2006 14:13:37 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To: <453D2BDF.6040804@gkec.informatik.tu-darmstadt.de>
Message-ID:
Hello Stefan,
Stefan Behnel wrote on 23.10.2006 22:53:51:
> I think I can give an answer here. The difference lies in the two cleanup
> modes in the Python interpreter: GC and ref counting. Ref-counted objects
> disappear immediately after losing the last reference. However, when there
> are circular references between elements, the GC is required to clean them up.
> These objects can be garbage collected at any time, but they are usually kept
> until there is a good opportunity to clean them up, i.e. enough time has
> passed to merit the GC overhead or memory is filling up so it has to run.
>
> Now, the way recursive dumping is currently implemented instantiates an
> additional object for each element: _Attrib. This generates circular
> references between the element and its attribute proxy, which forces the use of
> the GC instead of the normal ref-count algorithm. So elements that were
> recursively printed stay alive until the next run of the GC. Elements that do
> not have an _Attrib dictionary proxy can be deleted when ref-counting them out.
>
> You should be able to reproduce the behaviour observed after recursive
> printing with elements for which you called ".attrib".
Thanks for this good explanation! This of course also explains why my
experiments with an additional "visit" function (a stripped down _dump,
essentially) only worked when there was a call to element.items() included.
Holger
From Holger.Joukl at LBBW.de Tue Oct 24 15:10:05 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Tue, 24 Oct 2006 15:10:05 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To: <453D40FA.7090802@gkec.informatik.tu-darmstadt.de>
Message-ID:
Stefan Behnel wrote on 24.10.2006 00:23:54:
> Hi again,
>
> I rewrote the current recursive string printing implementation to use a real
> iterator for attribute access, which also led to much shorter code in _Attrib
> (after a cleanup). This should remove the difference you see, although moving
> towards the slower variant. However, if it worked, this means that the
> elements are immediately garbage collected, which is the right thing to do
> from a memory perspective.
>
> Please test on your machine a) if the two code snippets still differ in
> performance and b) if the new implementation resulted in any
> noticeable slow down.
I can confirm
a) no performance difference between recursive element printing and "manual
element access" any more
b) no significant slow down
using the little timeit snippets for benchmarking.
> If you feel ambitious, take a look at the benchmark directory and try to come
> up with a new benchmark suite "bench_objectify.py". The benchmark framework
> makes new benchmarks extremely easy to write and the four test XML trees
> should be well suited for objectify already.
Will take a look.
Some more need for clarification:
If I understand correctly the lxml element proxy only speeds up things if
- I hold a python reference to the element object or
- a circular reference to the element in question prevents it from being
  gc-ed
To speed up my use case I could force-create and hold python references to
every node before starting to operate on the tree.
Would it also be possible to modify objectify in a way that the lifespan of
the python _Element, once it has been instantiated, is tied to the
existence of the underlying _c_node (xmlNode)?
Holger
From sidnei at enfoldsystems.com Tue Oct 24 22:02:16 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Tue, 24 Oct 2006 17:02:16 -0300
Subject: [lxml-dev] Resolvers and open files
Message-ID: <20061024200216.GH4388@cotia>
I have a long-running process that uses a custom resolver to resolve a
simple filename to a file relative to a pre-configured directory.
It looks like this:
import urllib
from lxml import etree

class RelativeUrlResolver(etree.Resolver):
    def __init__(self, prefix):
        self.prefix = prefix

    def resolve(self, url, id, context):
        print "Resolving URL '%s'" % url
        if not url.startswith('http'):
            url = self.prefix + urllib.quote_plus(url)
        ssf = urllib.urlopen(url)
        if ssf is None:
            raise ValueError, 'could not resolve url: %r' % url
        return self.resolve_file(ssf, context)
I'm creating the parser like this:
parser = etree.XMLParser()
parser.resolvers.add(RelativeUrlResolver(BASE))
(Where BASE = 'file:///path/to/some/dir')
Now, the issue that's biting me is that it looks like the file is kept
open after the processing has finished.
The parser is re-created every time ATM, and goes 'out of scope' right
after doing the transformation, so I would expect it all to be garbage
collected, and the file to be closed.
Do I need to do anything special to get this file to be closed?
Thanks.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Oct 25 08:49:08 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 25 Oct 2006 08:49:08 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To:
References:
Message-ID: <453F08E4.60606@gkec.informatik.tu-darmstadt.de>
Hi Holger,
Holger Joukl wrote:
> To speed up my usecase I could force-create and hold python references to
> every node before starting to operate on the tree.
I added a FAQ entry on performance tweaking in objectify. If you find other
things to add, I'd be happy to hear about them.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Oct 24 20:39:16 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 24 Oct 2006 20:39:16 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To:
References:
Message-ID: <453E5DD4.1070905@gkec.informatik.tu-darmstadt.de>
Hi Holger,
Holger Joukl wrote:
> Stefan Behnel wrote:
>> Please test on your machine a) if the two code snippets still differ in
>> performance and b) if the new implementation resulted in any
>> noticeable slow down.
>
> I can confirm
> a) no performance difference between recursive element printing and "manual
> element access" any more
> b) no significant slow down
> using the little timeit snippets for benchmarking.
Good, thanks.
> Some more need for clarification:
> If I understand correctly the lxml element proxy only speeds up things if
> - I hold a python reference to the element object or
> - a circular reference to the element in question prevents it from being
> gc-ed
Correct. However, as I said: do not rely on the second thing. GC runs are
unpredictable (unless you run it by hand).
> To speed up my usecase I could force-create and hold python references to
> every node before starting to operate on the tree.
... the fastest approach likely being
cache[root] = list(root.getiterator())
> Would it also be possible to modify objectify in a way that the lifespan of
> the python _Element, once it has been instantiated, is tied to the
> existence of the underlying _c_node (xmlNode)?
Hmm, I don't know if that's a good thing in general. It eats substantially
more memory than the C-tree does already.
I mean, feel free to fill a cache like the above when XML comes in and delete
it when it goes back out during processing. It should not be that much slower
than doing it inside objectify, but it's simple enough to not require a
dedicated API and it gives you absolute control over the trade-off between
space and speed.
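The trade-off can be illustrated with a toy model that does not use lxml at all. This is only a sketch: `Proxy`, `get_proxy` and `touch_all` are hypothetical names, and the weak dictionary merely imitates the idea that a Python proxy exists only while something references it. It relies on CPython freeing unreferenced objects immediately.

```python
import weakref


class Proxy(object):
    """Hypothetical stand-in for a Python element object wrapping a C-level node."""
    created = 0

    def __init__(self, node_id):
        self.node_id = node_id
        Proxy.created += 1


# node id -> proxy; an entry vanishes once nothing else references the proxy
_live = weakref.WeakValueDictionary()


def get_proxy(node_id):
    """Return the existing proxy for a node, or build a new one."""
    proxy = _live.get(node_id)
    if proxy is None:
        proxy = Proxy(node_id)
        _live[node_id] = proxy
    return proxy


def touch_all(node_ids, repeat):
    """Access every node `repeat` times and return how many proxies were built."""
    before = Proxy.created
    for _ in range(repeat):
        for node_id in node_ids:
            get_proxy(node_id)
    return Proxy.created - before


nodes = ["i", "f", "s", "d"]
built_without_cache = touch_all(nodes, repeat=100)

cache = [get_proxy(n) for n in nodes]  # hold a reference to every proxy
built_with_cache = touch_all(nodes, repeat=100)
del cache  # drop the cache when done to release the memory again

print("proxies built without cache: %d" % built_without_cache)
print("proxies built with cache: %d" % built_with_cache)
```

In real code the cache would be filled from the tree itself, as in the one-liner above, and deleted once the document has been processed.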
Stefan
From Holger.Joukl at LBBW.de Wed Oct 25 09:29:36 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Wed, 25 Oct 2006 09:29:36 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To: <453F08E4.60606@gkec.informatik.tu-darmstadt.de>
Message-ID:
Hi Stefan,
Stefan Behnel wrote on 25.10.2006 08:49:08:
> Hi Holger,
>
> Holger Joukl wrote:
> > To speed up my usecase I could force-create and hold python references to
> > every node before starting to operate on the tree.
>
> I added a FAQ entry on performance tweaking in objectify. If you find other
> things to add, I'd be happy to hear about them.
>
> Stefan
Very helpful! Thanks a lot.
I think the first two hints might also be helpful for someone using classic
lxml.etree.
Holger
From Holger.Joukl at LBBW.de Wed Oct 25 09:38:05 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Wed, 25 Oct 2006 09:38:05 +0200
Subject: [lxml-dev] [lxml][objectify] optimization of recursive object
dumping
Message-ID:
Hi,
I've posted this before but messed with the threads, so here it is again:
(Note: patch line numbers might differ, this was based on 1.1 branch of
2 weeks ago, but I could of course update this and send a new patch)
First some background:
I'm experimenting with a custom objectified datetime class based on Python's
datetime that employs the dateutil.parser module to detect if some element
value is in a valid datetime format, i.e. the parse function from
dateutil.parser is used to implement the type_check for the PyType type
registry.
1)
Invoking this parse method is quite expensive, so I want this to happen
rarely. As I am using "recursive element dumping" as default I found that
for every __str__ call .pyval of the ObjectifiedDataElements in a tree is
accessed, which in turn triggers parsing for my custom datetime class.
As I don't really see a way to avoid this I propose the introduction of
an additional property "_pyval_repr" that can be overridden in subclasses,
which makes it possible to simply return element.text, if getting .pyval
is expensive. Something like:
*** ORIG/lxml-1.1/src/lxml/objectify.pyx Wed Sep 27 09:18:30 2006
--- src/lxml/objectify.pyx Wed Oct 4 11:00:09 2006
***************
*** 484,489 ****
--- 484,493 ----
          def __get__(self):
              return textOf(self._c_node)
+     property _pyval_repr:
+         def __get__(self):
+             return self.pyval
+
      def __str__(self):
          return textOf(self._c_node) or ''
***************
*** 931,938 ****
  cdef object _dump(_Element element, int indent):
      indentstr = " " * indent
!     if hasattr(element, "pyval"):
!         value = element.pyval
      else:
          value = textOf(element._c_node)
      if value and not value.strip():
--- 935,942 ----
  cdef object _dump(_Element element, int indent):
      indentstr = " " * indent
!     if hasattr(element, "_pyval_repr"):
!         value = element._pyval_repr
      else:
          value = textOf(element._c_node)
      if value and not value.strip():
This can substantially speed up things for complicated type_check
routines (in my usecase :)
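The effect of the proposed hook can be sketched in plain Python. The classes below are hypothetical stand-ins, not the objectify implementation, and `datetime.strptime` merely takes the place of the expensive `dateutil.parser.parse` call. The point is only that a subclass overriding `_pyval_repr` lets the dump skip parsing.

```python
from datetime import datetime


class DataElement(object):
    """Hypothetical stand-in for an objectified data element."""
    def __init__(self, text):
        self.text = text

    @property
    def pyval(self):
        return self.text

    @property
    def _pyval_repr(self):
        # default: the dump shows the converted Python value
        return self.pyval


class DateTimeElement(DataElement):
    parse_calls = 0  # counts how often the expensive conversion ran

    @property
    def pyval(self):
        # stands in for an expensive dateutil.parser.parse() call
        DateTimeElement.parse_calls += 1
        return datetime.strptime(self.text, "%Y-%m-%d")

    @property
    def _pyval_repr(self):
        # override: skip parsing and show the raw text
        return self.text


def dump(elements):
    """Mimic the patched _dump(): prefer _pyval_repr when it exists."""
    lines = []
    for name, element in elements:
        if hasattr(element, "_pyval_repr"):
            value = element._pyval_repr
        else:
            value = element.text
        lines.append("%s = %r" % (name, value))
    return "\n".join(lines)


tree = [("s", DataElement("what")), ("d", DateTimeElement("2006-03-03"))]
output = dump(tree)
print(output)
print("parse calls during dump: %d" % DateTimeElement.parse_calls)
```

Accessing `.pyval` on the date element later still triggers the conversion; only the dump avoids it.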
Holger
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Oct 25 09:49:36 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 25 Oct 2006 09:49:36 +0200
Subject: [lxml-dev] Resolvers and open files
In-Reply-To: <20061024200216.GH4388@cotia>
References: <20061024200216.GH4388@cotia>
Message-ID: <453F1710.8000205@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> I have a long-running process that uses a custom resolver to resolve a
> simple filename to a file relative to a pre-configured directory.
> Now, the issue that's biting me is that it looks like the file is kept
> open after the processing has finished.
Right, the resolver context that stores the temporary references was not
cleaned up after use.
Should be fixed on the trunk now. Please test it with your setup.
Stefan
From sidnei at enfoldsystems.com Wed Oct 25 15:33:03 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Wed, 25 Oct 2006 10:33:03 -0300
Subject: [lxml-dev] Failure to Compile on Windows
In-Reply-To: <453A7FC0.9080802@gkec.informatik.tu-darmstadt.de>
References: <20060925220913.GL4356@cotia> <20060926025606.GP4356@cotia>
<4538BF82.2060700@gkec.informatik.tu-darmstadt.de>
<20061021001220.GA4350@cotia>
<453A7FC0.9080802@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061025133303.GJ4388@cotia>
On Sat, Oct 21, 2006 at 10:14:56PM +0200, Stefan Behnel wrote:
| Thanks. That wasn't the greatest code anyway, so thanks for pointing me at it.
| I couldn't reproduce the bug and didn't find anything suspicious under
| valgrind, so I just committed a cleaned-up version of some code parts that may
| have led to the problem, and I hope that changes the refcount behaviour as well.
|
| Could you retry with the current trunk version?
Great! That seems to have fixed it. I do not get the refcount problem
anymore. There are a couple failing tests, mainly due to calling
os.remove() on an open file (that does not work on Windows).
Ex:
>>> f = open('/src/test.bat')
>>> os.remove('/src/test.bat')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 13] Permission denied: '/src/test.bat'
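A portable pattern for such tests is to close the file before removing it. The following is a minimal sketch that uses a temporary file instead of the path above. It shows the general idea only and is not the fix that went into the lxml test suite.

```python
import os
import tempfile

# create a scratch file to play with
fd, path = tempfile.mkstemp(suffix=".bat")
os.close(fd)

f = open(path)
try:
    f.read()
finally:
    f.close()  # close first; Windows refuses to delete a file that is still open

os.remove(path)
removed = not os.path.exists(path)
print("file removed: %s" % removed)
```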
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From sidnei at enfoldsystems.com Wed Oct 25 16:06:12 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Wed, 25 Oct 2006 11:06:12 -0300
Subject: [lxml-dev] Resolvers and open files
In-Reply-To: <453F1710.8000205@gkec.informatik.tu-darmstadt.de>
References: <20061024200216.GH4388@cotia>
<453F1710.8000205@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061025140612.GM4388@cotia>
On Wed, Oct 25, 2006 at 09:49:36AM +0200, Stefan Behnel wrote:
| Right, the resolver context that stores the temporary references was not
| cleaned up after use.
|
| Should be fixed on the trunk now. Please test it with your setup.
Ok, seems to work now!
Thank you a lot!
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Oct 25 18:20:48 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 25 Oct 2006 18:20:48 +0200
Subject: [lxml-dev] [lxml][objectify] optimization of recursive object
dumping
In-Reply-To:
References:
Message-ID: <453F8EE0.3090106@gkec.informatik.tu-darmstadt.de>
Hi Holger,
Holger Joukl wrote:
> I'm experimenting with a custom objectified datetime class based on Python's
> datetime that employs the dateutil.parser module to detect if some element
> value is in a valid datetime format, i.e. the parse function from
> dateutil.parser is used to implement the type_check for the PyType type
> registry.
>
> Invoking this parse method is quite expensive, so I want this to happen
> rarely. As I am using "recursive element dumping" as default I found that
> for every __str__ call .pyval of the ObjectifiedDataElements in a tree is
> accessed, which in turn triggers parsing for my custom datetime class.
But that should only happen for normal text content (well, and dates). Numbers
should always be parsed first.
> As I don't really see a way to avoid this I propose the introduction of
> an additional property "_pyval_repr" that can be overridden in subclasses,
> which makes it possible to simply return element.text, if getting .pyval
> is expensive.
Hmmm, I don't really like the idea of adding a new Python method only to
optimise the debug output (which is what dump() is essentially meant for). I
understand that you use this as default, but I don't think many people will
rely on the performance of this function...
Have you considered switching from "dump() by default" to "implement __str__()
for all data types by hand"? There are not that many standard types...
On the other hand, what if we did something like this:
    cdef object _dump(_Element element, int indent):
        indentstr = " " * indent
        if isinstance(element, ObjectifiedDataElement):
            value = str(element)
        else:
            ...
Would that help?
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Oct 25 19:32:34 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 25 Oct 2006 19:32:34 +0200
Subject: [lxml-dev] [lxml][objectify] optimization of recursive object
dumping
In-Reply-To: <453F8EE0.3090106@gkec.informatik.tu-darmstadt.de>
References:
<453F8EE0.3090106@gkec.informatik.tu-darmstadt.de>
Message-ID: <453F9FB2.9030306@gkec.informatik.tu-darmstadt.de>
Hi again,
Stefan Behnel wrote:
> On the other hand, what if we did something like this:
>
>     cdef object _dump(_Element element, int indent):
>         indentstr = " " * indent
>         if isinstance(element, ObjectifiedDataElement):
>             value = str(element)
>         else:
>             ...
I wrote up a patch that could do the trick. Sadly, it requires a behavioural
change in NumberElement to return repr(value) for str(). Not that beautiful.
I'm just posting it to give you an idea about what looks like a viable
approach to me.
I'll try to get 1.1.2 out tomorrow, so unless I get convinced by then that
this is a sufficiently solid idea, this will have to wait for the next release.
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: objectify-repr-dump.patch
Type: text/x-patch
Size: 2005 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20061025/4967224d/attachment.bin
From Holger.Joukl at LBBW.de Thu Oct 26 08:59:28 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Thu, 26 Oct 2006 08:59:28 +0200
Subject: [lxml-dev] [lxml][objectify] optimization of recursive object
dumping
In-Reply-To: <453F9FB2.9030306@gkec.informatik.tu-darmstadt.de>
Message-ID:
Hi Stefan,
Stefan Behnel wrote on 25.10.2006 19:32:34:
> I wrote up a patch that could do the trick. Sadly, it requires a behavioural
> change in NumberElement to return repr(value) for str(). Not that beautiful.
> I'm just posting it to give you an idea about what looks like a viable
> approach to me.
>
> I'll try to get 1.1.2 out tomorrow, so unless I get convinced by then that
> this is a sufficiently solid idea, this will have to wait for the
> next release.
I probably won't be able to look at anything before tomorrow, sorry.
But as svn trunk is usually rock solid, I'm not that tied
to official releases anyway :)
Holger
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Oct 26 09:02:32 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 26 Oct 2006 09:02:32 +0200
Subject: [lxml-dev] [lxml][objectify] optimization questions
In-Reply-To:
References:
Message-ID: <45405D88.3010104@gkec.informatik.tu-darmstadt.de>
Hi Holger,
Holger Joukl wrote:
> Stefan Behnel wrote:
>> If you feel ambitious, take a look at the benchmark directory and try to
>> come up with a new benchmark suite "bench_objectify.py". The benchmark
>> framework makes new benchmarks extremely easy to write and the four test
>> XML trees should be well suited for objectify already.
>
> Will take a look.
Well, it wasn't quite that well suited after all. I added some smaller
benchmarks myself and adapted the benchmark trees to simplify their use
through objectify. Tree 3 still doesn't fit, but trees 1,2 and 4 can be used
for benchmarking. Just look at bench_objectify.py to see how that works. So,
as you're testing anyway, I'd be happy if you could come up with some
additional benchmarks. That way, we could put some more results up on the
performance web page.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Oct 26 11:48:11 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 26 Oct 2006 11:48:11 +0200
Subject: [lxml-dev] lxml now ships with Pyrex included
Message-ID: <4540845B.90904@gkec.informatik.tu-darmstadt.de>
Hi,
as lxml will most likely continue to depend on a non-release version of pyrex
for quite a while, I decided to add the patched Pyrex version to the source
distribution. It is part of lxml's SVN checkout, I only add the Pyrex
directory to the project root in my release script. The setup.py script simply
prepends that directory to the Python path so that the modified Pyrex is used
automatically.
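The path trick can be sketched as follows. This is not the code from lxml's setup.py: the helper name `prefer_bundled` and the `bundled` directory name are assumptions made for illustration.

```python
import os
import sys


def prefer_bundled(package_dir):
    """Put package_dir first on sys.path if it exists; return True when it was added."""
    if not os.path.isdir(package_dir):
        return False
    if package_dir in sys.path:
        sys.path.remove(package_dir)
    # position 0 wins over site-packages, so the bundled copy is imported first
    sys.path.insert(0, package_dir)
    return True


# Hypothetical layout: a directory named "bundled" next to the build script
used_bundled = prefer_bundled(os.path.join(os.getcwd(), "bundled"))
print("using bundled copy: %s" % used_bundled)
```

Because the directory is only prepended when it exists, the same script falls back to whatever is installed globally when run from a checkout that lacks the bundled copy.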
This has a couple of consequences for users:
* Distributors that want to apply patches to the lxml sources no longer have
to check out Pyrex themselves and can shrink their build scripts to a simple
run of setup.py.
* Users can now modify and build lxml distributions without globally
installing a patched Pyrex. They can also run "make clean" if something went
wrong, without breaking the build afterwards, even if they have an unpatched
Pyrex installed in their site-packages.
I hope this helps in making lxml easier to build. Note that SVN users still
have to install our Pyrex version, but I think that's acceptable.
Now that this works, we could also remove the .c files to shrink the source
distribution. Any objections?
Stefan
From sidnei at enfoldsystems.com Thu Oct 26 17:27:09 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Thu, 26 Oct 2006 12:27:09 -0300
Subject: [lxml-dev] NameError on docloader.pxi
Message-ID:
Hi there,
I had spotted a NameError in docloader.pxi, but fixed it locally then
forgot to send the patch. So here it is. It did bite me on the back
when I tried to compile lxml-trunk on another box :)
Traceback:
File "xslt.pxi", line 593, in etree._XSLTProcessingInstruction.parseXSL
File "parser.pxi", line 889, in etree._parseDocument
File "parser.pxi", line 893, in etree._parseDocumentFromURL
File "parser.pxi", line 810, in etree._parseDocFromFile
File "parser.pxi", line 522, in etree._BaseParser._parseDocFromFile
File "parser.pxi", line 591, in etree._handleParseResult
File "etree.pyx", line 201, in etree._ExceptionContext._raise_if_stored
File "parser.pxi", line 285, in etree._parser_resolve_from_python
File "docloader.pxi", line 84, in etree._ResolverRegistry.resolve
File "lxmlfilter.pyc", line 50, in resolve
File "docloader.pxi", line 47, in etree.Resolver.resolve_file
NameError: _ParserInput
--
Sidnei da Silva
http://www.enfoldsystems.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: docloader.diff
Type: application/octet-stream
Size: 867 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20061026/132fdfad/attachment.obj
From sidnei at enfoldsystems.com Thu Oct 26 20:07:38 2006
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Thu, 26 Oct 2006 15:07:38 -0300
Subject: [lxml-dev] Test failures on Windows
Message-ID:
I think I've mentioned the other day the fact some failures are seen
when running tests for trunk on Windows. Here's the log:
http://tinyurl.com/yhd4hc
If you want to follow the status of the builds, you can use this page.
You can also force a build from there by clicking on the builder name
(same line as 'changes' header):
http://tinyurl.com/yykhz7
--
Sidnei da Silva
http://www.enfoldsystems.com
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Oct 27 09:08:24 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 27 Oct 2006 09:08:24 +0200
Subject: [lxml-dev] NameError on docloader.pxi
In-Reply-To:
References:
Message-ID: <4541B068.6080407@gkec.informatik.tu-darmstadt.de>
Sidnei da Silva wrote:
> I had spotted a NameError in docloader.pxi, but fixed it locally then
> forgot to send the patch. So here it is. It did bite me on the back
> when I tried to compile lxml-trunk on another box :)
Obvious bug, obvious patch.
Thanks,
Stefan
From faassen at infrae.com Fri Oct 27 10:51:18 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Fri, 27 Oct 2006 10:51:18 +0200
Subject: [lxml-dev] lxml now ships with Pyrex included
In-Reply-To: <4540845B.90904@gkec.informatik.tu-darmstadt.de>
References: <4540845B.90904@gkec.informatik.tu-darmstadt.de>
Message-ID: <4541C886.20406@infrae.com>
Stefan Behnel wrote:
> Now that this works, we could also remove the .c files to shrink the source
> distribution. Any objections?
Yes, I object to removing the .c files. the .c files allow a guaranteed
sure build of lxml. There is absolutely no doubt which version of Pyrex
is used here, as that's under our own control (otherwise there might be
interesting import issues in play).
Also important, it allows tools like easy_install to download and
compile lxml fully automatically (such as in a buildout). I use this
feature all the time. I don't know whether that can work if a Pyrex is
bundled - the setup.py would likely become more complicated to account
for PYTHONPATH manipulation, and that might possibly break easy_install.
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Oct 27 12:57:59 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 27 Oct 2006 12:57:59 +0200
Subject: [lxml-dev] lxml now ships with Pyrex included
In-Reply-To: <4541C886.20406@infrae.com>
References: <4540845B.90904@gkec.informatik.tu-darmstadt.de>
<4541C886.20406@infrae.com>
Message-ID: <4541E637.2000504@gkec.informatik.tu-darmstadt.de>
Hi,
Martijn Faassen wrote:
> Stefan Behnel wrote:
>> Now that this works, we could also remove the .c files to shrink the
>> source distribution. Any objections?
>
> Yes, I object to removing the .c files. the .c files allow a guaranteed
> sure build of lxml. There is absolutely no doubt which version of Pyrex
> is used here, as that's under our own control (otherwise there might be
> interesting import issues in play).
I won't argue for it and it's fine to leave them in. It's only some 230K
difference in the tgz (50% more), so if that prevents us from running into
install problems ...
> Also important, it allows tools like easy_install to download and
> compile lxml fully automatically (such as in a buildout). I use this
> feature all the time. I don't know whether that can work if a Pyrex is
> bundled - the setup.py would likely become more complicated to account
> for PYTHONPATH manipulation, and that might possibly break easy_install.
No changes in setup.py are required. I only added Pyrex' package directory to
the lxml root directory and setup.py imports it nicely from there. I tested
that it builds with setuptools, so I don't quite see where this could
interfere with buildout or easy_install.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Oct 27 22:55:11 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 27 Oct 2006 22:55:11 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To:
References:
Message-ID: <4542722F.9060900@gkec.informatik.tu-darmstadt.de>
Hi,
Sidnei da Silva wrote:
> I think I've mentioned the other day the fact some failures are seen
> when running tests for trunk on Windows. Here's the log:
>
> http://tinyurl.com/yhd4hc
Thanks. Most of those are the usual Windows bugs that you can't delete an open
file. I fixed some and just silenced the remaining two - shouldn't be too much
of a problem if tiny temporary files are not deleted after running the test
cases (which is a rare enough event anyway...)
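Silencing such a failure could look roughly like this. It is a minimal
sketch, not the code that went into the test suite, and the helper name
remove_quietly is an assumption. It catches OSError, which is the base class
of the Windows-specific error raised when the file is still open:

```python
import os


def remove_quietly(path):
    """Try to delete path; return True on success, False otherwise."""
    try:
        os.remove(path)
    except OSError:
        # On Windows, deleting a file that is still open fails.
        # For a tiny temporary file we accept the leftover.
        return False
    return True
```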
There's one problem left, though, and I don't have any idea where it might
come from.
======================================================================
FAIL: test_xslt_parameters (lxml.tests.test_xslt.ETreeXSLTTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\pybots\slave\trunk.dasilva-x86\build\lib\unittest.py", line 260, in run
testMethod()
File "C:\pybots\slave\python-tool\lxml-trunk\src\lxml\tests\test_xslt.py",
line 210, in test_xslt_parameters
st.apply, tree)
File "C:\pybots\slave\trunk.dasilva-x86\build\lib\unittest.py", line 326, in
failUnlessRaises
raise self.failureException, "%s not raised" % excName
AssertionError: XSLTApplyError not raised
----------------------------------------------------------------------
Test case being:
----------------------------------------------------------------------
def test_xslt_parameters(self):
    tree = self.parse('BC')
    style = self.parse('''\
''')
    st = etree.XSLT(style)
    res = st.apply(tree, bar="'Bar'")
    self.assertEquals('''\
Bar
''',
                      st.tostring(res))
    # apply without needed parameter will lead to XSLTApplyError
    self.assertRaises(etree.XSLTApplyError,
                      st.apply, tree)
----------------------------------------------------------------------
Apparently, the comment is wrong here...
I'll have to find some time to look into this, unless someone has an idea?
Could anyone test this under Windows and maybe figure out what happens here?
Sometimes this means that an unexpected exception is raised instead - or none
at all?
Stefan
From sidnei at awkly.org Fri Oct 27 23:04:54 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Fri, 27 Oct 2006 18:04:54 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4542722F.9060900@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061027210454.GA4460@cotia>
On Fri, Oct 27, 2006 at 10:55:11PM +0200, Stefan Behnel wrote:
| Thanks. Most of those are the usual Windows bugs that you can't delete an open
| file. I fixed some and just silenced the remaining two - shouldn't be too much
| of a problem if tiny temporary files are not deleted after running the test
| cases (which is a rare enough event anyway...)
Yeah, as long as they are tiny :) I will be running the tests
constantly on that box. Maybe I can set up some task to clean up $TMP.
| There's one problem left, though, and I don't have any idea where it might
| come from.
|
| ======================================================================
| FAIL: test_xslt_parameters (lxml.tests.test_xslt.ETreeXSLTTestCase)
| ----------------------------------------------------------------------
| Traceback (most recent call last):
| File "C:\pybots\slave\trunk.dasilva-x86\build\lib\unittest.py", line 260, in run
| testMethod()
| File "C:\pybots\slave\python-tool\lxml-trunk\src\lxml\tests\test_xslt.py",
| line 210, in test_xslt_parameters
| st.apply, tree)
| File "C:\pybots\slave\trunk.dasilva-x86\build\lib\unittest.py", line 326, in
| failUnlessRaises
| raise self.failureException, "%s not raised" % excName
| AssertionError: XSLTApplyError not raised
| ----------------------------------------------------------------------
|
| Test case being:
|
| ----------------------------------------------------------------------
| def test_xslt_parameters(self):
|     tree = self.parse('BC')
|     style = self.parse('''\
|
|
|
|
|
| ''')
|
|     st = etree.XSLT(style)
|     res = st.apply(tree, bar="'Bar'")
|     self.assertEquals('''\
|
| Bar
| ''',
|                       st.tostring(res))
|     # apply without needed parameter will lead to XSLTApplyError
|     self.assertRaises(etree.XSLTApplyError,
|                       st.apply, tree)
| ----------------------------------------------------------------------
|
| Apparently, the comment is wrong here...
|
| I'll have to find some time to look into this, unless someone has an idea?
| Could anyone test this under Windows and maybe figure out what happens here?
| Sometimes this means that an unexpected exception is raised instead - or none
| at all?
I can look at that later today.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From sidnei at awkly.org Sat Oct 28 02:06:59 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Fri, 27 Oct 2006 21:06:59 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4542722F.9060900@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028000659.GD4460@cotia>
| Apparently, the comment is wrong here...
|
| I'll have to find some time to look into this, unless someone has an idea?
| Could anyone test this under Windows and maybe figure out what happens here?
| Sometimes this means that an unexpected exception is raised instead - or none
| at all?
So here's what that line gives:
(Pdb) p st.tostring(st.apply(tree))
'\n\n'
Looks like it just assumed the parameter was empty or something.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From sidnei at awkly.org Sat Oct 28 03:26:12 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Fri, 27 Oct 2006 22:26:12 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4542722F.9060900@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028012612.GE4460@cotia>
On Fri, Oct 27, 2006 at 10:55:11PM +0200, Stefan Behnel wrote:
| Thanks. Most of those are the usual Windows bugs that you can't delete an open
| file. I fixed some and just silenced the remaining two - shouldn't be too much
| of a problem if tiny temporary files are not deleted after running the test
| cases (which is a rare enough event anyway...)
Two issues here:
- In Python2.4 it raises OSError instead of WindowsError. I guess
that is one of the changes in Python2.5.
- I believe that this might be a real bug that needs fixing.
Why it might be a bug:
- I looked at the source in lxml and I see that this ends up calling
xmlparser.xmlCtxtReadFile, which just delegates down to
libxml2. Well, somewhere in there it seems like the file is read
but not closed.
By trial and error, I've come up with the attached patch, which fixes
the failures on Windows. Someone more experienced should review this.
careful-not-to-hide-the-dirt-under-the-rug'ly yours,
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
-------------- next part --------------
Index: lxml/xmlparser.pxd
===================================================================
--- lxml/xmlparser.pxd (revision 33825)
+++ lxml/xmlparser.pxd (working copy)
@@ -53,6 +53,7 @@
int recovery
int options
xmlError lastError
+ xmlParserInput* input
xmlNode* node
xmlSAXHandler* sax
@@ -127,4 +128,5 @@
char* buffer)
cdef xmlParserInput* xmlNewInputFromFile(xmlParserCtxt* ctxt,
char* filename)
+ cdef xmlParserInput* inputPop(xmlParserCtxt* ctxt)
cdef void xmlFreeInputStream(xmlParserInput* input)
Index: lxml/parser.pxi
===================================================================
--- lxml/parser.pxi (revision 33825)
+++ lxml/parser.pxi (working copy)
@@ -574,6 +574,9 @@
tree.xmlFreeDoc(ctxt.myDoc)
ctxt.myDoc = NULL
+ if ctxt.input is not NULL:
+ xmlparser.xmlFreeInputStream(xmlparser.inputPop(ctxt))
+
if result is not NULL:
if ctxt.wellFormed or recover:
__GLOBAL_PARSER_CONTEXT.initDocDict(result)
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Oct 28 10:05:14 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 28 Oct 2006 10:05:14 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028012612.GE4460@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
Message-ID: <45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> On Fri, Oct 27, 2006 at 10:55:11PM +0200, Stefan Behnel wrote:
> | Thanks. Most of those are the usual Windows bugs that you can't delete an open
> | file. I fixed some and just silenced the remaining two - shouldn't be too much
> | of a problem if tiny temporary files are not deleted after running the test
> | cases (which is a rare enough event anyway...)
>
> Two issues here:
>
> - In Python2.4 it raises OSError instead of WindowsError. I guess
> that is one of the changes in Python2.5.
Good to know.
> - I believe that this might be a real bug that needs fixing.
>
> Why it might be a bug:
>
> - I looked at the source in lxml and I see that this ends up calling
> xmlparser.xmlCtxtReadFile, which just delegates down to
> libxml2. Well, somewhere in there it seems like the file is read
> but not closed.
You got me convinced. I think that's because we are using the context reusing
calls (xmlCtxt*). They require calling xmlCtxtReset afterwards to clean up
both the input stack and memory resources. This is normally called
automatically when using the parser context the next time (which is why there
never were any enduring side effects), but waiting for that has the temporal
side effect of leaving the input stream open when passing control back to the
user code.
Now, the problem is, running xmlCtxtReset can currently segfault in some
cases, so we can't just call it carelessly. I played with it a bit to figure
out in which cases it can be called, but it doesn't look like we can safely
call it in every case where it would make sense. Guess I'll file a bug report
on it and try to come up with a work-around...
Stefan
From sidnei at awkly.org Sat Oct 28 15:07:19 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 10:07:19 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028130719.GA4589@cotia>
On Sat, Oct 28, 2006 at 10:05:14AM +0200, Stefan Behnel wrote:
| You got me convinced. I think that's because we are using the context reusing
| calls (xmlCtxt*). They require calling xmlCtxtReset afterwards to clean up
| both the input stack and memory resources. This is normally called
| automatically when using the parser context the next time (which is why there
| never were any enduring side effects), but waiting for that has the temporal
| side effect of leaving the input stream open when passing control back to the
| user code.
|
| Now, the problem is, running xmlCtxtReset can currently segfault in some
| cases, so we can't just call it carelessly. I played with it a bit to figure
| out in which cases it can be called, but it doesn't look like we can safely
| call it in every case where it would make sense. Guess I'll file a bug report
| on it and try to come up with a work-around...
Erm.... did you look at the attached patch? It just frees ctxt->input
if it's not NULL. I guess you're looking for a generic fix though.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Oct 28 15:16:43 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 28 Oct 2006 15:16:43 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028130719.GA4589@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
Message-ID: <4543583B.1020702@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> On Sat, Oct 28, 2006 at 10:05:14AM +0200, Stefan Behnel wrote:
> | You got me convinced. I think that's because we are using the context reusing
> | calls (xmlCtxt*). They require calling xmlCtxtReset afterwards to clean up
> | both the input stack and memory resources. This is normally called
> | automatically when using the parser context the next time (which is why there
> | never were any enduring side effects), but waiting for that has the temporal
> | side effect of leaving the input stream open when passing control back to the
> | user code.
> |
> | Now, the problem is, running xmlCtxtReset can currently segfault in some
> | cases, so we can't just call it carelessly. I played with it a bit to figure
> | out in which cases it can be called, but it doesn't look like we can safely
> | call it in every case where it would make sense. Guess I'll file a bug report
> | on it and try to come up with a work-around...
>
> Erm.... did you look at the attached patch? It just frees ctxt->input
> if it's not NULL. I guess you're looking for a generic fix though.
Not only generic. Pending open files is only a symptom here. The real problem
is that none of the resources allocated for parsing is freed before you call
the parser again (in which case new resources will be allocated right away).
So popping the input streams fixes the windows problem, but calling
xmlClearParserCtxt() after parsing would be the right thing to do - if it
didn't crash.
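The difference between cleaning up lazily (on the next parser run) and
eagerly (right after parsing) can be sketched in plain Python. This is only
an analogy: ReusableContext and its methods are made-up names, not libxml2
or lxml API, and the real context holds far more state than one stream.

```python
class ReusableContext:
    """Toy stand-in for a parser context that is expensive to create."""

    def __init__(self):
        self.input = None      # currently open input stream, if any
        self.parse_count = 0

    def release_input(self):
        """Close and drop the input stream, keeping the context itself."""
        if self.input is not None:
            self.input.close()
            self.input = None

    def parse_file(self, filename):
        # Lazy cleanup would call release_input() only here, at the
        # start of the next parse, leaving the previous file open.
        self.release_input()
        self.input = open(filename, "rb")
        try:
            data = self.input.read()
        finally:
            # Eager cleanup: release the stream before returning
            # control to the caller, so the file can be deleted.
            self.release_input()
        self.parse_count += 1
        return data
```

With the eager variant the context object is still reused across calls, but
no file handle survives the return to user code.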
Stefan
From sidnei at awkly.org Sat Oct 28 15:25:35 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 10:25:35 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4543583B.1020702@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028132535.GB4589@cotia>
On Sat, Oct 28, 2006 at 03:16:43PM +0200, Stefan Behnel wrote:
| Not only generic. Pending open files is only a symptom here. The real problem
| is that none of the resources allocated for parsing is freed before you call
| the parser again (in which case new resources will be allocated right away).
|
| So popping the input streams fixes the windows problem, but calling
| xmlClearParserCtxt() after parsing would be the right thing to do - if it
| didn't crash.
Thanks for the explanation. That makes a lot of sense.
A question though. Is the parser context expensive to allocate? Why
not use xmlFreeParserCtxt() and allocate a new one instead of
xmlResetParserCtxt()?
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Oct 28 15:43:05 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 28 Oct 2006 15:43:05 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028132535.GB4589@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
Message-ID: <45435E69.6050609@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> A question though. Is the parser context expensive to allocate? Why
> not use xmlFreeParserCtxt() and allocate a new one instead of
> xmlResetParserCtxt()?
It is pretty expensive to allocate. There are some hash-table allocations
involved that are rather costly (that's why there is a function for resetting
the context). In lxml, we try to reuse context objects wherever possible.
Stefan
From sidnei at awkly.org Sat Oct 28 16:01:00 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 11:01:00 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <45435E69.6050609@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028140100.GC4589@cotia>
On Sat, Oct 28, 2006 at 03:43:05PM +0200, Stefan Behnel wrote:
| It is pretty expensive to allocate. There are some hash-table allocations
| involved that are rather costly (that's why there is a function for resetting
| the context). In lxml, we try to reuse context objects wherever possible.
One last question, do you have a small test that can reproduce the
segfault or is it just random? I would like to spend some time
tracking that one down.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Oct 28 16:12:30 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 28 Oct 2006 16:12:30 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028140100.GC4589@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
Message-ID: <4543654E.1040006@gkec.informatik.tu-darmstadt.de>
Hi,
Sidnei da Silva wrote:
> On Sat, Oct 28, 2006 at 03:43:05PM +0200, Stefan Behnel wrote:
> | It is pretty expensive to allocate. There are some hash-table allocations
> | involved that are rather costly (that's why there is a function for resetting
> | the context). In lxml, we try to reuse context objects wherever possible.
>
> One last question, do you have a small test that can reproduce the
> segfault or is it just random? I would like to spend some time
> tracking that one down.
No need to do that. It's in line 12837 in file parser.c. Apparently, the
assumption that ctxt->spaceTab has always been initialised when calling
xmlCtxtReset() is wrong. Here's my bug report:
http://bugzilla.gnome.org/show_bug.cgi?id=366161
My current work around is to do the NULL check myself and to initialise it the
way libxml2 normally does before calling the reset. That's what I'd normally
expect libxml2 to do...
BTW, I don't know in which cases this field remains uninitialised, so if you
want to add some more infos to the bug report, feel free to investigate. I
only know that it definitely happens in the case where we call
xmlCtxtReadFile(). It is possible that this requires previous runs of the
parser, maybe even a failed run preceding the crash, can't tell...
Stefan
From sidnei at awkly.org Sat Oct 28 16:55:36 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 11:55:36 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4543654E.1040006@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
<4543654E.1040006@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028145536.GD4589@cotia>
On Sat, Oct 28, 2006 at 04:12:30PM +0200, Stefan Behnel wrote:
| No need to do that. It's in line 12837 in file parser.c. Apparently, the
| assumption that ctxt->spaceTab has always been initialised when calling
| xmlCtxtReset() is wrong. Here's my bug report:
|
| http://bugzilla.gnome.org/show_bug.cgi?id=366161
|
| My current work around is to do the NULL check myself and to initialise it the
| way libxml2 normally does before calling the reset. That's what I'd normally
| expect libxml2 to do...
Yes, that check is certainly missing. Note that htmlCtxtReset() does
the check! I added that info to the bug report.
| BTW, I don't know in which cases this field remains uninitialised, so if you
| want to add some more infos to the bug report, feel free to investigate. I
| only know that it definitely happens in the case where we call
| xmlCtxtReadFile(). It is possible that this requires previous runs of the
| parser, maybe even a failed run preceding the crash, can't tell...
Maybe in spacePush, if memory allocation fails. That seems a bit odd
though... unless you're really short on memory :)
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Oct 28 21:55:55 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 28 Oct 2006 21:55:55 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028145536.GD4589@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
<4543654E.1040006@gkec.informatik.tu-darmstadt.de>
<20061028145536.GD4589@cotia>
Message-ID: <4543B5CB.90705@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> On Sat, Oct 28, 2006 at 04:12:30PM +0200, Stefan Behnel wrote:
> | the assumption that ctxt->spaceTab has always been initialised when calling
> | xmlCtxtReset() is wrong.
> | I don't know in which cases this field remains uninitialised, so if you
> | want to add some more infos to the bug report, feel free to investigate. I
> | only know that it definitely happens in the case where we call
> | xmlCtxtReadFile(). It is possible that this requires previous runs of the
> | parser, maybe even a failed run preceding the crash, can't tell...
>
> Maybe in spacePush, if memory allocation fails. That seems a bit odd
> though... unless you're really short on memory :)
I think I was on the wrong track. It's a bug in libxml2, but not the right
one. The crash appears somewhere in the long doctest in api.txt, most likely
in a place where errors are tested when parsing from a string. That's not what
we are looking for.
You said that your patch fixes the problem, are you certain about that?
Because calling xmlCtxtReset() should do exactly that (and a lot more) and it
doesn't seem to solve the problem - according to the buildbot.
Stefan
From sidnei at awkly.org Sat Oct 28 23:06:06 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 18:06:06 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4543B5CB.90705@gkec.informatik.tu-darmstadt.de>
References: <20061028012612.GE4460@cotia>
<45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
<4543654E.1040006@gkec.informatik.tu-darmstadt.de>
<20061028145536.GD4589@cotia>
<4543B5CB.90705@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061028210606.GF4589@cotia>
On Sat, Oct 28, 2006 at 09:55:55PM +0200, Stefan Behnel wrote:
| I think I was on the wrong track. It's a bug in libxml2, but not the right
| one. The crash appears somewhere in the long doctest in api.txt, most likely
| in a place where errors are tested when parsing from a string. That's not what
| we are looking for.
Ok... but that's no excuse for the check in HTMLparser to not be done
on parser.
| You said that your patch fixes the problem, are you certain about that?
| Because calling xmlCtxtReset() should do exactly that (and a lot more) and it
| doesn't seem to solve the problem - according to the buildbot.
Yes, it does solve the problem for me on Python 2.4, ie, OSError is
not raised after applying my patch.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From sidnei at awkly.org Sat Oct 28 23:17:33 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 18:17:33 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028210606.GF4589@cotia>
References: <45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
<4543654E.1040006@gkec.informatik.tu-darmstadt.de>
<20061028145536.GD4589@cotia>
<4543B5CB.90705@gkec.informatik.tu-darmstadt.de>
<20061028210606.GF4589@cotia>
Message-ID: <20061028211733.GG4589@cotia>
| | You said that your patch fixes the problem, are you certain about that?
| | Because calling xmlCtxtReset() should do exactly that (and a lot more) and it
| | doesn't seem to solve the problem - according to the buildbot.
I see that you only call xmlClearContext if spaceTab is not
NULL. Maybe that's the issue.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Oct 29 00:05:40 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sun, 29 Oct 2006 00:05:40 +0200
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028211733.GG4589@cotia>
References: <45430F3A.7050602@gkec.informatik.tu-darmstadt.de>
<20061028130719.GA4589@cotia>
<4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
<4543654E.1040006@gkec.informatik.tu-darmstadt.de>
<20061028145536.GD4589@cotia>
<4543B5CB.90705@gkec.informatik.tu-darmstadt.de>
<20061028210606.GF4589@cotia> <20061028211733.GG4589@cotia>
Message-ID: <4543D434.1020901@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> | | You said that your patch fixes the problem, are you certain about that?
> | | Because calling xmlCtxtReset() should do exactly that (and a lot more) and it
> | | doesn't seem to solve the problem - according to the buildbot.
>
> I see that you only call xmlClearContext if spaceTab is not
> NULL. Maybe that's the issue.
No, I also tried initialising spaceTab by hand to always call reset(). No
difference. So this is definitely not the problem. And it really makes me
wonder how your patch can work if xmlClearParserCtxt() does not, because your
code snippet is straight in there and I can't see a way it should not get
executed if clear() is called.
AFAICT, with the call to clear(), all input streams and memory resources
should get freed after parsing, so that's all I wanted. And still the problem
of a file descriptor staying open remains. I'll wait for the next buildbot run
to see, but if that fails, I'll really be clueless...
Note, BTW, that the error occurs in a finally block, so maybe it's already
shadowing an exception?
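
To illustrate the concern about the finally block: if the cleanup code inside
a finally block raises, the caller sees the cleanup error rather than the
original one. The following is only a minimal sketch, not lxml's test code;
the function names and both error messages are made up for illustration, and
it assumes Python 3, where the original exception stays reachable through
`__context__` (on Python 2 it is simply lost).

```python
def parse_then_cleanup():
    """Hypothetical stand-in for a test that parses and then cleans up."""
    try:
        # Stand-in for the real failure during parsing.
        raise ValueError("parse failed")
    finally:
        # Stand-in for a failing cleanup step, e.g. removing a file
        # that is still held open.
        raise OSError("cleanup failed")


def describe_failure():
    """Return the names of the visible exception and the one it masks."""
    try:
        parse_then_cleanup()
    except Exception as exc:
        # Python 3 records the masked exception in __context__.
        return type(exc).__name__, type(exc.__context__).__name__


print(describe_failure())
```

So an OSError reported from a finally block does not by itself rule out an
earlier, different failure inside the try body.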
Stefan
From sidnei at awkly.org Sun Oct 29 02:17:41 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Sat, 28 Oct 2006 22:17:41 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <4543D434.1020901@gkec.informatik.tu-darmstadt.de>
References: <4543583B.1020702@gkec.informatik.tu-darmstadt.de>
<20061028132535.GB4589@cotia>
<45435E69.6050609@gkec.informatik.tu-darmstadt.de>
<20061028140100.GC4589@cotia>
<4543654E.1040006@gkec.informatik.tu-darmstadt.de>
<20061028145536.GD4589@cotia>
<4543B5CB.90705@gkec.informatik.tu-darmstadt.de>
<20061028210606.GF4589@cotia> <20061028211733.GG4589@cotia>
<4543D434.1020901@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061029011741.GH4589@cotia>
On Sun, Oct 29, 2006 at 12:05:40AM +0200, Stefan Behnel wrote:
| No, I also tried initialising spaceTab by hand to always call reset(). No
| difference. So this is definitely not the problem. And it really makes me
| wonder how your patch can work if xmlClearParserCtxt() does not, because your
| code snippet is straight in there and I can't see a way it should not get
| executed if clear() is called.
|
| AFAICT, with the call to clear(), all input streams and memory resources
| should get freed after parsing, so that's all I wanted. And still the problem
| of a file descriptor staying open remains. I'll wait for the next buildbot run
| to see, but if that fails, I'll really be clueless...
FWIW, I reverted my patch and did a svn up, and your changes seem to
work here. No segfault or anything. The buildbot seems to be in some
funny state now, maybe due to some checkin on Python 2.5/trunk.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Oct 30 16:28:06 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 30 Oct 2006 16:28:06 +0100
Subject: [lxml-dev] lxml 1.1.2 released
Message-ID: <45461A06.20909@gkec.informatik.tu-darmstadt.de>
Hi everyone,
lxml 1.1.2 finally made it to the cheeseshop.
http://cheeseshop.python.org/pypi/lxml
This is mainly a bugfix release for the stable 1.1 series; the changelog is
below. As there were a number of important fixes, updating is recommended.
Eggs for x86-64 are uploaded already, and I'd love to see more eggs thrown in
that direction.
Have fun,
Stefan
1.1.2 (2006-10-30)
Features added
* Data elements in objectify support repr(), which is now used by dump()
* Source distribution now ships with a patched Pyrex
* New C-API function makeElement() to create new elements with text, tail,
attributes and namespaces
* Reuse original parser flags for XInclude
* Simplified support for handling XSLT processing instructions
Bugs fixed
* Parser resources were not freed before the next parser run
* Open files and XML strings returned by Python resolvers were not
closed/freed
* Crash in the IDDict returned by XMLDTDID
* Copying Comments and ProcessingInstructions failed
* Memory leak for external URLs in _XSLTProcessingInstruction.parseXSL()
* Memory leak when garbage collecting tailed root elements
* HTML script/style content was not propagated to .text
* Show text xincluded between text nodes correctly in .text and .tail
* 'integer * objectify.StringElement' operation was not supported
From sidnei at awkly.org Mon Oct 30 18:12:33 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Mon, 30 Oct 2006 14:12:33 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061028000659.GD4460@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028000659.GD4460@cotia>
Message-ID: <20061030171233.GA4630@cotia>
Hi Stefan,
On Fri, Oct 27, 2006 at 09:06:59PM -0300, Sidnei da Silva wrote:
| | Apparently, the comment is wrong here...
| |
| | I'll have to find some time to look into this, unless someone has an idea?
| | Could anyone test this under Windows and maybe figure out what happens here?
| | Sometimes this means that an unexpected exception is raised instead - or none
| | at all?
|
| So here's what that line gives:
|
| (Pdb) p st.tostring(st.apply(tree))
| '\n\n'
|
| Looks like it just assumed the parameter was empty or something.
Did you have a chance to look at the XSLTApplyError not being raised?
Does that test fail on Linux? Maybe it's an issue with the version of
libxml2 that I'm using on the buildbot?
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Oct 30 18:28:56 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 30 Oct 2006 18:28:56 +0100
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <20061030171233.GA4630@cotia>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028000659.GD4460@cotia> <20061030171233.GA4630@cotia>
Message-ID: <45463658.8030801@gkec.informatik.tu-darmstadt.de>
Hi Sidnei,
Sidnei da Silva wrote:
> Hi Stefan,
>
> On Fri, Oct 27, 2006 at 09:06:59PM -0300, Sidnei da Silva wrote:
> | | Apparently, the comment is wrong here...
> | |
> | | I'll have to find some time to look into this, unless someone has an idea?
> | | Could anyone test this under Windows and maybe figure out what happens here?
> | | Sometimes this means that an unexpected exception is raised instead - or none
> | | at all?
> |
> | So here's what that line gives:
> |
> | (Pdb) p st.tostring(st.apply(tree))
> | '\n\n'
> |
> | Looks like it just assumed the parameter was empty or something.
>
> Did you have a chance to look at the XSLTApplyError not being raised?
Yes, I looked into it and decided it's not critical enough to delay 1.1.2 even
more.
> Does that test fail on Linux?
No, not on my machine.
> Maybe it's an issue with the version of
> libxml2 that I'm using on the buildbot?
I tried with libxml2 2.6.24 and 2.6.26. Both pass the test nicely. I also
looked through the code path and couldn't find anything obvious that would
behave differently on different systems. I have no idea why that test fails on
the buildbot.
BTW, the buildbot logs (all of them) seem to be broken currently, don't know
where that comes from.
Stefan
From sidnei at awkly.org Tue Oct 31 12:28:41 2006
From: sidnei at awkly.org (Sidnei da Silva)
Date: Tue, 31 Oct 2006 08:28:41 -0300
Subject: [lxml-dev] Test failures on Windows
In-Reply-To: <45463658.8030801@gkec.informatik.tu-darmstadt.de>
References:
<4542722F.9060900@gkec.informatik.tu-darmstadt.de>
<20061028000659.GD4460@cotia> <20061030171233.GA4630@cotia>
<45463658.8030801@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061031112841.GA4594@cotia>
On Mon, Oct 30, 2006 at 06:28:56PM +0100, Stefan Behnel wrote:
| I tried with libxml2 2.6.24 and 2.6.26. Both pass the test nicely. I also
| looked through the code path and couldn't find anything obvious that would
| behave differently on different systems. I have no idea why that test fails on
| the buildbot.
I see that you added the version numbers to the test output.
http://tinyurl.com/ymbmrt
| BTW, the buildbot logs (all of them) seem to be broken currently, don't know
| where that comes from.
Yeah, the master got out of sync, somehow. It's working again now.
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From dan at danposluns.com Tue Oct 31 22:55:28 2006
From: dan at danposluns.com (Dan Posluns)
Date: Tue, 31 Oct 2006 13:55:28 -0800
Subject: [lxml-dev] Can't build on Windows
Message-ID: <4547C650.70104@danposluns.com>
I'm not normally a Windows user so I'm at my wit's end here trying to
get lxml to install.
When I try easy_install I get the following:
C:\Python24\Scripts>easy_install lxml
Searching for lxml
Reading http://www.python.org/pypi/lxml/
Reading http://codespeak.net/lxml
Reading http://www.python.org/pypi/lxml/1.1.2
Best match: lxml 1.1.2
Downloading http://codespeak.net/lxml/lxml-1.1.2.tgz
Processing lxml-1.1.2.tgz
Running lxml-1.1.2\setup.py -q bdist_egg --dist-dir c:\docume~1\dposluns\locals~1\temp\easy_install-vc3ufv\lxml-1.1.2\egg-dist-tmp-vxzlya
Building lxml version 1.1.2
warning: no files found matching 'etree.c' under directory 'src\lxml'
warning: no files found matching 'objectify.c' under directory 'src\lxml'
warning: no files found matching 'etree.h' under directory 'src\lxml'
warning: no files found matching 'etree_defs.h' under directory 'src\lxml'
warning: no files found matching 'pubkey.asc' under directory 'doc'
warning: no previously-included files found matching 'doc\pyrex.txt'
warning: no previously-included files found matching 'src\lxml\etree.pxi'
cl : Command line warning D4025 : overriding '/W3' with '/w'
cl : Command line warning D4029 : optimization is not available in the standard edition compiler
etree.c
c:\Documents and Settings\dposluns\Local Settings\Temp\easy_install-vc3ufv\lxml-1.1.2\src\lxml\etree_defs.h(20) : fatal error C1083: Cannot open include file: 'libxml/xmlversion.h': No such file or directory
error: Setup script exited with error: command '"C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe"' failed with exit status 2
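
The C1083 error above means that the compiler did not find
libxml/xmlversion.h in any directory on its include path. Before rerunning
the build, it can help to check which candidate directory, if any, actually
contains that header. The following is a rough sketch using only the Python
standard library; the helper name `find_header` and the directories passed
to it are hypothetical and not part of lxml's setup.py.

```python
import os


def find_header(include_dirs, header="libxml/xmlversion.h"):
    """Return the first directory in include_dirs that contains header.

    Returns None if no directory contains it. The header is given with
    forward slashes, as it appears in the #include line.
    """
    parts = header.split("/")
    for directory in include_dirs:
        if os.path.isfile(os.path.join(directory, *parts)):
            return directory
    return None
```

A result of None for every directory handed to the compiler would be
consistent with the error in the log: the libxml2 headers were unpacked
somewhere the build does not look.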
When I try David Sankel's technique to build the libraries statically, I
manage to get as far as adding wsock32 (which took me long enough to
figure out how to do - again, not a Windows programmer) before the build
process borked out on me with the same errors. I was using the Windows
distributions of libxml, libxslt, iconv and zlib from zlatkovic.com.
Can anyone help me figure this one out?
Thanks,
Dan.
--
Dan Posluns, B. Eng. & Scty. (Software Engineering and Society)
dan at danposluns.com - ICQ: 35758902
http://www.danposluns.com
"The great thing about being the only species on the planet that makes
a distinction between right and wrong is that we get to make the rules
up for ourselves as we go."
- Douglas Adams, Last Chance to See