From chris0wj at gmail.com Sat Aug 1 13:48:39 2009
From: chris0wj at gmail.com (Chris Wj)
Date: Sat, 1 Aug 2009 07:48:39 -0400
Subject: [lxml-dev] Splitting an xml file.
Message-ID: <3a0f5ffd0908010448y19bcdbacv1d219b623f0fb36f@mail.gmail.com>
I'm looking for the best way to split an xml file with many children into
multiple files with the same parent tags but individual children.
Example:
Turn this...
file1.xml
a whole bunch of stuff...
child 1 stuff...
child 2 stuff...
Into 2 files...
file1a.xml
a whole bunch of stuff...
child 1 stuff...
file1b.xml
a whole bunch of stuff...
child 2 stuff...
Should I use lxml.etree to find the line numbers and then just use file
operations? You guys think that is most efficient?
From shigin at rambler-co.ru Tue Aug 4 21:22:33 2009
From: shigin at rambler-co.ru (Alexander Shigin)
Date: Tue, 04 Aug 2009 23:22:33 +0400
Subject: [lxml-dev] Splitting an xml file.
In-Reply-To: <3a0f5ffd0908010448y19bcdbacv1d219b623f0fb36f@mail.gmail.com>
References: <3a0f5ffd0908010448y19bcdbacv1d219b623f0fb36f@mail.gmail.com>
Message-ID: <1249413753.11605.13.camel@dervish>
On Sat, 01/08/2009 at 07:48 -0400, Chris Wj wrote:
> I'm looking for the best way to split an xml file with many children
> into multiple files with the same parent tags but individual children.
...
> Should I use lxml.etree to find the line numbers and then just use
> file operations? You guys think that is most efficient?
I think the simplest way to split the file is to remove all 'child'
elements from the parsed document and then serialize the document once
for each child, with only that child re-attached.
In [1]: import codecs
In [2]: from lxml import etree
In [3]: parsed = etree.parse('q.xml')
In [4]: root = parsed.getroot()
In [5]: childs = parsed.findall('child')
In [6]: for child in childs: root.remove(child)
In [7]: for num, child in enumerate(childs):
   ...:     root.append(child)
   ...:     f = codecs.open('file1%s.xml' % num, 'w', encoding='utf-8')
   ...:     f.write(etree.tounicode(parsed))
   ...:     f.close()
   ...:     root.remove(child)
This solution does not correctly handle tail text or other nodes that
follow the 'child' nodes. I don't know if that's critical for you, but the
following XML will be split the wrong way.
....
1234
........
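The markup of the example above did not survive archiving. As a rough illustration of the tail-text problem, here is a minimal sketch on a hypothetical document in which the text 1234 directly follows the first child. The sample document, the variable names and the stdlib fallback import are assumptions made for this example only.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib API behaves the same for this sketch
    import xml.etree.ElementTree as etree

# Hypothetical input: "1234" is the tail text of the first <child>.
root = etree.fromstring('<root><child>a</child>1234<child>b</child></root>')

children = root.findall('child')
for child in children:
    root.remove(child)  # the tail text leaves the tree together with its element

outputs = []
for child in children:
    root.append(child)
    outputs.append(etree.tostring(root, encoding='unicode'))
    root.remove(child)

for text in outputs:
    print(text)
```

The first output carries the text 1234 and the second does not, so text that sat between two children ends up in only one of the resulting documents.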
If you need to split big XML files, it's much better to use the SAX
interface. But a SAX reader/writer is much harder to implement.
From chris0wj at gmail.com Tue Aug 4 22:16:49 2009
From: chris0wj at gmail.com (Chris Wj)
Date: Tue, 4 Aug 2009 16:16:49 -0400
Subject: [lxml-dev] Splitting an xml file.
In-Reply-To: <1249413753.11605.13.camel@dervish>
References: <3a0f5ffd0908010448y19bcdbacv1d219b623f0fb36f@mail.gmail.com>
<1249413753.11605.13.camel@dervish>
Message-ID: <3a0f5ffd0908041316o3a2621ffxabf2beab5cdbd0d8@mail.gmail.com>
What about xslt, can I use that to accomplish the task?
From kris at cs.ucsb.edu Wed Aug 5 00:44:10 2009
From: kris at cs.ucsb.edu (kristian kvilekval)
Date: Tue, 04 Aug 2009 15:44:10 -0700
Subject: [lxml-dev] Key error on del attribute?
Message-ID: <1249425850.27934.78.camel@loup.ece.ucsb.edu>
We need to delete an attribute on an Element node,
however we are receiving a strange exception.
> a=etree.Element('a', z='1', x='2')
> a.attrib['x']
'2'
> del a.attrib['x']
> del a.attrib['x']
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (3059, 0))
We could add a call to has_key, however we expect a simple KeyError
exception to be raised.
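For reference, the expected behaviour can be written down as a short check. This is only a sketch: the stdlib fallback import and the use of pop() with a default (as an alternative to a has_key test) are additions for illustration, not part of the report above.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib ElementTree raises the same exception here
    import xml.etree.ElementTree as etree

a = etree.Element('a', z='1', x='2')
del a.attrib['x']              # first delete succeeds

try:
    del a.attrib['x']          # second delete: the attribute is already gone
except KeyError as exc:
    missing = exc.args[0]
    print('KeyError for', missing)

# Deleting without caring whether the attribute exists:
removed = a.attrib.pop('x', None)
print(removed, dict(a.attrib))
```

Running this in a plain interpreter rather than an interactive shell helps to separate what the library raises from how the shell displays it.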
From jlovell at nwesd.org Wed Aug 5 01:23:13 2009
From: jlovell at nwesd.org (John Lovell)
Date: Tue, 4 Aug 2009 16:23:13 -0700
Subject: [lxml-dev] Key error on del attribute?
In-Reply-To: <1249425850.27934.78.camel@loup.ece.ucsb.edu>
References: <1249425850.27934.78.camel@loup.ece.ucsb.edu>
Message-ID:
On Ubuntu 9.04 I get a KeyError thrown. Can you provide a list of versions like the below?
python: 2.6.2
lxml.etree: (2, 1, 5, 0)
libxml used: (2, 6, 32)
libxml compiled: (2, 6, 32)
libxslt used: (1, 1, 24)
libxslt compiled: (1, 1, 24)
This should help...
http://codespeak.net/lxml/2.0/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do
Good luck,
John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
(360) 299-4086
jlovell at nwesd.org
www.nwesd.org
Together We Can ...
From kris at cs.ucsb.edu Wed Aug 5 01:34:08 2009
From: kris at cs.ucsb.edu (kristian kvilekval)
Date: Tue, 04 Aug 2009 16:34:08 -0700
Subject: [lxml-dev] Key error on del attribute?
In-Reply-To:
References: <1249425850.27934.78.camel@loup.ece.ucsb.edu>
Message-ID: <1249428848.27934.94.camel@loup.ece.ucsb.edu>
On Tue, 2009-08-04 at 16:16 -0700, John Lovell wrote:
> On Ubuntu 9.04 I get a KeyError thrown. Can you provide a list of versions like the below?
>
> python: 2.6.2
> lxml.etree: (2, 1, 5, 0)
> libxml used: (2, 6, 32)
> libxml compiled: (2, 6, 32)
> libxslt used: (1, 1, 24)
> libxslt compiled: (1, 1, 24)
Bizarre .. you're right, it works in python..
it's the error parsing in ipython that runs into trouble.
Not sure if the bug is in ipython or lxml, but no matter.
lxml.etree: (2, 1, 5, 0)
libxml used: (2, 6, 32)
libxml compiled: (2, 6, 32)
libxslt used: (1, 1, 24)
libxslt compiled: (1, 1, 24)
--------------------------------------------------------------------
Python 2.5.2 (r252:60911, Jan 4 2009, 21:59:32)
Type "copyright", "credits" or "license" for more information.
IPython 0.8.4 -- An enhanced Interactive Python.
In [3]: a=etree.Element('a', z='1', x='2')
In [4]: del a.attrib['x']
In [5]: del a.attrib['x']
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (3059, 0))
-------------------------------------------------
$ python
Python 2.5.2 (r252:60911, Jan 4 2009, 21:59:32)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=etree.Element('a', z='1', x='2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'etree' is not defined
>>> from lxml import etree
>>> a=etree.Element('a', z='1', x='2')
>>> del a.attrib['x']
>>> del a.attrib['x']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 1857, in lxml.etree._Attrib.__delitem__
(src/lxml/lxml.etree.c:18787)
File "apihelpers.pxi", line 435, in lxml.etree._delAttribute
(src/lxml/lxml.etree.c:31747)
KeyError: 'x'
> This should help...
> http://codespeak.net/lxml/2.0/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do
>
Thanks,
From piet at cs.uu.nl Wed Aug 5 04:33:26 2009
From: piet at cs.uu.nl (Piet van Oostrum)
Date: Wed, 5 Aug 2009 04:33:26 +0200
Subject: [lxml-dev] Key error on del attribute?
In-Reply-To: <1249428848.27934.94.camel@loup.ece.ucsb.edu>
References: <1249425850.27934.78.camel@loup.ece.ucsb.edu>
<1249428848.27934.94.camel@loup.ece.ucsb.edu>
Message-ID: <19064.61302.617505.125663@Cochabamba.local>
With iPython 0.9.1 on Python 2.6.2 it just works:
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/ipython-0.9.1-py2.6.egg/IPython/Magic.py:38: DeprecationWarning: the sets module is deprecated
from sets import Set
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
Type "copyright", "credits" or "license" for more information.
IPython 0.9.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from lxml import etree
In [2]: a=etree.Element('a', z='1', x='2')
In [3]: del a.attrib['x']
In [4]: del a.attrib['x']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/Users/piet/Mail/ in ()
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lxml-2.2.2-py2.6-macosx-10.3-fat.egg/lxml/etree.so in lxml.etree._Attrib.__delitem__ (src/lxml/lxml.etree.c:42562)()
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lxml-2.2.2-py2.6-macosx-10.3-fat.egg/lxml/etree.so in lxml.etree._delAttribute (src/lxml/lxml.etree.c:13933)()
KeyError: 'x'
--
Piet van Oostrum
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org
From bl8cki at gmail.com Wed Aug 5 14:21:05 2009
From: bl8cki at gmail.com (bl8cki)
Date: Wed, 5 Aug 2009 15:21:05 +0300
Subject: [lxml-dev] iterating xpath?
Message-ID:
I was searching the API and found things like iterfind, but it seems
that this works with ElementPath.
I would like to do something like iterxpath. Is there any way to achieve this?
Thanks a lot!
From lei at ipac.caltech.edu Wed Aug 5 18:48:57 2009
From: lei at ipac.caltech.edu (Mary Lei)
Date: Wed, 05 Aug 2009 09:48:57 -0700
Subject: [lxml-dev] lxml2.2 doctype missing
Message-ID: <4A79B7F9.4060109@ipac.caltech.edu>
I noticed that the xhtml converted from
the parse tree has doctype missing.
I am using lxml 2.2.
Is this bug still not fixed in lxml 2.2 ?
--
Mary Lei
Software Testing
IPAC-NExScl
Rm: KS-233
MS: 220-6
Phone: 395-1998
From shigin at rambler-co.ru Wed Aug 5 20:26:41 2009
From: shigin at rambler-co.ru (Alexander Shigin)
Date: Wed, 05 Aug 2009 22:26:41 +0400
Subject: [lxml-dev] Splitting an xml file.
In-Reply-To: <3a0f5ffd0908041316o3a2621ffxabf2beab5cdbd0d8@mail.gmail.com>
References: <3a0f5ffd0908010448y19bcdbacv1d219b623f0fb36f@mail.gmail.com>
<1249413753.11605.13.camel@dervish>
<3a0f5ffd0908041316o3a2621ffxabf2beab5cdbd0d8@mail.gmail.com>
Message-ID: <1249496801.11605.70.camel@dervish>
On Tue, 04/08/2009 at 16:16 -0400, Chris Wj wrote:
> What about xslt, can I use that to accomplish the task?
I've never used XSLT's ability to produce multiple output files. I've
only briefly reviewed the XSLT specification and couldn't find out how to
do it. You can use an XSLT param instead and produce different output by
changing the 'keep' param.
For example, you can use xsltproc with the example file q.xslt.
$ xsltproc --param keep 2 q.xslt q.xml
=== q.xslt ===
======
This solution has another issue: I don't know how to find out the position
numbers. The following XML has 'child' elements in positions 2 and 4.
1234
............
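The q.xslt listing did not survive in the archive. As a rough idea of what a 'keep' parameter could look like, here is a sketch driven from lxml instead of xsltproc. The stylesheet is a hypothetical stand-in, not the original file: it is an identity transform that keeps only the keep-th 'child' element under the root, counting among 'child' siblings so that other elements in between do not shift the numbering. The sample document is also an assumption.

```python
from lxml import etree

# Hypothetical stylesheet (NOT the original q.xslt): copy everything, but keep
# only the $keep-th <child> element directly under the root element.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="keep" select="1"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="/*/child">
    <xsl:if test="count(preceding-sibling::child) + 1 = $keep">
      <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))

# Hypothetical input with 'child' elements in positions 2 and 4.
doc = etree.fromstring('<root><a/><child>1</child><b/><child>2</child></root>')

# XSLT parameters are passed as XPath expressions, so '2' is the number 2.
result = transform(doc, keep='2')
tags = [el.tag for el in result.getroot()]
kept = [el.text for el in result.getroot().findall('child')]
print(tags, kept)
```

Running the transform once per value of keep gives one output document per child, at the cost of re-processing the whole input each time.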
From herve.cauwelier at free.fr Thu Aug 6 12:32:45 2009
From: herve.cauwelier at free.fr (Hervé Cauwelier)
Date: Thu, 06 Aug 2009 12:32:45 +0200
Subject: [lxml-dev] xpath going crazy
Message-ID: <4A7AB14D.2090602@free.fr>
Hi,
Consider the following XML document: http://pastebin.ca/1520331
This is an ODF presentation produced by OpenOffice.org, assumed to be a
valid XML document.
Now I type this:
>>> from lxml import etree
>>> t = etree.parse('content.xml')
>>> ns = {'draw': "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"}
>>> t.xpath('//draw:frame', namespaces=ns)
[, ]
There are indeed two frames in the document.
>>> t.xpath('//draw:frame[0]', namespaces=ns)
[]
The position counting starts at 1 in XPath so this is expected.
>>> t.xpath('//draw:frame[1]', namespaces=ns)
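[<Element {urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}frame at 7f604469eec0>, <Element {urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}frame at 7f604469ee68>]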
I get the two elements at once.
>>> t.xpath('//draw:frame[2]', namespaces=ns)
[]
I can't get the second element.
The same thing happens when asking the root instead of the tree.
I know my XPath knowledge is limited, but I don't think I'm making any
wrong assumption.
>>> print "lxml.etree: ", etree.LXML_VERSION
lxml.etree: (2, 2, 2, 0)
>>> print "libxml used: ", etree.LIBXML_VERSION
libxml used: (2, 7, 3)
>>> print "libxml compiled: ", etree.LIBXML_COMPILED_VERSION
libxml compiled: (2, 7, 3)
>>> print "libxslt used: ", etree.LIBXSLT_VERSION
libxslt used: (1, 1, 24)
>>> print "libxslt compiled: ", etree.LIBXSLT_COMPILED_VERSION
libxslt compiled: (1, 1, 24)
Thanks for your insights,
Hervé
From jq at qdevelop.de Thu Aug 6 14:05:42 2009
From: jq at qdevelop.de (Jens Quade)
Date: Thu, 6 Aug 2009 14:05:42 +0200
Subject: [lxml-dev] xpath going crazy
In-Reply-To: <4A7AB14D.2090602@free.fr>
References: <4A7AB14D.2090602@free.fr>
Message-ID: <8BDC1B54-7065-411E-B08A-77FB39631EC4@qdevelop.de>
On 06.08.2009, at 12:32, Hervé Cauwelier wrote:
> Hi,
>
> Consider the following XML document: http://pastebin.ca/1520331
>
> This is an ODF presentation produced by OpenOffice.org, assumed to
> be a
> valid XML document.
> The position counting starts at 1 in XPath so this is expected.
>
>>>> t.xpath('//draw:frame[1]', namespaces=ns)
> [<Element {urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}frame at
> 7f604469eec0>, <Element
> {urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}frame at
> 7f604469ee68>]
>
> I get the two elements at once.
>
>>>> t.xpath('//draw:frame[2]', namespaces=ns)
> []
>
> I can't get the second element.
You are asking for every draw:frame element that is the first draw:frame
child of its parent.
//draw:frame[1] only omits those draw:frame elements that have a draw:frame
among their preceding siblings.
If you look at the XPath results in a tool such as Oxygen, it is easy to see.
(//draw:frame)[1]
should do what you want. (only the first of all //draw:frame in the
document)
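The difference between the two expressions can be tried on a small document. This is a sketch on a hypothetical two-page document, not the pastebin file from the original post; the 'page' element and the 'name' attribute are made up for the example.

```python
from lxml import etree

DRAW = 'urn:oasis:names:tc:opendocument:xmlns:drawing:1.0'
ns = {'draw': DRAW}

# Hypothetical document: each <page> holds exactly one draw:frame.
doc = etree.fromstring(
    '<doc xmlns:draw="%s">'
    '<page><draw:frame name="f1"/></page>'
    '<page><draw:frame name="f2"/></page>'
    '</doc>' % DRAW
)

def names(expr):
    """Evaluate an XPath expression and return the 'name' of each hit."""
    return [el.get('name') for el in doc.xpath(expr, namespaces=ns)]

per_parent_first = names('//draw:frame[1]')     # first frame within each parent
per_parent_second = names('//draw:frame[2]')    # no parent has a second frame
overall_first = names('(//draw:frame)[1]')      # first frame in the document
overall_second = names('(//draw:frame)[2]')     # second frame in the document

print(per_parent_first, per_parent_second, overall_first, overall_second)
```

The predicate binds to the last step unless the path is wrapped in parentheses, which is why the unparenthesised form returns both frames.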
From herve.cauwelier at free.fr Thu Aug 6 15:56:05 2009
From: herve.cauwelier at free.fr (Hervé Cauwelier)
Date: Thu, 06 Aug 2009 15:56:05 +0200
Subject: [lxml-dev] xpath going crazy
In-Reply-To: <8BDC1B54-7065-411E-B08A-77FB39631EC4@qdevelop.de>
References: <4A7AB14D.2090602@free.fr>
<8BDC1B54-7065-411E-B08A-77FB39631EC4@qdevelop.de>
Message-ID: <4A7AE0F5.8090406@free.fr>
Jens Quade wrote:
> You ask for all draw:frame-Elements that are the first in their specific
> context.
> //draw:frame[1] only omits all draw:frame that have a draw:frame in
> their preceding-siblings.
>
> If you look at the XPath-results in e.g. Oxygen, it is easy to see.
>
> (//draw:frame)[1]
>
> should do what you want. (only the first of all //draw:frame in the
> document)
Thanks for the quick reply. I fixed my expressions.
Hervé
From stefan_ml at behnel.de Thu Aug 6 20:36:43 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 20:36:43 +0200
Subject: [lxml-dev] Jython and XPointer support
In-Reply-To: <1248352207.20640.107.camel@tttdal>
References: <1248352207.20640.107.camel@tttdal>
Message-ID: <4A7B22BB.4020601@behnel.de>
Hi,
Daniel Albeseder wrote:
> I wonder if there is, or will be any Jython support of lxml? We are
> currently evaluating several possibilities to support several
> XML-features in Python and Jython together. This includes XPath, XSLT,
> XPointer. The last one is supported by libxml2 but I have not found any
> support inside lxml. Have I missed something? Will there be XPointer
> support in lxml in the future?
That XPointer part of libxml2's API is not currently wrapped. If anyone
writes the code, I'll be happy to include it.
> About Jython: There is of course the possibility to use JNA to access
> the libxml2 library natively, but I do not comprehend how to access lxml
> from within Jython, since it is not a "normal" shared library, but a
> shared object created for CPython in Cython. Does anyone have any hint
> about that?
As lxml is written in Cython, the generated C code is heavily tied into
CPython. So you will not be able to access lxml 'natively' from Jython. If
you can afford to run a CPython interpreter next to Jython, tools like
JPype may work for you.
But I don't know of any library that provides portable support for XSLT for
both CPython and Jython.
Stefan
From stefan_ml at behnel.de Thu Aug 6 20:41:30 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 20:41:30 +0200
Subject: [lxml-dev] lxml
In-Reply-To: <5d8599fb0907231519k2b00fe81v7dd1bc177801bf9f@mail.gmail.com>
References: <5d8599fb0907231519k2b00fe81v7dd1bc177801bf9f@mail.gmail.com>
Message-ID: <4A7B23DA.9020601@behnel.de>
Yassin Ezbakhe wrote:
> Hi, I'm using Python 2.6.2 (Windows) and lxml 2.2.2.
>
> When I run the following piece of code, it gets stuck.
>
> import lxml.etree as et
> s = ""
> xml = et.fromstring(s)
> print xml[0] # prints element a
> print xml[2] # prints element e
> print xml[3] # it should raise an out of range exception, but it gets stuck
>
> What is the problem? In the original ElementTree implementation, I get an
> IndexError.
Works for me:
Python 3.1 (r31:73572, Jun 28 2009, 21:07:35)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml.etree as et
>>> print(et.__version__)
2.2.2
>>> s = ""
>>> xml = et.fromstring(s)
>>> print(xml[0]) # prints element a
>>> print(xml[2]) # prints element e
>>> print(xml[3]) # it should raise an out of range exception, but it gets
stuck
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 961, in lxml.etree._Element.__getitem__
(src/lxml/lxml.etree.c:33945)
IndexError: list index out of range
Stefan
From stefan_ml at behnel.de Thu Aug 6 20:46:01 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 20:46:01 +0200
Subject: [lxml-dev] Getting 'user-visible' text from HTML
In-Reply-To: <20090724110516.500d74e0@nrri.umn.edu>
References:
<20090724110516.500d74e0@nrri.umn.edu>
Message-ID: <4A7B24E9.4030502@behnel.de>
Terry Brown wrote:
> On Fri, 24 Jul 2009 14:30:03 +0000 (UTC)
> Adam Nelson wrote:
>
>> Is there a shortcut method (or even a pasted script) that allows lxml
>> to get all the 'user-visible' text?
>
> doc.xpath("//text()") should return a list of every piece of text in
> the html.
... whereas doc.xpath("string()") will return the text content as a plain
string. You can also serialise the document to plain text, as shown here:
http://codespeak.net/lxml/tutorial.html#serialisation
Note the unicode string serialisation at the end of that section.
Stefan
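The three options mentioned in this reply can be compared side by side. A small sketch on a hypothetical snippet of well-formed markup; it makes no attempt to decide what is actually visible to a user (script or style content, for instance, would still be included).

```python
from lxml import etree

# Hypothetical sample markup for illustration.
doc = etree.fromstring('<div><p>Hello <b>world</b></p><p>Bye</p></div>')

pieces = doc.xpath('//text()')    # one list entry per text node
joined = doc.xpath('string()')    # all text content as a single string
as_text = etree.tostring(doc, method='text', encoding='unicode')  # text serialisation

print(list(pieces), joined, as_text)
```

The list form is handy when each text node needs filtering or stripping on its own; the other two give one concatenated string.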
From stefan_ml at behnel.de Thu Aug 6 20:52:06 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 20:52:06 +0200
Subject: [lxml-dev] id function of xpath and parseid
In-Reply-To: <1248684566.20640.137.camel@tttdal>
References: <1248684566.20640.137.camel@tttdal>
Message-ID: <4A7B2656.7080505@behnel.de>
Daniel Albeseder wrote:
> I wonder why there is no parseid function inside the objectify module?
> Is there a reason, why this is only in etree?
No, it's just too rarely used to become duplicated.
You can pass a parser to parseid(), which you can set up to use objectify API.
http://codespeak.net/lxml/objectify.html#advanced-element-class-lookup
Stefan
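As a sketch of the suggestion above, parseid() can be handed a parser created by objectify.makeparser(). The sample document is hypothetical and relies on xml:id attributes so that no DTD is needed; treat the details as assumptions to check against your lxml version.

```python
import io

from lxml import etree, objectify

# Hypothetical document; xml:id attributes are registered as IDs without a DTD.
XML = b'<root><item xml:id="a1">first</item><item xml:id="a2">second</item></root>'

parser = objectify.makeparser()    # XML parser set up for the objectify API
tree, ids = etree.parseid(io.BytesIO(XML), parser)

root = tree.getroot()
item = ids['a1']                   # look an element up by its ID
print(type(root).__name__, item.text)
```

The tree should not be modified while the ID dictionary is in use, since the dictionary reflects the IDs collected at parse time.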
From stefan_ml at behnel.de Thu Aug 6 20:53:09 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 20:53:09 +0200
Subject: [lxml-dev] id function of xpath and parseid
In-Reply-To: <1248687797.20640.143.camel@tttdal>
References: <1248684566.20640.137.camel@tttdal>
<1248687797.20640.143.camel@tttdal>
Message-ID: <4A7B2695.4010108@behnel.de>
Daniel Albeseder wrote:
> On Mon, 2009-07-27 at 10:49 +0200, Daniel Albeseder wrote:
>
>> Additionally I wonder, how the `id` function of XPath does work with
>> lxml. I created a schema-aware parser, which reads an XML-file, where
>> some attributes are declared as xs:ID inside the schema. However the
>> `xpath` method always returns an empty list of nodes, even if the IDs
>> given are inside the XML.
>
> I just found in the archive, that it only seems to work with either an
> xml:id attribute or a given DTD. However since XML Schema is the modern
> way to restrict XML-files, and XML schema also has the xsd:ID attribute
> type, I see no reason why this does not work.
>
> The XPath specification does talk about DTDs, that's correct, but since
> this specification is older than the XML Schema document and XML schema
> is designed to be a superset of DTD (and even to replace it), it sounds
> strange, that some features only work for DTDs.
Well, as you said: XMLSchema didn't exist when XPath 1.0 was defined. I
assume the definition of id() is different for XPath 2.0, but that's not
supported by libxml2.
Stefan
From stefan_ml at behnel.de Thu Aug 6 21:28:04 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 21:28:04 +0200
Subject: [lxml-dev] Strange segmentation fault if class inherited from
objectify.ObjectifiedElement
In-Reply-To: <1249039073.20640.371.camel@tttdal>
References: <1249039073.20640.371.camel@tttdal>
Message-ID: <4A7B2EC4.4020005@behnel.de>
Daniel Albeseder wrote:
> I just tried this and got an segmentation fault :-(
>
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from lxml import objectify
>>>> from lxml import etree
>>>>
>>>> print etree.LXML_VERSION, etree.LIBXML_VERSION
> (2, 2, 2, 0) (2, 6, 32)
>>>> class test (objectify.ObjectifiedElement) :
> ... pass
> ...
>>>> good = objectify.Element ("abc")
>>>> print type (good), repr (good)
>
>>>> bad = test ("abc")
> Segmentation fault
http://codespeak.net/lxml/element_classes.html
Stefan
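As far as I understand the linked page, custom element classes are not meant to be instantiated with a tag name like a normal Python class; instead they are registered with a parser through a class lookup. A minimal sketch of that route, using etree.ElementBase rather than an objectify class to keep it short; the class name and the shout() method are made up for the example.

```python
from lxml import etree

class Test(etree.ElementBase):
    """Hypothetical custom element class with one extra method."""
    def shout(self):
        return self.tag.upper()

# Tell a parser to use Test for every element it creates.
lookup = etree.ElementDefaultClassLookup(element=Test)
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

root = etree.fromstring('<abc><child/></abc>', parser)
print(type(root).__name__, root.shout(), type(root[0]).__name__)
```

Elements created this way are ordinary proxies onto the XML tree, so the class should carry behaviour only and no state of its own.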
From stefan_ml at behnel.de Thu Aug 6 21:30:12 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 21:30:12 +0200
Subject: [lxml-dev] Whitespace foiling pretty_print - any fix?
In-Reply-To: <204abd770907301047t1c805ab8k30ce8e7b6821e9cf@mail.gmail.com>
References: <204abd770907301047t1c805ab8k30ce8e7b6821e9cf@mail.gmail.com>
Message-ID: <4A7B2F44.10506@behnel.de>
B Wooster wrote:
> If an XML element has whitespace along with sub elements, pretty_print
> does not work well.
>
> This is probably a XML whitespace issue, but is there any way to work
> around this issue?
>
> Here's example code and the output it prints, and what I would like to get:
>
> from lxml import etree as ET
> root = ET.XML("""
> test
> """)
> ET.SubElement(root, "c")
> ET.SubElement(root, "d")
> print ET.tostring(root, pretty_print=True)
> # This prints:
> """
> test
>
> """
> # Would like:
> """
> test
>
>
>
> """
http://codespeak.net/lxml/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output
Stefan
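The workaround described in that FAQ entry is to drop ignorable whitespace at parse time. The markup of the quoted example was lost in the archive, so the sketch below uses a hypothetical document of the same shape (one existing sub element, two appended ones).

```python
from lxml import etree

# Parser that discards whitespace-only text between elements.
parser = etree.XMLParser(remove_blank_text=True)

# Hypothetical input with indentation whitespace around the <b> element.
root = etree.XML('<a>\n  <b>test</b>\n</a>', parser)
etree.SubElement(root, 'c')
etree.SubElement(root, 'd')

pretty = etree.tostring(root, pretty_print=True, encoding='unicode')
print(pretty)
```

Without the parser option, the whitespace kept from the input counts as content of the root element and the serialiser leaves the layout there untouched.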
From stefan_ml at behnel.de Thu Aug 6 21:42:50 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 21:42:50 +0200
Subject: [lxml-dev] automatic attribute unicode decode?
In-Reply-To: <4A731041.4020908@free.fr>
References: <4A731041.4020908@free.fr>
Message-ID: <4A7B323A.4010308@behnel.de>
Hervé Cauwelier wrote:
> I'm quite puzzled by the following excerpt:
>
> >>> from lxml import etree
> >>> r = etree.fromstring(''
> >>> r.attrib
> {'titi': 'ascii', 'toto': u'fran\xe7ais', 'tata': '1'}
>
> In a bare document with no encoding declaration, lxml has decoded itself
> a string that did not match the ascii table (what heuristic did it
> use?).
No heuristic. It follows the XML specification in that the absence of an
XML declaration defines the encoding as UTF-8.
I assume your console was set to UTF-8 when you typed the above?
> Now I have three attributes of two different types. I wonder why
> the integer was not decoded. ;-)
>
> I actually found this in a real-world document with encoding and
> namespaces (An ODF xml part).
>
> Is this a bug to report and how to circumvent it?
Definitely not a bug. What would be the behaviour you expected instead?
Stefan
From stefan_ml at behnel.de Thu Aug 6 21:54:14 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 21:54:14 +0200
Subject: [lxml-dev] Splitting an xml file.
In-Reply-To: <3a0f5ffd0908041316o3a2621ffxabf2beab5cdbd0d8@mail.gmail.com>
References: <3a0f5ffd0908010448y19bcdbacv1d219b623f0fb36f@mail.gmail.com> <1249413753.11605.13.camel@dervish>
<3a0f5ffd0908041316o3a2621ffxabf2beab5cdbd0d8@mail.gmail.com>
Message-ID: <4A7B34E6.2070609@behnel.de>
Chris Wj wrote:
> On Tue, Aug 4, 2009 at 3:22 PM, Alexander Shigin wrote:
>
>> On Sat, 01/08/2009 at 07:48 -0400, Chris Wj wrote:
>>> I'm looking for the best way to split an xml file with many children
>>> into multiple files with the same parent tags but individual children.
>> ...
>>> Should I use lxml.etree to find the line numbers and then just use
>>> file operations? You guys think that is most efficient?
>>
>> I think that the simplest way to split the file is to remove all 'child'
>> element from parsed document and serialize document with different
>> childs.
>>
>> In [3]: parsed = etree.parse('q.xml')
>> In [4]: root = parsed.getroot()
>> In [5]: childs = parsed.findall('child')
>> In [6]: for child in childs: root.remove(child)
>> In [7]: for num, child in enumerate(childs):
>> ... root.append(child)
>> ... f = codecs.open('file1%s.xml' % num, 'w', encoding='utf-8')
>> ... f.write(etree.tounicode(parsed))
>> ... f.close()
>> ... root.remove(child)
Note that serialising to a unicode string and then using the codecs module
to encode to UTF-8 is very inefficient. Instead, pass encoding='UTF-8' to
tostring() and use the 'wb' mode when opening the file.
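Put together with the quoted loop, that advice might look like the sketch below. The function name, its parameters, the temporary directory and the sample q.xml are all assumptions for illustration, and the tail-text caveat raised earlier in the thread still applies.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # assumption: the stdlib API is close enough for this sketch
    import xml.etree.ElementTree as etree


def split_children(source_path, tag, out_dir, pattern='file1%s.xml'):
    """Write one file per `tag` child of the root; return the paths written."""
    tree = etree.parse(source_path)
    root = tree.getroot()
    children = root.findall(tag)
    for child in children:
        root.remove(child)
    paths = []
    for num, child in enumerate(children):
        root.append(child)
        path = os.path.join(out_dir, pattern % num)
        # Serialise straight to UTF-8 bytes and write them in binary mode.
        with open(path, 'wb') as f:
            f.write(etree.tostring(root, encoding='UTF-8'))
        root.remove(child)
        paths.append(path)
    return paths


# Demo on a hypothetical q.xml in a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'q.xml')
with open(source, 'wb') as f:
    f.write(b'<root><info>shared</info><child>1</child><child>2</child></root>')

paths = split_children(source, 'child', workdir)
texts = [[c.text for c in etree.parse(p).getroot().findall('child')] for p in paths]
print(texts)
```

Each output file keeps the shared elements of the root and exactly one of the children.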
> What about xslt, can I use that to accomplish the task?
http://www.exslt.org/exsl/elements/document/index.html
However, the above should be simple enough in Python, so doing the same in
XSLT sounds like overkill to me.
Stefan
From stefan_ml at behnel.de Thu Aug 6 21:56:11 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Aug 2009 21:56:11 +0200
Subject: [lxml-dev] iterating xpath?
In-Reply-To:
References:
Message-ID: <4A7B355B.9010802@behnel.de>
bl8cki wrote:
> I was searching the api and found things like iterfind, but it seems
> that this work with ElementPath
> I would like to do something like iterxpath. Is there any way to achieve this?
No, that's not supported by libxml2.
Stefan
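Since xpath() hands back its complete result as a list, iterating over XPath results in practice means looping over that list. For simple tag-based selections, iter() walks the tree lazily instead. A small sketch on a hypothetical document:

```python
from lxml import etree

# Hypothetical sample document.
root = etree.fromstring('<r><item n="1"/><group><item n="2"/></group></r>')

# Compiled XPath: evaluated in one go, returns a list to loop over.
find_items = etree.XPath('//item')
from_xpath = [el.get('n') for el in find_items(root)]

# Tag-based alternative: iter() yields matching elements one at a time.
from_iter = [el.get('n') for el in root.iter('item')]

print(from_xpath, from_iter)
```

Compiling the expression once with etree.XPath mainly pays off when the same expression is evaluated many times.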
From mike_mp at zzzcomputing.com Fri Aug 7 20:06:18 2009
From: mike_mp at zzzcomputing.com (Michael Bayer)
Date: Fri, 7 Aug 2009 14:06:18 -0400
Subject: [lxml-dev] setuptools issues with python2.6 maint
Message-ID: <543dc5d5c9a7062c5ed361e669c17d20.squirrel@www.geekisp.com>
My apologies for dumping a bad build on the list here, but googling
returned absolutely nothing for this one. I haven't had this issue before
so it may be related to my usage of the latest python 2.6 maintenance
branch which I got from
http://svn.python.org/projects/python/branches/release26-maint .
It builds fine if I run a straight distutils build without setuptools
being installed. Otherwise I get the below - tested against 2.1.5, 2.2,
and 2.2.2, any ideas are appreciated.
[root at stageassets lxml-2.2.2]# /usr/local/bin/python setup.py install
Building lxml version 2.2.2.
NOTE: Trying to build without Cython, pre-generated
'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxslt 1.1.24
Building against libxml2/libxslt in the following directory: /usr/lib
running install
running build
running build_py
running build_ext
running install_lib
creating /usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/sax.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/pyclasslookup.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/ElementInclude.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/etree.so ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/_elementpath.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/__init__.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/usedoctest.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/doctestcompare.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/cssselect.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/builder.py ->
/usr/local/lib/python2.6/site-packages/lxml
copying build/lib.linux-i686-2.6/lxml/objectify.so ->
/usr/local/lib/python2.6/site-packages/lxml
creating /usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/diff.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/defs.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/_setmixin.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/_diffcommand.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/__init__.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/usedoctest.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/ElementSoup.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/_html5builder.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/builder.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/formfill.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/html5parser.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/soupparser.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/clean.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
copying build/lib.linux-i686-2.6/lxml/html/_dictmixin.py ->
/usr/local/lib/python2.6/site-packages/lxml/html
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/sax.py to sax.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/pyclasslookup.py to
pyclasslookup.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/ElementInclude.py to
ElementInclude.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/_elementpath.py
to _elementpath.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/__init__.py to
__init__.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/usedoctest.py
to usedoctest.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/doctestcompare.py to
doctestcompare.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/cssselect.py to
cssselect.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/builder.py to
builder.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/html/diff.py to
diff.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/html/defs.py to
defs.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/_setmixin.py to
_setmixin.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/_diffcommand.py to
_diffcommand.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/__init__.py to
__init__.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/usedoctest.py to
usedoctest.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/ElementSoup.py to
ElementSoup.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/_html5builder.py to
_html5builder.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/html/builder.py
to builder.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/formfill.py to
formfill.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/html5parser.py to
html5parser.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/soupparser.py to
soupparser.pyc
byte-compiling /usr/local/lib/python2.6/site-packages/lxml/html/clean.py
to clean.pyc
byte-compiling
/usr/local/lib/python2.6/site-packages/lxml/html/_dictmixin.py to
_dictmixin.pyc
running install_egg_info
Writing /usr/local/lib/python2.6/site-packages/lxml-2.2.2-py2.6.egg-info
[root at stageassets lxml-2.2.2]#
/var/web/hosts/fanfeedr.com/snapscores.com/bin/python setup.py install
Building lxml version 2.2.2.
NOTE: Trying to build without Cython, pre-generated
'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxslt 1.1.24
Building against libxml2/libxslt in the following directory: /usr/lib
running install
running bdist_egg
running egg_info
writing src/lxml.egg-info/PKG-INFO
writing top-level names to src/lxml.egg-info/top_level.txt
writing dependency_links to src/lxml.egg-info/dependency_links.txt
reading manifest file 'src/lxml.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/lxml.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-i686/egg
running install_lib
running build_py
running build_ext
Traceback (most recent call last):
File "setup.py", line 116, in
**extra_options
File "/usr/local/lib/python2.6/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/local/lib/python2.6/distutils/dist.py", line 975, in
run_commands
self.run_command(cmd)
File "/usr/local/lib/python2.6/distutils/dist.py", line 995, in run_command
cmd_obj.run()
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/install.py",
line 76, in run
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/install.py",
line 96, in do_egg_install
File "/usr/local/lib/python2.6/distutils/cmd.py", line 333, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python2.6/distutils/dist.py", line 995, in run_command
cmd_obj.run()
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/bdist_egg.py",
line 174, in run
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/bdist_egg.py",
line 161, in call_command
File "/usr/local/lib/python2.6/distutils/cmd.py", line 333, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python2.6/distutils/dist.py", line 995, in run_command
cmd_obj.run()
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/install_lib.py",
line 20, in run
File "/usr/local/lib/python2.6/distutils/command/install_lib.py", line
113, in build
self.run_command('build_ext')
File "/usr/local/lib/python2.6/distutils/cmd.py", line 333, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python2.6/distutils/dist.py", line 995, in run_command
cmd_obj.run()
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/build_ext.py",
line 46, in run
File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
340, in run
self.build_extensions()
File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
449, in build_extensions
self.build_extension(ext)
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/build_ext.py",
line 175, in build_extension
File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
460, in build_extension
ext_path = self.get_ext_fullpath(ext.name)
File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
633, in get_ext_fullpath
filename = self.get_ext_filename(modpath[-1])
File
"/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/build_ext.py",
line 85, in get_ext_filename
KeyError: 'etree'
From mike_mp at zzzcomputing.com Fri Aug 7 20:13:05 2009
From: mike_mp at zzzcomputing.com (Michael Bayer)
Date: Fri, 7 Aug 2009 14:13:05 -0400
Subject: [lxml-dev] setuptools issues with python2.6 maint
In-Reply-To: <543dc5d5c9a7062c5ed361e669c17d20.squirrel@www.geekisp.com>
References: <543dc5d5c9a7062c5ed361e669c17d20.squirrel@www.geekisp.com>
Message-ID:
I would add that I successfully built/installed psycopg2 and PIL with the
same Python install, so it's not that my distutils/easy_install is "broken"
across the board for compiler builds. The issue is specific to lxml.
From ndudfield at gmail.com Sat Aug 8 06:25:17 2009
From: ndudfield at gmail.com (Nicholas Dudfield)
Date: Sat, 08 Aug 2009 14:25:17 +1000
Subject: [lxml-dev] Catalog for entities such as for XHTML parser.
In-Reply-To:
References:
<4A63669C.4050404@behnel.de>
<4A642134.7070206@behnel.de>
<4A644E4B.4070309@behnel.de>
<4A647835.9040303@behnel.de>
Message-ID: <4A7CFE2D.1070302@gmail.com>
List,
Please excuse me if this question has been answered but I couldn't find
anything on the list archives that spelled it out for dummies.
My usage situation is this:
* I'm using windows
* I'm parsing xhtml with the xhtml parser
* I'm calling lxml from within a python extensible editor.
My problem:
* Parsing failures due to `unknown` entities, even quite common
ones such as
eg. XMLSyntaxError: Entity 'nbsp' not defined, line 11, column 11
How can I set up an external file with common entity definitions that I
can pass as an argument to the parser constructor?
I read something about a `catalog` but the only docs I could find on it
assumed *nix.
If someone could help out with a code snippet example I would be very
much appreciative.
Cheers.
From ndudfield at gmail.com Sat Aug 8 07:28:08 2009
From: ndudfield at gmail.com (Nicholas Dudfield)
Date: Sat, 08 Aug 2009 15:28:08 +1000
Subject: [lxml-dev] Catalog for entities such as for XHTML parser.
In-Reply-To: <4A7CFE2D.1070302@gmail.com>
References:
<4A63669C.4050404@behnel.de>
<4A642134.7070206@behnel.de>
<4A644E4B.4070309@behnel.de>
<4A647835.9040303@behnel.de>
<4A7CFE2D.1070302@gmail.com>
Message-ID: <4A7D0CE8.3060202@gmail.com>
Nicholas Dudfield wrote:
> List,
>
> Please excuse me if this question has been answered but I couldn't
> find anything on the list archives that spelled it out for dummies.
>
> My usage situation is this:
>
> * I'm using windows
> * I'm parsing xhtml with the xhtml parser
> * I'm calling lxml from within a python extensible editor.
>
> My problem:
> * Parsing failures due to `unknown` entities, even quite common
> ones such as
>
> eg. XMLSyntaxError: Entity 'nbsp' not defined, line 11, column 11
>
> How can I set up an external file with common entity definitions that
> I can pass as an argument to the parser constructor?
> I read something about a `catalog` but the only docs I could find on
> it assumed *nix.
>
> If someone could help out with a code snippet example I would be very
> much appreciative.
>
> Cheers.
>
Passing `resolve_entities=False` to the parser constructor ought to work
for my case.
There seems to be a bug related to this in the feed interface. If you
feed the whole document in one go it will honor the constructor, however
if you pass it `chunks` ( as you typically would ) it fails.
I have attached some test cases. For better or worse they are written to
all `pass` proving `errors` using assertRaises.
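Since the attachment itself was scrubbed, here is a rough sketch of the two
feeding styles being compared. It is not the original lxmltest.py: the names
DOC, parse_in_one_go and parse_in_chunks are made up for illustration, and
the sample document contains no entities, so it shows the shape of the
comparison rather than the bug. The stdlib fallback is only there so the
sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # The stdlib parser offers the same feed()/close() interface.
    import xml.etree.ElementTree as etree

# Hypothetical sample document, not taken from the original test cases.
DOC = b"<root><a>one</a><b>two</b></root>"


def parse_in_one_go(data, **parser_options):
    """Feed the whole document with a single feed() call."""
    parser = etree.XMLParser(**parser_options)
    parser.feed(data)
    return parser.close()


def parse_in_chunks(data, size=8, **parser_options):
    """Feed the document in fixed-size chunks, as a streaming caller would."""
    parser = etree.XMLParser(**parser_options)
    for start in range(0, len(data), size):
        parser.feed(data[start:start + size])
    return parser.close()


whole = parse_in_one_go(DOC)
chunked = parse_in_chunks(DOC)
# Both styles should produce the same tree for the same parser options.
print(etree.tostring(whole) == etree.tostring(chunked))
```

With lxml, the case described above would be exercised by passing
resolve_entities=False as a parser option to both functions and comparing
the results for a document that contains entity references.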
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lxmltest.py
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20090808/491a0b80/attachment.diff
From stefan_ml at behnel.de Sat Aug 8 15:53:42 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 15:53:42 +0200
Subject: [lxml-dev] Bug with whitespace in namespaces
In-Reply-To:
References:
Message-ID: <4A7D8366.8060003@behnel.de>
Christian Zagrodnick wrote:
> it is possible to create invalid XML with lxml:
>
>>>> import lxml.etree
>>>> import lxml.objectify
>>>> xml = lxml.objectify.XML('')
>>>> xml.set('{a b}c', 'foo') # This should fail!
>>>> lxml.etree.tostring(xml)
> ''
>>>> lxml.objectify.fromstring(lxml.etree.tostring(xml))
> Traceback (most recent call last):
> ...
> File "parser.pxi", line 625, in lxml.etree._handleParseResult
> (src/lxml/lxml.etree.c:64741)
> File "parser.pxi", line 565, in lxml.etree._raiseParseError
> (src/lxml/lxml.etree.c:64084)
> lxml.etree.XMLSyntaxError: xmlns:ns0: 'a b' is not a valid URI, line 1,
> column 13
Well, URI checking is actually a new feature in libxml2 2.7 (IIRC), that's
why it wasn't used before. Newer libxml2 versions are strict about RFC 3986
syntax, so I agree that it would make sense to also check namespace URIs on
the way in.
This should go into lxml 2.3.
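
To illustrate what "checking namespace URIs on the way in" could look like,
here is a minimal sketch in plain Python. It is an assumption-laden stand-in,
not lxml's actual code: the helper names split_clark_name and
check_namespace_uri are hypothetical, and the character test is a
deliberately loose approximation of RFC 3986 rather than the full grammar.

```python
import re

# Characters that may never appear unescaped in a URI. This is a loose
# approximation of RFC 3986, not a complete validator.
_FORBIDDEN = re.compile(r'[\s<>"{}|\\^`]')


def split_clark_name(name):
    """Split '{uri}local' into (uri, local); uri is None for plain names."""
    if name.startswith("{"):
        uri, sep, local = name[1:].partition("}")
        if not sep:
            raise ValueError("missing closing brace in %r" % name)
        return uri, local
    return None, name


def check_namespace_uri(name):
    """Reject names whose namespace URI contains forbidden characters."""
    uri, local = split_clark_name(name)
    if uri is not None and _FORBIDDEN.search(uri):
        raise ValueError("invalid namespace URI: %r" % uri)
    return uri, local
```

Under this sketch, check_namespace_uri('{a b}c') raises ValueError, while a
name such as '{http://example.com/ns}c' is accepted.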
Stefan
From stefan_ml at behnel.de Sat Aug 8 17:55:48 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 17:55:48 +0200
Subject: [lxml-dev] Bug with whitespace in namespaces
In-Reply-To: <4A7D8366.8060003@behnel.de>
References: <4A7D8366.8060003@behnel.de>
Message-ID: <4A7DA004.6080300@behnel.de>
Stefan Behnel wrote:
> Christian Zagrodnick wrote:
>> it is possible to create invalid XML with lxml:
>>
>>>>> import lxml.etree
>>>>> import lxml.objectify
>>>>> xml = lxml.objectify.XML('')
>>>>> xml.set('{a b}c', 'foo') # This should fail!
>>>>> lxml.etree.tostring(xml)
>> ''
>>>>> lxml.objectify.fromstring(lxml.etree.tostring(xml))
>> Traceback (most recent call last):
>> ...
>> File "parser.pxi", line 625, in lxml.etree._handleParseResult
>> (src/lxml/lxml.etree.c:64741)
>> File "parser.pxi", line 565, in lxml.etree._raiseParseError
>> (src/lxml/lxml.etree.c:64084)
>> lxml.etree.XMLSyntaxError: xmlns:ns0: 'a b' is not a valid URI, line 1,
>> column 13
>
> Well, URI checking is actually a new feature in libxml2 2.7 (IIRC), that's
> why it wasn't used before. Newer libxml2 versions are strict about RFC 3986
> syntax, so I agree that it would make sense to also check namespace URIs on
> the way in.
>
> This should go into lxml 2.3.
Fixed on the trunk.
Stefan
From belred at gmail.com Sat Aug 8 18:21:40 2009
From: belred at gmail.com (Bryan)
Date: Sat, 8 Aug 2009 09:21:40 -0700
Subject: [lxml-dev] xml vulnerability
Message-ID: <38f48f590908080921t78510de7q2587c160b2b1415@mail.gmail.com>
We use this library at work, and I was asked to find out whether the XML
parsers in lxml are affected by the following XML vulnerability:
http://voices.washingtonpost.com/securityfix/2009/08/researchers_xml_security_flaw.html
thanks,
bryan
From stefan_ml at behnel.de Sat Aug 8 18:45:56 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 18:45:56 +0200
Subject: [lxml-dev] xml vulnerability
In-Reply-To: <38f48f590908080921t78510de7q2587c160b2b1415@mail.gmail.com>
References: <38f48f590908080921t78510de7q2587c160b2b1415@mail.gmail.com>
Message-ID: <4A7DABC4.7070002@behnel.de>
Bryan wrote:
> We use this library at work, and I was asked to find out whether the XML
> parsers in lxml are affected by the following XML vulnerability:
>
> http://voices.washingtonpost.com/securityfix/2009/08/researchers_xml_security_flaw.html
This article contains mostly underinformed journalist rubbish, but when you
follow the link at the end of the article:
https://www.cert.fi/en/reports/2009/vulnerability2009085.html
you get to the CERT advisory that hints at what the problem is (possible
crashes or DoS attacks related to character decoding and parsing) and
states which parsers were found to be vulnerable. lxml is based on libxml2,
which is not on that list (whereas pyexpat is, so the stdlib ElementTree is
vulnerable, for example).
As usual, this doesn't mean libxml2/lxml is bug-free, uncrackable software,
just that it's not on the list for this problem.
If you need more information regarding this issue, please ask on the
libxml2 mailing list.
Stefan
From stefan_ml at behnel.de Sat Aug 8 19:06:19 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 19:06:19 +0200
Subject: [lxml-dev] xml vulnerability
In-Reply-To: <4A7DABC4.7070002@behnel.de>
References: <38f48f590908080921t78510de7q2587c160b2b1415@mail.gmail.com>
<4A7DABC4.7070002@behnel.de>
Message-ID: <4A7DB08B.7040607@behnel.de>
Stefan Behnel wrote:
> If you need more information regarding this issue, please ask on the
> libxml2 mailing list.
On a related note: if you care about parsing XML from untrusted sources,
it's best to use libxml2 2.7.x, as it's less vulnerable to XML bombs due to
size limitations inside the parser (which are enabled by default).
Stefan
From stefan_ml at behnel.de Sat Aug 8 19:31:32 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 19:31:32 +0200
Subject: [lxml-dev] lxml2.2 doctype missing
In-Reply-To: <4A79B7F9.4060109@ipac.caltech.edu>
References: <4A79B7F9.4060109@ipac.caltech.edu>
Message-ID: <4A7DB674.50001@behnel.de>
Mary Lei wrote:
> I noticed that the xhtml converted from
> the parse tree has doctype missing.
> I am using lxml 2.2.
>
> Is this bug still not fixed in lxml 2.2 ?
In order to convince others that this is a bug, you might want to provide
some more information. Could you present a short code snippet that shows
what you do and the (unexpected) result you get?
Stefan
From stefan_ml at behnel.de Sat Aug 8 19:38:02 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 19:38:02 +0200
Subject: [lxml-dev] setuptools issues with python2.6 maint
In-Reply-To: <543dc5d5c9a7062c5ed361e669c17d20.squirrel@www.geekisp.com>
References: <543dc5d5c9a7062c5ed361e669c17d20.squirrel@www.geekisp.com>
Message-ID: <4A7DB7FA.2090400@behnel.de>
Michael Bayer wrote:
> My apologies for dumping a bad build on the list here, but googling
> returned absolutely nothing for this one. I haven't had this issue before
> so it may be related to my usage of the latest python 2.6 maintenance
> branch which I got from
> http://svn.python.org/projects/python/branches/release26-maint .
>
> It builds fine if I run a straight distutils build without setuptools
> being installed. Otherwise I get the below - tested against 2.1.5, 2.2,
> and 2.2.2, any ideas are appreciated.
>
> [root at stageassets lxml-2.2.2]# /usr/local/bin/python setup.py install
[...]
> running build_ext
> Traceback (most recent call last):
[...]
> File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
> 449, in build_extensions
> self.build_extension(ext)
> File
> "/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/build_ext.py",
> line 175, in build_extension
> File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
> 460, in build_extension
> ext_path = self.get_ext_fullpath(ext.name)
> File "/usr/local/lib/python2.6/distutils/command/build_ext.py", line
> 633, in get_ext_fullpath
> filename = self.get_ext_filename(modpath[-1])
> File
> "/var/web/hosts/fanfeedr.com/snapscores.com/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg/setuptools/command/build_ext.py",
> line 85, in get_ext_filename
> KeyError: 'etree'
No idea, never seen this before. I use the same setuptools version under
Py2.6.2, and it works perfectly well.
Have you tried the bdist_egg target instead of a mere "install"?
Also, the way setuptools patches into distutils makes it quite possible
that newer Python releases introduce incompatibilities, so maybe there's an
issue over there.
Stefan
From stefan_ml at behnel.de Sat Aug 8 20:17:42 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Aug 2009 20:17:42 +0200
Subject: [lxml-dev] Catalog for entities such as for XHTML parser.
In-Reply-To: <4A7D0CE8.3060202@gmail.com>
References: <4A63669C.4050404@behnel.de> <4A642134.7070206@behnel.de> <4A644E4B.4070309@behnel.de> <4A647835.9040303@behnel.de> <4A7CFE2D.1070302@gmail.com>
<4A7D0CE8.3060202@gmail.com>
Message-ID: <4A7DC146.8080907@behnel.de>
Hi,
Nicholas Dudfield wrote:
> Nicholas Dudfield wrote:
>> My usage situation is this:
>>
>> * I'm using windows
>> * I'm parsing xhtml with the xhtml parser
>> * I'm calling lxml from within a python extensible editor.
>>
>> My problem:
>> * Parsing failures due to `unknown` entities, even quite common
>> ones such as
>>
>> eg. XMLSyntaxError: Entity 'nbsp' not defined, line 11, column 11
>>
>> How can I set up an external file with common entity definitions that
>> I can pass as an argument to the parser constructor?
You can let the parser load the DTD by setting load_dtd=True. lxml will not
load DTDs by default and if there is no DTD, the parser will fail on
unknown entity references. Also, lxml will not access the network by
default, so unless you use a catalog, you must also pass no_network=False.
Note that this may slow down parsing considerably, as each document
requires loading the DTD from the network first.
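
A small sketch of the "no DTD, unknown entity" failure and how a DTD changes
it. To keep it runnable offline it uses an internal DTD subset that declares
only nbsp; that subset is an assumption made for the example, standing in
for the external XHTML DTD that load_dtd=True would fetch. The names
WITHOUT_DTD, WITH_DTD and try_parse are hypothetical, and the stdlib
fallback is only there so the sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # The stdlib parser behaves the same way for this particular example.
    import xml.etree.ElementTree as etree

# No DOCTYPE at all, so the parser knows nothing about &nbsp;.
WITHOUT_DTD = b"<p>a&nbsp;b</p>"
# Internal subset declaring just the one entity we need (illustrative only).
WITH_DTD = b'<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'


def try_parse(data):
    """Return the text of the root element, or None if parsing fails."""
    try:
        return etree.fromstring(data).text
    except SyntaxError:
        # Both lxml's XMLSyntaxError and the stdlib ParseError derive
        # from SyntaxError.
        return None


print(repr(try_parse(WITHOUT_DTD)))
print(repr(try_parse(WITH_DTD)))
```

For the real XHTML DTD, the parser would instead be created with the
load_dtd=True and no_network=False options described above.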
>> I read something about a `catalog` but the only docs I could find on
>> it assumed *nix.
You need to set the XML_CATALOG_FILES environment variable to a space
separated list of catalog files.
http://xmlsoft.org/catalog.html
I have no idea how to install or manage XML catalogs under Windows, though.
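
As an untested sketch of how the variable could be set from Python on
Windows: the catalog paths below are hypothetical, and it is an assumption
that libxml2 accepts file: URIs here and that the variable has to be in
place before libxml2 first initialises its catalogs (so it is set before
lxml is imported).

```python
import os
from pathlib import PureWindowsPath

# Hypothetical catalog locations; replace with wherever your catalogs live.
catalogs = [
    PureWindowsPath(r"C:\xml\catalog.xml"),
    PureWindowsPath(r"C:\xml\xhtml\catalog.xml"),
]

# The list is space separated, so file: URIs are used to avoid backslashes
# and to get spaces in paths percent-encoded.
os.environ["XML_CATALOG_FILES"] = " ".join(p.as_uri() for p in catalogs)

print(os.environ["XML_CATALOG_FILES"])
```

Whether a given Windows build of libxml2 honours the setting is something to
verify locally.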
> There seems to be a bug related to this in the feed interface. If you
> feed the whole document in one go it will honor the constructor, however
> if you pass it `chunks` ( as you typically would ) it fails.
>
> I have attached some test cases. For better or worse they are written to
> all `pass` proving `errors` using assertRaises.
This sounds like a bug to me. Could you file a bug report?
https://bugs.launchpad.net/lxml
Thanks!
Stefan
From ndudfield at gmail.com Sun Aug 9 04:36:12 2009
From: ndudfield at gmail.com (Nicholas Dudfield)
Date: Sun, 09 Aug 2009 12:36:12 +1000
Subject: [lxml-dev] Catalog for entities such as for XHTML parser.
In-Reply-To: <4A7DC146.8080907@behnel.de>
References: <4A63669C.4050404@behnel.de> <4A642134.7070206@behnel.de> <4A644E4B.4070309@behnel.de> <4A647835.9040303@behnel.de> <4A7CFE2D.1070302@gmail.com>
<4A7D0CE8.3060202@gmail.com> <4A7DC146.8080907@behnel.de>
Message-ID: <4A7E361C.30101@gmail.com>
> There seems to be a bug related to this in the feed interface. If you
> feed the whole document in one go it will honor the constructor, however
> if you pass it `chunks` ( as you typically would ) it fails.
>
> I have attached some test cases. For better or worse they are written to
> all `pass` proving `errors` using assertRaises.
>
>>> This sounds like a bug to me. Could you file a bug report?
>>> https://bugs.launchpad.net/lxml
>>> Thanks!
>>> Stefan
Of course :) I just signed up to LP and filed the report with test cases
(modified from the ones I sent earlier to the list, so that the buggy
behaviour now `fails`).
There was also a possible bug I noticed in relation to using XPath searches
for text() when a parser was created with `strip_cdata=False`. I'll have a
look into that now and see if I can write a test case that consistently
exposes the bug.
I also noticed a fault in the CSS parser (used for
lxml.cssselect.css_to_xpath) which can put the interpreter into an infinite
loop, but IIRC that bug was already mentioned on the list.
From lei at ipac.caltech.edu Mon Aug 10 20:30:27 2009
From: lei at ipac.caltech.edu (Mary Lei)
Date: Mon, 10 Aug 2009 11:30:27 -0700
Subject: [lxml-dev] lxml2.2 doctype missing
In-Reply-To: <4A7DB674.50001@behnel.de>
References: <4A79B7F9.4060109@ipac.caltech.edu> <4A7DB674.50001@behnel.de>
Message-ID: <4A806743.4040503@ipac.caltech.edu>
here is an example:
#!/bin/sh
# next line restarts python \
"exec" "python" "-O" "$0" "$@"
import urllib
import urllib2
import urlparse
import os
import sys, getopt, difflib
import re
import string
version = sys.version_info
if version < (2,6):
    # sys.version_info has five fields; only major.minor go in the message
    print "Need python version 2.6 or better, %s.%s too old!" % version[:2]
else:
    print "python version: ", version
from lxml.html import parse,submit_form,fromstring,tostring
import lxml.html
from lxml import etree
from StringIO import StringIO
url = "http://nsted.ipac.caltech.edu"
try:
    rc = urllib2.urlopen(url)
    contents = rc.read()
    rc.close()
except urllib2.HTTPError,e:
    print "Error: Page not found",e
    sys.exit(1)
except urllib2.URLError,e:
    print "Error: Connection refused ",e
    sys.exit(1)
print "contents-------------\n"+contents[0:300]
root = fromstring(contents)
fd = open ("tempfile", "w")
fd.write(contents)
fd.close()
root = parse("tempfile").getroot()
htmlstr = lxml.html.tostring(root,\
encoding="iso-8859-1",pretty_print=True)
print "htmlstr--------------\n"+htmlstr[0:300]
htmlstr = lxml.html.tostring(root,\
encoding="iso-8859-1",pretty_print=True,\
include_meta_content_type=False,method='xml')
print "htmlstr1-------------\n"+htmlstr[0:300]
try:
    print root.docinfo.doctype
except AttributeError,e:
    print e
tree = etree.parse(StringIO(""""""))
print "doctype",tree.docinfo.doctype
Output:
python version: (2, 6, 2, 'final', 0)
contents------------- has original doctype
Welcome to NStED
Welcome to NStEDWelcome to NStED
fd.write(contents)
> fd.close()
>
> root = parse("tempfile").getroot()
> htmlstr = lxml.html.tostring(root,\
> encoding="iso-8859-1",pretty_print=True)
> ## htmlstr-------------- from lxml tostring, no doctype
> htmlstr = lxml.html.tostring(root,\
> encoding="iso-8859-1",pretty_print=True,\
> include_meta_content_type=False,method='xml')
> ## htmlstr1------------- from lxml tostring, no doctype, convert as xml
>
> tree = etree.parse(StringIO(""""""))
> print "doctype",tree.docinfo.doctype
> ## doctype <---- this one is ok
>
> So it was in contents from urlopen but missing
> in lxml fromstring and then tostring.
> Am I missing something ?
Yes. When you tell lxml to serialise an element, you get the element and
nothing but that.
If you want doctype declarations, DTDs, processing instructions and the
like (i.e. stuff that doesn't belong to the element itself), you must wrap
the element in an ElementTree and serialise that.
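
A minimal sketch of the difference, assuming lxml is installed. The sample
document DOC is made up for the example; it only references the XHTML DTD in
its DOCTYPE, and nothing is fetched with the default parser settings.

```python
from lxml import etree

# Hypothetical sample document carrying an XHTML doctype declaration.
DOC = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    b"<html><body><p>hello</p></body></html>"
)

root = etree.fromstring(DOC)

# Serialising the element gives the element and nothing else.
element_only = etree.tostring(root)

# Serialising the tree that wraps the element includes the doctype.
with_doctype = etree.tostring(root.getroottree())

print(element_only)
print(with_doctype)
```

Here getroottree() returns the ElementTree wrapper of the document the
element belongs to.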
Stefan
From mike_mp at zzzcomputing.com Mon Aug 10 21:48:48 2009
From: mike_mp at zzzcomputing.com (Michael Bayer)
Date: Mon, 10 Aug 2009 15:48:48 -0400
Subject: [lxml-dev] setuptools issues with python2.6 maint
In-Reply-To: <4A7DB7FA.2090400@behnel.de>
References: <543dc5d5c9a7062c5ed361e669c17d20.squirrel@www.geekisp.com>
<4A7DB7FA.2090400@behnel.de>
Message-ID: <159c0df5ffd364411bf21a6d9edd3574.squirrel@www.geekisp.com>
Stefan Behnel wrote:
>
>
> No idea, never seen this before. I use the same setuptools version under
> Py2.6.2, and it works perfectly well.
>
> Have you tried the bdist_egg target instead of a mere "install"?
>
> Also, the way setuptools patches into distutils makes it quite possible
> that newer Python releases introduce incompatibilities, so maybe there's
> an
> issue over there.
>
I was running build_ext in most cases. Didn't try bdist_egg. Anyway my
dependency on that version of python is over for now, so if it is in fact
an issue with py2.6+, we'll all find out soon enough. thanks for the
help.
From manu3d at gmail.com Tue Aug 11 13:03:08 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Tue, 11 Aug 2009 12:03:08 +0100
Subject: [lxml-dev] default namespace and xpath evaluation
Message-ID: <915dc91d0908110403ub68b31dx6257d618bf3c046c@mail.gmail.com>
Hi everybody,
I can't seem to find a more compact way to do this:
nsDict = {"default":anElement.nsmap[None]}
aChildElement = anElement.xpath("/default:elem", namespaces=nsDict)[0]
Specifically, I would have thought that if the element is in the
default namespace the simple string:
aChildElement = anElement.xpath("/elem")[0]
would be sufficient, but the element is not found - probably appropriately.
However, passing the element's nsmap such as in:
aChildElement = anElement.xpath("/elem", namespaces=anElement.nsmap)[0]
results in the following error message:
Traceback (most recent call last):
File "", line 1, in
File "lxml.etree.pyx", line 1314, in lxml.etree._Element.xpath
(src/lxml/lxml.etree.c:38871)
File "xpath.pxi", line 245, in lxml.etree.XPathElementEvaluator.__init__
(src/lxml/lxml.etree.c:106924)
File "xpath.pxi", line 117, in lxml.etree._XPathEvaluatorBase.__init__
(src/lxml/lxml.etree.c:105514)
File "xpath.pxi", line 55, in lxml.etree._XPathContext.__init__
(src/lxml/lxml.etree.c:104808)
File "extensions.pxi", line 77, in lxml.etree._BaseContext.__init__
(src/lxml/lxml.etree.c:96771)
TypeError: empty namespace prefix is not supported in XPath
Am I missing a nicer way of doing this?
Manu
From chris at simplistix.co.uk Tue Aug 11 18:10:06 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 11 Aug 2009 17:10:06 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
Message-ID: <4A8197DE.70800@simplistix.co.uk>
Hi All,
I'm getting the following error when trying to install:
Building lxml version 2.2.
NOTE: Trying to build without Cython, pre-generated
'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxslt 1.1.11
Building against libxml2/libxslt in the following directory: /usr/lib
src/lxml/lxml.etree.c:169:31:src/lxml/lxml.etree.c:169:31: error:
libxml/schematron.h: No such file or directory
error: libxml/schematron.h: No such file or directory
src/lxml/lxml.etree.c:135067: error: dereferencing pointer to incomplete
type
src/lxml/lxml.etree.c:135068: error: dereferencing pointer to incomplete
type
src/lxml/lxml.etree.c: At top level:
src/lxml/lxml.etree.c:135174: error: invalid application of 'sizeof' to
incomplete type 'struct
__pyx_obj_4lxml_5etree__ParserSchemaValidationContext'
lipo: can't figure out the architecture type of: /var/tmp//ccxExHVU.out
error: Setup script exited with error: command 'gcc' failed with exit
status 1
What am I doing wrong?
cheers,
Chris
From stefan_ml at behnel.de Tue Aug 11 20:27:12 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 11 Aug 2009 20:27:12 +0200
Subject: [lxml-dev] default namespace and xpath evaluation
In-Reply-To: <915dc91d0908110403ub68b31dx6257d618bf3c046c@mail.gmail.com>
References: <915dc91d0908110403ub68b31dx6257d618bf3c046c@mail.gmail.com>
Message-ID: <4A81B800.1080707@behnel.de>
Hi,
Emanuele D'Arrigo wrote:
> I can't seem to find a more compact way to do this:
>
> nsDict = {"default":anElement.nsmap[None]}
> aChildElement = anElement.xpath("/default:elem", namespaces=nsDict)[0]
I wonder about the use case. Why would you want to look for an element of
which you do not know the namespace URI in advance?
> Specifically, I would have thought that if the element is in the
> default namespace the simple string:
>
> aChildElement = anElement.xpath("/elem")[0]
>
> would be sufficient, but the element is not found - probably appropriately.
Yes.
http://codespeak.net/lxml/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions
> However, passing the element's nsmap such as in:
>
> aChildElement = anElement.xpath("/elem", namespaces=anElement.nsmap)[0]
Note that the nsmap of an element does not necessarily contain a definition
of the namespace that you are looking for inside a subtree.
> results in the following error message:
>
> Traceback (most recent call last):
> File "", line 1, in
> File "lxml.etree.pyx", line 1314, in lxml.etree._Element.xpath
> (src/lxml/lxml.etree.c:38871)
> File "xpath.pxi", line 245, in lxml.etree.XPathElementEvaluator.__init__
> (src/lxml/lxml.etree.c:106924)
> File "xpath.pxi", line 117, in lxml.etree._XPathEvaluatorBase.__init__
> (src/lxml/lxml.etree.c:105514)
> File "xpath.pxi", line 55, in lxml.etree._XPathContext.__init__
> (src/lxml/lxml.etree.c:104808)
> File "extensions.pxi", line 77, in lxml.etree._BaseContext.__init__
> (src/lxml/lxml.etree.c:96771)
> TypeError: empty namespace prefix is not supported in XPath
In addition to the above FAQ, there is also:
http://codespeak.net/lxml/FAQ.html#how-can-i-find-out-which-namespace-prefixes-are-used-in-a-document
which gives a bit more background.
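
As a sketch of one way around the TypeError, assuming lxml is installed: copy
the element's nsmap and give the default namespace (the None key) a prefix of
your own choosing. The sample document and the prefix name "default" are
made up for illustration, and the caveat above still applies: the nsmap of
one element may not cover namespaces declared deeper in the subtree.

```python
from lxml import etree

# Hypothetical document that uses a default namespace.
root = etree.fromstring(
    b'<root xmlns="http://example.com/ns"><elem>hi</elem></root>'
)

# XPath 1.0 has no notion of a default namespace, so replace the None
# prefix with an arbitrary name before handing the mapping to xpath().
ns = {(prefix or "default"): uri for prefix, uri in root.nsmap.items()}

found = root.xpath("default:elem", namespaces=ns)
unprefixed = root.xpath("elem")

print([e.text for e in found])
print(unprefixed)
```

The unprefixed expression matches nothing because it only selects elements
in no namespace at all.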
Stefan
From stefan_ml at behnel.de Wed Aug 12 09:09:31 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Aug 2009 09:09:31 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A8197DE.70800@simplistix.co.uk>
References: <4A8197DE.70800@simplistix.co.uk>
Message-ID: <4A826AAB.8070601@behnel.de>
Chris Withers wrote:
> I'm getting the following error when trying to install:
>
> Building lxml version 2.2.
> NOTE: Trying to build without Cython, pre-generated
> 'src/lxml/lxml.etree.c' needs to be available.
> Using build configuration of libxslt 1.1.11
> Building against libxml2/libxslt in the following directory: /usr/lib
> src/lxml/lxml.etree.c:169:31:src/lxml/lxml.etree.c:169:31: error:
> libxml/schematron.h: No such file or directory
> error: libxml/schematron.h: No such file or directory
The libxml2 installation is (or at least the header files are) too old.
http://codespeak.net/lxml/build.html#building-lxml-on-macos-x
Stefan
From manu3d at gmail.com Wed Aug 12 09:59:25 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Wed, 12 Aug 2009 08:59:25 +0100
Subject: [lxml-dev] default namespace and xpath evaluation
In-Reply-To: <4A81B800.1080707@behnel.de>
References: <915dc91d0908110403ub68b31dx6257d618bf3c046c@mail.gmail.com>
<4A81B800.1080707@behnel.de>
Message-ID: <915dc91d0908120059r48315094yade0f3f55270b99f@mail.gmail.com>
2009/8/11 Stefan Behnel
> Emanuele D'Arrigo wrote:
> > I can't seem to find a more compact way to do this:
> >
> > nsDict = {"default":anElement.nsmap[None]}
> > aChildElement = anElement.xpath("/default:elem", namespaces=nsDict)[0]
>
> I wonder about the use case. Why would you want to look for an element of
> which you do not know the namespace URI in advance?
I do know its namespace: it's the default namespace. In the XML document
it's something like <elem> rather than <prefix:elem>. For this reason I find
it peculiar that I have to first create an arbitrary namespace prefix and
then use xpath("/arbitrary:elem"). Intuitively I would have expected
xpath("/elem") to be enough.
Thank you for pointing me to the FAQ though (and sorry I didn't check it
myself first):
http://codespeak.net/lxml/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions
in turn it pointed me to a common misuse of default namespaces, illustrated
here:
http://www.edankert.com/defaultnamespaces.html
That was my problem. I should probably not use a default namespace anyway.
XML documents with the potential for tags from multiple namespaces are more
readable if all tags have a namespace prefix.
Thank you again.
Manu
From lists at cheimes.de Wed Aug 12 12:51:22 2009
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 12 Aug 2009 12:51:22 +0200
Subject: [lxml-dev] libxml2 crash on 64bit Ubuntu and solution
Message-ID:
Dear lxml users,
My blog post may save some of you several hours of debugging. If you
compile libxml2 yourself on a 64bit Ubuntu system you are going to run
into the same problem.
http://lipyrary.blogspot.com/2009/08/libxml2-crash-on-64bit-ubuntu.html
HTH
Christian
From libxml at bestley.co.uk Tue Aug 11 22:30:53 2009
From: libxml at bestley.co.uk (Mark Bestley)
Date: Tue, 11 Aug 2009 21:30:53 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
References: <4A8197DE.70800@simplistix.co.uk>
Message-ID:
Chris Withers writes:
> Hi All,
>
> I'm getting the following error when trying to install:
>
> Building lxml version 2.2.
> NOTE: Trying to build without Cython, pre-generated
> 'src/lxml/lxml.etree.c' needs to be available.
> Using build configuration of libxslt 1.1.11
> Building against libxml2/libxslt in the following directory: /usr/lib
> src/lxml/lxml.etree.c:169:31:src/lxml/lxml.etree.c:169:31: error:
> libxml/schematron.h: No such file or directory
> error: libxml/schematron.h: No such file or directory
>
>
>
> src/lxml/lxml.etree.c:135067: error: dereferencing pointer to incomplete
> type
> src/lxml/lxml.etree.c:135068: error: dereferencing pointer to incomplete
> type
> src/lxml/lxml.etree.c: At top level:
> src/lxml/lxml.etree.c:135174: error: invalid application of 'sizeof' to
> incomplete type 'struct
> __pyx_obj_4lxml_5etree__ParserSchemaValidationContext'
> lipo: can't figure out the architecture type of: /var/tmp//ccxExHVU.out
> error: Setup script exited with error: command 'gcc' failed with exit
> status 1
>
> What am I doing wrong?
>
You are using the libxslt supplied by Apple. You need a newer one.
I got libxml2 and libxslt from MacPorts.
Fink would probably provide another way. Alternatively, look through the
last month or two of this list: I think someone produced a binary build of
lxml for OS X.
--
Mark
From stefan_ml at behnel.de Wed Aug 12 14:35:45 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Aug 2009 14:35:45 +0200
Subject: [lxml-dev] default namespace and xpath evaluation
In-Reply-To: <915dc91d0908120059r48315094yade0f3f55270b99f@mail.gmail.com>
References: <915dc91d0908110403ub68b31dx6257d618bf3c046c@mail.gmail.com>
<4A81B800.1080707@behnel.de>
<915dc91d0908120059r48315094yade0f3f55270b99f@mail.gmail.com>
Message-ID: <4A82B721.2070409@behnel.de>
Emanuele D'Arrigo wrote:
> 2009/8/11 Stefan Behnel
>
>> Emanuele D'Arrigo wrote:
>>> I can't seem to find a more compact way to do this:
>>>
>>> nsDict = {"default":anElement.nsmap[None]}
>>> aChildElement = anElement.xpath("/default:elem",
>> namespaces=nsDict
>>> )[0]
>> I wonder about the use case. Why would you want to look for an element of
>> which you do not know the namespace URI in advance?
>
>
> I do know its namespace: it's the default namespace. In the xml document
> it's something like rather than .
This sounds like you are confusing namespaces (i.e. URIs) and prefixes
(which are a document internal work-around for readability reasons).
Prefixes can be defined and redefined all over the place in a document.
Even the default namespace can be redefined at any element. Defining XPath
based on the prefixes used in a particular document would render it
completely unusable.
Stefan
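
To illustrate the point about URIs versus prefixes, here is a minimal
sketch. The namespace URI, the prefix "d" and both sample documents are
made up for this example. It uses find() with a namespaces mapping, which
takes the same kind of prefix-to-URI dictionary as xpath(), and it falls
back to the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # the calls used below also exist in the stdlib
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # made-up namespace URI

# The same namespace, once as the default namespace, once under prefix "x".
doc_default = etree.fromstring(
    '<root xmlns="%s"><elem>one</elem></root>' % NS)
doc_prefixed = etree.fromstring(
    '<x:root xmlns:x="%s"><x:elem>two</x:elem></x:root>' % NS)

# The prefix "d" exists only in this mapping, not in either document.
nsmap = {"d": NS}
texts = [doc.find("d:elem", namespaces=nsmap).text
         for doc in (doc_default, doc_prefixed)]
print(texts)

# Clark notation spells out the URI and needs no prefix at all.
clark_text = doc_default.find("{%s}elem" % NS).text
```

The query is written once against the URI and matches both documents,
whatever prefix (or default namespace declaration) each one happens to use.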
From manu3d at gmail.com Wed Aug 12 14:42:48 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Wed, 12 Aug 2009 13:42:48 +0100
Subject: [lxml-dev] default namespace and xpath evaluation
In-Reply-To: <4A82B721.2070409@behnel.de>
References: <915dc91d0908110403ub68b31dx6257d618bf3c046c@mail.gmail.com>
<4A81B800.1080707@behnel.de>
<915dc91d0908120059r48315094yade0f3f55270b99f@mail.gmail.com>
<4A82B721.2070409@behnel.de>
Message-ID: <915dc91d0908120542l5e53296kc49bb605b9053f0d@mail.gmail.com>
2009/8/12 Stefan Behnel
> Prefixes can be defined and redefined all over the place in a document.
> Even the default namespace can be redefined at any element. Defining XPath
> based on the prefixes used in a particular document would render it
> completely unusable.
A-ha! Thank you, I didn't know that! I thought namespaces were defined once
and for all in
the root of the document! It all makes sense now. Thank you again!
Manu
From chris at simplistix.co.uk Wed Aug 12 15:33:30 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Wed, 12 Aug 2009 14:33:30 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To:
References: <4A8197DE.70800@simplistix.co.uk>
Message-ID: <4A82C4AA.4090601@simplistix.co.uk>
Mark Bestley wrote:
> I got libxml and libxslt from macports .
> Fink would probably provide another war or look in the last month or two
> on this list I think someone produced a binary build of lxml for OSX
If whoever did that could do a bdist_egg for python 2.6 and give it to
Stefan to put on PyPI, that would be perfect...
Any chance of that happening?
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
From stefan_ml at behnel.de Wed Aug 12 16:42:42 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Aug 2009 16:42:42 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A82C4AA.4090601@simplistix.co.uk>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk>
Message-ID: <4A82D4E2.9030606@behnel.de>
Chris Withers wrote:
> Mark Bestley wrote:
>> I got libxml and libxslt from macports .
>> Fink would probably provide another war or look in the last month or two
>> on this list I think someone produced a binary build of lxml for OSX
>
> If whoever did that could do a bdist_egg for python 2.6 and give it to
> Stefan to put on PyPI, that would be perfect...
So, you're not using MacOS-X 10.5, I assume?
Stefan
From chris at simplistix.co.uk Wed Aug 12 17:32:17 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Wed, 12 Aug 2009 16:32:17 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A82D4E2.9030606@behnel.de>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
Message-ID: <4A82E081.6090707@simplistix.co.uk>
Stefan Behnel wrote:
> Chris Withers wrote:
>> Mark Bestley wrote:
>>> I got libxml and libxslt from macports .
>>> Fink would probably provide another war or look in the last month or two
>>> on this list I think someone produced a binary build of lxml for OSX
>> If whoever did that could do a bdist_egg for python 2.6 and give it to
>> Stefan to put on PyPI, that would be perfect...
>
> So, you're not using MacOS-X 10.5, I assume?
10.4.11, but what does that have to do with not wanting to fight this
fight myself?
If someone else can get the compile working, if they can provide a
binary egg, surely that will alleviate the need for other Mac users to
go through the pain each time?
cheers,
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
From stefan_ml at behnel.de Wed Aug 12 18:01:11 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Aug 2009 18:01:11 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A82E081.6090707@simplistix.co.uk>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
<4A82E081.6090707@simplistix.co.uk>
Message-ID: <4A82E747.5030708@behnel.de>
Chris Withers wrote:
> Stefan Behnel wrote:
>> Chris Withers wrote:
>>> If whoever did that could do a bdist_egg for python 2.6 and give it
>>> to Stefan to put on PyPI, that would be perfect...
>>
>> So, you're not using MacOS-X 10.5, I assume?
>
> 10.4.11, but what does that have to do with not wanting to fight this
> fight myself?
The difference is that we have binaries for 10.5:
http://pypi.python.org/pypi/lxml/2.2.2
Stefan
From nick.lang at propylon.com Wed Aug 12 18:16:03 2009
From: nick.lang at propylon.com (Nick Lang)
Date: Wed, 12 Aug 2009 11:16:03 -0500
Subject: [lxml-dev] Inserting Element before Tails
Message-ID: <4A82EAC3.7050006@propylon.com>
Hello,
I am new to lxml. I see from the website that it supports mixed content,
but I have a question about this mixed-content support.
Say for example I have the following xml:
this is BOLD
If I were to get a list of the children of it would _only_ contain
the element correct?
The .tail of would be: "this is", right?
So the question I have is this:
If I wanted to add an element before and after I would do so
like this (or similarly):
.insert(0, etree.Element("new"))
The result of this insert would leave me with: This is BOLD
Is it possible to insert before the tail of ? IE, I would want
something like this.
before:
this is BOLD
after:
this is BOLD
I hope this is clear.
Thanks
Nick
From herve.cauwelier at free.fr Wed Aug 12 18:51:35 2009
From: herve.cauwelier at free.fr (=?UTF-8?B?SGVydsOpIENhdXdlbGllcg==?=)
Date: Wed, 12 Aug 2009 18:51:35 +0200
Subject: [lxml-dev] Inserting Element before Tails
In-Reply-To: <4A82EAC3.7050006@propylon.com>
References: <4A82EAC3.7050006@propylon.com>
Message-ID: <4A82F317.9030804@free.fr>
Nick Lang wrote:
> Hello,
>
> I am new to lxml. I see from the website that it says it supports
> Mixed-Content. Though I have question with this mixed content support.
>
> Say for example I have the following xml:
>
> this is BOLD
>
>
> If I were to get a list of the children of it would _only_ contain
> the element correct?
You'll get elements only, so yes only here.
> The .tail of would be: "this is", right?
No, this is the text (attribute ".text"). The tail is None because there
is no content between the closing and the closing .
I think you need to read the tutorial closely and try the examples to
get familiar with the terminology.
http://codespeak.net/lxml/tutorial.html
> So the question I have is this:
>
> If I wanted to add an element before and after I would do so
> like this (or similarly):
> .insert(0, etree.Element("new")
>
> The result of this insert would leave me with: This is />BOLD
>
> Is it possible to insert before the tail of ? IE, I would want
> something like this.
> before:
> this is BOLD
>
> after:
> this is BOLD
Keep your current code and just move the text of the element to the
tail of the element.
Hervé
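
A minimal sketch of the suggestion above. The element names p, b and new
are assumptions, since the markup in the archived messages was lost. The
.text and .tail attributes behave the same way in the standard library's
ElementTree, so the sketch falls back to it when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # .text/.tail work the same way in the stdlib
    import xml.etree.ElementTree as etree

# Assumed markup: some text, then one child element, no trailing text.
root = etree.fromstring("<p>this is <b>BOLD</b></p>")
bold = root[0]
text_before = root.text   # text in front of the first child
tail_before = bold.tail   # None, because nothing follows </b>

# insert() alone leaves root.text in front of the new element ...
new = etree.Element("new")
root.insert(0, new)

# ... so hand that text over to the new element's tail.
new.tail = root.text
root.text = None

print([child.tag for child in root])
print("".join(root.itertext()))
```

The text content of the document is unchanged; only its owner moves from
the parent's .text to the new first child's .tail.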
From l at lrowe.co.uk Wed Aug 12 19:07:39 2009
From: l at lrowe.co.uk (Laurence Rowe)
Date: Wed, 12 Aug 2009 18:07:39 +0100
Subject: [lxml-dev] Behaviour of the push parser in recover mode on
encountering errors
Message-ID:
Over at http://bugzilla.gnome.org/show_bug.cgi?id=569131 I reported
what I thought was a bug in HTMLParser but on closer inspection
appears to be an incorrect assumption on my part (and that of lxml)
when dealing with errors returned by the push parser interface.
With the libxml2 bindings, I am able to parse invalid html using the
push parser:
>>> import libxml2
>>> options = libxml2.HTML_PARSE_RECOVER | libxml2.HTML_PARSE_NONET
>>> p = libxml2.htmlCreatePushParser(None, "", 0, "test")
>>> p.ctxtUseOptions(options)
0
>>> bad1 = '''
\n'''
>>> p.htmlParseChunk(bad1, len(bad1), 0)
test:1: HTML parser error : Unexpected end tag : p
^
76
>>> good = '''
But with lxml, the parser is reset on encountering an error:
>>> from lxml.etree import HTMLParser, dump
>>> p = HTMLParser(recover=True)
>>> bad1 = '''\n'''
>>> p.feed(bad1)
Traceback (most recent call last):
File "", line 1, in ?
File "parser.pxi", line 1093, in lxml.etree._FeedParser.feed
(src/lxml/lxml.etree.c:61114)
File "parser.pxi", line 534, in
lxml.etree._ParserContext._handleParseResult
(src/lxml/lxml.etree.c:56605)
File "parser.pxi", line 628, in lxml.etree._handleParseResult
(src/lxml/lxml.etree.c:57504)
File "parser.pxi", line 568, in lxml.etree._raiseParseError
(src/lxml/lxml.etree.c:56902)
XMLSyntaxError: Unexpected end tag : p, line 1, column 19
>>> good = '''
foo
\n'''
>>> p.feed(good)
>>> elem = p.close()
And previous state is lost:
>>> dump(elem)
foo
In fact, I'm unable to retrieve any state from the parser unless it is reset:
>>> p.feed(bad1)
Traceback (most recent call last):
File "", line 1, in ?
File "parser.pxi", line 1093, in lxml.etree._FeedParser.feed
(src/lxml/lxml.etree.c:61114)
File "parser.pxi", line 534, in
lxml.etree._ParserContext._handleParseResult
(src/lxml/lxml.etree.c:56605)
File "parser.pxi", line 628, in lxml.etree._handleParseResult
(src/lxml/lxml.etree.c:57504)
File "parser.pxi", line 568, in lxml.etree._raiseParseError
(src/lxml/lxml.etree.c:56902)
XMLSyntaxError: Unexpected end tag : p, line 1, column 19
>>> p.close()
Traceback (most recent call last):
File "", line 1, in ?
File "parser.pxi", line 1113, in lxml.etree._FeedParser.close
(src/lxml/lxml.etree.c:61239)
XMLSyntaxError: no element found
So in my view, the behaviour here is not helpful. A parser created with
recover=True should not raise errors, so that invalid HTML can be parsed
incrementally.
Laurence
From chris at simplistix.co.uk Thu Aug 13 09:19:13 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 13 Aug 2009 08:19:13 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A82E747.5030708@behnel.de>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
<4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de>
Message-ID: <4A83BE71.7050907@simplistix.co.uk>
Stefan Behnel wrote:
>
> Chris Withers wrote:
>> Stefan Behnel wrote:
>>> Chris Withers wrote:
>>>> If whoever did that could do a bdist_egg for python 2.6 and give it
>>>> to Stefan to put on PyPI, that would be perfect...
>>> So, you're not using MacOS-X 10.5, I assume?
>> 10.4.11, but what does that have to do with not wanting to fight this
>> fight myself?
>
> The difference is that we have binaries for 10.5:
>
> http://pypi.python.org/pypi/lxml/2.2.2
Oh, I see...
So no, still on 10.4...
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
From stefan_ml at behnel.de Thu Aug 13 09:58:50 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Aug 2009 09:58:50 +0200
Subject: [lxml-dev] Behaviour of the push parser in recover mode on
encountering errors
In-Reply-To:
References:
Message-ID: <4A83C7BA.4020004@behnel.de>
Laurence Rowe wrote:
> When a parser is
> created with recover=True it should not raise errors, so allowing
> incremental parsing of invalid html.
I agree, this is a bug. There is a bit of code in parser.pxi that handles
the recovery flag in the error case, but before doing that, it already
stops short when encountering an error.
Fixed on the trunk.
Stefan
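
For reference, a minimal sketch of the behaviour requested above. It
assumes an lxml release that contains this fix, and the two chunks are
made-up input (the original test strings were lost in the archive); the
first chunk carries a stray end tag.

```python
from lxml import etree

parser = etree.HTMLParser(recover=True)

# First chunk is broken: </p> closes a paragraph that was never opened.
parser.feed(b"<html><body></p>\n")
# Second chunk is well-formed and continues the same document.
parser.feed(b"<p>foo</p></body></html>")

root = parser.close()
paragraphs = [p.text for p in root.iter("p")]
print(root.tag, paragraphs)
```

With recovery enabled, neither feed() call is expected to raise, and the
content fed after the broken chunk ends up in the same tree.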
From commissarster at gmail.com Thu Aug 13 09:51:58 2009
From: commissarster at gmail.com (commissar wu)
Date: Thu, 13 Aug 2009 15:51:58 +0800
Subject: [lxml-dev] lxml bug
Message-ID:
Hi everyone. lxml is very good and I like it, but I recently ran into a
little trouble. When I use lxml to parse the contents of the URL
(http://www.dtzww.cn/files/article/fulltext/23/23208.html), lxml blocks
and does not raise an exception. CPU utilization is 100%.
My environment is lxml 2.2.2, Ubuntu 8.04 amd64 server, Python 2.5.2.
My code is as follows:
import lxml.html as htmltool
import urllib
url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
f = urllib.urlopen(url)
data = f.read()
doc = htmltool.document_fromstring(data) # <--- blocks here
Looking forward to your reply.
commissar
your friend
From stefan_ml at behnel.de Thu Aug 13 10:02:31 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Aug 2009 10:02:31 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A83BE71.7050907@simplistix.co.uk>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
<4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de>
<4A83BE71.7050907@simplistix.co.uk>
Message-ID: <4A83C897.6020004@behnel.de>
Chris Withers wrote:
> Stefan Behnel wrote:
>> Chris Withers wrote:
>>> Stefan Behnel wrote:
>>>> Chris Withers wrote:
>>>>> If whoever did that could do a bdist_egg for python 2.6 and give it
>>>>> to Stefan to put on PyPI, that would be perfect...
>>>> So, you're not using MacOS-X 10.5, I assume?
>>> 10.4.11, but what does that have to do with not wanting to fight this
>>> fight myself?
>>
>> The difference is that we have binaries for 10.5:
>>
>> http://pypi.python.org/pypi/lxml/2.2.2
>
> Oh, I see...
>
> So no, still on 10.4...
Did you try the 10.5 egg? It doesn't have that many dependencies, so it
might even work...
If it doesn't: Stefan (E.), any chance to get binaries that run on 10.4?
Stefan
From stefan_ml at behnel.de Thu Aug 13 10:08:22 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Aug 2009 10:08:22 +0200
Subject: [lxml-dev] lxml bug
In-Reply-To:
References:
Message-ID: <4A83C9F6.3010604@behnel.de>
commissar wu wrote:
> Hi:everyone,lxml is very good, I like it .But I recently encountered a
> little trouble.I use lxml to parse the contents of the url(
> http://www.dtzww.cn/files/article/fulltext/23/23208.html),the lxml is been
> blocking,and don't rasie exception. The CPU utilization rate is 100%.
>
> My environment is lxml-2.2.2. ubutnu-8.04-amd64-server python-2.5.2
>
> My code is fellow:
>
> import lxml.html as htmltool
> import urlib
>
> url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
> f = urllib.urlopen(url)
> data = f.read()
>
> doc = htmltool.document_fromstring(data) ## <--- Block this
I can reproduce this, although I didn't look into it any deeper yet.
This works for me, though:
import lxml.html as htmltool
url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
doc = htmltool.parse(url)
Stefan
From commissarster at gmail.com Thu Aug 13 10:22:19 2009
From: commissarster at gmail.com (commissar wu)
Date: Thu, 13 Aug 2009 16:22:19 +0800
Subject: [lxml-dev] lxml bug
In-Reply-To: <4A83C9F6.3010604@behnel.de>
References:
<4A83C9F6.3010604@behnel.de>
Message-ID:
2009/8/13 Stefan Behnel
>
> commissar wu wrote:
> > Hi:everyone,lxml is very good, I like it .But I recently encountered a
> > little trouble.I use lxml to parse the contents of the url(
> > http://www.dtzww.cn/files/article/fulltext/23/23208.html),the lxml is
> been
> > blocking,and don't rasie exception. The CPU utilization rate is 100%.
> >
> > My environment is lxml-2.2.2. ubutnu-8.04-amd64-server python-2.5.2
> >
> > My code is fellow:
> >
> > import lxml.html as htmltool
> > import urlib
> >
> > url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
> > f = urllib.urlopen(url)
> > data = f.read()
> >
> > doc = htmltool.document_fromstring(data) ## <--- Block this
>
> I can reproduce this, although I didn't look into it any deeper yet.
>
> This works for me, though:
>
> import lxml.html as htmltool
> url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
> doc = htmltool.parse(url)
>
> Stefan
>
Oh, Stefan, thank you.
You are right, htmltool.parse works.
But why does document_fromstring not work? Aren't lxml.html.parse and
lxml.html.document_fromstring used in the same way?
From stefan_ml at behnel.de Thu Aug 13 10:47:14 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Aug 2009 10:47:14 +0200
Subject: [lxml-dev] lxml bug
In-Reply-To:
References:
<4A83C9F6.3010604@behnel.de>
Message-ID: <4A83D312.2040404@behnel.de>
commissar wu wrote:
> But,why the document_fromstring can not work ?
Sorry, I forgot to mention it: please file a bug report.
https://bugs.launchpad.net/lxml
Thanks!
Stefan
From gael at gawel.org Thu Aug 13 14:16:54 2009
From: gael at gawel.org (Gael Pasgrimaud)
Date: Thu, 13 Aug 2009 14:16:54 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A82E747.5030708@behnel.de>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
<4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de>
Message-ID: <7911b3bb0908130516x21c02613xeaecf2a79b064f36@mail.gmail.com>
On Wed, Aug 12, 2009 at 6:01 PM, Stefan Behnel wrote:
>
>
> Chris Withers wrote:
>> Stefan Behnel wrote:
>>> Chris Withers wrote:
>>>> If whoever did that could do a bdist_egg for python 2.6 and give it
>>>> to Stefan to put on PyPI, that would be perfect...
>>>
>>> So, you're not using MacOS-X 10.5, I assume?
>>
>> 10.4.11, but what does that have to do with not wanting to fight this
>> fight myself?
>
> The difference is that we have binaries for 10.5:
>
> http://pypi.python.org/pypi/lxml/2.2.2
>
I still don't understand why my OS X 10.5 always wants to compile lxml.
gawel:~/tmp% virtualenv test
New python executable in test/bin/python
Installing setuptools............done.
gawel:~/tmp% cd test && source bin/activate
(test)gawel:~/tmp/test% easy_install lxml
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.2.2
Downloading http://codespeak.net/lxml/lxml-2.2.2.tgz
^Cinterrupted
(test)gawel:~/tmp/test% easy_install
http://pypi.python.org/packages/2.6/l/lxml/lxml-2.2.2-py2.6-macosx-10.5-i386.egg
Downloading http://pypi.python.org/packages/2.6/l/lxml/lxml-2.2.2-py2.6-macosx-10.5-i386.egg
Processing lxml-2.2.2-py2.6-macosx-10.5-i386.egg
creating /Users/gawel/tmp/test/lib/python2.6/site-packages/lxml-2.2.2-py2.6-macosx-10.5-i386.egg
Extracting lxml-2.2.2-py2.6-macosx-10.5-i386.egg to
/Users/gawel/tmp/test/lib/python2.6/site-packages
Adding lxml 2.2.2 to easy-install.pth file
Installed /Users/gawel/tmp/test/lib/python2.6/site-packages/lxml-2.2.2-py2.6-macosx-10.5-i386.egg
Processing dependencies for lxml==2.2.2
Searching for lxml==2.2.2
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.2.2
Downloading http://codespeak.net/lxml/lxml-2.2.2.tgz
Processing lxml-2.2.2.tgz
Running lxml-2.2.2/setup.py -q bdist_egg --dist-dir
/var/folders/6W/6W87x5wkH1ObxROpTXJbTk+++TI/-Tmp-/easy_install-mOILMS/lxml-2.2.2/egg-dist-tmp-uH7pE4
Building lxml version 2.2.2.
NOTE: Trying to build without Cython, pre-generated
'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxslt 1.1.24
Building against libxml2/libxslt in the following directory: /usr/local/lib
In file included from src/lxml/lxml.etree.c:139:
...
> Stefan
From nospamus at gmail.com Fri Aug 14 16:01:17 2009
From: nospamus at gmail.com (Bryan Hughes)
Date: Fri, 14 Aug 2009 10:01:17 -0400
Subject: [lxml-dev] Validating against an empty element
Message-ID: <4badce440908140701vdad41d0yc7d4113238882d09@mail.gmail.com>
I have the following empty SubElement in my XML file:
This value stores one of the 50 US states. When I attempt to validate it
against my XSD, I receive the following error:
Element 'State': [facet 'pattern'] The value '' is not accepted by the
pattern
Here is the code in my XSD. Can someone shed some light on why this might
be failing? The "State" type is listed as a non-required field
(minOccurs=0), so I'm stumped why this error is popping up.
FWIW, I've even tried adding "\s" to my regex pattern, but it still raises
the same exception.
From jlovell at nwesd.org Fri Aug 14 18:49:41 2009
From: jlovell at nwesd.org (John Lovell)
Date: Fri, 14 Aug 2009 09:49:41 -0700
Subject: [lxml-dev] Validating against an empty element
In-Reply-To: <4badce440908140701vdad41d0yc7d4113238882d09@mail.gmail.com>
References: <4badce440908140701vdad41d0yc7d4113238882d09@mail.gmail.com>
Message-ID:
Bryan:
The empty element is an occurrence of element 'State'. Its value is ''. The way you have written your rules, if 'State' exists then it must have one of the listed values. If you want to allow an empty string, you must make that an approved value.
Hope this helps,
John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
(360) 299-4086
jlovell at nwesd.org
www.nwesd.org
Together We Can ...
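
A minimal sketch of the point above. The schema in the original message
was lost in the archive, so the schema, the Address wrapper element and
both patterns here are hypothetical stand-ins. minOccurs="0" only permits
leaving the element out; an element that is present but empty still has
the value '' and must satisfy the pattern.

```python
from lxml import etree

# Hypothetical schema: an optional State element restricted by a pattern.
XSD_TEMPLATE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="State" minOccurs="0">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="%s"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def make_schema(pattern):
    """Build a validator whose State element must match `pattern`."""
    return etree.XMLSchema(etree.fromstring(XSD_TEMPLATE % pattern))


strict = make_schema("[A-Z]{2}")      # two capital letters, nothing else
lenient = make_schema("([A-Z]{2})?")  # the same, or the empty string

docs = {
    "absent": etree.fromstring("<Address/>"),
    "empty": etree.fromstring("<Address><State/></Address>"),
    "filled": etree.fromstring("<Address><State>WA</State></Address>"),
}

# For each document: (valid under strict, valid under lenient).
results = {name: (strict.validate(doc), lenient.validate(doc))
           for name, doc in docs.items()}
print(results)
```

Only the present-but-empty case differs between the two schemas, which is
the case the pattern facet error in the report refers to.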
From stefan_ml at behnel.de Fri Aug 14 21:46:37 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 14 Aug 2009 21:46:37 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A85A5C9.5030901@urheberrecht.org>
References: <4A8197DE.70800@simplistix.co.uk> <4A82C4AA.4090601@simplistix.co.uk>
<4A82D4E2.9030606@behnel.de> <4A82E081.6090707@simplistix.co.uk>
<4A82E747.5030708@behnel.de> <4A83BE71.7050907@simplistix.co.uk>
<4A83C897.6020004@behnel.de> <4A85A5C9.5030901@urheberrecht.org>
Message-ID: <4A85BF1D.3090602@behnel.de>
Hi,
Pascal Oberndoerfer wrote:
> Sorry for sending this large file to you directly, but I thought
> it might be of use. And I don't have a clue with regard to pypi...
>
> Built on i386, running 10.4.11, with MacPython.org distro, using:
> "python setup.py bdist_egg --static-deps".
>
> Untested, as I did this after "python setup.py install --static-deps".
Thanks a lot!
I put it here:
http://codespeak.net/lxml/lxml-2.2.2-py2.5-macosx-10.3-i386.egg
so that others can test it before I move it over to PyPI.
Stefan
From stefan_ml at behnel.de Fri Aug 14 22:31:52 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 14 Aug 2009 22:31:52 +0200
Subject: [lxml-dev] current trunk includes static build for libiconv
Message-ID: <4A85C9B8.1070809@behnel.de>
Hi,
since there were problems also with libiconv on MacOS recently, I added
libiconv to the list of libraries that "--static-deps" builds.
I'd be happy to get some feedback on this to see if it actually works for
those who reported problems.
http://codespeak.net/lxml/build.html#subversion
http://codespeak.net/lxml/build.html#building-lxml-on-macos-x
Note that the trunk builds as lxml version "2.3dev" now. Copying over the
buildlibxml.py script to another lxml version should also work.
http://codespeak.net/svn/lxml/trunk/buildlibxml.py
Stefan
From chris at simplistix.co.uk Sat Aug 15 11:34:56 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 15 Aug 2009 10:34:56 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A83C897.6020004@behnel.de>
References: <4A8197DE.70800@simplistix.co.uk>
<4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
<4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de>
<4A83BE71.7050907@simplistix.co.uk> <4A83C897.6020004@behnel.de>
Message-ID: <4A868140.9030302@simplistix.co.uk>
Stefan Behnel wrote:
> Did you try the 10.5 egg? It doesn't have that many dependencies, so it
> might even work...
Because I use automated tools (buildout in this case) I can't manually
substitute in a "wrong" egg like this...
> If they don't: Stefan (E.), any chance to get binaries that run on 10.4?
That would be great :-)
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
From chris at simplistix.co.uk Sat Aug 15 14:09:40 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 15 Aug 2009 13:09:40 +0100
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A85BF1D.3090602@behnel.de>
References: <4A8197DE.70800@simplistix.co.uk> <4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de> <4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de> <4A83BE71.7050907@simplistix.co.uk> <4A83C897.6020004@behnel.de>
<4A85A5C9.5030901@urheberrecht.org> <4A85BF1D.3090602@behnel.de>
Message-ID: <4A86A584.5050100@simplistix.co.uk>
Stefan Behnel wrote:
>> Built on i386, running 10.4.11, with MacPython.org distro, using:
>> "python setup.py bdist_egg --static-deps".
>>
>> Untested, as I did this after "python setup.py install --static-deps".
>
> Thanks a lot!
>
> I put it here:
>
> http://codespeak.net/lxml/lxml-2.2.2-py2.5-macosx-10.3-i386.egg
If this is for MacOS 10.4, why does it say 10.3?
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
From jamie at artefact.org.nz Thu Aug 20 08:15:17 2009
From: jamie at artefact.org.nz (Jamie Norrish)
Date: Thu, 20 Aug 2009 18:15:17 +1200
Subject: [lxml-dev] getname() method on 'smart' attribute string values
Message-ID: <1250748917.7169.14.camel@atman>
I have a situation where I want to find any attributes in a document
that contain a certain value, and then change that value and record the
fact that the new value is the result of such a change. In order to
track these changes, I am populating a dictionary keyed by the parent
element, with each value being another dictionary keyed by attribute
name with a value of the list of parts of the attribute that have been
changed. So:
get_attrs = etree.XPath('//@*[contains(concat(" ", ., " "), '
                        'concat(" #", $old_id, " "))]')
for attribute in get_attrs(element, old_id=some_string):
    element = attribute.getparent()
And then I discover that in fact I have to use element.attrib.items()
and search through for which attribute matched. It would be much easier
if the attribute 'smart' string result from the XPath evaluation had a
method which would specify *which* attribute of the parent it was the
value of.
Or am I missing a better, existing way of doing this? Note that while
there is in fact a finite set of attribute names I need to check, it's a
potentially expanding set, and I'd rather not have to touch the code
when that expansion happens.
Jamie
From stefan_ml at behnel.de Thu Aug 20 10:37:32 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 20 Aug 2009 10:37:32 +0200
Subject: [lxml-dev] getname() method on 'smart' attribute string values
In-Reply-To: <1250748917.7169.14.camel@atman>
References: <1250748917.7169.14.camel@atman>
Message-ID: <4A8D0B4C.60206@behnel.de>
Hi,
Jamie Norrish wrote:
> I have a situation where I want to find any attributes in a document
> that contain a certain value, and then change that value and record the
> fact that the new value is the result of such a change. In order to
> track these changes, I am populating a dictionary keyed by the parent
> element, with each value being another dictionary keyed by attribute
> name with a value of the list of parts of the attribute that have been
> changed. So:
>
> get_attrs = etree.XPath('//@*[contains(concat(" ", ., " "), concat(" #",
> $old_id, " "))]')
>
> for attribute in get_attrs(element, old_id=some_string):
> element = attribute.getparent()
>
> And then I discover that in fact I have to use element.attrib.items()
> and search through for which attribute matched. It would be much easier
> if the attribute 'smart' string result from the XPath evaluation had a
> method which would specify *which* attribute of the parent it was the
> value of.
I like that. It would be an attribute, though, maybe "attrname", to make it
clear that the name is a) fixed at the time it's found and b) only
available for attributes, as elements have their own way of providing the
tag name (which is not fixed).
Stefan
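As a rough sketch of how the proposed "attrname" attribute could be used for
the change tracking described in the quoted message: this assumes an lxml
version whose XPath string results provide "attrname" (it is not in 2.2.2),
and the sample document, the function name rename_id and the shape of the
returned dictionary are assumptions made for the example.

```python
from lxml import etree

# Hypothetical sample document; the attribute names are made up.
root = etree.fromstring(
    '<doc><a ref="#x1 #x2"/><b target="#x2"/><c ref="#x3"/></doc>')

get_attrs = etree.XPath(
    '//@*[contains(concat(" ", ., " "), concat(" #", $old_id, " "))]')


def rename_id(tree, old_id, new_id):
    """Replace '#old_id' tokens in all attributes and record the changes.

    Returns {element: {attribute name: [new tokens]}}.
    """
    changes = {}
    old_token, new_token = "#" + old_id, "#" + new_id
    for value in get_attrs(tree, old_id=old_id):
        element = value.getparent()
        name = value.attrname  # name of the attribute this value came from
        parts = [new_token if part == old_token else part
                 for part in value.split(" ")]
        element.set(name, " ".join(parts))
        changes.setdefault(element, {}).setdefault(name, []).append(new_token)
    return changes


changes = rename_id(root, "x2", "y9")
```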
From lei at ipac.caltech.edu Thu Aug 20 19:01:57 2009
From: lei at ipac.caltech.edu (Mary Lei)
Date: Thu, 20 Aug 2009 10:01:57 -0700
Subject: [lxml-dev] how to do file upload via submit_form
Message-ID: <4A8D8185.7040604@ipac.caltech.edu>
I have to support form POST. Since I am using lxml for form requests,
how do I set up the file upload part?
Thanks.
--
Mary Lei
Software Testing
IPAC-NExScl
Rm: KS-233
MS: 220-6
Phone: 395-1998
From stefan_ml at behnel.de Fri Aug 21 11:32:39 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 21 Aug 2009 11:32:39 +0200
Subject: [lxml-dev] getname() method on 'smart' attribute string values
In-Reply-To: <1250846887.27171.99.camel@atman>
References: <1250748917.7169.14.camel@atman> <4A8D0B4C.60206@behnel.de>
<1250846887.27171.99.camel@atman>
Message-ID: <4A8E69B7.4040205@behnel.de>
Jamie Norrish wrote:
> On Thu, 2009-08-20 at 10:37 +0200, Stefan Behnel wrote:
>
>> I like that. It would be an attribute, though, maybe "attrname", to make it
>> clear that the name is a) fixed at the time it's found and b) only
>> available for attributes, as elements have their own way of providing the
>> tag name (which is not fixed).
>
> Sounds good to me!
You can give it a try on the trunk, if you like.
https://codespeak.net/viewvc/?view=rev&revision=67010
http://codespeak.net/lxml/build.html
Stefan
From jamie at artefact.org.nz Fri Aug 21 11:28:07 2009
From: jamie at artefact.org.nz (Jamie Norrish)
Date: Fri, 21 Aug 2009 21:28:07 +1200
Subject: [lxml-dev] getname() method on 'smart' attribute string values
In-Reply-To: <4A8D0B4C.60206@behnel.de>
References: <1250748917.7169.14.camel@atman> <4A8D0B4C.60206@behnel.de>
Message-ID: <1250846887.27171.99.camel@atman>
On Thu, 2009-08-20 at 10:37 +0200, Stefan Behnel wrote:
> I like that. It would be an attribute, though, maybe "attrname", to make it
> clear that the name is a) fixed at the time it's found and b) only
> available for attributes, as elements have their own way of providing the
> tag name (which is not fixed).
Sounds good to me!
Jamie
From p.oberndoerfer at urheberrecht.org Fri Aug 21 19:47:59 2009
From: p.oberndoerfer at urheberrecht.org ("Dr. Pascal Oberndörfer")
Date: Fri, 21 Aug 2009 19:47:59 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A86A584.5050100@simplistix.co.uk>
References: <4A8197DE.70800@simplistix.co.uk> <4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de> <4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de> <4A83BE71.7050907@simplistix.co.uk> <4A83C897.6020004@behnel.de>
<4A85A5C9.5030901@urheberrecht.org> <4A85BF1D.3090602@behnel.de>
<4A86A584.5050100@simplistix.co.uk>
Message-ID: <4A8EDDCF.8070509@urheberrecht.org>
Chris Withers schrieb:
> Stefan Behnel wrote:
>>> Built on i386, running 10.4.11, with MacPython.org distro, using:
>>> "python setup.py bdist_egg --static-deps".
>>>
>>> Untested, as I did this after "python setup.py install --static-deps".
>>
>> Thanks a lot!
>>
>> I put it here:
>>
>> http://codespeak.net/lxml/lxml-2.2.2-py2.5-macosx-10.3-i386.egg
>
> If this is for MacOS 10.4, why does it say 10.3?
To be honest: I don't know. All I can say is that I have already seen
this on some occasions (i.e. building on 10.4 and the resulting egg
being named 10.3). Sorry if this is not of any help...
Pascal
> Chris
>
From mike at it-loops.com Sat Aug 22 01:51:49 2009
From: mike at it-loops.com (Michael Guntsche)
Date: Sat, 22 Aug 2009 01:51:49 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A8EDDCF.8070509@urheberrecht.org>
References: <4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de>
<4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de>
<4A83BE71.7050907@simplistix.co.uk> <4A83C897.6020004@behnel.de>
<4A85A5C9.5030901@urheberrecht.org> <4A85BF1D.3090602@behnel.de>
<4A86A584.5050100@simplistix.co.uk>
<4A8EDDCF.8070509@urheberrecht.org>
Message-ID: <20090821235148.GA16153@gibson.comsick.at>
On Fri, Aug 21, 2009 at 07:47:59PM +0200, "Dr. Pascal Oberndörfer" wrote:
> To be honest: I don't know. All I can say is that I have already seen
> this on some occasions (i.e. building on 10.4 and the resulting egg
> being named 10.3). Sorry if this is not of any help...
This more or less just means that it should also work on 10.3.9. It has
something to do with the DEPLOYMENT_TARGET that Python was built with, AFAIR.
Kind regards,
Michael Guntsche
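To see which platform string and deployment target a given Python reports,
something like the following sketch may help. It uses the standard sysconfig
module of current Python versions (not the Python 2.5 discussed here), and it
assumes that the variable meant above is MACOSX_DEPLOYMENT_TARGET, which is
only set on Mac OS X builds.

```python
import sysconfig


def platform_info():
    """Collect the values that typically end up in a binary package name."""
    return {
        # e.g. a string starting with "macosx-" on Mac OS X builds
        "platform": sysconfig.get_platform(),
        # None on platforms where the variable is not defined
        "deployment_target": sysconfig.get_config_var(
            "MACOSX_DEPLOYMENT_TARGET"),
    }


info = platform_info()
```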
From jamie at artefact.org.nz Sun Aug 23 11:00:51 2009
From: jamie at artefact.org.nz (Jamie Norrish)
Date: Sun, 23 Aug 2009 21:00:51 +1200
Subject: [lxml-dev] getname() method on 'smart' attribute string values
In-Reply-To: <4A8E69B7.4040205@behnel.de>
References: <1250748917.7169.14.camel@atman> <4A8D0B4C.60206@behnel.de>
<1250846887.27171.99.camel@atman> <4A8E69B7.4040205@behnel.de>
Message-ID: <1251018051.6506.49.camel@atman>
On Fri, 2009-08-21 at 11:32 +0200, Stefan Behnel wrote:
> You can give it a try on the trunk, if you like.
Perfect, thank you!
Jamie
From p.oberndoerfer at urheberrecht.org Sun Aug 23 13:52:35 2009
From: p.oberndoerfer at urheberrecht.org (Pascal Oberndoerfer)
Date: Sun, 23 Aug 2009 13:52:35 +0200
Subject: [lxml-dev] current trunk includes static build for libiconv
In-Reply-To: <4A85C9B8.1070809@behnel.de>
References: <4A85C9B8.1070809@behnel.de>
Message-ID: <4A912D83.9080407@urheberrecht.org>
Stefan Behnel schrieb:
> Hi,
>
> since there were problems also with libiconv on MacOS recently, I added
> libiconv to the list of libraries that "--static-deps" builds.
>
> I'd be happy to get some feedback on this to see if it actually works for
> those who reported problems.
>
> http://codespeak.net/lxml/build.html#subversion
> http://codespeak.net/lxml/build.html#building-lxml-on-macos-x
>
> Note that the trunk builds as lxml version "2.3dev" now. Copying over the
> buildlibxml.py script to another lxml version should also work.
>
> http://codespeak.net/svn/lxml/trunk/buildlibxml.py
>
> Stefan
I copied the new 'buildlibxml.py' into a clean lxml-2.2.2 directory and
started a build with '--static-deps'. Everything seems to work fine
(libiconv, libxml2, and libxslt build nicely) except for some minor
errors like:
- 'make[3]: [install-data-local] Error 71 (ignored)'
- 'make[2]: [xsltproc.html] Error 4 (ignored)'
Unfortunately -- after installing -- I get this ImportError on doing
'import lxml.etree':
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.5/
> lib/python2.5/site-packages/lxml-2.2.2-py2.5-macosx-10.3-ppc.egg/lxml/
> etree.so, 2): Symbol not found: _libiconv_close
>
> Referenced from: /Library/Frameworks/Python.framework/Versions/2.5/
> lib/python2.5/site-packages/lxml-2.2.2-py2.5-macosx-10.3-ppc.egg/lxml/
> etree.so
>
> Expected in: dynamic lookup
I wonder whether libiconv is not linked correctly during one of the last
steps (lxml?). Where could I look for this?
As the problem is AFAICT only related to the PPC platform
(and if running MacOS X 10.4.x?), would it make sense to
build libiconv statically only
'if platform.processor() == 'powerpc' and major_version == 8:'?
This could possibly help avoid any side effects on Intel or 10.5
systems. Just a thought...
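A minimal sketch of such a check, assuming that "major_version" refers to
the Darwin kernel major version (where 8 corresponds to Mac OS X 10.4). The
function names are hypothetical and are not taken from buildlibxml.py.

```python
import platform


def wants_static_libiconv(processor, darwin_major):
    """Hypothetical predicate for the condition proposed above."""
    return processor == "powerpc" and darwin_major == 8


def current_darwin_major():
    """Return the Darwin kernel major version, or None when not on Mac OS X."""
    if platform.system() != "Darwin":
        return None
    return int(platform.release().split(".")[0])


# True only on PPC machines running Darwin 8; False everywhere else.
build_static = wants_static_libiconv(platform.processor(),
                                     current_darwin_major())
```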
Thanks a lot!
Pascal
From lei at ipac.caltech.edu Mon Aug 24 22:45:27 2009
From: lei at ipac.caltech.edu (Mary Lei)
Date: Mon, 24 Aug 2009 13:45:27 -0700
Subject: [lxml-dev] a solution to file upload
Message-ID: <4A92FBE7.1040001@ipac.caltech.edu>
On 8/20/09 I posted the following:
I have to support form POST. Since I am using lxml for form requests,
how do I set up the file upload part?
I now have a solution for Python 2.6:
1. I adopted Fabien SEISEN's urllib2_file.py. When my code detects that
the form request is a file upload request, instead of using submit_form,
it switches to calling urllib2.urlopen through the HTTP handler in
urllib2_file.py, and thus sends the same multipart/form-data format for
file upload as a web browser does.
I did have to add an error handler to urllib2_file.py:
# Special case for StringIO
try:  # added: fd may be an object that has no __module__
    if fd.__module__ in ("StringIO", "cStringIO"):
        name = k
        fd.seek(0, 2)  # EOF
        file_size = fd.tell()
        fd.seek(0)  # START
    else:
        file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
except AttributeError:  # added: catch the missing __module__ here
    file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
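For comparison, the same size lookup can be written without checking
__module__ at all, by trying the file descriptor first and falling back to
seek/tell. This is only a sketch in Python 3 syntax, not the code from
urllib2_file.py; the function name stream_size is made up, and unlike the
fragment above it restores the previous position instead of rewinding to
the start.

```python
import io
import os


def stream_size(fd):
    """Return the total size in bytes of a real file or an in-memory stream."""
    try:
        # Real files: ask the operating system.
        return os.fstat(fd.fileno()).st_size
    except (AttributeError, OSError):
        # No usable descriptor (e.g. io.BytesIO): measure with seek/tell.
        position = fd.tell()
        fd.seek(0, os.SEEK_END)
        size = fd.tell()
        fd.seek(position)
        return size


in_memory = io.BytesIO(b"hello")
in_memory_size = stream_size(in_memory)
```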
--
Mary Lei
Software Testing
IPAC-NExScl
Rm: KS-233
MS: 220-6
Phone: 395-1998
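The switch described above (a separate code path for file uploads,
submit_form otherwise) might be sketched as follows. The sample page, the
function names and the "upload" callback are assumptions for illustration;
the real upload code (urllib2_file.py in the message above) is not shown.
The open_http argument of lxml.html.submit_form is called with the method,
the URL and the form values.

```python
from lxml import html

# Hypothetical page with a file upload form.
PAGE = """<html><body>
<form action="http://example.invalid/upload" method="post"
      enctype="multipart/form-data">
  <input type="text" name="title" value="report"/>
  <input type="file" name="attachment"/>
</form>
</body></html>"""


def file_field_names(form):
    """Names of all <input type="file"> fields in the form."""
    return [el.get("name") for el in form.iter("input")
            if el.get("type", "").lower() == "file"]


def submit(form, upload, open_http):
    """Use `upload` for file upload forms, lxml's submit_form() otherwise."""
    names = file_field_names(form)
    if names:
        # Hypothetical hook: hand over to the multipart upload code.
        return upload(form, names)
    return html.submit_form(form, open_http=open_http)


form = html.fromstring(PAGE).forms[0]
```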
From ted at milo.com Mon Aug 24 23:23:15 2009
From: ted at milo.com (Ted Dziuba)
Date: Mon, 24 Aug 2009 14:23:15 -0700
Subject: [lxml-dev] a solution to file upload
In-Reply-To: <4A92FBE7.1040001@ipac.caltech.edu>
References: <4A92FBE7.1040001@ipac.caltech.edu>
Message-ID: <6451ccbf0908241423u6504c8b5qeeee8a8c3a561eb8@mail.gmail.com>
Have you looked at mechanize? It does almost all form automation you could
ever want. It parses content with BeautifulSoup, though, so parse trees may
look different than in lxml.
Ted
On Mon, Aug 24, 2009 at 1:45 PM, Mary Lei wrote:
> 8/20/09 I posted the following:
> I have to support form post.
> Since I am using lxml for form
> requests, how do I set up the file upload
> part ?
>
> I now have a solution for python 2.6:
> 1. I adopted Fabien SEISEN's
> urllib2_file.py. When my code
> detects that the form request is a file
> upload request, instead of using
> submit_form, my code switches to calling
> urllib2.urlopen http handler in urllib2_file.py
> and thus puts out the same multipart form/data
> format for file upload as done via the web browser.
> I did have to include an error handler for
> urllib2_file.py:
>
> # Special case for StringIO
> try: <--- catch
> if fd.__module__ in ("StringIO", "cStringIO"): <-- has no
> __module__
> name = k
> fd.seek(0, 2) # EOF
> file_size = fd.tell()
> fd.seek(0) # START
> else:
> file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
> except AttributeError: <-- catch exception here
> file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
>
--
Ted Dziuba
Co-Founder and Engineer
Milo.com, Inc.
165 University Avenue
Palo Alto, CA, 94301
http://milo.com
Cell: (609)-665-2639
From lei at ipac.caltech.edu Tue Aug 25 00:08:17 2009
From: lei at ipac.caltech.edu (Mary Lei)
Date: Mon, 24 Aug 2009 15:08:17 -0700
Subject: [lxml-dev] a solution to file upload
In-Reply-To: <6451ccbf0908241423u6504c8b5qeeee8a8c3a561eb8@mail.gmail.com>
References: <4A92FBE7.1040001@ipac.caltech.edu>
<6451ccbf0908241423u6504c8b5qeeee8a8c3a561eb8@mail.gmail.com>
Message-ID: <4A930F51.9040903@ipac.caltech.edu>
Thanks for the info. There is ClientForm also.
But at this point in my project I am not going to include too many
packages, as I won't have a job in October. The project manager is
nervous about depending on too many external packages, and I won't
have adequate time to research them.
I am finding limitations with lxml, but so far things are working.
Ted Dziuba wrote:
> Have you looked at mechanize? It does almost all form automation you
> could ever want. It parses content with BeautifulSoup, though, so parse
> trees may look different than in lxml.
>
> Ted
>
> On Mon, Aug 24, 2009 at 1:45 PM, Mary Lei wrote:
>
> 8/20/09 I posted the following:
> I have to support form post.
> Since I am using lxml for form
> requests, how do I set up the file upload
> part ?
>
> I now have a solution for python 2.6:
> 1. I adopted Fabien SEISEN's
> urllib2_file.py. When my code
> detects that the form request is a file
> upload request, instead of using
> submit_form, my code switches to calling
> urllib2.urlopen http handler in urllib2_file.py
> and thus puts out the same multipart form/data
> format for file upload as done via the web browser.
> I did have to include an error handler for
> urllib2_file.py:
>
> # Special case for StringIO
> try: <--- catch
> if fd.__module__ in ("StringIO", "cStringIO"): <-- has no
> __module__
> name = k
> fd.seek(0, 2) # EOF
> file_size = fd.tell()
> fd.seek(0) # START
> else:
> file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
> except AttributeError: <-- catch exception here
> file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
>
--
Mary Lei
Software Testing
IPAC-NExScl
Rm: KS-233
MS: 220-6
Phone: 395-1998
From ndudfield at gmail.com Tue Aug 25 01:46:32 2009
From: ndudfield at gmail.com (Nicholas Dudfield)
Date: Tue, 25 Aug 2009 09:46:32 +1000
Subject: [lxml-dev] getname() method on 'smart' attribute string values
In-Reply-To:
References:
Message-ID: <4A932658.9000408@gmail.com>
> You can give it a try on the trunk, if you like.
>
> https://codespeak.net/viewvc/?view=rev&revision=67010
>
> http://codespeak.net/lxml/build.html
>
> Stefan
>
I also need this functionality, as well as a bugfix from a revision
newer than the stable version 2.2.2 that is available for Windows.
I heard libxml2/lxml is a PITA to build on Windows, so, being
inexperienced, I won't attempt it before ruling out alternatives.
Is there a build bot with dist zips of the latest revisions available
anywhere for Windows?
From philipp.reichmuth+gmane at gmail.com Fri Aug 28 16:22:35 2009
From: philipp.reichmuth+gmane at gmail.com (Philipp Reichmuth)
Date: Fri, 28 Aug 2009 16:22:35 +0200
Subject: [lxml-dev] Python 3.1 binary on Windows?
Message-ID: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net>
Hi,
maybe it's just me being stupid for overlooking something, but are there
Windows binaries built for Python 3.1 out there?
Philipp