From stefan_ml at behnel.de Sun Nov 4 19:43:21 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 04 Nov 2007 19:43:21 +0100
Subject: [lxml-dev] lxml trunk now requires Cython 0.9.6.8
Message-ID: <472E12C9.5010503@behnel.de>
Hi all,
the current trunk now requires Cython 0.9.6.8 to build, as will 2.0alpha5.
Many of the changes that lxml required in Pyrex (and that Cython provided)
have now gone back into the mainstream Pyrex distribution, but in a different
form. It was decided that Cython would follow these incompatible language
changes, which in turn required some minor changes in lxml.
This also means that external modules that use the C-API of lxml will need a
small adaptation, as the name of the public import function has changed from
"import_etree" to "import_lxml__etree". Everything else should work just as
before.
Note that lxml 1.3 will not be adapted. Future 1.3.x releases will continue to
ship with a patched Pyrex and will not build with Cython or future Pyrex
versions.
Stefan
From ianb at colorstudy.com Mon Nov 5 03:12:07 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 04 Nov 2007 20:12:07 -0600
Subject: [lxml-dev] Failing lxml.html tests
In-Reply-To: <4727A193.6000204@behnel.de>
References: <4720D829.6080507@behnel.de>
<47210B79.10502@colorstudy.com> <47218E76.8030506@behnel.de>
<472201D4.5010409@colorstudy.com> <4722EDAC.1070807@behnel.de>
<472751D4.20403@colorstudy.com> <4727A193.6000204@behnel.de>
Message-ID: <472E7BF7.2000604@colorstudy.com>
Stefan Behnel wrote:
> Ian Bicking wrote:
>> I made a new checkout, did python setup.py develop, and retested, and
>> the errors seem even weirder now. Many are for method, but there's a
>> bunch of others too (though still most pass).
>>
>> I attached the test output.
>
> Hmm, there really must be something wrong with your setup. You have Cython
> 0.9.6.7 installed, I assume? I only get three errors, all in the HTML tests.
> The first one is because one of the entries in _tag_link_attrs is a list, not
> sure about the others.
I only get one, the _tag_link_attrs issue, which I just fixed. It's
possible one of these weird errors is preventing another error from
occurring, though... I guess not, since all the errors I now get are in
lxml.tests.test_elementtree.
> Anyway, you can run the HTML tests by calling "test.py -vv html", that should
> get you over the failing tests for now. I'll see how far I get with a clean
> checkout myself. Have you tried importing etree by hand and checked if the
> failing methods work there?
They do work there (at least the method argument that I tested). So
it's just in the test environment where it's acting weird. Which is odd.
--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org
From stefan_ml at behnel.de Mon Nov 5 09:00:15 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 05 Nov 2007 09:00:15 +0100
Subject: [lxml-dev] Failing lxml.html tests
In-Reply-To: <472E7BF7.2000604@colorstudy.com>
References: <4720D829.6080507@behnel.de>
<47210B79.10502@colorstudy.com> <47218E76.8030506@behnel.de>
<472201D4.5010409@colorstudy.com> <4722EDAC.1070807@behnel.de>
<472751D4.20403@colorstudy.com> <4727A193.6000204@behnel.de>
<472E7BF7.2000604@colorstudy.com>
Message-ID: <472ECD8F.3000505@behnel.de>
Ian Bicking wrote:
> Stefan Behnel wrote:
>> I only get three errors, all in the HTML tests.
>> The first one is because one of the entries in _tag_link_attrs is a
>> list, not sure about the others.
>
> I only get one, the _tag_link_attrs issue, which I just fixed.
The others only occur with libxml2 2.6.29 and later. These versions handle the
"embed" tag as a special tag that does not need closing. However, a
parse-serialise-parse cycle for such HTML alters the document here: it omits
the closing tag and then reparses the following tags as children. So this is a
bug in libxml2. I'll report it there.
For the time being - maybe there's a way to work around that?
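The parse-serialise-parse cycle described above can be checked with a short
script. This is only a sketch: the helper names outline() and
survives_round_trip() and the sample markup are made up for illustration, and
the result for the "embed" sample depends on the libxml2 version in use.

```python
from lxml import html


def outline(el, depth=0):
    """Return (depth, tag) pairs for el and all element descendants."""
    pairs = [(depth, el.tag)]
    for child in el:
        if isinstance(child.tag, str):  # skip comments and processing instructions
            pairs.extend(outline(child, depth + 1))
    return pairs


def survives_round_trip(markup):
    """Parse, serialise and reparse; report whether the structure is unchanged."""
    first = html.fromstring(markup)
    second = html.fromstring(html.tostring(first))
    return outline(first) == outline(second)


# Plain block markup is expected to come back unchanged.
print(survives_round_trip("<div><p>a</p><p>b</p></div>"))

# Hypothetical sample for the "embed" case; the result depends on libxml2.
print(survives_round_trip('<div><embed src="x.swf"></embed><p>after</p></div>'))
```

A False result for the second call would indicate that the reparsed tree no
longer matches the original one.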
Stefan
From jlovell at esd189.org Thu Nov 8 01:21:28 2007
From: jlovell at esd189.org (John Lovell)
Date: Wed, 7 Nov 2007 16:21:28 -0800
Subject: [lxml-dev] XSD Validation: No matching global declaration.
Message-ID: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
Hi All:
I don't know if my schema is invalid or if this represents a bug in lxml
or libxml2.
Here is the situation...
I have an XML Schema that looks like this:
...
...
I am trying to validate this data:
&1<tNk44}F2;pl}ee'-mlZm at JDl1uqg!ZE72n/l
The error log shows the following error:
../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml:2:ERROR:SCHEMASV:SCHEMAV_CVC_ELT_1:
Element 'Authentication': No matching global declaration available for the
validation root.
The same code given different files works fine. Does anyone have any
ideas why I am getting this message?
Required version information:
lxml.etree: (1, 3, 4, 0)
libxml used: (2, 6, 30)
libxml compiled: (2, 6, 30)
libxslt used: (1, 1, 21)
libxslt compiled: (1, 1, 22)
Thanks,
John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
www.esd189.org
Together We Can ...
From stefan_ml at behnel.de Thu Nov 8 07:55:45 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 08 Nov 2007 07:55:45 +0100
Subject: [lxml-dev] XSD Validation: No matching global declaration.
In-Reply-To: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
References: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
Message-ID: <4732B2F1.1080802@behnel.de>
John Lovell wrote:
> I don't know if my schema is invalid or if this represents a bug in lxml
> or libxml2.
> The error log shows the following error:
>
> ../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml:2:ERROR:SCHEMASV:SCHEMAV_CVC_ELT_1:
> Element 'Authentication': No matching global declaration available for the
> validation root.
I'm not very familiar with libxml2's XML Schema implementation, so I'm not
sure what that means exactly. Is it the only error you get in the log?
> The same code given different files works fine. Does anyone have any
> ideas why I am getting this message?
You can check if "xmllint" (which is the command line tool that comes with
libxml2) produces the same error. It is actually unlikely that this is an lxml
problem, so this would tell you if libxml2 really thinks that your file is
invalid. In that case, you can ask on the libxml2 mailing list instead.
Maybe you can also check with a different tool, that might give you more hints
on what is wrong here.
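To see every entry in the log rather than only the first one, the validator's
error_log can be iterated from Python. The following is a minimal sketch with
a made-up one-element schema and document; neither is taken from the original
report, whose schema is not shown in full.

```python
from lxml import etree

# Hypothetical schema with a single global element, and a document whose
# root element is not declared in it.
schema = etree.XMLSchema(etree.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="Known" type="xs:string"/>'
    '</xs:schema>'
))
doc = etree.fromstring("<Unknown>text</Unknown>")

is_valid = schema.validate(doc)
entries = list(schema.error_log)

print("valid:", is_valid)
for entry in entries:
    # Each entry carries the line number, severity, error type and message.
    print(entry.line, entry.level_name, entry.type_name, entry.message)
```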
Stefan
From jholg at gmx.de Thu Nov 8 09:19:39 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 08 Nov 2007 09:19:39 +0100
Subject: [lxml-dev] XSD Validation: No matching global declaration.
In-Reply-To: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
References: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
Message-ID: <20071108081939.44610@gmx.net>
Hi,
> I have a XML Schema that looks like this:
>
> xmlns="http://www.w3.org/2001/XMLSchema"
> xmlns:sif="http://www.sifinfo.org/infrastructure/1.x"
> targetNamespace="http://www.sifinfo.org/infrastructure/1.x">
>
> schemaLocation="http://www.w3.org/2001/xml.xsd"/>
>
> ...
>
>
>
> ...
>
>
>
> I am trying to validate this data:
>
>
> ns3:RefId="27D1CAEA85C2BAA647A01B551D21E1EB"
> ns3:SifRefId="211242238C60A55E25B2B86BB337C244"
> ns3:SifRefIdType="EmployeePersonal">
Shouldn't the root element be from the namespace
"http://www.sifinfo.org/infrastructure/1.x" in the instance document?
How do other instance documents that validate fine differ?
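As a rough illustration of the namespace question, here is a sketch using
lxml. The schema is a heavily reduced stand-in (the real one is elided above)
that declares a single global element in the target namespace. Only the
namespace URI and the element name "Authentication" come from this thread;
everything else is assumed.

```python
from lxml import etree

NS = "http://www.sifinfo.org/infrastructure/1.x"

# Hypothetical reduced schema: one global element in the target namespace.
schema = etree.XMLSchema(etree.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"'
    ' targetNamespace="%s" elementFormDefault="qualified">'
    '<xs:element name="Authentication" type="xs:string"/>'
    '</xs:schema>' % NS
))

# Root element in no namespace: there is no matching global declaration.
unqualified = etree.fromstring("<Authentication>x</Authentication>")

# Root element in the schema's target namespace.
qualified = etree.fromstring('<Authentication xmlns="%s">x</Authentication>' % NS)

print(schema.validate(unqualified))  # False
print(schema.validate(qualified))    # True
```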
Holger
From jlovell at esd189.org Thu Nov 8 18:22:35 2007
From: jlovell at esd189.org (John Lovell)
Date: Thu, 8 Nov 2007 09:22:35 -0800
Subject: [lxml-dev] XSD Validation: No matching global declaration.
In-Reply-To: <4732B2F1.1080802@behnel.de>
References: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
<4732B2F1.1080802@behnel.de>
Message-ID: <3A49C88789256B4AB33AC603DB6AF49B08E0DD@ZIRIA.esd189.org>
>John Lovell wrote:
>> I don't know if my schema is invalid or if this represents a bug in
>> lxml or libxml2.
>> The error log shows the following error:
>>
>> ../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml:2:ERROR:SCHEMASV:SCHEMAV_CVC_ELT_1:
>> Element 'Authentication': No matching global declaration available for
>> the validation root.
>I'm not so firm with libxml2's XML Schema implementation, not sure what
>that means exactly. Is it the only error you get in the log?
I get just one similar error in the log for every file I try to validate
with this schema. Only the element's name changes.
>> The same code given different files works fine. Does anyone have any
>> ideas why I am getting this message?
>You can check if "xmllint" (which is the command line tool that comes with
>libxml2) produces the same error. It is actually unlikely that this is an
>lxml problem, so this would tell you if libxml2 really thinks that your
>file is invalid. In that case, you can ask on the libxml2 mailing list
>instead.
Okay here we go...
jlovell@esd189-10545:~/SIF Toolkit/Data Generator$ xmllint --schema combined.txt ../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml
&1<tNk44}F2;pl}ee'-mlZm at JDl1uqg!ZE72n/l
../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml:2: element Authentication: Schemas validity error : Element 'Authentication': No matching global declaration available for the validation root.
../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml fails to validate
You are right, it isn't an lxml problem! Sorry, but I had to start
somewhere.
>Maybe you can also check with a different tool, that might give you more
>hints on what is wrong here.
Good idea. Here is the output from http://www.xmlme.com/Validator.aspx:
Validation Results:
Schema Error: System.Xml.Schema.XmlSchemaException: The targetNamespace
parameter '' should be the same value as the targetNamespace
'http://www.sifinfo.org/infrastructure/1.x' of the schema.
   at System.Xml.Schema.BaseProcessor.SendValidationEvent(XmlSchemaException e, XmlSeverityType severity)
   at System.Xml.Schema.SchemaCollectionPreprocessor.Preprocess(XmlSchema schema, String targetNamespace, Compositor compositor)
   at System.Xml.Schema.SchemaCollectionPreprocessor.Execute(XmlSchema schema, String targetNamespace, Boolean loadExternals, XmlSchemaCollection xsc)
   at System.Xml.Schema.XmlSchema.CompileSchema(XmlSchemaCollection xsc, XmlResolver resolver, SchemaInfo schemaInfo, String ns, ValidationEventHandler validationEventHandler, XmlNameTable nameTable, Boolean CompileContentModel)
   at System.Xml.Schema.XmlSchemaCollection.Add(String ns, SchemaInfo schemaInfo, XmlSchema schema, Boolean compile, XmlResolver resolver)
   at System.Xml.Schema.XmlSchemaCollection.Add(String ns, XmlReader reader, XmlResolver resolver)
   at System.Xml.Schema.XmlSchemaCollection.Add(String ns, XmlReader reader)
   at Validator.Button1_Click(Object sender, EventArgs e)
Okay, so it is me. At this point, which mailing list should I be
bothering? If you know how I can set the targetNamespace of my document
to match my schema, please email me but consider carefully if it is
appropriate to copy this list.
Thanks for all your help,
John W. Lovell
From jlovell at esd189.org Thu Nov 8 18:29:16 2007
From: jlovell at esd189.org (John Lovell)
Date: Thu, 8 Nov 2007 09:29:16 -0800
Subject: [lxml-dev] XSD Validation: No matching global declaration.
In-Reply-To: <20071108081939.44610@gmx.net>
References: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org>
<20071108081939.44610@gmx.net>
Message-ID: <3A49C88789256B4AB33AC603DB6AF49B08E0DE@ZIRIA.esd189.org>
>Hi,
>> I have a XML Schema that looks like this:
>>
>> > xmlns="http://www.w3.org/2001/XMLSchema"
>> xmlns:sif="http://www.sifinfo.org/infrastructure/1.x"
>> targetNamespace="http://www.sifinfo.org/infrastructure/1.x">
>>
>> > schemaLocation="http://www.w3.org/2001/xml.xsd"/>
>>
>> ...
>>
>>
>>
>> ...
>>
>>
>>
>> I am trying to validate this data:
>>
>>
>> > ns3:RefId="27D1CAEA85C2BAA647A01B551D21E1EB"
>> ns3:SifRefId="211242238C60A55E25B2B86BB337C244"
>> ns3:SifRefIdType="EmployeePersonal">
>Shouldn't the root element be from the namespace
>"http://www.sifinfo.org/infrastructure/1.x"
>in the instance document?
You are probably right (see my last post). However, I have been unable
to figure out how to do that. I just received the O'Reilly XML Schema
book so that should change. Although, if you would like to clue me in,
that would be great.
>How do other instance documents that validate fine differ?
Drastically: one is the very simple PO example, while the schema I'm
working with is about half a megabyte.
John W. Lovell
From jlovell at esd189.org Thu Nov 8 19:23:08 2007
From: jlovell at esd189.org (John Lovell)
Date: Thu, 8 Nov 2007 10:23:08 -0800
Subject: [lxml-dev] XSD Validation: No matching global declaration.
In-Reply-To: <3A49C88789256B4AB33AC603DB6AF49B08E0DE@ZIRIA.esd189.org>
References: <3A49C88789256B4AB33AC603DB6AF49B08E0DC@ZIRIA.esd189.org><20071108081939.44610@gmx.net>
<3A49C88789256B4AB33AC603DB6AF49B08E0DE@ZIRIA.esd189.org>
Message-ID: <3A49C88789256B4AB33AC603DB6AF49B08E0E0@ZIRIA.esd189.org>
Thanks to Holger and Stefan:
Holger was right: my problem was with my instance document, not the
schema.
For anyone this might help, here are my new results:
jlovell@esd189-10545:~/SIF Toolkit/Data Generator$ xmllint --schema combined.txt ../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml
&1<tNk44}F2;pl}ee'-mlZm at JDl1uqg!ZE72n/l
../1.5r1/XSD/Infrastructure/Authentication__Authentication.xml validates
Thanks,
John W. Lovell
From jholg at gmx.de Tue Nov 13 14:28:57 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Tue, 13 Nov 2007 14:28:57 +0100
Subject: [lxml-dev] 2.0 release plan, any news?
Message-ID: <20071113132857.214520@gmx.net>
Hi,
any plans on releasing another 2.0 alpha-cycle, or even going into Beta phase?
Just asking, as I'm about to package an lxml-based app by the end of the week. I have no real issues with using an alpha snapshot, as it runs smoothly, but of course I would still prefer to ship an official release.
Cheers,
Holger
From alain.poirier at net-ng.com Wed Nov 14 18:15:50 2007
From: alain.poirier at net-ng.com (Alain Poirier)
Date: Wed, 14 Nov 2007 18:15:50 +0100
Subject: [lxml-dev] Pb with namespaces on attributs
Message-ID: <200711141815.51034.alain.poirier@net-ng.com>
An attribute of a node, inserted into and then removed from a tree, can lose
its namespace:
>>> from lxml import etree as ET
>>> parent = ET.Element('parent')
>>> parent.set('{http://foo/bar}x', 'a')
>>> child = ET.SubElement(parent, 'child')
>>> child.set('{http://foo/bar}x', 'b')
>>> print ET.tostring(child), child.nsmap
{'ns0': 'http://foo/bar'}
>>> child = parent[0]
>>> parent.clear()
>>> print ET.tostring(child), child.nsmap
{}
It happens when the tree already knows the namespace used.
Is this a bug in lxml or in libxml2 (tested with lxml-2.0alpha4 and libxml2
2.6.28 / 2.6.30)?
From stefan_ml at behnel.de Wed Nov 14 23:05:18 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 14 Nov 2007 23:05:18 +0100
Subject: [lxml-dev] 2.0 release plan, any news?
In-Reply-To: <20071113132857.214520@gmx.net>
References: <20071113132857.214520@gmx.net>
Message-ID: <473B711E.70103@behnel.de>
Hi Holger,
jholg at gmx.de wrote:
> any plans on releasing another 2.0 alpha-cycle, or even going into Beta
> phase?
there will be at least one additional alpha release, preferably soon. I'm
still struggling with the iterparse() API - not sure the "pass a parser
instead of kw args" bit will work out...
> Just asking, as I'm about to package an lxml-based app by the end of week.
> I've no real issues with using an alpha-snapshot as this runs smoothly, but
> of course would still prefer to ship an official release.
Not sure I can come up with something till this week-end, so I'd suggest
shipping with a trunk version for now. Also, if I manage to reimplement
iterparse(), it might not yet be completely stable in the next release, so
you're probably better off with the current trunk anyway.
Stefan
From stefan_ml at behnel.de Wed Nov 14 23:32:34 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 14 Nov 2007 23:32:34 +0100
Subject: [lxml-dev] Pb with namespaces on attributs
In-Reply-To: <200711141815.51034.alain.poirier@net-ng.com>
References: <200711141815.51034.alain.poirier@net-ng.com>
Message-ID: <473B7782.8000502@behnel.de>
Alain Poirier wrote:
> An attribute of a node, inserted into and then removed from a tree, can lose
> its namespace:
>
>>>> from lxml import etree as ET
>>>> parent = ET.Element('parent')
>>>> parent.set('{http://foo/bar}x', 'a')
>>>> child = ET.SubElement(parent, 'child')
>>>> child.set('{http://foo/bar}x', 'b')
>
>>>> print ET.tostring(child), child.nsmap
> {'ns0': 'http://foo/bar'}
>
>>>> child = parent[0]
>>>> parent.clear()
>>>> print ET.tostring(child), child.nsmap
> {}
>
> It happens when the tree already knows the namespace used.
Thanks for the report. This is a serialisation bug that was already fixed in
lxml 1.3.5 and on the current SVN trunk. The fix will be in 2.0alpha5.
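For checking an installed version, the report can be rewritten as a small
script. This is only a sketch in Python 3 syntax (the session quoted above
uses the Python 2 print statement). On a version that contains the fix, the
serialised child is expected to still carry the namespace declaration.

```python
from lxml import etree as ET

parent = ET.Element('parent')
parent.set('{http://foo/bar}x', 'a')
child = ET.SubElement(parent, 'child')
child.set('{http://foo/bar}x', 'b')

# Detach the child from the tree that originally declared the namespace.
parent.clear()

serialized = ET.tostring(child)
print(serialized, child.nsmap)
```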
Stefan
From pf at pfhawkins.com Sat Nov 17 19:48:10 2007
From: pf at pfhawkins.com (P.F. Hawkins)
Date: Sat, 17 Nov 2007 13:48:10 -0500
Subject: [lxml-dev] easy_install issues
Message-ID: <87zlxc674l.fsf@pfhawkins.com>
I'm having trouble installing any version of lxml higher than 1.3.3. The error
says that easy_install can't find a proper version of dateutil, even though I
have a high enough version of "python-dateutil" installed on this system. I
assume that the issue has to do with "dateutil" vs. "python-dateutil", but I'm
not sure.
FWIW, I'm running ubuntu gutsy.
Thanks in advance!
P. F. Hawkins
--
Figuring things out, one word at a time.
From stefan_ml at behnel.de Fri Nov 23 12:31:47 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 23 Nov 2007 12:31:47 +0100
Subject: [lxml-dev] TreeBuilder implementation in lxml.etree
Message-ID: <4746BA23.1080603@behnel.de>
Hi all,
lxml.etree now has an ET compatible TreeBuilder class that is integrated into
the parser framework, i.e. you can create a parser with "target=TreeBuilder()"
and have it build a tree for you just the way ET does. Or, you can create a
TreeBuilder instance and call the event methods yourself to get the same
effect without parsing.
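Both ways of using the class might look roughly like this. It is a minimal
sketch with a made-up sample document; the variable names are illustrative
only.

```python
from lxml import etree

# 1. Let the parser feed a TreeBuilder target. The parse call then returns
#    whatever the target's close() method returns, here the root element.
parser = etree.XMLParser(target=etree.TreeBuilder())
parsed_root = etree.fromstring("<root><a>text</a></root>", parser)

# 2. Call the event methods by hand to build the same tree without parsing.
builder = etree.TreeBuilder()
builder.start("root", {})
builder.start("a", {})
builder.data("text")
builder.end("a")
builder.end("root")
built_root = builder.close()

print(etree.tostring(parsed_root))
print(etree.tostring(built_root))
```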
There is one small difference from ET: the start() method has the following
signature:
    def start(self, tag, attrs, nsmap=None):
whereas in ET it's
    def start(self, tag, attrs):
so lxml.etree accepts an additional "nsmap" argument here. This is required as
lxml's parser would otherwise lose the namespace prefix mappings, so the
generated trees would declare namespaces wherever a tag uses them first in the
hierarchy and not where they were originally declared in the parsed document
(which usually means the root element). Also, this supports prefixed text
values that refer to declared namespaces (see for example the QName class).
If you want to write code that subclasses the TreeBuilder and that should
still work with both ET and lxml.etree, you should use the above signature.
There is a bit of code that tries to figure out if the method can be called
with three arguments, but I'm not sure it works in all cases. It's essentially
this:
    import inspect
    arguments = inspect.getargspec(target.start)
    if len(arguments[0]) > 3: # self + 3 arguments
        takes_nsmap = True
    elif arguments[1] is not None: # '*args' parameter
        takes_nsmap = True
    else:
        takes_nsmap = False
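On current Python versions inspect.getargspec() is no longer available (it was
removed in Python 3.11), so the same check could be sketched with
inspect.signature() instead. This is an illustrative rewrite, not lxml's
actual implementation; the function name takes_nsmap() and the two target
classes are made up for the example.

```python
import inspect


def takes_nsmap(start):
    """Guess whether a bound start() method accepts a third (nsmap) argument."""
    try:
        params = list(inspect.signature(start).parameters.values())
    except (TypeError, ValueError):
        return False  # not introspectable; assume the two-argument ET form
    if any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params):
        return True  # '*args' accepts any number of positional arguments
    positional = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    return len(positional) >= 3  # tag, attrs, nsmap ('self' is already bound)


class PlainTarget:  # hypothetical ET-style target
    def start(self, tag, attrs):
        pass


class NsmapTarget:  # hypothetical lxml-style target
    def start(self, tag, attrs, nsmap=None):
        pass


print(takes_nsmap(PlainTarget().start), takes_nsmap(NsmapTarget().start))
```

Unlike getargspec(), inspect.signature() leaves out the bound "self", which is
why the threshold here is three parameters rather than four.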
Hope this is useful,
Stefan
From stefan_ml at behnel.de Sat Nov 24 12:51:34 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 24 Nov 2007 12:51:34 +0100
Subject: [lxml-dev] lxml 2.0alpha5 released
Message-ID: <47481046.70907@behnel.de>
Hi all,
lxml 2.0alpha5 made it to PyPI. This is (hopefully) the last alpha in the
pre-2.0 series, so please report any remaining API quirks, weirdnesses and
bugs now to make sure they get fixed before 2.0 gets its API freeze during the
beta cycle. If all works out well, there should not be more than one beta
release before the final version.
This release features a major overhaul of the target parser, including an
internal SAX parser framework and an ET compatible TreeBuilder implementation.
The complete Changelog follows below.
Note that the API now enforces keyword-only arguments in a couple of places.
This can require some syntactic changes in existing code.
Have fun,
Stefan
2.0alpha5 (2007-11-24)
Features added
* Rich comparison of element.attrib proxies.
* ElementTree compatible TreeBuilder class.
* Use default prefixes for some common XML namespaces.
* lxml.html.clean.Cleaner now allows for a host_whitelist, and two
overridable methods: allow_embedded_url(el, url) and the more general
allow_element(el).
* Extended slicing of Elements as in element[1:-1:2], both in etree and in
objectify
* Resolvers can now provide a base_url keyword argument when resolving a
document as string data.
* When using lxml.doctestcompare you can give the doctest option
NOPARSE_MARKUP (like # doctest: +NOPARSE_MARKUP) to suppress the special
checking for one test.
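As a small illustration of the extended slicing entry above, here is a sketch
with a made-up five-child tree:

```python
from lxml import etree

root = etree.Element("root")
for name in ("a", "b", "c", "d", "e"):
    etree.SubElement(root, name)

# Every second child, starting at index 1 and stopping before the last one.
selected = [el.tag for el in root[1:-1:2]]
print(selected)  # ['b', 'd']
```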
Bugs fixed
* Target parser failed to report comments.
* In the lxml.html iter_links() method, links in