[lxml-dev] Python XML Validator
Stefan Behnel
stefan_ml at behnel.de
Wed Mar 12 18:55:20 CET 2008
Hi,
moving this here from python-dev (where it started for whatever reason...)
Mike Meyer wrote:
> On Tue, 11 Mar 2008 18:01:29 +0100 Stefan Behnel wrote:
>> BTW, we had MacOS builds a while ago, so I wouldn't mind having someone
>> volunteer to contribute builds on a regular basis (static builds preferred).
>
> For which Python build? python.org? ActiveState? Leopard? MacPorts?
> Fink? pkgsrc? Any idea if a single build will work for all of them?
I have no idea. At the very least, different Python major versions will pose a
problem. And I guess the builds provided by package distributions like Fink
and MacPorts will also depend on newer versions of other libraries, or be built
with newer compilers...
>>> The second time for OS-X, I used an older version of lxml (1.3.6), and
>>> just did "setup.py install". Worked like a charm. That's not hard.
>> Interesting. 1.3.6 should also require libxml2 2.6.20 (although maybe less
>> strictly than 2.0).
>
> I just grabbed it and tried parsing things with it; I didn't try the
> advanced features that I depend on lxml for (rng validation and lots
> of xpath), or what the OP was looking for (validation). Running the
> test.py suite turns up one failure:
>
> File "/Users/mwm/lxml-1.3.6/src/lxml/tests/../../../doc/parsing.txt", line 369, in parsing.txt
> Failed example:
> etree.tounicode(root)
> Expected:
> u'<test> \uf8d1 + \uf8d2 </test>'
> Got:
> u'<test>  +  </test>'
If that's the only problem, then 1.3.x works 'acceptably' with 2.6.16, except
that newer versions are much better at parsing HTML and validating with XML
Schema (amongst other things). Note that the test suite tends to avoid testing
features that only depend on libxml2, and especially stuff that has changed
between library versions. It's a test suite for lxml, not for libxml2.
However, 2.0 will not work that easily. Things like parse-time schema
validation and Schematron support do not work on versions below 2.6.20 (or
actually 2.6.21, but we disable Schematron on 2.6.20). We might be able to
work around some more stuff by sprinkling in some #ifdefs and #defines, but so
far, I find it perfectly acceptable if 2.0 requires newer dependencies for new
features. People who care about reliability will not use libraries as old as
2.6.16 anyway. The list of fixed bugs only gets longer with newer versions.
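As a rough illustration of that cut-off, a setup script or application could gate the optional features on the libxml2 version tuple. This is only a minimal sketch: the thresholds come from the paragraph above (2.6.20 for parse-time schema validation, 2.6.21 for Schematron), while FEATURE_MINIMUMS and available_features are hypothetical names, not lxml API. The __main__ part assumes lxml reports the libxml2 version it runs against as lxml.etree.LIBXML_VERSION, and falls back to 2.6.16 if lxml is not installed.

```python
# Minimum libxml2 versions as stated in the discussion above:
# parse-time XML Schema validation needs 2.6.20, and Schematron is
# only enabled from 2.6.21 on (it is disabled on 2.6.20).
# The dict and function names are hypothetical, not part of lxml.
FEATURE_MINIMUMS = {
    "parse_time_schema_validation": (2, 6, 20),
    "schematron": (2, 6, 21),
}


def available_features(libxml_version):
    """Return the sorted names of features usable with the given
    libxml2 version tuple, e.g. (2, 6, 16)."""
    return sorted(
        name
        for name, minimum in FEATURE_MINIMUMS.items()
        # Tuples compare element by element, so (2, 6, 16) < (2, 6, 20).
        if tuple(libxml_version) >= minimum
    )


if __name__ == "__main__":
    try:
        # Assumption: lxml.etree.LIBXML_VERSION holds the runtime
        # libxml2 version as a tuple of ints.
        from lxml import etree
        version = etree.LIBXML_VERSION
    except ImportError:
        version = (2, 6, 16)  # the old system library discussed above
    print(version, available_features(version))
```

With 2.6.16 the sketch reports no optional features at all, which is the point made above: the new 2.0 features simply need a newer library.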
>>>>> Which means you wind up having to
>>>>> build those yourself if you want a recent version of lxml, even if
>>>>> you're using a system that includes lxml in its package system.
>>>> If you want a clean system, e.g. for production use, buildout has proven to be
>>>> a good idea. And we also provide pretty good instructions on our web page on
>>>> how to install lxml on MacOS-X and what to take care of.
>>> Yes, but the proposal was to include it in the Python standard
>>> library. Software that doesn't work on popular target platforms
>>> without updating a standard system library isn't really suitable for
>>> that.
>> Hmm, coming somewhat back on-topic: how does Python currently handle its
>> dependencies under MacOS-X? SQLite, for example? Does it use system libraries
>> only, or are there libraries it ships with? (The MacOS distro is much bigger,
>> but that might be due to the universal build - although that suggests that
>> MacOS-X users do not care about disk space or download size anyway)
>
> For most of them, it checks for the existence of the libraries and
> header files for those packages, and then builds the wrapper libraries
> if it finds their requirements. Look through the 2.5.2 setup.py for
> how sqlite3 is handled (it's a bit much to include here).
Funny, looking for the sqlite setup was actually a good idea. It does all
sorts of things to figure out a good one to use, specifically on MacOS-X.
There even appears to be some trickery to take the first library it finds,
static or dynamic, instead of continuing to look for a dylib.
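A minimal sketch of that "first hit wins" idea might look like the following. It is not taken from Python's setup.py or from lxml's setupinfo.py: the prefix list (/opt/local for MacPorts, /sw for Fink) and the name find_first_library are assumptions for illustration. It also deliberately checks the static archive before the dylib within each directory, so that a static build is preferred when both exist.

```python
import os

# Assumed search order for this sketch: /opt/local (MacPorts),
# /sw (Fink), then the usual Unix locations.
CANDIDATE_PREFIXES = ("/opt/local", "/sw", "/usr/local", "/usr")


def find_first_library(name, prefixes=CANDIDATE_PREFIXES):
    """Return the path of the first lib<name>.a or lib<name>.dylib found.

    Prefixes are searched in order. Within each <prefix>/lib directory
    the static archive is checked before the shared library, and the
    search stops at the first hit instead of continuing to look for a
    dylib further down the list. Returns None if nothing is found.
    """
    for prefix in prefixes:
        libdir = os.path.join(prefix, "lib")
        for filename in ("lib%s.a" % name, "lib%s.dylib" % name):
            path = os.path.join(libdir, filename)
            if os.path.isfile(path):
                return path
    return None
```

For example, find_first_library("xml2") would return /opt/local/lib/libxml2.a on a machine where MacPorts installed a static libxml2, even if /usr/lib/libxml2.dylib also exists.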
I wouldn't mind adding a similar setup to lxml's setupinfo.py. Maybe someone
could lend a hand with this? It would be great to have an automatic static build
on MacOS, so that people could just run setup.py and be sure that lxml keeps
using the expected libs afterwards.
Is there a standard directory prefix where MacPorts & Co. install libraries and
related tools like xslt-config?
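Once a prefix is known, the xslt-config script installed there can report the linker flags to use. The sketch below treats the MacPorts and Fink prefixes as assumptions and relies only on the standard --libs option of xslt-config; query_xslt_config and split_link_flags are made-up helper names.

```python
import os
import shlex
import subprocess


def split_link_flags(output):
    """Split linker flags such as '-L/opt/local/lib -lxslt -lxml2'
    into (library_dirs, libraries); other flags are ignored."""
    library_dirs, libraries = [], []
    for token in shlex.split(output):
        if token.startswith("-L"):
            library_dirs.append(token[2:])
        elif token.startswith("-l"):
            libraries.append(token[2:])
    return library_dirs, libraries


def query_xslt_config(prefixes=("/opt/local", "/sw")):
    """Run the first executable <prefix>/bin/xslt-config with --libs.

    The default prefixes are assumed MacPorts and Fink locations.
    Returns (library_dirs, libraries), or None if no script was found.
    """
    for prefix in prefixes:
        script = os.path.join(prefix, "bin", "xslt-config")
        if os.access(script, os.X_OK):
            output = subprocess.check_output([script, "--libs"])
            return split_link_flags(output.decode())
    return None
```

The resulting lists map directly onto the library_dirs and libraries arguments of a distutils Extension.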
Stefan