[lxml-dev] build & performance issues with 2.0beta2
Stefan Behnel
stefan_ml at behnel.de
Thu Jan 31 17:31:33 CET 2008
Hi Holger,
thanks for the report.
jholg at gmx.de wrote:
> but I'm having some problems with 2.0beta2:
>
> First of all, it does not build any more using gcc 2.95.2. Yes, I know,
> mighty old compiler... then again, Cython produces C code, not funky C++
> stuff (2.0alpha-r47832 still built without problems). This is the error I get:
>
> gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/apps/prod//include -I/apps/prod//include/libxml2 -I/apps/prod/include/libxml2 -I/apps/prod/include -I/apps/pydev/hjoukl/include/python2.4 -c src/lxml/lxml.etree.c -o build/temp.solaris-2.8-sun4u-2.4/src/lxml/lxml.etree.o -w
> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsLongLong':
> src/lxml/lxml.etree.c:110165: parse error before `long'
> src/lxml/lxml.etree.c:110167: `val' undeclared (first use in this function)
> src/lxml/lxml.etree.c:110167: (Each undeclared identifier is reported only once
> src/lxml/lxml.etree.c:110167: for each function it appears in.)
> src/lxml/lxml.etree.c: In function `__pyx_PyInt_AsUnsignedLongLong':
> src/lxml/lxml.etree.c:110185: parse error before `long'
> src/lxml/lxml.etree.c:110187: `val' undeclared (first use in this function)
> error: command 'gcc' failed with exit status 1
Hmm, I guess that's something to fix in Cython. The *LongLong() functions are a
recent addition for safe type conversion.
The line numbers above differ from mine, though. Could you send me the source
code of the lines that failed here?
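
To illustrate what "safe type conversion" means here: the value is range-checked before it is narrowed to a C "long long", so an out-of-range Python integer raises an error instead of being silently truncated. The following is only a sketch of that idea in plain Python using ctypes. It is not the C code that Cython generates, and the names as_long_long, LLONG_MIN and LLONG_MAX are made up for this example.

```python
import ctypes

# Width of the platform's C "long long", taken from ctypes.
_BITS = 8 * ctypes.sizeof(ctypes.c_longlong)
LLONG_MIN = -(1 << (_BITS - 1))      # hypothetical name, mirrors C's LLONG_MIN
LLONG_MAX = (1 << (_BITS - 1)) - 1   # hypothetical name, mirrors C's LLONG_MAX


def as_long_long(value):
    """Return value unchanged if it fits into a C long long.

    Hypothetical helper: raises instead of truncating, which is the
    point of a "safe" conversion.
    """
    if not isinstance(value, int):
        raise TypeError("an integer is required, got %r" % type(value).__name__)
    if not LLONG_MIN <= value <= LLONG_MAX:
        raise OverflowError("value does not fit into a C long long")
    # The round trip through ctypes cannot wrap here, since the range
    # was checked above.
    return ctypes.c_longlong(value).value
```

A call such as as_long_long(1 << _BITS) raises OverflowError, whereas an unchecked ctypes.c_longlong(1 << _BITS) would silently wrap around.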
> Now, when I switch to use gcc 3.4.4 I can build successfully, but:
>
> 0 lb54320 at adevp02 .../lxml-2.0beta2 $ LD_LIBRARY_PATH=/apps/prod/gcc/3.4.4/lib python2.4 test.py -p -v '' '!test_schematron_invalid*'
> /data/pydev/DOWNLOADS/LXML/lxml/versions/SVN_CHECKOUTS/TAGS/lxml-2.0beta2/src/lxml/html/__init__.py:22: UserWarning: This version of libxml2 has a known XPath bug. Use it at your own risk.
> _rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]")
>
> TESTED VERSION: 2.0.beta2-51091
> Python: (2, 4, 4, 'final', 0)
> lxml.etree: (2, 0, -98, 51091)
> libxml used: (2, 6, 27)
> libxml compiled: (2, 6, 27)
> libxslt used: (1, 1, 20)
> libxslt compiled: (1, 1, 20)
>
> 855/855 (100.0%): Doctest: xpathxslt.txt
> ======================================================================
> FAIL: Doctest: validation.txt
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/apps/prod//lib/python2.4/unittest.py", line 260, in run
> testMethod()
> File "/apps/prod//lib/python2.4/doctest.py", line 2157, in runTest
> raise self.failureException(self.format_failure(new.getvalue()))
> AssertionError: Failed doctest test for validation.txt
> File "/data/pydev/DOWNLOADS/LXML/lxml/versions/SVN_CHECKOUTS/TAGS/lxml-2.0beta2/src/lxml/tests/../../../doc/validation.txt", line 0
>
> ----------------------------------------------------------------------
> File "/data/pydev/DOWNLOADS/LXML/lxml/versions/SVN_CHECKOUTS/TAGS/lxml-2.0beta2/src/lxml/tests/../../../doc/validation.txt", line 113, in validation.txt
> Failed example:
> dtd = etree.DTD(external_id = docbook) # requires catalog support
> Exception raised:
> Traceback (most recent call last):
> File "/apps/prod//lib/python2.4/doctest.py", line 1248, in __run
> compileflags, 1) in test.globs
> File "<doctest validation.txt[16]>", line 1, in ?
> dtd = etree.DTD(external_id = docbook) # requires catalog support
> File "dtd.pxi", line 50, in lxml.etree.DTD.__init__
> DTDParseError: failed to load external entity "-//OASIS//DTD DocBook XML V4.2//EN"
[...]
Your setup seems to lack catalog support. I was undecided about whether to add
that test at all. Looks like it's better to leave it out.
> ----------------------------------------------------------------------
> Ran 855 tests in 37.860s
>
> Compared to 2.0alpha (I rebuilt that also with gcc 3.4.4):
>
> ----------------------------------------------------------------------
> Ran 824 tests in 2.698s
>
> So basically performance drops by a factor of >10 for me, on a Sparc Solaris 8 box, python2.4, gcc 3.4.4.
> I haven't yet looked into the failing tests.
I wouldn't dare to compare the numbers here, given a difference of about 30
tests (especially not knowing which ones are missing). If you get errors, it
naturally takes (a bit) longer. Also, it seems to run far fewer tests, so I
guess you either do not have ElementTree installed for the compat tests
(though I actually think that's the case for both runs), or it just takes
longer to search the (non-existent) catalogs, or ...
If you want real numbers, you should run the benchmarks instead.
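
For a quick sanity check that is independent of the test runner, a single operation can be timed in isolation under both builds. The following is a minimal sketch using timeit, not the lxml benchmark suite. The document size, the repeat counts and the helper names build_document and best_time are arbitrary choices for this example, and it falls back to the standard library's ElementTree if lxml cannot be imported.

```python
import timeit

try:
    from lxml import etree  # the library under discussion
except ImportError:
    # Fallback so the sketch still runs without lxml; the calls used
    # below exist in both APIs.
    import xml.etree.ElementTree as etree


def build_document(n_children=1000):
    """Serialise a small synthetic tree (the size is an arbitrary choice)."""
    root = etree.Element("root")
    for i in range(n_children):
        child = etree.SubElement(root, "item", id=str(i))
        child.text = "value %d" % i
    return etree.tostring(root)


def best_time(func, repeat=3, number=5):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


XML_DATA = build_document()

# Time parsing only, so that test discovery, failing tests and catalog
# lookups do not distort the result.
parse_time = best_time(lambda: etree.fromstring(XML_DATA))
print("parse: %.6f s per call" % parse_time)
```

Running the same script against both builds gives numbers that can actually be compared, which the total runtime of two different test suites does not.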
> Remarks: I currently disable the schematron tests because some of them dump core with my setup.
Hmm, now that Cython supports compile-time conditional compilation, maybe we
should use that in a couple of places...
Stefan