How to build lxml from source
=============================

To build lxml from source, you need libxml2 and libxslt properly installed,
*including the header files*.  These are likely shipped in separate ``-dev``
or ``-devel`` packages like ``libxml2-dev``, which you need to install.  The
build process also requires setuptools_.  The lxml source distribution comes
with a script called ``ez_setup.py`` that can be used to install them.

.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools

.. contents::
..
   1  Pyrex
   2  Subversion
   3  Setuptools
   4  Running the tests and reporting errors
   5  Contributing an egg
   6  Static linking on Windows
   7  Building Debian packages from SVN sources


Pyrex
-----

The lxml.etree and lxml.objectify modules are written in Pyrex_.  Since we
distribute the Pyrex-generated .c files with lxml releases, however, you do
not need Pyrex to build lxml from the normal release sources.

.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

If you are interested in building lxml from a Subversion checkout or want to
be an lxml developer, you do need a working Pyrex installation.

* lxml 1.1 and later

  Newer versions of lxml depend on features and bug fixes that are not yet
  available in an official Pyrex release.  This includes support for the
  external C-API of lxml.etree, for Python 2.5 and for 64 bit architectures.
  To build lxml 1.1 and later from non-release or modified sources, you must
  therefore use an updated Pyrex version from here:

  http://codespeak.net/svn/lxml/pyrex/

  A subversion checkout of lxml will automatically retrieve the latest Pyrex
  as external project source (``svn:externals``).  Look for the ``Pyrex``
  directory in the source tree.

  Since version 1.1.2, the lxml source distribution also includes this Pyrex
  version.  It will be used if the ``Pyrex`` directory is available in the
  lxml root directory.
  If you install from SVN or delete this directory from the unpacked
  distribution directory, the normally installed Pyrex version will be used.

* lxml 1.0 and earlier

  The 1.0 series builds with a standard installation of Pyrex 0.9.4.1.

Note that Pyrex up to and including version 0.9.4 has known problems when
compiling lxml with gcc 4.x or Python 2.4.  Do not use it.  If you want to
build lxml from non-release sources, please install Pyrex version 0.9.4.1 or
later.

Pyrex now supports EasyInstall_, so you can install it by running the
following command as super-user::

    easy_install Pyrex

.. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall


Subversion
----------

The lxml package is developed in a Subversion repository.  You can retrieve
the current developer version by calling::

    svn co http://codespeak.net/svn/lxml/trunk lxml

This will create a directory ``lxml`` and download the source into it.  You
can also `browse the repository through the web`_ or use your favourite SVN
client to access it.

.. _`browse the repository through the web`: http://codespeak.net/svn/lxml


Setuptools
----------

Usually, building lxml is done through setuptools.  Do a Subversion checkout
(or download the source tar-ball and unpack it) and then type::

    python setup.py build

or::

    python setup.py bdist_egg

If you want to test lxml from the source directory, it is better to build it
in-place like this::

    python setup.py build_ext -i

or, in Unix-like environments::

    make

If you get errors about missing header files (e.g. ``libxml/xmlversion.h``)
then you need to make sure the development packages of both libxml2 and
libxslt are properly installed.
If this doesn't help, you may have to add the location of the header files
to the include path like::

    python setup.py build_ext -i -I /usr/include/libxml2

where the file is in ``/usr/include/libxml2/libxml/xmlversion.h``

To use lxml.etree in-place, you can place lxml's ``src`` directory on your
Python module search path (PYTHONPATH) and then import ``lxml.etree`` to play
with it::

    # cd lxml
    # PYTHONPATH=src python
    Python 2.5.1
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    >>>

To recompile after changes, note that you may have to run ``make clean`` or
delete the file ``src/lxml/etree.c``.  Distutils do not automatically pick up
changes that affect files other than the main file ``src/lxml/etree.pyx``.


Running the tests and reporting errors
--------------------------------------

The source distribution (tgz) and the Subversion repository contain a test
suite for lxml.  You can run it from the top-level directory::

    python test.py

Note that the test script only tests the in-place build (see the in-place
build instructions above), as it searches the ``src`` directory.  You can use
the following one-step command to trigger an in-place build and test it::

    make test

This also runs the ElementTree and cElementTree compatibility tests.  To call
them separately, make sure you have lxml on your PYTHONPATH first, then run::

    python selftest.py

and::

    python selftest2.py

If the tests give failures, errors, or worse, segmentation faults, we'd
really like to know.  Please contact us on the `mailing list`_, and please
specify the version of lxml, libxml2, libxslt and Python you were using, as
well as your operating system type (Linux, Windows, MacOS, ...).

.. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev


Contributing an egg
-------------------

This is the procedure to make an lxml egg for your platform:

* Download the lxml-x.y.tar.gz release.  This contains the pregenerated C so
  that you don't run into any Pyrex issues.
  Unpack it and cd into it.

* ``python setup.py build``

* If you're on a unixy platform, cd into ``build/lib.your.platform`` and
  strip any ``.so`` file you find there.  This reduces the size of the egg
  considerably.

* ``python setup.py bdist_egg upload``

The last 'upload' step only works if you have access to the lxml cheeseshop
entry.  If not, you can just make an egg with ``bdist_egg`` and mail it to
the lxml maintainer.


Providing newer library versions on Mac-OS X
--------------------------------------------

The Unix environment in Mac-OS X makes it relatively easy to install
Unix/Linux style package management tools and new software.  However, it
seems to be hard to make the system exclusively use a newer build of a
library that Mac-OS X itself ships in an older version.  The result can be
segfaults on this platform that are hard to track down.

To make sure the newer libxml2 and libxslt versions are used (e.g. under
fink), you should add the directory where you installed the libraries to the
``DYLD_LIBRARY_PATH`` environment variable.  This seems to fix a lot of
problems for users.

Alternatively, you can build lxml statically.  A way to do this on MS Windows
is described in the next section, but it should be easy to adapt it for
Mac-OS.  That way, you can always be sure you use the versions you compiled
lxml with, regardless of the runtime environment.


Static linking on Windows
-------------------------

Most operating systems have proper package management that makes installing
current versions of libxml2 and libxslt easy.  The most famous exception is
Microsoft Windows, which entirely lacks these capabilities.  It can therefore
be interesting to statically link the external libraries into lxml.etree to
avoid having to install them separately.

Download lxml and all required libraries to the same directory.  The iconv,
libxml2, libxslt, and zlib libraries are all available from the ftp site
ftp://ftp.zlatkovic.com/pub/libxml/.
Your directory should now have the following files in it (although most
likely different versions)::

    iconv-1.9.1.win32.zip
    libxml2-2.6.23.win32.zip
    libxslt-1.1.15.win32.zip
    lxml-1.0.0.tgz
    zlib-1.2.3.win32.zip

Now extract each of those files in the *same* directory.  This should give
you something like this::

    iconv-1.9.1.win32/
    iconv-1.9.1.win32.zip
    libxml2-2.6.23.win32/
    libxml2-2.6.23.win32.zip
    libxslt-1.1.15.win32/
    libxslt-1.1.15.win32.zip
    lxml-1.0.0/
    lxml-1.0.0.tgz
    zlib-1.2.3.win32/
    zlib-1.2.3.win32.zip

Go to the lxml directory and edit the file ``setup.py``.  There should be a
section near the top that looks like this::

    STATIC_INCLUDE_DIRS = []
    STATIC_LIBRARY_DIRS = []
    STATIC_CFLAGS = []

Change this section to something like this, but take care to use the correct
version numbers::

    STATIC_INCLUDE_DIRS = [
           "..\\libxml2-2.6.23.win32\\include",
           "..\\libxslt-1.1.15.win32\\include",
           "..\\zlib-1.2.3.win32\\include",
           "..\\iconv-1.9.1.win32\\include"
           ]

    STATIC_LIBRARY_DIRS = [
           "..\\libxml2-2.6.23.win32\\lib",
           "..\\libxslt-1.1.15.win32\\lib",
           "..\\zlib-1.2.3.win32\\lib",
           "..\\iconv-1.9.1.win32\\lib"
           ]

    STATIC_CFLAGS = []

Add any CFLAGS you might consider useful to the third list.  As `Ashish
Kulkarni`_ notes, you might have to add the standard Windows library
``wsock32.dll`` to the list of libraries to make ``lxml.objectify`` compile.

.. _`Ashish Kulkarni`: http://codespeak.net/pipermail/lxml-dev/2006-September/001893.html

Now you should be able to pass the ``--static`` option to setup.py and
everything should work well.  Try calling::

    python setup.py bdist_wininst --static

This will create a Windows installer in the ``pkg`` directory.


Building Debian packages from SVN sources
-----------------------------------------

`Andreas Pakulat`_ proposed the following approach.

.. _`Andreas Pakulat`: http://codespeak.net/pipermail/lxml-dev/2006-May/001254.html

* ``apt-get source lxml``
* remove the unpacked directory
* tar.gz the lxml SVN version and replace the orig.tar.gz that lies in the
  directory
* check md5sum of created tar.gz file and place new sum and size in dsc file
* do ``dpkg-source -x lxml-[VERSION].dsc`` and cd into the newly created
  directory
* run ``dch -i`` and add a comment like "use trunk version"; this will
  increase the debian version number so apt/dpkg won't get confused
* run ``dpkg-buildpackage -rfakeroot -us -uc`` to build the package

In case ``dpkg-buildpackage`` tells you that some dependencies are missing,
you can either install them manually or run ``apt-get build-dep lxml``.

A successful build leaves .deb packages in the parent directory, which can be
installed using ``dpkg -i``.
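
The checksum step in the list above can also be scripted.  The following is
only a minimal sketch, not part of lxml or of the Debian tools: the function
name ``dsc_files_entry`` and the tarball name in the usage note are
hypothetical.  It assumes the usual layout of the ``Files:`` field in a
``.dsc`` file, where each entry is a line with a leading space followed by
the MD5 sum, the size in bytes and the file name::

    import hashlib
    import os


    def dsc_files_entry(tarball_path, chunk_size=64 * 1024):
        """Return a ' <md5> <size> <name>' line for the given tarball.

        Hypothetical helper: the output format is an assumption about
        the Files: field of a .dsc file, so compare it with the entry
        that is already present before replacing anything.
        """
        md5 = hashlib.md5()
        # Read in chunks so that large tarballs are not loaded at once.
        with open(tarball_path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                md5.update(chunk)
        size = os.path.getsize(tarball_path)
        name = os.path.basename(tarball_path)
        return " %s %d %s" % (md5.hexdigest(), size, name)

Calling ``dsc_files_entry("lxml_1.1.orig.tar.gz")`` (a made-up file name)
returns the line to compare against, or paste over, the existing entry in the
dsc file.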