[Lxml-checkins] r51576 - lxml/branch/lxml-2.0/doc

scoder at codespeak.net scoder at codespeak.net
Mon Feb 18 11:19:44 CET 2008


Author: scoder
Date: Mon Feb 18 11:19:44 2008
New Revision: 51576

Modified:
   lxml/branch/lxml-2.0/doc/lxml-source-howto.txt
Log:
trunk merge

Modified: lxml/branch/lxml-2.0/doc/lxml-source-howto.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/lxml-source-howto.txt	(original)
+++ lxml/branch/lxml-2.0/doc/lxml-source-howto.txt	Mon Feb 18 11:19:44 2008
@@ -16,9 +16,19 @@
 .. _lxml: http://codespeak.net/lxml
 .. _`how to build lxml from sources`: build.html
 .. _`ReStructured Text`: http://docutils.sourceforge.net/rst.html
+.. _epydoc: http://epydoc.sourceforge.net/
+.. _docutils: http://docutils.sourceforge.net/
+.. _`C-level API`: capi.html
 
 .. contents::
 ..
+   1  What is Cython?
+   2  Where to start?
+     2.1  Concepts
+     2.2  The documentation
+   3  lxml.etree
+   4  lxml.objectify
+   5  lxml.html
 
 
 What is Cython?
@@ -29,21 +39,23 @@
 
 Cython_ is the language that lxml is written in.  It is a very
 Python-like language that was specifically designed for writing Python
-extension modules.  The language is so close to Python that the Cython
-compiler can actually compile many, many Python programs to C without
-major modifications.  But the real speed gains of a C compilation come
-from type annotations that were added to the language and that allow
-Cython to generate very efficient C code.
+extension modules.
 
 The reason why Cython (or actually its predecessor Pyrex_ at the time)
 was chosen as an implementation language for lxml, is that it makes it
 very easy to interface with both the Python world and external C code.
 Cython generates all the necessary glue code for the Python API,
-including Python types and reference counting for Python objects.
-Calling into C code is not more than declaring the signature of the
-function and maybe some variables as being C types, pointers or
-structs, and then calling it.  The rest of the code is just plain
-Python code.
+including Python types, calling conventions and reference counting.
+On the other side of the table, calling into C code is no more than
+declaring the signature of the function and maybe some variables as
+being C types, pointers or structs, and then calling it.  The rest of
+the code is just plain Python code.
+
+The Cython language is so close to Python that the Cython compiler can
+actually compile many, many Python programs to C without major
+modifications.  But the real speed gains of a C compilation come from
+type annotations that were added to the language and that allow Cython
+to generate very efficient C code.
 
 Even if you are not familiar with Cython, you should keep in mind that
 a slow implementation of a feature is better than none.  So, if you
@@ -56,7 +68,7 @@
 Where to start?
 ===============
 
-First of all, read `how to build lxml from sources` to learn how to
+First of all, read `how to build lxml from sources`_ to learn how to
 retrieve the source code from the Subversion repository and how to
 build it.  The source code lives in the subdirectory ``src`` of the
 checkout.
@@ -65,21 +77,12 @@
 ``lxml.objectify``.  All main modules have the file extension
 ``.pyx``, which shows the descent from Pyrex.  As usual in Python,
 the main files start with a short description and a couple of imports.
-Cython destinguishes between the run-time ``import`` statement (as
+Cython distinguishes between the run-time ``import`` statement (as
 known from Python) and the compile-time ``cimport`` statement, which
 imports C declarations, either from external libraries or from other
 Cython modules.
 
 
-The documentation
------------------
-
-* docs in ``doc`` directory
-* `ReStructured Text`_ format
-* generated through ``mkhtml.py`` script
-* ...
-
-
 Concepts
 --------
 
@@ -88,6 +91,38 @@
 * ...
 
 
+The documentation
+-----------------
+
+An important part of lxml is the documentation that lives in the
+``doc`` directory.  It describes a large part of the API and comprises
+a lot of example code in the form of doctests.
+
+The documentation is written in the `ReStructured Text`_ format, a
+very powerful text markup language that looks almost like plain text.
+It is part of the docutils_ package.
+
+The project web site of lxml_ is completely generated from these text
+documents.  Even the side menu is just collected from the table of
+contents that the ReST processor writes into each HTML page.
+Obviously, we use lxml for this.
+
+The easiest way to generate the HTML pages is by calling::
+
+    make html
+
+This will call the script ``doc/mkhtml.py`` to run the ReST processor
+on the files.  After generating an HTML page the script parses it back
+in to build the side menu, and injects the complete menu into each
+page at the very end.
+
+Running the ``make`` command will also generate the API documentation
+if you have epydoc_ installed.  The epydoc package will import and
+introspect the extension modules and also introspect and parse the
+Python modules of lxml.  The aggregated information will then be
+written out into an HTML documentation site.
+
+
 lxml.etree
 ==========
 
@@ -104,14 +139,7 @@
 
 The main include files are:
 
-proxy.pxi:
-
-    Very low-level functions for memory allocation/deallocation
-    and Element proxy handling.  Ignoring this for the beginning
-    will keep your head from exploding.
-
-apihelpers.pxi:
-
+apihelpers.pxi
     Private C helper functions.  Most of the little functions that are
     used all over the place are defined here.  This includes things
     like reading out the text content of a libxml2 tree node, checking
@@ -120,77 +148,112 @@
     should keep these functions in the back of your head, as they will
     definitely make your life easier.
 
-xmlerror.pxi:
-
-    Error log handling.  All error messages that libxml2 generates
-    internally walk through the code in this file to end up in lxml's
-    Python level error logs.
-
-    At the end of the file, you will find a long list of named error
-    codes.  It is generated from the libxml2 HTML documentation (using
-    lxml, of course).  See the script ``update-error-constants.py``
-    for this.
-
-classlookup.pxi:
-
+classlookup.pxi
     Element class lookup mechanisms.  The main API and engines for
     those who want to define custom Element classes and inject them
     into lxml.
 
-nsclasses.pxi:
+docloader.pxi
+    Support for custom document loaders.  Base class and registry for
+    custom document resolvers.
+
+extensions.pxi
+    Infrastructure for extension functions in XPath/XSLT, including
+    XPath value conversion and function registration.
+
+iterparse.pxi
+    Incremental XML parsing.  An iterator class that builds iterparse
+    events while parsing.
 
+nsclasses.pxi
     Namespace implementation and registry.  The registry and engine
     for Element classes that use the ElementNamespaceClassLookup
     scheme.
 
-docloader.pxi:
-
-    Support for custom document loaders.  Base class and registry for
-    custom document resolvers.
-
-parser.pxi:
-
+parser.pxi
     Parsers for XML and HTML.  This is the main parser engine.  It's
     the reason why you can parse a document from various sources in
     two lines of Python code.  It's definitely not the right place to
     start reading lxml's source code.
 
-parsertarget.pxi:
+parsertarget.pxi
+    An ElementTree compatible parser target implementation based on
+    the SAX2 interface of libxml2.
 
-    ET Parser target.
+proxy.pxi
+    Very low-level functions for memory allocation/deallocation
+    and Element proxy handling.  Ignoring this for the beginning
+    will save your head from exploding.
 
-serializer.pxi:
+public-api.pxi
+    The set of C functions that are exported to other extension
+    modules at the C level.  For example, ``lxml.objectify`` makes use
+    of these.  See the `C-level API`_ documentation.
 
+serializer.pxi
     XML output functions.  Basically everything that creates byte
     sequences from XML trees.
 
-iterparse.pxi:
+xinclude.pxi
+    XInclude implementation.
 
-    Incremental XML parsing.  An iterator class that builds iterparse
-    events while parsing.
+xmlerror.pxi
+    Error log handling.  All error messages that libxml2 generates
+    internally walk through the code in this file to end up in lxml's
+    Python level error logs.
 
-xmlid.pxi:
+    At the end of the file, you will find a long list of named error
+    codes.  It is generated from the libxml2 HTML documentation (using
+    lxml, of course).  See the script ``update-error-constants.py``
+    for this.
 
+xmlid.pxi
     XMLID and IDDict, a dictionary-like way to find Elements by their
     XML-ID attribute.
 
-xinclude.pxi:
+xpath.pxi
+    XPath evaluators.
 
-    XInclude implementation.
+xslt.pxi
+    XSL transformations, including the ``XSLT`` class, document lookup
+    handling and access control.
 
-extensions.pxi:
+The different schema languages (DTD, RelaxNG, XML Schema and
+Schematron) are implemented in the following include files:
 
-    Infrastructure for extension functions in XPath/XSLT, including
-    XPath value conversion and function registration.
+* dtd.pxi
+* relaxng.pxi
+* schematron.pxi
+* xmlschema.pxi
 
-xpath.pxi:
 
-    XPath evaluators.
+Python modules
+==============
 
-xslt.pxi:
+The ``lxml`` package also contains a number of pure Python modules:
 
-    XSL transformations, including the ``XSLT`` class, document lookup
-    handling and access control.
+builder.py
+    The E-factory and the ElementBuilder class.  These provide a
+    simple interface to XML tree generation.
+
+cssselect.py
+    A CSS selector implementation based on XPath.  The main class is
+    called ``CSSSelector``.
+
+doctestcompare.py
+    ...
+
+ElementInclude.py
+    ...
+
+_elementpath.py
+    ...
+
+sax.py
+    ...
+
+usedoctest.py
+    ...
 
 
 lxml.objectify

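The documentation section added in this revision says that ``doc/mkhtml.py`` parses each generated HTML page back in, builds the side menu from it, and injects the complete menu at the very end of each page.  The script itself is not part of this mail, so the following is only a minimal sketch of that idea.  The page contents, the use of the first ``h1`` as the menu title, and the names ``PAGES``, ``build_menu`` and ``inject_menu`` are all assumptions made for illustration.

```python
import copy

try:
    from lxml import etree  # what the howto describes
except ImportError:
    import xml.etree.ElementTree as etree  # API-compatible stdlib fallback

# Hypothetical stand-ins for two generated, well-formed HTML pages.
PAGES = {
    "index.html": "<html><body><h1>lxml</h1><p>Intro</p></body></html>",
    "build.html": "<html><body><h1>How to build lxml</h1></body></html>",
}


def build_menu(pages):
    """Parse each page back in and collect one menu entry per page."""
    menu = etree.Element("ul")
    menu.set("class", "menu")
    for filename, source in pages.items():
        root = etree.fromstring(source)
        heading = root.find(".//h1")  # assumption: first h1 is the title
        item = etree.SubElement(menu, "li")
        link = etree.SubElement(item, "a")
        link.set("href", filename)
        link.text = heading.text
    return menu


def inject_menu(source, menu):
    """Append a copy of the complete menu at the very end of the body."""
    root = etree.fromstring(source)
    body = root.find("body")
    # Copy the menu so that the same subtree can go into every page.
    body.append(copy.deepcopy(menu))
    return etree.tostring(root).decode("ascii")


MENU = build_menu(PAGES)
RESULT = inject_menu(PAGES["index.html"], MENU)
```

The menu is deep-copied before it is appended because appending an lxml element that already lives in another tree moves it rather than duplicating it.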

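The entry for ``iterparse.pxi`` describes an iterator class that builds iterparse events while parsing.  From the Python side, this might be used roughly as follows.  This is a small sketch on an invented input document, not an excerpt from lxml's sources; ``XML``, ``collect_events`` and ``EVENTS`` are hypothetical names.

```python
import io

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # API-compatible stdlib fallback

# Invented sample document, for illustration only.
XML = b"<root><a>1</a><b>2</b></root>"


def collect_events(data):
    """Return the (event, tag) pairs that iterparse generates."""
    events = []
    source = io.BytesIO(data)
    for event, element in etree.iterparse(source, events=("start", "end")):
        events.append((event, element.tag))
    return events


EVENTS = collect_events(XML)
```

Each ``start`` event is reported when an opening tag is seen, and each ``end`` event once the element has been read completely.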