From scoder at codespeak.net Thu May 1 11:06:21 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:06:21 +0200 (CEST)
Subject: [Lxml-checkins] r54297 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501090621.EF4A32A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 11:06:19 2008
New Revision: 54297
Modified:
lxml/branch/lxml-2.0/CHANGES.txt
lxml/branch/lxml-2.0/doc/main.txt
lxml/branch/lxml-2.0/version.txt
Log:
prepare release of 2.0.5
Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Thu May 1 11:06:19 2008
@@ -2,8 +2,8 @@
lxml changelog
==============
-Under development
-=================
+2.0.4 (2008-05-01)
+==================
Features added
--------------
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 11:06:19 2008
@@ -145,8 +145,8 @@
.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
.. _`this key`: pubkey.asc
-The latest version is `lxml 2.0.4`_, released 2008-04-13
-(`changes for 2.0.4`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.0.5`_, released 2008-04-13
+(`changes for 2.0.5`_). `Older versions`_ are listed below.
.. _`Older versions`: #old-versions
@@ -206,6 +206,8 @@
Old Versions
------------
+* `lxml 2.0.4`_, released 2008-04-13 (`changes for 2.0.4`_)
+
* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
* `lxml 2.0.2`_, released 2008-02-22 (`changes for 2.0.2`_)
@@ -264,6 +266,7 @@
* `lxml 0.5`_, released 2005-04-08
+.. _`lxml 2.0.5`: lxml-2.0.5.tgz
.. _`lxml 2.0.4`: lxml-2.0.4.tgz
.. _`lxml 2.0.3`: lxml-2.0.3.tgz
.. _`lxml 2.0.2`: lxml-2.0.2.tgz
@@ -294,6 +297,7 @@
.. _`lxml 0.5.1`: lxml-0.5.1.tgz
.. _`lxml 0.5`: lxml-0.5.tgz
+.. _`changes for 2.0.5`: changes-2.0.5.html
.. _`changes for 2.0.4`: changes-2.0.4.html
.. _`changes for 2.0.3`: changes-2.0.3.html
.. _`changes for 2.0.2`: changes-2.0.2.html
Modified: lxml/branch/lxml-2.0/version.txt
==============================================================================
--- lxml/branch/lxml-2.0/version.txt (original)
+++ lxml/branch/lxml-2.0/version.txt Thu May 1 11:06:19 2008
@@ -1 +1 @@
-2.0.4
+2.0.5
From scoder at codespeak.net Thu May 1 11:16:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:16:48 +0200 (CEST)
Subject: [Lxml-checkins] r54298 - lxml/branch/lxml-2.0/doc
Message-ID: <20080501091648.643A5169E35@codespeak.net>
Author: scoder
Date: Thu May 1 11:16:48 2008
New Revision: 54298
Added:
lxml/branch/lxml-2.0/doc/mklatex.py
- copied unchanged from r54297, lxml/trunk/doc/mklatex.py
Log:
support PDF generation in 2.0.x
From scoder at codespeak.net Thu May 1 11:17:09 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:17:09 +0200 (CEST)
Subject: [Lxml-checkins] r54299 - lxml/branch/lxml-2.0/doc
Message-ID: <20080501091709.4B2D3169E37@codespeak.net>
Author: scoder
Date: Thu May 1 11:17:08 2008
New Revision: 54299
Added:
lxml/branch/lxml-2.0/doc/rest2latex.py
- copied unchanged from r54298, lxml/trunk/doc/rest2latex.py
Log:
support PDF generation in 2.0.x
From scoder at codespeak.net Thu May 1 11:32:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:32:12 +0200 (CEST)
Subject: [Lxml-checkins] r54300 - lxml/branch/lxml-2.0/doc/html
Message-ID: <20080501093212.9F1091684E3@codespeak.net>
Author: scoder
Date: Thu May 1 11:32:12 2008
New Revision: 54300
Added:
lxml/branch/lxml-2.0/doc/html/tagpython-big.png
- copied unchanged from r54299, lxml/trunk/doc/html/tagpython-big.png
Log:
support PDF generation in 2.0.x
From scoder at codespeak.net Thu May 1 12:01:36 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:01:36 +0200 (CEST)
Subject: [Lxml-checkins] r54302 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501100136.EF3822A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:01:35 2008
New Revision: 54302
Added:
lxml/branch/lxml-2.0/doc/docstructure.py
- copied unchanged from r53866, lxml/trunk/doc/docstructure.py
Modified:
lxml/branch/lxml-2.0/Makefile
lxml/branch/lxml-2.0/doc/main.txt
lxml/branch/lxml-2.0/doc/mkhtml.py
Log:
PDF doc fixes from trunk
Modified: lxml/branch/lxml-2.0/Makefile
==============================================================================
--- lxml/branch/lxml-2.0/Makefile (original)
+++ lxml/branch/lxml-2.0/Makefile Thu May 1 12:01:35 2008
@@ -2,6 +2,7 @@
TESTFLAGS=-p -v
TESTOPTS=
SETUPFLAGS=
+LXMLVERSION=`cat version.txt`
all: inplace
@@ -40,17 +41,40 @@
ftest_inplace: inplace
$(PYTHON) test.py -f $(TESTFLAGS) $(TESTOPTS)
-html: inplace
- mkdir -p doc/html
- PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . `cat version.txt`
+apihtml: inplace
rm -fr doc/html/api
@[ -x "`which epydoc`" ] \
&& (cd src && echo "Generating API docs ..." && \
PYTHONPATH=. epydoc -v --docformat "restructuredtext en" \
-o ../doc/html/api --no-private --exclude='[.]html[.]tests|[.]_' \
- --name lxml --url http://codespeak.net/lxml/ lxml/) \
+ --exclude-introspect='[.]usedoctest' \
+ --name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
|| (echo "not generating epydoc API documentation")
+html: inplace apihtml
+ PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . ${LXMLVERSION}
+
+apipdf: inplace
+ rm -fr doc/pdf
+ mkdir -p doc/pdf
+ @[ -x "`which epydoc`" ] \
+ && (cd src && echo "Generating API docs ..." && \
+ PYTHONPATH=. epydoc -v --latex --docformat "restructuredtext en" \
+ -o ../doc/pdf --no-private --exclude='([.]html)?[.]tests|[.]_' \
+ --exclude-introspect='html[.]clean|[.]usedoctest' \
+ --name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
+ || (echo "not generating epydoc API documentation")
+
+pdf: apipdf
+ $(PYTHON) doc/mklatex.py doc/pdf . ${LXMLVERSION}
+ (cd doc/pdf && pdflatex lxmldoc.tex \
+ && pdflatex lxmldoc.tex \
+ && pdflatex lxmldoc.tex)
+ @pdfopt doc/pdf/lxmldoc.pdf doc/pdf/lxmldoc-${LXMLVERSION}.pdf
+ @echo "PDF available as doc/pdf/lxmldoc-${LXMLVERSION}.pdf"
+
+# Two pdflatex runs are needed to build the correct Table of contents.
+
test: test_inplace
valtest: valgrind_test_inplace
@@ -65,7 +89,12 @@
find . \( -name '*.o' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \;
rm -rf build
-realclean: clean
+docclean:
+ rm -f doc/html/*.html
+ rm -fr doc/html/api
+ rm -fr doc/pdf
+
+realclean: clean docclean
find . -name '*.c' -exec rm -f {} \;
rm -f TAGS
$(PYTHON) setup.py clean -a
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 12:01:35 2008
@@ -47,6 +47,10 @@
Documentation
-------------
+The complete lxml documentation is available for download as `PDF
+documentation`_. The HTML documentation from this web site is part of
+the normal `source download <#download>`_.
+
* ElementTree:
* `ElementTree API`_
@@ -140,32 +144,37 @@
The source distribution is signed with `this key`_. Binary builds for
MS Windows usually become available through PyPI a few days after a
source release. If you can't wait, consider trying a less recent
-version first.
-
-.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
-.. _`this key`: pubkey.asc
+release version first.
The latest version is `lxml 2.0.5`_, released 2008-04-13
(`changes for 2.0.5`_). `Older versions`_ are listed below.
-.. _`Older versions`: #old-versions
-
Please take a look at the `installation instructions`_!
-.. _`installation instructions`: installation.html
+This complete web site (including the generated API documentation) is
+part of the source distribution, so if you want to download the
+documentation for offline use, take the source archive and copy the
+``doc/html`` directory out of the source tree.
It's also possible to check out the latest development version of lxml
from svn directly, using a command like this::
svn co http://codespeak.net/svn/lxml/trunk lxml
-You can also `browse it through the web`_. Please read `how to build lxml
-from source`_ first. The `latest CHANGES`_ of the developer version are also
-accessible. You can check there if a bug you found has been fixed or a
-feature you want has been implemented in the latest trunk version.
+You can also browse the `Subversion repository`_ through the web, or
+take a look at the `Subversion history`_. Please read `how to build lxml
+from source`_ first. The `latest CHANGES`_ of the developer version
+are also accessible. You can check there if a bug you found has been
+fixed or a feature you want has been implemented in the latest trunk
+version.
-.. _`how to build lxml from source`: build.html
-.. _`browse it through the web`: http://codespeak.net/svn/lxml
+.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
+.. _`this key`: pubkey.asc
+.. _`Older versions`: #old-versions
+.. _`installation instructions`: installation.html
+ .. _`how to build lxml from source`: build.html
+.. _`Subversion repository`: http://codespeak.net/svn/lxml/
+.. _`Subversion history`: https://codespeak.net/viewvc/lxml/
.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
@@ -178,7 +187,7 @@
.. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev
.. _Gmane: http://blog.gmane.org/gmane.comp.python.lxml.devel
-.. _Google: http://www.google.com/webhp?q=site:codespeak.net/mailman/listinfo/lxml-dev%20
+.. _Google: http://www.google.com/webhp?q=site:codespeak.net%2Fmailman%2Flistinfo%2Flxml-dev+
Bug tracker
@@ -189,7 +198,7 @@
unexpected behaviour of lxml is a bug or not, please ask on the `mailing
list`_ first. Do not forget to search the archive (e.g. with Gmane_)!
-.. _`launchpad bug tracker`: https://launchpad.net/lxml
+.. _`launchpad bug tracker`: https://launchpad.net/lxml/
License
Modified: lxml/branch/lxml-2.0/doc/mkhtml.py
==============================================================================
--- lxml/branch/lxml-2.0/doc/mkhtml.py (original)
+++ lxml/branch/lxml-2.0/doc/mkhtml.py Thu May 1 12:01:35 2008
@@ -1,21 +1,8 @@
+from docstructure import SITE_STRUCTURE, HREF_MAP, BASENAME_MAP
from lxml.etree import (parse, fromstring, ElementTree,
Element, SubElement, XPath)
import os, shutil, re, sys, copy, time
-SITE_STRUCTURE = [
- ('lxml', ('main.txt', 'intro.txt', '../INSTALL.txt', 'lxml2.txt',
- 'performance.txt', 'compatibility.txt', 'FAQ.txt')),
- ('Developing with lxml', ('tutorial.txt', '@API reference',
- 'api.txt', 'parsing.txt',
- 'validation.txt', 'xpathxslt.txt',
- 'objectify.txt', 'lxmlhtml.txt',
- 'cssselect.txt', 'elementsoup.txt')),
- ('Extending lxml', ('resolvers.txt', 'extensions.txt',
- 'element_classes.txt', 'sax.txt', 'capi.txt')),
- ('Developing lxml', ('build.txt', 'lxml-source-howto.txt',
- '@Release Changelog')),
- ]
-
RST2HTML_OPTIONS = " ".join([
"--no-toc-backlinks",
"--strip-comments",
@@ -23,15 +10,6 @@
"--date",
])
-HREF_MAP = {
- "API reference" : "api/index.html"
-}
-
-BASENAME_MAP = {
- 'main' : 'index',
- 'INSTALL' : 'installation',
-}
-
htmlnsmap = {"h" : "http://www.w3.org/1999/xhtml"}
find_title = XPath("/h:html/h:head/h:title/text()", namespaces=htmlnsmap)
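
Note on r54302: the diff above moves SITE_STRUCTURE, HREF_MAP and
BASENAME_MAP out of mkhtml.py into a shared docstructure module,
presumably so that mkhtml.py and mklatex.py can work from the same
tables. The sketch below shows one way such tables might be consumed.
The two table values are copied from the removed lines; the helpers
output_name() and href_for() are hypothetical and are not taken from
the lxml sources, and the reading of the '@' prefix as "menu title
resolved through HREF_MAP" is an assumption.

```python
import os

# Values copied from the lines removed from mkhtml.py in r54302.
BASENAME_MAP = {
    'main': 'index',
    'INSTALL': 'installation',
}
HREF_MAP = {
    "API reference": "api/index.html",
}


def output_name(source_path, extension=".html"):
    """Hypothetical helper: map a source .txt path to an output file name."""
    basename = os.path.splitext(os.path.basename(source_path))[0]
    # Fall back to the unchanged basename when no override is listed.
    return BASENAME_MAP.get(basename, basename) + extension


def href_for(entry):
    """Hypothetical helper: resolve one SITE_STRUCTURE entry to a link target.

    Assumption: entries starting with '@' are menu titles that are looked
    up in HREF_MAP; all other entries are source files.
    """
    if entry.startswith('@'):
        return HREF_MAP.get(entry[1:])
    return output_name(entry)
```

Keeping the tables in one module means an HTML builder and a LaTeX
builder only differ in the extension they pass, for example
output_name('main.txt', '.tex').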
From scoder at codespeak.net Thu May 1 12:04:49 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:04:49 +0200 (CEST)
Subject: [Lxml-checkins] r54303 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501100449.BD3582A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:04:49 2008
New Revision: 54303
Modified:
lxml/branch/lxml-2.0/MANIFEST.in
lxml/branch/lxml-2.0/doc/main.txt
Log:
doc fixes
Modified: lxml/branch/lxml-2.0/MANIFEST.in
==============================================================================
--- lxml/branch/lxml-2.0/MANIFEST.in (original)
+++ lxml/branch/lxml-2.0/MANIFEST.in Thu May 1 12:04:49 2008
@@ -10,6 +10,6 @@
recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd *.html *.txt
recursive-include src/lxml/html/tests *.data *.txt
recursive-include benchmark *.py
-recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png
+recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython*.png
recursive-include fake_pyrex *.py
include doc/mkhtml.py doc/rest2html.py
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 12:04:49 2008
@@ -172,7 +172,7 @@
.. _`this key`: pubkey.asc
.. _`Older versions`: #old-versions
.. _`installation instructions`: installation.html
- .. _`how to build lxml from source`: build.html
+.. _`how to build lxml from source`: build.html
.. _`Subversion repository`: http://codespeak.net/svn/lxml/
.. _`Subversion history`: https://codespeak.net/viewvc/lxml/
.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
@@ -215,6 +215,8 @@
Old Versions
------------
+.. _`PDF documentation`: lxmldoc-2.0.5.pdf
+
* `lxml 2.0.4`_, released 2008-04-13 (`changes for 2.0.4`_)
* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
From scoder at codespeak.net Thu May 1 12:05:19 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:05:19 +0200 (CEST)
Subject: [Lxml-checkins] r54304 - lxml/trunk
Message-ID: <20080501100519.91FC12A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:05:19 2008
New Revision: 54304
Modified:
lxml/trunk/ (props changed)
lxml/trunk/MANIFEST.in
Log:
r4104 at delle: sbehnel | 2008-05-01 12:03:52 +0200
include tagpython-big.png in source distro
Modified: lxml/trunk/MANIFEST.in
==============================================================================
--- lxml/trunk/MANIFEST.in (original)
+++ lxml/trunk/MANIFEST.in Thu May 1 12:05:19 2008
@@ -11,6 +11,6 @@
recursive-include src/lxml/html/tests *.data *.txt
recursive-include samples *.xml
recursive-include benchmark *.py
-recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png
+recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython*.png
recursive-include fake_pyrex *.py
include doc/mkhtml.py doc/rest2html.py
From scoder at codespeak.net Thu May 1 12:10:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:10:12 +0200 (CEST)
Subject: [Lxml-checkins] r54305 - lxml/branch/lxml-2.0
Message-ID: <20080501101012.BA74C2A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:10:12 2008
New Revision: 54305
Modified:
lxml/branch/lxml-2.0/CHANGES.txt
Log:
pre-release fix
Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Thu May 1 12:10:12 2008
@@ -2,7 +2,7 @@
lxml changelog
==============
-2.0.4 (2008-05-01)
+2.0.5 (2008-05-01)
==================
Features added
From scoder at codespeak.net Thu May 1 12:28:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:28:12 +0200 (CEST)
Subject: [Lxml-checkins] r54306 - in lxml/trunk: . doc
Message-ID: <20080501102812.9821E168563@codespeak.net>
Author: scoder
Date: Thu May 1 12:28:11 2008
New Revision: 54306
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/doc/main.txt
Log:
r4112 at delle: sbehnel | 2008-05-01 12:26:43 +0200
integrated release changes of 2.0.5
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Thu May 1 12:28:11 2008
@@ -26,6 +26,26 @@
namespace (i.e. they would end up in the wrong namespace).
+2.0.5 (2008-05-01)
+==================
+
+Features added
+--------------
+
+Bugs fixed
+----------
+
+* Resolving to a filename in custom resolvers didn't work.
+
+* lxml did not honour libxslt's second error state "STOPPED", which
+ let some XSLT errors pass silently.
+
+* Memory leak in Schematron with libxml2 >= 2.6.31.
+
+Other changes
+-------------
+
+
2.1beta1 (2008-04-15)
=====================
Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Thu May 1 12:28:11 2008
@@ -219,6 +219,8 @@
* `lxml 2.1alpha1`_, released 2008-03-27 (`changes for 2.1alpha1`_)
+* `lxml 2.0.5`_, released 2008-05-01 (`changes for 2.0.5`_)
+
* `lxml 2.0.4`_, released 2008-04-14 (`changes for 2.0.4`_)
* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
@@ -281,6 +283,7 @@
.. _`lxml 2.1beta1`: lxml-2.1beta1.tgz
.. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz
+.. _`lxml 2.0.5`: lxml-2.0.5.tgz
.. _`lxml 2.0.4`: lxml-2.0.4.tgz
.. _`lxml 2.0.3`: lxml-2.0.3.tgz
.. _`lxml 2.0.2`: lxml-2.0.2.tgz
@@ -313,6 +316,7 @@
.. _`changes for 2.1beta1`: changes-2.1beta1.html
.. _`changes for 2.1alpha1`: changes-2.1alpha1.html
+.. _`changes for 2.0.5`: changes-2.0.5.html
.. _`changes for 2.0.4`: changes-2.0.4.html
.. _`changes for 2.0.3`: changes-2.0.3.html
.. _`changes for 2.0.2`: changes-2.0.2.html
From scoder at codespeak.net Fri May 2 09:49:51 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 09:49:51 +0200 (CEST)
Subject: [Lxml-checkins] r54311 - in lxml/trunk: . src/lxml/html src/lxml/html/tests
Message-ID: <20080502074951.5688A498100@codespeak.net>
Author: scoder
Date: Fri May 2 09:49:49 2008
New Revision: 54311
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/html/__init__.py
lxml/trunk/src/lxml/html/tests/test_basic.txt
Log:
r4115 at delle: sbehnel | 2008-05-02 09:48:17 +0200
'parser' keyword in lxml.html parse functions, XHTMLParser class
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 09:49:49 2008
@@ -8,6 +8,12 @@
Features added
--------------
+* All parse functions in lxml.html take a ``parser`` keyword argument.
+
+* lxml.html has a new parser class ``XHTMLParser`` and a module
+ attribute ``xhtml_parser`` that provide XML parsers that are
+ pre-configured for the lxml.html package.
+
Bugs fixed
----------
Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Fri May 2 09:49:49 2008
@@ -443,14 +443,17 @@
# parsing
################################################################################
-def document_fromstring(html, **kw):
- value = etree.HTML(html, html_parser, **kw)
+def document_fromstring(html, parser=None, **kw):
+ if parser is None:
+ parser = html_parser
+ value = etree.fromstring(html, parser, **kw)
if value is None:
raise etree.ParserError(
"Document is empty")
return value
-def fragments_fromstring(html, no_leading_text=False, base_url=None, **kw):
+def fragments_fromstring(html, no_leading_text=False, base_url=None,
+ parser=None, **kw):
"""
Parses several HTML elements, returning a list of elements.
@@ -461,11 +464,13 @@
base_url will set the document's base_url attribute (and the tree's docinfo.URL)
"""
+ if parser is None:
+ parser = html_parser
# FIXME: check what happens when you give html with a body, head, etc.
start = html[:20].lstrip().lower()
if not start.startswith('%s%s>' % (
- create_parent, html, create_parent), base_url=base_url, **kw)
- elements = fragments_fromstring(html, no_leading_text=True, base_url=base_url, **kw)
+ create_parent, html, create_parent),
+ parser=parser, base_url=base_url, **kw)
+ elements = fragments_fromstring(html, parser=parser, no_leading_text=True,
+ base_url=base_url, **kw)
if not elements:
raise etree.ParserError(
"No elements found")
@@ -512,7 +522,7 @@
el.tail = None
return el
-def fromstring(html, base_url=None, **kw):
+def fromstring(html, base_url=None, parser=None, **kw):
"""
Parse the html, returning a single element/document.
@@ -521,12 +531,14 @@
base_url will set the document's base_url attribute (and the tree's docinfo.URL)
"""
+ if parser is None:
+ parser = html_parser
start = html[:10].lstrip().lower()
if start.startswith('footer
+
+lxml.html has two parsers, one for HTML, one for XHTML:
+
+ >>> from lxml.html import HTMLParser, XHTMLParser
+ >>> html = "
Hi!
"
+
+ >>> root = document_fromstring(html, parser=HTMLParser())
+ >>> print root.tag
+ html
+
+ >>> root = document_fromstring(html, parser=XHTMLParser())
+ >>> print root.tag
+ html
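
Note on r54311: the change above gives each lxml.html parse function an
optional ``parser`` keyword that falls back to the module default when
it is None. The following is a minimal sketch of that pattern only. It
uses the standard library's xml.etree.ElementTree as a stand-in so that
it runs without lxml; default_parser() and TagCollector are
hypothetical names, and the stand-in XML parser is not equivalent to
lxml's HTML parser.

```python
import xml.etree.ElementTree as ET


def default_parser():
    """Stand-in for lxml.html's module-level default parser (assumption).

    A fresh parser is created per call because a stdlib XMLParser
    cannot be reused after close().
    """
    return ET.XMLParser()


def document_fromstring(text, parser=None):
    # Same shape as the r54311 change: use the default parser only
    # when the caller did not supply one.
    if parser is None:
        parser = default_parser()
    return ET.fromstring(text, parser=parser)


class TagCollector:
    """Hypothetical parser target that records start tags in order."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # With a custom target, the parse result is whatever close() returns.
        return self.tags


doc = "<html><body><p>Hi!</p></body></html>"
root = document_fromstring(doc)
tags = document_fromstring(doc, parser=ET.XMLParser(target=TagCollector()))
```

Using None as the default, rather than the parser object itself, keeps
the signature stable if the module default is replaced later.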
From scoder at codespeak.net Fri May 2 10:13:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 10:13:48 +0200 (CEST)
Subject: [Lxml-checkins] r54313 - in lxml/trunk: . doc
Message-ID: <20080502081348.C58B616855E@codespeak.net>
Author: scoder
Date: Fri May 2 10:13:48 2008
New Revision: 54313
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/build.txt
Log:
r4117 at delle: sbehnel | 2008-05-02 10:00:01 +0200
require Cython 0.9.6.14 for lxml 2.1
Modified: lxml/trunk/doc/build.txt
==============================================================================
--- lxml/trunk/doc/build.txt (original)
+++ lxml/trunk/doc/build.txt Fri May 2 10:13:48 2008
@@ -44,10 +44,10 @@
want to be an lxml developer, then you do need a working Cython
installation. You can use EasyInstall_ to install it::
- easy_install Cython==0.9.6.12
+ easy_install Cython==0.9.6.14
-lxml currently requires Cython 0.9.6.12. Any 0.9.6.13 version will not
-work, later versions were not tested.
+lxml currently requires Cython 0.9.6.14, later versions were not
+tested.
Subversion
From scoder at codespeak.net Fri May 2 10:28:05 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 10:28:05 +0200 (CEST)
Subject: [Lxml-checkins] r54315 - lxml/branch/lxml-2.0/doc
Message-ID: <20080502082805.10C124980FC@codespeak.net>
Author: scoder
Date: Fri May 2 10:28:04 2008
New Revision: 54315
Modified:
lxml/branch/lxml-2.0/doc/main.txt
Log:
doc fix
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Fri May 2 10:28:04 2008
@@ -146,7 +146,7 @@
source release. If you can't wait, consider trying a less recent
release version first.
-The latest version is `lxml 2.0.5`_, released 2008-04-13
+The latest version is `lxml 2.0.5`_, released 2008-05-01
(`changes for 2.0.5`_). `Older versions`_ are listed below.
Please take a look at the `installation instructions`_!
From scoder at codespeak.net Fri May 2 19:14:58 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:14:58 +0200 (CEST)
Subject: [Lxml-checkins] r54340 - in lxml/trunk: . src/lxml
Message-ID: <20080502171458.378E82A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:14:57 2008
New Revision: 54340
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/apihelpers.pxi
lxml/trunk/src/lxml/lxml.etree.pyx
lxml/trunk/src/lxml/proxy.pxi
lxml/trunk/src/lxml/tree.pxd
Log:
r4119 at delle: sbehnel | 2008-05-02 18:10:34 +0200
re-assign all node name pointers from the target dictionary when moving an element to a new tree of a different thread
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Fri May 2 19:14:57 2008
@@ -716,7 +716,7 @@
_moveTail(c_next, c_node)
if not attemptDeallocation(c_node):
# make namespaces absolute
- moveNodeToDocument(doc, c_node)
+ moveNodeToDocument(doc, c_node.doc, c_node)
return 0
cdef void _moveTail(xmlNode* c_tail, xmlNode* c_target):
@@ -782,6 +782,7 @@
"""
cdef xmlNode* c_orig_neighbour
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
cdef _Element element
cdef Py_ssize_t seqlength, i, c
cdef _node_to_node_function next_element
@@ -864,12 +865,13 @@
for element in elements:
assert element is not None, "Node must not be None"
# move element and tail over
+ c_source_doc = element._c_node.doc
c_next = element._c_node.next
tree.xmlAddPrevSibling(c_node, element._c_node)
_moveTail(c_next, element._c_node)
# integrate element into new document
- moveNodeToDocument(parent._doc, element._c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, element._c_node)
# stop at the end of the slice
if slicelength > 0:
@@ -899,7 +901,9 @@
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = child._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -908,7 +912,7 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(parent._doc, c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _prependChild(_Element parent, _Element child) except -1:
"""Prepend a new child to a parent element.
@@ -916,7 +920,9 @@
cdef xmlNode* c_next
cdef xmlNode* c_child
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = child._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -929,14 +935,16 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(parent._doc, c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _appendSibling(_Element element, _Element sibling) except -1:
"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = sibling._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -944,14 +952,16 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(element._doc, c_node)
+ moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef int _prependSibling(_Element element, _Element sibling) except -1:
"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = sibling._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -959,7 +969,7 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(element._doc, c_node)
+ moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef inline int isutf8(char* s):
cdef char c
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Fri May 2 19:14:57 2008
@@ -533,6 +533,7 @@
"""
cdef xmlNode* c_node
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
cdef _Element element
cdef bint left_to_right
cdef Py_ssize_t slicelength, step
@@ -554,13 +555,14 @@
c_node = _findChild(self._c_node, x)
if c_node is NULL:
raise IndexError, "list index out of range"
+ c_source_doc = element._c_node.doc
c_next = element._c_node.next
_removeText(c_node.next)
tree.xmlReplaceNode(c_node, element._c_node)
_moveTail(c_next, element._c_node)
- moveNodeToDocument(self._doc, element._c_node)
+ moveNodeToDocument(self._doc, c_source_doc, element._c_node)
if not attemptDeallocation(c_node):
- moveNodeToDocument(self._doc, c_node)
+ moveNodeToDocument(self._doc, c_node.doc, c_node)
def __delitem__(self, x):
"""__delitem__(self, x)
@@ -707,14 +709,16 @@
"""
cdef xmlNode* c_node
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
c_node = _findChild(self._c_node, index)
if c_node is NULL:
_appendChild(self, element)
return
+ c_source_doc = c_node.doc
c_next = element._c_node.next
tree.xmlAddPrevSibling(c_node, element._c_node)
_moveTail(c_next, element._c_node)
- moveNodeToDocument(self._doc, element._c_node)
+ moveNodeToDocument(self._doc, c_source_doc, element._c_node)
def remove(self, _Element element not None):
"""remove(self, element)
@@ -732,7 +736,7 @@
tree.xmlUnlinkNode(c_node)
_moveTail(c_next, c_node)
# fix namespace declarations
- moveNodeToDocument(self._doc, c_node)
+ moveNodeToDocument(self._doc, c_node.doc, c_node)
def replace(self, _Element old_element not None,
_Element new_element not None):
@@ -744,18 +748,20 @@
cdef xmlNode* c_old_next
cdef xmlNode* c_new_node
cdef xmlNode* c_new_next
+ cdef xmlDoc* c_source_doc
c_old_node = old_element._c_node
if c_old_node.parent is not self._c_node:
raise ValueError, "Element is not a child of this node."
c_old_next = c_old_node.next
c_new_node = new_element._c_node
c_new_next = c_new_node.next
+ c_source_doc = c_new_next.doc
tree.xmlReplaceNode(c_old_node, c_new_node)
_moveTail(c_new_next, c_new_node)
_moveTail(c_old_next, c_old_node)
- moveNodeToDocument(self._doc, c_new_node)
+ moveNodeToDocument(self._doc, c_source_doc, c_new_node)
# fix namespace declarations
- moveNodeToDocument(self._doc, c_old_node)
+ moveNodeToDocument(self._doc, c_old_node.doc, c_old_node)
# PROPERTIES
property tag:
Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Fri May 2 19:14:57 2008
@@ -276,7 +276,8 @@
c_nsdef[0] = c_ns_next
return 0
-cdef int moveNodeToDocument(_Document doc, xmlNode* c_element) except -1:
+cdef int moveNodeToDocument(_Document doc, xmlDoc* c_source_doc,
+ xmlNode* c_element) except -1:
"""Fix the xmlNs pointers of a node and its subtree that were moved.
Mainly copied from libxml2's xmlReconciliateNs(). Expects libxml2 doc
@@ -293,7 +294,11 @@
prefix). If a namespace is unknown, declare a new one on the
node.
- 3) Set the Document reference to the new Document (if different).
+ 3) Reassign the names of tags and attribute from the dict of the
+ target document *iff* it is different from the dict used in the
+ source subtree.
+
+ 4) Set the Document reference to the new Document (if different).
This is done on backtracking to keep the original Document
alive as long as possible, until all its elements are updated.
@@ -303,16 +308,26 @@
"""
cdef xmlNode* c_start_node
cdef xmlNode* c_node
+ cdef char* c_name
cdef _nscache c_ns_cache
cdef xmlNs* c_ns
cdef xmlNs* c_ns_next
cdef xmlNs* c_nsdef
cdef xmlNs* c_del_ns_list
cdef cstd.size_t i
+ cdef tree.xmlDict* c_dict
if not tree._isElementOrXInclude(c_element):
return 0
+ # we need to copy the names of tags and attributes iff the element
+ # is based on a different libxml2 tag name dictionary
+ if doc._c_doc.dict is not c_source_doc.dict and \
+ doc._c_doc.dict is not NULL and c_source_doc.dict is not NULL:
+ c_dict = doc._c_doc.dict
+ else:
+ c_dict = NULL
+
c_start_node = c_element
c_del_ns_list = NULL
@@ -343,6 +358,13 @@
c_element, c_node.ns.href, c_node.ns.prefix)
_appendToNsCache(&c_ns_cache, c_node.ns, c_ns)
c_node.ns = c_ns
+
+ # 3) re-assign names from the target dict
+ if c_dict is not NULL:
+ c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+ if c_name is not NULL:
+ c_element.name = c_name
+
if c_node is c_element:
# after the element, continue with its attributes
c_node = c_element.properties
@@ -358,7 +380,7 @@
if c_node is NULL:
# no children => back off and continue with siblings and parents
- # 3) fix _Document reference (may dealloc the original document!)
+ # 4) fix _Document reference (may dealloc the original document!)
if c_element._private is not NULL:
_updateProxyDocument(c_element, doc)
@@ -376,7 +398,7 @@
if c_element is NULL or not tree._isElementOrXInclude(c_element):
break
- # 3) fix _Document reference (may dealloc the original document!)
+ # 4) fix _Document reference (may dealloc the original document!)
if c_element._private is not NULL:
_updateProxyDocument(c_element, doc)
Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Fri May 2 19:14:57 2008
@@ -52,12 +52,12 @@
void xmlHashScan(xmlHashTable* table, xmlHashScanner f, void* data) nogil
void* xmlHashLookup(xmlHashTable* table, char* name) nogil
-cdef extern from "libxml/tree.h":
-
- # for some reason need to define this in this section;
+cdef extern from *: # actually "libxml/dict.h"
# libxml/dict.h appears to be broken to include in C
ctypedef struct xmlDict
-
+ cdef char* xmlDictLookup(xmlDict* dict, char* name, int len)
+
+cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
ctypedef struct xmlAttr
ctypedef struct xmlNotationTable
From scoder at codespeak.net Fri May 2 19:15:03 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:03 +0200 (CEST)
Subject: [Lxml-checkins] r54341 - in lxml/trunk: . src/lxml
Message-ID: <20080502171503.51C932A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:02 2008
New Revision: 54341
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/lxml.etree.pyx
Log:
r4120 at delle: sbehnel | 2008-05-02 18:42:35 +0200
typo
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Fri May 2 19:15:02 2008
@@ -755,7 +755,7 @@
c_old_next = c_old_node.next
c_new_node = new_element._c_node
c_new_next = c_new_node.next
- c_source_doc = c_new_next.doc
+ c_source_doc = c_new_node.doc
tree.xmlReplaceNode(c_old_node, c_new_node)
_moveTail(c_new_next, c_new_node)
_moveTail(c_old_next, c_old_node)
From scoder at codespeak.net Fri May 2 19:15:07 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:07 +0200 (CEST)
Subject: [Lxml-checkins] r54342 - in lxml/trunk: . src/lxml
Message-ID: <20080502171507.8028D39B593@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:07 2008
New Revision: 54342
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tree.pxd
Log:
r4121 at delle: sbehnel | 2008-05-02 19:10:07 +0200
cleanup
Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Fri May 2 19:15:07 2008
@@ -55,7 +55,7 @@
cdef extern from *: # actually "libxml/dict.h"
# libxml/dict.h appears to be broken to include in C
ctypedef struct xmlDict
- cdef char* xmlDictLookup(xmlDict* dict, char* name, int len)
+ cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) nogil
cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
From scoder at codespeak.net Fri May 2 19:15:11 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:11 +0200 (CEST)
Subject: [Lxml-checkins] r54343 - in lxml/trunk: . src/lxml
Message-ID: <20080502171511.A0B4E2A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:11 2008
New Revision: 54343
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/proxy.pxi
Log:
r4122 at delle: sbehnel | 2008-05-02 19:11:59 +0200
fix and simplification
Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Fri May 2 19:15:11 2008
@@ -322,8 +322,7 @@
# we need to copy the names of tags and attributes iff the element
# is based on a different libxml2 tag name dictionary
- if doc._c_doc.dict is not c_source_doc.dict and \
- doc._c_doc.dict is not NULL and c_source_doc.dict is not NULL:
+ if doc._c_doc.dict is not c_source_doc.dict:
c_dict = doc._c_doc.dict
else:
c_dict = NULL
@@ -362,8 +361,10 @@
# 3) re-assign names from the target dict
if c_dict is not NULL:
c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+ # c_name can be NULL on memory error, but we don't
+ # handle that here
if c_name is not NULL:
- c_element.name = c_name
+ c_node.name = c_name
if c_node is c_element:
# after the element, continue with its attributes
From scoder at codespeak.net Fri May 2 19:15:15 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:15 +0200 (CEST)
Subject: [Lxml-checkins] r54344 - lxml/trunk
Message-ID: <20080502171515.58D732A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:15 2008
New Revision: 54344
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
Log:
r4123 at delle: sbehnel | 2008-05-02 19:13:17 +0200
changelog
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 19:15:15 2008
@@ -17,6 +17,10 @@
Bugs fixed
----------
+* Moving a subtree from a document created in one thread into a
+ document of another thread could crash when the rest of the source
+ document is deleted while the subtree is still in use.
+
* Passing an nsmap when creating an Element will no longer strip
redundantly defined namespace URIs. This prevented the definition
of more than one prefix for a namespace on the same Element.
From scoder at codespeak.net Fri May 2 19:59:59 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:59:59 +0200 (CEST)
Subject: [Lxml-checkins] r54345 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080502175959.1362C2A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:59:57 2008
New Revision: 54345
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_elementtree.py
Log:
r4130 at delle: sbehnel | 2008-05-02 19:55:34 +0200
cleanup
Modified: lxml/trunk/src/lxml/tests/test_elementtree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_elementtree.py (original)
+++ lxml/trunk/src/lxml/tests/test_elementtree.py Fri May 2 19:59:57 2008
@@ -8,7 +8,7 @@
for IO related test cases.
"""
-import unittest, doctest
+import unittest
import os, re, tempfile, copy, operator, gc
from common_imports import StringIO, etree, ElementTree, cElementTree
From scoder at codespeak.net Fri May 2 20:00:03 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:00:03 +0200 (CEST)
Subject: [Lxml-checkins] r54346 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080502180003.C61222A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 20:00:03 2008
New Revision: 54346
Added:
lxml/trunk/src/lxml/tests/test_threading.py
Modified:
lxml/trunk/ (props changed)
Log:
r4131 at delle: sbehnel | 2008-05-02 19:58:26 +0200
new test suite for threading tests
Added: lxml/trunk/src/lxml/tests/test_threading.py
==============================================================================
--- (empty file)
+++ lxml/trunk/src/lxml/tests/test_threading.py Fri May 2 20:00:03 2008
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+
+"""
+Tests for thread usage in lxml.etree.
+"""
+
+import unittest, threading
+
+from common_imports import etree, HelperTestCase
+
+class ThreadingTestCase(HelperTestCase):
+ """Threading tests"""
+ etree = etree
+
+ def test_subtree_copy(self):
+ tostring = self.etree.tostring
+ XML = self.etree.XML
+ xml = " "
+ main_root = XML(" ")
+
+ def run_thread():
+ thread_root = XML(xml)
+ main_root.append(thread_root[0])
+ del thread_root
+
+ thread = threading.Thread(target=run_thread)
+ thread.start()
+ thread.join()
+
+ self.assertEquals(xml, tostring(main_root))
+
+def test_suite():
+ suite = unittest.TestSuite()
+ suite.addTests([unittest.makeSuite(ThreadingTestCase)])
+ return suite
+
+if __name__ == '__main__':
+ print 'to test use test.py %s' % __file__
From scoder at codespeak.net Fri May 2 20:12:28 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:12:28 +0200 (CEST)
Subject: [Lxml-checkins] r54349 - in lxml/trunk: . doc
Message-ID: <20080502181228.7C57B169E74@codespeak.net>
Author: scoder
Date: Fri May 2 20:12:28 2008
New Revision: 54349
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/FAQ.txt
Log:
r4134 at delle: sbehnel | 2008-05-02 20:10:19 +0200
relieve FAQ on threading from 'big fat warning'
Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Fri May 2 20:12:28 2008
@@ -565,29 +565,32 @@
Can I use threads to concurrently access the lxml API?
------------------------------------------------------
-Yes, although not carelessly.
+Short answer: yes, if you use lxml 2.1 and later.
-lxml frees the GIL (Python's global interpreter lock) internally when parsing
-from disk and memory, as long as you use either the default parser (which is
-replicated for each thread) or create a parser for each thread yourself. lxml
-also allows concurrency during validation (RelaxNG and XMLSchema) and XSL
-transformation. You can share RelaxNG, XMLSchema and (with restrictions) XSLT
-objects between threads. While you can also share parsers between threads,
-this will serialize the access to each of them, so it is better to ``copy()``
-parsers or to just use the default parser (which is automatically copied for
-each thread).
+Since version 1.1, lxml frees the GIL (Python's global interpreter
+lock) internally when parsing from disk and memory, as long as you use
+either the default parser (which is replicated for each thread) or
+create a parser for each thread yourself. lxml also allows
+concurrency during validation (RelaxNG and XMLSchema) and XSL
+transformation. You can share RelaxNG, XMLSchema and (with
+restrictions) XSLT objects between threads. While you can also share
+parsers between threads, this will serialize the access to each of
+them, so it is better to ``.copy()`` parsers or to just use the
+default parser if you do not need any special configuration.
Due to the way libxslt handles threading, applying a stylesheet is
most efficient if it was parsed in the same thread that executes it.
One way to achieve this is by caching stylesheets in thread-local
storage.
-Warning: You should generally avoid modifying trees in other threads than the
-one it was generated in. Although this should work in many cases, there are
-certain scenarios where the termination of a thread that parsed a tree can
-crash the application if subtrees of this tree were moved to other documents.
-You should be on the safe side when passing trees between threads if you
-either
+Warning: Before lxml 2.1, there were issues when moving subtrees
+between different threads. If you need code to run with older
+versions, you should generally avoid modifying trees in other threads
+than the one it was generated in. Although this should work in many
+cases, there are certain scenarios where the termination of a thread
+that parsed a tree can crash the application if subtrees of this tree
+were moved to other documents. You should be on the safe side when
+passing trees between threads if you either
a) do not modify these trees and do not move their elements to other trees, or
b) do not terminate threads while the trees they parsed are still in use
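
The FAQ text above suggests caching stylesheets in thread-local storage so that each stylesheet is parsed in the thread that later executes it. A minimal sketch of that pattern, using only the standard library, might look like the following. The helper name `per_thread` and the stand-in builder are assumptions made for illustration; they are not part of lxml.

```python
import threading

# One storage object; every thread sees its own attributes on it.
_local = threading.local()


def per_thread(key, build):
    """Return this thread's cached object for `key`, building it on first use.

    `per_thread` is a hypothetical helper, not an lxml API.
    """
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if key not in cache:
        cache[key] = build()
    return cache[key]


def demo():
    """Run three threads and record what each one got from the cache."""
    seen = []
    lock = threading.Lock()

    def worker():
        # `object` is a stand-in builder. With lxml one would pass something
        # like `lambda: etree.XSLT(etree.parse(path))` (assumption, not run here).
        first = per_thread("style", object)
        second = per_thread("style", object)
        with lock:
            seen.append((first, first is second))

    threads = [threading.Thread(target=worker) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Each thread builds its own object once and reuses it afterwards, so no stylesheet object crosses a thread boundary.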
From scoder at codespeak.net Fri May 2 20:12:33 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:12:33 +0200 (CEST)
Subject: [Lxml-checkins] r54350 - in lxml/trunk: . doc
Message-ID: <20080502181233.B3E33169E77@codespeak.net>
Author: scoder
Date: Fri May 2 20:12:33 2008
New Revision: 54350
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/main.txt
lxml/trunk/version.txt
Log:
r4135 at delle: sbehnel | 2008-05-02 20:10:38 +0200
pre-release changes
Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Fri May 2 20:12:33 2008
@@ -146,8 +146,8 @@
source release. If you can't wait, consider trying a less recent
release version first.
-The latest version is `lxml 2.1beta1`_, released 2008-04-15
-(`changes for 2.1beta1`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.1beta2`_, released 2008-05-02
+(`changes for 2.1beta2`_). `Older versions`_ are listed below.
Please take a look at the `installation instructions`_!
@@ -215,7 +215,9 @@
Old Versions
------------
-.. _`PDF documentation`: lxmldoc-2.1beta1.pdf
+.. _`PDF documentation`: lxmldoc-2.1beta2.pdf
+
+* `lxml 2.1beta1`_, released 2008-04-15 (`changes for 2.1beta1`_)
* `lxml 2.1alpha1`_, released 2008-03-27 (`changes for 2.1alpha1`_)
@@ -281,6 +283,7 @@
* `lxml 0.5`_, released 2005-04-08
+.. _`lxml 2.1beta2`: lxml-2.1beta2.tgz
.. _`lxml 2.1beta1`: lxml-2.1beta1.tgz
.. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz
.. _`lxml 2.0.5`: lxml-2.0.5.tgz
@@ -314,6 +317,7 @@
.. _`lxml 0.5.1`: lxml-0.5.1.tgz
.. _`lxml 0.5`: lxml-0.5.tgz
+.. _`changes for 2.1beta2`: changes-2.1beta2.html
.. _`changes for 2.1beta1`: changes-2.1beta1.html
.. _`changes for 2.1alpha1`: changes-2.1alpha1.html
.. _`changes for 2.0.5`: changes-2.0.5.html
Modified: lxml/trunk/version.txt
==============================================================================
--- lxml/trunk/version.txt (original)
+++ lxml/trunk/version.txt Fri May 2 20:12:33 2008
@@ -1 +1 @@
-2.1beta1
+2.1beta2
From scoder at codespeak.net Fri May 2 20:18:01 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:18:01 +0200 (CEST)
Subject: [Lxml-checkins] r54351 - lxml/trunk
Message-ID: <20080502181801.CBD87169E78@codespeak.net>
Author: scoder
Date: Fri May 2 20:18:01 2008
New Revision: 54351
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
Log:
r4138 at delle: sbehnel | 2008-05-02 20:16:32 +0200
pre-release changes
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 20:18:01 2008
@@ -2,8 +2,8 @@
lxml changelog
==============
-Under development
-=================
+2.1beta2 (2008-05-02)
+=====================
Features added
--------------
From scoder at codespeak.net Fri May 2 21:56:30 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 21:56:30 +0200 (CEST)
Subject: [Lxml-checkins] r54352 - in lxml/trunk: . src/lxml/html
Message-ID: <20080502195630.2D9A4169E4C@codespeak.net>
Author: scoder
Date: Fri May 2 21:56:28 2008
New Revision: 54352
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/html/defs.py
Log:
r4140 at delle: sbehnel | 2008-05-02 21:46:15 +0200
use sets instead of lists in defs.py as most use cases only test for containment
Modified: lxml/trunk/src/lxml/html/defs.py
==============================================================================
--- lxml/trunk/src/lxml/html/defs.py (original)
+++ lxml/trunk/src/lxml/html/defs.py Fri May 2 21:56:28 2008
@@ -4,34 +4,40 @@
# Data taken from http://www.w3.org/TR/html401/index/elements.html
-empty_tags = [
+try:
+ frozenset
+except NameError:
+ from sets import Set as frozenset
+
+
+empty_tags = frozenset([
'area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
- 'img', 'input', 'isindex', 'link', 'meta', 'param']
+ 'img', 'input', 'isindex', 'link', 'meta', 'param'])
-deprecated_tags = [
+deprecated_tags = frozenset([
'applet', 'basefont', 'center', 'dir', 'font', 'isindex',
- 'menu', 's', 'strike', 'u']
+ 'menu', 's', 'strike', 'u'])
# archive actually takes a space-separated list of URIs
-link_attrs = [
+link_attrs = frozenset([
'action', 'archive', 'background', 'cite', 'classid',
'codebase', 'data', 'href', 'longdesc', 'profile', 'src',
'usemap',
# Not standard:
'dynsrc', 'lowsrc',
- ]
+ ])
# Not in the HTML 4 spec:
# onerror, onresize
-event_attrs = [
+event_attrs = frozenset([
'onblur', 'onchange', 'onclick', 'ondblclick', 'onerror',
'onfocus', 'onkeydown', 'onkeypress', 'onkeyup', 'onload',
'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover',
'onmouseup', 'onreset', 'onresize', 'onselect', 'onsubmit',
'onunload',
- ]
+ ])
-safe_attrs = [
+safe_attrs = frozenset([
'abbr', 'accept', 'accept-charset', 'accesskey', 'action', 'align',
'alt', 'axis', 'border', 'cellpadding', 'cellspacing', 'char', 'charoff',
'charset', 'checked', 'cite', 'class', 'clear', 'cols', 'colspan',
@@ -41,18 +47,18 @@
'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly',
'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape',
'size', 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title',
- 'type', 'usemap', 'valign', 'value', 'vspace', 'width']
+ 'type', 'usemap', 'valign', 'value', 'vspace', 'width'])
# From http://htmlhelp.com/reference/html40/olist.html
-top_level_tags = [
+top_level_tags = frozenset([
'html', 'head', 'body', 'frameset',
- ]
+ ])
-head_tags = [
+head_tags = frozenset([
'base', 'isindex', 'link', 'meta', 'script', 'style', 'title',
- ]
+ ])
-general_block_tags = [
+general_block_tags = frozenset([
'address',
'blockquote',
'center',
@@ -70,51 +76,51 @@
'noscript',
'p',
'pre',
- ]
+ ])
-list_tags = [
+list_tags = frozenset([
'dir', 'dl', 'dt', 'dd', 'li', 'menu', 'ol', 'ul',
- ]
+ ])
-table_tags = [
+table_tags = frozenset([
'table', 'caption', 'colgroup', 'col',
'thead', 'tfoot', 'tbody', 'tr', 'td', 'th',
- ]
+ ])
# just this one from
# http://www.georgehernandez.com/h/XComputers/HTML/2BlockLevel.htm
-block_tags = general_block_tags + list_tags + table_tags + [
+block_tags = general_block_tags | list_tags | table_tags | frozenset([
# Partial form tags
'fieldset', 'form', 'legend', 'optgroup', 'option',
- ]
+ ])
-form_tags = [
+form_tags = frozenset([
'form', 'button', 'fieldset', 'legend', 'input', 'label',
'select', 'optgroup', 'option', 'textarea',
- ]
+ ])
-special_inline_tags = [
+special_inline_tags = frozenset([
'a', 'applet', 'basefont', 'bdo', 'br', 'embed', 'font', 'iframe',
'img', 'map', 'area', 'object', 'param', 'q', 'script',
'span', 'sub', 'sup',
- ]
+ ])
-phrase_tags = [
+phrase_tags = frozenset([
'abbr', 'acronym', 'cite', 'code', 'del', 'dfn', 'em',
'ins', 'kbd', 'samp', 'strong', 'var',
- ]
+ ])
-font_style_tags = [
+font_style_tags = frozenset([
'b', 'big', 'i', 's', 'small', 'strike', 'tt', 'u',
- ]
+ ])
-frame_tags = [
+frame_tags = frozenset([
'frameset', 'frame', 'noframes',
- ]
+ ])
# These tags aren't standard
-nonstandard_tags = ['blink', 'marque']
+nonstandard_tags = frozenset(['blink', 'marque'])
-tags = (top_level_tags + head_tags + general_block_tags + list_tags
- + table_tags + form_tags + special_inline_tags + phrase_tags
- + font_style_tags + nonstandard_tags)
+tags = (top_level_tags | head_tags | general_block_tags | list_tags
+ | table_tags | form_tags | special_inline_tags | phrase_tags
+ | font_style_tags | nonstandard_tags)
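
The switch from lists to frozensets in this diff changes two things for callers: groups are combined with `|` instead of `+`, and containment tests no longer scan the whole collection. A small sketch of both follows, with shortened, hypothetical tag groups rather than the real contents of defs.py.

```python
# Hypothetical, shortened tag groups; the real ones live in lxml/html/defs.py.
list_tags = frozenset(['dl', 'ol', 'ul', 'li'])
table_tags = frozenset(['table', 'tr', 'td', 'th'])

# Lists were concatenated with "+"; frozensets are combined with the
# union operator "|", which also drops duplicates.
block_tags = list_tags | table_tags | frozenset(['form', 'fieldset'])


def is_block(tag):
    """Containment test: a hash lookup on a frozenset, a linear scan on a list."""
    return tag in block_tags
```

Code that indexed these collections, relied on their order, or concatenated them with `+` would need adjusting after such a change.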
From scoder at codespeak.net Fri May 2 21:56:35 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 21:56:35 +0200 (CEST)
Subject: [Lxml-checkins] r54353 - in lxml/trunk: . src/lxml/html
Message-ID: <20080502195635.5C6BD2A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 21:56:34 2008
New Revision: 54353
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/html/__init__.py
lxml/trunk/src/lxml/html/clean.py
lxml/trunk/src/lxml/html/formfill.py
Log:
r4141 at delle: sbehnel | 2008-05-02 21:47:32 +0200
support XHTML tags in XPath expressions of lxml.html
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 21:56:34 2008
@@ -2,6 +2,21 @@
lxml changelog
==============
+Under development
+=================
+
+Features added
+--------------
+
+* Most features in lxml.html work for XHTML namespaced tag names.
+
+Bugs fixed
+----------
+
+Other changes
+-------------
+
+
2.1beta2 (2008-05-02)
=====================
Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Fri May 2 21:56:34 2008
@@ -22,16 +22,30 @@
'find_rel_links', 'find_class', 'make_links_absolute',
'resolve_base_href', 'iterlinks', 'rewrite_links', 'open_in_browser']
-_rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]")
+XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
+
+_rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]|descendant-or-self::x:a[@rel]",
+ namespaces={'x':XHTML_NAMESPACE})
+_options_xpath = etree.XPath("descendant-or-self::option|descendant-or-self::x:option",
+ namespaces={'x':XHTML_NAMESPACE})
+_forms_xpath = etree.XPath("descendant-or-self::form|descendant-or-self::x:form",
+ namespaces={'x':XHTML_NAMESPACE})
#_class_xpath = etree.XPath(r"descendant-or-self::*[regexp:match(@class, concat('\b', $class_name, '\b'))]", {'regexp': 'http://exslt.org/regular-expressions'})
_class_xpath = etree.XPath("descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), concat(' ', $class_name, ' '))]")
_id_xpath = etree.XPath("descendant-or-self::*[@id=$id]")
_collect_string_content = etree.XPath("string()")
_css_url_re = re.compile(r'url\((.*?)\)', re.I)
_css_import_re = re.compile(r'@import "(.*?)"')
-_label_xpath = etree.XPath("//label[@for=$id]")
+_label_xpath = etree.XPath("//label[@for=$id]|//x:label[@for=$id]",
+ namespaces={'x':XHTML_NAMESPACE})
_archive_re = re.compile(r'[^ ]+')
+def _nons(tag):
+ if isinstance(tag, basestring):
+ if tag[0] == '{' and tag[1:len(XHTML_NAMESPACE)+1] == XHTML_NAMESPACE:
+ return tag.split('}')[-1]
+ return tag
+
class HtmlMixin(object):
def base_url(self):
@@ -48,7 +62,7 @@
"""
Return a list of all the forms
"""
- return list(self.getiterator('form'))
+ return _forms_xpath(self)
forms = property(forms, doc=forms.__doc__)
def body(self):
@@ -56,7 +70,7 @@
Return the <body> element. Can be called from a child element
to get the document's head.
"""
- return self.xpath('//body')[0]
+ return self.xpath('//body|//x:body', namespaces={'x':XHTML_NAMESPACE})[0]
body = property(body, doc=body.__doc__)
def head(self):
@@ -64,7 +78,7 @@
Returns the <head> element. Can be called from a child
element to get the document's head.
"""
- return self.xpath('//head')[0]
+ return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
head = property(head, doc=head.__doc__)
def _label__get(self):
@@ -85,7 +99,7 @@
raise TypeError(
"You cannot set a label for an element (%r) that has no id"
% self)
- if not label.tag == 'label':
+ if _nons(label.tag) != 'label':
raise TypeError(
"You can only assign label to a label element (not %r)"
% label)
@@ -228,7 +242,7 @@
tag once it has been applied.
"""
base_href = None
- basetags = self.xpath('//base[@href]')
+ basetags = self.xpath('//base[@href]|//x:base[@href]', namespaces={'x':XHTML_NAMESPACE})
for b in basetags:
base_href = b.get('href')
b.drop_tree()
@@ -249,11 +263,12 @@
link_attrs = defs.link_attrs
for el in self.getiterator():
attribs = el.attrib
- if el.tag != 'object':
+ tag = _nons(el.tag)
+ if tag != 'object':
for attrib in link_attrs:
if attrib in attribs:
yield (el, attrib, attribs[attrib], 0)
- elif el.tag == 'object':
+ elif tag == 'object':
codebase = None
## <object> tags have attributes that are relative to
## codebase
@@ -272,7 +287,7 @@
if codebase is not None:
value = urlparse.urljoin(codebase, value)
yield (el, 'archive', value, match.start())
- if el.tag == 'param':
+ if tag == 'param':
valuetype = el.get('valuetype') or ''
if valuetype.lower() == 'ref':
## FIXME: while it's fine we *find* this link,
@@ -282,7 +297,7 @@
## doesn't have a valuetype="ref" (which seems to be the norm)
## http://www.w3.org/TR/html401/struct/objects.html#adef-valuetype
yield (el, 'value', el.get('value'), 0)
- if el.tag == 'style' and el.text:
+ if tag == 'style' and el.text:
for match in _css_url_re.finditer(el.text):
yield (el, None, match.group(1), match.start(1))
for match in _css_import_re.finditer(el.text):
@@ -471,8 +486,8 @@
if not start.startswith(' 1:
@@ -558,6 +575,8 @@
else:
body = None
heads = doc.findall('head')
+ if not heads:
+ heads = doc.findall('{%s}head' % XHTML_NAMESPACE)
if heads:
# Well, we have some sort of structure, so lets keep it all
head = heads[0]
@@ -598,7 +617,7 @@
# FIXME: I could do this with XPath, but would that just be
# unnecessarily slow?
for el in el.getiterator():
- if el.tag in defs.block_tags:
+ if _nons(el.tag) in defs.block_tags:
return True
return False
@@ -608,7 +627,7 @@
elif isinstance(el, basestring):
return 'string'
else:
- return el.tag
+ return _nons(el.tag)
################################################################################
# form handling
@@ -655,7 +674,10 @@
return self.get('name')
elif self.get('id'):
return '#' + self.get('id')
- return str(self.body.findall('form').index(self))
+ forms = self.body.findall('form')
+ if not forms:
+ forms = self.body.findall('{%s}form' % XHTML_NAMESPACE)
+ return str(forms.index(self))
def form_values(self):
"""
@@ -667,9 +689,10 @@
name = el.name
if not name:
continue
- if el.tag == 'textarea':
+ tag = _nons(el.tag)
+ if tag == 'textarea':
results.append((name, el.value))
- elif el.tag == 'select':
+ elif tag == 'select':
value = el.value
if el.multiple:
for v in value:
@@ -677,7 +700,7 @@
elif value is not None:
results.append((name, el.value))
else:
- assert el.tag == 'input', (
+ assert tag == 'input', (
"Unexpected tag: %r" % el)
if el.checkable and not el.checked:
continue
@@ -801,8 +824,8 @@
checkboxes and radio elements are returned individually.
"""
- _name_xpath = etree.XPath(".//*[@name = $name and (name(.) = 'select' or name(.) = 'input' or name(.) = 'textarea')]")
- _all_xpath = etree.XPath(".//*[name() = 'select' or name() = 'input' or name() = 'textarea']")
+ _name_xpath = etree.XPath(".//*[@name = $name and (local-name(.) = 'select' or local-name(.) = 'input' or local-name(.) = 'textarea')]")
+ _all_xpath = etree.XPath(".//*[local-name() = 'select' or local-name() = 'input' or local-name() = 'textarea']")
def __init__(self, form):
self.form = form
@@ -919,7 +942,7 @@
"""
if self.multiple:
return MultipleSelectOptions(self)
- for el in self.getiterator('option'):
+ for el in _options_xpath(self):
if 'selected' in el.attrib:
value = el.get('value')
# FIXME: If value is None, what to return?, get_text()?
@@ -935,7 +958,7 @@
self.value.update(value)
return
if value is not None:
- for el in self.getiterator('option'):
+ for el in _options_xpath(self):
# FIXME: also if el.get('value') is None?
if el.get('value') == value:
checked_option = el
@@ -943,7 +966,7 @@
else:
raise ValueError(
"There is no option with the value of %r" % value)
- for el in self.getiterator('option'):
+ for el in _options_xpath(self):
if 'selected' in el.attrib:
del el.attrib['selected']
if value is not None:
@@ -963,7 +986,7 @@
All the possible values this select can have (the ``value``
attribute of all the ``<option>`` elements.
"""
- return [el.get('value') for el in self.getiterator('option')]
+ return [el.get('value') for el in _options_xpath(self)]
value_options = property(value_options, doc=value_options.__doc__)
def _multiple__get(self):
@@ -995,7 +1018,7 @@
"""
Iterator of all the ``<option>`` elements.
"""
- return self.select.getiterator('option')
+ return iter(_options_xpath(self.select))
options = property(options)
def __iter__(self):
Modified: lxml/trunk/src/lxml/html/clean.py
==============================================================================
--- lxml/trunk/src/lxml/html/clean.py (original)
+++ lxml/trunk/src/lxml/html/clean.py Fri May 2 21:56:34 2008
@@ -9,7 +9,7 @@
import urlparse
from lxml import etree
from lxml.html import defs
-from lxml.html import fromstring, tostring
+from lxml.html import fromstring, tostring, XHTML_NAMESPACE, _nons
try:
set
@@ -62,7 +62,9 @@
"descendant-or-self::*[@style]")
_find_external_links = etree.XPath(
- "descendant-or-self::a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']")
+ ("descendant-or-self::a [normalize-space(@href) and substring(normalize-space(@href),1,1) != '#'] |"
+ "descendant-or-self::x:a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']"),
+ namespaces={'x':XHTML_NAMESPACE})
class Cleaner(object):
"""
@@ -201,6 +203,11 @@
if hasattr(doc, 'getroot'):
# ElementTree instance, instead of an element
doc = doc.getroot()
+ # convert XHTML to HTML
+ for el in doc.iter():
+ tag = el.tag
+ if isinstance(tag, basestring):
+ el.tag = _nons(tag)
# Normalize a case that IE treats <image> like <img>, and that
# can confuse either this step or later steps.
for el in doc.iter('image'):
Modified: lxml/trunk/src/lxml/html/formfill.py
==============================================================================
--- lxml/trunk/src/lxml/html/formfill.py (original)
+++ lxml/trunk/src/lxml/html/formfill.py Fri May 2 21:56:34 2008
@@ -1,5 +1,6 @@
from lxml.etree import XPath, ElementBase
-from lxml.html import fromstring, tostring
+from lxml.html import fromstring, tostring, XHTML_NAMESPACE
+from lxml.html import _forms_xpath, _options_xpath, _nons
from lxml.html import defs
__all__ = ['FormNotFound', 'fill_form', 'fill_form_html',
@@ -11,9 +12,11 @@
Raised when no form can be found
"""
-_form_name_xpath = XPath('descendant-or-self::form[name=$name]')
-_input_xpath = XPath('descendant-or-self::input | descendant-or-self::select | descendant-or-self::textarea')
-_label_for_xpath = XPath('//label[@for=$for_id]')
+_form_name_xpath = XPath('descendant-or-self::form[name=$name]|descendant-or-self::x:form[name=$name]', namespaces={'x':XHTML_NAMESPACE})
+_input_xpath = XPath('|'.join(['descendant-or-self::'+_tag for _tag in ('input','select','textarea','x:input','x:select','x:textarea')]),
+ namespaces={'x':XHTML_NAMESPACE})
+_label_for_xpath = XPath('//label[@for=$for_id]|//x:label[@for=$for_id]',
+ namespaces={'x':XHTML_NAMESPACE})
_name_xpath = XPath('descendant-or-self::*[@name=$name]')
def fill_form(
@@ -69,7 +72,7 @@
_fill_single(input, value)
def _takes_multiple(input):
- if input.tag == 'select' and input.get('multiple'):
+ if _nons(input.tag) == 'select' and input.get('multiple'):
# FIXME: multiple="0"?
return True
type = input.get('type', '').lower()
@@ -96,8 +99,8 @@
v = input.get('value')
_check(input, v in value)
else:
- assert input.tag == 'select'
- for option in input.findall('option'):
+ assert _nons(input.tag) == 'select'
+ for option in _options_xpath(input):
v = option.get('value')
if v is None:
# This seems to be the default, at least on IE
@@ -120,7 +123,7 @@
del el.attrib['selected']
def _fill_single(input, value):
- if input.tag == 'textarea':
+ if _nons(input.tag) == 'textarea':
input.clear()
input.text = value
else:
@@ -128,7 +131,7 @@
def _find_form(el, form_id=None, form_index=None):
if form_id is None and form_index is None:
- forms = el.getiterator('form')
+ forms = _forms_xpath(el)
for form in forms:
return form
raise FormNotFound(
@@ -145,7 +148,7 @@
"No form with the name or id of %r (forms: %s)"
% (id, ', '.join(_find_form_ids(el))))
if form_index is not None:
- forms = el.getiterator('form')
+ forms = _forms_xpath(el)
try:
return forms[form_index]
except IndexError:
@@ -154,7 +157,7 @@
% (form_index, len(forms)))
def _find_form_ids(el):
- forms = el.getiterator('form')
+ forms = _forms_xpath(el)
if not forms:
yield '(no forms)'
return
@@ -254,11 +257,11 @@
return doc
def _insert_error(el, error, error_class, error_creator):
- if el.tag in defs.empty_tags or el.tag == 'textarea':
+ if _nons(el.tag) in defs.empty_tags or _nons(el.tag) == 'textarea':
is_block = False
else:
is_block = True
- if el.tag != 'form' and error_class:
+ if _nons(el.tag) != 'form' and error_class:
_add_class(el, error_class)
if el.get('id'):
labels = _label_for_xpath(el, for_id=el.get('id'))
From scoder at codespeak.net Sat May 3 14:55:01 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 3 May 2008 14:55:01 +0200 (CEST)
Subject: [Lxml-checkins] r54361 - lxml/branch/lxml-2.0/src/lxml/tests
Message-ID: <20080503125501.EFBA32A0152@codespeak.net>
Author: scoder
Date: Sat May 3 14:54:59 2008
New Revision: 54361
Added:
lxml/branch/lxml-2.0/src/lxml/tests/test_threading.py
- copied unchanged from r54360, lxml/trunk/src/lxml/tests/test_threading.py
Log:
copied test suite from trunk
From scoder at codespeak.net Sat May 3 15:42:16 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 3 May 2008 15:42:16 +0200 (CEST)
Subject: [Lxml-checkins] r54363 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080503134216.9574916856A@codespeak.net>
Author: scoder
Date: Sat May 3 15:42:14 2008
New Revision: 54363
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_threading.py
Log:
r4144 at delle: sbehnel | 2008-05-03 14:50:08 +0200
extended threading test cases
Modified: lxml/trunk/src/lxml/tests/test_threading.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_threading.py (original)
+++ lxml/trunk/src/lxml/tests/test_threading.py Sat May 3 15:42:14 2008
@@ -6,13 +6,18 @@
import unittest, threading
-from common_imports import etree, HelperTestCase
+from common_imports import etree, HelperTestCase, StringIO
class ThreadingTestCase(HelperTestCase):
"""Threading tests"""
etree = etree
- def test_subtree_copy(self):
+ def _run_thread(self, func):
+ thread = threading.Thread(target=func)
+ thread.start()
+ thread.join()
+
+ def test_subtree_copy_thread(self):
tostring = self.etree.tostring
XML = self.etree.XML
xml = " "
@@ -23,12 +28,113 @@
main_root.append(thread_root[0])
del thread_root
- thread = threading.Thread(target=run_thread)
- thread.start()
- thread.join()
-
+ self._run_thread(run_thread)
self.assertEquals(xml, tostring(main_root))
+ def test_main_xslt_in_thread(self):
+ XML = self.etree.XML
+ style = XML('''\
+
+
+
+
+ ''')
+ st = etree.XSLT(style)
+
+ result = []
+
+ def run_thread():
+ root = XML('B C ')
+ result.append( st(root) )
+
+ self._run_thread(run_thread)
+ self.assertEquals('''\
+
+B
+''',
+ str(result[0]))
+
+ def test_thread_xslt(self):
+ XML = self.etree.XML
+ tostring = self.etree.tostring
+ root = XML('B C ')
+
+ def run_thread():
+ style = XML('''\
+
+
+
+
+ ''')
+ st = etree.XSLT(style)
+ root.append( st(root).getroot() )
+
+ self._run_thread(run_thread)
+ self.assertEquals('B C B ',
+ tostring(root))
+
+ def test_thread_mix(self):
+ XML = self.etree.XML
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ tostring = self.etree.tostring
+ xml = 'B C '
+ root = XML(xml)
+
+ result = self.etree.Element("{myns}root", att = "someval")
+
+ def run_XML():
+ thread_root = XML(xml)
+ result.append(thread_root[0])
+ result.append(thread_root[-1])
+
+ def run_parse():
+ thread_root = self.etree.parse(StringIO(xml)).getroot()
+ result.append(thread_root[0])
+ result.append(thread_root[-1])
+
+ def run_foreign_XML():
+ thread_root = XML(" ")
+ result.append(thread_root[0])
+
+ def run_build():
+ result.append(
+ Element("{myns}foo", attrib={'{test}attr':'val'}))
+ SubElement(result, "{otherns}tasty")
+
+ def run_xslt():
+ style = XML('''\
+
+
+
+
+ ''')
+ st = etree.XSLT(style)
+ result.append( st(root).getroot()[0] )
+
+ for test in (run_XML, run_parse, run_foreign_XML, run_xslt):
+ tostring(result)
+ self._run_thread(test)
+
+ self.assertEquals(
+ 'B C B C B ',
+ tostring(result))
+
+ def strip_first():
+ root = Element("newroot")
+ root.append(result[0])
+
+ while len(result):
+ self._run_thread(strip_first)
+
+ self.assertEquals(
+ ' ',
+ tostring(result))
+
+
def test_suite():
suite = unittest.TestSuite()
suite.addTests([unittest.makeSuite(ThreadingTestCase)])
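
The `_run_thread` helper added in this revision starts a thread and joins it before the assertions run. Below is a minimal standalone sketch of that pattern. It assumes the standard library's `xml.etree.ElementTree` as a stand-in for `lxml.etree`, and `run_in_thread`, `worker` and the sample documents are hypothetical names and data, not part of the test suite:

```python
import threading
import xml.etree.ElementTree as ET


def run_in_thread(func):
    """Run func in a separate thread and block until it has finished."""
    thread = threading.Thread(target=func)
    thread.start()
    thread.join()


# Tree owned by the main thread.
main_root = ET.fromstring("<root><a/></root>")


def worker():
    # Parse a second document inside the worker thread and append one of
    # its children to the main tree. lxml would move the element out of
    # its old parent; the stdlib ElementTree only adds a second reference.
    thread_root = ET.fromstring("<threadroot><b>text</b></threadroot>")
    main_root.append(thread_root[0])


run_in_thread(worker)
child_tags = [child.tag for child in main_root]
```

Joining before inspecting the tree is what makes the later assertions deterministic: the main thread only reads `main_root` after the worker has finished.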
From scoder at codespeak.net Sat May 3 15:42:21 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 3 May 2008 15:42:21 +0200 (CEST)
Subject: [Lxml-checkins] r54364 - in lxml/trunk: . src/lxml
Message-ID: <20080503134221.B5B0B16856B@codespeak.net>
Author: scoder
Date: Sat May 3 15:42:20 2008
New Revision: 54364
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/xslt.pxi
Log:
r4145 at delle: sbehnel | 2008-05-03 14:50:52 +0200
rewrite of XSLT threading work around: fix the result document, not the stylesheet
Modified: lxml/trunk/src/lxml/xslt.pxi
==============================================================================
--- lxml/trunk/src/lxml/xslt.pxi (original)
+++ lxml/trunk/src/lxml/xslt.pxi Sat May 3 15:42:20 2008
@@ -451,11 +451,7 @@
cdef xslt.xsltTransformContext* transform_ctxt
cdef xmlDoc* c_result
cdef xmlDoc* c_doc
-
- if not _checkThreadDict(self._c_style.doc.dict):
- if profile_run is not False:
- _kw['profile_run'] = profile_run
- return _copyXSLT(self)(_input, **_kw)
+ cdef xmlNode* c_node
input_doc = _documentOrRaise(_input)
root_node = _rootNodeOrRaise(_input)
@@ -530,6 +526,14 @@
resolver_context.clear()
result_doc = _documentFactory(c_result, input_doc._parser)
+
+ if not _checkThreadDict(c_result.dict):
+ # fix document dictionary
+ c_node = _findChildForwards(c_result, 0)
+ if c_node is not NULL:
+ __GLOBAL_PARSER_CONTEXT.initThreadDictRef(&c_result.dict)
+ moveNodeToDocument(result_doc, self._c_style.doc, c_node)
+
return _xsltResultTreeFactory(result_doc, self, profile_doc)
cdef xmlDoc* _run_transform(self, xmlDoc* c_input_doc,
From scoder at codespeak.net Sat May 3 15:42:27 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 3 May 2008 15:42:27 +0200 (CEST)
Subject: [Lxml-checkins] r54365 - in lxml/trunk: . src/lxml/html
src/lxml/html/tests
Message-ID: <20080503134227.9614D168569@codespeak.net>
Author: scoder
Date: Sat May 3 15:42:27 2008
New Revision: 54365
Added:
lxml/trunk/src/lxml/html/tests/test_xhtml.py
lxml/trunk/src/lxml/html/tests/test_xhtml.txt
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/html/__init__.py
lxml/trunk/src/lxml/html/tests/test_basic.txt
Log:
r4146 at delle: sbehnel | 2008-05-03 15:40:45 +0200
conversion functions HTML<->XHTML in lxml.html
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sat May 3 15:42:27 2008
@@ -8,6 +8,9 @@
Features added
--------------
+* Conversion functions ``html_to_xhtml()`` and ``xhtml_to_html()`` in
+ lxml.html.
+
* Most features in lxml.html work for XHTML namespaced tag names.
Bugs fixed
Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Sat May 3 15:42:27 2008
@@ -1301,6 +1301,34 @@
## Serialization
############################################################
+def html_to_xhtml(html):
+    """Convert all tags in an HTML tree to XHTML by moving them to the
+    XHTML namespace.
+    """
+    try:
+        html = html.getroot()
+    except AttributeError:
+        pass
+    prefix = "{%s}" % XHTML_NAMESPACE
+    for el in html.iter():
+        tag = el.tag
+        if isinstance(tag, basestring):
+            if tag[0] != '{':
+                el.tag = prefix + tag
+
+def xhtml_to_html(xhtml):
+    """Convert all tags in an XHTML tree to HTML by removing their
+    XHTML namespace.
+    """
+    try:
+        xhtml = xhtml.getroot()
+    except AttributeError:
+        pass
+    prefix = "{%s}" % XHTML_NAMESPACE
+    prefix_len = len(prefix)
+    for el in xhtml.iter(prefix + "*"):
+        el.tag = el.tag[prefix_len:]
+
# This isn't a general match, but it's a match for what libxml2
# specifically serialises:
__replace_meta_content_type = re.compile(
Modified: lxml/trunk/src/lxml/html/tests/test_basic.txt
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_basic.txt (original)
+++ lxml/trunk/src/lxml/html/tests/test_basic.txt Sat May 3 15:42:27 2008
@@ -96,16 +96,3 @@
footer
-
-lxml.html has two parsers, one for HTML, one for XHTML:
-
- >>> from lxml.html import HTMLParser, XHTMLParser
- >>> html = "Hi!
"
-
- >>> root = document_fromstring(html, parser=HTMLParser())
- >>> print root.tag
- html
-
- >>> root = document_fromstring(html, parser=XHTMLParser())
- >>> print root.tag
- html
Added: lxml/trunk/src/lxml/html/tests/test_xhtml.py
==============================================================================
--- (empty file)
+++ lxml/trunk/src/lxml/html/tests/test_xhtml.py Sat May 3 15:42:27 2008
@@ -0,0 +1,11 @@
+import unittest, sys
+from lxml.tests.common_imports import doctest
+import lxml.html
+
+def test_suite():
+    suite = unittest.TestSuite()
+    suite.addTests([doctest.DocFileSuite('test_xhtml.txt')])
+    return suite
+
+if __name__ == '__main__':
+    unittest.main()
Added: lxml/trunk/src/lxml/html/tests/test_xhtml.txt
==============================================================================
--- (empty file)
+++ lxml/trunk/src/lxml/html/tests/test_xhtml.txt Sat May 3 15:42:27 2008
@@ -0,0 +1,30 @@
+ >>> from lxml.html import document_fromstring, fragment_fromstring, tostring
+
+lxml.html has two parsers, one for HTML, one for XHTML:
+
+ >>> from lxml.html import HTMLParser, XHTMLParser
+ >>> html = "Hi!
"
+
+ >>> root = document_fromstring(html, parser=HTMLParser())
+ >>> print root.tag
+ html
+
+ >>> root = document_fromstring(html, parser=XHTMLParser())
+ >>> print root.tag
+ html
+
+There are two functions for converting between HTML and XHTML:
+
+ >>> from lxml.html import xhtml_to_html, html_to_xhtml
+
+ >>> doc = document_fromstring(html, parser=HTMLParser())
+ >>> print tostring(doc)
+ Hi!
+
+ >>> html_to_xhtml(doc)
+ >>> print tostring(doc)
+ Hi!
+
+ >>> xhtml_to_html(doc)
+ >>> print tostring(doc)
+ Hi!
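
The two conversion functions in this revision only rewrite tag names: one prepends the XHTML namespace, the other strips it. The same idea can be sketched with the standard library alone. This is an illustration under assumptions, not the lxml implementation: `to_xhtml` and `to_html` are hypothetical names, `xml.etree.ElementTree` replaces `lxml.html`, and the sample document is made up:

```python
import xml.etree.ElementTree as ET

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def to_xhtml(root):
    """Move every un-namespaced element into the XHTML namespace."""
    prefix = "{%s}" % XHTML_NAMESPACE
    for el in root.iter():
        tag = el.tag
        # Comments and processing instructions have non-string tags.
        if isinstance(tag, str) and not tag.startswith("{"):
            el.tag = prefix + tag


def to_html(root):
    """Strip the XHTML namespace from every element that carries it."""
    prefix = "{%s}" % XHTML_NAMESPACE
    for el in root.iter():
        tag = el.tag
        if isinstance(tag, str) and tag.startswith(prefix):
            el.tag = tag[len(prefix):]


doc = ET.fromstring("<html><body><p>Hi!</p></body></html>")
to_xhtml(doc)
xhtml_tags = [el.tag for el in doc.iter()]
to_html(doc)
html_tags = [el.tag for el in doc.iter()]
```

Both functions modify the tree in place and return nothing, which matches how the doctest above calls `html_to_xhtml(doc)` and then serialises the same `doc`.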
From scoder at codespeak.net Sun May 4 12:03:24 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 4 May 2008 12:03:24 +0200 (CEST)
Subject: [Lxml-checkins] r54387 - in lxml/trunk: . doc
Message-ID: <20080504100324.D5BE1169EDE@codespeak.net>
Author: scoder
Date: Sun May 4 12:03:23 2008
New Revision: 54387
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/tutorial.txt
Log:
r4151 at delle: sbehnel | 2008-05-04 12:01:52 +0200
tutorial: make clear you have to clean up the iterparse() tree yourself
Modified: lxml/trunk/doc/tutorial.txt
==============================================================================
--- lxml/trunk/doc/tutorial.txt (original)
+++ lxml/trunk/doc/tutorial.txt Sun May 4 12:03:23 2008
@@ -861,8 +861,28 @@
Note that the text, tail and children of an Element are not necessarily there
yet when receiving the ``start`` event. Only the ``end`` event guarantees
-that the Element has been parsed completely. It also allows to ``clear()`` or
-modify the content of an Element to save memory.
+that the Element has been parsed completely.
+
+It also allows to ``.clear()`` or modify the content of an Element to
+save memory. So if you parse a large tree and you want to keep memory
+usage small, you should clean up parts of the tree that you no longer
+need:
+
+.. sourcecode:: pycon
+
+  >>> some_file_like = StringIO(
+  ...     "<root><a><b>data</b></a><a><b/></a></root>")
+
+  >>> for event, element in etree.iterparse(some_file_like):
+  ...     if element.tag == 'b':
+  ...         print element.text
+  ...     elif element.tag == 'a':
+  ...         print "** cleaning up the subtree"
+  ...         element.clear()
+  data
+  ** cleaning up the subtree
+  None
+  ** cleaning up the subtree
If memory is a real bottleneck, or if building the tree is not desired at all,
the target parser interface of ``lxml.etree`` can be used. It creates
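
The tutorial change above relies on the fact that an element is complete once its `end` event arrives, so it can be cleared at that point. A minimal Python 3 sketch of the same loop follows. It assumes the standard library's `xml.etree.ElementTree.iterparse` instead of `lxml.etree.iterparse`, and the input document is an assumption chosen to match the output printed in the doctest:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: two <a> subtrees, each holding one <b> element.
source = io.BytesIO(b"<root><a><b>data</b></a><a><b/></a></root>")

seen = []
for event, element in ET.iterparse(source):  # default: "end" events only
    if element.tag == "b":
        seen.append(element.text)
    elif element.tag == "a":
        # The subtree is complete at its "end" event, so its text and
        # children can be dropped to keep memory usage small.
        element.clear()
        seen.append("cleared")
```

Clearing happens on the enclosing `a` element rather than on `b`, so the content of `b` is still available when its own event is handled.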
From scoder at codespeak.net Sun May 4 12:04:16 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 4 May 2008 12:04:16 +0200 (CEST)
Subject: [Lxml-checkins] r54388 - lxml/branch/lxml-2.0/doc
Message-ID: <20080504100416.B9410169EE1@codespeak.net>
Author: scoder
Date: Sun May 4 12:04:15 2008
New Revision: 54388
Modified:
lxml/branch/lxml-2.0/doc/tutorial.txt
Log:
tutorial: make clear you have to clean up the iterparse() tree yourself
Modified: lxml/branch/lxml-2.0/doc/tutorial.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/tutorial.txt (original)
+++ lxml/branch/lxml-2.0/doc/tutorial.txt Sun May 4 12:04:15 2008
@@ -750,8 +750,28 @@
Note that the text, tail and children of an Element are not necessarily there
yet when receiving the ``start`` event. Only the ``end`` event guarantees
-that the Element has been parsed completely. It also allows to ``clear()`` or
-modify the content of an Element to save memory.
+that the Element has been parsed completely.
+
+It also allows to ``.clear()`` or modify the content of an Element to
+save memory. So if you parse a large tree and you want to keep memory
+usage small, you should clean up parts of the tree that you no longer
+need:
+
+.. sourcecode:: pycon
+
+  >>> some_file_like = StringIO(
+  ...     "<root><a><b>data</b></a><a><b/></a></root>")
+
+  >>> for event, element in etree.iterparse(some_file_like):
+  ...     if element.tag == 'b':
+  ...         print element.text
+  ...     elif element.tag == 'a':
+  ...         print "** cleaning up the subtree"
+  ...         element.clear()
+  data
+  ** cleaning up the subtree
+  None
+  ** cleaning up the subtree
If memory is a real bottleneck, or if building the tree is not desired at all,
the target parser interface of ``lxml.etree`` can be used. It creates
From scoder at codespeak.net Sun May 4 12:11:39 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 4 May 2008 12:11:39 +0200 (CEST)
Subject: [Lxml-checkins] r54389 - lxml/branch/lxml-2.0/doc
Message-ID: <20080504101139.CE5DB169EEF@codespeak.net>
Author: scoder
Date: Sun May 4 12:11:39 2008
New Revision: 54389
Modified:
lxml/branch/lxml-2.0/doc/tutorial.txt
Log:
rst fix
Modified: lxml/branch/lxml-2.0/doc/tutorial.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/tutorial.txt (original)
+++ lxml/branch/lxml-2.0/doc/tutorial.txt Sun May 4 12:11:39 2008
@@ -755,9 +755,7 @@
It also allows to ``.clear()`` or modify the content of an Element to
save memory. So if you parse a large tree and you want to keep memory
usage small, you should clean up parts of the tree that you no longer
-need:
-
-.. sourcecode:: pycon
+need::
>>> some_file_like = StringIO(
... "<root><a><b>data</b></a><a><b/></a></root>")
From scoder at codespeak.net Sun May 4 20:24:43 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 4 May 2008 20:24:43 +0200 (CEST)
Subject: [Lxml-checkins] r54418 - lxml/trunk
Message-ID: <20080504182443.1A62B2A808E@codespeak.net>
Author: scoder
Date: Sun May 4 20:24:42 2008
New Revision: 54418
Modified:
lxml/trunk/ (props changed)
lxml/trunk/Makefile
Log:
r4153 at delle: sbehnel | 2008-05-04 20:21:22 +0200
keep private classes in the API docs as they include _Element etc.
Modified: lxml/trunk/Makefile
==============================================================================
--- lxml/trunk/Makefile (original)
+++ lxml/trunk/Makefile Sun May 4 20:24:42 2008
@@ -46,7 +46,7 @@
@[ -x "`which epydoc`" ] \
&& (cd src && echo "Generating API docs ..." && \
PYTHONPATH=. epydoc -v --docformat "restructuredtext en" \
- -o ../doc/html/api --no-private --exclude='[.]html[.]tests|[.]_' \
+ -o ../doc/html/api --exclude='[.]html[.]tests|[.]_' \
--exclude-introspect='[.]usedoctest' \
--name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
|| (echo "not generating epydoc API documentation")
@@ -60,7 +60,7 @@
@[ -x "`which epydoc`" ] \
&& (cd src && echo "Generating API docs ..." && \
PYTHONPATH=. epydoc -v --latex --docformat "restructuredtext en" \
- -o ../doc/pdf --no-private --exclude='([.]html)?[.]tests|[.]_' \
+ -o ../doc/pdf --exclude='([.]html)?[.]tests|[.]_' \
--exclude-introspect='html[.]clean|[.]usedoctest' \
--name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
|| (echo "not generating epydoc API documentation")
From scoder at codespeak.net Sun May 4 20:24:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 4 May 2008 20:24:48 +0200 (CEST)
Subject: [Lxml-checkins] r54419 - in lxml/trunk: . doc
Message-ID: <20080504182448.35BA02A808E@codespeak.net>
Author: scoder
Date: Sun May 4 20:24:47 2008
New Revision: 54419
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/tutorial.txt
Log:
r4154 at delle: sbehnel | 2008-05-04 20:23:11 +0200
simplified doctest
Modified: lxml/trunk/doc/tutorial.txt
==============================================================================
--- lxml/trunk/doc/tutorial.txt (original)
+++ lxml/trunk/doc/tutorial.txt Sun May 4 20:24:47 2008
@@ -774,12 +774,12 @@
.. sourcecode:: pycon
>>> class DataSource:
- ... data = iter(["<roo", "t><", "a/", "><", "/root>"])
+ ... data = ["<roo", "t><", "a/", "><", "/root>"]
... def read(self, requested_size):
... try:
- ... return self.data.next()
- ... except StopIteration:
- ... return ""
+ ... return self.data.pop(0)
+ ... except IndexError:
+ ... return ''
>>> tree = etree.parse(DataSource())
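
The simplified doctest feeds the parser from an object whose `read()` method hands out one chunk per call and an empty string once the data is exhausted. A Python 3 sketch of the same idea is given below, assuming the standard library's `xml.etree.ElementTree.parse` in place of `etree.parse`. The chunk boundaries are illustrative, and the chunks live on the instance (an assumption that differs from the class attribute in the doctest) so that each source starts with fresh data:

```python
import xml.etree.ElementTree as ET


class DataSource:
    """File-like object that returns the document in small pieces."""

    def __init__(self):
        # Chunks deliberately split tags to show incremental parsing.
        self.chunks = [b"<roo", b"t><", b"a/", b"><", b"/root>"]

    def read(self, requested_size):
        # The parser passes a size hint; returning less is allowed.
        try:
            return self.chunks.pop(0)
        except IndexError:
            return b""  # empty result signals end of input


tree = ET.parse(DataSource())
root = tree.getroot()
```

Using `pop(0)` with `IndexError` instead of an iterator with `StopIteration` keeps the example free of the Python 2 only `.next()` method.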
From scoder at codespeak.net Mon May 5 21:31:00 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 5 May 2008 21:31:00 +0200 (CEST)
Subject: [Lxml-checkins] r54445 - lxml/branch/lxml-2.0/src/lxml
Message-ID: <20080505193100.4679016840D@codespeak.net>
Author: scoder
Date: Mon May 5 21:30:57 2008
New Revision: 54445
Modified:
lxml/branch/lxml-2.0/src/lxml/proxy.pxi
Log:
merged ns copy optimisation from trunk
Modified: lxml/branch/lxml-2.0/src/lxml/proxy.pxi
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/proxy.pxi (original)
+++ lxml/branch/lxml-2.0/src/lxml/proxy.pxi Mon May 5 21:30:57 2008
@@ -191,10 +191,8 @@
c_parent.type == tree.XML_DOCUMENT_NODE):
c_new_ns = c_parent.nsDef
while c_new_ns is not NULL:
- # check if prefix is already defined
- c_ns = tree.xmlSearchNs(c_to_node.doc, c_to_node, c_new_ns.prefix)
- if c_ns is NULL:
- tree.xmlNewNs(c_to_node, c_new_ns.href, c_new_ns.prefix)
+ # libxml2 will check if the prefix is already defined
+ tree.xmlNewNs(c_to_node, c_new_ns.href, c_new_ns.prefix)
c_new_ns = c_new_ns.next
c_parent = c_parent.parent
From scoder at codespeak.net Mon May 5 21:42:54 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 5 May 2008 21:42:54 +0200 (CEST)
Subject: [Lxml-checkins] r54446 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080505194254.F1FD4168448@codespeak.net>
Author: scoder
Date: Mon May 5 21:42:52 2008
New Revision: 54446
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_threading.py
Log:
r4159 at delle: sbehnel | 2008-05-05 08:10:24 +0200
test cleanup
Modified: lxml/trunk/src/lxml/tests/test_threading.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_threading.py (original)
+++ lxml/trunk/src/lxml/tests/test_threading.py Mon May 5 21:42:52 2008
@@ -82,6 +82,7 @@
tostring = self.etree.tostring
xml = 'B C '
root = XML(xml)
+ fragment = XML(" ")
result = self.etree.Element("{myns}root", att = "someval")
@@ -95,9 +96,8 @@
result.append(thread_root[0])
result.append(thread_root[-1])
- def run_foreign_XML():
- thread_root = XML(" ")
- result.append(thread_root[0])
+ def run_move_main():
+ result.append(fragment[0])
def run_build():
result.append(
@@ -115,7 +115,7 @@
st = etree.XSLT(style)
result.append( st(root).getroot()[0] )
- for test in (run_XML, run_parse, run_foreign_XML, run_xslt):
+ for test in (run_XML, run_parse, run_move_main, run_xslt):
tostring(result)
self._run_thread(test)
From scoder at codespeak.net Mon May 5 21:43:01 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 5 May 2008 21:43:01 +0200 (CEST)
Subject: [Lxml-checkins] r54447 - in lxml/trunk: . src/lxml
Message-ID: <20080505194301.8985616844B@codespeak.net>
Author: scoder
Date: Mon May 5 21:43:00 2008
New Revision: 54447
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tree.pxd
Log:
r4160 at delle: sbehnel | 2008-05-05 09:12:11 +0200
declare xmlDictExists()
Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Mon May 5 21:43:00 2008
@@ -56,6 +56,7 @@
# libxml/dict.h appears to be broken to include in C
ctypedef struct xmlDict
cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) nogil
+ cdef char* xmlDictExists(xmlDict* dict, char* name, int len) nogil
cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
From scoder at codespeak.net Mon May 5 21:43:10 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 5 May 2008 21:43:10 +0200 (CEST)
Subject: [Lxml-checkins] r54448 - in lxml/trunk: . src/lxml
Message-ID: <20080505194310.4C4FF168448@codespeak.net>
Author: scoder
Date: Mon May 5 21:43:09 2008
New Revision: 54448
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/apihelpers.pxi
lxml/trunk/src/lxml/lxml.objectify.pyx
lxml/trunk/src/lxml/objectpath.pxi
Log:
r4161 at delle: sbehnel | 2008-05-05 09:54:55 +0200
special node matcher for objectify, exploits the fact that all node names come from the document dictionary
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Mon May 5 21:43:09 2008
@@ -19,6 +19,9 @@
Other changes
-------------
+* Up to several times faster attribute access (i.e. tree traversal) in
+ lxml.objectify.
+
2.1beta2 (2008-05-02)
=====================
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Mon May 5 21:43:09 2008
@@ -696,8 +696,8 @@
elif c_href is NULL:
if _getNs(c_node) is not NULL:
return 0
- return cstd.strcmp(c_node.name, c_name) == 0
- elif cstd.strcmp(c_node.name, c_name) == 0:
+ return c_node.name == c_name or cstd.strcmp(c_node.name, c_name) == 0
+ elif c_node.name == c_name or cstd.strcmp(c_node.name, c_name) == 0:
c_node_href = _getNs(c_node)
if c_node_href is NULL:
return c_href[0] == c'\0'
Modified: lxml/trunk/src/lxml/lxml.objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.objectify.pyx (original)
+++ lxml/trunk/src/lxml/lxml.objectify.pyx Mon May 5 21:43:09 2008
@@ -385,6 +385,17 @@
prefix = '.'.join(prefix)
return _buildDescendantPaths(self._c_node, prefix)
+cdef inline bint _tagMatches(tree.xmlNode* c_node, char* c_href, char* c_name):
+    cdef char* c_node_href
+    if c_node.name != c_name:
+        return 0
+    if c_href == NULL:
+        return 1
+    c_node_href = tree._getNs(c_node)
+    if c_node_href == NULL:
+        return c_href[0] == c'\0'
+    return cstd.strcmp(c_node_href, c_href) == 0
+
cdef Py_ssize_t _countSiblings(tree.xmlNode* c_start_node):
cdef tree.xmlNode* c_node
cdef char* c_href
@@ -396,13 +407,13 @@
c_node = c_start_node.next
while c_node is not NULL:
if c_node.type == tree.XML_ELEMENT_NODE and \
- cetree.tagMatches(c_node, c_href, c_tag):
+ _tagMatches(c_node, c_href, c_tag):
count = count + 1
c_node = c_node.next
c_node = c_start_node.prev
while c_node is not NULL:
if c_node.type == tree.XML_ELEMENT_NODE and \
- cetree.tagMatches(c_node, c_href, c_tag):
+ _tagMatches(c_node, c_href, c_tag):
count = count + 1
c_node = c_node.prev
return count
@@ -418,7 +429,7 @@
next = cetree.previousElement
while c_node is not NULL:
if c_node.type == tree.XML_ELEMENT_NODE and \
- cetree.tagMatches(c_node, href, name):
+ _tagMatches(c_node, href, name):
index = index - 1
if index < 0:
return c_node
@@ -430,9 +441,12 @@
cdef tree.xmlNode* c_node
cdef char* c_href
cdef char* c_tag
- ns, tag = cetree.getNsTag(tag)
- c_tag = _cstr(tag)
c_node = parent._c_node
+ ns, tag = cetree.getNsTag(tag)
+ c_tag = tree.xmlDictExists(
+ c_node.doc.dict, _cstr(tag), python.PyString_GET_SIZE(tag))
+ if c_tag is NULL:
+ return None
if ns is None:
c_href = tree._getNs(c_node)
else:
Modified: lxml/trunk/src/lxml/objectpath.pxi
==============================================================================
--- lxml/trunk/src/lxml/objectpath.pxi (original)
+++ lxml/trunk/src/lxml/objectpath.pxi Mon May 5 21:43:09 2008
@@ -206,7 +206,11 @@
c_path = c_path + 1
if c_path[0].href is not NULL:
c_href = c_path[0].href # otherwise: keep parent namespace
- c_name = c_path[0].name
+ c_name = tree.xmlDictExists(c_node.doc.dict, c_path[0].name, -1)
+ if c_name is NULL:
+ c_name = c_path[0].name
+ c_node = NULL
+ break
c_index = c_path[0].index
if c_index < 0:
@@ -253,14 +257,17 @@
c_path = c_path + 1
if c_path[0].href is not NULL:
c_href = c_path[0].href # otherwise: keep parent namespace
- c_name = c_path[0].name
c_index = c_path[0].index
-
- if c_index < 0:
- c_child = c_node.last
+ c_name = tree.xmlDictExists(c_node.doc.dict, c_path[0].name, -1)
+ if c_name is NULL:
+ c_name = c_path[0].name
+ c_child = NULL
else:
- c_child = c_node.children
- c_child = _findFollowingSibling(c_child, c_href, c_name, c_index)
+ if c_index < 0:
+ c_child = c_node.last
+ else:
+ c_child = c_node.children
+ c_child = _findFollowingSibling(c_child, c_href, c_name, c_index)
if c_child is not NULL:
c_node = c_child
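
The optimisation in this revision rests on two properties of a document dictionary: every node name is stored once, so names can be compared by identity, and a name that is absent from the dictionary cannot occur in the document. The toy model below illustrates both points in plain Python. `NameDict`, `find_children` and the sample tags are hypothetical and only mimic the roles of `xmlDictLookup()` and `xmlDictExists()`; this is not the libxml2 API:

```python
class NameDict:
    """Toy string-interning table (illustration only)."""

    def __init__(self):
        self._names = {}

    def lookup(self, name):
        # Return the canonical object for name, adding it if missing.
        return self._names.setdefault(name, name)

    def exists(self, name):
        # Return the canonical object, or None if it was never interned.
        return self._names.get(name)


def find_children(names, child_tags, wanted):
    canonical = names.exists(wanted)
    if canonical is None:
        # Not in the dictionary, so no node can carry this name.
        return []
    # An identity check replaces a character-by-character comparison.
    return [i for i, tag in enumerate(child_tags) if tag is canonical]


names = NameDict()
tags = [names.lookup(t) for t in ("item", "other", "item")]

# Build the query string at runtime so it is a distinct object.
query = "".join(["it", "em"])
matches = find_children(names, tags, query)
misses = find_children(names, tags, "missing")
```

The early return for unknown names corresponds to the `c_tag is NULL` and `c_name is NULL` branches in the diff, which skip the sibling scan entirely.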
From scoder at codespeak.net Mon May 5 21:43:17 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 5 May 2008 21:43:17 +0200 (CEST)
Subject: [Lxml-checkins] r54449 - in lxml/trunk: . src/lxml
Message-ID: <20080505194317.0CE6316844B@codespeak.net>
Author: scoder
Date: Mon May 5 21:43:16 2008
New Revision: 54449
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/proxy.pxi
Log:
r4162 at delle: sbehnel | 2008-05-05 17:43:28 +0200
cleanup in _copyParentNamespaces()
Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Mon May 5 21:43:16 2008
@@ -203,10 +203,8 @@
c_parent.type == tree.XML_DOCUMENT_NODE):
c_new_ns = c_parent.nsDef
while c_new_ns is not NULL:
- # check if prefix is already defined
- c_ns = tree.xmlSearchNs(c_to_node.doc, c_to_node, c_new_ns.prefix)
- if c_ns is NULL:
- tree.xmlNewNs(c_to_node, c_new_ns.href, c_new_ns.prefix)
+ # libxml2 will check if the prefix is already defined
+ tree.xmlNewNs(c_to_node, c_new_ns.href, c_new_ns.prefix)
c_new_ns = c_new_ns.next
c_parent = c_parent.parent
From scoder at codespeak.net Tue May 6 20:39:49 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:39:49 +0200 (CEST)
Subject: [Lxml-checkins] r54488 - in lxml/trunk: . src/lxml
Message-ID: <20080506183949.4C552168410@codespeak.net>
Author: scoder
Date: Tue May 6 20:39:46 2008
New Revision: 54488
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/parser.pxi
Log:
r4168 at delle: sbehnel | 2008-05-05 23:41:19 +0200
fix HTML names to always come from the dictionary
Modified: lxml/trunk/src/lxml/parser.pxi
==============================================================================
--- lxml/trunk/src/lxml/parser.pxi (original)
+++ lxml/trunk/src/lxml/parser.pxi Tue May 6 20:39:46 2008
@@ -520,6 +520,8 @@
if well_formed:
__GLOBAL_PARSER_CONTEXT.initDocDict(result)
+ if c_ctxt.html:
+ _fixHtmlDictNames(result)
else:
# free broken document
tree.xmlFreeDoc(result)
@@ -540,6 +542,31 @@
result.URL = tree.xmlStrdup(_cstr(filename))
return result
+cdef int _fixHtmlDictNames(xmlDoc* c_doc) except -1:
+    cdef char* c_name
+    cdef xmlNode* c_attr
+    cdef xmlNode* c_node = c_doc.children
+    tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_doc, c_node, 0)
+    if c_node.type == tree.XML_ELEMENT_NODE:
+        if not tree.xmlDictOwns(c_doc.dict, c_node.name):
+            c_name = tree.xmlDictLookup(c_doc.dict, c_node.name, -1)
+            if c_name is NULL:
+                python.PyErr_NoMemory()
+                return -1
+            tree.xmlFree(c_node.name)
+            c_node.name = c_name
+        c_attr = c_node.properties
+        while c_attr is not NULL:
+            if not tree.xmlDictOwns(c_doc.dict, c_attr.name):
+                c_name = tree.xmlDictLookup(c_doc.dict, c_attr.name, -1)
+                if c_name is NULL:
+                    python.PyErr_NoMemory()
+                    return -1
+                tree.xmlFree(c_attr.name)
+                c_attr.name = c_name
+            c_attr = c_attr.next
+    tree.END_FOR_EACH_ELEMENT_FROM(c_node)
+
cdef class _BaseParser:
cdef ElementClassLookup _class_lookup
From scoder at codespeak.net Tue May 6 20:39:54 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:39:54 +0200 (CEST)
Subject: [Lxml-checkins] r54489 - in lxml/trunk: . src/lxml
Message-ID: <20080506183954.F2BAD168413@codespeak.net>
Author: scoder
Date: Tue May 6 20:39:54 2008
New Revision: 54489
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/lxml.etree.pyx
lxml/trunk/src/lxml/tree.pxd
Log:
r4169 at delle: sbehnel | 2008-05-05 23:43:08 +0200
some cleanup, optimise tree iterator to compare name pointers instead of strings
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Tue May 6 20:39:54 2008
@@ -2036,10 +2036,10 @@
def __next__(self):
cdef xmlNode* c_node
cdef _Element current_node
+ if self._node is None:
+ raise StopIteration
# Python ref:
current_node = self._node
- if current_node is None:
- raise StopIteration
self._storeNext(current_node)
return current_node
@@ -2131,9 +2131,9 @@
def __next__(self):
cdef xmlNode* c_node
cdef _Element current_node
- current_node = self._next_node
- if current_node is None:
+ if self._next_node is None:
raise StopIteration
+ current_node = self._next_node
c_node = self._next_node._c_node
if self._name is NULL and self._href is NULL:
c_node = self._nextNodeAnyTag(c_node)
@@ -2153,9 +2153,16 @@
return NULL
cdef xmlNode* _nextNodeMatchTag(self, xmlNode* c_node):
+ cdef char* c_name = NULL
+ if self._name is not NULL:
+ c_name = tree.xmlDictExists(c_node.doc.dict, self._name, -1)
+ if c_name is NULL:
+ # not found in dict => not in document at all
+ return NULL
tree.BEGIN_FOR_EACH_ELEMENT_FROM(self._top_node._c_node, c_node, 0)
if c_node.type == tree.XML_ELEMENT_NODE:
- if _tagMatches(c_node, self._href, self._name):
+ if (c_name is NULL or c_name is c_node.name) and \
+ _tagMatches(c_node, self._href, self._name):
return c_node
tree.END_FOR_EACH_ELEMENT_FROM(c_node)
return NULL
Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Tue May 6 20:39:54 2008
@@ -57,6 +57,7 @@
ctypedef struct xmlDict
cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) nogil
cdef char* xmlDictExists(xmlDict* dict, char* name, int len) nogil
+ cdef int xmlDictOwns(xmlDict* dict, char* name) nogil
cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
From scoder at codespeak.net Tue May 6 20:39:56 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:39:56 +0200 (CEST)
Subject: [Lxml-checkins] r54490 - lxml/branch/lxml-2.0/src/lxml
Message-ID: <20080506183956.E8E91168414@codespeak.net>
Author: scoder
Date: Tue May 6 20:39:56 2008
New Revision: 54490
Modified:
lxml/branch/lxml-2.0/src/lxml/serializer.pxi
Log:
removed superfluous function call (bug 227259)
Modified: lxml/branch/lxml-2.0/src/lxml/serializer.pxi
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/serializer.pxi (original)
+++ lxml/branch/lxml-2.0/src/lxml/serializer.pxi Tue May 6 20:39:56 2008
@@ -360,7 +360,6 @@
write_xml_declaration, write_doctype,
pretty_print, with_tail)
tree.xmlOutputBufferClose(c_buffer)
- tree.xmlCharEncCloseFunc(enchandler)
if writer is None:
python.PyEval_RestoreThread(state)
else:
From scoder at codespeak.net Tue May 6 20:40:02 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:40:02 +0200 (CEST)
Subject: [Lxml-checkins] r54491 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080506184002.72BA9168415@codespeak.net>
Author: scoder
Date: Tue May 6 20:40:01 2008
New Revision: 54491
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_io.py
Log:
r4170 at delle: sbehnel | 2008-05-06 18:38:42 +0200
new test case for encoded I/O
Modified: lxml/trunk/src/lxml/tests/test_io.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_io.py (original)
+++ lxml/trunk/src/lxml/tests/test_io.py Tue May 6 20:40:01 2008
@@ -45,6 +45,7 @@
ElementTree = self.etree.ElementTree
element = Element('top')
+ element.text = u"qwrtioüöä\uAABB"
tree = ElementTree(element)
self.buildNodes(element, 10, 3)
f = open(self.getTestFilePath('testdump.xml'), 'w')
@@ -63,6 +64,31 @@
data2 = f.read()
f.close()
self.assertEquals(data1, data2)
+
+ def test_tree_io_latin1(self):
+ Element = self.etree.Element
+ ElementTree = self.etree.ElementTree
+
+ element = Element('top')
+ element.text = u"qwrtio??????"
+ tree = ElementTree(element)
+ self.buildNodes(element, 10, 3)
+ f = open(self.getTestFilePath('testdump.xml'), 'w')
+ tree.write(f, encoding='iso-8859-1')
+ f.close()
+ f = open(self.getTestFilePath('testdump.xml'), 'r')
+ tree = ElementTree(file=f)
+ f.close()
+ f = open(self.getTestFilePath('testdump2.xml'), 'w')
+ tree.write(f, encoding='iso-8859-1')
+ f.close()
+ f = open(self.getTestFilePath('testdump.xml'), 'r')
+ data1 = f.read()
+ f.close()
+ f = open(self.getTestFilePath('testdump2.xml'), 'r')
+ data2 = f.read()
+ f.close()
+ self.assertEquals(data1, data2)
def test_write_filename(self):
# (c)ElementTree supports filename strings as write argument
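The round trip exercised by test_tree_io_latin1 can be sketched with the standard library alone. This is an illustration under assumptions, not the lxml test: it uses xml.etree.ElementTree and in-memory buffers instead of lxml and files on disk, and both the sample text and the helper name roundtrip_latin1 are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def roundtrip_latin1(text):
    """Serialise, reparse and serialise again, all in ISO-8859-1."""
    root = ET.Element("top")
    root.text = text

    first = io.BytesIO()
    ET.ElementTree(root).write(first, encoding="iso-8859-1")

    # Parse the bytes back; the XML declaration carries the encoding.
    reparsed = ET.parse(io.BytesIO(first.getvalue()))

    second = io.BytesIO()
    reparsed.write(second, encoding="iso-8859-1")
    return first.getvalue(), second.getvalue()


data1, data2 = roundtrip_latin1("qwrtio\u00fc\u00f6\u00e4")
print(data1 == data2)  # both serialisations should be byte-identical
```

Comparing the two serialisations byte for byte, as the test does, catches both encoding errors on output and decoding errors on input.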
From scoder at codespeak.net Tue May 6 20:40:07 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:40:07 +0200 (CEST)
Subject: [Lxml-checkins] r54492 - lxml/branch/lxml-1.3/src/lxml
Message-ID: <20080506184007.288B3168415@codespeak.net>
Author: scoder
Date: Tue May 6 20:40:06 2008
New Revision: 54492
Modified:
lxml/branch/lxml-1.3/src/lxml/serializer.pxi
Log:
removed superfluous function call (bug 227259)
Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original)
+++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Tue May 6 20:40:06 2008
@@ -278,7 +278,6 @@
_writeNodeToBuffer(c_buffer, element._c_node, c_enc,
write_xml_declaration, write_doctype, pretty_print)
tree.xmlOutputBufferClose(c_buffer)
- tree.xmlCharEncCloseFunc(enchandler)
if writer is None:
python.PyEval_RestoreThread(state)
else:
From scoder at codespeak.net Tue May 6 20:40:10 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:40:10 +0200 (CEST)
Subject: [Lxml-checkins] r54493 - in lxml/trunk: . src/lxml
Message-ID: <20080506184010.589A7168415@codespeak.net>
Author: scoder
Date: Tue May 6 20:40:09 2008
New Revision: 54493
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/serializer.pxi
Log:
r4171 at delle: sbehnel | 2008-05-06 18:40:11 +0200
removed superfluous function call (bug 227259)
Modified: lxml/trunk/src/lxml/serializer.pxi
==============================================================================
--- lxml/trunk/src/lxml/serializer.pxi (original)
+++ lxml/trunk/src/lxml/serializer.pxi Tue May 6 20:40:09 2008
@@ -350,7 +350,6 @@
write_xml_declaration, write_doctype,
pretty_print, with_tail)
tree.xmlOutputBufferClose(c_buffer)
- tree.xmlCharEncCloseFunc(enchandler)
if writer is None:
python.PyEval_RestoreThread(state)
else:
From scoder at codespeak.net Tue May 6 20:40:23 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:40:23 +0200 (CEST)
Subject: [Lxml-checkins] r54494 - in lxml/trunk: . tools
Message-ID: <20080506184023.4E98F168410@codespeak.net>
Author: scoder
Date: Tue May 6 20:40:22 2008
New Revision: 54494
Modified:
lxml/trunk/ (props changed)
lxml/trunk/tools/xpathgrep.py
Log:
r4172 at delle: sbehnel | 2008-05-06 19:28:15 +0200
API usage fix
Modified: lxml/trunk/tools/xpathgrep.py
==============================================================================
--- lxml/trunk/tools/xpathgrep.py (original)
+++ lxml/trunk/tools/xpathgrep.py Tue May 6 20:40:22 2008
@@ -200,7 +200,7 @@
register_builtins()
namespaces["py"] = PYTHON_BUILTINS_NS
- xpath = et.XPath(args[0], namespaces)
+ xpath = et.XPath(args[0], namespaces=namespaces)
found = False
if len(args) == 1:
From scoder at codespeak.net Tue May 6 20:40:28 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:40:28 +0200 (CEST)
Subject: [Lxml-checkins] r54495 - lxml/trunk
Message-ID: <20080506184028.060AB168414@codespeak.net>
Author: scoder
Date: Tue May 6 20:40:28 2008
New Revision: 54495
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
Log:
r4173 at delle: sbehnel | 2008-05-06 19:28:55 +0200
changelog
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Tue May 6 20:40:28 2008
@@ -16,6 +16,8 @@
Bugs fixed
----------
+* Rare crash when serialising to a file object with certain encodings.
+
Other changes
-------------
From scoder at codespeak.net Tue May 6 20:40:34 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 20:40:34 +0200 (CEST)
Subject: [Lxml-checkins] r54496 - lxml/trunk
Message-ID: <20080506184034.BC0C3168415@codespeak.net>
Author: scoder
Date: Tue May 6 20:40:34 2008
New Revision: 54496
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
Log:
r4174 at delle: sbehnel | 2008-05-06 19:29:52 +0200
changelog
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Tue May 6 20:40:34 2008
@@ -9,9 +9,10 @@
--------------
* Conversion functions ``html_to_xhtml()`` and ``xhtml_to_html()`` in
- lxml.html.
+ lxml.html (experimental).
-* Most features in lxml.html work for XHTML namespaced tag names.
+* Most features in lxml.html work for XHTML namespaced tag names
+ (experimental).
Bugs fixed
----------
From scoder at codespeak.net Tue May 6 22:37:39 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 22:37:39 +0200 (CEST)
Subject: [Lxml-checkins] r54502 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080506203739.1666C1683E7@codespeak.net>
Author: scoder
Date: Tue May 6 22:37:38 2008
New Revision: 54502
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_etree.py
Log:
r4184 at delle: sbehnel | 2008-05-06 21:19:32 +0200
test split
Modified: lxml/trunk/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_etree.py (original)
+++ lxml/trunk/src/lxml/tests/test_etree.py Tue May 6 22:37:38 2008
@@ -393,7 +393,7 @@
8,
len(events))
- def test_iterparse_encoding_8bit_override(self):
+ def test_iterparse_encoding_error(self):
text = u'Søk på nettet'
wrong_declaration = "<?xml version='1.0' encoding='UTF-8'?>"
xml_latin1 = (u'%s<a>%s</a>' % (wrong_declaration, text)
@@ -402,6 +402,12 @@
self.assertRaises(self.etree.ParseError,
list, self.etree.iterparse(StringIO(xml_latin1)))
+ def test_iterparse_encoding_8bit_override(self):
+ text = u'Søk på nettet'
+ wrong_declaration = "<?xml version='1.0' encoding='UTF-8'?>"
+ xml_latin1 = (u'%s<a>%s</a>' % (wrong_declaration, text)
+ ).encode('iso-8859-1')
+
iterator = self.etree.iterparse(StringIO(xml_latin1),
encoding="iso-8859-1")
self.assertEquals(1, len(list(iterator)))
From scoder at codespeak.net Tue May 6 22:37:45 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 22:37:45 +0200 (CEST)
Subject: [Lxml-checkins] r54503 - in lxml/trunk: . src/lxml
Message-ID: <20080506203745.914BB1683F3@codespeak.net>
Author: scoder
Date: Tue May 6 22:37:45 2008
New Revision: 54503
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/iterparse.pxi
lxml/trunk/src/lxml/parser.pxi
Log:
r4185 at delle: sbehnel | 2008-05-06 21:21:23 +0200
fixes for HTML name dictification and a corner case in iterparse()
Modified: lxml/trunk/src/lxml/iterparse.pxi
==============================================================================
--- lxml/trunk/src/lxml/iterparse.pxi (original)
+++ lxml/trunk/src/lxml/iterparse.pxi Tue May 6 22:37:45 2008
@@ -201,6 +201,8 @@
cdef _IterparseContext context
context = <_IterparseContext>c_ctxt._private
try:
+ if c_ctxt.html:
+ _fixHtmlDictNodeNames(c_ctxt.dict, c_node)
context.startNode(c_node)
except:
if c_ctxt.errNo == xmlerror.XML_ERR_OK:
@@ -452,7 +454,8 @@
not context._validator.isvalid()):
self._source = None
del context._events[:]
- _raiseParseError(pctxt, self._filename, context._error_log)
+ _handleParseResult(context, pctxt, NULL,
+ self._filename, self._for_html)
if python.PyList_GET_SIZE(context._events) == 0:
self.root = context._root
self._source = None
Modified: lxml/trunk/src/lxml/parser.pxi
==============================================================================
--- lxml/trunk/src/lxml/parser.pxi (original)
+++ lxml/trunk/src/lxml/parser.pxi Tue May 6 22:37:45 2008
@@ -244,6 +244,10 @@
result = htmlparser.htmlCtxtReadIO(
ctxt, _readFilelikeParser, NULL, self,
self._c_url, c_encoding, options)
+ if result is not NULL:
+ if _fixHtmlDictNames(ctxt.dict, result) < 0:
+ tree.xmlFreeDoc(result)
+ result = NULL
else:
result = xmlparser.xmlCtxtReadIO(
ctxt, _readFilelikeParser, NULL, self,
@@ -493,8 +497,12 @@
xmlDoc* result, filename,
bint recover) except NULL:
cdef bint well_formed
+ if result is not NULL:
+ __GLOBAL_PARSER_CONTEXT.initDocDict(result)
+
if c_ctxt.myDoc is not NULL:
- if c_ctxt.myDoc != result:
+ if c_ctxt.myDoc is not result:
+ __GLOBAL_PARSER_CONTEXT.initDocDict(c_ctxt.myDoc)
tree.xmlFreeDoc(c_ctxt.myDoc)
c_ctxt.myDoc = NULL
@@ -518,11 +526,7 @@
else:
well_formed = 0
- if well_formed:
- __GLOBAL_PARSER_CONTEXT.initDocDict(result)
- if c_ctxt.html:
- _fixHtmlDictNames(result)
- else:
+ if not well_formed:
# free broken document
tree.xmlFreeDoc(result)
result = NULL
@@ -542,31 +546,38 @@
result.URL = tree.xmlStrdup(_cstr(filename))
return result
-cdef int _fixHtmlDictNames(xmlDoc* c_doc) except -1:
- cdef char* c_name
- cdef xmlNode* c_attr
- cdef xmlNode* c_node = c_doc.children
+cdef int _fixHtmlDictNames(tree.xmlDict* c_dict, xmlDoc* c_doc) nogil:
+ cdef xmlNode* c_node
+ if c_doc is NULL:
+ return 0
+ c_node = c_doc.children
tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_doc, c_node, 0)
if c_node.type == tree.XML_ELEMENT_NODE:
- if not tree.xmlDictOwns(c_doc.dict, c_node.name):
- c_name = tree.xmlDictLookup(c_doc.dict, c_node.name, -1)
- if c_name is NULL:
- python.PyErr_NoMemory()
- return -1
- tree.xmlFree(c_node.name)
- c_node.name = c_name
- c_attr = c_node.properties
- while c_attr is not NULL:
- if not tree.xmlDictOwns(c_doc.dict, c_attr.name):
- c_name = tree.xmlDictLookup(c_doc.dict, c_attr.name, -1)
- if c_name is NULL:
- python.PyErr_NoMemory()
- return -1
- tree.xmlFree(c_attr.name)
- c_attr.name = c_name
- c_attr = c_attr.next
+ if _fixHtmlDictNodeNames(c_dict, c_node) < 0:
+ return -1
tree.END_FOR_EACH_ELEMENT_FROM(c_node)
+ return 0
+cdef inline int _fixHtmlDictNodeNames(tree.xmlDict* c_dict,
+ xmlNode* c_node) nogil:
+ cdef xmlNode* c_attr
+ cdef char* c_name
+ c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+ if c_name is NULL:
+ return -1
+ if c_name is not c_node.name:
+ tree.xmlFree(c_node.name)
+ c_node.name = c_name
+ c_attr = c_node.properties
+ while c_attr is not NULL:
+ c_name = tree.xmlDictLookup(c_dict, c_attr.name, -1)
+ if c_name is NULL:
+ return -1
+ if c_name is not c_attr.name:
+ tree.xmlFree(c_attr.name)
+ c_attr.name = c_name
+ c_attr = c_attr.next
+ return 0
cdef class _BaseParser:
cdef ElementClassLookup _class_lookup
@@ -784,6 +795,10 @@
result = htmlparser.htmlCtxtReadMemory(
pctxt, c_text, buffer_len, c_filename, _UNICODE_ENCODING,
self._parse_options)
+ if result is not NULL:
+ if _fixHtmlDictNames(pctxt.dict, result) < 0:
+ tree.xmlFreeDoc(result)
+ result = NULL
else:
result = xmlparser.xmlCtxtReadMemory(
pctxt, c_text, buffer_len, c_filename, _UNICODE_ENCODING,
@@ -820,6 +835,10 @@
result = htmlparser.htmlCtxtReadMemory(
pctxt, c_text, c_len, c_filename,
c_encoding, self._parse_options)
+ if result is not NULL:
+ if _fixHtmlDictNames(pctxt.dict, result) < 0:
+ tree.xmlFreeDoc(result)
+ result = NULL
else:
result = xmlparser.xmlCtxtReadMemory(
pctxt, c_text, c_len, c_filename,
@@ -853,6 +872,10 @@
if self._for_html:
result = htmlparser.htmlCtxtReadFile(
pctxt, c_filename, c_encoding, self._parse_options)
+ if result is not NULL:
+ if _fixHtmlDictNames(pctxt.dict, result) < 0:
+ tree.xmlFreeDoc(result)
+ result = NULL
else:
result = xmlparser.xmlCtxtReadFile(
pctxt, c_filename, c_encoding, self._parse_options)
From scoder at codespeak.net Tue May 6 22:37:51 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 22:37:51 +0200 (CEST)
Subject: [Lxml-checkins] r54504 - in lxml/trunk: . src/lxml
Message-ID: <20080506203751.85C421683F5@codespeak.net>
Author: scoder
Date: Tue May 6 22:37:51 2008
New Revision: 54504
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/iterparse.pxi
Log:
r4186 at delle: sbehnel | 2008-05-06 21:56:12 +0200
fix iterparse(): proactively free the document on parse errors if Python will not free it for us
Modified: lxml/trunk/src/lxml/iterparse.pxi
==============================================================================
--- lxml/trunk/src/lxml/iterparse.pxi (original)
+++ lxml/trunk/src/lxml/iterparse.pxi Tue May 6 22:37:51 2008
@@ -195,6 +195,11 @@
python.PyList_Append(self._events, (event, node))
return 0
+ cdef void _assureDocGetsFreed(self):
+ if self._c_ctxt.myDoc is not NULL and self._doc is None:
+ tree.xmlFreeDoc(self._c_ctxt.myDoc)
+ self._c_ctxt.myDoc = NULL
+
cdef inline void _pushSaxStartEvent(xmlparser.xmlParserCtxt* c_ctxt,
xmlNode* c_node):
@@ -454,8 +459,8 @@
not context._validator.isvalid()):
self._source = None
del context._events[:]
- _handleParseResult(context, pctxt, NULL,
- self._filename, self._for_html)
+ context._assureDocGetsFreed()
+ _raiseParseError(pctxt, self._filename, context._error_log)
if python.PyList_GET_SIZE(context._events) == 0:
self.root = context._root
self._source = None
From scoder at codespeak.net Tue May 6 22:37:56 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 22:37:56 +0200 (CEST)
Subject: [Lxml-checkins] r54505 - lxml/branch/lxml-1.3
Message-ID: <20080506203756.6FC3F1683E7@codespeak.net>
Author: scoder
Date: Tue May 6 22:37:55 2008
New Revision: 54505
Modified:
lxml/branch/lxml-1.3/CHANGES.txt
Log:
changelog
Modified: lxml/branch/lxml-1.3/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-1.3/CHANGES.txt (original)
+++ lxml/branch/lxml-1.3/CHANGES.txt Tue May 6 22:37:55 2008
@@ -2,6 +2,21 @@
lxml changelog
==============
+Under development
+=================
+
+Features added
+--------------
+
+Bugs fixed
+----------
+
+* Rare crash when serialising to a file object with certain encodings.
+
+Other changes
+-------------
+
+
1.3.6 (2007-10-29)
==================
From scoder at codespeak.net Tue May 6 22:38:39 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 22:38:39 +0200 (CEST)
Subject: [Lxml-checkins] r54506 - lxml/branch/lxml-2.0
Message-ID: <20080506203839.85E461683E7@codespeak.net>
Author: scoder
Date: Tue May 6 22:38:38 2008
New Revision: 54506
Modified:
lxml/branch/lxml-2.0/CHANGES.txt
Log:
changelog
Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Tue May 6 22:38:38 2008
@@ -2,6 +2,21 @@
lxml changelog
==============
+Under development
+=================
+
+Features added
+--------------
+
+Bugs fixed
+----------
+
+* Rare crash when serialising to a file object with certain encodings.
+
+Other changes
+-------------
+
+
2.0.5 (2008-05-01)
==================
From scoder at codespeak.net Tue May 6 23:25:42 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 6 May 2008 23:25:42 +0200 (CEST)
Subject: [Lxml-checkins] r54508 - in lxml/branch/lxml-2.0/src/lxml: . tests
Message-ID: <20080506212542.0571E168409@codespeak.net>
Author: scoder
Date: Tue May 6 23:25:42 2008
New Revision: 54508
Modified:
lxml/branch/lxml-2.0/src/lxml/apihelpers.pxi
lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx
lxml/branch/lxml-2.0/src/lxml/proxy.pxi
lxml/branch/lxml-2.0/src/lxml/tests/test_threading.py
lxml/branch/lxml-2.0/src/lxml/tree.pxd
Log:
large trunk merge of threading fix
Modified: lxml/branch/lxml-2.0/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/apihelpers.pxi (original)
+++ lxml/branch/lxml-2.0/src/lxml/apihelpers.pxi Tue May 6 23:25:42 2008
@@ -643,7 +643,7 @@
_moveTail(c_next, c_node)
if not attemptDeallocation(c_node):
# make namespaces absolute
- moveNodeToDocument(doc, c_node)
+ moveNodeToDocument(doc, c_node.doc, c_node)
return 0
cdef void _moveTail(xmlNode* c_tail, xmlNode* c_target):
@@ -709,6 +709,7 @@
"""
cdef xmlNode* c_orig_neighbour
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
cdef _Element element
cdef Py_ssize_t seqlength, i, c
cdef _node_to_node_function next_element
@@ -792,12 +793,13 @@
for element in elements:
assert element is not None, "Node must not be None"
# move element and tail over
+ c_source_doc = element._c_node.doc
c_next = element._c_node.next
tree.xmlAddPrevSibling(c_node, element._c_node)
_moveTail(c_next, element._c_node)
# integrate element into new document
- moveNodeToDocument(parent._doc, element._c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, element._c_node)
# stop at the end of the slice
if slicelength > 0:
@@ -827,7 +829,9 @@
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = child._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -836,7 +840,7 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(parent._doc, c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _prependChild(_Element parent, _Element child) except -1:
"""Prepend a new child to a parent element.
@@ -844,7 +848,9 @@
cdef xmlNode* c_next
cdef xmlNode* c_child
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = child._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -857,14 +863,16 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(parent._doc, c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _appendSibling(_Element element, _Element sibling) except -1:
"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = sibling._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -872,14 +880,16 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(element._doc, c_node)
+ moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef int _prependSibling(_Element element, _Element sibling) except -1:
"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = sibling._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -887,7 +897,7 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(element._doc, c_node)
+ moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef int isutf8(char* s):
cdef char c
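The apihelpers.pxi hunks above all read c_node.doc into c_source_doc before the node is relinked, presumably because relinking can update the node's doc pointer and the original value would then be lost. A small Python sketch of that ordering follows. Doc, Node, add_prev_sibling and insert_before are hypothetical stand-ins, and the assumption that relinking overwrites doc is modelled explicitly.

```python
class Doc:
    def __init__(self, label):
        self.label = label


class Node:
    def __init__(self, doc):
        self.doc = doc


def add_prev_sibling(target, node):
    # Assumed behaviour of relinking: the moved node adopts the target's doc.
    node.doc = target.doc


def insert_before(target, node):
    """Move `node` in front of `target`, remembering where it came from."""
    source_doc = node.doc           # capture first, relinking overwrites it
    add_prev_sibling(target, node)
    return source_doc, node.doc


doc_a, doc_b = Doc("a"), Doc("b")
source, current = insert_before(Node(doc_b), Node(doc_a))
print(source.label, current.label)  # the source doc survives only in the local
```

Reading the pointer after the move would always yield the target document, so the comparison between source and target dicts could never detect a cross-document move.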
Modified: lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx (original)
+++ lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx Tue May 6 23:25:42 2008
@@ -557,6 +557,7 @@
"""
cdef xmlNode* c_node
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
cdef _Element element
cdef bint left_to_right
cdef Py_ssize_t slicelength, step
@@ -578,13 +579,14 @@
c_node = _findChild(self._c_node, x)
if c_node is NULL:
raise IndexError("list index out of range")
+ c_source_doc = element._c_node.doc
c_next = element._c_node.next
_removeText(c_node.next)
tree.xmlReplaceNode(c_node, element._c_node)
_moveTail(c_next, element._c_node)
- moveNodeToDocument(self._doc, element._c_node)
+ moveNodeToDocument(self._doc, c_source_doc, element._c_node)
if not attemptDeallocation(c_node):
- moveNodeToDocument(self._doc, c_node)
+ moveNodeToDocument(self._doc, c_node.doc, c_node)
def __delitem__(self, x):
"""__delitem__(self, x)
@@ -731,14 +733,16 @@
"""
cdef xmlNode* c_node
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
c_node = _findChild(self._c_node, index)
if c_node is NULL:
_appendChild(self, element)
return
+ c_source_doc = c_node.doc
c_next = element._c_node.next
tree.xmlAddPrevSibling(c_node, element._c_node)
_moveTail(c_next, element._c_node)
- moveNodeToDocument(self._doc, element._c_node)
+ moveNodeToDocument(self._doc, c_source_doc, element._c_node)
def remove(self, _Element element not None):
"""remove(self, element)
@@ -756,7 +760,7 @@
tree.xmlUnlinkNode(c_node)
_moveTail(c_next, c_node)
# fix namespace declarations
- moveNodeToDocument(self._doc, c_node)
+ moveNodeToDocument(self._doc, c_node.doc, c_node)
def replace(self, _Element old_element not None,
_Element new_element not None):
@@ -768,18 +772,20 @@
cdef xmlNode* c_old_next
cdef xmlNode* c_new_node
cdef xmlNode* c_new_next
+ cdef xmlDoc* c_source_doc
c_old_node = old_element._c_node
if c_old_node.parent is not self._c_node:
raise ValueError("Element is not a child of this node.")
c_old_next = c_old_node.next
c_new_node = new_element._c_node
c_new_next = c_new_node.next
+ c_source_doc = c_new_node.doc
tree.xmlReplaceNode(c_old_node, c_new_node)
_moveTail(c_new_next, c_new_node)
_moveTail(c_old_next, c_old_node)
- moveNodeToDocument(self._doc, c_new_node)
+ moveNodeToDocument(self._doc, c_source_doc, c_new_node)
# fix namespace declarations
- moveNodeToDocument(self._doc, c_old_node)
+ moveNodeToDocument(self._doc, c_old_node.doc, c_old_node)
# PROPERTIES
property tag:
Modified: lxml/branch/lxml-2.0/src/lxml/proxy.pxi
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/proxy.pxi (original)
+++ lxml/branch/lxml-2.0/src/lxml/proxy.pxi Tue May 6 23:25:42 2008
@@ -4,7 +4,7 @@
# structure of the respective node to avoid multiple instantiation of
# the Python class
-cdef _Element getProxy(xmlNode* c_node):
+cdef inline _Element getProxy(xmlNode* c_node):
"""Get a proxy for a given node.
"""
#print "getProxy for:", c_node
@@ -13,10 +13,10 @@
else:
return None
-cdef int hasProxy(xmlNode* c_node):
+cdef inline int hasProxy(xmlNode* c_node):
return c_node._private is not NULL
-cdef int _registerProxy(_Element proxy) except -1:
+cdef inline int _registerProxy(_Element proxy) except -1:
"""Register a proxy and type for the node it's proxying for.
"""
cdef xmlNode* c_node
@@ -31,7 +31,7 @@
proxy._gc_doc = proxy._doc
python.Py_INCREF(proxy._doc)
-cdef int _unregisterProxy(_Element proxy) except -1:
+cdef inline int _unregisterProxy(_Element proxy) except -1:
"""Unregister a proxy for the node it's proxying for.
"""
cdef xmlNode* c_node
@@ -40,12 +40,24 @@
c_node._private = NULL
return 0
-cdef void _releaseProxy(_Element proxy):
+cdef inline void _releaseProxy(_Element proxy):
"""An additional DECREF for the document.
"""
python.Py_XDECREF(proxy._gc_doc)
proxy._gc_doc = NULL
+cdef inline void _updateProxyDocument(xmlNode* c_node, _Document doc):
+ """Replace the document reference of a proxy.
+
+ This may deallocate the original document of the proxy!
+ """
+ cdef _Element element = <_Element>c_node._private
+ if element._doc is not doc:
+ python.Py_INCREF(doc)
+ python.Py_DECREF(element._doc)
+ element._doc = doc
+ element._gc_doc = doc
+
################################################################################
# temporarily make a node the root node of its document
@@ -196,7 +208,74 @@
c_new_ns = c_new_ns.next
c_parent = c_parent.parent
-cdef int moveNodeToDocument(_Document doc, xmlNode* c_element) except -1:
+ctypedef struct _nscache:
+ xmlNs** new
+ xmlNs** old
+ cstd.size_t size
+ cstd.size_t last
+
+cdef int _growNsCache(_nscache* c_ns_cache) except -1:
+ cdef xmlNs** c_ns_ptr
+ if c_ns_cache.size == 0:
+ c_ns_cache.size = 20
+ else:
+ c_ns_cache.size *= 2
+ c_ns_ptr = cstd.realloc(
+ c_ns_cache.new, c_ns_cache.size * sizeof(xmlNs*))
+ if c_ns_ptr is not NULL:
+ c_ns_cache.new = c_ns_ptr
+ c_ns_ptr = cstd.realloc(
+ c_ns_cache.old, c_ns_cache.size * sizeof(xmlNs*))
+ if c_ns_ptr is not NULL:
+ c_ns_cache.old = c_ns_ptr
+ else:
+ cstd.free(c_ns_cache.new)
+ cstd.free(c_ns_cache.old)
+ python.PyErr_NoMemory()
+ return -1
+ return 0
+
+cdef inline int _appendToNsCache(_nscache* c_ns_cache,
+ xmlNs* c_old_ns, xmlNs* c_new_ns) except -1:
+ if c_ns_cache.last >= c_ns_cache.size:
+ _growNsCache(c_ns_cache)
+ c_ns_cache.old[c_ns_cache.last] = c_old_ns
+ c_ns_cache.new[c_ns_cache.last] = c_new_ns
+ c_ns_cache.last += 1
+
+cdef int _stripRedundantNamespaceDeclarations(
+ xmlNode* c_element, _nscache* c_ns_cache, xmlNs** c_del_ns_list) except -1:
+ """Removes namespace declarations from an element that are already
+ defined in its parents. Does not free the xmlNs's, just prepends
+ them to the c_del_ns_list.
+ """
+ cdef xmlNs* c_ns
+ cdef xmlNs* c_ns_next
+ cdef xmlNs** c_nsdef
+ # use a xmlNs** to handle assignments to "c_element.nsDef" correctly
+ c_nsdef = &c_element.nsDef
+ while c_nsdef[0] is not NULL:
+ c_ns = tree.xmlSearchNsByHref(
+ c_element.doc, c_element.parent, c_nsdef[0].href)
+ if c_ns is NULL:
+ # new namespace href => keep and cache the ns declaration
+ _appendToNsCache(c_ns_cache, c_nsdef[0], c_nsdef[0])
+ c_nsdef = &c_nsdef[0].next
+ else:
+ # known namespace href => strip the ns
+ if c_ns is tree.xmlSearchNs(c_element.doc, c_element.parent,
+ c_ns.prefix):
+ # prefix is not shadowed by parents => ns is reusable
+ _appendToNsCache(c_ns_cache, c_nsdef[0], c_ns)
+ # cut out c_nsdef.next and prepend it to garbage chain
+ c_ns_next = c_nsdef[0].next
+ c_nsdef[0].next = c_del_ns_list[0]
+ c_del_ns_list[0] = c_nsdef[0]
+ c_nsdef[0] = c_ns_next
+ return 0
+
+cdef int moveNodeToDocument(_Document doc, xmlDoc* c_source_doc,
+ xmlNode* c_element) except -1:
"""Fix the xmlNs pointers of a node and its subtree that were moved.
Mainly copied from libxml2's xmlReconciliateNs(). Expects libxml2 doc
@@ -213,7 +292,11 @@
prefix). If a namespace is unknown, declare a new one on the
node.
- 3) Set the Document reference to the new Document (if different).
+ 3) Reassign the names of tags and attribute from the dict of the
+ target document *iff* it is different from the dict used in the
+ source subtree.
+
+ 4) Set the Document reference to the new Document (if different).
This is done on backtracking to keep the original Document
alive as long as possible, until all its elements are updated.
@@ -221,96 +304,66 @@
step 1), but freed only after the complete subtree was traversed
and all occurrences were replaced by tree-internal pointers.
"""
- cdef _Element element
- cdef xmlDoc* c_doc
cdef xmlNode* c_start_node
cdef xmlNode* c_node
- cdef xmlNs** c_ns_ptr
- cdef xmlNs** c_ns_new_cache
- cdef xmlNs** c_ns_old_cache
+ cdef char* c_name
+ cdef _nscache c_ns_cache
cdef xmlNs* c_ns
cdef xmlNs* c_ns_next
cdef xmlNs* c_nsdef
- cdef xmlNs* c_new_ns
- cdef xmlNs* c_del_ns
- cdef cstd.size_t i, c_cache_size, c_cache_last
+ cdef xmlNs* c_del_ns_list
+ cdef cstd.size_t i
+ cdef tree.xmlDict* c_dict
if not tree._isElementOrXInclude(c_element):
return 0
- c_doc = c_element.doc
+ # we need to copy the names of tags and attributes iff the element
+ # is based on a different libxml2 tag name dictionary
+ if doc._c_doc.dict is not c_source_doc.dict:
+ c_dict = doc._c_doc.dict
+ else:
+ c_dict = NULL
+
c_start_node = c_element
- c_ns_new_cache = NULL
- c_ns_old_cache = NULL
- c_cache_size = 0
- c_cache_last = 0
- c_del_ns = NULL
+ c_del_ns_list = NULL
+
+ c_ns_cache.new = NULL
+ c_ns_cache.old = NULL
+ c_ns_cache.size = 0
+ c_ns_cache.last = 0
while c_element is not NULL:
# 1) cut out namespaces defined here that are already known by
# the ancestors
- c_nsdef = c_element.nsDef
- if c_nsdef is not NULL:
- # start with second nsdef to keep c_element.nsDef for now
- while c_nsdef.next is not NULL:
- if c_nsdef.next is c_element.ns:
- c_nsdef = c_nsdef.next
- continue
- c_ns = tree.xmlSearchNsByHref(
- c_element.doc, c_element.parent, c_nsdef.next.href)
- if c_ns is NULL:
- c_nsdef = c_nsdef.next
- continue
- # cut out c_nsdef.next and prepend it to garbage chain
- c_ns_next = c_nsdef.next.next
- c_nsdef.next.next = c_del_ns
- c_del_ns = c_nsdef.next
- c_nsdef.next = c_ns_next
- # now handle c_element.nsDef
- c_ns = tree.xmlSearchNsByHref(
- c_element.doc, c_element.parent, c_element.nsDef.href)
- if c_ns is not NULL:
- c_ns_next = c_element.nsDef.next
- c_element.nsDef.next = c_del_ns
- c_del_ns = c_element.nsDef
- c_element.nsDef = c_ns_next
+ if c_element.nsDef is not NULL:
+ _stripRedundantNamespaceDeclarations(
+ c_element, &c_ns_cache, &c_del_ns_list)
- # 2) make sure the namespace of an element and its attributes
- # is declared in this document (i.e. the node or its parents)
+ # 2) make sure the namespaces of an element and its attributes
+ # are declared in this document (i.e. on the node or its parents)
c_node = c_element
while c_node is not NULL:
if c_node.ns is not NULL:
- for i from 0 <= i < c_cache_last:
- if c_node.ns is c_ns_old_cache[i]:
- c_node.ns = c_ns_new_cache[i]
+ for i from 0 <= i < c_ns_cache.last:
+ if c_node.ns is c_ns_cache.old[i]:
+ c_node.ns = c_ns_cache.new[i]
break
else:
# not in cache => find a replacement from this document
- c_new_ns = doc._findOrBuildNodeNs(
+ c_ns = doc._findOrBuildNodeNs(
c_element, c_node.ns.href, c_node.ns.prefix)
- if c_cache_last >= c_cache_size:
- # must resize cache
- if c_cache_size == 0:
- c_cache_size = 20
- else:
- c_cache_size *= 2
- c_ns_ptr = cstd.realloc(
- c_ns_new_cache, c_cache_size * sizeof(xmlNs*))
- if c_ns_ptr is not NULL:
- c_ns_new_cache = c_ns_ptr
- c_ns_ptr = cstd.realloc(
- c_ns_old_cache, c_cache_size * sizeof(xmlNs*))
- if c_ns_ptr is not NULL:
- c_ns_old_cache = c_ns_ptr
- else:
- cstd.free(c_ns_new_cache)
- cstd.free(c_ns_old_cache)
- python.PyErr_NoMemory()
- return -1
- c_ns_new_cache[c_cache_last] = c_new_ns
- c_ns_old_cache[c_cache_last] = c_node.ns
- c_cache_last += 1
- c_node.ns = c_new_ns
+ _appendToNsCache(&c_ns_cache, c_node.ns, c_ns)
+ c_node.ns = c_ns
+
+ # 3) re-assign names from the target dict
+ if c_dict is not NULL:
+ c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+ # c_name can be NULL on memory error, but we don't
+ # handle that here
+ if c_name is not NULL:
+ c_node.name = c_name
+
if c_node is c_element:
# after the element, continue with its attributes
c_node = c_element.properties
@@ -326,14 +379,9 @@
if c_node is NULL:
# no children => back off and continue with siblings and parents
- # 3) fix _Document reference (may dealloc the original document!)
+ # 4) fix _Document reference (may dealloc the original document!)
if c_element._private is not NULL:
- element = <_Element>c_element._private
- if element._doc is not doc:
- python.Py_INCREF(doc)
- python.Py_DECREF(element._doc)
- element._doc = doc
- element._gc_doc = doc
+ _updateProxyDocument(c_element, doc)
if c_element is c_start_node:
break # all done
@@ -349,14 +397,9 @@
if c_element is NULL or not tree._isElementOrXInclude(c_element):
break
- # 3) fix _Document reference (may dealloc the original document!)
+ # 4) fix _Document reference (may dealloc the original document!)
if c_element._private is not NULL:
- element = <_Element>c_element._private
- if element._doc is not doc:
- python.Py_INCREF(doc)
- python.Py_DECREF(element._doc)
- element._doc = doc
- element._gc_doc = doc
+ _updateProxyDocument(c_element, doc)
if c_element is c_start_node:
break
@@ -370,13 +413,13 @@
c_element = c_node
# free now unused namespace declarations
- if c_del_ns is not NULL:
- tree.xmlFreeNsList(c_del_ns)
+ if c_del_ns_list is not NULL:
+ tree.xmlFreeNsList(c_del_ns_list)
# cleanup
- if c_ns_new_cache is not NULL:
- cstd.free(c_ns_new_cache)
- if c_ns_old_cache is not NULL:
- cstd.free(c_ns_old_cache)
+ if c_ns_cache.new is not NULL:
+ cstd.free(c_ns_cache.new)
+ if c_ns_cache.old is not NULL:
+ cstd.free(c_ns_cache.old)
return 0
Modified: lxml/branch/lxml-2.0/src/lxml/tests/test_threading.py
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/tests/test_threading.py (original)
+++ lxml/branch/lxml-2.0/src/lxml/tests/test_threading.py Tue May 6 23:25:42 2008
@@ -6,13 +6,18 @@
import unittest, threading
-from common_imports import etree, HelperTestCase
+from common_imports import etree, HelperTestCase, StringIO
class ThreadingTestCase(HelperTestCase):
"""Threading tests"""
etree = etree
- def test_subtree_copy(self):
+ def _run_thread(self, func):
+ thread = threading.Thread(target=func)
+ thread.start()
+ thread.join()
+
+ def test_subtree_copy_thread(self):
tostring = self.etree.tostring
XML = self.etree.XML
xml = " "
@@ -23,12 +28,113 @@
main_root.append(thread_root[0])
del thread_root
- thread = threading.Thread(target=run_thread)
- thread.start()
- thread.join()
-
+ self._run_thread(run_thread)
self.assertEquals(xml, tostring(main_root))
+ def test_main_xslt_in_thread(self):
+ XML = self.etree.XML
+ style = XML('''\
+
+
+
+
+ ''')
+ st = etree.XSLT(style)
+
+ result = []
+
+ def run_thread():
+ root = XML('B C ')
+ result.append( st(root) )
+
+ self._run_thread(run_thread)
+ self.assertEquals('''\
+
+B
+''',
+ str(result[0]))
+
+ def test_thread_xslt(self):
+ XML = self.etree.XML
+ tostring = self.etree.tostring
+ root = XML('B C ')
+
+ def run_thread():
+ style = XML('''\
+
+
+
+
+ ''')
+ st = etree.XSLT(style)
+ root.append( st(root).getroot() )
+
+ self._run_thread(run_thread)
+ self.assertEquals('B C B ',
+ tostring(root))
+
+ def test_thread_mix(self):
+ XML = self.etree.XML
+ Element = self.etree.Element
+ SubElement = self.etree.SubElement
+ tostring = self.etree.tostring
+ xml = 'B C '
+ root = XML(xml)
+ fragment = XML(" ")
+
+ result = self.etree.Element("{myns}root", att = "someval")
+
+ def run_XML():
+ thread_root = XML(xml)
+ result.append(thread_root[0])
+ result.append(thread_root[-1])
+
+ def run_parse():
+ thread_root = self.etree.parse(StringIO(xml)).getroot()
+ result.append(thread_root[0])
+ result.append(thread_root[-1])
+
+ def run_move_main():
+ result.append(fragment[0])
+
+ def run_build():
+ result.append(
+ Element("{myns}foo", attrib={'{test}attr':'val'}))
+ SubElement(result, "{otherns}tasty")
+
+ def run_xslt():
+ style = XML('''\
+
+
+
+
+ ''')
+ st = etree.XSLT(style)
+ result.append( st(root).getroot()[0] )
+
+ for test in (run_XML, run_parse, run_move_main, run_xslt):
+ tostring(result)
+ self._run_thread(test)
+
+ self.assertEquals(
+ 'B C B C B ',
+ tostring(result))
+
+ def strip_first():
+ root = Element("newroot")
+ root.append(result[0])
+
+ while len(result):
+ self._run_thread(strip_first)
+
+ self.assertEquals(
+ ' ',
+ tostring(result))
+
+
def test_suite():
suite = unittest.TestSuite()
suite.addTests([unittest.makeSuite(ThreadingTestCase)])
Modified: lxml/branch/lxml-2.0/src/lxml/tree.pxd
==============================================================================
--- lxml/branch/lxml-2.0/src/lxml/tree.pxd (original)
+++ lxml/branch/lxml-2.0/src/lxml/tree.pxd Tue May 6 23:25:42 2008
@@ -52,12 +52,12 @@
void xmlHashScan(xmlHashTable* table, xmlHashScanner f, void* data) nogil
void* xmlHashLookup(xmlHashTable* table, char* name) nogil
-cdef extern from "libxml/tree.h":
-
- # for some reason need to define this in this section;
+cdef extern from *: # actually "libxml/dict.h"
# libxml/dict.h appears to be broken to include in C
ctypedef struct xmlDict
-
+ cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) nogil
+
+cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
ctypedef struct xmlAttr
ctypedef struct xmlNotationTable
From scoder at codespeak.net Thu May 8 17:44:50 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 8 May 2008 17:44:50 +0200 (CEST)
Subject: [Lxml-checkins] r54564 - in lxml/trunk: . src/lxml
Message-ID: <20080508154450.A7732168537@codespeak.net>
Author: scoder
Date: Thu May 8 17:44:48 2008
New Revision: 54564
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/lxml.etree.pyx
Log:
r4190 at delle: sbehnel | 2008-05-07 18:59:54 +0200
use DEF for module constants
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Thu May 8 17:44:48 2008
@@ -50,12 +50,10 @@
# what to do with libxml2/libxslt error messages?
# 0 : drop
# 1 : use log
-cdef int __DEBUG
-__DEBUG = 1
+DEF __DEBUG = 1
# maximum number of lines in the libxml2/xslt log if __DEBUG == 1
-cdef int __MAX_LOG_SIZE
-__MAX_LOG_SIZE = 100
+DEF __MAX_LOG_SIZE = 100
# make the compiled-in debug state publicly available
DEBUG = __DEBUG
From scoder at codespeak.net Thu May 8 17:44:56 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 8 May 2008 17:44:56 +0200 (CEST)
Subject: [Lxml-checkins] r54565 - in lxml/trunk: . src/lxml
Message-ID: <20080508154456.9E1DB168538@codespeak.net>
Author: scoder
Date: Thu May 8 17:44:56 2008
New Revision: 54565
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/iterparse.pxi
Log:
r4191 at delle: sbehnel | 2008-05-07 19:28:06 +0200
large cleanup in iterparse code
Modified: lxml/trunk/src/lxml/iterparse.pxi
==============================================================================
--- lxml/trunk/src/lxml/iterparse.pxi (original)
+++ lxml/trunk/src/lxml/iterparse.pxi Thu May 8 17:44:56 2008
@@ -1,7 +1,6 @@
-# iterparse -- incremental parsing
+# iterparse -- event-driven parsing
-cdef object __ITERPARSE_CHUNK_SIZE
-__ITERPARSE_CHUNK_SIZE = 32768
+DEF __ITERPARSE_CHUNK_SIZE = 32768
ctypedef enum _IterparseEventFilter:
ITERPARSE_FILTER_START = 1
@@ -201,91 +200,97 @@
self._c_ctxt.myDoc = NULL
-cdef inline void _pushSaxStartEvent(xmlparser.xmlParserCtxt* c_ctxt,
+cdef inline void _pushSaxStartEvent(_IterparseContext context,
xmlNode* c_node):
- cdef _IterparseContext context
- context = <_IterparseContext>c_ctxt._private
try:
- if c_ctxt.html:
- _fixHtmlDictNodeNames(c_ctxt.dict, c_node)
+ if context._c_ctxt.html:
+ _fixHtmlDictNodeNames(context._c_ctxt.dict, c_node)
context.startNode(c_node)
except:
- if c_ctxt.errNo == xmlerror.XML_ERR_OK:
- c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR
- c_ctxt.disableSAX = 1
+ if context._c_ctxt.errNo == xmlerror.XML_ERR_OK:
+ context._c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR
+ context._c_ctxt.disableSAX = 1
context._store_raised()
-cdef inline void _pushSaxEndEvent(xmlparser.xmlParserCtxt* c_ctxt,
+cdef inline void _pushSaxEndEvent(_IterparseContext context,
xmlNode* c_node):
- cdef _IterparseContext context
- context = <_IterparseContext>c_ctxt._private
try:
context.endNode(c_node)
except:
- if c_ctxt.errNo == xmlerror.XML_ERR_OK:
- c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR
- c_ctxt.disableSAX = 1
+ if context._c_ctxt.errNo == xmlerror.XML_ERR_OK:
+ context._c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR
+ context._c_ctxt.disableSAX = 1
context._store_raised()
-cdef inline void _pushSaxEvent(xmlparser.xmlParserCtxt* c_ctxt,
+cdef inline void _pushSaxEvent(_IterparseContext context,
event, xmlNode* c_node):
- cdef _IterparseContext context
- context = <_IterparseContext>c_ctxt._private
try:
context.pushEvent(event, c_node)
except:
- if c_ctxt.errNo == xmlerror.XML_ERR_OK:
- c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR
- c_ctxt.disableSAX = 1
+ if context._c_ctxt.errNo == xmlerror.XML_ERR_OK:
+ context._c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR
+ context._c_ctxt.disableSAX = 1
context._store_raised()
cdef void _iterparseSaxStart(void* ctxt, char* localname, char* prefix,
char* URI, int nb_namespaces, char** namespaces,
int nb_attributes, int nb_defaulted,
- char** attributes):
+ char** attributes) with gil:
cdef xmlparser.xmlParserCtxt* c_ctxt
+ cdef _IterparseContext context
c_ctxt = ctxt
- (<_IterparseContext>c_ctxt._private)._origSaxStart(
+ context = <_IterparseContext>c_ctxt._private
+ context._origSaxStart(
ctxt, localname, prefix, URI,
nb_namespaces, namespaces,
nb_attributes, nb_defaulted, attributes)
- _pushSaxStartEvent(c_ctxt, c_ctxt.node)
+ _pushSaxStartEvent(context, c_ctxt.node)
-cdef void _iterparseSaxEnd(void* ctxt, char* localname, char* prefix, char* URI):
+cdef void _iterparseSaxEnd(void* ctxt, char* localname, char* prefix, char* URI) with gil:
cdef xmlparser.xmlParserCtxt* c_ctxt
+ cdef _IterparseContext context
c_ctxt = ctxt
- _pushSaxEndEvent(c_ctxt, c_ctxt.node)
- (<_IterparseContext>c_ctxt._private)._origSaxEnd(ctxt, localname, prefix, URI)
+ context = <_IterparseContext>c_ctxt._private
+ _pushSaxEndEvent(context, c_ctxt.node)
+ context._origSaxEnd(ctxt, localname, prefix, URI)
-cdef void _iterparseSaxStartNoNs(void* ctxt, char* name, char** attributes):
+cdef void _iterparseSaxStartNoNs(void* ctxt, char* name, char** attributes) with gil:
cdef xmlparser.xmlParserCtxt* c_ctxt
+ cdef _IterparseContext context
c_ctxt = ctxt
- (<_IterparseContext>c_ctxt._private)._origSaxStartNoNs(ctxt, name, attributes)
- _pushSaxStartEvent(c_ctxt, c_ctxt.node)
+ context = <_IterparseContext>c_ctxt._private
+ context._origSaxStartNoNs(ctxt, name, attributes)
+ _pushSaxStartEvent(context, c_ctxt.node)
-cdef void _iterparseSaxEndNoNs(void* ctxt, char* name):
+cdef void _iterparseSaxEndNoNs(void* ctxt, char* name) with gil:
cdef xmlparser.xmlParserCtxt* c_ctxt
+ cdef _IterparseContext context
c_ctxt = ctxt
- _pushSaxEndEvent(c_ctxt, c_ctxt.node)
- (<_IterparseContext>c_ctxt._private)._origSaxEndNoNs(ctxt, name)
+ context = <_IterparseContext>c_ctxt._private
+ _pushSaxEndEvent(context, c_ctxt.node)
+ context._origSaxEndNoNs(ctxt, name)
-cdef void _iterparseSaxComment(void* ctxt, char* text):
+cdef void _iterparseSaxComment(void* ctxt, char* text) with gil:
cdef xmlNode* c_node
cdef xmlparser.xmlParserCtxt* c_ctxt
+ cdef _IterparseContext context
c_ctxt = ctxt
- (<_IterparseContext>c_ctxt._private)._origSaxComment(ctxt, text)
+ context = <_IterparseContext>c_ctxt._private
+ context._origSaxComment(ctxt, text)
c_node = _iterparseFindLastNode(c_ctxt)
if c_node is not NULL:
- _pushSaxEvent(c_ctxt, "comment", c_node)
+ _pushSaxEvent(context, "comment", c_node)
-cdef void _iterparseSaxPI(void* ctxt, char* target, char* data):
+cdef void _iterparseSaxPI(void* ctxt, char* target, char* data) with gil:
cdef xmlNode* c_node
cdef xmlparser.xmlParserCtxt* c_ctxt
+ cdef _IterparseContext context
c_ctxt = ctxt
- (<_IterparseContext>c_ctxt._private)._origSaxPI(ctxt, target, data)
+ context = <_IterparseContext>c_ctxt._private
+ context._origSaxPI(ctxt, target, data)
c_node = _iterparseFindLastNode(c_ctxt)
if c_node is not NULL:
- _pushSaxEvent(c_ctxt, "pi", c_node)
+ _pushSaxEvent(context, "pi", c_node)
cdef inline xmlNode* _iterparseFindLastNode(xmlparser.xmlParserCtxt* c_ctxt):
# this mimics what libxml2 creates for comments/PIs
@@ -342,10 +347,12 @@
- encoding - override the document encoding
- schema - an XMLSchema to validate against
"""
- cdef object _source
- cdef object _events
cdef object _tag
+ cdef object _events
cdef readonly object root
+ cdef object _source
+ cdef int (*_parse_chunk)(xmlparser.xmlParserCtxt* ctxt,
+ char* chunk, int size, int terminate)
def __init__(self, source, events=("end",), *, tag=None,
attribute_defaults=False, dtd_validation=False,
load_dtd=False, no_network=True, remove_blank_text=False,
@@ -394,6 +401,11 @@
remove_comments, remove_pis, strip_cdata,
None, filename, encoding)
+ if self._for_html:
+ self._parse_chunk = htmlparser.htmlParseChunk
+ else:
+ self._parse_chunk = xmlparser.xmlParseChunk
+
context = <_IterparseContext>self._getPushParserContext()
__GLOBAL_PARSER_CONTEXT.initParserDict(context._c_ctxt)
context.prepare()
@@ -422,41 +434,39 @@
def __next__(self):
cdef _IterparseContext context
cdef xmlparser.xmlParserCtxt* pctxt
+ cdef char* c_data
+ cdef Py_ssize_t c_data_len
cdef int error
if self._source is None:
raise StopIteration
- context = <_IterparseContext>self._getPushParserContext()
+ context = <_IterparseContext>self._push_parser_context
if python.PyList_GET_SIZE(context._events) > context._event_index:
item = python.PyList_GET_ITEM(context._events, context._event_index)
python.Py_INCREF(item) # 'borrowed reference' from PyList_GET_ITEM
- context._event_index = context._event_index + 1
+ context._event_index += 1
return item
del context._events[:]
pctxt = context._c_ctxt
error = 0
- while python.PyList_GET_SIZE(context._events) == 0 and error == 0:
+ while python.PyList_GET_SIZE(context._events) == 0:
data = self._source.read(__ITERPARSE_CHUNK_SIZE)
if not python.PyString_Check(data):
self._source = None
raise TypeError, "reading file objects must return plain strings"
- elif data:
- if self._for_html:
- error = htmlparser.htmlParseChunk(
- pctxt, _cstr(data), python.PyString_GET_SIZE(data), 0)
- else:
- error = xmlparser.xmlParseChunk(
- pctxt, _cstr(data), python.PyString_GET_SIZE(data), 0)
+ c_data_len = python.PyString_GET_SIZE(data)
+ if c_data_len == 0:
+ c_data = NULL
else:
- if self._for_html:
- error = htmlparser.htmlParseChunk(pctxt, NULL, 0, 1)
- else:
- error = xmlparser.xmlParseChunk(pctxt, NULL, 0, 1)
- self._source = None
+ c_data = _cstr(data)
+ with nogil:
+ error = self._parse_chunk(
+ pctxt, c_data, c_data_len, (c_data_len == 0))
+ if error or c_data_len == 0:
break
- if error != 0 or (context._validator is not None and
- not context._validator.isvalid()):
+ if error or (context._validator is not None and
+ not context._validator.isvalid()):
self._source = None
del context._events[:]
context._assureDocGetsFreed()
From scoder at codespeak.net Thu May 8 17:45:02 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 8 May 2008 17:45:02 +0200 (CEST)
Subject: [Lxml-checkins] r54566 - in lxml/trunk: . src/lxml
Message-ID: <20080508154502.1AD2816853A@codespeak.net>
Author: scoder
Date: Thu May 8 17:45:01 2008
New Revision: 54566
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/cstd.pxd
lxml/trunk/src/lxml/iterparse.pxi
Log:
r4192 at delle: sbehnel | 2008-05-07 20:32:28 +0200
free GIL in iterparse() only when reading from a plain file
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Thu May 8 17:45:01 2008
@@ -8,6 +8,9 @@
Features added
--------------
+* Running ``iterparse()`` on a plain file (or filename) frees the GIL
+ on reading.
+
* Conversion functions ``html_to_xhtml()`` and ``xhtml_to_html()`` in
lxml.html (experimental).
Modified: lxml/trunk/src/lxml/cstd.pxd
==============================================================================
--- lxml/trunk/src/lxml/cstd.pxd (original)
+++ lxml/trunk/src/lxml/cstd.pxd Thu May 8 17:45:01 2008
@@ -1,9 +1,4 @@
-cdef extern from "stdio.h":
- ctypedef struct FILE
- cdef int sprintf(char* str, char* format, ...) nogil
- cdef int printf(char* str) nogil
-
cdef extern from "string.h":
ctypedef int size_t
cdef int strlen(char* s) nogil
@@ -15,6 +10,15 @@
cdef void* memcpy(void* dest, void* src, size_t len) nogil
cdef void* memset(void* s, int c, size_t len) nogil
+cdef extern from "stdio.h":
+ ctypedef struct FILE
+ cdef size_t fread(void *ptr, size_t size, size_t nmemb,
+ FILE *stream) nogil
+ cdef int feof(FILE *stream) nogil
+ cdef int ferror(FILE *stream) nogil
+ cdef int sprintf(char* str, char* format, ...) nogil
+ cdef int printf(char* str) nogil
+
cdef extern from "stdlib.h":
cdef void* malloc(size_t size) nogil
cdef void* realloc(void* ptr, size_t size) nogil
Modified: lxml/trunk/src/lxml/iterparse.pxi
==============================================================================
--- lxml/trunk/src/lxml/iterparse.pxi (original)
+++ lxml/trunk/src/lxml/iterparse.pxi Thu May 8 17:45:01 2008
@@ -235,7 +235,7 @@
cdef void _iterparseSaxStart(void* ctxt, char* localname, char* prefix,
char* URI, int nb_namespaces, char** namespaces,
int nb_attributes, int nb_defaulted,
- char** attributes) with gil:
+ char** attributes):
cdef xmlparser.xmlParserCtxt* c_ctxt
cdef _IterparseContext context
c_ctxt = ctxt
@@ -246,7 +246,7 @@
nb_attributes, nb_defaulted, attributes)
_pushSaxStartEvent(context, c_ctxt.node)
-cdef void _iterparseSaxEnd(void* ctxt, char* localname, char* prefix, char* URI) with gil:
+cdef void _iterparseSaxEnd(void* ctxt, char* localname, char* prefix, char* URI):
cdef xmlparser.xmlParserCtxt* c_ctxt
cdef _IterparseContext context
c_ctxt = ctxt
@@ -254,7 +254,7 @@
_pushSaxEndEvent(context, c_ctxt.node)
context._origSaxEnd(ctxt, localname, prefix, URI)
-cdef void _iterparseSaxStartNoNs(void* ctxt, char* name, char** attributes) with gil:
+cdef void _iterparseSaxStartNoNs(void* ctxt, char* name, char** attributes):
cdef xmlparser.xmlParserCtxt* c_ctxt
cdef _IterparseContext context
c_ctxt = ctxt
@@ -262,7 +262,7 @@
context._origSaxStartNoNs(ctxt, name, attributes)
_pushSaxStartEvent(context, c_ctxt.node)
-cdef void _iterparseSaxEndNoNs(void* ctxt, char* name) with gil:
+cdef void _iterparseSaxEndNoNs(void* ctxt, char* name):
cdef xmlparser.xmlParserCtxt* c_ctxt
cdef _IterparseContext context
c_ctxt = ctxt
@@ -270,7 +270,7 @@
_pushSaxEndEvent(context, c_ctxt.node)
context._origSaxEndNoNs(ctxt, name)
-cdef void _iterparseSaxComment(void* ctxt, char* text) with gil:
+cdef void _iterparseSaxComment(void* ctxt, char* text):
cdef xmlNode* c_node
cdef xmlparser.xmlParserCtxt* c_ctxt
cdef _IterparseContext context
@@ -281,7 +281,7 @@
if c_node is not NULL:
_pushSaxEvent(context, "comment", c_node)
-cdef void _iterparseSaxPI(void* ctxt, char* target, char* data) with gil:
+cdef void _iterparseSaxPI(void* ctxt, char* target, char* data):
cdef xmlNode* c_node
cdef xmlparser.xmlParserCtxt* c_ctxt
cdef _IterparseContext context
@@ -351,6 +351,7 @@
cdef object _events
cdef readonly object root
cdef object _source
+ cdef object _buffer
cdef int (*_parse_chunk)(xmlparser.xmlParserCtxt* ctxt,
char* chunk, int size, int terminate)
def __init__(self, source, events=("end",), *, tag=None,
@@ -434,9 +435,10 @@
def __next__(self):
cdef _IterparseContext context
cdef xmlparser.xmlParserCtxt* pctxt
+ cdef cstd.FILE* c_stream
cdef char* c_data
cdef Py_ssize_t c_data_len
- cdef int error
+ cdef int error, done
if self._source is None:
raise StopIteration
@@ -449,24 +451,41 @@
del context._events[:]
pctxt = context._c_ctxt
- error = 0
+ error = done = 0
+ c_stream = python.PyFile_AsFile(self._source)
while python.PyList_GET_SIZE(context._events) == 0:
- data = self._source.read(__ITERPARSE_CHUNK_SIZE)
- if not python.PyString_Check(data):
- self._source = None
- raise TypeError, "reading file objects must return plain strings"
- c_data_len = python.PyString_GET_SIZE(data)
- if c_data_len == 0:
- c_data = NULL
- else:
+ if c_stream is NULL:
+ data = self._source.read(__ITERPARSE_CHUNK_SIZE)
+ if not python.PyString_Check(data):
+ self._source = None
+ raise TypeError, "reading file objects must return plain strings"
+ c_data_len = python.PyString_GET_SIZE(data)
c_data = _cstr(data)
- with nogil:
- error = self._parse_chunk(
- pctxt, c_data, c_data_len, (c_data_len == 0))
- if error or c_data_len == 0:
+ done = (c_data_len == 0)
+ error = self._parse_chunk(pctxt, c_data, c_data_len, done)
+ else:
+ if self._buffer is None:
+ self._buffer = python.PyString_FromStringAndSize(
+ NULL, __ITERPARSE_CHUNK_SIZE)
+ c_data = _cstr(self._buffer)
+ with nogil:
+ c_data_len = cstd.fread(
+ c_data, 1, __ITERPARSE_CHUNK_SIZE, c_stream)
+ if c_data_len < __ITERPARSE_CHUNK_SIZE:
+ if cstd.ferror(c_stream):
+ error = 1
+ elif cstd.feof(c_stream):
+ done = 1
+ if not error:
+ error = self._parse_chunk(
+ pctxt, c_data, c_data_len, done)
+ if error or done:
+ self._buffer = None
break
- if error or (context._validator is not None and
- not context._validator.isvalid()):
+
+ if not error and context._validator is not None:
+ error = not context._validator.isvalid()
+ if error:
self._source = None
del context._events[:]
context._assureDocGetsFreed()
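Note on the read loop above: it reads a fixed-size chunk, pushes it into the parser, and only signals termination once the source is exhausted. A minimal sketch of the same pattern in plain Python, assuming the standard library's `xml.etree.ElementTree.XMLPullParser` as a stand-in for the libxml2 push parser (this is not the lxml code path); `iter_end_tags` and the tiny `CHUNK_SIZE` are made up for the illustration:

```python
import io
from xml.etree.ElementTree import XMLPullParser

CHUNK_SIZE = 16  # deliberately tiny so the input spans several chunks


def iter_end_tags(stream, chunk_size=CHUNK_SIZE):
    """Yield the tag of each element as soon as its end tag has been parsed."""
    parser = XMLPullParser(events=("end",))
    while True:
        data = stream.read(chunk_size)
        if data:
            # regular chunk: push it into the parser
            parser.feed(data)
        else:
            # empty read means end of input: tell the parser to finish
            parser.close()
        # hand out whatever events this chunk produced
        for _event, element in parser.read_events():
            yield element.tag
        if not data:
            break


source = io.BytesIO(b"<root><a>B</a><b>C</b></root>")
tags = list(iter_end_tags(source))
print(tags)
```

Events become available as soon as the chunk containing the closing tag has been fed, so a caller can start working before the whole input has been read.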
From scoder at codespeak.net Thu May 8 17:45:06 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 8 May 2008 17:45:06 +0200 (CEST)
Subject: [Lxml-checkins] r54567 - lxml/trunk
Message-ID: <20080508154506.C007E16853B@codespeak.net>
Author: scoder
Date: Thu May 8 17:45:05 2008
New Revision: 54567
Modified:
lxml/trunk/ (props changed)
lxml/trunk/INSTALL.txt
Log:
r4193 at delle: sbehnel | 2008-05-08 15:15:33 +0200
require libxml2 2.6.21+ for lxml 2.1 (because of schematron.h)
Modified: lxml/trunk/INSTALL.txt
==============================================================================
--- lxml/trunk/INSTALL.txt (original)
+++ lxml/trunk/INSTALL.txt Thu May 8 17:45:05 2008
@@ -8,7 +8,7 @@
You need libxml2 and libxslt, in particular:
-* libxml 2.6.20 or later. It can be found here:
+* libxml 2.6.21 or later. It can be found here:
http://xmlsoft.org/downloads.html
If you want to use XPath, do not use libxml2 2.6.27. We recommend
@@ -17,10 +17,9 @@
* libxslt 1.1.15 or later. It can be found here:
http://xmlsoft.org/XSLT/downloads.html
-Newer versions generally contain less bugs and are therefore recommended. The
-HTML parser benefits from libxml2 version 2.6.21 or later, which support
-parsing horribly broken HTML. XML Schema support is also still worked on in
-libxml2, so newer versions will give you better complience with the W3C spec.
+Newer versions generally contain less bugs and are therefore
+recommended. XML Schema support is also still worked on in libxml2,
+so newer versions will give you better complience with the W3C spec.
Installation
From scoder at codespeak.net Thu May 8 17:45:10 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 8 May 2008 17:45:10 +0200 (CEST)
Subject: [Lxml-checkins] r54568 - in lxml/trunk: . src/lxml
Message-ID: <20080508154510.98A99168537@codespeak.net>
Author: scoder
Date: Thu May 8 17:45:10 2008
New Revision: 54568
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/etree_defs.h
Log:
r4194 at delle: sbehnel | 2008-05-08 15:18:11 +0200
removed unused #define section: already provided by Cython
Modified: lxml/trunk/src/lxml/etree_defs.h
==============================================================================
--- lxml/trunk/src/lxml/etree_defs.h (original)
+++ lxml/trunk/src/lxml/etree_defs.h Thu May 8 17:45:10 2008
@@ -5,17 +5,6 @@
#define va_int(ap) va_arg(ap, int)
#define va_charptr(ap) va_arg(ap, char *)
-/* Py_ssize_t support was added in Python 2.5 */
-#if PY_VERSION_HEX < 0x02050000
-# ifndef PY_SSIZE_T_MAX /* patched Pyrex? */
- typedef int Py_ssize_t;
-# define PY_SSIZE_T_MAX INT_MAX
-# define PY_SSIZE_T_MIN INT_MIN
-# define PyInt_FromSsize_t(z) PyInt_FromLong(z)
-# define PyInt_AsSsize_t(o) PyInt_AsLong(o)
-# endif
-#endif
-
/* Threading can crash under Python <= 2.4.1 */
#if PY_VERSION_HEX < 0x02040200
# ifndef WITHOUT_THREADING
From scoder at codespeak.net Thu May 8 17:45:15 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 8 May 2008 17:45:15 +0200 (CEST)
Subject: [Lxml-checkins] r54569 - in lxml/trunk: . src/lxml
Message-ID: <20080508154515.412B716853C@codespeak.net>
Author: scoder
Date: Thu May 8 17:45:14 2008
New Revision: 54569
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/parser.pxi
Log:
r4195 at delle: sbehnel | 2008-05-08 17:43:13 +0200
when parsing from a plain file, free the GIL and do not pass through Python
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Thu May 8 17:45:14 2008
@@ -8,6 +8,8 @@
Features added
--------------
+* Parsing from a plain file object frees the GIL.
+
* Running ``iterparse()`` on a plain file (or filename) frees the GIL
on reading.
Modified: lxml/trunk/src/lxml/parser.pxi
==============================================================================
--- lxml/trunk/src/lxml/parser.pxi (original)
+++ lxml/trunk/src/lxml/parser.pxi Thu May 8 17:45:14 2008
@@ -222,45 +222,67 @@
self._bytes = ''
self._bytes_read = 0
+ cdef xmlparser.xmlParserInputBuffer* _createParserInputBuffer(self):
+ cdef cstd.FILE* c_stream
+ cdef xmlparser.xmlParserInputBuffer* c_buffer
+ c_buffer = xmlparser.xmlAllocParserInputBuffer(0)
+ c_stream = python.PyFile_AsFile(self._filelike)
+ if c_stream is NULL:
+ c_buffer.readcallback = _readFilelikeParser
+ c_buffer.context = self
+ else:
+ c_buffer.readcallback = _readFileParser
+ c_buffer.context = c_stream
+ return c_buffer
+
cdef xmlparser.xmlParserInput* _createParserInput(
self, xmlparser.xmlParserCtxt* ctxt):
cdef xmlparser.xmlParserInputBuffer* c_buffer
- c_buffer = xmlparser.xmlAllocParserInputBuffer(0)
- c_buffer.context = self
- c_buffer.readcallback = _readFilelikeParser
+ c_buffer = self._createParserInputBuffer()
return xmlparser.xmlNewIOInputStream(ctxt, c_buffer, 0)
+ cdef tree.xmlDtd* _readDtd(self):
+ cdef xmlparser.xmlParserInputBuffer* c_buffer
+ c_buffer = self._createParserInputBuffer()
+ with nogil:
+ return xmlparser.xmlIOParseDTD(NULL, c_buffer, 0)
+
cdef xmlDoc* _readDoc(self, xmlparser.xmlParserCtxt* ctxt, int options):
cdef xmlDoc* result
cdef char* c_encoding
+ cdef cstd.FILE* c_stream
+ cdef xmlparser.xmlInputReadCallback c_read_callback
+ cdef xmlparser.xmlInputCloseCallback c_close_callback
+ cdef void* c_callback_context
if self._encoding is None:
c_encoding = NULL
else:
c_encoding = _cstr(self._encoding)
+ c_stream = python.PyFile_AsFile(self._filelike)
+ if c_stream is NULL:
+ c_read_callback = _readFilelikeParser
+ c_callback_context = self
+ else:
+ c_read_callback = _readFileParser
+ c_callback_context = c_stream
+
with nogil:
if ctxt.html:
result = htmlparser.htmlCtxtReadIO(
- ctxt, _readFilelikeParser, NULL, self,
- self._c_url, c_encoding, options)
+ ctxt, c_read_callback, NULL, c_callback_context,
+ self._c_url, c_encoding, options)
if result is not NULL:
if _fixHtmlDictNames(ctxt.dict, result) < 0:
tree.xmlFreeDoc(result)
result = NULL
else:
result = xmlparser.xmlCtxtReadIO(
- ctxt, _readFilelikeParser, NULL, self,
+ ctxt, c_read_callback, NULL, c_callback_context,
self._c_url, c_encoding, options)
- return result
- cdef tree.xmlDtd* _readDtd(self):
- cdef xmlparser.xmlParserInputBuffer* c_buffer
- c_buffer = xmlparser.xmlAllocParserInputBuffer(0)
- c_buffer.context = self
- c_buffer.readcallback = _readFilelikeParser
- with nogil:
- return xmlparser.xmlIOParseDTD(NULL, c_buffer, 0)
+ return result
cdef int copyToBuffer(self, char* c_buffer, int c_size):
cdef char* c_start
@@ -293,6 +315,9 @@
cdef int _readFilelikeParser(void* ctxt, char* c_buffer, int c_size) with gil:
return (<_FileReaderContext>ctxt).copyToBuffer(c_buffer, c_size)
+cdef int _readFileParser(void* ctxt, char* c_buffer, int c_size) nogil:
+ return cstd.fread(c_buffer, 1, c_size, ctxt)
+
############################################################
## support for custom document loaders
############################################################
From scoder at codespeak.net Fri May 9 16:07:25 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 9 May 2008 16:07:25 +0200 (CEST)
Subject: [Lxml-checkins] r54595 - lxml/branch/lxml-2.0
Message-ID: <20080509140725.93979169F0D@codespeak.net>
Author: scoder
Date: Fri May 9 16:07:24 2008
New Revision: 54595
Modified:
lxml/branch/lxml-2.0/CHANGES.txt
Log:
changelog
Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Fri May 9 16:07:24 2008
@@ -11,6 +11,12 @@
Bugs fixed
----------
+* Windows build was broken.
+
+* Moving a subtree from a document created in one thread into a
+ document of another thread could crash when the rest of the source
+ document is deleted while the subtree is still in use.
+
* Rare crash when serialising to a file object with certain encodings.
Other changes
From scoder at codespeak.net Tue May 13 23:52:39 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 13 May 2008 23:52:39 +0200 (CEST)
Subject: [Lxml-checkins] r54711 - in lxml/trunk: . src/lxml
Message-ID: <20080513215239.BFFA6169EA5@codespeak.net>
Author: scoder
Date: Tue May 13 23:52:36 2008
New Revision: 54711
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/etree_defs.h
Log:
r4205 at delle: sbehnel | 2008-05-11 08:35:44 +0200
Py3 fix
Modified: lxml/trunk/src/lxml/etree_defs.h
==============================================================================
--- lxml/trunk/src/lxml/etree_defs.h (original)
+++ lxml/trunk/src/lxml/etree_defs.h Tue May 13 23:52:36 2008
@@ -101,11 +101,15 @@
#define unlikely_condition(x) (x)
#endif /* __GNUC__ */
+#ifndef Py_TYPE
+ #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
+#endif
+
#define PY_NEW(T) \
(((PyTypeObject*)(T))->tp_new( \
(PyTypeObject*)(T), __pyx_empty_tuple, NULL))
-#define _fqtypename(o) (((PyTypeObject*)(o))->ob_type->tp_name)
+#define _fqtypename(o) ((Py_TYPE(o))->tp_name)
#define _isString(obj) (PyString_CheckExact(obj) || \
PyUnicode_CheckExact(obj) || \
From scoder at codespeak.net Tue May 13 23:52:46 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 13 May 2008 23:52:46 +0200 (CEST)
Subject: [Lxml-checkins] r54712 - lxml/trunk
Message-ID: <20080513215246.151F7169EA6@codespeak.net>
Author: scoder
Date: Tue May 13 23:52:44 2008
New Revision: 54712
Modified:
lxml/trunk/ (props changed)
lxml/trunk/setupinfo.py
Log:
r4206 at delle: sbehnel | 2008-05-13 06:43:04 +0200
stupid bug in env var evaluation
Modified: lxml/trunk/setupinfo.py
==============================================================================
--- lxml/trunk/setupinfo.py (original)
+++ lxml/trunk/setupinfo.py Tue May 13 23:52:44 2008
@@ -258,7 +258,7 @@
except ValueError:
pass
# allow passing all cmd line options also as environment variables
- env_val = os.getenv(name.upper().replace('-', '_'), 'false').upper()
+ env_val = os.getenv(name.upper().replace('-', '_'), 'false').lower()
if env_val == "true":
return True
return False
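Note on the change above: the old code upper-cased the environment value and then compared it with the lower-case literal "true", so the check could never succeed. A minimal sketch of the corrected comparison follows. The helper name `env_flag` and its `environ` parameter are assumptions made for illustration and are not part of setupinfo.py.

```python
import os

def env_flag(name, environ=None):
    # Hypothetical helper (not in setupinfo.py): map an option name such as
    # 'without-objectify' to the variable WITHOUT_OBJECTIFY and test it.
    if environ is None:
        environ = os.environ
    key = name.upper().replace('-', '_')
    # Lower-case the value so that "TRUE", "True" and "true" all match.
    env_val = environ.get(key, 'false').lower()
    return env_val == "true"

# The pre-fix comparison, shown for contrast: it is always False.
old_style_match = ('true'.upper() == "true")
```

Passing the environment as a dictionary keeps the sketch testable without touching the real process environment.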
From scoder at codespeak.net Tue May 13 23:52:52 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 13 May 2008 23:52:52 +0200 (CEST)
Subject: [Lxml-checkins] r54713 - lxml/trunk
Message-ID: <20080513215252.56810169EA7@codespeak.net>
Author: scoder
Date: Tue May 13 23:52:51 2008
New Revision: 54713
Modified:
lxml/trunk/ (props changed)
lxml/trunk/setupinfo.py
Log:
r4207 at delle: sbehnel | 2008-05-13 07:25:11 +0200
fix for broken env var evaluation
Modified: lxml/trunk/setupinfo.py
==============================================================================
--- lxml/trunk/setupinfo.py (original)
+++ lxml/trunk/setupinfo.py Tue May 13 23:52:51 2008
@@ -15,8 +15,11 @@
PACKAGE_PATH = "src/lxml/"
def env_var(name):
- value = os.getenv(name, '')
- return value.split(os.pathsep)
+ value = os.getenv(name)
+ if value:
+ return value.split(os.pathsep)
+ else:
+ return []
def ext_modules(static_include_dirs, static_library_dirs, static_cflags):
if CYTHON_INSTALLED:
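Note on the change above: splitting an empty string on a separator yields a list with one empty string, not an empty list, so an unset variable previously produced a bogus `['']` entry. A minimal sketch of the fixed behaviour follows. The `environ` parameter is an assumption added here for testability; the function in setupinfo.py reads the process environment directly.

```python
import os

def env_var(name, environ=None):
    # Sketch of the fixed helper; 'environ' is a hypothetical parameter.
    if environ is None:
        environ = os.environ
    value = environ.get(name)
    if value:
        return value.split(os.pathsep)
    # Unset or empty variable: return no entries at all.
    return []

# What the old code returned for an unset variable:
old_result = ''.split(os.pathsep)
```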
From scoder at codespeak.net Tue May 13 23:52:58 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 13 May 2008 23:52:58 +0200 (CEST)
Subject: [Lxml-checkins] r54714 - lxml/trunk
Message-ID: <20080513215258.DFA78169EA5@codespeak.net>
Author: scoder
Date: Tue May 13 23:52:57 2008
New Revision: 54714
Modified:
lxml/trunk/ (props changed)
lxml/trunk/setupinfo.py
Log:
r4208 at delle: sbehnel | 2008-05-13 21:59:37 +0200
cleaned up old left over
Modified: lxml/trunk/setupinfo.py
==============================================================================
--- lxml/trunk/setupinfo.py (original)
+++ lxml/trunk/setupinfo.py Tue May 13 23:52:57 2008
@@ -27,7 +27,7 @@
print("Building with Cython %s." % Cython.Compiler.Version.version)
else:
print ("NOTE: Trying to build without Cython, pre-generated "
- "'%setree.c' needs to be available." % PACKAGE_PATH)
+ "'%slxml.etree.c' needs to be available." % PACKAGE_PATH)
source_extension = ".c"
if OPTION_WITHOUT_OBJECTIFY:
From scoder at codespeak.net Tue May 13 23:53:07 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 13 May 2008 23:53:07 +0200 (CEST)
Subject: [Lxml-checkins] r54715 - in lxml/trunk: . src/lxml
Message-ID: <20080513215307.A540B169EA6@codespeak.net>
Author: scoder
Date: Tue May 13 23:53:05 2008
New Revision: 54715
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/python.pxd
lxml/trunk/src/lxml/serializer.pxi
Log:
r4209 at delle: sbehnel | 2008-05-13 22:28:56 +0200
simplification
Modified: lxml/trunk/src/lxml/python.pxd
==============================================================================
--- lxml/trunk/src/lxml/python.pxd (original)
+++ lxml/trunk/src/lxml/python.pxd Tue May 13 23:53:05 2008
@@ -21,8 +21,6 @@
pass
cdef FILE* PyFile_AsFile(object p)
- cdef int PyFile_Check(object p)
- cdef object PyFile_Name(object p)
cdef int PyUnicode_Check(object obj)
cdef int PyString_Check(object obj)
Modified: lxml/trunk/src/lxml/serializer.pxi
==============================================================================
--- lxml/trunk/src/lxml/serializer.pxi (original)
+++ lxml/trunk/src/lxml/serializer.pxi Tue May 13 23:53:05 2008
@@ -400,9 +400,11 @@
cdef _dumpToFile(f, xmlNode* c_node, bint pretty_print, bint with_tail):
cdef tree.xmlOutputBuffer* c_buffer
- if not python.PyFile_Check(f):
+ cdef cstd.FILE* c_file
+ c_file = python.PyFile_AsFile(f)
+ if c_file is NULL:
raise ValueError, "not a file"
- c_buffer = tree.xmlOutputBufferCreateFile(python.PyFile_AsFile(f), NULL)
+ c_buffer = tree.xmlOutputBufferCreateFile(c_file, NULL)
tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, pretty_print, NULL)
if with_tail:
_writeTail(c_buffer, c_node, NULL, 0)
From scoder at codespeak.net Thu May 15 20:08:49 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 15 May 2008 20:08:49 +0200 (CEST)
Subject: [Lxml-checkins] r54760 - in lxml/trunk: . src/lxml
Message-ID: <20080515180849.EDFDE2A8003@codespeak.net>
Author: scoder
Date: Thu May 15 20:08:46 2008
New Revision: 54760
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/apihelpers.pxi
Log:
r4216 at delle: sbehnel | 2008-05-14 18:14:49 +0200
better error message
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Thu May 15 20:08:46 2008
@@ -1020,13 +1020,13 @@
cdef object _utf8(object s):
if python.PyString_Check(s):
- assert not isutf8py(s), \
- "All strings must be XML compatible, either Unicode or ASCII"
+ assert isutf8py(s) == 0, \
+ "All strings must be XML compatible: Unicode or ASCII, no NULL bytes"
elif python.PyUnicode_Check(s):
# FIXME: we should test these strings, too ...
s = python.PyUnicode_AsUTF8String(s)
assert isutf8py(s) != -1, \
- "All strings must be XML compatible, either Unicode or ASCII"
+ "All strings must be XML compatible: Unicode or ASCII, no NULL bytes"
else:
raise TypeError, "Argument must be string or unicode."
return s
From scoder at codespeak.net Thu May 15 20:08:55 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 15 May 2008 20:08:55 +0200 (CEST)
Subject: [Lxml-checkins] r54761 - lxml/trunk
Message-ID: <20080515180855.A1C911684D5@codespeak.net>
Author: scoder
Date: Thu May 15 20:08:54 2008
New Revision: 54761
Modified:
lxml/trunk/ (props changed)
lxml/trunk/setupinfo.py
Log:
r4217 at delle: sbehnel | 2008-05-15 12:55:35 +0200
Py3 fixes
Modified: lxml/trunk/setupinfo.py
==============================================================================
--- lxml/trunk/setupinfo.py (original)
+++ lxml/trunk/setupinfo.py Thu May 15 20:08:54 2008
@@ -14,10 +14,20 @@
PACKAGE_PATH = "src/lxml/"
+if sys.version_info[0] >= 3:
+ _system_encoding = sys.getdefaultencoding()
+ if _system_encoding is None:
+ _system_encoding = "iso-8859-1" # :-)
+ def decode_input(data):
+ return data.decode(_system_encoding)
+else:
+ def decode_input(data):
+ return data
+
def env_var(name):
value = os.getenv(name)
if value:
- return value.split(os.pathsep)
+ return decode_input(value).split(os.pathsep)
else:
return []
@@ -204,8 +214,7 @@
_ERROR_PRINTED = True
print("ERROR: %s" % errors)
print("** make sure the development packages of libxml2 and libxslt are installed **\n")
- output = rf.read()
- return (output or '').strip()
+ return decode_input(rf.read()).strip()
def get_library_versions():
xml2_version = run_command(find_xml2_config(), "--version")
From scoder at codespeak.net Thu May 15 20:09:01 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 15 May 2008 20:09:01 +0200 (CEST)
Subject: [Lxml-checkins] r54762 - lxml/trunk
Message-ID: <20080515180901.9FAF61684D6@codespeak.net>
Author: scoder
Date: Thu May 15 20:09:00 2008
New Revision: 54762
Modified:
lxml/trunk/ (props changed)
lxml/trunk/setupinfo.py
Log:
r4218 at delle: sbehnel | 2008-05-15 12:59:21 +0200
fix build on Mac-OS
Modified: lxml/trunk/setupinfo.py
==============================================================================
--- lxml/trunk/setupinfo.py (original)
+++ lxml/trunk/setupinfo.py Thu May 15 20:09:00 2008
@@ -174,13 +174,20 @@
static_cflags = env_var('CFLAGS')
assert static_cflags, "Static build not configured, see doc/build.txt"
result.extend(static_cflags)
- return result
+ else:
+ # anything from xslt-config --cflags that doesn't start with -I
+ possible_cflags = flags('cflags')
+ for possible_cflag in possible_cflags:
+ if not possible_cflag.startswith('-I'):
+ result.append(possible_cflag)
+
+ if sys.platform in ('darwin',):
+ for opt in result:
+ if 'flat_namespace' in opt:
+ break
+ else:
+ result.append('-flat_namespace')
- # anything from xslt-config --cflags that doesn't start with -I
- possible_cflags = flags('cflags')
- for possible_cflag in possible_cflags:
- if not possible_cflag.startswith('-I'):
- result.append(possible_cflag)
return result
def define_macros():
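Note on the change above: the Mac-OS branch relies on Python's for/else, where the else clause runs only if the loop finishes without hitting `break`. A minimal standalone sketch of that logic follows. The function name `add_flat_namespace` and its `platform` parameter are assumptions for illustration; the real code works on `result` inside the cflags function and checks `sys.platform`.

```python
def add_flat_namespace(cflags, platform):
    # Hypothetical standalone version of the darwin branch in the diff.
    result = list(cflags)
    if platform in ('darwin',):
        for opt in result:
            if 'flat_namespace' in opt:
                # Option already present, do not add it twice.
                break
        else:
            # Loop ended without break: the option was missing.
            result.append('-flat_namespace')
    return result
```

On any platform other than darwin the flags are returned unchanged.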
From scoder at codespeak.net Thu May 15 20:09:07 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 15 May 2008 20:09:07 +0200 (CEST)
Subject: [Lxml-checkins] r54763 - in lxml/trunk: . doc
Message-ID: <20080515180907.9CDE41684D1@codespeak.net>
Author: scoder
Date: Thu May 15 20:09:06 2008
New Revision: 54763
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/build.txt
Log:
r4219 at delle: sbehnel | 2008-05-15 16:09:08 +0200
clean up build docs
Modified: lxml/trunk/doc/build.txt
==============================================================================
--- lxml/trunk/doc/build.txt (original)
+++ lxml/trunk/doc/build.txt Thu May 15 20:09:06 2008
@@ -174,35 +174,51 @@
Providing newer library versions on Mac-OS X
--------------------------------------------
-The Unix environment in Mac-OS X makes it relatively easy to install
-Unix/Linux style package management tools and new software. However, it seems
-to be hard to get libraries set up for exclusive usage that Mac-OS X ships in
-an older version. The result can be segfaults on this platform that are hard
-to track down.
+Apple regularly ships new system releases with horribly outdated
+system libraries. This is specifically the case for libxml2 and
+libxslt, where the system provided versions are too old to build lxml.
+
+While the Unix environment in Mac-OS X makes it relatively easy to
+install Unix/Linux style package management tools and new software, it
+actually seems to be hard to get libraries set up for exclusive usage
+that Mac-OS X ships in an older version. Alternative distributions
+(like macports) install their libraries in addition to the system
+libraries, but the compiler and the runtime loader on Mac-OS still
+sees the system libraries before the new libraries. This can lead to
+undebuggable crashes where the newer library seems to be loaded but
+the older system library is used.
+
+Apple discourages static building against libraries, which would help
+working around this problem. Apple does not ship static library
+binaries with its system and several package management systems follow
+this decision. Therefore, building static binaries would require
+building the dependencies first. You can do this with the `buildout
+recipe for lxml`_.
To make sure the newer libxml2 and libxslt versions (e.g. those
provided by fink or macports) are used at *build time*, you must take
-care that the script ``xslt-config`` is found from the newly installed
-version when running the build setup. The system libraries also
-provide this script, but the new one must come first in the PATH. The
+care that the script ``xslt-config`` from the newly installed version
+is found when running the build setup. The system libraries also
+provide this script, so the new one must come first in the PATH. The
best way to make sure the right version is used is by passing the path
to the script as an option to setup.py::
python setup.py build --with-xslt-config=/path/to/xslt-config \
--with-xml2-config=/path/to/xml2-config
-To make sure the newer libxml2 and libxslt versions are used at
-*runtime*, you should add *all* directories where the newer libraries
-are installed (i.e. libxml2, libxslt and libexslt) to the
-``DYLD_LIBRARY_PATH`` environment variable when you use lxml (i.e. not
-only at build time). This seems to fix a lot of problems for users.
-
-Please read this thread about `experiences with MacOS-X`_ if you
-encounter problems. It also has a `buildout for lxml`_ that you can
-use.
+Instead of ``build``, you can use any target, like ``bdist_egg`` if
+you want to use setuptools to build an installable egg.
-.. _`experiences with MacOS-X`: http://thread.gmane.org/gmane.comp.python.lxml.devel/3290/focus=3290
-.. _`buildout for lxml`: http://thread.gmane.org/gmane.comp.python.lxml.devel/3290/focus=3297
+Since release 2.0.6, lxml automatically passes the option
+``-flat_namespace`` to the C compiler. This was reported to make sure
+that the libraries that lxml was built against are also used at
+runtime. Without this option, users needed to add all directories
+where the newer libraries are installed (i.e. libxml2, libxslt and
+libexslt) to the ``DYLD_LIBRARY_PATH`` environment variable when using
+lxml (i.e. at runtime). This should no longer be necessary with the
+new build setup.
+
+.. _`buildout recipe for lxml`: http://thread.gmane.org/gmane.comp.python.lxml.devel/3290/focus=3297
Static linking on Windows
@@ -267,13 +283,8 @@
STATIC_CFLAGS = []
-Add any CFLAGS you might consider useful to the third list. As `Ashish
-Kulkarni`_ notes, you might have to add the standard Windows library
-``wsock32.dll`` to the list of libraries to make ``lxml.objectify`` compile.
-
-.. _`Ashish Kulkarni`: http://codespeak.net/pipermail/lxml-dev/2006-September/001893.html
-
-Now you should be able to pass the ``--static`` option to setup.py and
+Add any CFLAGS you might consider useful to the third list. Now you
+should be able to pass the ``--static`` option to setup.py and
everything should work well. Try calling::
python setup.py bdist_wininst --static
From scoder at codespeak.net Mon May 19 23:59:47 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 19 May 2008 23:59:47 +0200 (CEST)
Subject: [Lxml-checkins] r54966 - in lxml/trunk: . src/lxml/html
Message-ID: <20080519215947.2CCFA1683FE@codespeak.net>
Author: scoder
Date: Mon May 19 23:59:46 2008
New Revision: 54966
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/html/__init__.py
lxml/trunk/test.py
Log:
r4224 at delle: sbehnel | 2008-05-18 13:36:07 +0200
Py3 syntax fixes
Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Mon May 19 23:59:46 2008
@@ -191,7 +191,7 @@
if default:
return default[0]
else:
- raise KeyError, id
+ raise KeyError(id)
def text_content(self):
"""
@@ -1392,7 +1392,7 @@
fn = os.tempnam() + '.html'
write_doc(fn, method="html")
url = 'file://' + fn.replace(os.path.sep, '/')
- print url
+ print(url)
webbrowser.open(url)
################################################################################
Modified: lxml/trunk/test.py
==============================================================================
--- lxml/trunk/test.py (original)
+++ lxml/trunk/test.py Mon May 19 23:59:46 2008
@@ -71,10 +71,16 @@
import getopt
import unittest
import traceback
-from sets import Set
+try:
+ set
+except NameError:
+ from sets import Set as set
__metaclass__ = type
+def stderr(text):
+ sys.stderr.write(text)
+ sys.stderr.write("\n")
class Options:
"""Configurable properties of the test runner."""
@@ -169,7 +175,7 @@
results.append(path)
return
if '__init__.py' not in files:
- print >> sys.stderr, "%s is not a package" % dir
+ stderr("%s is not a package" % dir)
return
for file in files:
if file.startswith('test') and file.endswith('.py'):
@@ -236,7 +242,7 @@
"""Returns a set of test case classes used in a test suite."""
if not isinstance(suite, unittest.TestSuite):
raise TypeError('not a TestSuite', suite)
- results = Set()
+ results = set()
for test in suite._tests:
if isinstance(test, unittest.TestCase):
results.add(test.__class__)
@@ -259,16 +265,14 @@
if test_suite is None:
continue
if cfg.warn_omitted:
- all_classes = Set(get_all_test_cases(module))
+ all_classes = set(get_all_test_cases(module))
classes_in_suite = get_test_classes_from_testsuite(test_suite)
difference = all_classes - classes_in_suite
for test_class in difference:
# surround the warning with blank lines, otherwise it tends
# to get lost in the noise
- print >> sys.stderr
- print >> sys.stderr, ("%s: WARNING: %s not in test suite"
+ stderr("\n%s: WARNING: %s not in test suite\n"
% (file, test_class.__name__))
- print >> sys.stderr
if (cfg.level is not None and
getattr(test_suite, 'level', 0) > cfg.level):
continue
@@ -280,7 +284,7 @@
def get_test_hooks(test_files, cfg, tracer=None):
"""Returns a list of test hooks from a given list of test modules."""
results = []
- dirs = Set(map(os.path.dirname, test_files))
+ dirs = set(map(os.path.dirname, test_files))
for dir in list(dirs):
if os.path.basename(dir) == 'ftests':
dirs.add(os.path.join(os.path.dirname(dir), 'tests'))
@@ -425,7 +429,7 @@
self.stream.writeln()
if not result.wasSuccessful():
self.stream.write("FAILED (")
- failed, errored = map(len, (result.failures, result.errors))
+ failed, errored = list(map(len, (result.failures, result.errors)))
if failed:
self.stream.write("failures=%d" % failed)
if errored:
@@ -447,8 +451,8 @@
# Environment
if sys.version_info < (2, 3):
- print >> sys.stderr, '%s: need Python 2.3 or later' % argv[0]
- print >> sys.stderr, 'your python is %s' % sys.version
+ stderr('%s: need Python 2.3 or later' % argv[0])
+ stderr('your python is %s' % sys.version)
return 1
# Defaults
@@ -476,7 +480,7 @@
'level=', 'all-levels', 'coverage'])
for k, v in opts:
if k == '-h':
- print __doc__
+ print(__doc__)
return 0
elif k == '-v':
cfg.verbosity += 1
@@ -509,22 +513,22 @@
try:
cfg.level = int(v)
except ValueError:
- print >> sys.stderr, '%s: invalid level: %s' % (argv[0], v)
- print >> sys.stderr, 'run %s -h for help'
+ stderr('%s: invalid level: %s' % (argv[0], v))
+ stderr('run %s -h for help')
return 1
elif k == '--all-levels':
cfg.level = None
else:
- print >> sys.stderr, '%s: invalid option: %s' % (argv[0], k)
- print >> sys.stderr, 'run %s -h for help'
+ stderr('%s: invalid option: %s' % (argv[0], k))
+ stderr('run %s -h for help')
return 1
if args:
cfg.pathname_regex = args[0]
if len(args) > 1:
cfg.test_regex = args[1]
if len(args) > 2:
- print >> sys.stderr, '%s: too many arguments: %s' % (argv[0], args[2])
- print >> sys.stderr, 'run %s -h for help'
+ stderr('%s: too many arguments: %s' % (argv[0], args[2]))
+ stderr('run %s -h for help')
return 1
if not cfg.unit_tests and not cfg.functional_tests:
cfg.unit_tests = True
@@ -564,11 +568,11 @@
success = True
if cfg.list_files:
baselen = len(cfg.basedir) + 1
- print "\n".join([fn[baselen:] for fn in test_files])
+ print("\n".join([fn[baselen:] for fn in test_files]))
if cfg.list_tests:
- print "\n".join([test.id() for test in test_cases])
+ print("\n".join([test.id() for test in test_cases]))
if cfg.list_hooks:
- print "\n".join([str(hook) for hook in test_hooks])
+ print("\n".join([str(hook) for hook in test_hooks]))
if cfg.run_tests:
runner = CustomTestRunner(cfg, test_hooks)
suite = unittest.TestSuite()
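Note on the change above: `print >> sys.stderr, ...` is a syntax error in Python 3, so the diff routes all such output through a small `stderr()` helper that works in both versions. A minimal sketch of that helper follows. The optional `stream` parameter is an assumption added here so the behaviour can be checked without writing to the real stderr; the helper in test.py takes only `text`.

```python
import io
import sys

def stderr(text, stream=None):
    # Sketch of the helper from the diff; 'stream' is a hypothetical extra.
    if stream is None:
        stream = sys.stderr
    stream.write(text)
    stream.write("\n")

# Usage: capture the output in memory instead of the terminal.
buf = io.StringIO()
stderr("%s is not a package" % "some/dir", stream=buf)
```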
From scoder at codespeak.net Tue May 20 00:00:16 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:00:16 +0200 (CEST)
Subject: [Lxml-checkins] r54967 - in lxml/trunk: . src/lxml
Message-ID: <20080519220016.CB1511683FE@codespeak.net>
Author: scoder
Date: Tue May 20 00:00:11 2008
New Revision: 54967
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/apihelpers.pxi
lxml/trunk/src/lxml/classlookup.pxi
lxml/trunk/src/lxml/docloader.pxi
lxml/trunk/src/lxml/dtd.pxi
lxml/trunk/src/lxml/etree_defs.h
lxml/trunk/src/lxml/extensions.pxi
lxml/trunk/src/lxml/iterparse.pxi
lxml/trunk/src/lxml/lxml.etree.pyx
lxml/trunk/src/lxml/nsclasses.pxi
lxml/trunk/src/lxml/parser.pxi
lxml/trunk/src/lxml/parsertarget.pxi
lxml/trunk/src/lxml/proxy.pxi
lxml/trunk/src/lxml/public-api.pxi
lxml/trunk/src/lxml/readonlytree.pxi
lxml/trunk/src/lxml/relaxng.pxi
lxml/trunk/src/lxml/saxparser.pxi
lxml/trunk/src/lxml/schematron.pxi
lxml/trunk/src/lxml/serializer.pxi
lxml/trunk/src/lxml/xinclude.pxi
lxml/trunk/src/lxml/xmlerror.pxi
lxml/trunk/src/lxml/xmlid.pxi
lxml/trunk/src/lxml/xmlschema.pxi
lxml/trunk/src/lxml/xpath.pxi
lxml/trunk/src/lxml/xslt.pxi
lxml/trunk/src/lxml/xsltext.pxi
Log:
r4225 at delle: sbehnel | 2008-05-19 00:50:22 +0200
loads of Py3 fixes, mostly byte string/unicode string cleanup
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Tue May 20 00:00:11 2008
@@ -3,14 +3,14 @@
cdef void displayNode(xmlNode* c_node, indent):
# to help with debugging
cdef xmlNode* c_child
- print indent * ' ', c_node
+ print indent * u' ', c_node
c_child = c_node.children
while c_child is not NULL:
displayNode(c_child, indent + 1)
c_child = c_child.next
cdef _Document _documentOrRaise(object input):
- """Call this to get the document of a _Document, _ElementTree or _Element
+ u"""Call this to get the document of a _Document, _ElementTree or _Element
object, or to raise an exception if it can't be determined.
Should be used in all API functions for consistency.
@@ -26,16 +26,16 @@
elif isinstance(input, _Document):
doc = <_Document>input
else:
- raise TypeError, "Invalid input object: %s" % \
+ raise TypeError, u"Invalid input object: %s" % \
python._fqtypename(input)
if doc is None:
- raise ValueError, "Input object has no document: %s" % \
+ raise ValueError, u"Input object has no document: %s" % \
python._fqtypename(input)
else:
return doc
cdef _Element _rootNodeOrRaise(object input):
- """Call this to get the root node of a _Document, _ElementTree or
+ u"""Call this to get the root node of a _Document, _ElementTree or
_Element object, or to raise an exception if it can't be determined.
Should be used in all API functions for consistency.
@@ -48,10 +48,10 @@
elif isinstance(input, _Document):
node = (<_Document>input).getroot()
else:
- raise TypeError, "Invalid input object: %s" % \
+ raise TypeError, u"Invalid input object: %s" % \
python._fqtypename(input)
if node is None:
- raise ValueError, "Input object has no element: %s" % \
+ raise ValueError, u"Input object has no element: %s" % \
python._fqtypename(input)
else:
return node
@@ -87,7 +87,7 @@
cdef _Element _makeElement(tag, xmlDoc* c_doc, _Document doc,
_BaseParser parser, text, tail, attrib, nsmap,
extra_attrs):
- """Create a new element and initialize text content, namespaces and
+ u"""Create a new element and initialize text content, namespaces and
attributes.
This helper function will reuse as much of the existing document as
@@ -144,7 +144,7 @@
cdef _Element _makeSubElement(_Element parent, tag, text, tail,
attrib, nsmap, extra_attrs):
- """Create a new child element and initialize text content, namespaces and
+ u"""Create a new child element and initialize text content, namespaces and
attributes.
"""
cdef xmlNode* c_node
@@ -176,7 +176,7 @@
cdef int _initNodeNamespaces(xmlNode* c_node, _Document doc,
object node_ns_utf, object nsmap) except -1:
- """Lookup current namespace prefixes, then set namespace structure for
+ u"""Lookup current namespace prefixes, then set namespace structure for
node and register new ns-prefix mappings.
This only works for a newly created node!
@@ -225,13 +225,13 @@
return 0
cdef _initNodeAttributes(xmlNode* c_node, _Document doc, attrib, extra):
- """Initialise the attributes of an element node.
+ u"""Initialise the attributes of an element node.
"""
cdef bint is_html
cdef xmlNs* c_ns
# 'extra' is not checked here (expected to be a keyword dict)
- if attrib is not None and not hasattr(attrib, 'items'):
- raise TypeError, "Invalid attribute dictionary: %s" % \
+ if attrib is not None and not hasattr(attrib, u'items'):
+ raise TypeError, u"Invalid attribute dictionary: %s" % \
python._fqtypename(attrib)
if extra is not None and extra:
if attrib is None:
@@ -342,7 +342,7 @@
return 0
cdef object _collectAttributes(xmlNode* c_node, int collecttype):
- """Collect all attributes of a node in a list. Depending on collecttype,
+ u"""Collect all attributes of a node in a list. Depending on collecttype,
it collects either the name (1), the value (2) or the name-value tuples.
"""
cdef Py_ssize_t count
@@ -378,7 +378,7 @@
cdef object __RE_XML_ENCODING
__RE_XML_ENCODING = re.compile(
- r'^(\s*<\?\s*xml[^>]+)\s+encoding\s*=\s*"[^"]*"\s*', re.U)
+ ur'^(\s*<\?\s*xml[^>]+)\s+encoding\s*=\s*"[^"]*"\s*', re.U)
cdef object __REPLACE_XML_ENCODING
__REPLACE_XML_ENCODING = __RE_XML_ENCODING.sub
@@ -388,7 +388,7 @@
cdef object _stripEncodingDeclaration(object xml_string):
# this is a hack to remove the XML encoding declaration from unicode
- return __REPLACE_XML_ENCODING(r'\g<1>', xml_string)
+ return __REPLACE_XML_ENCODING(ur'\g<1>', xml_string)
cdef int _hasEncodingDeclaration(object xml_string):
# check if a (unicode) string has an XML encoding declaration
@@ -413,7 +413,7 @@
return c_node is not NULL and _textNodeOrSkip(c_node.next) is not NULL
cdef _collectText(xmlNode* c_node):
- """Collect all text nodes and return them as a unicode string.
+ u"""Collect all text nodes and return them as a unicode string.
Start collecting at c_node.
@@ -449,7 +449,7 @@
return funicode(result)
cdef void _removeText(xmlNode* c_node):
- """Remove all text nodes.
+ u"""Remove all text nodes.
Start removing at c_node.
"""
@@ -505,13 +505,13 @@
else:
c_ns = element._doc._findOrBuildNodeNs(
element._c_node, _cstr(ns), NULL)
- return '%s:%s' % (c_ns.prefix, tag)
+ return '%s:%s' % (c_ns.prefix, tag) # UTF-8
cdef inline bint _hasChild(xmlNode* c_node):
return c_node is not NULL and _findChildForwards(c_node, 0) is not NULL
cdef inline Py_ssize_t _countElements(xmlNode* c_node):
- "Counts the elements within the following siblings and the node itself."
+ u"Counts the elements within the following siblings and the node itself."
cdef Py_ssize_t count
count = 0
while c_node is not NULL:
@@ -523,7 +523,7 @@
cdef int _findChildSlice(
python.slice sliceobject, xmlNode* c_parent,
xmlNode** c_start_node, Py_ssize_t* c_step, Py_ssize_t* c_length) except -1:
- """Resolve a children slice.
+ u"""Resolve a children slice.
Returns the start node, step size and the slice length in the
pointer arguments.
@@ -547,7 +547,7 @@
return 0
cdef bint _isFullSlice(python.slice sliceobject):
- """Conservative guess if this slice is a full slice as in ``s[:]``.
+ u"""Conservative guess if this slice is a full slice as in ``s[:]``.
"""
cdef Py_ssize_t step
if sliceobject is None:
@@ -581,7 +581,7 @@
return _findChildForwards(c_node, index)
cdef inline xmlNode* _findChildForwards(xmlNode* c_node, Py_ssize_t index):
- """Return child element of c_node with index, or return NULL if not found.
+ u"""Return child element of c_node with index, or return NULL if not found.
"""
cdef xmlNode* c_child
cdef Py_ssize_t c
@@ -596,7 +596,7 @@
return NULL
cdef inline xmlNode* _findChildBackwards(xmlNode* c_node, Py_ssize_t index):
- """Return child element of c_node with index, or return NULL if not found.
+ u"""Return child element of c_node with index, or return NULL if not found.
Search from the end.
"""
cdef xmlNode* c_child
@@ -612,7 +612,7 @@
return NULL
cdef inline xmlNode* _textNodeOrSkip(xmlNode* c_node):
- """Return the node if it's a text node. Skip over ignorable nodes in a
+ u"""Return the node if it's a text node. Skip over ignorable nodes in a
series of text nodes. Return NULL if a non-ignorable node is found.
This is used to skip over XInclude nodes when collecting adjacent text
@@ -631,7 +631,7 @@
return NULL
cdef inline xmlNode* _nextElement(xmlNode* c_node):
- """Given a node, find the next sibling that is an element.
+ u"""Given a node, find the next sibling that is an element.
"""
if c_node is NULL:
return NULL
@@ -643,7 +643,7 @@
return NULL
cdef inline xmlNode* _previousElement(xmlNode* c_node):
- """Given a node, find the next sibling that is an element.
+ u"""Given a node, find the next sibling that is an element.
"""
if c_node is NULL:
return NULL
@@ -655,7 +655,7 @@
return NULL
cdef inline xmlNode* _parentElement(xmlNode* c_node):
- "Given a node, find the parent element."
+ u"Given a node, find the parent element."
if c_node is NULL or not _isElement(c_node):
return NULL
c_node = c_node.parent
@@ -664,7 +664,7 @@
return c_node
cdef inline bint _tagMatches(xmlNode* c_node, char* c_href, char* c_name):
- """Tests if the node matches namespace URI and tag name.
+ u"""Tests if the node matches namespace URI and tag name.
A node matches if it matches both c_href and c_name.
@@ -707,7 +707,7 @@
return 0
cdef int _removeNode(_Document doc, xmlNode* c_node) except -1:
- """Unlink and free a node and subnodes if possible. Otherwise, make sure
+ u"""Unlink and free a node and subnodes if possible. Otherwise, make sure
it's self-contained.
"""
cdef xmlNode* c_next
@@ -747,7 +747,7 @@
cdef int _deleteSlice(_Document doc, xmlNode* c_node,
Py_ssize_t count, Py_ssize_t step) except -1:
- """Delete slice, ``count`` items starting with ``c_node`` with a step
+ u"""Delete slice, ``count`` items starting with ``c_node`` with a step
width of ``step``.
"""
cdef xmlNode* c_next
@@ -774,7 +774,7 @@
cdef int _replaceSlice(_Element parent, xmlNode* c_node,
Py_ssize_t slicelength, Py_ssize_t step,
bint left_to_right, elements) except -1:
- """Replace the slice of ``count`` elements starting at ``c_node`` with
+ u"""Replace the slice of ``count`` elements starting at ``c_node`` with
positive step width ``step`` by the Elements in ``elements``. The
direction is given by the boolean argument ``left_to_right``.
@@ -800,18 +800,18 @@
# *replacing* children stepwise with list => check size!
seqlength = len(elements)
if seqlength != slicelength:
- raise ValueError, "attempt to assign sequence of size %d " \
- "to extended slice of size %d" % (seqlength, slicelength)
+ raise ValueError, u"attempt to assign sequence of size %d " \
+ u"to extended slice of size %d" % (seqlength, slicelength)
if c_node is NULL:
# no children yet => add all elements straight away
if left_to_right:
for element in elements:
- assert element is not None, "Node must not be None"
+ assert element is not None, u"Node must not be None"
_appendChild(parent, element)
else:
for element in elements:
- assert element is not None, "Node must not be None"
+ assert element is not None, u"Node must not be None"
_prependChild(parent, element)
return 0
@@ -846,7 +846,7 @@
# at the end, but reversed stepping
# append one element and go to the next insertion point
for element in elements:
- assert element is not None, "Node must not be None"
+ assert element is not None, u"Node must not be None"
_appendChild(parent, element)
c_node = element._c_node
if slicelength > 0:
@@ -863,7 +863,7 @@
# now insert elements where we removed them
if c_node is not NULL:
for element in elements:
- assert element is not None, "Node must not be None"
+ assert element is not None, u"Node must not be None"
# move element and tail over
c_source_doc = element._c_node.doc
c_next = element._c_node.next
@@ -887,17 +887,17 @@
# append the remaining elements at the respective end
if left_to_right:
for element in elements:
- assert element is not None, "Node must not be None"
+ assert element is not None, u"Node must not be None"
_appendChild(parent, element)
else:
for element in elements:
- assert element is not None, "Node must not be None"
+ assert element is not None, u"Node must not be None"
_prependChild(parent, element)
return 0
cdef int _appendChild(_Element parent, _Element child) except -1:
- """Append a new child to a parent element.
+ u"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
@@ -915,7 +915,7 @@
moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _prependChild(_Element parent, _Element child) except -1:
- """Prepend a new child to a parent element.
+ u"""Prepend a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_child
@@ -938,7 +938,7 @@
moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _appendSibling(_Element element, _Element sibling) except -1:
- """Append a new child to a parent element.
+ u"""Add a new sibling behind an element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
@@ -955,7 +955,7 @@
moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef int _prependSibling(_Element element, _Element sibling) except -1:
- """Append a new child to a parent element.
+ u"""Add a new sibling before an element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
@@ -1021,18 +1021,18 @@
cdef object _utf8(object s):
if python.PyString_Check(s):
assert isutf8py(s) == 0, \
- "All strings must be XML compatible: Unicode or ASCII, no NULL bytes"
+ u"All strings must be XML compatible: Unicode or ASCII, no NULL bytes"
elif python.PyUnicode_Check(s):
# FIXME: we should test these strings, too ...
s = python.PyUnicode_AsUTF8String(s)
assert isutf8py(s) != -1, \
- "All strings must be XML compatible: Unicode or ASCII, no NULL bytes"
+ u"All strings must be XML compatible: Unicode or ASCII, no NULL bytes"
else:
raise TypeError, "Argument must be string or unicode."
return s
cdef object _encodeFilename(object filename):
- """Make sure a filename is 8-bit encoded (or None).
+ u"""Make sure a filename is 8-bit encoded (or None).
"""
if filename is None:
return None
@@ -1042,10 +1042,10 @@
return python.PyUnicode_AsEncodedString(
filename, _C_FILENAME_ENCODING, NULL)
else:
- raise TypeError, "Argument must be string or unicode."
+ raise TypeError, u"Argument must be string or unicode."
cdef object _encodeFilenameUTF8(object filename):
- """Recode filename as UTF-8. Tries ASCII, local filesystem encoding and
+ u"""Recode filename as UTF-8. Tries ASCII, local filesystem encoding and
UTF-8 as source encoding.
"""
cdef char* c_filename
@@ -1071,10 +1071,10 @@
if python.PyUnicode_Check(filename):
return python.PyUnicode_AsUTF8String(filename)
else:
- raise TypeError, "Argument must be string or unicode."
+ raise TypeError, u"Argument must be string or unicode."
cdef _getNsTag(tag):
- """Given a tag, find namespace URI and tag name.
+ u"""Given a tag, find namespace URI and tag name.
Return None for NS uri if no namespace URI available.
"""
cdef char* c_tag
@@ -1090,16 +1090,16 @@
c_tag = c_tag + 1
c_ns_end = cstd.strchr(c_tag, c'}')
if c_ns_end is NULL:
- raise ValueError, "Invalid tag name"
+ raise ValueError, u"Invalid tag name"
nslen = c_ns_end - c_tag
taglen = python.PyString_GET_SIZE(tag) - nslen - 2
if taglen == 0:
- raise ValueError, "Empty tag name"
+ raise ValueError, u"Empty tag name"
if nslen > 0:
ns = python.PyString_FromStringAndSize(c_tag, nslen)
tag = python.PyString_FromStringAndSize(c_ns_end+1, taglen)
elif python.PyString_GET_SIZE(tag) == 0:
- raise ValueError, "Empty tag name"
+ raise ValueError, u"Empty tag name"
return ns, tag
cdef int _pyXmlNameIsValid(name_utf8):
@@ -1151,25 +1151,25 @@
cdef int _tagValidOrRaise(tag_utf) except -1:
if not _pyXmlNameIsValid(tag_utf):
- raise ValueError, "Invalid tag name %r" % \
+ raise ValueError, u"Invalid tag name %r" % \
python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict')
return 0
cdef int _htmlTagValidOrRaise(tag_utf) except -1:
if not _pyHtmlNameIsValid(tag_utf):
- raise ValueError, "Invalid HTML tag name %r" % \
+ raise ValueError, u"Invalid HTML tag name %r" % \
python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict')
return 0
cdef int _attributeValidOrRaise(name_utf) except -1:
if not _pyXmlNameIsValid(name_utf):
- raise ValueError, "Invalid attribute name %r" % \
+ raise ValueError, u"Invalid attribute name %r" % \
python.PyUnicode_FromEncodedObject(name_utf, 'UTF-8', 'strict')
return 0
cdef int _prefixValidOrRaise(tag_utf) except -1:
if not _pyXmlNameIsValid(tag_utf):
- raise ValueError, "Invalid namespace prefix %r" % \
+ raise ValueError, u"Invalid namespace prefix %r" % \
python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict')
return 0
@@ -1187,20 +1187,20 @@
return s
cdef _getFilenameForFile(source):
- """Given a Python File or Gzip object, give filename back.
+ u"""Given a Python File or Gzip object, give filename back.
Returns None if not a file object.
"""
# file instances have a name attribute
- filename = getattr3(source, 'name', None)
+ filename = getattr3(source, u'name', None)
if filename is not None:
return filename
# gzip file instances have a filename attribute
- filename = getattr3(source, 'filename', None)
+ filename = getattr3(source, u'filename', None)
if filename is not None:
return filename
# urllib2 provides a geturl() method
- geturl = getattr3(source, 'geturl', None)
+ geturl = getattr3(source, u'geturl', None)
if geturl is not None:
return geturl()
# can't determine filename
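
Note on the ``_getNsTag`` hunk above: the function splits a tag written as ``{namespace}name`` into its two parts and raises ``ValueError`` with the messages touched by this change. A minimal pure-Python sketch of that behaviour follows; the name ``split_ns_tag`` is hypothetical, and the real function works on C strings rather than Python slicing.

```python
def split_ns_tag(tag):
    """Split '{ns}name' into (ns, name); ns is None if there is no namespace.

    Hypothetical helper that mirrors the checks visible in the diff.
    """
    ns = None
    if tag.startswith("{"):
        end = tag.find("}", 1)
        if end == -1:
            # opening brace without a closing one
            raise ValueError("Invalid tag name")
        local = tag[end + 1:]
        if not local:
            # '{ns}' with nothing after the closing brace
            raise ValueError("Empty tag name")
        if end > 1:
            # only keep a non-empty namespace URI
            ns = tag[1:end]
        tag = local
    elif not tag:
        raise ValueError("Empty tag name")
    return ns, tag
```

An empty namespace such as ``{}name`` is treated like a tag without a namespace, which matches the ``nslen > 0`` check in the hunk.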
Modified: lxml/trunk/src/lxml/classlookup.pxi
==============================================================================
--- lxml/trunk/src/lxml/classlookup.pxi (original)
+++ lxml/trunk/src/lxml/classlookup.pxi Tue May 20 00:00:11 2008
@@ -5,7 +5,7 @@
cdef public class ElementBase(_Element) [ type LxmlElementBaseType,
object LxmlElementBase ]:
- """All custom Element classes must inherit from this one.
+ u"""All custom Element classes must inherit from this one.
Note that you cannot (and must not) instantiate this class or its
subclasses.
@@ -19,7 +19,7 @@
"""
cdef class CommentBase(_Comment):
- """All custom Comment classes must inherit from this one.
+ u"""All custom Comment classes must inherit from this one.
Note that you cannot (and must not) instantiate this class or its
subclasses.
@@ -33,7 +33,7 @@
"""
cdef class PIBase(_ProcessingInstruction):
- """All custom Processing Instruction classes must inherit from this one.
+ u"""All custom Processing Instruction classes must inherit from this one.
Note that you cannot (and must not) instantiate this class or its
subclasses.
@@ -47,7 +47,7 @@
"""
cdef class EntityBase(_Entity):
- """All custom Entity classes must inherit from this one.
+ u"""All custom Entity classes must inherit from this one.
Note that you cannot (and must not) instantiate this class or its
subclasses.
@@ -69,7 +69,7 @@
# class to store element class lookup functions
cdef public class ElementClassLookup [ type LxmlElementClassLookupType,
object LxmlElementClassLookup ]:
- """ElementClassLookup(self)
+ u"""ElementClassLookup(self)
Superclass of Element class lookups.
"""
@@ -80,7 +80,7 @@
cdef public class FallbackElementClassLookup(ElementClassLookup) \
[ type LxmlFallbackElementClassLookupType,
object LxmlFallbackElementClassLookup ]:
- """FallbackElementClassLookup(self, fallback=None)
+ u"""FallbackElementClassLookup(self, fallback=None)
Superclass of Element class lookups with additional fallback.
"""
@@ -93,7 +93,7 @@
self._fallback_function = _lookupDefaultElementClass
cdef void _setFallback(self, ElementClassLookup lookup):
- """Sets the fallback scheme for this lookup method.
+ u"""Sets the fallback scheme for this lookup method.
"""
self.fallback = lookup
self._fallback_function = lookup._lookup_function
@@ -101,7 +101,7 @@
self._fallback_function = _lookupDefaultElementClass
def set_fallback(self, ElementClassLookup lookup not None):
- """set_fallback(self, lookup)
+ u"""set_fallback(self, lookup)
Sets the fallback scheme for this lookup method.
"""
@@ -116,7 +116,7 @@
# default lookup scheme
cdef class ElementDefaultClassLookup(ElementClassLookup):
- """ElementDefaultClassLookup(self, element=None, comment=None, pi=None, entity=None)
+ u"""ElementDefaultClassLookup(self, element=None, comment=None, pi=None, entity=None)
Element class lookup scheme that always returns the default Element
class.
@@ -134,31 +134,31 @@
elif issubclass(element, ElementBase):
self.element_class = element
else:
- raise TypeError, "element class must be subclass of ElementBase"
+ raise TypeError, u"element class must be subclass of ElementBase"
if comment is None:
self.comment_class = _Comment
elif issubclass(comment, CommentBase):
self.comment_class = comment
else:
- raise TypeError, "comment class must be subclass of CommentBase"
+ raise TypeError, u"comment class must be subclass of CommentBase"
if entity is None:
self.entity_class = _Entity
elif issubclass(entity, EntityBase):
self.entity_class = entity
else:
- raise TypeError, "Entity class must be subclass of EntityBase"
+ raise TypeError, u"Entity class must be subclass of EntityBase"
if pi is None:
self.pi_class = None # special case, see below
elif issubclass(pi, PIBase):
self.pi_class = pi
else:
- raise TypeError, "PI class must be subclass of PIBase"
+ raise TypeError, u"PI class must be subclass of PIBase"
cdef object _lookupDefaultElementClass(state, _Document _doc, xmlNode* c_node):
- "Trivial class lookup function that always returns the default class."
+ u"Trivial class lookup function that always returns the default class."
if c_node.type == tree.XML_ELEMENT_NODE:
if state is not None:
return (state).element_class
@@ -188,14 +188,14 @@
else:
return cls
else:
- assert 0, "Unknown node type: %s" % c_node.type
+ assert 0, u"Unknown node type: %s" % c_node.type
################################################################################
# attribute based lookup scheme
cdef class AttributeBasedElementClassLookup(FallbackElementClassLookup):
- """AttributeBasedElementClassLookup(self, attribute_name, class_mapping, fallback=None)
+ u"""AttributeBasedElementClassLookup(self, attribute_name, class_mapping, fallback=None)
Checks an attribute of an Element and looks up the value in a
class dictionary.
@@ -243,7 +243,7 @@
# per-parser lookup scheme
cdef class ParserBasedElementClassLookup(FallbackElementClassLookup):
- """ParserBasedElementClassLookup(self, fallback=None)
+ u"""ParserBasedElementClassLookup(self, fallback=None)
Element class lookup based on the XML parser.
"""
def __init__(self, ElementClassLookup fallback=None):
@@ -261,7 +261,7 @@
# custom class lookup based on node type, namespace, name
cdef class CustomElementClassLookup(FallbackElementClassLookup):
- """CustomElementClassLookup(self, fallback=None)
+ u"""CustomElementClassLookup(self, fallback=None)
Element class lookup based on a subclass method.
You can inherit from this class and override the method::
@@ -281,7 +281,7 @@
self._lookup_function = _custom_class_lookup
def lookup(self, type, doc, namespace, name):
- "lookup(self, type, doc, namespace, name)"
+ u"lookup(self, type, doc, namespace, name)"
return None
cdef object _custom_class_lookup(state, _Document doc, xmlNode* c_node):
@@ -291,15 +291,15 @@
lookup = state
if c_node.type == tree.XML_ELEMENT_NODE:
- element_type = "element"
+ element_type = u"element"
elif c_node.type == tree.XML_COMMENT_NODE:
- element_type = "comment"
+ element_type = u"comment"
elif c_node.type == tree.XML_PI_NODE:
- element_type = "PI"
+ element_type = u"PI"
elif c_node.type == tree.XML_ENTITY_REF_NODE:
- element_type = "entity"
+ element_type = u"entity"
else:
- element_type = "element"
+ element_type = u"element"
if c_node.name is NULL:
name = None
else:
@@ -320,7 +320,7 @@
# read-only tree based class lookup
cdef class PythonElementClassLookup(FallbackElementClassLookup):
- """PythonElementClassLookup(self, fallback=None)
+ u"""PythonElementClassLookup(self, fallback=None)
Element class lookup based on a subclass method.
This class lookup scheme allows access to the entire XML tree in
@@ -367,7 +367,7 @@
self._lookup_function = _python_class_lookup
def lookup(self, doc, element):
- """lookup(self, doc, element)
+ u"""lookup(self, doc, element)
Override this method to implement your own lookup scheme.
"""
@@ -403,7 +403,7 @@
LOOKUP_ELEMENT_CLASS = function
def set_element_class_lookup(ElementClassLookup lookup = None):
- """set_element_class_lookup(lookup = None)
+ u"""set_element_class_lookup(lookup = None)
Set the global default element class lookup method.
"""
Modified: lxml/trunk/src/lxml/docloader.pxi
==============================================================================
--- lxml/trunk/src/lxml/docloader.pxi (original)
+++ lxml/trunk/src/lxml/docloader.pxi Tue May 20 00:00:11 2008
@@ -13,9 +13,9 @@
cdef object _file
cdef class Resolver:
- "This is the base class of all resolvers."
+ u"This is the base class of all resolvers."
def resolve(self, system_url, public_id, context):
- """resolve(self, system_url, public_id, context)
+ u"""resolve(self, system_url, public_id, context)
Override this method to resolve an external source by
``system_url`` and ``public_id``. The third argument is an
@@ -26,7 +26,7 @@
return None
def resolve_empty(self, context):
- """resolve_empty(self, context)
+ u"""resolve_empty(self, context)
Return an empty input document.
@@ -38,7 +38,7 @@
return doc_ref
def resolve_string(self, string, context, *, base_url=None):
- """resolve_string(self, string, context, base_url=None)
+ u"""resolve_string(self, string, context, base_url=None)
Return a parsable string as input document.
@@ -55,7 +55,7 @@
return doc_ref
def resolve_filename(self, filename, context):
- """resolve_filename(self, filename, context)
+ u"""resolve_filename(self, filename, context)
Return the name of a parsable file as input document.
@@ -68,7 +68,7 @@
return doc_ref
def resolve_file(self, f, context, *, base_url=None):
- """resolve_file(self, f, context, base_url=None)
+ u"""resolve_file(self, f, context, base_url=None)
Return an open file-like object as input document.
@@ -78,7 +78,7 @@
try:
f.read
except AttributeError:
- raise TypeError, "Argument is not a file-like object"
+ raise TypeError, u"Argument is not a file-like object"
doc_ref = _InputDocument()
doc_ref._type = PARSER_DATA_FILE
if base_url is not None:
@@ -96,7 +96,7 @@
self._default_resolver = default_resolver
def add(self, Resolver resolver not None):
- """add(self, resolver)
+ u"""add(self, resolver)
Register a resolver.
@@ -109,7 +109,7 @@
self._resolvers.add(resolver)
def remove(self, resolver):
- "remove(self, resolver)"
+ u"remove(self, resolver)"
self._resolvers.discard(resolver)
cdef _ResolverRegistry _copy(self):
@@ -119,11 +119,11 @@
return registry
def copy(self):
- "copy(self)"
+ u"copy(self)"
return self._copy()
def resolve(self, system_url, public_id, context):
- "resolve(self, system_url, public_id, context)"
+ u"resolve(self, system_url, public_id, context)"
for resolver in self._resolvers:
result = resolver.resolve(system_url, public_id, context)
if result is not None:
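
Note on the resolver hunks above: ``Resolver.resolve()`` returns ``None`` by default, and the registry asks each registered resolver in turn until one returns a result. The sketch below illustrates that chain in plain Python. The class names and the final fallback to a default resolver callable are assumptions, since the hunk ends before that part of ``resolve()``.

```python
class Resolver:
    """Base class: subclasses override resolve() and return None to pass."""

    def resolve(self, system_url, public_id, context):
        return None


class ResolverRegistry:
    """Hypothetical stand-in for the registry shown in the diff."""

    def __init__(self, default_resolver=None):
        self._resolvers = set()
        self._default_resolver = default_resolver

    def add(self, resolver):
        self._resolvers.add(resolver)

    def remove(self, resolver):
        self._resolvers.discard(resolver)

    def resolve(self, system_url, public_id, context):
        # the first resolver that returns something other than None wins
        for resolver in self._resolvers:
            result = resolver.resolve(system_url, public_id, context)
            if result is not None:
                return result
        # assumption: fall back to the default resolver, if there is one
        if self._default_resolver is None:
            return None
        return self._default_resolver(system_url, public_id, context)


class PrefixResolver(Resolver):
    """Example resolver that only handles 'app:' URLs."""

    def resolve(self, system_url, public_id, context):
        if system_url.startswith("app:"):
            return "resolved:" + system_url
        return None
```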
Modified: lxml/trunk/src/lxml/dtd.pxi
==============================================================================
--- lxml/trunk/src/lxml/dtd.pxi (original)
+++ lxml/trunk/src/lxml/dtd.pxi Tue May 20 00:00:11 2008
@@ -2,17 +2,17 @@
cimport dtdvalid
class DTDError(LxmlError):
- """Base class for DTD errors.
+ u"""Base class for DTD errors.
"""
pass
class DTDParseError(DTDError):
- """Error while parsing a DTD.
+ u"""Error while parsing a DTD.
"""
pass
class DTDValidateError(DTDError):
- """Error while validating an XML document with a DTD.
+ u"""Error while validating an XML document with a DTD.
"""
pass
@@ -20,7 +20,7 @@
# DTD
cdef class DTD(_Validator):
- """DTD(self, file=None, external_id=None)
+ u"""DTD(self, file=None, external_id=None)
A DTD validator.
Can load from filesystem directly given a filename or file-like object.
@@ -37,27 +37,27 @@
self._error_log.connect()
self._c_dtd = xmlparser.xmlParseDTD(NULL, _cstr(file))
self._error_log.disconnect()
- elif hasattr(file, 'read'):
+ elif hasattr(file, u'read'):
self._c_dtd = _parseDtdFromFilelike(file)
else:
- raise DTDParseError, "file must be a filename or file-like object"
+ raise DTDParseError, u"file must be a filename or file-like object"
elif external_id is not None:
self._error_log.connect()
self._c_dtd = xmlparser.xmlParseDTD(external_id, NULL)
self._error_log.disconnect()
else:
- raise DTDParseError, "either filename or external ID required"
+ raise DTDParseError, u"either filename or external ID required"
if self._c_dtd is NULL:
raise DTDParseError(
- self._error_log._buildExceptionMessage("error parsing DTD"),
+ self._error_log._buildExceptionMessage(u"error parsing DTD"),
self._error_log)
def __dealloc__(self):
tree.xmlFreeDtd(self._c_dtd)
def __call__(self, etree):
- """__call__(self, etree)
+ u"""__call__(self, etree)
Validate doc using the DTD.
@@ -76,7 +76,7 @@
valid_ctxt = dtdvalid.xmlNewValidCtxt()
if valid_ctxt is NULL:
self._error_log.disconnect()
- raise DTDError("Failed to create validation context",
+ raise DTDError(u"Failed to create validation context",
self._error_log)
c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node)
@@ -88,7 +88,7 @@
self._error_log.disconnect()
if ret == -1:
- raise DTDValidateError("Internal error in DTD validation",
+ raise DTDValidateError(u"Internal error in DTD validation",
self._error_log)
if ret == 1:
return True
@@ -111,7 +111,7 @@
exc_context._raise_if_stored()
if c_dtd is NULL:
- raise DTDParseError("error parsing DTD", error_log)
+ raise DTDParseError(u"error parsing DTD", error_log)
return c_dtd
cdef extern from "etree_defs.h":
Modified: lxml/trunk/src/lxml/etree_defs.h
==============================================================================
--- lxml/trunk/src/lxml/etree_defs.h (original)
+++ lxml/trunk/src/lxml/etree_defs.h Tue May 20 00:00:11 2008
@@ -12,6 +12,11 @@
# endif
#endif
+/* Python 3 doesn't have PyFile_*() */
+#if PY_VERSION_HEX >= 0x03000000
+# define PyFile_AsFile(o) (NULL)
+#endif
+
#ifdef WITHOUT_THREADING
# define PyEval_SaveThread() (NULL)
# define PyEval_RestoreThread(state)
Modified: lxml/trunk/src/lxml/extensions.pxi
==============================================================================
--- lxml/trunk/src/lxml/extensions.pxi (original)
+++ lxml/trunk/src/lxml/extensions.pxi Tue May 20 00:00:11 2008
@@ -1,22 +1,22 @@
# support for extension functions in XPath and XSLT
class XPathError(LxmlError):
- """Base class of all XPath errors.
+ u"""Base class of all XPath errors.
"""
pass
class XPathEvalError(XPathError):
- """Error during XPath evaluation.
+ u"""Error during XPath evaluation.
"""
pass
class XPathFunctionError(XPathEvalError):
- """Internal error looking up an XPath extension function.
+ u"""Internal error looking up an XPath extension function.
"""
pass
class XPathResultError(XPathEvalError):
- """Error handling an XPath result.
+ u"""Error handling an XPath result.
"""
pass
@@ -57,7 +57,7 @@
for extension in extensions:
for (ns_uri, name), function in extension.items():
if name is None:
- raise ValueError, "extensions must have non empty names"
+ raise ValueError, u"extensions must have non empty names"
ns_utf = self._to_utf(ns_uri)
name_utf = self._to_utf(name)
python.PyDict_SetItem(
@@ -72,10 +72,10 @@
for prefix, ns_uri in namespaces:
if prefix is None or not prefix:
raise TypeError, \
- "empty namespace prefix is not supported in XPath"
+ u"empty namespace prefix is not supported in XPath"
if ns_uri is None or not ns_uri:
raise TypeError, \
- "setting default namespace is not supported in XPath"
+ u"setting default namespace is not supported in XPath"
prefix_utf = self._to_utf(prefix)
ns_uri_utf = self._to_utf(ns_uri)
python.PyList_Append(ns, (prefix_utf, ns_uri_utf))
@@ -103,7 +103,7 @@
return context
cdef object _to_utf(self, s):
- "Convert to UTF-8 and keep a reference to the encoded string"
+ u"Convert to UTF-8 and keep a reference to the encoded string"
cdef python.PyObject* dict_result
if s is None:
return None
@@ -138,7 +138,7 @@
cdef addNamespace(self, prefix, ns_uri):
if prefix is None:
- raise TypeError, "empty prefix is not supported in XPath"
+ raise TypeError, u"empty prefix is not supported in XPath"
prefix_utf = self._to_utf(prefix)
ns_uri_utf = self._to_utf(ns_uri)
new_item = (prefix_utf, ns_uri_utf)
@@ -160,7 +160,7 @@
cdef registerNamespace(self, prefix, ns_uri):
if prefix is None:
- raise TypeError, "empty prefix is not supported in XPath"
+ raise TypeError, u"empty prefix is not supported in XPath"
prefix_utf = self._to_utf(prefix)
ns_uri_utf = self._to_utf(ns_uri)
python.PyList_Append(self._global_namespaces, prefix_utf)
@@ -212,7 +212,7 @@
d = {}
python.PyDict_SetItem(
self._function_cache, ns_utf, d)
- for name_utf, function in ns_functions.iteritems():
+ for name_utf, function in ns_functions.items():
python.PyDict_SetItem(d, name_utf, function)
reg_func(ctxt, name_utf, ns_utf)
@@ -223,7 +223,7 @@
return # done
last_ns = None
d = None
- for (ns_utf, name_utf), function in self._extensions.iteritems():
+ for (ns_utf, name_utf), function in self._extensions.items():
if ns_utf is not last_ns or d is None:
last_ns = ns_utf
dict_result = python.PyDict_GetItem(
@@ -239,20 +239,20 @@
cdef unregisterAllFunctions(self, void* ctxt,
_register_function unreg_func):
- for ns_utf, functions in self._function_cache.iteritems():
+ for ns_utf, functions in self._function_cache.items():
for name_utf in functions:
unreg_func(ctxt, name_utf, ns_utf)
cdef unregisterGlobalFunctions(self, void* ctxt,
_register_function unreg_func):
- for ns_utf, functions in self._function_cache.iteritems():
+ for ns_utf, functions in self._function_cache.items():
for name_utf in functions:
if self._extensions is None or \
(ns_utf, name_utf) not in self._extensions:
unreg_func(ctxt, name_utf, ns_utf)
cdef _find_cached_function(self, char* c_ns_uri, char* c_name):
- """Lookup an extension function in the cache and return it.
+ u"""Lookup an extension function in the cache and return it.
Parameters: c_ns_uri may be NULL, c_name must not be NULL
"""
@@ -279,15 +279,15 @@
cdef xmlNode* c_node
if self._xpathCtxt is NULL:
raise XPathError, \
- "XPath context is only usable during the evaluation"
+ u"XPath context is only usable during the evaluation"
c_node = self._xpathCtxt.node
if c_node is NULL:
- raise XPathError, "no context node"
+ raise XPathError, u"no context node"
if c_node.doc != self._xpathCtxt.doc:
raise XPathError, \
- "document-external context nodes are not supported"
+ u"document-external context nodes are not supported"
if self._doc is None:
- raise XPathError, "document context is missing"
+ raise XPathError, u"document context is missing"
return _elementFactory(self._doc, c_node)
property eval_context:
@@ -299,11 +299,11 @@
# Python reference keeping during XPath function evaluation
cdef _release_temp_refs(self):
- "Free temporarily referenced objects from this context."
+ u"Free temporarily referenced objects from this context."
self._temp_refs.clear()
cdef _hold(self, obj):
- """A way to temporarily hold references to nodes in the evaluator.
+ u"""A way to temporarily hold references to nodes in the evaluator.
This is needed because otherwise nodes created in XPath extension
functions would be reference counted too soon, during the XPath
@@ -324,7 +324,7 @@
self._temp_refs.add((<_Element>o)._doc)
def Extension(module, function_mapping=None, *, ns=None):
- """Extension(module, function_mapping=None, ns=None)
+ u"""Extension(module, function_mapping=None, ns=None)
Build a dictionary of extension functions from the functions
defined in a module or the methods of an object.
@@ -365,7 +365,7 @@
elif python.PyList_Check(value):
# node set: take recursive text concatenation of first element
if python.PyList_GET_SIZE(value) == 0:
- return ''
+ return u''
firstnode = value[0]
if _isString(firstnode):
return firstnode
@@ -379,9 +379,9 @@
tree.xmlFree(c_text)
return s
else:
- return str(firstnode)
+ return unicode(firstnode)
else:
- return str(value)
+ return unicode(value)
cdef _compile(self, rexp, ignore_case):
cdef python.PyObject* c_result
@@ -397,20 +397,20 @@
python.PyDict_SetItem(self._compile_map, key, rexp_compiled)
return rexp_compiled
- def test(self, ctxt, s, rexp, flags=''):
+ def test(self, ctxt, s, rexp, flags=u''):
flags = self._make_string(flags)
s = self._make_string(s)
- rexpc = self._compile(rexp, 'i' in flags)
+ rexpc = self._compile(rexp, u'i' in flags)
if rexpc.search(s) is None:
return False
else:
return True
- def match(self, ctxt, s, rexp, flags=''):
+ def match(self, ctxt, s, rexp, flags=u''):
flags = self._make_string(flags)
s = self._make_string(s)
- rexpc = self._compile(rexp, 'i' in flags)
- if 'g' in flags:
+ rexpc = self._compile(rexp, u'i' in flags)
+ if u'g' in flags:
results = rexpc.findall(s)
if not results:
return ()
@@ -419,14 +419,14 @@
if not result:
return ()
results = [ result.group() ]
- results.extend( result.groups('') )
+ results.extend( result.groups(u'') )
result_list = []
- root = Element('matches')
- join_groups = ''.join
+ root = Element(u'matches')
+ join_groups = u''.join
for s_match in results:
if python.PyTuple_CheckExact(s_match):
s_match = join_groups(s_match)
- elem = SubElement(root, 'match')
+ elem = SubElement(root, u'match')
elem.text = s_match
python.PyList_Append(result_list, elem)
return result_list
@@ -435,8 +435,8 @@
replacement = self._make_string(replacement)
flags = self._make_string(flags)
s = self._make_string(s)
- rexpc = self._compile(rexp, 'i' in flags)
- if 'g' in flags:
+ rexpc = self._compile(rexp, u'i' in flags)
+ if u'g' in flags:
count = 0
else:
count = 1
@@ -475,16 +475,16 @@
xpath.xmlXPathNodeSetAdd(resultSet, node._c_node)
else:
xpath.xmlXPathFreeNodeSet(resultSet)
- raise XPathResultError, "This is not a node: %r" % element
+ raise XPathResultError, u"This is not a node: %r" % element
else:
- raise XPathResultError, "Unknown return type: %s" % \
+ raise XPathResultError, u"Unknown return type: %s" % \
python._fqtypename(obj)
return xpath.xmlXPathWrapNodeSet(resultSet)
cdef object _unwrapXPathObject(xpath.xmlXPathObject* xpathObj,
_Document doc):
if xpathObj.type == xpath.XPATH_UNDEFINED:
- raise XPathResultError, "Undefined xpath result"
+ raise XPathResultError, u"Undefined xpath result"
elif xpathObj.type == xpath.XPATH_NODESET:
return _createNodeSetResult(xpathObj, doc)
elif xpathObj.type == xpath.XPATH_BOOLEAN:
@@ -505,7 +505,7 @@
elif xpathObj.type == xpath.XPATH_XSLT_TREE:
raise NotImplementedError
else:
- raise XPathResultError, "Unknown xpath result %s" % str(xpathObj.type)
+ raise XPathResultError, u"Unknown xpath result %s" % unicode(xpathObj.type)
cdef object _createNodeSetResult(xpath.xmlXPathObject* xpathObj, _Document doc):
cdef xmlNode* c_node
@@ -546,12 +546,12 @@
continue
else:
raise NotImplementedError, \
- "Not yet implemented result node type: %d" % c_node.type
+ u"Not yet implemented result node type: %d" % c_node.type
python.PyList_Append(result, value)
return result
cdef void _freeXPathObject(xpath.xmlXPathObject* xpathObj):
- """Free the XPath object, but *never* free the *content* of node sets.
+ u"""Free the XPath object, but *never* free the *content* of node sets.
Python dealloc will do that for us.
"""
if xpathObj.nodesetval is not NULL:
@@ -678,9 +678,9 @@
_extension_function_call(context, function, ctxt, nargs)
else:
if rctxt.functionURI is not NULL:
- fref = "{%s}%s" % (rctxt.functionURI, rctxt.function)
+ fref = u"{%s}%s" % (rctxt.functionURI, rctxt.function)
else:
fref = rctxt.function
xpath.xmlXPathErr(ctxt, xpath.XPATH_UNKNOWN_FUNC_ERROR)
- exception = XPathFunctionError("XPath function '%s' not found" % fref)
+ exception = XPathFunctionError(u"XPath function '%s' not found" % fref)
context._exc._store_exception(exception)
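
Note on the regular expression hunks above: ``match()`` reads an ``i`` flag as "ignore case" and a ``g`` flag as "return all matches". Without ``g`` it returns the first match followed by its groups, with unmatched groups replaced by empty strings. The following sketch reproduces that flag handling with the standard ``re`` module. The function name ``regexp_match`` is hypothetical, and it returns plain strings where the original wraps each result in a ``match`` element.

```python
import re


def regexp_match(s, pattern, flags=""):
    """Return the matched strings, following the flag logic in the diff."""
    rexpc = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    if "g" in flags:
        # global: every non-overlapping match
        results = rexpc.findall(s)
        if not results:
            return []
    else:
        # non-global: first match, then its groups ('' for unmatched ones)
        result = rexpc.search(s)
        if result is None:
            return []
        results = [result.group()]
        results.extend(result.groups(""))
    # findall() yields tuples when the pattern has several groups; join them
    return ["".join(m) if isinstance(m, tuple) else m for m in results]
```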
Modified: lxml/trunk/src/lxml/iterparse.pxi
==============================================================================
--- lxml/trunk/src/lxml/iterparse.pxi (original)
+++ lxml/trunk/src/lxml/iterparse.pxi Tue May 20 00:00:11 2008
@@ -14,20 +14,20 @@
cdef int event_filter
event_filter = 0
for event in events:
- if event == 'start':
+ if event == u'start':
event_filter |= ITERPARSE_FILTER_START
- elif event == 'end':
+ elif event == u'end':
event_filter |= ITERPARSE_FILTER_END
- elif event == 'start-ns':
+ elif event == u'start-ns':
event_filter |= ITERPARSE_FILTER_START_NS
- elif event == 'end-ns':
+ elif event == u'end-ns':
event_filter |= ITERPARSE_FILTER_END_NS
- elif event == 'comment':
+ elif event == u'comment':
event_filter |= ITERPARSE_FILTER_COMMENT
- elif event == 'pi':
+ elif event == u'pi':
event_filter |= ITERPARSE_FILTER_PI
else:
- raise ValueError, "invalid event name '%s'" % event
+ raise ValueError, u"invalid event name '%s'" % event
return event_filter
cdef int _countNsDefs(xmlNode* c_node):
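
Note on the ``_buildIterparseEventFilter`` hunk above: each event name sets one bit in an integer filter, and an unknown name raises ``ValueError``. A small Python sketch of the same idea follows; the flag values and the name ``build_event_filter`` are assumptions, since the real ``ITERPARSE_FILTER_*`` constants are defined outside this diff.

```python
# hypothetical bit values; the real constants are not shown in the diff
EVENT_FLAGS = {
    "start": 1,
    "end": 2,
    "start-ns": 4,
    "end-ns": 8,
    "comment": 16,
    "pi": 32,
}


def build_event_filter(events):
    """Combine the requested event names into a single bitmask."""
    event_filter = 0
    for event in events:
        try:
            event_filter |= EVENT_FLAGS[event]
        except KeyError:
            raise ValueError("invalid event name '%s'" % event) from None
    return event_filter
```

A caller can then test one event with a bitwise AND, for example ``build_event_filter(("start", "end")) & EVENT_FLAGS["end"]``, in the same way the parser context checks ``self._event_filter``.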
@@ -51,7 +51,7 @@
else:
prefix = funicode(c_ns.prefix)
ns_tuple = (prefix, funicode(c_ns.href))
- python.PyList_Append(event_list, ("start-ns", ns_tuple))
+ python.PyList_Append(event_list, (u"start-ns", ns_tuple))
count = count + 1
c_ns = c_ns.next
return count
@@ -85,7 +85,7 @@
self._event_index = 0
cdef void _initParserContext(self, xmlparser.xmlParserCtxt* c_ctxt):
- "wrap original SAX2 callbacks"
+ u"wrap original SAX2 callbacks"
cdef xmlparser.xmlSAXHandler* sax
_ParserContext._initParserContext(self, c_ctxt)
sax = c_ctxt.sax
@@ -118,17 +118,17 @@
cdef _setEventFilter(self, events, tag):
self._event_filter = _buildIterparseEventFilter(events)
- if tag is None or tag == '*':
+ if tag is None or tag == u'*':
self._tag_href = NULL
self._tag_name = NULL
else:
self._tag_tuple = _getNsTag(tag)
href, name = self._tag_tuple
- if href is None or href == '*':
+ if href is None or href == u'*':
self._tag_href = NULL
else:
self._tag_href = _cstr(href)
- if name is None or name == '*':
+ if name is None or name == u'*':
self._tag_name = NULL
else:
self._tag_name = _cstr(name)
@@ -154,7 +154,7 @@
if self._event_filter & ITERPARSE_FILTER_END:
python.PyList_Append(self._node_stack, node)
if self._event_filter & ITERPARSE_FILTER_START:
- python.PyList_Append(self._events, ("start", node))
+ python.PyList_Append(self._events, (u"start", node))
return 0
cdef int endNode(self, xmlNode* c_node) except -1:
@@ -173,12 +173,12 @@
self._doc = _documentFactory(c_node.doc, None)
self._root = self._doc.getroot()
node = _elementFactory(self._doc, c_node)
- python.PyList_Append(self._events, ("end", node))
+ python.PyList_Append(self._events, (u"end", node))
if self._event_filter & ITERPARSE_FILTER_END_NS:
ns_count = self._pop_ns()
if ns_count > 0:
- event = ("end-ns", None)
+ event = (u"end-ns", None)
for i from 0 <= i < ns_count:
python.PyList_Append(self._events, event)
return 0
@@ -279,7 +279,7 @@
context._origSaxComment(ctxt, text)
c_node = _iterparseFindLastNode(c_ctxt)
if c_node is not NULL:
- _pushSaxEvent(context, "comment", c_node)
+ _pushSaxEvent(context, u"comment", c_node)
cdef void _iterparseSaxPI(void* ctxt, char* target, char* data):
cdef xmlNode* c_node
@@ -290,7 +290,7 @@
context._origSaxPI(ctxt, target, data)
c_node = _iterparseFindLastNode(c_ctxt)
if c_node is not NULL:
- _pushSaxEvent(context, "pi", c_node)
+ _pushSaxEvent(context, u"pi", c_node)
cdef inline xmlNode* _iterparseFindLastNode(xmlparser.xmlParserCtxt* c_ctxt):
# this mimics what libxml2 creates for comments/PIs
@@ -306,7 +306,7 @@
return c_ctxt.node.next
cdef class iterparse(_BaseParser):
- """iterparse(self, source, events=("end",), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, schema=None)
+ u"""iterparse(self, source, events=("end",), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, schema=None)
Incremental parser.
Parses XML into a tree and generates tuples (event, element) in a
@@ -354,7 +354,7 @@
cdef object _buffer
cdef int (*_parse_chunk)(xmlparser.xmlParserCtxt* ctxt,
char* chunk, int size, int terminate)
- def __init__(self, source, events=("end",), *, tag=None,
+ def __init__(self, source, events=(u"end",), *, tag=None,
attribute_defaults=False, dtd_validation=False,
load_dtd=False, no_network=True, remove_blank_text=False,
compact=True, resolve_entities=True, remove_comments=False,
@@ -363,9 +363,9 @@
cdef _IterparseContext context
cdef char* c_encoding
cdef int parse_options
- if not hasattr(source, 'read'):
+ if not hasattr(source, u'read'):
filename = _encodeFilename(source)
- source = open(filename, 'rb')
+ source = open(filename, u'rb')
else:
filename = _encodeFilename(_getFilenameForFile(source))
@@ -373,7 +373,7 @@
if html:
# make sure we're not looking for namespaces
events = tuple([ event for event in events
- if event != 'start-ns' and event != 'end-ns' ])
+ if event != u'start-ns' and event != u'end-ns' ])
self._events = events
self._tag = tag
@@ -413,7 +413,7 @@
# parser will not be unlocked - no other methods supported
property error_log:
- """The error log of the last (or current) parser run.
+ u"""The error log of the last (or current) parser run.
"""
def __get__(self):
cdef _ParserContext context
@@ -427,7 +427,7 @@
return context
def copy(self):
- raise TypeError, "iterparse parsers cannot be copied"
+ raise TypeError, u"iterparse parsers cannot be copied"
def __iter__(self):
return self
@@ -458,7 +458,7 @@
data = self._source.read(__ITERPARSE_CHUNK_SIZE)
if not python.PyString_Check(data):
self._source = None
- raise TypeError, "reading file objects must return plain strings"
+ raise TypeError, u"reading file objects must return plain strings"
c_data_len = python.PyString_GET_SIZE(data)
c_data = _cstr(data)
done = (c_data_len == 0)
@@ -502,7 +502,7 @@
cdef class iterwalk:
- """iterwalk(self, element_or_tree, events=("end",), tag=None)
+ u"""iterwalk(self, element_or_tree, events=("end",), tag=None)
A tree walker that generates events from an existing tree as if it
was parsing XML data with ``iterparse()``.
@@ -517,7 +517,7 @@
cdef char* _tag_href
cdef char* _tag_name
- def __init__(self, element_or_tree, events=("end",), tag=None):
+ def __init__(self, element_or_tree, events=(u"end",), tag=None):
cdef _Element root
cdef int ns_count
root = _rootNodeOrRaise(element_or_tree)
@@ -536,17 +536,17 @@
self._index = -1
cdef void _setTagFilter(self, tag):
- if tag is None or tag == '*':
+ if tag is None or tag == u'*':
self._tag_href = NULL
self._tag_name = NULL
else:
self._tag_tuple = _getNsTag(tag)
href, name = self._tag_tuple
- if href is None or href == '*':
+ if href is None or href == u'*':
self._tag_href = NULL
else:
self._tag_href = _cstr(href)
- if name is None or name == '*':
+ if name is None or name == u'*':
self._tag_name = NULL
else:
self._tag_name = _cstr(name)
@@ -605,7 +605,7 @@
if self._event_filter & ITERPARSE_FILTER_START:
if self._tag_tuple is None or \
_tagMatches(node._c_node, self._tag_href, self._tag_name):
- python.PyList_Append(self._events, ("start", node))
+ python.PyList_Append(self._events, (u"start", node))
return ns_count
cdef _Element _end_node(self):
@@ -615,9 +615,9 @@
if self._event_filter & ITERPARSE_FILTER_END:
if self._tag_tuple is None or \
_tagMatches(node._c_node, self._tag_href, self._tag_name):
- python.PyList_Append(self._events, ("end", node))
+ python.PyList_Append(self._events, (u"end", node))
if self._event_filter & ITERPARSE_FILTER_END_NS:
- event = ("end-ns", None)
+ event = (u"end-ns", None)
for i from 0 <= i < ns_count:
python.PyList_Append(self._events, event)
return node
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Tue May 20 00:00:11 2008
@@ -1,8 +1,8 @@
-"""The ``lxml.etree`` module implements the extended ElementTree API
+u"""The ``lxml.etree`` module implements the extended ElementTree API
for XML.
"""
-__docformat__ = "restructuredtext en"
+__docformat__ = u"restructuredtext en"
cimport tree, python, config
from tree cimport xmlDoc, xmlNode, xmlAttr, xmlNs, _isElement, _getNs
@@ -11,7 +11,11 @@
cimport c14n
cimport cstd
-import __builtin__
+try:
+ import __builtin__
+except ImportError:
+ # Python 3
+ import builtins as __builtin__
cdef object set
try:
@@ -20,7 +24,11 @@
from sets import Set as set
cdef object _unicode
-_unicode = __builtin__.unicode
+try:
+ _unicode = __builtin__.unicode
+except AttributeError:
+ # Python 3
+ _unicode = __builtin__.str
del __builtin__
@@ -99,7 +107,7 @@
# module level superclass for all exceptions
class LxmlError(Error):
- """Main exception base class for lxml. All other exceptions inherit from
+ u"""Main exception base class for lxml. All other exceptions inherit from
this one.
"""
def __init__(self, message, error_log=None):
@@ -122,30 +130,30 @@
# superclass for all syntax errors
class LxmlSyntaxError(LxmlError, SyntaxError):
- """Base class for all syntax errors.
+ u"""Base class for all syntax errors.
"""
pass
class C14NError(LxmlError):
- """Error during C14N serialisation.
+ u"""Error during C14N serialisation.
"""
pass
# version information
cdef __unpackDottedVersion(version):
version_list = []
- l = (version.replace('-', '.').split('.') + [0]*4)[:4]
+ l = (version.decode(u"ASCII").replace(u'-', u'.').split(u'.') + [0]*4)[:4]
for item in l:
try:
item = int(item)
except ValueError:
- if item.startswith('dev'):
+ if item.startswith(u'dev'):
count = item[3:]
item = -300
- elif item.startswith('alpha'):
+ elif item.startswith(u'alpha'):
count = item[5:]
item = -200
- elif item.startswith('beta'):
+ elif item.startswith(u'beta'):
count = item[4:]
item = -100
else:
@@ -164,9 +172,10 @@
cdef int _LIBXML_VERSION_INT
try:
- _LIBXML_VERSION_INT = int(re.match('[0-9]+', tree.xmlParserVersion).group(0))
+ _LIBXML_VERSION_INT = int(
+ re.match(u'[0-9]+', (tree.xmlParserVersion).decode(u"ASCII")).group(0))
except Exception:
- print "Unknown libxml2 version:", tree.xmlParserVersion
+ print u"Unknown libxml2 version: %s" % (tree.xmlParserVersion).decode(u"ASCII")
_LIBXML_VERSION_INT = 0
LIBXML_VERSION = __unpackIntVersion(_LIBXML_VERSION_INT)
@@ -217,7 +226,7 @@
cdef class QName:
- """QName(text_or_uri, tag=None)
+ u"""QName(text_or_uri, tag=None)
QName wrapper.
@@ -233,7 +242,7 @@
def __init__(self, text_or_uri, tag=None):
if tag is not None:
_tagValidOrRaise(_utf8(tag))
- text_or_uri = "{%s}%s" % (text_or_uri, tag)
+ text_or_uri = u"{%s}%s" % (text_or_uri, tag)
else:
if not _isString(text_or_uri):
text_or_uri = str(text_or_uri)
@@ -259,7 +268,7 @@
cdef public class _Document [ type LxmlDocumentType, object LxmlDocument ]:
- """Internal base class to reference a libxml document.
+ u"""Internal base class to reference a libxml document.
When instances of this class are garbage collected, the libxml
document is cleaned up.
@@ -344,7 +353,7 @@
cdef xmlNs* _findOrBuildNodeNs(self, xmlNode* c_node,
char* c_href, char* c_prefix) except NULL:
- """Get or create namespace structure for a node. Reuses the prefix if
+ u"""Get or create namespace structure for a node. Reuses the prefix if
possible.
"""
cdef xmlNs* c_ns
@@ -352,7 +361,7 @@
cdef python.PyObject* dict_result
if c_node.type != tree.XML_ELEMENT_NODE:
assert c_node.type == tree.XML_ELEMENT_NODE, \
- "invalid node type %d, expected %d" % (
+ u"invalid node type %d, expected %d" % (
c_node.type, tree.XML_ELEMENT_NODE)
# look for existing ns
c_ns = tree.xmlSearchNsByHref(self._c_doc, c_node, c_href)
@@ -379,7 +388,7 @@
return c_ns
cdef int _setNodeNs(self, xmlNode* c_node, char* href) except -1:
- "Lookup namespace structure and set it for the node."
+ u"Lookup namespace structure and set it for the node."
cdef xmlNs* c_ns
c_ns = self._findOrBuildNodeNs(c_node, href, NULL)
tree.xmlSetNs(c_node, c_ns)
@@ -409,47 +418,47 @@
cdef class DocInfo:
- "Document information provided by parser and DTD."
+ u"Document information provided by parser and DTD."
cdef _Document _doc
def __init__(self, tree):
- "Create a DocInfo object for an ElementTree object or root Element."
+ u"Create a DocInfo object for an ElementTree object or root Element."
self._doc = _documentOrRaise(tree)
root_name, public_id, system_url = self._doc.getdoctype()
if not root_name and (public_id or system_url):
- raise ValueError, "Could not find root node"
+ raise ValueError, u"Could not find root node"
property root_name:
- "Returns the name of the root node as defined by the DOCTYPE."
+ u"Returns the name of the root node as defined by the DOCTYPE."
def __get__(self):
root_name, public_id, system_url = self._doc.getdoctype()
return root_name
property public_id:
- "Returns the public ID of the DOCTYPE."
+ u"Returns the public ID of the DOCTYPE."
def __get__(self):
root_name, public_id, system_url = self._doc.getdoctype()
return public_id
property system_url:
- "Returns the system ID of the DOCTYPE."
+ u"Returns the system ID of the DOCTYPE."
def __get__(self):
root_name, public_id, system_url = self._doc.getdoctype()
return system_url
property xml_version:
- "Returns the XML version as declared by the document."
+ u"Returns the XML version as declared by the document."
def __get__(self):
xml_version, encoding = self._doc.getxmlinfo()
return xml_version
property encoding:
- "Returns the encoding name as declared by the document."
+ u"Returns the encoding name as declared by the document."
def __get__(self):
xml_version, encoding = self._doc.getxmlinfo()
return encoding
property URL:
- "The source URL of the document (or None if unknown)."
+ u"The source URL of the document (or None if unknown)."
def __get__(self):
if self._doc._c_doc.URL is NULL:
return None
@@ -466,35 +475,35 @@
tree.xmlFree(c_oldurl)
property doctype:
- "Returns a DOCTYPE declaration string for the document."
+ u"Returns a DOCTYPE declaration string for the document."
def __get__(self):
root_name, public_id, system_url = self._doc.getdoctype()
if public_id:
if system_url:
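- return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (
+ return u'<!DOCTYPE %s PUBLIC "%s" "%s">' % (
root_name, public_id, system_url)
else:
- return '<!DOCTYPE %s PUBLIC "%s">' % (
+ return u'<!DOCTYPE %s PUBLIC "%s">' % (
root_name, public_id)
elif system_url:
- return '<!DOCTYPE %s SYSTEM "%s">' % (
+ return u'<!DOCTYPE %s SYSTEM "%s">' % (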
root_name, system_url)
else:
- return ""
+ return u""
property internalDTD:
- "Returns a DTD validator based on the internal subset of the document."
+ u"Returns a DTD validator based on the internal subset of the document."
def __get__(self):
return _dtdFactory(self._doc._c_doc.intSubset)
property externalDTD:
- "Returns a DTD validator based on the external subset of the document."
+ u"Returns a DTD validator based on the external subset of the document."
def __get__(self):
return _dtdFactory(self._doc._c_doc.extSubset)
cdef public class _Element [ type LxmlElementType, object LxmlElement ]:
- """Element class.
+ u"""Element class.
References a document object and a libxml node.
@@ -508,7 +517,7 @@
cdef object _attrib
def _init(self):
- """_init(self)
+ u"""_init(self)
Called after object initialisation. Custom subclasses may override
this if they recursively call _init() in the superclasses.
@@ -525,7 +534,7 @@
# MANIPULATORS
def __setitem__(self, x, value):
- """__setitem__(self, x, value)
+ u"""__setitem__(self, x, value)
Replaces the given subelement index or slice.
"""
@@ -536,7 +545,7 @@
cdef bint left_to_right
cdef Py_ssize_t slicelength, step
if value is None:
- raise ValueError, "cannot assign None"
+ raise ValueError, u"cannot assign None"
if python.PySlice_Check(x):
# slice assignment
_findChildSlice(x, self._c_node, &c_node, &step, &slicelength)
@@ -552,7 +561,7 @@
element = value
c_node = _findChild(self._c_node, x)
if c_node is NULL:
- raise IndexError, "list index out of range"
+ raise IndexError, u"list index out of range"
c_source_doc = element._c_node.doc
c_next = element._c_node.next
_removeText(c_node.next)
@@ -563,7 +572,7 @@
moveNodeToDocument(self._doc, c_node.doc, c_node)
def __delitem__(self, x):
- """__delitem__(self, x)
+ u"""__delitem__(self, x)
Deletes the given subelement or a slice.
"""
@@ -588,12 +597,12 @@
# item deletion
c_node = _findChild(self._c_node, x)
if c_node is NULL:
- raise IndexError, "index out of range: %d" % x
+ raise IndexError, u"index out of range: %d" % x
_removeText(c_node.next)
_removeNode(self._doc, c_node)
def __deepcopy__(self, memo):
- "__deepcopy__(self, memo)"
+ u"__deepcopy__(self, memo)"
return self.__copy__()
def __copy__(self):
@@ -615,21 +624,21 @@
return _elementFactory(new_doc, c_node)
def set(self, key, value):
- """set(self, key, value)
+ u"""set(self, key, value)
Sets an element attribute.
"""
_setAttributeValue(self, key, value)
def append(self, _Element element not None):
- """append(self, element)
+ u"""append(self, element)
Adds a subelement to the end of this element.
"""
_appendChild(self, element)
def addnext(self, _Element element):
- """addnext(self, element)
+ u"""addnext(self, element)
Adds the element as a following sibling directly after this
element.
@@ -641,12 +650,12 @@
if self._c_node.parent != NULL and not _isElement(self._c_node.parent):
if element._c_node.type != tree.XML_PI_NODE:
if element._c_node.type != tree.XML_COMMENT_NODE:
- raise TypeError, "Only processing instructions and comments can be siblings of the root element"
+ raise TypeError, u"Only processing instructions and comments can be siblings of the root element"
element.tail = None
_appendSibling(self, element)
def addprevious(self, _Element element):
- """addprevious(self, element)
+ u"""addprevious(self, element)
Adds the element as a preceding sibling directly before this
element.
@@ -658,12 +667,12 @@
if self._c_node.parent != NULL and not _isElement(self._c_node.parent):
if element._c_node.type != tree.XML_PI_NODE:
if element._c_node.type != tree.XML_COMMENT_NODE:
- raise TypeError, "Only processing instructions and comments can be siblings of the root element"
+ raise TypeError, u"Only processing instructions and comments can be siblings of the root element"
element.tail = None
_prependSibling(self, element)
def extend(self, elements):
- """extend(self, elements)
+ u"""extend(self, elements)
Extends the current children by the elements in the iterable.
"""
@@ -671,7 +680,7 @@
_appendChild(self, element)
def clear(self):
- """clear(self)
+ u"""clear(self)
Resets an element. This function removes all subelements, clears
all attributes and sets the text and tail properties to None.
@@ -701,7 +710,7 @@
c_node = c_node_next
def insert(self, index, _Element element not None):
- """insert(self, index, element)
+ u"""insert(self, index, element)
Inserts a subelement at the given position in this element
"""
@@ -719,7 +728,7 @@
moveNodeToDocument(self._doc, c_source_doc, element._c_node)
def remove(self, _Element element not None):
- """remove(self, element)
+ u"""remove(self, element)
Removes a matching subelement. Unlike the find methods, this
method compares elements based on identity, not on tag value
@@ -729,7 +738,7 @@
cdef xmlNode* c_next
c_node = element._c_node
if c_node.parent is not self._c_node:
- raise ValueError, "Element is not a child of this node."
+ raise ValueError, u"Element is not a child of this node."
c_next = element._c_node.next
tree.xmlUnlinkNode(c_node)
_moveTail(c_next, c_node)
@@ -738,7 +747,7 @@
def replace(self, _Element old_element not None,
_Element new_element not None):
- """replace(self, old_element, new_element)
+ u"""replace(self, old_element, new_element)
Replaces a subelement with the element passed as second argument.
"""
@@ -749,7 +758,7 @@
cdef xmlDoc* c_source_doc
c_old_node = old_element._c_node
if c_old_node.parent is not self._c_node:
- raise ValueError, "Element is not a child of this node."
+ raise ValueError, u"Element is not a child of this node."
c_old_next = c_old_node.next
c_new_node = new_element._c_node
c_new_next = c_new_node.next
@@ -763,7 +772,7 @@
# PROPERTIES
property tag:
- """Element tag
+ u"""Element tag
"""
def __get__(self):
if self._tag is not None:
@@ -787,7 +796,7 @@
self._doc._setNodeNs(self._c_node, _cstr(ns))
property attrib:
- """Element attribute dictionary. Where possible, use get(), set(),
+ u"""Element attribute dictionary. Where possible, use get(), set(),
keys(), values() and items() to access element attributes.
"""
def __get__(self):
@@ -796,7 +805,7 @@
return self._attrib
property text:
- """Text before the first subelement. This is either a string or
+ u"""Text before the first subelement. This is either a string or
the value None, if there was no text.
"""
def __get__(self):
@@ -813,7 +822,7 @@
# _setNodeText(self._c_node, None)
property tail:
- """Text after this element's end tag, but before the next sibling
+ u"""Text after this element's end tag, but before the next sibling
element's start tag. This is either a string or the value None, if
there was no text.
"""
@@ -829,7 +838,7 @@
# not in ElementTree, read-only
property prefix:
- """Namespace prefix or None.
+ u"""Namespace prefix or None.
"""
def __get__(self):
if self._c_node.ns is not NULL:
@@ -839,7 +848,7 @@
# not in ElementTree, read-only
property sourceline:
- """Original line number as found by the parser or None if unknown.
+ u"""Original line number as found by the parser or None if unknown.
"""
def __get__(self):
cdef long line
@@ -857,7 +866,7 @@
# not in ElementTree, read-only
property nsmap:
- """Namespace prefix->URI mapping known in the context of this Element.
+ u"""Namespace prefix->URI mapping known in the context of this Element.
"""
def __get__(self):
cdef xmlNode* c_node
@@ -880,7 +889,7 @@
# not in ElementTree, read-only
property base:
- """The base URI of the Element (xml:base or HTML base URL).
+ u"""The base URI of the Element (xml:base or HTML base URL).
None if the base URI is unknown.
Note that the value depends on the URL of the document that
@@ -912,11 +921,11 @@
# ACCESSORS
def __repr__(self):
- "__repr__(self)"
- return "<Element %s at %x>" % (self.tag, id(self))
-
+ u"__repr__(self)"
+ return u"<Element %s at %x>" % (self.tag, id(self))
+
def __getitem__(self, x):
- """Returns the subelement at the given position or the requested
+ u"""Returns the subelement at the given position or the requested
slice.
"""
cdef xmlNode* c_node
@@ -948,29 +957,29 @@
# indexing
c_node = _findChild(self._c_node, x)
if c_node is NULL:
- raise IndexError, "list index out of range"
+ raise IndexError, u"list index out of range"
return _elementFactory(self._doc, c_node)
def __len__(self):
- """__len__(self)
+ u"""__len__(self)
Returns the number of subelements.
"""
return _countElements(self._c_node.children)
def __nonzero__(self):
- "__nonzero__(self)"
+ u"__nonzero__(self)"
import warnings
warnings.warn(
- "The behavior of this method will change in future versions. "
- "Use specific 'len(elem)' or 'elem is not None' test instead.",
+ u"The behavior of this method will change in future versions. "
+ u"Use specific 'len(elem)' or 'elem is not None' test instead.",
FutureWarning
)
# emulate old behaviour
return _hasChild(self._c_node)
def __contains__(self, element):
- "__contains__(self, element)"
+ u"__contains__(self, element)"
cdef xmlNode* c_node
if not isinstance(element, _Element):
return 0
@@ -978,15 +987,15 @@
return c_node is not NULL and c_node.parent is self._c_node
def __iter__(self):
- "__iter__(self)"
+ u"__iter__(self)"
return ElementChildIterator(self)
def __reversed__(self):
- "__reversed__(self)"
+ u"__reversed__(self)"
return ElementChildIterator(self, reversed=True)
def index(self, _Element child not None, start=None, stop=None):
- """index(self, child, start=None, stop=None)
+ u"""index(self, child, start=None, stop=None)
Find the position of the child within the parent.
@@ -998,7 +1007,7 @@
cdef xmlNode* c_start_node
c_child = child._c_node
if c_child.parent is not self._c_node:
- raise ValueError, "Element is not a child of this node."
+ raise ValueError, u"Element is not a child of this node."
# handle the unbounded search straight away (normal case)
if stop is None and (start is None or start == 0):
@@ -1021,7 +1030,7 @@
c_stop = stop
if c_stop == 0 or \
c_start >= c_stop and (c_stop > 0 or c_start < 0):
- raise ValueError, "list.index(x): x not in slice"
+ raise ValueError, u"list.index(x): x not in slice"
# for negative slice indices, check slice before searching index
if c_start < 0 or c_stop < 0:
@@ -1039,9 +1048,9 @@
if c_start_node == c_child:
# found! before slice end?
if c_stop < 0 and l <= -c_stop:
- raise ValueError, "list.index(x): x not in slice"
+ raise ValueError, u"list.index(x): x not in slice"
elif c_start < 0:
- raise ValueError, "list.index(x): x not in slice"
+ raise ValueError, u"list.index(x): x not in slice"
# now determine the index backwards from child
c_child = c_child.prev
@@ -1066,19 +1075,19 @@
else:
return k
if c_start != 0 or c_stop != 0:
- raise ValueError, "list.index(x): x not in slice"
+ raise ValueError, u"list.index(x): x not in slice"
else:
- raise ValueError, "list.index(x): x not in list"
+ raise ValueError, u"list.index(x): x not in list"
def get(self, key, default=None):
- """get(self, key, default=None)
+ u"""get(self, key, default=None)
Gets an element attribute.
"""
return _getAttributeValue(self, key, default)
def keys(self):
- """keys(self)
+ u"""keys(self)
Gets a list of attribute names. The names are returned in an
arbitrary order (just like for an ordinary Python dictionary).
@@ -1086,7 +1095,7 @@
return _collectAttributes(self._c_node, 1)
def values(self):
- """values(self)
+ u"""values(self)
Gets element attribute values as a sequence of strings. The
attributes are returned in an arbitrary order.
@@ -1094,7 +1103,7 @@
return _collectAttributes(self._c_node, 2)
def items(self):
- """items(self)
+ u"""items(self)
Gets element attributes, as a sequence. The attributes are returned in
an arbitrary order.
@@ -1102,7 +1111,7 @@
return _collectAttributes(self._c_node, 3)
def getchildren(self):
- """getchildren(self)
+ u"""getchildren(self)
Returns all direct children. The elements are returned in document
order.
@@ -1114,7 +1123,7 @@
return _collectChildren(self)
def getparent(self):
- """getparent(self)
+ u"""getparent(self)
Returns the parent of this element or None for the root element.
"""
@@ -1125,7 +1134,7 @@
return _elementFactory(self._doc, c_node)
def getnext(self):
- """getnext(self)
+ u"""getnext(self)
Returns the following sibling of this element or None.
"""
@@ -1136,7 +1145,7 @@
return _elementFactory(self._doc, c_node)
def getprevious(self):
- """getprevious(self)
+ u"""getprevious(self)
Returns the preceding sibling of this element or None.
"""
@@ -1147,7 +1156,7 @@
return _elementFactory(self._doc, c_node)
def itersiblings(self, tag=None, *, preceding=False):
- """itersiblings(self, tag=None, preceding=False)
+ u"""itersiblings(self, tag=None, preceding=False)
Iterate over the following or preceding siblings of this element.
@@ -1159,7 +1168,7 @@
return SiblingsIterator(self, tag, preceding=preceding)
def iterancestors(self, tag=None):
- """iterancestors(self, tag=None)
+ u"""iterancestors(self, tag=None)
Iterate over the ancestors of this element (from parent to parent).
@@ -1169,7 +1178,7 @@
return AncestorsIterator(self, tag)
def iterdescendants(self, tag=None):
- """iterdescendants(self, tag=None)
+ u"""iterdescendants(self, tag=None)
Iterate over the descendants of this element in document order.
@@ -1180,7 +1189,7 @@
return ElementDepthFirstIterator(self, tag, inclusive=False)
def iterchildren(self, tag=None, *, reversed=False):
- """iterchildren(self, tag=None, reversed=False)
+ u"""iterchildren(self, tag=None, reversed=False)
Iterate over the children of this element.
@@ -1191,7 +1200,7 @@
return ElementChildIterator(self, tag, reversed=reversed)
def getroottree(self):
- """getroottree(self)
+ u"""getroottree(self)
Return an ElementTree for the root node of the document that
contains this element.
@@ -1202,7 +1211,7 @@
return _elementTreeFactory(self._doc, None)
def getiterator(self, tag=None):
- """getiterator(self, tag=None)
+ u"""getiterator(self, tag=None)
Returns a sequence or iterator of all elements in the subtree in
document order (depth first pre-order), starting with this
@@ -1225,7 +1234,7 @@
return ElementDepthFirstIterator(self, tag)
def iter(self, tag=None):
- """iter(self, tag=None)
+ u"""iter(self, tag=None)
Iterate over all elements in the subtree in document order (depth
first pre-order), starting with this element.
@@ -1239,7 +1248,7 @@
return ElementDepthFirstIterator(self, tag)
def itertext(self, tag=None, *, with_tail=True):
- """itertext(self, tag=None, with_tail=True)
+ u"""itertext(self, tag=None, with_tail=True)
Iterates over the text content of a subtree.
@@ -1252,7 +1261,7 @@
return ElementTextIterator(self, tag, with_tail=with_tail)
def makeelement(self, _tag, attrib=None, nsmap=None, **_extra):
- """makeelement(self, _tag, attrib=None, nsmap=None, **_extra)
+ u"""makeelement(self, _tag, attrib=None, nsmap=None, **_extra)
Creates a new element associated with the same document.
"""
@@ -1260,7 +1269,7 @@
attrib, nsmap, _extra)
def find(self, path):
- """find(self, path)
+ u"""find(self, path)
Finds the first matching subelement, by tag name or path.
"""
@@ -1269,7 +1278,7 @@
return _elementpath.find(self, path)
def findtext(self, path, default=None):
- """findtext(self, path, default=None)
+ u"""findtext(self, path, default=None)
Finds text for the first matching subelement, by tag name or path.
"""
@@ -1278,7 +1287,7 @@
return _elementpath.findtext(self, path, default)
def findall(self, path):
- """findall(self, path)
+ u"""findall(self, path)
Finds all matching subelements, by tag name or path.
"""
@@ -1287,7 +1296,7 @@
return _elementpath.findall(self, path)
def iterfind(self, path):
- """iterfind(self, path)
+ u"""iterfind(self, path)
Iterates over all matching subelements, by tag name or path.
"""
@@ -1296,7 +1305,7 @@
return _elementpath.iterfind(self, path)
def xpath(self, _path, *, namespaces=None, extensions=None, **_variables):
- """xpath(self, _path, namespaces=None, extensions=None, **_variables)
+ u"""xpath(self, _path, namespaces=None, extensions=None, **_variables)
Evaluate an xpath expression using the element as context node.
"""
@@ -1359,22 +1368,22 @@
cdef class __ContentOnlyElement(_Element):
cdef int _raiseImmutable(self) except -1:
- raise TypeError, "this element does not have children or attributes"
+ raise TypeError, u"this element does not have children or attributes"
def set(self, key, value):
- "set(self, key, value)"
+ u"set(self, key, value)"
self._raiseImmutable()
def append(self, value):
- "append(self, value)"
+ u"append(self, value)"
self._raiseImmutable()
def insert(self, index, value):
- "insert(self, index, value)"
+ u"insert(self, index, value)"
self._raiseImmutable()
def __setitem__(self, index, value):
- "__setitem__(self, index, value)"
+ u"__setitem__(self, index, value)"
self._raiseImmutable()
property attrib:
@@ -1400,30 +1409,30 @@
# ACCESSORS
def __getitem__(self, x):
- "__getitem__(self, x)"
+ u"__getitem__(self, x)"
if python.PySlice_Check(x):
return []
else:
- raise IndexError, "list index out of range"
+ raise IndexError, u"list index out of range"
def __len__(self):
- "__len__(self)"
+ u"__len__(self)"
return 0
def get(self, key, default=None):
- "get(self, key, default=None)"
+ u"get(self, key, default=None)"
return None
def keys(self):
- "keys(self)"
+ u"keys(self)"
return []
def items(self):
- "items(self)"
+ u"items(self)"
return []
def values(self):
- "values(self)"
+ u"values(self)"
return []
cdef class _Comment(__ContentOnlyElement):
@@ -1432,7 +1441,7 @@
return Comment
def __repr__(self):
- return "<!--%s-->" % self.text
+ return u"<!--%s-->" % self.text
cdef class _ProcessingInstruction(__ContentOnlyElement):
property tag:
@@ -1452,9 +1461,9 @@
def __repr__(self):
text = self.text
if text:
- return "<?%s %s?>" % (self.target, text)
+ return u"<?%s %s?>" % (self.target, text)
else:
- return "<?%s?>" % self.target
+ return u"<?%s?>" % self.target
cdef class _Entity(__ContentOnlyElement):
property tag:
@@ -1468,8 +1477,8 @@
def __set__(self, value):
value = _utf8(value)
- assert '&' not in value and ';' not in value, \
- "Invalid entity name '%s'" % value
+ assert u'&' not in value and u';' not in value, \
+ u"Invalid entity name '%s'" % value
c_text = _cstr(value)
tree.xmlNodeSetName(self._c_node, c_text)
@@ -1477,10 +1486,10 @@
# FIXME: should this be None or '&[VALUE];' or the resolved
# entity value ?
def __get__(self):
- return '&%s;' % funicode(self._c_node.name)
+ return u'&%s;' % funicode(self._c_node.name)
def __repr__(self):
- return "&%s;" % self.name
+ return u"&%s;" % self.name
cdef public class _ElementTree [ type LxmlElementTreeType,
@@ -1493,16 +1502,16 @@
# to honour tree restructuring. _doc can happily be None!
cdef _assertHasRoot(self):
- """We have to take care here: the document may not have a root node!
+ u"""We have to take care here: the document may not have a root node!
This can happen if ElementTree() is called without any argument and
the caller 'forgets' to call parse() afterwards, so this is a bug in
the caller program.
"""
assert self._context_node is not None, \
- "ElementTree not initialized, missing root"
+ u"ElementTree not initialized, missing root"
def parse(self, source, _BaseParser parser=None, *, base_url=None):
- """parse(self, source, parser=None, base_url=None)
+ u"""parse(self, source, parser=None, base_url=None)
Updates self with the content of source and returns its root
"""
@@ -1516,17 +1525,17 @@
return self._context_node
def _setroot(self, _Element root not None):
- """_setroot(self, root)
+ u"""_setroot(self, root)
Relocate the ElementTree to a new root node.
"""
if root._c_node.type != tree.XML_ELEMENT_NODE:
- raise TypeError, "Only elements can be the root of an ElementTree"
+ raise TypeError, u"Only elements can be the root of an ElementTree"
self._context_node = root
self._doc = None
def getroot(self):
- """getroot(self)
+ u"""getroot(self)
Gets the root element for this tree.
"""
@@ -1543,7 +1552,7 @@
# not in ElementTree, read-only
property docinfo:
- """Information about the document provided by parser and DTD. This
+ u"""Information about the document provided by parser and DTD. This
value is only defined for ElementTree objects based on the root node
of a parsed document (e.g. those returned by the parse functions).
"""
@@ -1553,7 +1562,7 @@
# not in ElementTree, read-only
property parser:
- """The parser that was used to parse the document in this ElementTree.
+ u"""The parser that was used to parse the document in this ElementTree.
"""
def __get__(self):
if self._context_node is not None and \
@@ -1563,10 +1572,10 @@
return self._doc._parser
return None
- def write(self, file, *, encoding=None, method="xml",
+ def write(self, file, *, encoding=None, method=u"xml",
pretty_print=False, xml_declaration=None, with_tail=True):
- """write(self, file, encoding=None, method="xml",
- pretty_print=False, xml_declaration=None, with_tail=True)
+ u"""write(self, file, encoding=None, method="xml",
+ pretty_print=False, xml_declaration=None, with_tail=True)
Write the tree to a file or file-like object.
@@ -1581,19 +1590,19 @@
if xml_declaration is not None:
write_declaration = xml_declaration
if encoding is None:
- encoding = 'ASCII'
+ encoding = u'ASCII'
elif encoding is None:
- encoding = 'ASCII'
+ encoding = u'ASCII'
write_declaration = 0
else:
encoding = encoding.upper()
write_declaration = encoding not in \
- ('US-ASCII', 'ASCII', 'UTF8', 'UTF-8')
+ (u'US-ASCII', u'ASCII', u'UTF8', u'UTF-8')
_tofilelike(file, self._context_node, encoding, method,
write_declaration, 1, pretty_print, with_tail)
def getpath(self, _Element element not None):
- """getpath(self, element)
+ u"""getpath(self, element)
Returns a structural, absolute XPath expression to find that element.
"""
@@ -1602,7 +1611,7 @@
cdef char* c_path
doc = self._context_node._doc
if element._doc is not doc:
- raise ValueError, "Element is not in this tree."
+ raise ValueError, u"Element is not in this tree."
c_doc = _fakeRootDoc(doc._c_doc, self._context_node._c_node)
c_path = tree.xmlGetNodePath(element._c_node)
_destroyFakeDoc(doc._c_doc, c_doc)
@@ -1613,7 +1622,7 @@
return path
def getiterator(self, tag=None):
- """getiterator(self, tag=None)
+ u"""getiterator(self, tag=None)
Returns a sequence or iterator of all elements in document order
(depth first pre-order), starting with the root element.
@@ -1639,7 +1648,7 @@
return root.getiterator(tag)
def iter(self, tag=None):
- """iter(self, tag=None)
+ u"""iter(self, tag=None)
Creates an iterator for the root element. The iterator loops over
all elements in this tree, in document order.
@@ -1650,19 +1659,19 @@
return root.iter(tag)
def find(self, path):
- """find(self, path)
+ u"""find(self, path)
Finds the first toplevel element with given tag. Same as
``tree.getroot().find(path)``.
"""
self._assertHasRoot()
root = self.getroot()
- if _isString(path) and path[:1] == "/":
- path = "." + path
+ if _isString(path) and path[:1] == u"/":
+ path = u"." + path
return root.find(path)
def findtext(self, path, default=None):
- """findtext(self, path, default=None)
+ u"""findtext(self, path, default=None)
Finds the text for the first element matching the ElementPath
expression. Same as getroot().findtext(path)
@@ -1674,19 +1683,19 @@
return root.findtext(path, default)
def findall(self, path):
- """findall(self, path)
+ u"""findall(self, path)
Finds all elements matching the ElementPath expression. Same as
getroot().findall(path).
"""
self._assertHasRoot()
root = self.getroot()
- if _isString(path) and path[:1] == "/":
- path = "." + path
+ if _isString(path) and path[:1] == u"/":
+ path = u"." + path
return root.findall(path)
def iterfind(self, path):
- """iterfind(self, path)
+ u"""iterfind(self, path)
Iterates over all elements matching the ElementPath expression.
Same as getroot().finditer(path).
@@ -1698,7 +1707,7 @@
return root.iterfind(path)
def xpath(self, _path, *, namespaces=None, extensions=None, **_variables):
- """xpath(self, _path, namespaces=None, extensions=None, **_variables)
+ u"""xpath(self, _path, namespaces=None, extensions=None, **_variables)
XPath evaluate in context of document.
@@ -1721,7 +1730,7 @@
return evaluator(_path, **_variables)
def xslt(self, _xslt, extensions=None, access_control=None, **_kw):
- """xslt(self, _xslt, extensions=None, access_control=None, **_kw)
+ u"""xslt(self, _xslt, extensions=None, access_control=None, **_kw)
Transform this document using other document.
@@ -1740,7 +1749,7 @@
return style(self, **_kw)
def relaxng(self, relaxng):
- """relaxng(self, relaxng)
+ u"""relaxng(self, relaxng)
Validate this document using other document.
@@ -1758,7 +1767,7 @@
return schema.validate(self)
def xmlschema(self, xmlschema):
- """xmlschema(self, xmlschema)
+ u"""xmlschema(self, xmlschema)
Validate this document using other document.
@@ -1776,7 +1785,7 @@
return schema.validate(self)
def xinclude(self):
- """xinclude(self)
+ u"""xinclude(self)
Process the XInclude nodes in this document and include the
referenced XML fragments.
@@ -1792,7 +1801,7 @@
XInclude()(self._context_node)
def write_c14n(self, file):
- """write_c14n(self, file)
+ u"""write_c14n(self, file)
C14N write of document. Always writes UTF-8.
"""
@@ -1815,7 +1824,7 @@
cdef class _Attrib:
- """A dict-like proxy for the ``Element.attrib`` property.
+ u"""A dict-like proxy for the ``Element.attrib`` property.
"""
cdef _Element _element
def __init__(self, _Element element not None):
@@ -1836,7 +1845,7 @@
def pop(self, key, *default):
if python.PyTuple_GET_SIZE(default) > 1:
- raise TypeError, "pop expected at most 2 arguments, got %d" % (
+ raise TypeError, u"pop expected at most 2 arguments, got %d" % (
python.PyTuple_GET_SIZE(default)+1)
result = _getAttributeValue(self._element, key, None)
if result is None:
@@ -1941,7 +1950,7 @@
return python.PyObject_RichCompare(one, other, op)
cdef class _AttribIterator:
- """Attribute iterator - for internal use only!
+ u"""Attribute iterator - for internal use only!
"""
# XML attributes must not be removed while running!
cdef _Element _node
@@ -2042,7 +2051,7 @@
return current_node
cdef class ElementChildIterator(_ElementIterator):
- """ElementChildIterator(self, node, tag=None, reversed=False)
+ u"""ElementChildIterator(self, node, tag=None, reversed=False)
Iterates over the children of an element.
"""
def __init__(self, _Element node not None, tag=None, *, reversed=False):
@@ -2065,7 +2074,7 @@
self._node = _elementFactory(node._doc, c_node)
cdef class SiblingsIterator(_ElementIterator):
- """SiblingsIterator(self, node, tag=None, preceding=False)
+ u"""SiblingsIterator(self, node, tag=None, preceding=False)
Iterates over the siblings of an element.
You can pass the boolean keyword ``preceding`` to specify the direction.
@@ -2079,7 +2088,7 @@
self._storeNext(node)
cdef class AncestorsIterator(_ElementIterator):
- """AncestorsIterator(self, node, tag=None)
+ u"""AncestorsIterator(self, node, tag=None)
Iterates over the ancestors of an element (from parent to parent).
"""
def __init__(self, _Element node not None, tag=None):
@@ -2088,7 +2097,7 @@
self._storeNext(node)
cdef class ElementDepthFirstIterator(_ElementTagMatcher):
- """ElementDepthFirstIterator(self, node, tag=None, inclusive=True)
+ u"""ElementDepthFirstIterator(self, node, tag=None, inclusive=True)
Iterates over an element and its sub-elements in document order (depth
first pre-order).
@@ -2166,7 +2175,7 @@
return NULL
cdef class ElementTextIterator:
- """ElementTextIterator(self, element, tag=None, with_tail=True)
+ u"""ElementTextIterator(self, element, tag=None, with_tail=True)
Iterates over the text content of a subtree.
You can pass the ``tag`` keyword argument to restrict text content to a
@@ -2179,9 +2188,9 @@
cdef _Element _start_element
def __init__(self, _Element element not None, tag=None, *, with_tail=True):
if with_tail:
- events = ("start", "end")
+ events = (u"start", u"end")
else:
- events = ("start",)
+ events = (u"start",)
self._start_element = element
self._nextEvent = iterwalk(element, events=events, tag=tag).next
@@ -2192,7 +2201,7 @@
cdef _Element element
while result is None:
event, element = self._nextEvent() # raises StopIteration
- if event == "start":
+ if event == u"start":
result = element.text
elif element is not self._start_element:
result = element.tail
@@ -2221,7 +2230,7 @@
# module-level API for ElementTree
def Element(_tag, attrib=None, nsmap=None, **_extra):
- """Element(_tag, attrib=None, nsmap=None, **_extra)
+ u"""Element(_tag, attrib=None, nsmap=None, **_extra)
Element factory. This function returns an object implementing the
Element interface.
@@ -2231,7 +2240,7 @@
attrib, nsmap, _extra)
def Comment(text=None):
- """Comment(text=None)
+ u"""Comment(text=None)
Comment element factory. This factory function creates a special element that will
be serialized as an XML comment.
@@ -2250,7 +2259,7 @@
return _elementFactory(doc, c_node)
def ProcessingInstruction(target, text=None):
- """ProcessingInstruction(target, text=None)
+ u"""ProcessingInstruction(target, text=None)
ProcessingInstruction element factory. This factory function creates a
special element that will be serialized as an XML processing instruction.
@@ -2272,7 +2281,7 @@
PI = ProcessingInstruction
cdef class CDATA:
- """CDATA(data)
+ u"""CDATA(data)
CDATA factory. This factory creates an opaque data object that
can be used to set Element text. The usual way to use it is::
@@ -2286,7 +2295,7 @@
self._utf8_data = _utf8(data)
def Entity(name):
- """Entity(name)
+ u"""Entity(name)
Entity factory. This factory function creates a special element
that will be serialized as an XML entity reference or character
@@ -2302,9 +2311,9 @@
c_name = _cstr(name_utf)
if c_name[0] == c'#':
if not _characterReferenceIsValid(c_name + 1):
- raise ValueError, "Invalid character reference: '%s'" % name
+ raise ValueError, u"Invalid character reference: '%s'" % name
elif not _xmlNameIsValid(c_name):
- raise ValueError, "Invalid entity reference: '%s'" % name
+ raise ValueError, u"Invalid entity reference: '%s'" % name
c_doc = _newXMLDoc()
doc = _documentFactory(c_doc, None)
c_node = _createEntity(c_doc, c_name)
@@ -2313,7 +2322,7 @@
def SubElement(_Element _parent not None, _tag,
attrib=None, nsmap=None, **_extra):
- """SubElement(_parent, _tag, attrib=None, nsmap=None, **_extra)
+ u"""SubElement(_parent, _tag, attrib=None, nsmap=None, **_extra)
Subelement factory. This function creates an element instance, and
appends it to an existing element.
@@ -2321,7 +2330,7 @@
return _makeSubElement(_parent, _tag, None, None, attrib, nsmap, _extra)
def ElementTree(_Element element=None, *, file=None, _BaseParser parser=None):
- """ElementTree(element=None, file=None, parser=None)
+ u"""ElementTree(element=None, file=None, parser=None)
ElementTree wrapper class.
"""
@@ -2346,7 +2355,7 @@
return _elementTreeFactory(doc, element)
def HTML(text, _BaseParser parser=None, *, base_url=None):
- """HTML(text, parser=None, base_url=None)
+ u"""HTML(text, parser=None, base_url=None)
Parses an HTML document from a string constant. This function can be used
to embed "HTML literals" in Python code.
@@ -2370,7 +2379,7 @@
return result_container.result
def XML(text, _BaseParser parser=None, *, base_url=None):
- """XML(text, parser=None, base_url=None)
+ u"""XML(text, parser=None, base_url=None)
Parses an XML document from a string constant. This function can be used
to embed "XML literals" in Python code, like in
@@ -2396,7 +2405,7 @@
return result_container.result
def fromstring(text, _BaseParser parser=None, *, base_url=None):
- """fromstring(text, parser=None, base_url=None)
+ u"""fromstring(text, parser=None, base_url=None)
Parses an XML document from a string.
@@ -2415,7 +2424,7 @@
return result_container.result
def fromstringlist(strings, _BaseParser parser=None):
- """fromstringlist(strings, parser=None)
+ u"""fromstringlist(strings, parser=None)
Parses an XML document from a sequence of strings.
@@ -2431,14 +2440,14 @@
return parser.close()
def iselement(element):
- """iselement(element)
+ u"""iselement(element)
Checks if an object appears to be a valid element object.
"""
return isinstance(element, _Element)
def dump(_Element elem not None, *, pretty_print=True, with_tail=True):
- """dump(elem, pretty_print=True, with_tail=True)
+ u"""dump(elem, pretty_print=True, with_tail=True)
Writes an element tree or element structure to sys.stdout. This function
should be used for debugging only.
@@ -2447,7 +2456,7 @@
def tostring(element_or_tree, *, encoding=None, method="xml",
xml_declaration=None, pretty_print=False, with_tail=True):
- """tostring(element_or_tree, encoding=None, method="xml",
+ u"""tostring(element_or_tree, encoding=None, method="xml",
xml_declaration=None, pretty_print=False, with_tail=True)
Serialize an element to an encoded string representation of its XML
@@ -2474,16 +2483,16 @@
if encoding is _unicode:
if xml_declaration:
raise ValueError, \
- "Serialisation to unicode must not request an XML declaration"
+ u"Serialisation to unicode must not request an XML declaration"
write_declaration = 0
elif xml_declaration is None:
# by default, write an XML declaration only for non-standard encodings
write_declaration = encoding is not None and encoding.upper() not in \
- ('ASCII', 'UTF-8', 'UTF8', 'US-ASCII')
+ (u'ASCII', u'UTF-8', u'UTF8', u'US-ASCII')
else:
write_declaration = xml_declaration
if encoding is None:
- encoding = 'ASCII'
+ encoding = u'ASCII'
if isinstance(element_or_tree, _Element):
return _tostring(<_Element>element_or_tree, encoding, method,
@@ -2493,11 +2502,11 @@
encoding, method, write_declaration, 1, pretty_print,
with_tail)
else:
- raise TypeError, "Type '%s' cannot be serialized." % \
+ raise TypeError, u"Type '%s' cannot be serialized." % \
python._fqtypename(element_or_tree)
def tostringlist(element_or_tree, *args, **kwargs):
- """tostringlist(element_or_tree, *args, **kwargs)
+ u"""tostringlist(element_or_tree, *args, **kwargs)
Serialize an element to an encoded string representation of its XML
tree, stored in a list of partial strings.
@@ -2507,9 +2516,9 @@
"""
return [tostring(element_or_tree, *args, **kwargs)]
-def tounicode(element_or_tree, *, method="xml", pretty_print=False,
+def tounicode(element_or_tree, *, method=u"xml", pretty_print=False,
with_tail=True):
- """tounicode(element_or_tree, method="xml", pretty_print=False,
+ u"""tounicode(element_or_tree, method="xml", pretty_print=False,
with_tail=True)
Serialize an element to the Python unicode representation of its XML
@@ -2537,11 +2546,11 @@
return _tostring((<_ElementTree>element_or_tree)._context_node,
_unicode, method, 0, 1, pretty_print, with_tail)
else:
- raise TypeError, "Type '%s' cannot be serialized." % \
+ raise TypeError, u"Type '%s' cannot be serialized." % \
type(element_or_tree)
def parse(source, _BaseParser parser=None, *, base_url=None):
- """parse(source, parser=None, base_url=None)
+ u"""parse(source, parser=None, base_url=None)
Return an ElementTree object loaded with source elements. If no parser
is provided as second argument, the default parser is used.
@@ -2590,7 +2599,7 @@
# Validation
class DocumentInvalid(LxmlError):
- """Validation error.
+ u"""Validation error.
Raised by all document validators when their ``assertValid(tree)``
method fails.
@@ -2598,14 +2607,14 @@
pass
cdef class _Validator:
- "Base class for XML validators."
+ u"Base class for XML validators."
cdef _ErrorLog _error_log
def __init__(self):
- "__init__(self)"
+ u"__init__(self)"
self._error_log = _ErrorLog()
def validate(self, etree):
- """validate(self, etree)
+ u"""validate(self, etree)
Validate the document using this schema.
@@ -2614,26 +2623,26 @@
return self(etree)
def assertValid(self, etree):
- """assertValid(self, etree)
+ u"""assertValid(self, etree)
Raises `DocumentInvalid` if the document does not comply with the schema.
"""
if not self(etree):
raise DocumentInvalid(self._error_log._buildExceptionMessage(
- "Document does not comply with schema"),
+ u"Document does not comply with schema"),
self._error_log)
def assert_(self, etree):
- """assert_(self, etree)
+ u"""assert_(self, etree)
Raises `AssertionError` if the document does not comply with the schema.
"""
if not self(etree):
raise AssertionError, self._error_log._buildExceptionMessage(
- "Document does not comply with schema")
+ u"Document does not comply with schema")
property error_log:
- "The log of validation errors and warnings."
+ u"The log of validation errors and warnings."
def __get__(self):
return self._error_log.copy()
Modified: lxml/trunk/src/lxml/nsclasses.pxi
==============================================================================
--- lxml/trunk/src/lxml/nsclasses.pxi (original)
+++ lxml/trunk/src/lxml/nsclasses.pxi Tue May 20 00:00:11 2008
@@ -1,18 +1,18 @@
# module-level API for namespace implementations
class LxmlRegistryError(LxmlError):
- """Base class of lxml registry errors.
+ u"""Base class of lxml registry errors.
"""
pass
class NamespaceRegistryError(LxmlRegistryError):
- """Error registering a namespace extension.
+ u"""Error registering a namespace extension.
"""
pass
cdef class _NamespaceRegistry:
- "Dictionary-like namespace registry"
+ u"Dictionary-like namespace registry"
cdef object _ns_uri
cdef object _ns_uri_utf
cdef object _entries
@@ -28,7 +28,7 @@
self._entries = {}
def update(self, class_dict_iterable):
- """update(self, class_dict_iterable)
+ u"""update(self, class_dict_iterable)
Forgivingly update the registry.
@@ -56,14 +56,14 @@
cdef python.PyObject* dict_result
dict_result = python.PyDict_GetItem(self._entries, name)
if dict_result is NULL:
- raise KeyError, "Name not registered."
+ raise KeyError, u"Name not registered."
return <object>dict_result
cdef object _getForString(self, char* name):
cdef python.PyObject* dict_result
dict_result = python.PyDict_GetItemString(self._entries, name)
if dict_result is NULL:
- raise KeyError, "Name not registered."
+ raise KeyError, u"Name not registered."
return <object>dict_result
def __iter__(self):
@@ -79,21 +79,21 @@
python.PyDict_Clear(self._entries)
cdef class _ClassNamespaceRegistry(_NamespaceRegistry):
- "Dictionary-like registry for namespace implementation classes"
+ u"Dictionary-like registry for namespace implementation classes"
def __setitem__(self, name, item):
if not python.PyType_Check(item) or not issubclass(item, ElementBase):
raise NamespaceRegistryError, \
- "Registered element classes must be subtypes of ElementBase"
+ u"Registered element classes must be subtypes of ElementBase"
if name is not None:
name = _utf8(name)
self._entries[name] = item
def __repr__(self):
- return "Namespace(%r)" % self._ns_uri
+ return u"Namespace(%r)" % self._ns_uri
cdef class ElementNamespaceClassLookup(FallbackElementClassLookup):
- """ElementNamespaceClassLookup(self, fallback=None)
+ u"""ElementNamespaceClassLookup(self, fallback=None)
Element class lookup scheme that searches the Element class in the
Namespace registry.
@@ -105,7 +105,7 @@
self._lookup_function = _find_nselement_class
def get_namespace(self, ns_uri):
- """get_namespace(self, ns_uri)
+ u"""get_namespace(self, ns_uri)
Retrieve the namespace object associated with the given URI.
@@ -165,7 +165,7 @@
__FUNCTION_NAMESPACE_REGISTRIES = {}
def FunctionNamespace(ns_uri):
- """FunctionNamespace(ns_uri)
+ u"""FunctionNamespace(ns_uri)
Retrieve the function namespace object associated with the given
URI.
@@ -187,21 +187,21 @@
def __setitem__(self, name, item):
if not callable(item):
raise NamespaceRegistryError, \
- "Registered functions must be callable."
+ u"Registered functions must be callable."
if not name:
raise ValueError, \
- "extensions must have non empty names"
+ u"extensions must have non empty names"
self._entries[_utf8(name)] = item
def __repr__(self):
- return "FunctionNamespace(%r)" % self._ns_uri
+ return u"FunctionNamespace(%r)" % self._ns_uri
cdef class _XPathFunctionNamespaceRegistry(_FunctionNamespaceRegistry):
cdef object _prefix
cdef object _prefix_utf
property prefix:
- "Namespace prefix for extension functions."
+ u"Namespace prefix for extension functions."
def __del__(self):
self._prefix = None # no prefix configured
self._prefix_utf = None
@@ -220,7 +220,7 @@
self._prefix = prefix
cdef object _find_all_extension_prefixes():
- "Internal lookup function to find all function prefixes for XSLT/XPath."
+ u"Internal lookup function to find all function prefixes for XSLT/XPath."
cdef _XPathFunctionNamespaceRegistry registry
ns_prefixes = []
for registry in __FUNCTION_NAMESPACE_REGISTRIES.itervalues():
Modified: lxml/trunk/src/lxml/parser.pxi
==============================================================================
--- lxml/trunk/src/lxml/parser.pxi (original)
+++ lxml/trunk/src/lxml/parser.pxi Tue May 20 00:00:11 2008
@@ -4,14 +4,14 @@
cimport htmlparser
class ParseError(LxmlSyntaxError):
- """Syntax error while parsing an XML document.
+ u"""Syntax error while parsing an XML document.
For compatibility with ElementTree 1.3 and later.
"""
pass
class XMLSyntaxError(ParseError):
- """Syntax error while parsing an XML document.
+ u"""Syntax error while parsing an XML document.
"""
def __init__(self, message, code, line, column):
ParseError.__init__(self, message)
@@ -19,7 +19,7 @@
self.code = code
class ParserError(LxmlError):
- """Internal lxml parser error.
+ u"""Internal lxml parser error.
"""
pass
@@ -40,16 +40,16 @@
xmlparser.xmlDictFree(self._c_dict)
cdef void initMainParserContext(self):
- """Put the global context into the thread dictionary of the main
+ u"""Put the global context into the thread dictionary of the main
thread. To be called once and only in the main thread."""
cdef python.PyObject* thread_dict
cdef python.PyObject* result
thread_dict = python.PyThreadState_GetDict()
if thread_dict is not NULL:
- python.PyDict_SetItem(<object>thread_dict, "_ParserDictionaryContext", self)
+ (<object>thread_dict)[u"_ParserDictionaryContext"] = self
cdef _ParserDictionaryContext _findThreadParserContext(self):
- "Find (or create) the _ParserDictionaryContext object for the current thread"
+ u"Find (or create) the _ParserDictionaryContext object for the current thread"
cdef python.PyObject* thread_dict
cdef python.PyObject* result
cdef _ParserDictionaryContext context
@@ -57,21 +57,21 @@
if thread_dict is NULL:
return self
d = <object>thread_dict
- result = python.PyDict_GetItem(d, "_ParserDictionaryContext")
+ result = python.PyDict_GetItem(d, u"_ParserDictionaryContext")
if result is not NULL:
return <object>result
context = _ParserDictionaryContext()
- python.PyDict_SetItem(d, "_ParserDictionaryContext", context)
+ d[u"_ParserDictionaryContext"] = context
return context
cdef void setDefaultParser(self, _BaseParser parser):
- "Set the default parser for the current thread"
+ u"Set the default parser for the current thread"
cdef _ParserDictionaryContext context
context = self._findThreadParserContext()
context._default_parser = parser
cdef _BaseParser getDefaultParser(self):
- "Return (or create) the default parser of the current thread"
+ u"Return (or create) the default parser of the current thread"
cdef _ParserDictionaryContext context
context = self._findThreadParserContext()
if context._default_parser is None:
@@ -82,7 +82,7 @@
return context._default_parser
cdef tree.xmlDict* _getThreadDict(self, tree.xmlDict* default):
- "Return the thread-local dict or create a new one if necessary."
+ u"Return the thread-local dict or create a new one if necessary."
cdef _ParserDictionaryContext context
context = self._findThreadParserContext()
if context._c_dict is NULL:
@@ -110,15 +110,15 @@
xmlparser.xmlDictReference(c_thread_dict)
cdef void initParserDict(self, xmlparser.xmlParserCtxt* pctxt):
- "Assure we always use the same string dictionary."
+ u"Assure we always use the same string dictionary."
self.initThreadDictRef(&pctxt.dict)
cdef void initXPathParserDict(self, xpath.xmlXPathContext* pctxt):
- "Assure we always use the same string dictionary."
+ u"Assure we always use the same string dictionary."
self.initThreadDictRef(&pctxt.dict)
cdef void initDocDict(self, xmlDoc* result):
- "Store dict of last object parsed if no shared dict yet"
+ u"Store dict of last object parsed if no shared dict yet"
# XXX We also free the result dict here if there already was one.
# This case should only occur for new documents with empty dicts,
# otherwise we'd free data that's in use => segfault
@@ -129,7 +129,7 @@
__GLOBAL_PARSER_CONTEXT.initMainParserContext()
cdef int _checkThreadDict(tree.xmlDict* c_dict):
- """Check that c_dict is either the local thread dictionary or the global
+ u"""Check that c_dict is either the local thread dictionary or the global
parent dictionary.
"""
#if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
@@ -147,7 +147,7 @@
_UNICODE_ENCODING = NULL
cdef void _setupPythonUnicode():
- """Sets _UNICODE_ENCODING to the internal encoding name of Python unicode
+ u"""Sets _UNICODE_ENCODING to the internal encoding name of Python unicode
strings if libxml2 supports reading native Python unicode. This depends
on iconv and the local Python installation, so we simply check if we find
a matching encoding handler.
@@ -180,7 +180,7 @@
_UNICODE_ENCODING = enc
cdef char* _findEncodingName(char* buffer, int size):
- "Work around bug in libxml2: find iconv name of encoding on our own."
+ u"Work around bug in libxml2: find iconv name of encoding on our own."
cdef tree.xmlCharEncoding enc
enc = tree.xmlDetectCharEncoding(buffer, size)
if enc == tree.XML_CHAR_ENCODING_UTF16LE:
@@ -296,7 +296,7 @@
self._bytes = self._filelike.read(c_size)
if not python.PyString_Check(self._bytes):
raise TypeError, \
- "reading file objects must return plain strings"
+ u"reading file objects must return plain strings"
remaining = python.PyString_GET_SIZE(self._bytes)
self._bytes_read = 0
if remaining == 0:
@@ -449,7 +449,7 @@
result = python.PyThread_acquire_lock(
self._lock, python.WAIT_LOCK)
if result == 0:
- raise ParserError, "parser locking failed"
+ raise ParserError, u"parser locking failed"
self._error_log.connect()
if self._validator is not None:
self._validator.connect(self._c_ctxt)
@@ -498,21 +498,21 @@
if filename is not None and \
ctxt.lastError.domain == xmlerror.XML_FROM_IO:
if ctxt.lastError.message is not NULL:
- message = "Error reading file '%s': %s" % (
+ message = u"Error reading file '%s': %s" % (
filename, (ctxt.lastError.message).strip())
else:
- message = "Error reading '%s'" % filename
+ message = u"Error reading '%s'" % filename
raise IOError, message
elif error_log:
raise error_log._buildParseException(
- XMLSyntaxError, "Document is not well formed")
+ XMLSyntaxError, u"Document is not well formed")
elif ctxt.lastError.message is not NULL:
message = (ctxt.lastError.message).strip()
code = ctxt.lastError.code
line = ctxt.lastError.line
column = ctxt.lastError.int2
if ctxt.lastError.line > 0:
- message = "line %d: %s" % (line, message)
+ message = u"line %d: %s" % (line, message)
raise XMLSyntaxError(message, code, line, column)
else:
raise XMLSyntaxError(None, xmlerror.XML_ERR_INTERNAL_ERROR, 0, 0)
@@ -627,7 +627,7 @@
if not isinstance(self, HTMLParser) and \
not isinstance(self, XMLParser) and \
not isinstance(self, iterparse):
- raise TypeError, "This class cannot be instantiated"
+ raise TypeError, u"This class cannot be instantiated"
self._parse_options = parse_options
self._filename = filename
@@ -648,7 +648,7 @@
c_encoding = tree.xmlParseCharEncoding(_cstr(encoding))
if c_encoding == tree.XML_CHAR_ENCODING_ERROR or \
c_encoding == tree.XML_CHAR_ENCODING_NONE:
- raise LookupError, "unknown encoding: '%s'" % encoding
+ raise LookupError, u"unknown encoding: '%s'" % encoding
self._default_encoding = encoding
self._default_encoding_int = c_encoding
@@ -730,7 +730,7 @@
return c_ctxt
property error_log:
- """The error log of the last parser run.
+ u"""The error log of the last parser run.
"""
def __get__(self):
cdef _ParserContext context
@@ -738,21 +738,21 @@
return context._error_log.copy()
property resolvers:
- "The custom resolver registry of this parser."
+ u"The custom resolver registry of this parser."
def __get__(self):
return self._resolvers
property version:
- "The version of the underlying XML parser."
+ u"The version of the underlying XML parser."
def __get__(self):
- return "libxml2 %d.%d.%d" % LIBXML_VERSION
+ return u"libxml2 %d.%d.%d" % LIBXML_VERSION
def setElementClassLookup(self, ElementClassLookup lookup = None):
- ":deprecated: use ``parser.set_element_class_lookup(lookup)`` instead."
+ u":deprecated: use ``parser.set_element_class_lookup(lookup)`` instead."
self.set_element_class_lookup(lookup)
def set_element_class_lookup(self, ElementClassLookup lookup = None):
- """set_element_class_lookup(self, lookup = None)
+ u"""set_element_class_lookup(self, lookup = None)
Set a lookup scheme for element classes generated from this parser.
@@ -761,7 +761,7 @@
self._class_lookup = lookup
cdef _BaseParser _copy(self):
- "Create a new parser with the same configuration."
+ u"Create a new parser with the same configuration."
cdef _BaseParser parser
parser = self.__class__()
parser._parse_options = self._parse_options
@@ -776,14 +776,14 @@
return parser
def copy(self):
- """copy(self)
+ u"""copy(self)
Create a new parser with the same configuration.
"""
return self._copy()
def makeelement(self, _tag, attrib=None, nsmap=None, **_extra):
- """makeelement(self, _tag, attrib=None, nsmap=None, **_extra)
+ u"""makeelement(self, _tag, attrib=None, nsmap=None, **_extra)
Creates a new element associated with this parser.
"""
@@ -793,7 +793,7 @@
# internal parser methods
cdef xmlDoc* _parseUnicodeDoc(self, utext, char* c_filename) except NULL:
- """Parse unicode document, share dictionary if possible.
+ u"""Parse unicode document, share dictionary if possible.
"""
cdef _ParserContext context
cdef xmlDoc* result
@@ -835,14 +835,14 @@
cdef xmlDoc* _parseDoc(self, char* c_text, Py_ssize_t c_len,
char* c_filename) except NULL:
- """Parse document, share dictionary if possible.
+ u"""Parse document, share dictionary if possible.
"""
cdef _ParserContext context
cdef xmlDoc* result
cdef xmlparser.xmlParserCtxt* pctxt
cdef char* c_encoding
if c_len > python.INT_MAX:
- raise ParserError, "string is too long to parse it with libxml2"
+ raise ParserError, u"string is too long to parse it with libxml2"
context = self._getParserContext()
context.prepare()
@@ -941,7 +941,7 @@
cdef bint _feed_parser_running
property feed_error_log:
- """The error log of the last (or current) run of the feed parser.
+ u"""The error log of the last (or current) run of the feed parser.
Note that this is local to the feed parser and thus is
different from what the ``error_log`` property returns.
@@ -952,7 +952,7 @@
return context._error_log.copy()
def feed(self, data):
- """feed(self, data)
+ u"""feed(self, data)
Feeds data to the parser. The argument should be an 8-bit string
buffer containing encoded data, although Unicode is supported as long
@@ -986,12 +986,12 @@
elif python.PyUnicode_Check(data):
if _UNICODE_ENCODING is NULL:
raise ParserError, \
- "Unicode parsing is not supported on this platform"
+ u"Unicode parsing is not supported on this platform"
c_encoding = _UNICODE_ENCODING
c_data = python.PyUnicode_AS_DATA(data)
py_buffer_len = python.PyUnicode_GET_DATA_SIZE(data)
else:
- raise TypeError, "Parsing requires string data"
+ raise TypeError, u"Parsing requires string data"
context = self._getPushParserContext()
pctxt = context._c_ctxt
@@ -1036,7 +1036,7 @@
context.cleanup()
def close(self):
- """close(self)
+ u"""close(self)
Terminates feeding data to this parser. This tells the parser to
process any remaining data in the feed buffer, and then returns the
@@ -1051,7 +1051,7 @@
cdef xmlDoc* c_doc
cdef _Document doc
if not self._feed_parser_running:
- raise XMLSyntaxError("no element found",
+ raise XMLSyntaxError(u"no element found",
xmlerror.XML_ERR_INTERNAL_ERROR, 0, 0)
context = self._getPushParserContext()
@@ -1108,7 +1108,7 @@
)
cdef class XMLParser(_FeedParser):
- """XMLParser(self, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, remove_blank_text=False, compact=True, resolve_entities=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
+ u"""XMLParser(self, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, remove_blank_text=False, compact=True, resolve_entities=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
The XML parser.
Parsers can be supplied as additional argument to various parse
@@ -1182,7 +1182,7 @@
target, None, encoding)
cdef class ETCompatXMLParser(XMLParser):
- """ETCompatXMLParser(self, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, remove_blank_text=False, compact=True, resolve_entities=True, remove_comments=True, remove_pis=True, target=None, encoding=None, schema=None)
+ u"""ETCompatXMLParser(self, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, remove_blank_text=False, compact=True, resolve_entities=True, remove_comments=True, remove_pis=True, target=None, encoding=None, schema=None)
An XML parser with an ElementTree compatible default setup.
See the XMLParser class for details.
@@ -1220,7 +1220,7 @@
__GLOBAL_PARSER_CONTEXT.setDefaultParser(__DEFAULT_XML_PARSER)
def set_default_parser(_BaseParser parser=None):
- """set_default_parser(parser=None)
+ u"""set_default_parser(parser=None)
Set a default parser for the current thread. This parser is used
globally whenever no parser is supplied to the various parse functions of
@@ -1236,7 +1236,7 @@
__GLOBAL_PARSER_CONTEXT.setDefaultParser(parser)
def get_default_parser():
- "get_default_parser()"
+ u"get_default_parser()"
return __GLOBAL_PARSER_CONTEXT.getDefaultParser()
############################################################
@@ -1251,7 +1251,7 @@
)
cdef class HTMLParser(_FeedParser):
- """HTMLParser(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
+ u"""HTMLParser(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
The HTML parser.
This parser allows reading HTML into a normal XML tree. By
@@ -1362,7 +1362,7 @@
return result
cdef xmlDoc* _copyDocRoot(xmlDoc* c_doc, xmlNode* c_new_root) except NULL:
- "Recursively copy the document and make c_new_root the new root node."
+ u"Recursively copy the document and make c_new_root the new root node."
cdef xmlDoc* result
cdef xmlNode* c_node
result = tree.xmlCopyDoc(c_doc, 0) # non recursive
@@ -1376,7 +1376,7 @@
return result
cdef xmlNode* _copyNodeToDoc(xmlNode* c_node, xmlDoc* c_doc) except NULL:
- "Recursively copy the element into the document. c_doc is not modified."
+ u"Recursively copy the element into the document. c_doc is not modified."
cdef xmlNode* c_root
c_root = tree.xmlDocCopyNode(c_node, c_doc, 1) # recursive
if c_root is NULL:
@@ -1408,18 +1408,18 @@
else:
url = _getFilenameForFile(source)
- if hasattr(source, 'getvalue') and hasattr(source, 'tell'):
+ if hasattr(source, u'getvalue') and hasattr(source, u'tell'):
# StringIO - reading from start?
if source.tell() == 0:
return _parseMemoryDocument(
source.getvalue(), _encodeFilenameUTF8(url), parser)
# Support for file-like objects (urlgrabber.urlopen, ...)
- if hasattr(source, 'read'):
+ if hasattr(source, u'read'):
return _parseFilelikeDocument(
source, _encodeFilenameUTF8(url), parser)
- raise TypeError, "cannot parse from '%s'" % python._fqtypename(source)
+ raise TypeError, u"cannot parse from '%s'" % funicode(python._fqtypename(source))
cdef _Document _parseDocumentFromURL(url, _BaseParser parser):
cdef xmlDoc* c_doc
@@ -1431,12 +1431,12 @@
if python.PyUnicode_Check(text):
if _hasEncodingDeclaration(text):
raise ValueError, \
- "Unicode strings with encoding declaration are not supported."
+ u"Unicode strings with encoding declaration are not supported."
# pass native unicode only if libxml2 can handle it
if _UNICODE_ENCODING is NULL:
text = python.PyUnicode_AsUTF8String(text)
elif not python.PyString_Check(text):
- raise ValueError, "can only parse strings"
+ raise ValueError, u"can only parse strings"
if python.PyUnicode_Check(url):
url = python.PyUnicode_AsUTF8String(url)
c_doc = _parseDoc(text, url, parser)
Modified: lxml/trunk/src/lxml/parsertarget.pxi
==============================================================================
--- lxml/trunk/src/lxml/parsertarget.pxi (original)
+++ lxml/trunk/src/lxml/parsertarget.pxi Tue May 20 00:00:11 2008
@@ -91,13 +91,13 @@
cdef class _TargetParserContext(_SaxParserContext):
- """This class maps SAX2 events to the ET parser target interface.
+ u"""This class maps SAX2 events to the ET parser target interface.
"""
cdef object _python_target
cdef int _setTarget(self, target) except -1:
self._python_target = target
if not isinstance(target, _SaxParserTarget) or \
- hasattr(target, '__dict__'):
+ hasattr(target, u'__dict__'):
target = _PythonSaxParserTarget(target)
self._setSaxParserTarget(target)
return 0
Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Tue May 20 00:00:11 2008
@@ -5,7 +5,7 @@
# the Python class
cdef inline _Element getProxy(xmlNode* c_node):
- """Get a proxy for a given node.
+ u"""Get a proxy for a given node.
"""
#print "getProxy for:", c_node
if c_node is not NULL and c_node._private is not NULL:
@@ -17,7 +17,7 @@
return c_node._private is not NULL
cdef inline int _registerProxy(_Element proxy) except -1:
- """Register a proxy and type for the node it's proxying for.
+ u"""Register a proxy and type for the node it's proxying for.
"""
cdef xmlNode* c_node
# cannot register for NULL
@@ -25,29 +25,29 @@
if c_node is NULL:
return 0
#print "registering for:", proxy._c_node
- assert c_node._private is NULL, "double registering proxy!"
+ assert c_node._private is NULL, u"double registering proxy!"
c_node._private = proxy
# additional INCREF to make sure _Document is GC-ed LAST!
proxy._gc_doc = proxy._doc
python.Py_INCREF(proxy._doc)
cdef inline int _unregisterProxy(_Element proxy) except -1:
- """Unregister a proxy for the node it's proxying for.
+ u"""Unregister a proxy for the node it's proxying for.
"""
cdef xmlNode* c_node
c_node = proxy._c_node
- assert c_node._private is proxy, "Tried to unregister unknown proxy"
+ assert c_node._private is proxy, u"Tried to unregister unknown proxy"
c_node._private = NULL
return 0
cdef inline void _releaseProxy(_Element proxy):
- """An additional DECREF for the document.
+ u"""An additional DECREF for the document.
"""
python.Py_XDECREF(proxy._gc_doc)
proxy._gc_doc = NULL
cdef inline void _updateProxyDocument(xmlNode* c_node, _Document doc):
- """Replace the document reference of a proxy.
+ u"""Replace the document reference of a proxy.
This may deallocate the original document of the proxy!
"""
@@ -116,7 +116,7 @@
tree.xmlFreeDoc(c_doc)
cdef _Element _fakeDocElementFactory(_Document doc, xmlNode* c_element):
- """Special element factory for cases where we need to create a fake
+ u"""Special element factory for cases where we need to create a fake
root document, but still need to instantiate arbitrary nodes from
it. If we instantiate the fake root node, things will turn bad
when it's destroyed.
@@ -192,7 +192,7 @@
# fix _Document references and namespaces when a node changes documents
cdef void _copyParentNamespaces(xmlNode* c_from_node, xmlNode* c_to_node):
- """Copy the namespaces of all ancestors of c_from_node to c_to_node.
+ u"""Copy the namespaces of all ancestors of c_from_node to c_to_node.
"""
cdef xmlNode* c_parent
cdef xmlNs* c_ns
@@ -245,7 +245,7 @@
cdef int _stripRedundantNamespaceDeclarations(
xmlNode* c_element, _nscache* c_ns_cache, xmlNs** c_del_ns_list) except -1:
- """Removes namespace declarations from an element that are already
+ u"""Removes namespace declarations from an element that are already
defined in its parents. Does not free the xmlNs's, just prepends
them to the c_del_ns_list.
"""
@@ -276,7 +276,7 @@
cdef int moveNodeToDocument(_Document doc, xmlDoc* c_source_doc,
xmlNode* c_element) except -1:
- """Fix the xmlNs pointers of a node and its subtree that were moved.
+ u"""Fix the xmlNs pointers of a node and its subtree that were moved.
Mainly copied from libxml2's xmlReconciliateNs(). Expects libxml2 doc
pointers of node to be correct already, but fixes _Document references.
Modified: lxml/trunk/src/lxml/public-api.pxi
==============================================================================
--- lxml/trunk/src/lxml/public-api.pxi (original)
+++ lxml/trunk/src/lxml/public-api.pxi Tue May 20 00:00:11 2008
@@ -1,7 +1,7 @@
# Public C API for lxml.etree
cdef public api _Element deepcopyNodeToDocument(_Document doc, xmlNode* c_root):
- "Recursively copy the element into the document. doc is not modified."
+ u"Recursively copy the element into the document. doc is not modified."
cdef xmlNode* c_node
c_node = _copyNodeToDoc(c_root, doc._c_doc)
return _elementFactory(doc, c_node)
Modified: lxml/trunk/src/lxml/readonlytree.pxi
==============================================================================
--- lxml/trunk/src/lxml/readonlytree.pxi (original)
+++ lxml/trunk/src/lxml/readonlytree.pxi Tue May 20 00:00:11 2008
@@ -1,32 +1,32 @@
# read-only tree implementation
cdef class _ReadOnlyElementProxy:
- "The main read-only Element proxy class (for internal use only!)."
+ u"The main read-only Element proxy class (for internal use only!)."
cdef bint _free_after_use
cdef xmlNode* _c_node
cdef object _source_proxy
cdef object _dependent_proxies
cdef int _assertNode(self) except -1:
- """This is our way of saying: this proxy is invalid!
+ u"""This is our way of saying: this proxy is invalid!
"""
- assert self._c_node is not NULL, "Proxy invalidated!"
+ assert self._c_node is not NULL, u"Proxy invalidated!"
return 0
cdef void free_after_use(self):
- """Should the xmlNode* be freed when releasing the proxy?
+ u"""Should the xmlNode* be freed when releasing the proxy?
"""
self._free_after_use = 1
property tag:
- """Element tag
+ u"""Element tag
"""
def __get__(self):
self._assertNode()
return _namespacedName(self._c_node)
property text:
- """Text before the first subelement. This is either a string or
+ u"""Text before the first subelement. This is either a string or
the value None, if there was no text.
"""
def __get__(self):
@@ -34,7 +34,7 @@
return _collectText(self._c_node.children)
property tail:
- """Text after this element's end tag, but before the next sibling
+ u"""Text after this element's end tag, but before the next sibling
element's start tag. This is either a string or the value None, if
there was no text.
"""
@@ -48,7 +48,7 @@
return dict(_collectAttributes(self._c_node, 3))
property prefix:
- """Namespace prefix or None.
+ u"""Namespace prefix or None.
"""
def __get__(self):
self._assertNode()
@@ -58,7 +58,7 @@
return None
property sourceline:
- """Original line number as found by the parser or None if unknown.
+ u"""Original line number as found by the parser or None if unknown.
"""
def __get__(self):
cdef long line
@@ -70,19 +70,19 @@
return None
def __repr__(self):
- return "<Element %s at %x>" % (self.tag, id(self))
+ return u"<Element %s at %x>" % (self.tag, id(self))
def __getitem__(self, Py_ssize_t index):
- """Returns the subelement at the given position.
+ u"""Returns the subelement at the given position.
"""
cdef xmlNode* c_node
c_node = _findChild(self._c_node, index)
if c_node is NULL:
- raise IndexError, "list index out of range"
+ raise IndexError, u"list index out of range"
return _newReadOnlyProxy(self._source_proxy, c_node)
def __getslice__(self, Py_ssize_t start, Py_ssize_t stop):
- """Returns a list containing subelements in the given range.
+ u"""Returns a list containing subelements in the given range.
"""
cdef xmlNode* c_node
cdef Py_ssize_t c
@@ -100,7 +100,7 @@
return result
def __len__(self):
- """Returns the number of subelements.
+ u"""Returns the number of subelements.
"""
cdef Py_ssize_t c
cdef xmlNode* c_node
@@ -120,11 +120,11 @@
return c_node != NULL
def __deepcopy__(self, memo):
- "__deepcopy__(self, memo)"
+ u"__deepcopy__(self, memo)"
return self.__copy__()
def __copy__(self):
- "__copy__(self)"
+ u"__copy__(self)"
cdef xmlDoc* c_doc
cdef xmlNode* c_node
cdef _Document new_doc
@@ -145,7 +145,7 @@
return iter(self.getchildren())
def iterchildren(self, tag=None, *, reversed=False):
- """iterchildren(self, tag=None, reversed=False)
+ u"""iterchildren(self, tag=None, reversed=False)
Iterate over the children of this element.
"""
@@ -157,34 +157,34 @@
return iter(children)
def get(self, key, default=None):
- """Gets an element attribute.
+ u"""Gets an element attribute.
"""
self._assertNode()
return _getNodeAttributeValue(self._c_node, key, default)
def keys(self):
- """Gets a list of attribute names. The names are returned in an
+ u"""Gets a list of attribute names. The names are returned in an
arbitrary order (just like for an ordinary Python dictionary).
"""
self._assertNode()
return _collectAttributes(self._c_node, 1)
def values(self):
- """Gets element attributes, as a sequence. The attributes are returned
+ u"""Gets element attributes, as a sequence. The attributes are returned
in an arbitrary order.
"""
self._assertNode()
return _collectAttributes(self._c_node, 2)
def items(self):
- """Gets element attributes, as a sequence. The attributes are returned
+ u"""Gets element attributes, as a sequence. The attributes are returned
in an arbitrary order.
"""
self._assertNode()
return _collectAttributes(self._c_node, 3)
cpdef getchildren(self):
- """Returns all subelements. The elements are returned in document
+ u"""Returns all subelements. The elements are returned in document
order.
"""
cdef xmlNode* c_node
@@ -199,7 +199,7 @@
return result
def getparent(self):
- """Returns the parent of this element or None for the root element.
+ u"""Returns the parent of this element or None for the root element.
"""
cdef xmlNode* c_parent
self._assertNode()
@@ -210,7 +210,7 @@
return _newReadOnlyProxy(self._source_proxy, c_parent)
def getnext(self):
- """Returns the following sibling of this element or None.
+ u"""Returns the following sibling of this element or None.
"""
cdef xmlNode* c_node
self._assertNode()
@@ -220,7 +220,7 @@
return None
def getprevious(self):
- """Returns the preceding sibling of this element or None.
+ u"""Returns the preceding sibling of this element or None.
"""
cdef xmlNode* c_node
self._assertNode()
@@ -267,11 +267,11 @@
del sourceProxy._dependent_proxies[:]
cdef class _AppendOnlyElementProxy(_ReadOnlyElementProxy):
- """A read-only element that allows adding children and changing the
+ u"""A read-only element that allows adding children and changing the
text content (i.e. everything that adds to the subtree).
"""
cpdef append(self, other_element):
- """Append a copy of an Element to the list of children.
+ u"""Append a copy of an Element to the list of children.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
@@ -283,7 +283,7 @@
_moveTail(c_next, c_node)
def extend(self, elements):
- """Append a copy of all Elements from a sequence to the list of
+ u"""Append a copy of all Elements from a sequence to the list of
children.
"""
self._assertNode()
@@ -291,7 +291,7 @@
self.append(element)
property text:
- """Text before the first subelement. This is either a string or the
+ u"""Text before the first subelement. This is either a string or the
value None, if there was no text.
"""
def __get__(self):
@@ -320,8 +320,8 @@
elif isinstance(element, _ReadOnlyElementProxy):
c_node = (<_ReadOnlyElementProxy>element)._c_node
else:
- raise TypeError, "invalid value to append()"
+ raise TypeError, u"invalid value to append()"
if c_node is NULL:
- raise TypeError, "invalid element"
+ raise TypeError, u"invalid element"
return c_node
Modified: lxml/trunk/src/lxml/relaxng.pxi
==============================================================================
--- lxml/trunk/src/lxml/relaxng.pxi (original)
+++ lxml/trunk/src/lxml/relaxng.pxi Tue May 20 00:00:11 2008
@@ -2,17 +2,17 @@
cimport relaxng
class RelaxNGError(LxmlError):
- """Base class for RelaxNG errors.
+ u"""Base class for RelaxNG errors.
"""
pass
class RelaxNGParseError(RelaxNGError):
- """Error while parsing an XML document as RelaxNG.
+ u"""Error while parsing an XML document as RelaxNG.
"""
pass
class RelaxNGValidateError(RelaxNGError):
- """Error while validating an XML document with a RelaxNG schema.
+ u"""Error while validating an XML document with a RelaxNG schema.
"""
pass
@@ -20,7 +20,7 @@
# RelaxNG
cdef class RelaxNG(_Validator):
- """RelaxNG(self, etree=None, file=None)
+ u"""RelaxNG(self, etree=None, file=None)
Turn a document into a Relax NG validator.
Either pass a schema as Element or ElementTree, or pass a file or
@@ -47,7 +47,7 @@
if c_href is NULL or \
cstd.strcmp(c_href,
'http://relaxng.org/ns/structure/1.0') != 0:
- raise RelaxNGParseError, "Document is not Relax NG"
+ raise RelaxNGParseError, u"Document is not Relax NG"
self._error_log.connect()
fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node)
parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(fake_c_doc)
@@ -61,7 +61,7 @@
self._error_log.connect()
parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(doc._c_doc)
else:
- raise RelaxNGParseError, "No tree or file given"
+ raise RelaxNGParseError, u"No tree or file given"
if parser_ctxt is NULL:
self._error_log.disconnect()
@@ -69,7 +69,7 @@
_destroyFakeDoc(doc._c_doc, fake_c_doc)
raise RelaxNGParseError(
self._error_log._buildExceptionMessage(
- "Document is not parsable as Relax NG"),
+ u"Document is not parsable as Relax NG"),
self._error_log)
self._c_schema = relaxng.xmlRelaxNGParse(parser_ctxt)
self._error_log.disconnect()
@@ -83,7 +83,7 @@
_destroyFakeDoc(doc._c_doc, fake_c_doc)
raise RelaxNGParseError(
self._error_log._buildExceptionMessage(
- "Document is not valid Relax NG"),
+ u"Document is not valid Relax NG"),
self._error_log)
if fake_c_doc is not NULL:
_destroyFakeDoc(doc._c_doc, fake_c_doc)
@@ -92,7 +92,7 @@
relaxng.xmlRelaxNGFree(self._c_schema)
def __call__(self, etree):
- """__call__(self, etree)
+ u"""__call__(self, etree)
Validate doc using Relax NG.
@@ -122,7 +122,7 @@
self._error_log.disconnect()
if ret == -1:
raise RelaxNGValidateError(
- "Internal error in Relax NG validation",
+ u"Internal error in Relax NG validation",
self._error_log)
if ret == 0:
return True
Modified: lxml/trunk/src/lxml/saxparser.pxi
==============================================================================
--- lxml/trunk/src/lxml/saxparser.pxi (original)
+++ lxml/trunk/src/lxml/saxparser.pxi Tue May 20 00:00:11 2008
@@ -29,7 +29,7 @@
return None
cdef class _SaxParserContext(_ParserContext):
- """This class maps SAX2 events to method calls.
+ u"""This class maps SAX2 events to method calls.
"""
cdef _SaxParserTarget _target
cdef xmlparser.startElementNsSAX2Func _origSaxStart
@@ -46,7 +46,7 @@
self._target = target
cdef void _initParserContext(self, xmlparser.xmlParserCtxt* c_ctxt):
- "wrap original SAX2 callbacks"
+ u"wrap original SAX2 callbacks"
cdef xmlparser.xmlSAXHandler* sax
_ParserContext._initParserContext(self, c_ctxt)
sax = c_ctxt.sax
@@ -323,7 +323,7 @@
############################################################
cdef class TreeBuilder(_SaxParserTarget):
- """TreeBuilder(self, element_factory=None, parser=None)
+ u"""TreeBuilder(self, element_factory=None, parser=None)
Parser target that builds a tree.
The final tree is returned by the ``close()`` method.
@@ -353,10 +353,10 @@
if self._last is not None:
text = "".join(self._data)
if self._in_tail:
- assert self._last.tail is None, "internal error (tail)"
+ assert self._last.tail is None, u"internal error (tail)"
self._last.tail = text
else:
- assert self._last.text is None, "internal error (text)"
+ assert self._last.text is None, u"internal error (text)"
self._last.text = text
del self._data[:]
return 0
@@ -364,17 +364,17 @@
# Python level event handlers
def close(self):
- """close(self)
+ u"""close(self)
Flushes the builder buffers, and returns the toplevel document
element.
"""
- assert python.PyList_GET_SIZE(self._element_stack) == 0, "missing end tags"
- assert self._last is not None, "missing toplevel element"
+ assert python.PyList_GET_SIZE(self._element_stack) == 0, u"missing end tags"
+ assert self._last is not None, u"missing toplevel element"
return self._last
def data(self, data):
- """data(self, data)
+ u"""data(self, data)
Adds text to the current element. The value should be either an
8-bit string containing ASCII text, or a Unicode string.
@@ -382,7 +382,7 @@
self._handleSaxData(data)
def start(self, tag, attrs, nsmap=None):
- """start(self, tag, attrs, nsmap=None)
+ u"""start(self, tag, attrs, nsmap=None)
Opens a new element.
"""
@@ -391,23 +391,23 @@
return self._handleSaxStart(tag, attrs, nsmap)
def end(self, tag):
- """end(self, tag)
+ u"""end(self, tag)
Closes the current element.
"""
element = self._handleSaxEnd(tag)
assert self._last.tag == tag,\
- "end tag mismatch (expected %s, got %s)" % (
+ u"end tag mismatch (expected %s, got %s)" % (
self._last.tag, tag)
return element
def pi(self, target, data):
- """pi(self, target, data)
+ u"""pi(self, target, data)
"""
return self._handleSaxPi(target, data)
def comment(self, comment):
- """comment(self, comment)
+ u"""comment(self, comment)
"""
return self._handleSaxComment(comment)
Modified: lxml/trunk/src/lxml/schematron.pxi
==============================================================================
--- lxml/trunk/src/lxml/schematron.pxi (original)
+++ lxml/trunk/src/lxml/schematron.pxi Tue May 20 00:00:11 2008
@@ -1,64 +1,18 @@
# support for Schematron validation
cimport schematron
-'''
-Schematron
-----------
-
-Schematron is a less well known, but very powerful schema language. The main
-idea is to use the capabilities of XPath to put restrictions on the structure
-and the content of XML documents. Here is a simple example::
-
- >>> schematron = etree.Schematron(etree.XML("""
- ...
- ...
- ...
- ... Attribute
- ... is forbidden
- ...
- ...
- ...
- ...
- ... """))
-
- >>> xml = etree.XML("""
- ...
- ...
- ...
- ...
- ... """)
-
- >>> schematron.validate(xml)
- 0
-
- >>> xml = etree.XML("""
- ...
- ...
- ...
- ...
- ... """)
-
- >>> schematron.validate(xml)
- 1
-
-Schematron was added to libxml2 in version 2.6.21. As of version 2.6.27,
-however, Schematron lacks support for error reporting other than to stderr.
-It is therefore not possible to retrieve validation warnings and errors in
-lxml.
-'''
-
class SchematronError(LxmlError):
- """Base class of all Schematron errors.
+ u"""Base class of all Schematron errors.
"""
pass
class SchematronParseError(SchematronError):
- """Error while parsing an XML document as Schematron schema.
+ u"""Error while parsing an XML document as Schematron schema.
"""
pass
class SchematronValidateError(SchematronError):
- """Error while validating an XML document with a Schematron schema.
+ u"""Error while validating an XML document with a Schematron schema.
"""
pass
@@ -66,12 +20,53 @@
# Schematron
cdef class Schematron(_Validator):
- """Schematron(self, etree=None, file=None)
+ u"""Schematron(self, etree=None, file=None)
A Schematron validator.
Pass a root Element or an ElementTree to turn it into a validator.
Alternatively, pass a filename as keyword argument 'file' to parse from
the file system.
+
+ Schematron is a less well known, but very powerful schema language. The main
+ idea is to use the capabilities of XPath to put restrictions on the structure
+ and the content of XML documents. Here is a simple example::
+
+ >>> schematron = etree.Schematron(etree.XML('''
+ ...
+ ...
+ ...
+ ... Attribute
+ ... is forbidden
+ ...
+ ...
+ ...
+ ...
+ ... '''))
+
+ >>> xml = etree.XML('''
+ ...
+ ...
+ ...
+ ...
+ ... ''')
+
+ >>> schematron.validate(xml)
+ 0
+
+ >>> xml = etree.XML('''
+ ...
+ ...
+ ...
+ ...
+ ... ''')
+
+ >>> schematron.validate(xml)
+ 1
+
+ Schematron was added to libxml2 in version 2.6.21. Before version 2.6.32,
+ however, Schematron lacked support for error reporting other than to stderr.
+ This version is therefore required to retrieve validation warnings and
+ errors in lxml.
"""
cdef schematron.xmlSchematron* _c_schema
cdef xmlDoc* _c_schema_doc
@@ -86,7 +81,7 @@
_Validator.__init__(self)
if not config.ENABLE_SCHEMATRON:
raise SchematronError, \
- "lxml.etree was compiled without Schematron support."
+ u"lxml.etree was compiled without Schematron support."
if etree is not None:
doc = _documentOrRaise(etree)
root_node = _rootNodeOrRaise(etree)
@@ -103,7 +98,7 @@
self._error_log.connect()
parser_ctxt = schematron.xmlSchematronNewParserCtxt(_cstr(filename))
else:
- raise SchematronParseError, "No tree or file given"
+ raise SchematronParseError, u"No tree or file given"
if parser_ctxt is NULL:
self._error_log.disconnect()
@@ -119,7 +114,7 @@
schematron.xmlSchematronFreeParserCtxt(parser_ctxt)
if self._c_schema is NULL:
raise SchematronParseError(
- "Document is not a valid Schematron schema",
+ u"Document is not a valid Schematron schema",
self._error_log)
def __dealloc__(self):
@@ -131,7 +126,7 @@
tree.xmlFreeDoc(self._c_schema_doc)
def __call__(self, etree):
- """__call__(self, etree)
+ u"""__call__(self, etree)
Validate doc using Schematron.
@@ -173,7 +168,7 @@
if ret == -1:
raise SchematronValidateError(
- "Internal error in Schematron validation",
+ u"Internal error in Schematron validation",
self._error_log)
if ret == 0:
return True
Modified: lxml/trunk/src/lxml/serializer.pxi
==============================================================================
--- lxml/trunk/src/lxml/serializer.pxi (original)
+++ lxml/trunk/src/lxml/serializer.pxi Tue May 20 00:00:11 2008
@@ -9,13 +9,13 @@
if method is None:
return OUTPUT_METHOD_XML
method = method.lower()
- if method == "xml":
+ if method == u"xml":
return OUTPUT_METHOD_XML
- if method == "html":
+ if method == u"html":
return OUTPUT_METHOD_HTML
- if method == "text":
+ if method == u"text":
return OUTPUT_METHOD_TEXT
- raise ValueError, "unknown output method %r" % method
+ raise ValueError, u"unknown output method %r" % method
cdef _textToString(xmlNode* c_node, encoding, bint with_tail):
cdef bint needs_conversion
@@ -42,8 +42,8 @@
needs_conversion = 1
elif encoding is not None:
encoding = encoding.upper()
- if encoding != 'UTF-8':
- if encoding == 'ASCII':
+ if encoding != u'UTF-8':
+ if encoding == u'ASCII':
if isutf8(c_text):
# will raise a decode error below
needs_conversion = 1
@@ -66,7 +66,7 @@
cdef _tostring(_Element element, encoding, method,
bint write_xml_declaration, bint write_complete_document,
bint pretty_print, bint with_tail):
- """Serialize an element to an encoded string representation of its XML
+ u"""Serialize an element to an encoded string representation of its XML
tree.
"""
cdef tree.xmlOutputBuffer* c_buffer
@@ -89,8 +89,9 @@
# encoding during output
enchandler = tree.xmlFindCharEncodingHandler(c_enc)
if enchandler is NULL and c_enc is not NULL:
- raise LookupError, python.PyString_FromFormat(
- "unknown encoding: '%s'", c_enc)
+ if encoding is not None:
+ encoding = encoding.decode(u'UTF-8')
+ raise LookupError, u"unknown encoding: '%s'" % encoding
c_buffer = tree.xmlAllocOutputBuffer(enchandler)
if c_buffer is NULL:
tree.xmlCharEncCloseFunc(enchandler)
@@ -216,7 +217,7 @@
cdef void _writeTail(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node,
char* encoding, bint pretty_print) nogil:
- "Write the element tail."
+ u"Write the element tail."
c_node = c_node.next
while c_node is not NULL and c_node.type == tree.XML_TEXT_NODE:
tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0,
@@ -278,13 +279,13 @@
_writeFilelikeWriter, _closeFilelikeWriter,
self, enchandler)
if c_buffer is NULL:
- raise IOError, "Could not create I/O writer context."
+ raise IOError, u"Could not create I/O writer context."
return c_buffer
cdef int write(self, char* c_buffer, int size):
try:
if self._filelike is None:
- raise IOError, "File is already closed"
+ raise IOError, u"File is already closed"
py_buffer = python.PyString_FromStringAndSize(c_buffer, size)
self._filelike.write(py_buffer)
return size
@@ -320,7 +321,7 @@
if c_method == OUTPUT_METHOD_TEXT:
if _isString(f):
filename8 = _encodeFilename(f)
- f = open(filename8, 'wb')
+ f = open(filename8, u'wb')
f.write(_textToString(element._c_node, encoding, with_tail))
f.close()
else:
@@ -328,8 +329,9 @@
return
enchandler = tree.xmlFindCharEncodingHandler(c_enc)
if enchandler is NULL:
- raise LookupError, python.PyString_FromFormat(
- "unknown encoding: '%s'", c_enc)
+ if encoding is not None:
+ encoding = encoding.decode(u'UTF-8')
+ raise LookupError, u"unknown encoding: '%s'" % encoding
if _isString(f):
filename8 = _encodeFilename(f)
@@ -338,13 +340,13 @@
if c_buffer is NULL:
return python.PyErr_SetFromErrno(IOError)
state = python.PyEval_SaveThread()
- elif hasattr(f, 'write'):
+ elif hasattr(f, u'write'):
writer = _FilelikeWriter(f)
c_buffer = writer._createOutputBuffer(enchandler)
else:
tree.xmlCharEncCloseFunc(enchandler)
raise TypeError, \
- "File or filename expected, got '%s'" % python._fqtypename(f)
+ u"File or filename expected, got '%s'" % funicode(python._fqtypename(f))
_writeNodeToBuffer(c_buffer, element._c_node, c_enc, c_method,
write_xml_declaration, write_doctype,
@@ -372,7 +374,7 @@
with nogil:
bytes = c14n.xmlC14NDocSave(c_doc, NULL, 0, NULL, 1,
c_filename, 0)
- elif hasattr(f, 'write'):
+ elif hasattr(f, u'write'):
writer = _FilelikeWriter(f)
c_buffer = writer._createOutputBuffer(NULL)
writer.error_log.connect()
@@ -381,7 +383,7 @@
tree.xmlOutputBufferClose(c_buffer)
else:
raise TypeError, \
- "File or filename expected, got '%s'" % python._fqtypename(f)
+ u"File or filename expected, got '%s'" % funicode(python._fqtypename(f))
finally:
_destroyFakeDoc(c_base_doc, c_doc)
@@ -389,7 +391,7 @@
writer._exc_context._raise_if_stored()
if bytes < 0:
- message = "C14N failed"
+ message = u"C14N failed"
if writer is not None:
errors = writer.error_log
if len(errors):
@@ -403,7 +405,7 @@
cdef cstd.FILE* c_file
c_file = python.PyFile_AsFile(f)
if c_file is NULL:
- raise ValueError, "not a file"
+ raise ValueError, u"not a file"
c_buffer = tree.xmlOutputBufferCreateFile(c_file, NULL)
tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, pretty_print, NULL)
if with_tail:
Modified: lxml/trunk/src/lxml/xinclude.pxi
==============================================================================
--- lxml/trunk/src/lxml/xinclude.pxi (original)
+++ lxml/trunk/src/lxml/xinclude.pxi Tue May 20 00:00:11 2008
@@ -3,12 +3,12 @@
cimport xinclude
class XIncludeError(LxmlError):
- """Error during XInclude processing.
+ u"""Error during XInclude processing.
"""
pass
cdef class XInclude:
- """XInclude(self)
+ u"""XInclude(self)
XInclude processor.
Create an instance and call it on an Element to run XInclude
@@ -23,7 +23,7 @@
return self._error_log.copy()
def __call__(self, _Element node not None):
- "__call__(self, node)"
+ u"__call__(self, node)"
# We cannot pass the XML_PARSE_NOXINCNODE option as this would free
# the XInclude nodes - there may still be Python references to them!
# Therefore, we allow XInclude nodes to be converted to
@@ -43,5 +43,5 @@
if result == -1:
raise XIncludeError(
self._error_log._buildExceptionMessage(
- "XInclude processing failed"),
+ u"XInclude processing failed"),
self._error_log)
Modified: lxml/trunk/src/lxml/xmlerror.pxi
==============================================================================
--- lxml/trunk/src/lxml/xmlerror.pxi (original)
+++ lxml/trunk/src/lxml/xmlerror.pxi Tue May 20 00:00:11 2008
@@ -5,7 +5,7 @@
# module level API functions
def clear_error_log():
- """clear_error_log()
+ u"""clear_error_log()
Clear the global error log. Note that this log is already bound to a
fixed size.
@@ -70,13 +70,13 @@
self.filename = filename
def __repr__(self):
- return "%s:%d:%d:%s:%s:%s: %s" % (
+ return u"%s:%d:%d:%s:%s:%s: %s" % (
self.filename, self.line, self.column, self.level_name,
self.domain_name, self.type_name, self.message)
property domain_name:
def __get__(self):
- return ErrorDomains._getName(self.domain, "unknown")
+ return ErrorDomains._getName(self.domain, u"unknown")
property type_name:
def __get__(self):
@@ -84,11 +84,11 @@
getName = RelaxNGErrorTypes._getName
else:
getName = ErrorTypes._getName
- return getName(self.type, "unknown")
+ return getName(self.type, u"unknown")
property level_name:
def __get__(self):
- return ErrorLevels._getName(self.level, "unknown")
+ return ErrorLevels._getName(self.level, u"unknown")
cdef class _BaseErrorLog:
cdef _LogEntry _first_error
@@ -151,9 +151,9 @@
column = self._first_error.column
if line > 0:
if column > 0:
- message = "%s, line %d, column %d" % (message, line, column)
+ message = u"%s, line %d, column %d" % (message, line, column)
else:
- message = "%s, line %d" % (message, line)
+ message = u"%s, line %d" % (message, line)
return exctype(message, code, line, column)
cdef _buildExceptionMessage(self, default_message):
@@ -167,14 +167,14 @@
message = default_message
if self._first_error.line > 0:
if self._first_error.column > 0:
- message = "%s, line %d, column %d" % (
+ message = u"%s, line %d, column %d" % (
message, self._first_error.line, self._first_error.column)
else:
- message = "%s, line %d" % (message, self._first_error.line)
+ message = u"%s, line %d" % (message, self._first_error.line)
return message
cdef class _ListErrorLog(_BaseErrorLog):
- "Immutable base version of a list based error log."
+ u"Immutable base version of a list based error log."
cdef object _entries
def __init__(self, entries, first_error, last_error):
if entries:
@@ -186,7 +186,7 @@
self._entries = entries
def copy(self):
- """Creates a shallow copy of this error log. Reuses the list of
+ u"""Creates a shallow copy of this error log. Reuses the list of
entries.
"""
return _ListErrorLog(self._entries, self._first_error, self.last_error)
@@ -218,7 +218,7 @@
return result
def filter_domains(self, domains):
- """Filter the errors by the given domains and return a new error log
+ u"""Filter the errors by the given domains and return a new error log
containing the matches.
"""
cdef _LogEntry entry
@@ -231,7 +231,7 @@
return _ListErrorLog(filtered, None, None)
def filter_types(self, types):
- """filter_types(self, types)
+ u"""filter_types(self, types)
Filter the errors by the given types and return a new error
log containing the matches.
@@ -246,7 +246,7 @@
return _ListErrorLog(filtered, None, None)
def filter_levels(self, levels):
- """filter_levels(self, levels)
+ u"""filter_levels(self, levels)
Filter the errors by the given error levels and return a new
error log containing the matches.
@@ -261,7 +261,7 @@
return _ListErrorLog(filtered, None, None)
def filter_from_level(self, level):
- """filter_from_level(self, level)
+ u"""filter_from_level(self, level)
Return a log with all messages of the requested level or worse.
"""
@@ -273,21 +273,21 @@
return _ListErrorLog(filtered, None, None)
def filter_from_fatals(self):
- """filter_from_fatals(self)
+ u"""filter_from_fatals(self)
Convenience method to get all fatal error messages.
"""
return self.filter_from_level(ErrorLevels.FATAL)
def filter_from_errors(self):
- """filter_from_errors(self)
+ u"""filter_from_errors(self)
Convenience method to get all error messages or worse.
"""
return self.filter_from_level(ErrorLevels.ERROR)
def filter_from_warnings(self):
- """filter_from_warnings(self)
+ u"""filter_from_warnings(self)
Convenience method to get all warnings or worse.
"""
@@ -310,7 +310,7 @@
del self._entries[:]
def copy(self):
- """Creates a shallow copy of this error log and the list of entries.
+ u"""Creates a shallow copy of this error log and the list of entries.
"""
return _ListErrorLog(self._entries[:], self._first_error,
self.last_error)
@@ -345,7 +345,7 @@
python.PyList_Append(entries, entry)
cdef class PyErrorLog(_BaseErrorLog):
- """PyErrorLog(self, logger_name=None)
+ u"""PyErrorLog(self, logger_name=None)
A global error log that connects to the Python stdlib logging package.
The constructor accepts an optional logger name.
@@ -382,7 +382,7 @@
self._log = logger.log
def copy(self):
- """Dummy method that returns an empty error log.
+ u"""Dummy method that returns an empty error log.
"""
return _ListErrorLog([], None, None)
@@ -400,11 +400,11 @@
__GLOBAL_ERROR_LOG = _RotatingErrorLog(__MAX_LOG_SIZE)
cdef __copyGlobalErrorLog():
- "Helper function for properties in exceptions."
+ u"Helper function for properties in exceptions."
return __GLOBAL_ERROR_LOG.copy()
def use_global_python_log(PyErrorLog log not None):
- """use_global_python_log(log)
+ u"""use_global_python_log(log)
Replace the global error log by an etree.PyErrorLog that uses the
standard Python logging package.
@@ -506,8 +506,8 @@
################################################################################
cdef void __initErrorConstants():
- "Called at setup time to parse the constants and build the classes below."
- find_constants = re.compile(r"\s*([a-zA-Z0-9_]+)\s*=\s*([0-9]+)").findall
+ u"Called at setup time to parse the constants and build the classes below."
+ find_constants = re.compile(ur"\s*([a-zA-Z0-9_]+)\s*=\s*([0-9]+)").findall
const_defs = ((ErrorLevels, __ERROR_LEVELS),
(ErrorDomains, __ERROR_DOMAINS),
(ErrorTypes, __PARSER_ERROR_TYPES),
@@ -519,22 +519,22 @@
for constants in constant_tuple:
#print len(constants) + 1
for name, value in find_constants(constants):
- value = python.PyNumber_Int(value)
+ value = int(value)
python.PyObject_SetAttr(cls, name, value)
python.PyDict_SetItem(reverse_dict, value, name)
class ErrorLevels:
- "Libxml2 error levels"
+ u"Libxml2 error levels"
class ErrorDomains:
- "Libxml2 error domains"
+ u"Libxml2 error domains"
class ErrorTypes:
- "Libxml2 error types"
+ u"Libxml2 error types"
class RelaxNGErrorTypes:
- "Libxml2 RelaxNG error types"
+ u"Libxml2 RelaxNG error types"
# --- BEGIN: GENERATED CONSTANTS ---
@@ -547,7 +547,7 @@
# cannot handle strings that are a few thousand bytes in length.
cdef object __ERROR_LEVELS
-__ERROR_LEVELS = ("""\
+__ERROR_LEVELS = (u"""\
NONE=0
WARNING=1
ERROR=2
@@ -555,7 +555,7 @@
""",)
cdef object __ERROR_DOMAINS
-__ERROR_DOMAINS = ("""\
+__ERROR_DOMAINS = (u"""\
NONE=0
PARSER=1
TREE=2
@@ -588,7 +588,7 @@
""",)
cdef object __PARSER_ERROR_TYPES
-__PARSER_ERROR_TYPES = ("""\
+__PARSER_ERROR_TYPES = (u"""\
ERR_OK=0
ERR_INTERNAL_ERROR=1
ERR_NO_MEMORY=2
@@ -669,7 +669,7 @@
ERR_TAG_NOT_FINISHED=77
ERR_STANDALONE_VALUE=78
""",
-"""\
+u"""\
ERR_ENCODING_NAME=79
ERR_HYPHEN_IN_COMMENT=80
ERR_INVALID_ENCODING=81
@@ -756,7 +756,7 @@
RNGP_CHOICE_CONTENT=1006
RNGP_CHOICE_EMPTY=1007
""",
-"""\
+u"""\
RNGP_CREATE_FAILURE=1008
RNGP_DATA_CONTENT=1009
RNGP_DEF_CHOICE_AND_INTERLEAVE=1010
@@ -829,7 +829,7 @@
RNGP_PAT_DATA_EXCEPT_TEXT=1077
RNGP_PAT_LIST_ATTR=1078
""",
-"""\
+u"""\
RNGP_PAT_LIST_ELEM=1079
RNGP_PAT_LIST_INTERLEAVE=1080
RNGP_PAT_LIST_LIST=1081
@@ -905,7 +905,7 @@
SAVE_NO_DOCTYPE=1402
SAVE_UNKNOWN_ENCODING=1403
""",
-"""\
+u"""\
REGEXP_COMPILE_ERROR=1450
IO_UNKNOWN=1500
IO_EACCES=1501
@@ -999,7 +999,7 @@
SCHEMAP_FACET_NO_VALUE=1708
SCHEMAP_FAILED_BUILD_IMPORT=1709
""",
-"""\
+u"""\
SCHEMAP_GROUP_NONAME_NOREF=1710
SCHEMAP_IMPORT_NAMESPACE_NOT_URI=1711
SCHEMAP_IMPORT_REDEFINE_NSNAME=1712
@@ -1062,7 +1062,7 @@
SCHEMAP_UNKNOWN_INCLUDE_CHILD=1769
SCHEMAP_INCLUDE_SCHEMA_NOT_URI=1770
""",
-"""\
+u"""\
SCHEMAP_INCLUDE_SCHEMA_NO_URI=1771
SCHEMAP_NOT_SCHEMA=1772
SCHEMAP_UNKNOWN_MEMBER_TYPE=1773
@@ -1127,7 +1127,7 @@
SCHEMAV_CVC_MAXLENGTH_VALID=1832
SCHEMAV_CVC_MININCLUSIVE_VALID=1833
""",
-"""\
+u"""\
SCHEMAV_CVC_MAXINCLUSIVE_VALID=1834
SCHEMAV_CVC_MINEXCLUSIVE_VALID=1835
SCHEMAV_CVC_MAXEXCLUSIVE_VALID=1836
@@ -1198,7 +1198,7 @@
SCHEMAP_SRC_RESOLVE=3004
SCHEMAP_SRC_RESTRICTION_BASE_OR_SIMPLETYPE=3005
""",
-"""\
+u"""\
SCHEMAP_SRC_LIST_ITEMTYPE_OR_SIMPLETYPE=3006
SCHEMAP_SRC_UNION_MEMBERTYPES_OR_SIMPLETYPES=3007
SCHEMAP_ST_PROPS_CORRECT_1=3008
@@ -1259,7 +1259,7 @@
SCHEMAP_COS_CT_EXTENDS_1_1=3063
SCHEMAP_SRC_IMPORT_1_1=3064
""",
-"""\
+u"""\
SCHEMAP_SRC_IMPORT_1_2=3065
SCHEMAP_SRC_IMPORT_2=3066
SCHEMAP_SRC_IMPORT_2_1=3067
@@ -1339,7 +1339,7 @@
""",)
cdef object __RELAXNG_ERROR_TYPES
-__RELAXNG_ERROR_TYPES = ("""\
+__RELAXNG_ERROR_TYPES = (u"""\
RELAXNG_OK=0
RELAXNG_ERR_MEMORY=1
RELAXNG_ERR_TYPE=2
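
The xmlerror.pxi changes above keep the libxml2 constants as tuples of
"NAME=value" text chunks and parse them at setup time. A minimal sketch of
that idea in plain Python follows. The regular expression is the one from
the diff; the function name init_constants, the stand-in class and the
shortened constant list are assumptions made for illustration, and the real
code uses C-API helpers instead of setattr() and dict assignment.

```python
import re

# Pattern taken from the diff: NAME=value pairs separated by whitespace.
find_constants = re.compile(r"\s*([a-zA-Z0-9_]+)\s*=\s*([0-9]+)").findall


class ErrorLevels(object):
    """Stand-in for the (initially empty) class in xmlerror.pxi."""


# The constants are split into several strings, as in the generated file.
ERROR_LEVELS = ("""\
NONE=0
WARNING=1
""",
"""\
ERROR=2
FATAL=3
""")


def init_constants(cls, constant_tuple):
    """Set each constant on cls and return a value -> name lookup dict."""
    reverse_dict = {}
    for constants in constant_tuple:
        for name, value in find_constants(constants):
            value = int(value)
            setattr(cls, name, value)
            reverse_dict[value] = name
    return reverse_dict


level_names = init_constants(ErrorLevels, ERROR_LEVELS)
```

A reverse lookup with a default, such as level_names.get(level, "unknown"),
is presumably what the _getName(..., u"unknown") calls in the diff rely on;
that connection is an assumption, since _getName itself is not shown.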
Modified: lxml/trunk/src/lxml/xmlid.pxi
==============================================================================
--- lxml/trunk/src/lxml/xmlid.pxi (original)
+++ lxml/trunk/src/lxml/xmlid.pxi Tue May 20 00:00:11 2008
@@ -1,7 +1,7 @@
cdef object _find_id_attributes
def XMLID(text):
- """XMLID(text)
+ u"""XMLID(text)
Parse the text and return a tuple (root node, ID dictionary). The root
node is the same as returned by the XML() function. The dictionary
@@ -11,17 +11,17 @@
"""
global _find_id_attributes
if _find_id_attributes is None:
- _find_id_attributes = XPath('//*[string(@id)]')
+ _find_id_attributes = XPath(u'//*[string(@id)]')
# ElementTree compatible implementation: parse and look for 'id' attributes
root = XML(text)
dic = {}
for elem in _find_id_attributes(root):
- python.PyDict_SetItem(dic, elem.get('id'), elem)
+ python.PyDict_SetItem(dic, elem.get(u'id'), elem)
return (root, dic)
def XMLDTDID(text):
- """XMLDTDID(text)
+ u"""XMLDTDID(text)
Parse the text and return a tuple (root node, ID dictionary). The root
node is the same as returned by the XML() function. The dictionary
@@ -41,7 +41,7 @@
return (root, _IDDict(root))
def parseid(source, parser=None, *, base_url=None):
- """parseid(source, parser=None)
+ u"""parseid(source, parser=None)
Parses the source into a tuple containing an ElementTree object and an
ID dictionary. If no parser is provided as second argument, the default
@@ -55,7 +55,7 @@
return (_elementTreeFactory(doc, None), _IDDict(doc))
cdef class _IDDict:
- """IDDict(self, etree)
+ u"""IDDict(self, etree)
A dictionary-like proxy class that maps ID attributes to elements.
The dictionary must be instantiated with the root element of a parsed XML
@@ -69,7 +69,7 @@
cdef _Document doc
doc = _documentOrRaise(etree)
if doc._c_doc.ids is NULL:
- raise ValueError, "No ID dictionary available."
+ raise ValueError, u"No ID dictionary available."
self._doc = doc
self._keys = None
self._items = None
@@ -85,10 +85,10 @@
id_utf = _utf8(id_name)
c_id = tree.xmlHashLookup(c_ids, _cstr(id_utf))
if c_id is NULL:
- raise KeyError, "key not found."
+ raise KeyError, u"key not found."
c_attr = c_id.attr
if c_attr is NULL or c_attr.parent is NULL:
- raise KeyError, "ID attribute not found."
+ raise KeyError, u"ID attribute not found."
return _elementFactory(self._doc, c_attr.parent)
def get(self, id_name):
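
The XMLID() docstring above describes an ElementTree compatible function
that returns a (root node, ID dictionary) tuple. As a small illustration of
that contract, the sketch below uses the standard library's
xml.etree.ElementTree.XMLID rather than lxml, so it shows the shared
interface only and not the XPath-based implementation from the diff. The
sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two elements carry an 'id' attribute, one does not.
text = '<root><a id="x"/><b id="y"/><c/></root>'

# XMLID parses the text and collects elements that have a non-empty 'id'.
root, ids = ET.XMLID(text)

for key in sorted(ids):
    print(key, ids[key].tag)
```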
Modified: lxml/trunk/src/lxml/xmlschema.pxi
==============================================================================
--- lxml/trunk/src/lxml/xmlschema.pxi (original)
+++ lxml/trunk/src/lxml/xmlschema.pxi Tue May 20 00:00:11 2008
@@ -2,17 +2,17 @@
cimport xmlschema
class XMLSchemaError(LxmlError):
- """Base class of all XML Schema errors
+ u"""Base class of all XML Schema errors
"""
pass
class XMLSchemaParseError(XMLSchemaError):
- """Error while parsing an XML document as XML Schema.
+ u"""Error while parsing an XML document as XML Schema.
"""
pass
class XMLSchemaValidateError(XMLSchemaError):
- """Error while validating an XML document with an XML Schema.
+ u"""Error while validating an XML document with an XML Schema.
"""
pass
@@ -20,7 +20,7 @@
# XMLSchema
cdef class XMLSchema(_Validator):
- """XMLSchema(self, etree=None, file=None)
+ u"""XMLSchema(self, etree=None, file=None)
Turn a document into an XML Schema validator.
Either pass a schema as Element or ElementTree, or pass a file or
@@ -47,7 +47,7 @@
c_href = _getNs(c_node)
if c_href is NULL or \
cstd.strcmp(c_href, 'http://www.w3.org/2001/XMLSchema') != 0:
- raise XMLSchemaParseError, "Document is not XML Schema"
+ raise XMLSchemaParseError, u"Document is not XML Schema"
fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node)
self._error_log.connect()
@@ -62,7 +62,7 @@
self._error_log.connect()
parser_ctxt = xmlschema.xmlSchemaNewDocParserCtxt(doc._c_doc)
else:
- raise XMLSchemaParseError, "No tree or file given"
+ raise XMLSchemaParseError, u"No tree or file given"
if parser_ctxt is not NULL:
self._c_schema = xmlschema.xmlSchemaParse(parser_ctxt)
@@ -77,14 +77,14 @@
if self._c_schema is NULL:
raise XMLSchemaParseError(
self._error_log._buildExceptionMessage(
- "Document is not valid XML Schema"),
+ u"Document is not valid XML Schema"),
self._error_log)
def __dealloc__(self):
xmlschema.xmlSchemaFree(self._c_schema)
def __call__(self, etree):
- """__call__(self, etree)
+ u"""__call__(self, etree)
Validate doc using XML Schema.
@@ -115,7 +115,7 @@
self._error_log.disconnect()
if ret == -1:
raise XMLSchemaValidateError(
- "Internal error in XML Schema validation.",
+ u"Internal error in XML Schema validation.",
self._error_log)
if ret == 0:
return True
Modified: lxml/trunk/src/lxml/xpath.pxi
==============================================================================
--- lxml/trunk/src/lxml/xpath.pxi (original)
+++ lxml/trunk/src/lxml/xpath.pxi Tue May 20 00:00:11 2008
@@ -108,8 +108,8 @@
if _XPATH_VERSION_WARNING_REQUIRED:
_XPATH_VERSION_WARNING_REQUIRED = 0
import warnings
- warnings.warn("This version of libxml2 has a known XPath bug. " + \
- "Use it at your own risk.")
+ warnings.warn(u"This version of libxml2 has a known XPath bug. " + \
+ u"Use it at your own risk.")
self._error_log = _ErrorLog()
self._context = _XPathContext(namespaces, extensions,
enable_regexp, None)
@@ -127,7 +127,7 @@
self._context.set_context(xpathCtxt)
def evaluate(self, _eval_arg, **_variables):
- """evaluate(self, _eval_arg, **_variables)
+ u"""evaluate(self, _eval_arg, **_variables)
Evaluate an XPath expression.
@@ -158,7 +158,7 @@
result = python.PyThread_acquire_lock(
self._eval_lock, python.WAIT_LOCK)
if result == 0:
- raise ParserError, "parser locking failed"
+ raise ParserError, u"parser locking failed"
return 0
cdef void _unlock(self):
@@ -173,7 +173,7 @@
if message is not None:
raise XPathSyntaxError(message, self._error_log)
raise XPathSyntaxError(self._error_log._buildExceptionMessage(
- "Error in xpath expression"),
+ u"Error in xpath expression"),
self._error_log)
cdef _raise_eval_error(self):
@@ -186,7 +186,7 @@
if message is not None:
raise XPathEvalError(message, self._error_log)
raise XPathEvalError(self._error_log._buildExceptionMessage(
- "Error in xpath expression"),
+ u"Error in xpath expression"),
self._error_log)
cdef object _handle_result(self, xpath.xmlXPathObject* xpathObj, _Document doc):
@@ -211,7 +211,7 @@
cdef class XPathElementEvaluator(_XPathEvaluatorBase):
- """XPathElementEvaluator(self, element, namespaces=None, extensions=None, regexp=True)
+ u"""XPathElementEvaluator(self, element, namespaces=None, extensions=None, regexp=True)
Create an XPath evaluator for an element.
Absolute XPath expressions (starting with '/') will be evaluated against
@@ -236,18 +236,18 @@
self.set_context(xpathCtxt)
def register_namespace(self, prefix, uri):
- """Register a namespace with the XPath context.
+ u"""Register a namespace with the XPath context.
"""
self._context.addNamespace(prefix, uri)
def register_namespaces(self, namespaces):
- """Register a prefix -> uri dict.
+ u"""Register a prefix -> uri dict.
"""
for prefix, uri in namespaces.items():
self._context.addNamespace(prefix, uri)
def __call__(self, _path, **_variables):
- """__call__(self, _path, **_variables)
+ u"""__call__(self, _path, **_variables)
Evaluate an XPath expression on the document.
@@ -283,7 +283,7 @@
cdef class XPathDocumentEvaluator(XPathElementEvaluator):
- """XPathDocumentEvaluator(self, etree, namespaces=None, extensions=None, regexp=True)
+ u"""XPathDocumentEvaluator(self, etree, namespaces=None, extensions=None, regexp=True)
Create an XPath evaluator for an ElementTree.
Additional namespace declarations can be passed with the 'namespace'
@@ -297,7 +297,7 @@
extensions=extensions, regexp=regexp)
def __call__(self, _path, **_variables):
- """__call__(self, _path, **_variables)
+ u"""__call__(self, _path, **_variables)
Evaluate an XPath expression on the document.
@@ -337,7 +337,7 @@
def XPathEvaluator(etree_or_element, *, namespaces=None, extensions=None,
regexp=True):
- """XPathEvaluator(etree_or_element, namespaces=None, extensions=None, regexp=True)
+ u"""XPathEvaluator(etree_or_element, namespaces=None, extensions=None, regexp=True)
Creates an XPath evaluator for an ElementTree or an Element.
@@ -359,7 +359,7 @@
cdef class XPath(_XPathEvaluatorBase):
- """XPath(self, path, namespaces=None, extensions=None, regexp=True)
+ u"""XPath(self, path, namespaces=None, extensions=None, regexp=True)
A compiled XPath expression that can be called on Elements and ElementTrees.
Besides the XPath expression, you can pass prefix-namespace mappings and
@@ -386,7 +386,7 @@
self._raise_parse_error()
def __call__(self, _etree_or_element, **_variables):
- "__call__(self, _etree_or_element, **_variables)"
+ u"__call__(self, _etree_or_element, **_variables)"
cdef xpath.xmlXPathObject* xpathObj
cdef _Document document
cdef _Element element
@@ -427,7 +427,7 @@
_find_namespaces = re.compile('({[^}]+})').findall
cdef class ETXPath(XPath):
- """ETXPath(self, path, extensions=None, regexp=True)
+ u"""ETXPath(self, path, extensions=None, regexp=True)
Special XPath class that supports the ElementTree {uri} notation for namespaces.
Note that this class does not accept the ``namespace`` keyword
@@ -449,12 +449,15 @@
for namespace_def in _find_namespaces(stripped_path):
if namespace_def not in namespace_defs:
prefix = python.PyString_FromFormat("__xpp%02d", i)
- i = i+1
+ i += 1
python.PyList_Append(namespace_defs, namespace_def)
namespace = namespace_def[1:-1] # remove '{}'
namespace = python.PyUnicode_FromEncodedObject(
namespace, 'UTF-8', 'strict')
- python.PyDict_SetItem(namespaces, prefix, namespace)
+ python.PyDict_SetItem(
+ namespaces,
+ python.PyUnicode_FromEncodedObject(prefix, 'UTF-8', 'strict'),
+ namespace)
prefix_str = prefix + ':'
# FIXME: this also replaces {namespaces} within strings!
path_utf = path_utf.replace(namespace_def, prefix_str)
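
The ETXPath hunk above rewrites ElementTree style {uri} notation into
generated prefixes of the form __xppNN and records the prefix to namespace
mapping. A minimal sketch of that rewriting in plain Python follows. The
function name rewrite_etxpath is hypothetical, the braces in the pattern are
escaped for clarity, and the sketch skips the string stripping and UTF-8
handling that the real code performs.

```python
import re

# Same intent as the pattern in xpath.pxi, with the braces escaped.
_find_namespaces = re.compile(r'(\{[^}]+\})').findall


def rewrite_etxpath(path):
    """Replace each distinct {uri} in path by a generated prefix.

    Returns the rewritten path and a prefix -> namespace URI dict.
    """
    namespaces = {}
    namespace_defs = []
    i = 0
    for namespace_def in _find_namespaces(path):
        if namespace_def not in namespace_defs:
            prefix = "__xpp%02d" % i
            i += 1
            namespace_defs.append(namespace_def)
            namespaces[prefix] = namespace_def[1:-1]  # remove '{}'
            # As the FIXME in the diff notes, this plain replace also
            # touches {namespaces} that appear inside string literals.
            path = path.replace(namespace_def, prefix + ":")
    return path, namespaces


new_path, ns_map = rewrite_etxpath("//{http://a}x/{http://b}y/{http://a}z")
```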
Modified: lxml/trunk/src/lxml/xslt.pxi
==============================================================================
--- lxml/trunk/src/lxml/xslt.pxi (original)
+++ lxml/trunk/src/lxml/xslt.pxi Tue May 20 00:00:11 2008
@@ -3,27 +3,27 @@
cimport xslt
class XSLTError(LxmlError):
- """Base class of all XSLT errors.
+ u"""Base class of all XSLT errors.
"""
pass
class XSLTParseError(XSLTError):
- """Error parsing a stylesheet document.
+ u"""Error parsing a stylesheet document.
"""
pass
class XSLTApplyError(XSLTError):
- """Error running an XSL transformation.
+ u"""Error running an XSL transformation.
"""
pass
class XSLTSaveError(XSLTError):
- """Error serialising an XSLT result.
+ u"""Error serialising an XSLT result.
"""
pass
class XSLTExtensionError(XSLTError):
- """Error registering an XSLT extension.
+ u"""Error registering an XSLT extension.
"""
pass
@@ -116,7 +116,7 @@
cdef void _xslt_store_resolver_exception(char* c_uri, void* context,
xslt.xsltLoadType c_type) with gil:
- message = "Cannot resolve URI %s" % c_uri
+ message = u"Cannot resolve URI %s" % c_uri
if c_type == xslt.XSLT_LOAD_DOCUMENT:
exception = XSLTApplyError(message)
else:
@@ -167,7 +167,7 @@
# XSLT file/network access control
cdef class XSLTAccessControl:
- """XSLTAccessControl(self, read_file=True, write_file=True, create_dir=True, read_network=True, write_network=True)
+ u"""XSLTAccessControl(self, read_file=True, write_file=True, create_dir=True, read_network=True, write_network=True)
Access control for XSLT: reading/writing files, directories and
network I/O. Access to a type of resource is granted or denied by
@@ -223,14 +223,14 @@
xslt.xsltSetCtxtSecurityPrefs(self._prefs, ctxt)
property options:
- "The access control configuration as a map of options."
+ u"The access control configuration as a map of options."
def __get__(self):
return {
- 'read_file': self._optval(xslt.XSLT_SECPREF_READ_FILE),
- 'write_file': self._optval(xslt.XSLT_SECPREF_WRITE_FILE),
- 'create_dir': self._optval(xslt.XSLT_SECPREF_CREATE_DIRECTORY),
- 'read_network': self._optval(xslt.XSLT_SECPREF_READ_NETWORK),
- 'write_network': self._optval(xslt.XSLT_SECPREF_WRITE_NETWORK),
+ u'read_file': self._optval(xslt.XSLT_SECPREF_READ_FILE),
+ u'write_file': self._optval(xslt.XSLT_SECPREF_WRITE_FILE),
+ u'create_dir': self._optval(xslt.XSLT_SECPREF_CREATE_DIRECTORY),
+ u'read_network': self._optval(xslt.XSLT_SECPREF_READ_NETWORK),
+ u'write_network': self._optval(xslt.XSLT_SECPREF_WRITE_NETWORK),
}
cdef _optval(self, xslt.xsltSecurityOption option):
@@ -246,9 +246,9 @@
def __repr__(self):
items = self.options.items()
items.sort()
- return "%s(%s)" % (
- python._fqtypename(self).split('.')[-1],
- ', '.join(["%s=%r" % item for item in items]))
+ return u"%s(%s)" % (
+ funicode(python._fqtypename(self)).split(u'.')[-1],
+ u', '.join([u"%s=%r" % item for item in items]))
################################################################################
# XSLT
@@ -279,7 +279,7 @@
for ns_name_tuple, extension in extensions.items():
if ns_name_tuple[0] is None:
raise XSLTExtensionError, \
- "extensions must not have empty namespaces"
+ u"extensions must not have empty namespaces"
if isinstance(extension, XSLTExtension):
if self._extension_elements is EMPTY_READ_ONLY_DICT:
self._extension_elements = {}
@@ -317,7 +317,7 @@
cdef class XSLT:
- """XSLT(self, xslt_input, extensions=None, regexp=True, access_control=None)
+ u"""XSLT(self, xslt_input, extensions=None, regexp=True, access_control=None)
Turn an XSL document into an XSLT object.
@@ -392,7 +392,7 @@
else:
raise XSLTParseError(
self._error_log._buildExceptionMessage(
- "Cannot parse stylesheet"),
+ u"Cannot parse stylesheet"),
self._error_log)
c_doc._private = NULL # no longer used!
@@ -407,18 +407,18 @@
xslt.xsltFreeStylesheet(self._c_style)
property error_log:
- "The log of errors and warnings of an XSLT execution."
+ u"The log of errors and warnings of an XSLT execution."
def __get__(self):
return self._error_log.copy()
def apply(self, _input, *, profile_run=False, **_kw):
- """apply(self, _input, profile_run=False, **_kw)
+ u"""apply(self, _input, profile_run=False, **_kw)
:deprecated: call the object, not this method."""
return self(_input, profile_run=profile_run, **_kw)
def tostring(self, _ElementTree result_tree):
- """tostring(self, result_tree)
+ u"""tostring(self, result_tree)
Save result doc to string based on stylesheet output method.
@@ -433,7 +433,7 @@
return _copyXSLT(self)
def __call__(self, _input, *, profile_run=False, **_kw):
- """__call__(self, _input, profile_run=False, **_kw)
+ u"""__call__(self, _input, profile_run=False, **_kw)
Execute the XSL transformation on a tree or Element.
@@ -513,13 +513,13 @@
error = self._error_log.last_error
if error is not None and error.message:
if error.line > 0:
- message = "%s, line %d" % (error.message, error.line)
+ message = u"%s, line %d" % (error.message, error.line)
else:
message = error.message
elif error is not None and error.line > 0:
- message = "Error applying stylesheet, line %d" % error.line
+ message = u"Error applying stylesheet, line %d" % error.line
else:
- message = "Error applying stylesheet"
+ message = u"Error applying stylesheet"
raise XSLTApplyError(message, self._error_log)
finally:
if resolver_context is not None:
@@ -647,7 +647,7 @@
cdef int l
self._saveToStringAndSize(&s, &l)
if s is NULL:
- return unicode('')
+ return u''
encoding = self._xslt._c_style.encoding
if encoding is NULL:
encoding = 'ascii'
@@ -658,7 +658,7 @@
return _stripEncodingDeclaration(result)
property xslt_profile:
- """Return an ElementTree with profiling data for the stylesheet run.
+ u"""Return an ElementTree with profiling data for the stylesheet run.
"""
def __get__(self):
cdef object root
@@ -694,10 +694,10 @@
# XSLT PI support
cdef object _FIND_PI_ATTRIBUTES
-_FIND_PI_ATTRIBUTES = re.compile(r'\s+(\w+)\s*=\s*["\']([^"\']+)["\']', re.U).findall
+_FIND_PI_ATTRIBUTES = re.compile(ur'\s+(\w+)\s*=\s*["\']([^"\']+)["\']', re.U).findall
cdef object _RE_PI_HREF
-_RE_PI_HREF = re.compile(r'\s+href\s*=\s*["\']([^"\']+)["\']')
+_RE_PI_HREF = re.compile(ur'\s+href\s*=\s*["\']([^"\']+)["\']')
cdef object _FIND_PI_HREF
_FIND_PI_HREF = _RE_PI_HREF.findall
@@ -712,13 +712,13 @@
global __findStylesheetByID
if __findStylesheetByID is None:
__findStylesheetByID = XPath(
- "//xsl:stylesheet[@xml:id = $id]",
- namespaces={"xsl" : "http://www.w3.org/1999/XSL/Transform"})
+ u"//xsl:stylesheet[@xml:id = $id]",
+ namespaces={u"xsl" : u"http://www.w3.org/1999/XSL/Transform"})
return __findStylesheetByID(doc, id=id)
cdef class _XSLTProcessingInstruction(PIBase):
def parseXSL(self, parser=None):
- """Try to parse the stylesheet referenced by this PI and return an
+ u"""Try to parse the stylesheet referenced by this PI and return an
ElementTree for it. If the stylesheet is embedded in the same
document (referenced via xml:id), find and return an ElementTree for
the stylesheet Element.
@@ -731,10 +731,10 @@
cdef char* c_href
cdef xmlAttr* c_attr
if self._c_node.content is NULL:
- raise ValueError, "PI lacks content"
+ raise ValueError, u"PI lacks content"
hrefs_utf = _FIND_PI_HREF(' ' + self._c_node.content)
if len(hrefs_utf) != 1:
- raise ValueError, "malformed PI attributes"
+ raise ValueError, u"malformed PI attributes"
href_utf = hrefs_utf[0]
c_href = _cstr(href_utf)
@@ -760,30 +760,30 @@
# try XPath search
root = _findStylesheetByID(self._doc, funicode(c_href))
if not root:
- raise ValueError, "reference to non-existing embedded stylesheet"
+ raise ValueError, u"reference to non-existing embedded stylesheet"
elif len(root) > 1:
- raise ValueError, "ambiguous reference to embedded stylesheet"
+ raise ValueError, u"ambiguous reference to embedded stylesheet"
result_node = root[0]
return _elementTreeFactory(result_node._doc, result_node)
def set(self, key, value):
- if key != "href":
+ if key != u"href":
raise AttributeError, \
- "only setting the 'href' attribute is supported on XSLT-PIs"
+ u"only setting the 'href' attribute is supported on XSLT-PIs"
if value is None:
- attrib = ""
- elif '"' in value or '>' in value:
- raise ValueError, "Invalid URL, must not contain '\"' or '>'"
+ attrib = u""
+ elif u'"' in value or u'>' in value:
+ raise ValueError, u"Invalid URL, must not contain '\"' or '>'"
else:
- attrib = ' href="%s"' % value
- text = ' ' + self.text
+ attrib = u' href="%s"' % value
+ text = u' ' + self.text
if _FIND_PI_HREF(text):
self.text = _REPLACE_PI_HREF(attrib, text)
else:
self.text = text + attrib
def get(self, key, default=None):
- for attr, value in _FIND_PI_ATTRIBUTES(' ' + self.text):
+ for attr, value in _FIND_PI_ATTRIBUTES(u' ' + self.text):
if attr == key:
return value
return default
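
The get() method above reads pseudo-attributes out of the text of an
xml-stylesheet processing instruction with a regular expression. The sketch
below shows the same lookup as a free function in plain Python. The pattern
is copied from the diff; the name pi_get and the sample PI text are
assumptions for illustration.

```python
import re

# Pattern from xslt.pxi: whitespace, a name, '=', then a quoted value.
_FIND_PI_ATTRIBUTES = re.compile(
    r'\s+(\w+)\s*=\s*["\']([^"\']+)["\']', re.U).findall


def pi_get(text, key, default=None):
    """Return the value of pseudo-attribute key in the PI text."""
    # A leading space is added so the first attribute also matches '\s+'.
    for attr, value in _FIND_PI_ATTRIBUTES(' ' + text):
        if attr == key:
            return value
    return default


pi_text = 'type="text/xsl" href="style.xsl"'
href = pi_get(pi_text, 'href')
```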
Modified: lxml/trunk/src/lxml/xsltext.pxi
==============================================================================
--- lxml/trunk/src/lxml/xsltext.pxi (original)
+++ lxml/trunk/src/lxml/xsltext.pxi Tue May 20 00:00:11 2008
@@ -1,10 +1,10 @@
# XSLT extension elements
cdef class XSLTExtension:
- """Base class of an XSLT extension element.
+ u"""Base class of an XSLT extension element.
"""
def execute(self, context, self_node, input_node, output_parent):
- """execute(self, context, self_node, input_node, output_parent)
+ u"""execute(self, context, self_node, input_node, output_parent)
Execute this extension element.
Subclasses must override this method. They may append
@@ -16,7 +16,7 @@
pass
def apply_templates(self, _XSLTContext context not None, node):
- """apply_templates(self, context, node)
+ u"""apply_templates(self, context, node)
Call this method to retrieve the result of applying templates
to an element.
@@ -59,7 +59,7 @@
proxy.free_after_use()
else:
raise TypeError, \
- "unsupported XSLT result type: %d" % c_node.type
+ u"unsupported XSLT result type: %d" % c_node.type
c_node = c_next
finally:
# free all intermediate nodes that will not be freed by proxies
@@ -95,7 +95,7 @@
context._extension_elements, (c_uri, c_inst_node.name))
if dict_result is NULL:
raise KeyError, \
- "extension element %s not found" % c_inst_node.name
+ u"extension element %s not found" % funicode(c_inst_node.name)
extension = dict_result
try:
@@ -110,12 +110,12 @@
if self_node is not None:
_freeReadOnlyProxies(self_node)
except Exception, e:
- message = "Error executing extension element '%s': %s" % (
- c_inst_node.name, e)
+ message = u"Error executing extension element '%s': %s" % (
+ funicode(c_inst_node.name), e)
xslt.xsltTransformError(c_ctxt, NULL, c_inst_node, message)
context._exc._store_raised()
except:
# just in case
- message = "Error executing extension element '%s'" % c_inst_node.name
+ message = u"Error executing extension element '%s'" % funicode(c_inst_node.name)
xslt.xsltTransformError(c_ctxt, NULL, c_inst_node, message)
context._exc._store_raised()
From scoder at codespeak.net Tue May 20 00:00:24 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:00:24 +0200 (CEST)
Subject: [Lxml-checkins] r54968 - lxml/trunk
Message-ID: <20080519220024.CDF4B168535@codespeak.net>
Author: scoder
Date: Tue May 20 00:00:23 2008
New Revision: 54968
Modified:
lxml/trunk/ (props changed)
lxml/trunk/update-error-constants.py
Log:
r4226 at delle: sbehnel | 2008-05-19 00:52:10 +0200
initial Py3 fixes for error constant parsing script
Modified: lxml/trunk/update-error-constants.py
==============================================================================
--- lxml/trunk/update-error-constants.py (original)
+++ lxml/trunk/update-error-constants.py Tue May 20 00:00:23 2008
@@ -1,14 +1,14 @@
#!/usr/bin/env python
-import sys, os, os.path, re
+import sys, os, os.path, re, codecs
BUILD_SOURCE_FILE = os.path.join("src", "lxml", "xmlerror.pxi")
BUILD_DEF_FILE = os.path.join("src", "lxml", "xmlerror.pxd")
if len(sys.argv) < 2 or sys.argv[1].lower() in ('-h', '--help'):
- print "This script generates the constants in file", BUILD_SOURCE_FILE
- print "Call as"
- print sys.argv[0], "/path/to/libxml2-doc-dir"
+ print("This script generates the constants in file %s" % BUILD_SOURCE_FILE)
+ print("Call as")
+ print(sys.argv[0], "/path/to/libxml2-doc-dir")
sys.exit(len(sys.argv) > 1)
HTML_DIR = os.path.join(sys.argv[1], 'html')
@@ -58,12 +58,12 @@
def regenerate_file(filename, result):
# read .pxi source file
- f = open(filename, 'r')
+ f = codecs.open(filename, 'r', encoding="utf-8")
pre, post = split(f)
f.close()
# write .pxi source file
- f = open(filename, 'w')
+ f = codecs.open(filename, 'w', encoding="utf-8")
f.write(''.join(pre))
f.write(COMMENT)
f.write('\n'.join(result))
@@ -87,7 +87,7 @@
enum_name = enum_name.group(1)
if enum_name not in ENUM_MAP:
continue
- print "Found enum", enum_name
+ print("Found enum", enum_name)
entries = []
for child in enum:
name = child.text
@@ -132,7 +132,7 @@
append_pxd(ctypedef_indent + 'ctypedef enum %s:' % enum_name)
append_pxi('cdef object %s' % pxi_name)
- append_pxi('%s = ("""\\' % pxi_name)
+ append_pxi('%s = (u"""\\' % pxi_name)
prefix_len = len(prefix)
length = 2 # each string ends with '\n\0'
@@ -148,7 +148,7 @@
line = '%s=%d' % (name, val)
if length + len(line) >= 2040: # max string length in MSVC is 2048
append_pxi('""",')
- append_pxi('"""\\')
+ append_pxi('u"""\\')
length = 2 # each string ends with '\n\0'
append_pxi(line)
length += len(line) + 2 # + '\n\0'
@@ -158,10 +158,10 @@
append_pxi('')
# write source files
-print "Updating file", BUILD_SOURCE_FILE
+print("Updating file %s" % BUILD_SOURCE_FILE)
regenerate_file(BUILD_SOURCE_FILE, pxi_result)
-print "Updating file", BUILD_DEF_FILE
+print("Updating file %s" % BUILD_DEF_FILE)
regenerate_file(BUILD_DEF_FILE, pxd_result)
-print "Done"
+print("Done")
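
The generator script above starts a new string literal whenever the current
one would approach the MSVC string length limit mentioned in its comment. A
minimal sketch of that chunking logic follows, using the same threshold of
2040 and the same accounting of two extra characters per line. The function
name chunk_constants and the generated sample names are hypothetical.

```python
def chunk_constants(lines, limit=2040):
    """Group 'NAME=value' lines into strings that stay below the limit."""
    chunks = [[]]
    length = 2  # each string ends with '\n\0'
    for line in lines:
        if length + len(line) >= limit:
            # start a new string literal, as the script does
            chunks.append([])
            length = 2
        chunks[-1].append(line)
        length += len(line) + 2  # + '\n\0'
    return ["\n".join(chunk) + "\n" for chunk in chunks]


# Made-up constants, only to exercise the splitting.
sample_lines = ["NAME_%04d=%d" % (i, i) for i in range(500)]
sample_chunks = chunk_constants(sample_lines)
```

Joining the chunks again yields the original lines, so a parser that walks
the tuple chunk by chunk sees every constant exactly once.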
From scoder at codespeak.net Tue May 20 00:00:35 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:00:35 +0200 (CEST)
Subject: [Lxml-checkins] r54969 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080519220035.591B116852A@codespeak.net>
Author: scoder
Date: Tue May 20 00:00:33 2008
New Revision: 54969
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_elementtree.py
Log:
r4227 at delle: sbehnel | 2008-05-19 09:30:19 +0200
initial Py3 test fixes
Modified: lxml/trunk/src/lxml/tests/test_elementtree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_elementtree.py (original)
+++ lxml/trunk/src/lxml/tests/test_elementtree.py Tue May 20 00:00:33 2008
@@ -9,10 +9,15 @@
"""
import unittest
-import os, re, tempfile, copy, operator, gc
+import os, re, tempfile, copy, operator, gc, sys
+
+this_dir = os.path.dirname(__file__)
+if this_dir not in sys.path:
+ sys.path.insert(0, this_dir) # needed for Py3
from common_imports import StringIO, etree, ElementTree, cElementTree
from common_imports import fileInTestDir, canonicalize, HelperTestCase
+from common_imports import unicode_literal, byte_literal
if cElementTree is not None:
if tuple([int(n) for n in
@@ -2787,9 +2792,9 @@
Element = self.etree.Element
a = Element('a')
- a.text = u'Søk på nettet'
+ a.text = unicode_literal('Søk på nettet')
self.assertXML(
- u'Søk på nettet '.encode('UTF-8'),
+ unicode_literal('Søk på nettet ').encode('UTF-8'),
a, 'utf-8')
def test_encoding_exact(self):
@@ -2797,12 +2802,12 @@
Element = self.etree.Element
a = Element('a')
- a.text = u'Søk på nettet'
+ a.text = unicode_literal('Søk på nettet')
f = StringIO()
tree = ElementTree(element=a)
tree.write(f, encoding='utf-8')
- self.assertEquals(u'Søk på nettet '.encode('UTF-8'),
+ self.assertEquals(unicode_literal('Søk på nettet ').encode('UTF-8'),
f.getvalue().replace('\n',''))
def test_parse_file_encoding(self):
@@ -2810,7 +2815,7 @@
# from file
tree = parse(fileInTestDir('test-string.xml'))
self.assertXML(
- u'Søk på nettet '.encode('UTF-8'),
+ unicode_literal('Søk på nettet ').encode('UTF-8'),
tree.getroot(), 'UTF-8')
def test_parse_file_object_encoding(self):
@@ -2820,7 +2825,7 @@
tree = parse(f)
f.close()
self.assertXML(
- u'Søk på nettet '.encode('UTF-8'),
+ unicode_literal('Søk på nettet ').encode('UTF-8'),
tree.getroot(), 'UTF-8')
def test_encoding_8bit_latin1(self):
@@ -2828,7 +2833,7 @@
Element = self.etree.Element
a = Element('a')
- a.text = u'Søk på nettet'
+ a.text = unicode_literal('Søk på nettet')
f = StringIO()
tree = ElementTree(element=a)
@@ -2837,14 +2842,14 @@
declaration = ""
self.assertEncodingDeclaration(result,'iso-8859-1')
result = result.split('?>', 1)[-1].replace('\n','')
- self.assertEquals(u'Søk på nettet '.encode('iso-8859-1'),
+ self.assertEquals(unicode_literal('Søk på nettet ').encode('iso-8859-1'),
result)
def test_parse_encoding_8bit_explicit(self):
XMLParser = self.etree.XMLParser
- text = u'Søk på nettet'
- xml_latin1 = (u'%s ' % text).encode('iso-8859-1')
+ text = unicode_literal('Søk på nettet')
+ xml_latin1 = (unicode_literal('%s ') % text).encode('iso-8859-1')
self.assertRaises(self.etree.ParseError,
self.etree.parse,
@@ -2858,9 +2863,9 @@
def test_parse_encoding_8bit_override(self):
XMLParser = self.etree.XMLParser
- text = u'Søk på nettet'
+ text = unicode_literal('Søk på nettet')
wrong_declaration = ""
- xml_latin1 = (u'%s%s ' % (wrong_declaration, text)
+ xml_latin1 = (unicode_literal('%s%s ') % (wrong_declaration, text)
).encode('iso-8859-1')
self.assertRaises(self.etree.ParseError,
@@ -2875,8 +2880,8 @@
def _test_wrong_unicode_encoding(self):
# raise error on wrong encoding declaration in unicode strings
XML = self.etree.XML
- test_utf = (u'' + \
- u'Søk på nettet ')
+ test_utf = (unicode_literal('') +
+ unicode_literal('Søk på nettet '))
self.assertRaises(SyntaxError, XML, test_utf)
def test_encoding_write_default_encoding(self):
@@ -2884,14 +2889,14 @@
Element = self.etree.Element
a = Element('a')
- a.text = u'Søk på nettet'
+ a.text = unicode_literal('Søk på nettet')
f = StringIO()
tree = ElementTree(element=a)
tree.write(f)
data = f.getvalue().replace('\n','')
self.assertEquals(
- u'Søk på nettet '.encode('ASCII', 'xmlcharrefreplace'),
+ unicode_literal('Søk på nettet ').encode('ASCII', 'xmlcharrefreplace'),
data)
def test_encoding_tostring(self):
@@ -2899,8 +2904,8 @@
tostring = self.etree.tostring
a = Element('a')
- a.text = u'Søk på nettet'
- self.assertEquals(u'Søk på nettet '.encode('UTF-8'),
+ a.text = unicode_literal('Søk på nettet')
+ self.assertEquals(unicode_literal('Søk på nettet ').encode('UTF-8'),
tostring(a, encoding='utf-8'))
def test_encoding_tostring_unknown(self):
@@ -2908,7 +2913,7 @@
tostring = self.etree.tostring
a = Element('a')
- a.text = u'Søk på nettet'
+ a.text = unicode_literal('Søk på nettet')
self.assertRaises(LookupError, tostring, a,
encoding='Invalid Encoding')
@@ -2919,8 +2924,8 @@
a = Element('a')
b = SubElement(a, 'b')
- b.text = u'Søk på nettet'
- self.assertEquals(u'Søk på nettet '.encode('UTF-8'),
+ b.text = unicode_literal('Søk på nettet')
+ self.assertEquals(unicode_literal('Søk på nettet ').encode('UTF-8'),
tostring(b, encoding='utf-8'))
def test_encoding_tostring_sub_tail(self):
@@ -2930,9 +2935,9 @@
a = Element('a')
b = SubElement(a, 'b')
- b.text = u'Søk på nettet'
- b.tail = u'Søk'
- self.assertEquals(u'Søk på nettet Søk'.encode('UTF-8'),
+ b.text = unicode_literal('Søk på nettet')
+ b.tail = unicode_literal('Søk')
+ self.assertEquals(unicode_literal('Søk på nettet Søk').encode('UTF-8'),
tostring(b, encoding='utf-8'))
def test_encoding_tostring_default_encoding(self):
@@ -2941,7 +2946,7 @@
tostring = self.etree.tostring
a = Element('a')
- a.text = u'Søk på nettet'
+ a.text = unicode_literal('Søk på nettet')
expected = 'Søk på nettet '
self.assertEquals(
@@ -2955,7 +2960,7 @@
a = Element('a')
b = SubElement(a, 'b')
- b.text = u'Søk på nettet'
+ b.text = unicode_literal('Søk på nettet')
expected = 'Søk på nettet '
self.assertEquals(
@@ -2963,25 +2968,25 @@
tostring(b))
def test_encoding_8bit_xml(self):
- utext = u'Søk på nettet'
- uxml = u'%s' % utext
+ utext = unicode_literal('Søk på nettet')
+ uxml = unicode_literal('%s') % utext
prologue = ''
isoxml = prologue + uxml.encode('iso-8859-1')
tree = self.etree.XML(isoxml)
self.assertEquals(utext, tree.text)
def test_encoding_utf8_bom(self):
- utext = u'Søk på nettet'
- uxml = u'' + \
- u'%s' % utext
+ utext = unicode_literal('Søk på nettet')
+ uxml = (unicode_literal('') +
+ unicode_literal('%s') % utext)
bom = '\xEF\xBB\xBF'
xml = bom + uxml.encode("utf-8")
tree = etree.XML(xml)
self.assertEquals(utext, tree.text)
def test_encoding_8bit_parse_stringio(self):
- utext = u'Søk på nettet'
- uxml = u'%s' % utext
+ utext = unicode_literal('Søk på nettet')
+ uxml = unicode_literal('%s') % utext
prologue = ''
isoxml = prologue + uxml.encode('iso-8859-1')
el = self.etree.parse(StringIO(isoxml)).getroot()
@@ -3285,7 +3290,8 @@
parser = self.etree.XMLParser()
try:
parser.close()
- except ParseError, e:
+ except ParseError:
+ e = sys.exc_info()[0]
self.assertNotEquals(None, e.code)
self.assertNotEquals(0, e.code)
self.assert_(isinstance(e.position, tuple))
@@ -3567,4 +3573,4 @@
return suite
if __name__ == '__main__':
- print 'to test use test.py %s' % __file__
+ print ('to test use test.py %s' % __file__)
From scoder at codespeak.net Tue May 20 00:00:46 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:00:46 +0200 (CEST)
Subject: [Lxml-checkins] r54970 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080519220046.27D35168533@codespeak.net>
Author: scoder
Date: Tue May 20 00:00:45 2008
New Revision: 54970
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/common_imports.py
Log:
r4228 at delle: sbehnel | 2008-05-19 09:30:39 +0200
initial Py3 test fixes
Modified: lxml/trunk/src/lxml/tests/common_imports.py
==============================================================================
--- lxml/trunk/src/lxml/tests/common_imports.py (original)
+++ lxml/trunk/src/lxml/tests/common_imports.py Tue May 20 00:00:45 2008
@@ -1,6 +1,5 @@
import unittest
import os.path
-from StringIO import StringIO
import re, gc
from lxml import etree
@@ -58,6 +57,27 @@
seq.sort(**kwargs)
return seq
+try:
+ unicode
+except NameError:
+ # Python 3
+ unicode = str
+ def unicode_literal(s, encoding="UTF-8"):
+ return s
+ def byte_literal(s, encoding="UTF-8"):
+ return s.encode(encoding)
+else:
+ # Python 2
+ unicode_literal = unicode
+ def byte_literal(s, encoding="UTF-8"):
+ return s
+
+try:
+ from StringIO import StringIO
+except ImportError:
+ # Python 3
+ from io import StringIO
+
class HelperTestCase(unittest.TestCase):
def tearDown(self):
gc.collect()
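
The helpers added to common_imports.py above can be tried outside the test suite. The following is a minimal sketch, not part of the commit: it repeats the same try/except NameError detection and then shows how a test might build text and byte values with it. The sample string 'abc' and the variable names text and data are illustrative assumptions.

```python
# Sketch of the compatibility helpers from the diff above.
# 'unicode' only exists as a builtin on Python 2, so a NameError
# tells us we are running on Python 3.
try:
    unicode
except NameError:
    # Python 3: str literals are already unicode text
    def unicode_literal(s, encoding="UTF-8"):
        return s

    def byte_literal(s, encoding="UTF-8"):
        return s.encode(encoding)
else:
    # Python 2: plain str literals are already byte strings
    unicode_literal = unicode

    def byte_literal(s, encoding="UTF-8"):
        return s

# Illustrative usage (names are hypothetical, not from the test suite)
text = unicode_literal('abc')   # text string on both major versions
data = byte_literal('abc')      # byte string on both major versions
```

With these helpers a test can write unicode_literal('...').encode('UTF-8') instead of a u'...' literal, which is a syntax error on Python 3.0.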
From scoder at codespeak.net Tue May 20 00:01:01 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:01:01 +0200 (CEST)
Subject: [Lxml-checkins] r54971 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080519220101.A558516852A@codespeak.net>
Author: scoder
Date: Tue May 20 00:00:57 2008
New Revision: 54971
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_elementtree.py
Log:
r4229 at delle: sbehnel | 2008-05-19 09:34:08 +0200
small fix
Modified: lxml/trunk/src/lxml/tests/test_elementtree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_elementtree.py (original)
+++ lxml/trunk/src/lxml/tests/test_elementtree.py Tue May 20 00:00:57 2008
@@ -3291,7 +3291,7 @@
try:
parser.close()
except ParseError:
- e = sys.exc_info()[0]
+ e = sys.exc_info()[1]
self.assertNotEquals(None, e.code)
self.assertNotEquals(0, e.code)
self.assert_(isinstance(e.position, tuple))
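
The one-character change above matters because sys.exc_info() returns a (type, value, traceback) tuple: index 0 is the exception class, index 1 is the raised instance that carries attributes such as code and position. A small sketch, using a hypothetical DemoError class in place of lxml's ParseError:

```python
import sys


class DemoError(Exception):
    """Hypothetical stand-in for ParseError, carrying a 'code' attribute."""

    def __init__(self, message, code):
        Exception.__init__(self, message)
        self.code = code


try:
    raise DemoError("boom", 4)
except DemoError:
    # Works on Python 2 and 3 alike, unlike 'except DemoError, e'
    exc_type, exc_value, exc_tb = sys.exc_info()

# exc_type is the class; only exc_value (index 1) has instance attributes
print(exc_type is DemoError)
print(exc_value.code)
```

Indexing with [0] would hand the test the class object, so e.code would fail with an AttributeError.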
From scoder at codespeak.net Tue May 20 00:01:09 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:01:09 +0200 (CEST)
Subject: [Lxml-checkins] r54972 - in lxml/trunk: . src/lxml
Message-ID: <20080519220109.638E6168533@codespeak.net>
Author: scoder
Date: Tue May 20 00:01:07 2008
New Revision: 54972
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/apihelpers.pxi
lxml/trunk/src/lxml/etree_defs.h
lxml/trunk/src/lxml/python.pxd
Log:
r4230 at delle: sbehnel | 2008-05-19 09:35:46 +0200
let funicode() always return unicode strings in Py3
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Tue May 20 00:01:07 2008
@@ -1004,6 +1004,9 @@
cdef Py_ssize_t slen
cdef char* spos
cdef bint is_non_ascii
+ if python.IS_PYTHON3:
+ slen = cstd.strlen(s)
+ return python.PyUnicode_DecodeUTF8(s, slen, NULL)
spos = s
is_non_ascii = 0
while spos[0] != c'\0':
Modified: lxml/trunk/src/lxml/etree_defs.h
==============================================================================
--- lxml/trunk/src/lxml/etree_defs.h (original)
+++ lxml/trunk/src/lxml/etree_defs.h Tue May 20 00:01:07 2008
@@ -17,6 +17,12 @@
# define PyFile_AsFile(o) (NULL)
#endif
+#if PY_VERSION_HEX >= 0x03000000
+# define IS_PYTHON3 1
+#else
+# define IS_PYTHON3 0
+#endif
+
#ifdef WITHOUT_THREADING
# define PyEval_SaveThread() (NULL)
# define PyEval_RestoreThread(state)
Modified: lxml/trunk/src/lxml/python.pxd
==============================================================================
--- lxml/trunk/src/lxml/python.pxd (original)
+++ lxml/trunk/src/lxml/python.pxd Tue May 20 00:01:07 2008
@@ -122,3 +122,4 @@
cdef bint _isString(object obj)
cdef char* _fqtypename(object t)
cdef object PY_NEW(object t)
+ cdef bint IS_PYTHON3
From scoder at codespeak.net Tue May 20 00:01:22 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:01:22 +0200 (CEST)
Subject: [Lxml-checkins] r54973 - in lxml/trunk: . src/lxml
Message-ID: <20080519220122.2AC3916852A@codespeak.net>
Author: scoder
Date: Tue May 20 00:01:19 2008
New Revision: 54973
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/ElementInclude.py
lxml/trunk/src/lxml/_elementpath.py
Log:
r4231 at delle: sbehnel | 2008-05-19 23:46:38 +0200
Py3 fixes
Modified: lxml/trunk/src/lxml/ElementInclude.py
==============================================================================
--- lxml/trunk/src/lxml/ElementInclude.py (original)
+++ lxml/trunk/src/lxml/ElementInclude.py Tue May 20 00:01:19 2008
@@ -50,7 +50,8 @@
form of custom URL resolvers.
"""
-import copy, etree
+from lxml import etree
+import copy
from urlparse import urljoin
from urllib2 import urlopen
Modified: lxml/trunk/src/lxml/_elementpath.py
==============================================================================
--- lxml/trunk/src/lxml/_elementpath.py (original)
+++ lxml/trunk/src/lxml/_elementpath.py Tue May 20 00:01:19 2008
@@ -174,17 +174,23 @@
if path[:1] == "/":
raise SyntaxError("cannot use absolute path on element")
stream = iter(xpath_tokenizer(path))
- next = stream.next; token = next()
+ try:
+ _next = stream.next
+ except AttributeError:
+ # Python 3
+ def _next():
+ return next(stream)
+ token = _next()
selector = []
while 1:
try:
- selector.append(ops[token[0]](next, token))
+ selector.append(ops[token[0]](_next, token))
except StopIteration:
raise SyntaxError("invalid path")
try:
- token = next()
+ token = _next()
if token[0] == "/":
- token = next()
+ token = _next()
except StopIteration:
break
return selector
@@ -204,8 +210,14 @@
# Find first matching object.
def find(elem, path):
+ it = iterfind(elem, path)
try:
- return iterfind(elem, path).next()
+ try:
+ _next = it.next
+ except AttributeError:
+ return next(it)
+ else:
+ return _next()
except StopIteration:
return None
@@ -219,8 +231,8 @@
# Find text for first matching object.
def findtext(elem, path, default=None):
- try:
- elem = iterfind(elem, path).next()
- return elem.text
- except StopIteration:
+ el = find(elem, path)
+ if el is None:
return default
+ else:
+ return el.text
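
The _elementpath.py changes above work around the removal of the iterator .next() method in Python 3. A minimal sketch of the same pattern in isolation follows; make_next and first_or_none are hypothetical names chosen for illustration and do not appear in lxml.

```python
def make_next(iterator):
    """Return a zero-argument callable that advances the iterator."""
    try:
        # Python 2 iterators expose a bound .next method
        return iterator.next
    except AttributeError:
        # Python 3: fall back to the next() builtin
        def _next():
            return next(iterator)
        return _next


def first_or_none(iterable):
    """Mirror of find(): first item, or None when the iterable is empty."""
    _next = make_next(iter(iterable))
    try:
        return _next()
    except StopIteration:
        return None
```

On Python 2.6 and later the next() builtin is available on both lines, so the AttributeError branch alone would suffice there; the commit targets older 2.x releases as well, which presumably explains the two-step lookup.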
From scoder at codespeak.net Tue May 20 00:01:35 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 20 May 2008 00:01:35 +0200 (CEST)
Subject: [Lxml-checkins] r54974 - in lxml/trunk: . src/lxml src/lxml/html/tests
Message-ID: <20080519220135.30AA9168533@codespeak.net>
Author: scoder
Date: Tue May 20 00:01:33 2008
New Revision: 54974
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/apihelpers.pxi
lxml/trunk/src/lxml/etree_defs.h
lxml/trunk/src/lxml/html/tests/test_forms.txt
lxml/trunk/src/lxml/iterparse.pxi
lxml/trunk/src/lxml/lxml.etree.pyx
lxml/trunk/src/lxml/parser.pxi
lxml/trunk/src/lxml/python.pxd
lxml/trunk/src/lxml/xslt.pxi
Log:
r4232 at delle: sbehnel | 2008-05-19 23:51:31 +0200
unicode filename handling, uses a heuristic to distinguish file paths and network paths, plus some general Py3 fixes
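
For readers who want to try the heuristic without building the Cython module, here is a rough pure-Python port of the _isFilePath() function added in this revision. It is a sketch under the assumption that the C loop translates directly to a character scan; the name is_file_path is hypothetical.

```python
def is_file_path(path):
    """Guess whether 'path' is a file system path rather than a URL."""
    # absolute Unix path or Windows network path
    if path[:1] == '/':
        return True
    # absolute Windows path such as C:\dir\file.xml
    first = path[:1]
    if ('a' <= first <= 'z' or 'A' <= first <= 'Z') and path[1:2] == ':':
        return True
    # relative path: a separator before any ':' means file path,
    # a ':' first means something like a URL scheme
    for ch in path:
        if ch == ':':
            return False
        if ch == '/' or ch == '\\':
            return True
    return True


for candidate in ('/etc/doc.xml', 'C:\\data\\doc.xml',
                  'http://example.org/doc.xml', 'dir/doc.xml', 'doc.xml'):
    print(candidate, is_file_path(candidate))
```

As in the original, a single-letter URL scheme would be mistaken for a Windows drive letter; the log message calls it a heuristic for that reason.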
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Tue May 20 00:01:33 2008
@@ -8,6 +8,9 @@
Features added
--------------
+* File name handling now uses a heuristic to convert between byte
+ strings and unicode strings.
+
* Parsing from a plain file object frees the GIL.
* Running ``iterparse()`` on a plain file (or filename) frees the GIL
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Tue May 20 00:01:33 2008
@@ -435,7 +435,10 @@
# handle two most common cases first
if c_text is NULL:
if scount > 0:
- return ''
+ if python.IS_PYTHON3:
+ return u''
+ else:
+ return ''
else:
return None
if scount == 1:
@@ -505,7 +508,7 @@
else:
c_ns = element._doc._findOrBuildNodeNs(
element._c_node, _cstr(ns), NULL)
- return '%s:%s' % (c_ns.prefix, tag) # UTF-8
+ return python.PyString_FromFormat('%s:%s', c_ns.prefix, _cstr(tag))
cdef inline bint _hasChild(xmlNode* c_node):
return c_node is not NULL and _findChildForwards(c_node, 0) is not NULL
@@ -1034,6 +1037,27 @@
raise TypeError, "Argument must be string or unicode."
return s
+cdef bint _isFilePath(char* c_path):
+ u"simple heuristic to see if a path is a filename"
+ # test if it looks like an absolute Unix path or a Windows network path
+ if c_path[0] == c'/':
+ return 1
+ # test if it looks like an absolute Windows path
+ if (c_path[0] >= c'a' and c_path[0] <= c'z') or \
+ (c_path[0] >= c'A' and c_path[0] <= c'Z'):
+ if c_path[1] == c':':
+ return 1
+ # test if it looks like a relative path
+ while c_path[0] != c'\0':
+ if c_path[0] == c':':
+ return 0
+ if c_path[0] == c'/':
+ return 1
+ if c_path[0] == c'\\':
+ return 1
+ c_path += 1
+ return 1
+
cdef object _encodeFilename(object filename):
u"""Make sure a filename is 8-bit encoded (or None).
"""
@@ -1042,11 +1066,34 @@
elif python.PyString_Check(filename):
return filename
elif python.PyUnicode_Check(filename):
- return python.PyUnicode_AsEncodedString(
- filename, _C_FILENAME_ENCODING, NULL)
+ filename8 = python.PyUnicode_AsEncodedString(
+ filename, 'UTF-8', NULL)
+ if _isFilePath(filename8):
+ try:
+ return python.PyUnicode_AsEncodedString(
+ filename, _C_FILENAME_ENCODING, NULL)
+ except UnicodeEncodeError:
+ pass
+ return filename8
else:
raise TypeError, u"Argument must be string or unicode."
+cdef object _decodeFilename(char* c_path):
+ u"""Make the filename a unicode string if we are in Py3.
+ """
+ cdef Py_ssize_t c_len = cstd.strlen(c_path)
+ if _isFilePath(c_path):
+ try:
+ return python.PyUnicode_Decode(
+ c_path, c_len, _C_FILENAME_ENCODING, NULL)
+ except UnicodeDecodeError:
+ pass
+ try:
+ return python.PyUnicode_DecodeUTF8(c_path, c_len, NULL)
+ except UnicodeDecodeError:
+ # this is a stupid fallback, but it might still work...
+ return python.PyString_FromStringAndSize(c_path, c_len)
+
cdef object _encodeFilenameUTF8(object filename):
u"""Recode filename as UTF-8. Tries ASCII, local filesystem encoding and
UTF-8 as source encoding.
@@ -1182,6 +1229,8 @@
cdef object _namespacedNameFromNsName(char* href, char* name):
if href is NULL:
return funicode(name)
+ elif python.IS_PYTHON3:
+ return python.PyUnicode_FromFormat("{%s}%s", href, name)
else:
s = python.PyString_FromFormat("{%s}%s", href, name)
if isutf8(href) or isutf8(name):
Modified: lxml/trunk/src/lxml/etree_defs.h
==============================================================================
--- lxml/trunk/src/lxml/etree_defs.h (original)
+++ lxml/trunk/src/lxml/etree_defs.h Tue May 20 00:01:33 2008
@@ -15,6 +15,8 @@
/* Python 3 doesn't have PyFile_*() */
#if PY_VERSION_HEX >= 0x03000000
# define PyFile_AsFile(o) (NULL)
+#else
+# define PyUnicode_FromFormat(s, ...) (NULL)
#endif
#if PY_VERSION_HEX >= 0x03000000
Modified: lxml/trunk/src/lxml/html/tests/test_forms.txt
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_forms.txt (original)
+++ lxml/trunk/src/lxml/html/tests/test_forms.txt Tue May 20 00:01:33 2008
@@ -33,10 +33,10 @@
...
...