[Lxml-checkins] r43958 - lxml/branch/html/src/lxml/html/tests
ianb at codespeak.net
Fri Jun 1 06:41:55 CEST 2007
Author: ianb
Date: Fri Jun 1 06:41:55 2007
New Revision: 43958
Modified:
lxml/branch/html/src/lxml/html/tests/test_rewritelinks.txt
Log:
remove references to now-gone rewritelinks module
Modified: lxml/branch/html/src/lxml/html/tests/test_rewritelinks.txt
==============================================================================
--- lxml/branch/html/src/lxml/html/tests/test_rewritelinks.txt (original)
+++ lxml/branch/html/src/lxml/html/tests/test_rewritelinks.txt Fri Jun 1 06:41:55 2007
@@ -1,52 +1,13 @@
-These are tests of relocateresponse::
+We'll define a link translation function:
- >>> from lxml.html.rewritelinks import Relocator
-
-In all these examples we'll be using ``http://old`` for the old
-(to-be-replaced) URL and ``https://new`` for the new URL (note the
-scheme change). To test the rewriting we'll use this handy rewriter
-that rewrites everything from one base to another base::
-
- >>> relocate_href = Relocator(
- ... base_href='http://old/base/path.html',
- ... old_href='http://old/',
- ... new_href='https://new/')
-
-Now lets look at simple href rewriting. Normal rewrite::
-
- >>> relocate_href('http://old/bar')
- 'https://new/bar'
-
-Note that the trailing / doesn't matter in this one case (since
-``http://old`` and ``http://old/`` are entirely equivalent)::
-
- >>> relocate_href('http://old')
- 'https://new/'
-
-The trailing / does matter in other cases::
-
- >>> Relocator(
- ... base_href='',
- ... old_href='http://old-test/foo/',
- ... new_href='https://new',
- ... )('http://old-test/foo')
- 'http://old-test/foo'
- >>> Relocator(
- ... base_href='',
- ... old_href='http://old-test/foo/',
- ... new_href='https://new',
- ... )('http://old-test/foo/')
- 'https://new'
-
-Rewriting a link that doesn't match old_href is a no-op::
-
- >>> relocate_href('http://foo/bar')
- 'http://foo/bar'
-
-Relative links are handled::
-
- >>> relocate_href('index.html')
- 'https://new/base/index.html'
+ >>> base_href = 'http://old/base/path.html'
+ >>> import urlparse
+ >>> def relocate_href(link):
+ ... link = urlparse.urljoin(base_href, link)
+ ... if link.startswith('http://old'):
+ ... return 'https://new' + link[len('http://old'):]
+ ... else:
+ ... return link
Now for content. First, to make it easier on us, we need to trim the
normalized HTML we get from these functions::
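
For readers on Python 3, the replacement helper added in this diff can be sketched as follows. This is only an illustration, not part of the commit: it assumes ``urllib.parse.urljoin`` as the Python 3 counterpart of the ``urlparse.urljoin`` call used above, and otherwise keeps the same names and logic.

```python
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin

# Same base URL as in the doctest above.
base_href = 'http://old/base/path.html'


def relocate_href(link):
    """Resolve link against base_href, then swap the old prefix for the new one."""
    link = urljoin(base_href, link)
    if link.startswith('http://old'):
        # Plain string-prefix replacement; the rest of the URL is kept as-is.
        return 'https://new' + link[len('http://old'):]
    return link


print(relocate_href('http://old/bar'))   # absolute link under the old host
print(relocate_href('index.html'))       # relative link, resolved first
print(relocate_href('http://foo/bar'))   # unrelated host, left untouched
```

Because the check is a bare string-prefix match, a host that merely begins with ``old`` (for example ``http://older/x``) would be rewritten as well. That is probably acceptable for a test fixture, but worth keeping in mind if the helper is reused elsewhere.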