[wwwsearch-commits] r41741 - in wwwsearch/mechanize/trunk: . mechanize test
jjlee at codespeak.net
Sat Mar 31 01:16:43 CEST 2007
Author: jjlee
Date: Sat Mar 31 01:16:32 2007
New Revision: 41741
Modified:
wwwsearch/mechanize/trunk/doc.html.in
wwwsearch/mechanize/trunk/mechanize/_auth.py
wwwsearch/mechanize/trunk/mechanize/_http.py
wwwsearch/mechanize/trunk/test/test_browser.doctest
Log:
Sub-requests should not usually be visiting, so make it so. In fact the visible behaviour wasn't really broken here, since .back() skips over None responses (which is odd in itself, but won't be changed until after stable release is out). However, this patch does change visible behaviour in that it creates new Request objects for sub-requests (e.g. basic auth retries) where previously we just mutated the existing Request object.
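
The copy-then-modify pattern the log describes can be sketched in isolation. This is only an illustration under stated assumptions: FakeRequest and retry_with_auth are hypothetical stand-ins invented here, not mechanize classes or functions, and the header value is a placeholder. The one real call is the standard library's copy.copy, which the patch itself uses.

```python
import copy


class FakeRequest:
    """Hypothetical stand-in for a request object; NOT mechanize.Request."""

    def __init__(self, url):
        self.url = url
        self.headers = {}
        self.visit = None

    def add_header(self, key, value):
        self.headers[key] = value


def retry_with_auth(req, auth_value):
    """Build the retry as a separate, non-visiting object (hypothetical helper)."""
    newreq = copy.copy(req)   # shallow copy: new object, same attribute values
    newreq.add_header("Authorization", auth_value)
    newreq.visit = False      # rebinding an attribute affects only the copy
    return newreq


original = FakeRequest("http://example.com")
original.visit = True
retry = retry_with_auth(original, "Basic dXNlcjpwYXNz")  # placeholder value

print(retry is original)                  # distinct objects
print(original.visit, retry.visit)        # visit flag differs per object
print(retry.headers is original.headers)  # headers dict is shared by the shallow copy
```

In this stand-in, the caller's object keeps its own visit flag, which is the point of the change. Because copy.copy is shallow, the headers dictionary is still shared between the two objects, so a header added to the copy is also visible through the original. Copying the dictionary explicitly, or using copy.deepcopy, would isolate it; the commit does not say whether that matters for mechanize's own Request.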
Modified: wwwsearch/mechanize/trunk/doc.html.in
==============================================================================
--- wwwsearch/mechanize/trunk/doc.html.in (original)
+++ wwwsearch/mechanize/trunk/doc.html.in Sat Mar 31 01:16:32 2007
@@ -444,6 +444,10 @@
handlers unless you use <code>mechanize.Request</code> in the first place.
Sorry about that.
+<p>Note also that handlers may create new <code>Request</code> instances (for
+example when performing redirects) rather than adding headers to existing
+<code>Request objects</code>.
+
<a name="headers"></a>
<h2>Adding headers</h2>
Modified: wwwsearch/mechanize/trunk/mechanize/_auth.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_auth.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/_auth.py Sat Mar 31 01:16:32 2007
@@ -11,7 +11,7 @@
"""
-import re, base64, urlparse, posixpath, md5, sha, sys
+import re, base64, urlparse, posixpath, md5, sha, sys, copy
from urllib2 import BaseHandler
from urllib import getproxies, unquote, splittype, splituser, splitpasswd, \
@@ -234,8 +234,10 @@
auth = 'Basic %s' % base64.encodestring(raw).strip()
if req.headers.get(self.auth_header, None) == auth:
return None
- req.add_header(self.auth_header, auth)
- return self.parent.open(req)
+ newreq = copy.copy(req)
+ newreq.add_header(self.auth_header, auth)
+ newreq.visit = False
+ return self.parent.open(newreq)
else:
return None
@@ -325,9 +327,10 @@
auth_val = 'Digest %s' % auth
if req.headers.get(self.auth_header, None) == auth_val:
return None
- req.add_unredirected_header(self.auth_header, auth_val)
- resp = self.parent.open(req)
- return resp
+ newreq = copy.copy(req)
+ newreq.add_unredirected_header(self.auth_header, auth_val)
+ newreq.visit = False
+ return self.parent.open(newreq)
def get_cnonce(self, nonce):
# The cnonce-value is an opaque
Modified: wwwsearch/mechanize/trunk/mechanize/_http.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_http.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/_http.py Sat Mar 31 01:16:32 2007
@@ -77,15 +77,13 @@
# from the user (of urllib2, in this case). In practice,
# essentially all clients do redirect in this case, so we do
# the same.
- try:
- visit = req.visit
- except AttributeError:
- visit = None
+ # XXX really refresh redirections should be visiting; tricky to
+ # fix, so this will wait until post-stable release
return Request(newurl,
headers=req.headers,
origin_req_host=req.get_origin_req_host(),
unverifiable=True,
- visit=visit,
+ visit=False,
)
else:
raise HTTPError(req.get_full_url(), code, msg, headers, fp)
@@ -446,7 +444,7 @@
HTTPRefererProcessor to fetch a series of URLs extracted from a single
page, this will break).
- There's a proper implementation of this in module mechanize.
+ There's a proper implementation of this in mechanize.Browser.
"""
def __init__(self):
Modified: wwwsearch/mechanize/trunk/test/test_browser.doctest
==============================================================================
--- wwwsearch/mechanize/trunk/test/test_browser.doctest (original)
+++ wwwsearch/mechanize/trunk/test/test_browser.doctest Sat Mar 31 01:16:32 2007
@@ -145,7 +145,7 @@
>>> req = Request("http://example.com")
>>> req.visit = False
>>> br = TestBrowser2()
->>> hh = MockHTTPHandler(301, "Location: http://example.com/\r\n\r\n")
+>>> hh = MockHTTPHandler(302, "Location: http://example.com/\r\n\r\n")
>>> br.add_handler(hh)
>>> br.add_handler(HTTPRedirectHandler())
>>> def raises(exc_class, fn, *args, **kwds):
@@ -166,6 +166,70 @@
True
+...in fact, any redirection (but not refresh), proxy request, basic or
+digest auth request, or robots.txt request should be non-visiting,
+even if .visit is True:
+
+>>> from test_urllib2 import MockPasswordManager
+>>> def test_one_visit(handlers):
+... br = TestBrowser2()
+... for handler in handlers: br.add_handler(handler)
+... req = Request("http://example.com")
+... req.visit = True
+... br.open(req)
+... return br
+>>> def test_state(br):
+... # XXX the _history._history check is needed because of the weird
+... # throwing-away of history entries by .back() where response is
+... # None, which makes the .back() check insufficient to tell if a
+... # history entry was .add()ed. I don't want to change this until
+... # post-stable.
+... return (
+... br.response() and
+... br.request and
+... len(br._history._history) == 0 and
+... raises(BrowserStateError, br.back))
+
+>>> hh = MockHTTPHandler(302, "Location: http://example.com/\r\n\r\n")
+>>> br = test_one_visit([hh, HTTPRedirectHandler()])
+>>> test_state(br)
+True
+
+>>> class MockPasswordManager:
+... def add_password(self, realm, uri, user, password): pass
+... def find_user_password(self, realm, authuri): return '', ''
+
+>>> ah = mechanize.HTTPBasicAuthHandler(MockPasswordManager())
+>>> hh = MockHTTPHandler(
+... 401, 'WWW-Authenticate: Basic realm="realm"\r\n\r\n')
+>>> test_state(test_one_visit([hh, ah]))
+True
+
+>>> ph = mechanize.ProxyHandler(dict(http="proxy.example.com:3128"))
+>>> ah = mechanize.ProxyBasicAuthHandler(MockPasswordManager())
+>>> hh = MockHTTPHandler(
+... 407, 'Proxy-Authenticate: Basic realm="realm"\r\n\r\n')
+>>> test_state(test_one_visit([ph, hh, ah]))
+True
+
+XXX Can't really fix this one properly without significant changes --
+the refresh should go onto the history *after* the call, but currently
+all redirects, including refreshes, are done by recursive .open()
+calls, which gets the history wrong in this case. Will have to wait
+until after stable release:
+
+#>>> hh = MockHTTPHandler(
+#... "refresh", 'Location: http://example.com/\r\n\r\n')
+#>>> br = test_one_visit([hh, HTTPRedirectHandler()])
+#>>> br.response() is not None
+#True
+#>>> br.request is not None
+#True
+#>>> r = br.back()
+
+XXX digest, robots
+
+
.global_form() is separate from the other forms (partly for backwards-
compatibility reasons).