[wwwsearch-commits] r26692 - in wwwsearch/mechanize/trunk: . mechanize
jjlee at codespeak.net
Wed May 3 01:47:02 CEST 2006
Author: jjlee
Date: Wed May 3 01:46:58 2006
New Revision: 26692
Added:
wwwsearch/mechanize/trunk/0.1.0-changes.txt
- copied unchanged from r26690, wwwsearch/mechanize/branch/mechanize-0.1.0-devel/0.1.0-changes.txt
Modified:
wwwsearch/mechanize/trunk/COPYING.txt
wwwsearch/mechanize/trunk/INSTALL.txt
wwwsearch/mechanize/trunk/MANIFEST.in
wwwsearch/mechanize/trunk/README.html.in
wwwsearch/mechanize/trunk/functional_tests.py
wwwsearch/mechanize/trunk/mechanize/__init__.py
wwwsearch/mechanize/trunk/mechanize/_html.py
wwwsearch/mechanize/trunk/mechanize/_mechanize.py
wwwsearch/mechanize/trunk/mechanize/_useragent.py
wwwsearch/mechanize/trunk/setup.py
wwwsearch/mechanize/trunk/test.py
Log:
Merge branch/mechanize-0.1.0-devel to trunk (-r26195:26690)
Modified: wwwsearch/mechanize/trunk/COPYING.txt
==============================================================================
--- wwwsearch/mechanize/trunk/COPYING.txt (original)
+++ wwwsearch/mechanize/trunk/COPYING.txt Wed May 3 01:46:58 2006
@@ -1,4 +1,4 @@
-Copyright (c) 2002-2005 John J. Lee <jjl at pobox.com>
+Copyright (c) 2002-2006 John J. Lee <jjl at pobox.com>
Copyright (c) 2003 Andy Lester
All rights reserved.
@@ -29,3 +29,28 @@
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+
+
+ZPL 2.1
+==================
+
+Zope Public License (ZPL) Version 2.1
+
+A copyright notice accompanies this license document that identifies the copyright holders.
+
+This license has been certified as open source. It has also been designated as GPL compatible by the Free Software Foundation (FSF).
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions in source code must retain the accompanying copyright notice, this list of conditions, and the following disclaimer.
+ 2. Redistributions in binary form must reproduce the accompanying copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.
+ 3. Names of the copyright holders must not be used to endorse or promote products derived from this software without prior written permission from the copyright holders.
+ 4. The right to distribute this software or to use it for any purpose does not give you the right to use Servicemarks (sm) or Trademarks (tm) of the copyright holders. Use of them is covered by separate agreement with the copyright holders.
+ 5. If any files are modified, you must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
+
+Disclaimer
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Modified: wwwsearch/mechanize/trunk/INSTALL.txt
==============================================================================
--- wwwsearch/mechanize/trunk/INSTALL.txt (original)
+++ wwwsearch/mechanize/trunk/INSTALL.txt Wed May 3 01:46:58 2006
@@ -33,11 +33,10 @@
Alternatively, just copy the whole mechanize directory into your
-Python path (eg. unix: /usr/local/lib/python2.2/site-packages,
-Windows: C:\Python21, or C:\Python22\Lib\site-packages). That's all
-that setup.py does. Only copy the ClientCookie directory that's
-inside the distributed tarball / zip archive, not the entire
-mechanize-x.x.x directory!
+Python path (eg. unix: /usr/local/lib/python2.4/site-packages,
+Windows: C:\Python24\Lib\site-packages). Only copy the mechanize
+directory that's inside the distributed tarball / zip archive, not the
+entire mechanize-x.x.x directory!
To run the tests (none of which access the network), run the following
@@ -62,13 +61,13 @@
Copyright Notices
- (C) 2002-2003 John J. Lee. All rights reserved.
+ (C) 2002-2006 John J. Lee. All rights reserved.
(C) 2003 Andy Lester. All rights reserved. (Perl code from which
this module is derived)
This code in this package is free software; you can redistribute it
-and/or modify it under the terms of the BSD license (see the file
-COPYING).
+and/or modify it under the terms of the BSD or ZPL 2.1 licenses (see
+the file COPYING.txt).
John J. Lee <jjl at pobox.com>
-December 2003
+May 2006
Modified: wwwsearch/mechanize/trunk/MANIFEST.in
==============================================================================
--- wwwsearch/mechanize/trunk/MANIFEST.in (original)
+++ wwwsearch/mechanize/trunk/MANIFEST.in Wed May 3 01:46:58 2006
@@ -1,10 +1,10 @@
include MANIFEST.in
-include COPYING
-include INSTALL
+include COPYING.txt
+include INSTALL.txt
include GeneralFAQ.html
include README.html.in
include README.html
include README.txt
-include ChangeLog
+include ChangeLog.txt
include *.py
recursive-include examples *.py
Modified: wwwsearch/mechanize/trunk/README.html.in
==============================================================================
--- wwwsearch/mechanize/trunk/README.html.in (original)
+++ wwwsearch/mechanize/trunk/README.html.in Wed May 3 01:46:58 2006
@@ -159,23 +159,22 @@
<h3>Specific to mechanize</h3>
<ul>
- <li>Apply Titus' patch to move stuff into separate file and change
- Factory interface.
- <li>Kill off <code>.get_links_iter()</code>.
+ <li>Make encoding_finder public, I guess.
+ <li>Fix BeautifulSoup support to use a single BeautifulSoup instance
+ per page.
+ <li>Test BeautifulSoup support better / fix encoding issue.
<li>Support Mark Pilgrim's universal encoding detector?
<li>Add another History implementation or two and finalise interface.
<li>History cache expiration.
<li>Investigate possible leak (see Balazs Ree's list posting).
<li>Add <code>Browser.form_as_string()</code> and
<code>Browser.__str__()</code> methods.
- <li>Test BeautifulSoup support better / fix encoding issue.
<li>Add two-way links between BeautifulSoup & ClientForm object models.
- <li>Add basic proxy support. I hope somebody else does this!
</ul>
<h3>mechanize documentation</h3>
<ul>
- <li>Add docs re auth! And perhaps simpler API...
+ <li>Auth / proxies.
<li>Document means of processing response on ad-hoc basis with
.set_response() - e.g. to fix bad encoding in Content-type header or
clean up bad HTML.
@@ -195,9 +194,9 @@
<li>Unicode support in general (not sure yet how/when/whether this will
happen).
<li>Provide per-connection access to timeouts (ClientCookie).
- <li>Keep-alive / connection caching
- <li>Work on auth / proxies (yawn).
+ <li>Keep-alive / connection caching.
<li>Pipelining??
+ <li>Content negotiation.
</ul>
@@ -423,8 +422,10 @@
and install them manually, instead – see the <code>INSTALL.txt</code>
file (included with the distribution).
<li>Which license?
- <p>The <a href="http://www.opensource.org/licenses/bsd-license.php">
- BSD license</a> (included in distribution).
+ <p>mechanize is dual-licensed: you may pick either the
+ <a href="http://www.opensource.org/licenses/bsd-license.php">BSD license</a>,
+ or the <a href="http://www.zope.org/Resources/ZPL">ZPL 2.1</a> (both are
+ included in the distribution).
</ul>
<p>I prefer questions and comments to be sent to the <a
Modified: wwwsearch/mechanize/trunk/functional_tests.py
==============================================================================
--- wwwsearch/mechanize/trunk/functional_tests.py (original)
+++ wwwsearch/mechanize/trunk/functional_tests.py Wed May 3 01:46:58 2006
@@ -74,11 +74,11 @@
self.assertEqual(br.response().read(), html)
br.response().set_data(newhtml)
self.assertEqual(br.response().read(), html)
- self.assertEqual(br.links()[0].url, 'http://sourceforge.net')
+ self.assertEqual(list(br.links())[0].url, 'http://sourceforge.net')
br.set_response(r)
self.assertEqual(br.response().read(), newhtml)
- self.assertEqual(br.links()[0].url, "spam")
+ self.assertEqual(list(br.links())[0].url, "spam")
def test_close_pickle_load(self):
print ("Test test_close_pickle_load is expected to fail unless Python "
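The two assertion changes in this hunk follow from `Browser.links()` now returning a lazy iterable instead of a list, so its result can no longer be indexed directly. A minimal sketch of the same situation in standalone Python 3 (the original code targets Python 2), assuming a hypothetical `links()` generator that stands in for the real method and is not mechanize code:

```python
def links():
    """Hypothetical stand-in for Browser.links(): yields URLs lazily."""
    for url in ("http://sourceforge.net", "http://example.com/spam"):
        yield url


def first_link_url():
    # Materialise the generator with list() before indexing,
    # as the updated test does.
    return list(links())[0]


def indexing_fails():
    # A generator has no __getitem__, so subscripting raises TypeError.
    try:
        links()[0]
    except TypeError:
        return True
    return False
```

Callers that only need the first match can also use `next()` on the iterable, which avoids building the whole list.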
Modified: wwwsearch/mechanize/trunk/mechanize/__init__.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/__init__.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/__init__.py Wed May 3 01:46:58 2006
@@ -1,9 +1,9 @@
-from _useragent import UserAgent
+from _useragent import UserAgent, HTTPProxyPasswordMgr
from _mechanize import Browser, \
BrowserStateError, LinkNotFoundError, FormNotFoundError, \
__version__
from _html import Link, \
Factory, DefaultFactory, RobustFactory, \
- FormsFactory, LinksFactory, pp_get_title, \
- RobustFormsFactory, RobustLinksFactory, bs_get_title
+ FormsFactory, LinksFactory, TitleFactory, \
+ RobustFormsFactory, RobustLinksFactory, RobustTitleFactory
Modified: wwwsearch/mechanize/trunk/mechanize/_html.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_html.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/_html.py Wed May 3 01:46:58 2006
@@ -1,9 +1,20 @@
+"""HTML handling.
+
+Copyright 2003-2006 John J. Lee <jjl at pobox.com>
+
+This code is free software; you can redistribute it and/or modify it under
+the terms of the BSD or ZPL 2.1 licenses (see the file COPYING.txt
+included with the distribution).
+
+"""
+
from __future__ import generators
-import re, urllib, htmlentitydefs
+import re, copy, urllib, htmlentitydefs
from urlparse import urljoin
import ClientCookie
+from ClientCookie._HeadersUtil import split_header_words, is_html as _is_html
## # XXXX miserable hack
## def urljoin(base, url):
@@ -23,6 +34,41 @@
# 'safe'-by-default characters that urllib.urlquote never quotes
URLQUOTE_SAFE_URL_CHARS = "!*'();:@&=+$,/?%#[]~"
+DEFAULT_ENCODING = "latin-1"
+
+class CachingGeneratorFunction(object):
+    """Caching wrapper around a no-arguments iterable."""
+    def __init__(self, iterable):
+        self._iterable = iter(iterable)  # non-restartable: a list is not re-cached
+        self._cache = []
+    def __call__(self):
+        cache = self._cache
+        for item in cache:
+            yield item
+        for item in self._iterable:
+            cache.append(item)
+            yield item
+
+def encoding_finder(default_encoding):
+ def encoding(response):
+ # HTTPEquivProcessor may be in use, so both HTTP and HTTP-EQUIV
+ # headers may be in the response. HTTP-EQUIV headers come last,
+ # so try in order from first to last.
+ for ct in response.info().getheaders("content-type"):
+ for k, v in split_header_words([ct])[0]:
+ if k == "charset":
+ return v
+ return default_encoding
+ return encoding
+
+def make_is_html(allow_xhtml):
+ def is_html(response, encoding):
+ ct_hdrs = response.info().getheaders("content-type")
+ url = response.geturl()
+ # XXX encoding
+ return _is_html(ct_hdrs, url, allow_xhtml)
+ return is_html
+
# idea for this argument-processing trick is from Peter Otten
class Args:
def __init__(self, args_map):
@@ -38,7 +84,6 @@
form_parser_class=None,
request_class=None,
backwards_compat=False,
- encoding="latin-1", # deprecated
):
return Args(locals())
@@ -88,11 +133,21 @@
"iframe": "src",
}
self.urltags = urltags
+ self._response = None
+ self._encoding = None
+
+ def set_response(self, response, base_url, encoding):
+ self._response = response
+ self._encoding = encoding
+ self._base_url = base_url
- def links(self, fh, base_url, encoding=None):
+ def links(self):
"""Return an iterator that provides links of the document."""
import pullparser
- p = self.link_parser_class(fh, encoding=encoding)
+ response = self._response
+ encoding = self._encoding
+ base_url = self._base_url
+ p = self.link_parser_class(response, encoding=encoding)
for token in p.tags(*(self.urltags.keys()+["base"])):
if token.data == "base":
@@ -138,7 +193,6 @@
form_parser_class=None,
request_class=None,
backwards_compat=False,
- encoding="latin-1", # deprecated
):
import ClientForm
self.select_default = select_default
@@ -149,14 +203,18 @@
request_class = ClientCookie.Request
self.request_class = request_class
self.backwards_compat = backwards_compat
+ self._response = None
+ self.encoding = None
+
+ def set_response(self, response, encoding):
+ self._response = response
self.encoding = encoding
- def parse_response(self, response, encoding=None):
+ def forms(self):
import ClientForm
- if encoding is None:
- encoding = self.encoding
+ encoding = self.encoding
return ClientForm.ParseResponse(
- response,
+ self._response,
select_default=self.select_default,
form_parser_class=self.form_parser_class,
request_class=self.request_class,
@@ -164,29 +222,24 @@
encoding=encoding,
)
- def parse_file(self, file_obj, base_url, encoding=None):
- import ClientForm
- if encoding is None:
- encoding = self.encoding
- return ClientForm.ParseFile(
- file_obj,
- base_url,
- select_default=self.select_default,
- form_parser_class=self.form_parser_class,
- request_class=self.request_class,
- backwards_compat=self.backwards_compat,
- encoding=encoding,
- )
+class TitleFactory:
+ def __init__(self):
+ self._response = self._encoding = None
-def pp_get_title(response, encoding):
- import pullparser
- p = pullparser.TolerantPullParser(response, encoding=encoding)
- try:
- p.get_tag("title")
- except pullparser.NoMoreTokensError:
- return None
- else:
- return p.get_text()
+ def set_response(self, response, encoding):
+ self._response = response
+ self._encoding = encoding
+
+ def title(self):
+ import pullparser
+ p = pullparser.TolerantPullParser(
+ self._response, encoding=self._encoding)
+ try:
+ p.get_tag("title")
+ except pullparser.NoMoreTokensError:
+ return None
+ else:
+ return p.get_text()
def unescape(data, entities, encoding):
@@ -252,6 +305,13 @@
sgmllib.charref = re.compile("&#(x?[0-9a-fA-F]+)[^0-9a-fA-F]")
class MechanizeBs(BeautifulSoup.BeautifulSoup):
_entitydefs = get_entitydefs()
+ # don't want the magic Microsoft-char workaround
+ PARSER_MASSAGE = [(re.compile('(<[^<>]*)/>'),
+ lambda(x):x.group(1) + ' />'),
+ (re.compile('<!\s+([^<>]*)>'),
+ lambda(x):'<!' + x.group(1) + '>')
+ ]
+
def __init__(self, encoding, text=None, avoidParserProblems=True,
initialTextIsEverything=True):
self._encoding = encoding
@@ -293,11 +353,20 @@
"iframe": "src",
}
self.urltags = urltags
+ self._bs = None
+ self._encoding = None
+ self._base_url = None
+
+ def set_soup(self, soup, base_url, encoding):
+ self._bs = soup
+ self._base_url = base_url
+ self._encoding = encoding
- def links(self, fh, base_url, encoding=None):
+ def links(self):
import BeautifulSoup
- data = fh.read()
- bs = self.link_parser_class(encoding, data)
+ bs = self._bs
+ base_url = self._base_url
+ encoding = self._encoding
gen = bs.recursiveChildGenerator()
for ch in bs.recursiveChildGenerator():
if (isinstance(ch, BeautifulSoup.Tag) and
@@ -333,34 +402,73 @@
args.form_parser_class = ClientForm.RobustFormParser
FormsFactory.__init__(self, **args.dictionary)
-def bs_get_title(response, encoding):
- import BeautifulSoup
- # XXXX encoding
- bs = BeautifulSoup.BeautifulSoup(response.read())
- title = bs.first("title")
- if title == BeautifulSoup.Null:
- return None
- else:
- return title.firstText(lambda t: True)
+ def set_response(self, response, encoding):
+ self._response = response
+ self.encoding = encoding
+
+
+class RobustTitleFactory:
+ def __init__(self):
+ self._bs = self._encoding = None
+
+ def set_soup(self, soup, encoding):
+ self._bs = soup
+ self._encoding = encoding
+
+    def title(self):
+ import BeautifulSoup
+ title = self._bs.first("title")
+ if title == BeautifulSoup.Null:
+ return None
+ else:
+ return title.firstText(lambda t: True)
class Factory:
"""Factory for forms, links, etc.
- The interface of this class may expand in future.
+ This interface may expand in future.
+
+ Public methods:
+
+ set_request_class(request_class)
+ set_response(response)
+ forms()
+ links()
+
+ Public attributes:
+
+ encoding: string specifying the encoding of response if it contains a text
+ document (this value is left unspecified for documents that do not have
+ an encoding, e.g. an image file)
+ is_html: true if response contains an HTML document (XHTML may be
+ regarded as HTML too)
+ title: page title, or None if no title or not HTML
"""
- def __init__(self, forms_factory, links_factory, get_title):
+ def __init__(self, forms_factory, links_factory, title_factory,
+ get_encoding=encoding_finder(DEFAULT_ENCODING),
+ is_html_p=make_is_html(allow_xhtml=False),
+ ):
"""
- Pass keyword
- arguments only.
+ Pass keyword arguments only.
+
+        get_encoding: callable taking a response and returning its character
+          encoding (see encoding_finder()). The default falls back to
+          DEFAULT_ENCODING (currently latin-1, which may change in future) if
+          the response declares no charset. You should turn on HTTP-EQUIV
+          handling for the best chance of avoiding that fallback.
"""
self._forms_factory = forms_factory
self._links_factory = links_factory
- self._get_title = get_title
+ self._title_factory = title_factory
+ self._get_encoding = get_encoding
+ self._is_html_p = is_html_p
+
+ self.set_response(None)
def set_request_class(self, request_class):
"""Set urllib2.Request class.
@@ -371,30 +479,100 @@
"""
self._forms_factory.request_class = request_class
- def forms(self, response, encoding):
+ def set_response(self, response):
+ """Set response.
+
+ The response must implement the same interface as objects returned by
+ urllib2.urlopen().
+
+ """
+ self._response = response
+ self._forms_genf = self._links_genf = None
+ self._get_title = None
+ for name in ["encoding", "is_html", "title"]:
+ try:
+ delattr(self, name)
+ except AttributeError:
+ pass
+
+ def __getattr__(self, name):
+ if name not in ["encoding", "is_html", "title"]:
+ return getattr(self.__class__, name)
+
+ try:
+ if name == "encoding":
+ self.encoding = self._get_encoding(self._response)
+ return self.encoding
+ elif name == "is_html":
+ self.is_html = self._is_html_p(self._response, self.encoding)
+ return self.is_html
+ elif name == "title":
+ if self.is_html:
+ self.title = self._title_factory.title()
+ else:
+ self.title = None
+ return self.title
+ finally:
+ self._response.seek(0)
+
+ def forms(self):
"""Return iterable over ClientForm.HTMLForm-like objects."""
- return self._forms_factory.parse_response(response, encoding)
+ if self._forms_genf is None:
+ self._forms_genf = CachingGeneratorFunction(
+ self._forms_factory.forms())
+ return self._forms_genf()
- def links(self, response, encoding):
+ def links(self):
"""Return iterable over mechanize.Link-like objects."""
- return self._links_factory.links(response, response.geturl(), encoding)
-
- def title(self, response, encoding):
- """Return page title."""
- return self._get_title(response, encoding)
+ if self._links_genf is None:
+ self._links_genf = CachingGeneratorFunction(
+ self._links_factory.links())
+ return self._links_genf()
class DefaultFactory(Factory):
- def __init__(self):
- Factory.__init__(self,
- forms_factory=FormsFactory(),
- links_factory=LinksFactory(),
- get_title=pp_get_title,
- )
+ """Based on sgmllib."""
+ def __init__(self, i_want_broken_xhtml_support=False):
+ Factory.__init__(
+ self,
+ forms_factory=FormsFactory(),
+ links_factory=LinksFactory(),
+ title_factory=TitleFactory(),
+ is_html_p=make_is_html(allow_xhtml=i_want_broken_xhtml_support),
+ )
+
+ def set_response(self, response):
+ Factory.set_response(self, response)
+ if response is not None:
+ self._forms_factory.set_response(
+ copy.copy(response), self.encoding)
+ self._links_factory.set_response(
+ copy.copy(response), self._response.geturl(), self.encoding)
+ self._title_factory.set_response(
+ copy.copy(response), self.encoding)
class RobustFactory(Factory):
- def __init__(self):
- Factory.__init__(self,
- forms_factory=RobustFormsFactory(),
- links_factory=RobustLinksFactory(),
- get_title=bs_get_title,
- )
+ """Based on BeautifulSoup, hopefully a bit more robust to bad HTML than is
+ DefaultFactory.
+
+ """
+ def __init__(self, i_want_broken_xhtml_support=False,
+ soup_class=MechanizeBs):
+ Factory.__init__(
+ self,
+ forms_factory=RobustFormsFactory(),
+ links_factory=RobustLinksFactory(),
+ title_factory=RobustTitleFactory(),
+ is_html_p=make_is_html(allow_xhtml=i_want_broken_xhtml_support),
+ )
+ self._soup_class = soup_class
+
+ def set_response(self, response):
+ import BeautifulSoup
+ Factory.set_response(self, response)
+ if response is not None:
+ data = response.read()
+ soup = self._soup_class(self.encoding, data)
+ self._forms_factory.set_response(response, self.encoding)
+ self._links_factory.set_soup(
+ soup, response.geturl(), self.encoding)
+ self._title_factory.set_soup(soup, self.encoding)
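The `CachingGeneratorFunction` class added in this file lets `Factory.forms()` and `Factory.links()` hand out a fresh iterator on each call while the page is parsed only once. A minimal sketch of that behaviour in standalone Python 3 (the original targets Python 2). The `numbered` source and the `produced` log are hypothetical, and the `iter()` call is this sketch's own guard against a restartable source such as a list:

```python
class CachingGeneratorFunction(object):
    """Replay already-seen items, then continue consuming the source."""

    def __init__(self, iterable):
        # iter() makes the source non-restartable, so a list passed in
        # is not re-read (and re-cached) on every call.
        self._iterator = iter(iterable)
        self._cache = []

    def __call__(self):
        for item in self._cache:
            yield item
        for item in self._iterator:
            self._cache.append(item)
            yield item


def numbered(log):
    """Hypothetical expensive source; records each item it produces."""
    for n in range(3):
        log.append(n)
        yield n


produced = []
genf = CachingGeneratorFunction(numbered(produced))
first = next(genf())        # pulls a single item from the source
everything = list(genf())   # replays the cache, then drains the rest
again = list(genf())        # served entirely from the cache
```

Because each call replays the cache before touching the source, a caller that stops early (as `find_link()` does) leaves the remaining items unparsed until someone asks for them.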
Modified: wwwsearch/mechanize/trunk/mechanize/_mechanize.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_mechanize.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/_mechanize.py Wed May 3 01:46:58 2006
@@ -3,24 +3,26 @@
Copyright 2003-2006 John J. Lee <jjl at pobox.com>
Copyright 2003 Andy Lester (original Perl code)
-This code is free software; you can redistribute it and/or modify it under
-the terms of the BSD License (see the file COPYING included with the
-distribution).
+This code is free software; you can redistribute it and/or modify it
+under the terms of the BSD or ZPL 2.1 licenses (see the file COPYING.txt
+included with the distribution).
"""
# XXXX
+# spaces in URLs
+# clean_url(): test Moz behaviour against Apache rather than File->Open!
# test referer bugs (frags and don't add in redirect unless orig req had Referer)
# XXX
# The stuff on web page's todo list.
# Moof's emails about response object, .back(), etc.
+from __future__ import generators
+
import urllib2, urlparse, sys, copy
import ClientCookie
-from ClientCookie._Util import response_seek_wrapper
-from ClientCookie._HeadersUtil import split_header_words, is_html
from _useragent import UserAgent
from _html import DefaultFactory
@@ -76,66 +78,47 @@
request: current request (ClientCookie.Request or urllib2.Request)
form: currently selected form (see .select_form())
- default_encoding: character encoding used if no encoding is found in the
- response (you should turn on HTTP-EQUIV handling if you want the best
- chance of getting this right without resorting to this default)
"""
- def __init__(self, default_encoding="latin-1",
+ def __init__(self,
factory=None,
history=None,
request_class=None,
- i_want_broken_xhtml_support=False,
- forms_factory=None, # deprecated
- links_factory=None, # deprecated
- get_title=None, # deprecated
):
"""
Only named arguments should be passed to this constructor.
- default_encoding: See class docs.
+ factory: object implementing the mechanize.Factory interface.
+ history: object implementing the mechanize.History interface. Note this
+ interface is still experimental and may change in future.
request_class: Request class to use. Defaults to ClientCookie.Request
by default for Pythons older than 2.4, urllib2.Request otherwise.
- factory: mechanize.Factory
+
+ The Factory and History objects passed in are 'owned' by the Browser,
+ so they should not be shared across Browsers. In particular,
+ factory.set_response() should not be called except by the owning
+ Browser itself.
Note that the supplied factory's request_class is overridden by this
constructor, to ensure only one Request class is used.
-
- Deprecated arguments:
-
- forms_factory: Object supporting the mechanize.FormsFactory interface.
- links_factory: Object supporting the mechanize.LinksFactory interface.
- get_title: callable taking a response object and an encoding string,
- and returning the page title.
-
"""
- self.default_encoding = default_encoding
- self._allow_xhtml = i_want_broken_xhtml_support
if history is None:
history = History()
self._history = history
self.request = self._response = None
self.form = None
- self._forms = None
- self._links = None
- self._title = None
if request_class is None:
if not hasattr(urllib2.Request, "add_unredirected_header"):
request_class = ClientCookie.Request
else:
- request_class = urllib2.Request # Python 2.4
+ request_class = urllib2.Request # Python >= 2.4
if factory is None:
- if (forms_factory is None and
- links_factory is None and
- get_title is None):
- factory = DefaultFactory()
- else:
- factory = Factory(forms_factory, links_factory, get_title)
+ factory = DefaultFactory()
factory.set_request_class(request_class)
self._factory = factory
self.request_class = request_class
@@ -149,7 +132,6 @@
if self._history is not None:
self._history.close()
self._history = None
- self._forms = self._title = self._links = None
self.request = self._response = None
def open(self, url, data=None):
@@ -180,10 +162,10 @@
success = True
try:
- self._response = UserAgent.open(self, self.request, data)
+ response = UserAgent.open(self, self.request, data)
except urllib2.HTTPError, error:
success = False
- self._response = error
+ response = error
## except (IOError, socket.error, OSError), error:
## # Yes, urllib2 really does raise all these :-((
## # See test_urllib2.py in stdlib and in ClientCookie for examples
@@ -196,9 +178,7 @@
## # Python core, a fix would need some backwards-compat. hack to be
## # acceptable.
## raise
- if not hasattr(self._response, "seek"):
- self._response = response_seek_wrapper(self._response)
- self._parse_html(self._response)
+ self.set_response(response)
if not success:
raise error
return copy.copy(self._response)
@@ -213,11 +193,44 @@
return copy.copy(self._response)
def set_response(self, response):
- """Replace current response with response."""
+ """Replace current response with (a copy of) response."""
+ from ClientCookie._Util import closeable_response
+ # sanity check, necessary but far from sufficient
+ if not (hasattr(response, "info") and hasattr(response, "geturl") and
+ hasattr(response, "read")):
+ raise ValueError("not a response object")
+
+ self.form = None
+
+ # XXX bleah!!
+
+ if not hasattr(response, 'closeable_response'):
+ # we expect to get here if a urllib2 handler constructed the
+ # response, i.e. the response is an urllib.addinfourl, instead of a
+ # ClientCookie._Util.closeable_response as returned by
+ # e.g. ClientCookie.HTTPHandler
+ try:
+ code = response.code
+ except AttributeError:
+ code = None
+ try:
+ msg = response.msg
+ except AttributeError:
+ msg = None
+ # assume response has an .fp attribute, the socket fileobject
+ # (i.e. is an urllib.addinfourl, really).
+ response = closeable_response(
+ response.fp, response.info(), response.geturl(), code, msg)
if not hasattr(response, "seek"):
- response = response_seek_wrapper(self._response)
+ response = ClientCookie.response_seek_wrapper(response)
+ # 0) don't want to copy here, but
+ # 1) don't want to copy some of the time and not other times
+ # 2) need response to be .close()able and .seek()able
+ # 3) 2) and 1) imply must always be copy.copy()ed
+ response = copy.copy(response)
+
self._response = response
- self._parse_html(self._response)
+ self._factory.set_response(self._response)
def geturl(self):
"""Get URL of current document."""
@@ -241,9 +254,9 @@
"""
if self._response is not None:
self._response.close()
- self.request, self._response = self._history.back(n, self._response)
- self._parse_html(self._response)
- return self._response
+ self.request, response = self._history.back(n, self._response)
+ self.set_response(response)
+ return response
def clear_history(self):
self._history.clear()
@@ -252,31 +265,11 @@
"""Return iterable over links (mechanize.Link objects)."""
if not self.viewing_html():
raise BrowserStateError("not viewing HTML")
+ links = self._factory.links()
if kwds:
- return self._find_links(False, **kwds)
- if self._links is None:
- try:
- self._links = list(self.get_links_iter())
- finally:
- self._response.seek(0)
- return self._links
-
- def get_links_iter(self):
- """Return an iterator that provides links of the document.
-
- This method is provided in addition to .links() to allow lazy iteration
- over links, while still keeping .links() safe against somebody
- .seek()ing on a response "behind your back". When response objects are
- fixed to have independent seek positions, this method will be
- deprecated in favour of .links().
-
- """
- if not self.viewing_html():
- raise BrowserStateError("not viewing HTML")
- base_url = self._response.geturl()
- self._response.seek(0)
- return self._factory.links(
- self._response, self.encoding(self._response))
+ return self._filter_links(links, **kwds)
+ else:
+ return links
def forms(self):
"""Return iterable over forms.
@@ -286,33 +279,19 @@
"""
if not self.viewing_html():
raise BrowserStateError("not viewing HTML")
- if self._forms is None:
- response = self._response
- response.seek(0)
- try:
- self._forms = self._factory.forms(
- response, self.encoding(self._response))
- finally:
- response.seek(0)
- return self._forms
+ return self._factory.forms()
def viewing_html(self):
"""Return whether the current response contains HTML data."""
if self._response is None:
raise BrowserStateError("not viewing any document")
- ct_hdrs = self._response.info().getheaders("content-type")
- url = self._response.geturl()
- return is_html(ct_hdrs, url, self._allow_xhtml)
-
- def encoding(self, response):
- # HTTPEquivProcessor may be in use, so both HTTP and HTTP-EQUIV
- # headers may be in the response. HTTP-EQUIV headers come last,
- # so try in order from first to last.
- for ct in response.info().getheaders("content-type"):
- for k, v in split_header_words([ct])[0]:
- if k == "charset":
- return v
- return self.default_encoding
+ return self._factory.is_html
+
+    def encoding(self):
+        """Return the character encoding of the current response."""
+ if self._response is None:
+ raise BrowserStateError("not viewing any document")
+ return self._factory.encoding
def title(self):
"""Return title, or None if there is no title element in the document.
@@ -323,10 +302,7 @@
"""
if not self.viewing_html():
raise BrowserStateError("not viewing HTML")
- if self._title is None:
- self._title = self._factory.title(
- self._response, self.encoding(self._response))
- return self._title
+ return self._factory.title
def select_form(self, name=None, predicate=None, nr=None):
"""Select an HTML form for input.
@@ -489,7 +465,10 @@
nr: matches the nth link that matches all other criteria (default 0)
"""
- return self._find_links(True, **kwds)
+ try:
+ return self._filter_links(self._factory.links(), **kwds).next()
+ except StopIteration:
+ raise LinkNotFoundError()
def __getattr__(self, name):
# pass through ClientForm / DOMForm methods and attributes
@@ -503,7 +482,7 @@
#---------------------------------------------------
# Private methods.
- def _find_links(self, single,
+ def _filter_links(self, links,
text=None, text_regex=None,
name=None, name_regex=None,
url=None, url_regex=None,
@@ -517,19 +496,7 @@
found_links = []
orig_nr = nr
- # An optimization, so that if we look for a single link we do not have
- # to necessarily parse the entire file.
- if self._links is None and single:
- all_links = self.get_links_iter()
- else:
- if self._links is None:
- try:
- self._links = list(self.get_links_iter())
- finally:
- self._response.seek(0)
- all_links = self._links
-
- for link in all_links:
+ for link in links:
if url is not None and url != link.url:
continue
if url_regex is not None and not url_regex.search(link.url):
@@ -553,18 +520,5 @@
if nr:
nr -= 1
continue
- if single:
- return link
- else:
- found_links.append(link)
- nr = orig_nr
- if not found_links:
- raise LinkNotFoundError()
- return found_links
-
- def _parse_html(self, response):
- # this is now lazy, so we just reset the various attributes that
- # result from parsing
- self.form = None
- self._title = None
- self._forms = self._links = None
+ yield link
+ nr = orig_nr
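
The change above turns the link filter into a generator, so a single-link lookup can stop at the first match instead of building a list. A minimal sketch of that pattern follows, written for Python 3 rather than the Python 2 of this commit. The `Link` tuple and the `filter_links` / `find_link` helpers are hypothetical stand-ins, not mechanize's actual code, and only two of the filter criteria are modelled.

```python
from collections import namedtuple

# Hypothetical stand-in for mechanize's Link class (illustration only).
Link = namedtuple("Link", "url text")


class LinkNotFoundError(Exception):
    """Raised when no link matches (name borrowed from the diff)."""


def filter_links(links, url=None, text=None, nr=0):
    """Lazily yield links matching all given criteria.

    As in the diff, `nr` matches are skipped before each yield,
    and the counter is reset after every yielded link.
    """
    orig_nr = nr
    for link in links:
        if url is not None and url != link.url:
            continue
        if text is not None and text != link.text:
            continue
        if nr:
            nr -= 1
            continue
        yield link
        nr = orig_nr


def find_link(links, **kwds):
    """Return the first matching link, consuming no more input than needed."""
    try:
        return next(filter_links(links, **kwds))
    except StopIteration:
        raise LinkNotFoundError() from None
```

Because `find_link` pulls only one item from the generator, the underlying link source is read just far enough to produce the first match.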
Modified: wwwsearch/mechanize/trunk/mechanize/_useragent.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_useragent.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/_useragent.py Wed May 3 01:46:58 2006
@@ -6,13 +6,13 @@
Copyright 2003-2006 John J. Lee <jjl at pobox.com>
This code is free software; you can redistribute it and/or modify it under
-the terms of the BSD License (see the file COPYING included with the
-distribution).
+the terms of the BSD or ZPL 2.1 licenses (see the file COPYING.txt
+included with the distribution).
"""
import sys
-import urllib2, httplib
+import urllib2
import ClientCookie
if sys.version_info[:2] >= (2, 4):
from urllib2 import OpenerDirector, BaseHandler, HTTPErrorProcessor
@@ -34,6 +34,51 @@
https_request = http_request
+class HTTPProxyPasswordMgr(urllib2.HTTPPasswordMgr):
+ # has default realm and host/port
+ def add_password(self, realm, uri, user, passwd):
+ # uri could be a single URI or a sequence
+ if uri is None or isinstance(uri, basestring):
+ uris = [uri]
+ else:
+ uris = uri
+ passwd_by_domain = self.passwd.setdefault(realm, {})
+ for uri in uris:
+ uri = self.reduce_uri(uri)
+ passwd_by_domain[uri] = (user, passwd)
+
+ def find_user_password(self, realm, authuri):
+ perms = [(realm, authuri), (None, authuri)]
+ # bleh, want default realm to take precedence over default
+ # URI/authority, hence this outer loop
+ for default_uri in False, True:
+ for realm, authuri in perms:
+ authinfo_by_domain = self.passwd.get(realm, {})
+ reduced_authuri = self.reduce_uri(authuri)
+ for uri, authinfo in authinfo_by_domain.iteritems():
+ if uri is None and not default_uri:
+ continue
+ if self.is_suburi(uri, reduced_authuri):
+ return authinfo
+ user, password = None, None
+
+ if user is not None:
+ break
+ return user, password
+
+ def reduce_uri(self, uri):
+ if uri is None:
+ return None
+ return urllib2.HTTPPasswordMgr.reduce_uri(self, uri)
+
+ def is_suburi(self, base, test):
+ if base is None:
+ # default to the proxy's host/port
+ hostport, path = test
+ base = (hostport, "/")
+ return urllib2.HTTPPasswordMgr.is_suburi(self, base, test)
+
+
class UserAgent(OpenerDirector):
"""Convenient user-agent class.
@@ -58,7 +103,6 @@
"ftp": urllib2.FTPHandler, # CacheFTPHandler is buggy in 2.3
"file": urllib2.FileHandler,
"gopher": urllib2.GopherHandler,
- # XXX etc.
# other handlers
"_unknown": urllib2.UnknownHandler,
@@ -68,8 +112,8 @@
"_http_default_error": urllib2.HTTPDefaultErrorHandler,
# feature handlers
- "_authen": urllib2.HTTPBasicAuthHandler,
- # XXX rest of authentication stuff
+ "_basicauth": urllib2.HTTPBasicAuthHandler,
+ "_digestauth": urllib2.HTTPDigestAuthHandler,
"_redirect": ClientCookie.HTTPRedirectHandler,
"_cookies": ClientCookie.HTTPCookieProcessor,
"_refresh": ClientCookie.HTTPRefreshProcessor,
@@ -77,7 +121,8 @@
"_equiv": ClientCookie.HTTPEquivProcessor,
"_seek": ClientCookie.SeekableProcessor,
"_proxy": urllib2.ProxyHandler,
- # XXX there's more to proxies, too
+ "_proxy_basicauth": urllib2.ProxyBasicAuthHandler,
+ "_proxy_digestauth": urllib2.ProxyDigestAuthHandler,
# debug handlers
"_debug_redirect": ClientCookie.HTTPRedirectDebugProcessor,
@@ -86,10 +131,15 @@
default_schemes = ["http", "ftp", "file", "gopher"]
default_others = ["_unknown", "_http_error", "_http_request_upgrade",
- "_http_default_error"]
- default_features = ["_authen", "_redirect", "_cookies", "_refresh",
- "_referer", "_equiv", "_seek", "_proxy"]
- if hasattr(httplib, 'HTTPS'):
+ "_http_default_error",
+ ]
+ default_features = ["_redirect", "_cookies", "_referer",
+ "_refresh", "_equiv",
+ "_basicauth", "_digestauth",
+ "_proxy", "_proxy_basicauth", "_proxy_digestauth",
+ "_seek",
+ ]
+ if hasattr(ClientCookie, 'HTTPSHandler'):
handler_classes["https"] = ClientCookie.HTTPSHandler
default_schemes.append("https")
if hasattr(ClientCookie, "HTTPRobotRulesProcessor"):
@@ -99,21 +149,31 @@
def __init__(self):
OpenerDirector.__init__(self)
- self._ua_handlers = {}
+ ua_handlers = self._ua_handlers = {}
for scheme in (self.default_schemes+
self.default_others+
self.default_features):
klass = self.handler_classes[scheme]
- self._ua_handlers[scheme] = klass()
- for handler in self._ua_handlers.itervalues():
+ ua_handlers[scheme] = klass()
+ for handler in ua_handlers.itervalues():
self.add_handler(handler)
+ # Yuck.
# Ensure correct default constructor args were passed to
- # HTTPRefererProcessor and HTTPEquivProcessor. Yuck.
- if '_refresh' in self._ua_handlers:
+ # HTTPRefreshProcessor and HTTPEquivProcessor.
+ if "_refresh" in ua_handlers:
self.set_handle_refresh(True)
- if '_equiv' in self._ua_handlers:
+ if "_equiv" in ua_handlers:
self.set_handle_equiv(True)
+ # Ensure default password managers are installed.
+ pm = ppm = None
+ if "_basicauth" in ua_handlers or "_digestauth" in ua_handlers:
+ pm = urllib2.HTTPPasswordMgrWithDefaultRealm()
+ if ("_proxy_basicauth" in ua_handlers or
+ "_proxy_digestauth" in ua_handlers):
+ ppm = HTTPProxyPasswordMgr()
+ self.set_password_manager(pm)
+ self.set_proxy_password_manager(ppm)
# special case, requires extra support from mechanize.Browser
self._handle_referer = True
@@ -132,17 +192,20 @@
## self._ftp_conn_cache = conn_cache
def set_handled_schemes(self, schemes):
- """Set sequence of protocol scheme strings.
+ """Set sequence of URL scheme (protocol) strings.
+
+ For example: ua.set_handled_schemes(["http", "ftp"])
If this fails (with ValueError) because you've passed an unknown
- scheme, the set of handled schemes WILL be updated, but schemes in the
- list that come after the unknown scheme won't be handled.
+ scheme, the set of handled schemes will not be changed.
"""
want = {}
for scheme in schemes:
if scheme.startswith("_"):
- raise ValueError("invalid scheme '%s'" % scheme)
+ raise ValueError("not a scheme '%s'" % scheme)
+ if scheme not in self.handler_classes:
+ raise ValueError("unknown scheme '%s'" % scheme)
want[scheme] = None
# get rid of scheme handlers we don't want
@@ -154,8 +217,6 @@
del want[scheme] # already got it
# add the scheme handlers that are missing
for scheme in want.keys():
- if scheme not in self.handler_classes:
- raise ValueError("unknown scheme '%s'")
self._set_handler(scheme, True)
def _add_referer_header(self, request, origin_request=True):
@@ -165,10 +226,36 @@
def set_cookiejar(self, cookiejar):
"""Set a ClientCookie.CookieJar, or None."""
self._set_handler("_cookies", obj=cookiejar)
- def set_credentials(self, credentials):
- """Set a urllib2.HTTPPasswordMgr, or None."""
- # XXX use Greg Stein's httpx instead?
- self._set_handler("_authen", obj=credentials)
+
+ # XXX could use Greg Stein's httpx for some of this instead?
+ # or httplib2??
+ def set_proxies(self, proxies):
+ """Set a dictionary mapping URL scheme to proxy specification, or None.
+
+ e.g. {'http': 'myproxy.example.com',
+ 'ftp': 'joe:password@proxy.example.com:8080'}
+
+ """
+ self._set_handler("_proxy", obj=proxies)
+
+ def add_password(self, url, user, password, realm=None):
+ self._password_manager.add_password(realm, url, user, password)
+ def add_proxy_password(self, user, password, hostport=None, realm=None):
+ self._proxy_password_manager.add_password(
+ realm, hostport, user, password)
+
+ # the following are rarely useful -- use add_password / add_proxy_password
+ # instead
+ def set_password_manager(self, password_manager):
+ """Set a urllib2.HTTPPasswordMgrWithDefaultRealm, or None."""
+ self._password_manager = password_manager
+ self._set_handler("_basicauth", obj=password_manager)
+ self._set_handler("_digestauth", obj=password_manager)
+ def set_proxy_password_manager(self, password_manager):
+ """Set a mechanize.HTTPProxyPasswordMgr, or None."""
+ self._proxy_password_manager = password_manager
+ self._set_handler("_proxy_basicauth", obj=password_manager)
+ self._set_handler("_proxy_digestauth", obj=password_manager)
# these methods all take a boolean parameter
def set_handle_robots(self, handle):
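
The lookup order used by HTTPProxyPasswordMgr.find_user_password above (exact realm and host first, then default realm, then default host/port, then both defaults) can be sketched with a plain dictionary. This is a simplified illustration in Python 3, not the class from the diff: the `DefaultingPasswordStore` name is hypothetical, and the URI reduction and sub-path matching inherited from urllib2.HTTPPasswordMgr are deliberately left out.

```python
class DefaultingPasswordStore:
    """Hypothetical, simplified model of the default realm / default host lookup.

    Keys are (realm, hostport) pairs; None in either position means "default".
    """

    def __init__(self):
        self._passwd = {}

    def add_password(self, realm, hostport, user, password):
        # A later call with the same key replaces the earlier entry.
        self._passwd[(realm, hostport)] = (user, password)

    def find_user_password(self, realm, hostport):
        # Default realm is tried before default host/port, so an entry
        # for (None, host) beats an entry for (realm, None).
        candidates = [
            (realm, hostport),
            (None, hostport),
            (realm, None),
            (None, None),
        ]
        for key in candidates:
            if key in self._passwd:
                return self._passwd[key]
        return (None, None)
```

The candidate order mirrors the "Default realm beats default host/port" case in the doctest added to test.py by this commit.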
Modified: wwwsearch/mechanize/trunk/setup.py
==============================================================================
--- wwwsearch/mechanize/trunk/setup.py (original)
+++ wwwsearch/mechanize/trunk/setup.py Wed May 3 01:46:58 2006
@@ -45,7 +45,7 @@
"pullparser>=0.0.8.dev_r21645, ==dev"]
NAME = "mechanize"
PACKAGE = True
-LICENSE = "BSD"
+LICENSE = "BSD" # or ZPL 2.1
PLATFORMS = ["any"]
ZIP_SAFE = True
CLASSIFIERS = """\
@@ -53,6 +53,7 @@
Intended Audience :: Developers
Intended Audience :: System Administrators
License :: OSI Approved :: BSD License
+License :: OSI Approved :: Zope Public License
Natural Language :: English
Operating System :: OS Independent
Programming Language :: Python
Modified: wwwsearch/mechanize/trunk/test.py
==============================================================================
--- wwwsearch/mechanize/trunk/test.py (original)
+++ wwwsearch/mechanize/trunk/test.py Wed May 3 01:46:58 2006
@@ -1,5 +1,7 @@
#!/usr/bin/env python
+from __future__ import generators
+
import sys, random
from unittest import TestCase
import StringIO, re, UserDict, urllib2
@@ -17,6 +19,144 @@
FACTORY_CLASSES.append(mechanize.RobustFactory)
+def test_password_manager(self):
+ """
+ >>> mgr = mechanize.HTTPProxyPasswordMgr()
+ >>> add = mgr.add_password
+
+ >>> add("Some Realm", "http://example.com/", "joe", "password")
+ >>> add("Some Realm", "http://example.com/ni", "ni", "ni")
+ >>> add("c", "http://example.com/foo", "foo", "ni")
+ >>> add("c", "http://example.com/bar", "bar", "nini")
+ >>> add("b", "http://example.com/", "first", "blah")
+ >>> add("b", "http://example.com/", "second", "spam")
+ >>> add("a", "http://example.com", "1", "a")
+ >>> add("Some Realm", "http://c.example.com:3128", "3", "c")
+ >>> add("Some Realm", "d.example.com", "4", "d")
+ >>> add("Some Realm", "e.example.com:3128", "5", "e")
+
+ >>> mgr.find_user_password("Some Realm", "example.com")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com/")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com/spam")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com/spam/spam")
+ ('joe', 'password')
+ >>> mgr.find_user_password("c", "http://example.com/foo")
+ ('foo', 'ni')
+ >>> mgr.find_user_password("c", "http://example.com/bar")
+ ('bar', 'nini')
+
+ Currently, where more than one path matches, we use the highest-level one:
+
+ >>> mgr.find_user_password("Some Realm", "http://example.com/ni")
+ ('joe', 'password')
+
+ Use latest add_password() in case of conflict:
+
+ >>> mgr.find_user_password("b", "http://example.com/")
+ ('second', 'spam')
+
+ No special relationship between a.example.com and example.com:
+
+ >>> mgr.find_user_password("a", "http://example.com/")
+ ('1', 'a')
+ >>> mgr.find_user_password("a", "http://a.example.com/")
+ (None, None)
+
+ Ports:
+
+ >>> mgr.find_user_password("Some Realm", "c.example.com")
+ (None, None)
+ >>> mgr.find_user_password("Some Realm", "c.example.com:3128")
+ ('3', 'c')
+ >>> mgr.find_user_password("Some Realm", "http://c.example.com:3128")
+ ('3', 'c')
+ >>> mgr.find_user_password("Some Realm", "d.example.com")
+ ('4', 'd')
+ >>> mgr.find_user_password("Some Realm", "e.example.com:3128")
+ ('5', 'e')
+
+
+ Now features specific to HTTPProxyPasswordMgr.
+
+ Default realm:
+
+ >>> mgr.find_user_password("d", "f.example.com")
+ (None, None)
+ >>> add(None, "f.example.com", "6", "f")
+ >>> mgr.find_user_password("d", "f.example.com")
+ ('6', 'f')
+
+ Default host/port:
+
+ >>> mgr.find_user_password("e", "g.example.com")
+ (None, None)
+ >>> add("e", None, "7", "g")
+ >>> mgr.find_user_password("e", "g.example.com")
+ ('7', 'g')
+
+ Default realm and host/port:
+
+ >>> mgr.find_user_password("f", "h.example.com")
+ (None, None)
+ >>> add(None, None, "8", "h")
+ >>> mgr.find_user_password("f", "h.example.com")
+ ('8', 'h')
+
+ Default realm beats default host/port:
+
+ >>> add("d", None, "9", "i")
+ >>> mgr.find_user_password("d", "f.example.com")
+ ('6', 'f')
+
+ """
+ pass
+
+
+class CachingGeneratorFunctionTests(TestCase):
+
+ def _get_simple_cgenf(self, log):
+ from mechanize._html import CachingGeneratorFunction
+ todo = []
+ for ii in range(2):
+ def work(ii=ii):
+ log.append(ii)
+ return ii
+ todo.append(work)
+ def genf():
+ for a in todo:
+ yield a()
+ return CachingGeneratorFunction(genf())
+
+ def test_cache(self):
+ log = []
+ cgenf = self._get_simple_cgenf(log)
+ for repeat in range(2):
+ for ii, jj in zip(cgenf(), range(2)):
+ self.assertEqual(ii, jj)
+ self.assertEqual(log, range(2)) # work only done once
+
+ def test_interleaved(self):
+ log = []
+ cgenf = self._get_simple_cgenf(log)
+ cgen = cgenf()
+ self.assertEqual(cgen.next(), 0)
+ self.assertEqual(log, [0])
+ cgen2 = cgenf()
+ self.assertEqual(cgen2.next(), 0)
+ self.assertEqual(log, [0])
+ self.assertEqual(cgen.next(), 1)
+ self.assertEqual(log, [0, 1])
+ self.assertEqual(cgen2.next(), 1)
+ self.assertEqual(log, [0, 1])
+ self.assertRaises(StopIteration, cgen.next)
+ self.assertRaises(StopIteration, cgen2.next)
+
+
class UnescapeTests(TestCase):
def test_unescape_charref(self):
@@ -74,18 +214,19 @@
return self.data.values()
class MockResponse:
+ closeable_response = None
def __init__(self, url="http://example.com/", data=None, info=None):
self.url = url
- self._f = StringIO.StringIO(data)
+ self.fp = StringIO.StringIO(data)
if info is None: info = {}
self._info = MockHeaders(info)
self.source = "%d%d" % (id(self), random.randint(0, sys.maxint-1))
def info(self): return self._info
def geturl(self): return self.url
- def read(self, size=-1): return self._f.read(size)
+ def read(self, size=-1): return self.fp.read(size)
def seek(self, whence):
assert whence == 0
- self._f.seek(0)
+ self.fp.seek(0)
def close(self): pass
def __getstate__(self):
state = self.__dict__
@@ -199,11 +340,12 @@
import mechanize
from StringIO import StringIO
import urllib, mimetools
- # always take first encoding, since that's the one
+ # always take first encoding, since that's the one from the real HTTP
+ # headers, rather than from HTTP-EQUIV
b = mechanize.Browser()
- for s, ct in [("", b.default_encoding),
+ for s, ct in [("", mechanize._html.DEFAULT_ENCODING),
- ("Foo: Bar\r\n\r\n", b.default_encoding),
+ ("Foo: Bar\r\n\r\n", mechanize._html.DEFAULT_ENCODING),
("Content-Type: text/html; charset=UTF-8\r\n\r\n",
"UTF-8"),
@@ -214,7 +356,8 @@
]:
msg = mimetools.Message(StringIO(s))
r = urllib.addinfourl(StringIO(""), msg, "http://www.example.com/")
- self.assertEqual(b.encoding(r), ct)
+ b.set_response(r)
+ self.assertEqual(b.encoding(), ct)
def test_history(self):
import mechanize
@@ -281,7 +424,8 @@
("text/html; charset=blah", True),
(" text/html ; charset=ook ", True),
]:
- b = TestBrowser(i_want_broken_xhtml_support=allow_xhtml)
+ b = TestBrowser(mechanize.DefaultFactory(
+ i_want_broken_xhtml_support=allow_xhtml))
hdrs = {}
if ct is not None:
hdrs["Content-Type"] = ct
@@ -303,7 +447,8 @@
(".xml", False),
("", False),
]:
- b = TestBrowser(i_want_broken_xhtml_support=allow_xhtml)
+ b = TestBrowser(mechanize.DefaultFactory(
+ i_want_broken_xhtml_support=allow_xhtml))
url = "http://example.com/foo"+ext
b.add_handler(MockHandler(
[("http_open", MockResponse(url, "", {}))]))
@@ -378,7 +523,7 @@
b.add_handler(MockHandler([("http_open", r)]))
r = b.open(url)
- forms = b.forms()
+ forms = list(b.forms())
self.assertEqual(len(forms), 2)
for got, expect in zip([f.name for f in forms], [
"form1", "form2"]):
@@ -489,7 +634,7 @@
Link(url, "foo", None, "iframe",
[("src", "foo")]),
]
- links = b.links()
+ links = list(b.links())
self.assertEqual(len(links), len(exp_links))
for got, expect in zip(links, exp_links):
self.assertEqual(got, expect)
@@ -579,6 +724,7 @@
class ResponseTests(TestCase):
def test_set_response(self):
+ import copy
from ClientCookie import response_seek_wrapper
br = TestBrowser()
@@ -591,7 +737,8 @@
r = br.open(url)
self.assertEqual(r.read(), html)
r.seek(0)
- self.assertEqual(br.links()[0].url, "spam")
+ self.assertEqual(copy.copy(r).read(), html)
+ self.assertEqual(list(br.links())[0].url, "spam")
newhtml = """<html><body><a href="eggs">click me</a></body></html>"""
@@ -600,11 +747,12 @@
self.assertEqual(br.response().read(), html)
br.response().set_data(newhtml)
self.assertEqual(br.response().read(), html)
- self.assertEqual(br.links()[0].url, "spam")
+ self.assertEqual(list(br.links())[0].url, "spam")
+ r.seek(0)
br.set_response(r)
self.assertEqual(br.response().read(), newhtml)
- self.assertEqual(br.links()[0].url, "eggs")
+ self.assertEqual(list(br.links())[0].url, "eggs")
class UserAgentTests(TestCase):
@@ -653,5 +801,7 @@
ua._set_handler("_blah", True)
if __name__ == "__main__":
+ import doctest
+ doctest.testmod()
import unittest
unittest.main()
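
CachingGeneratorFunctionTests above exercises mechanize._html.CachingGeneratorFunction, whose implementation is not part of this section. The behaviour those tests describe (each item is computed once, and several iterations may interleave) could be obtained roughly as follows. This is a sketch in Python 3 under that assumption, not the implementation from _html.py; only the class name is taken from the tests.

```python
class CachingGeneratorFunction:
    """Callable that returns fresh generators over a shared, cached iterable.

    Items are pulled from the wrapped iterator at most once and stored,
    so later or interleaved iterations replay them from the cache.
    """

    def __init__(self, iterable):
        self._cache = []
        self._iterator = iter(iterable)

    def __call__(self):
        index = 0
        while True:
            if index < len(self._cache):
                # Another iteration already computed this item.
                yield self._cache[index]
            else:
                try:
                    item = next(self._iterator)
                except StopIteration:
                    return
                self._cache.append(item)
                yield item
            index += 1
```

Each generator keeps its own position, while the cache and the underlying iterator are shared, which is what lets two partially consumed iterations advance independently without repeating work.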