[wwwsearch-commits] r27866 - in wwwsearch/mechanize/trunk: . mechanize
jjlee at codespeak.net
Mon May 29 18:24:29 CEST 2006
Author: jjlee
Date: Mon May 29 18:24:27 2006
New Revision: 27866
Modified:
wwwsearch/mechanize/trunk/0.1-changes.txt
wwwsearch/mechanize/trunk/README.html.in
wwwsearch/mechanize/trunk/mechanize/_html.py
Log:
Replace the encoding_finder / make_is_html closures with the classes EncodingFinder and ResponseTypeFinder (instances pickle, closures don't), and change the corresponding constructor arguments of mechanize.Factory to take instances of those classes instead of functions.
Modified: wwwsearch/mechanize/trunk/0.1-changes.txt
==============================================================================
--- wwwsearch/mechanize/trunk/0.1-changes.txt (original)
+++ wwwsearch/mechanize/trunk/0.1-changes.txt Mon May 29 18:24:27 2006
@@ -1,5 +1,9 @@
Recent public API changes:
+- Since 0.1.2b beta release: Factory now takes EncodingFinder and
+ ResponseTypeFinder class instances instead of functions (since
+ closures don't play well with module pickle).
+
- ClientCookie has been moved into the mechanize package and is no
longer a separate package. The ClientCookie interface is still
supported, but all names must be imported from module mechanize
Modified: wwwsearch/mechanize/trunk/README.html.in
==============================================================================
--- wwwsearch/mechanize/trunk/README.html.in (original)
+++ wwwsearch/mechanize/trunk/README.html.in Mon May 29 18:24:27 2006
@@ -283,23 +283,21 @@
<em>This is <strong>very</strong> roughly in order of priority</em>
<ul>
+ <li>Implement RFC 3986 URL absolutization.
<li>Test <code>.any_response()</code> two handlers case: ordering.
<li>Test referer bugs (frags and don't add in redirect unless orig
req had Referer)
- <li>Implement RFC 3986 URL absolutization.
<li>Strip fragments before retrieving URLs (this should probably be
considered a bug in urllib2).
<li>Proper XHTML support!
- <li>Make encoding_finder public, I guess (but probably improve it first).
- (For example: support Mark Pilgrim's universal encoding detector?)
- Use class, not closure (closures don't pickle).
- <li>Continue with the de-crufting enabled by requirement for Python 2.3.
<li>Fix BeautifulSoup support to use a single BeautifulSoup instance
per page.
<li>Test BeautifulSoup support better / fix encoding issue.
<li>Add another History implementation or two and finalise interface.
<li>History cache expiration.
<li>Investigate possible leak (see Balazs Ree's list posting).
+ <li>Make encoding_finder public, I guess (but probably improve it first).
+ (For example: support Mark Pilgrim's universal encoding detector?)
<li>Add two-way links between BeautifulSoup & ClientForm object models.
<li>In 0.2: fork urllib2 — easier maintenance.
<li>In 0.2: switch to Python unicode strings everywhere appropriate
Modified: wwwsearch/mechanize/trunk/mechanize/_html.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_html.py (original)
+++ wwwsearch/mechanize/trunk/mechanize/_html.py Mon May 29 18:24:27 2006
@@ -47,8 +47,10 @@
cache.append(item)
yield item
-def encoding_finder(default_encoding):
-    def encoding(response):
+class EncodingFinder:
+    def __init__(self, default_encoding):
+        self._default_encoding = default_encoding
+    def encoding(self, response):
         # HTTPEquivProcessor may be in use, so both HTTP and HTTP-EQUIV
         # headers may be in the response. HTTP-EQUIV headers come last,
         # so try in order from first to last.
@@ -56,16 +58,16 @@
for k, v in split_header_words([ct])[0]:
if k == "charset":
return v
-        return default_encoding
-    return encoding
+        return self._default_encoding
-def make_is_html(allow_xhtml):
-    def is_html(response, encoding):
+class ResponseTypeFinder:
+    def __init__(self, allow_xhtml):
+        self._allow_xhtml = allow_xhtml
+    def is_html(self, response, encoding):
         ct_hdrs = response.info().getheaders("content-type")
         url = response.geturl()
         # XXX encoding
-        return _is_html(ct_hdrs, url, allow_xhtml)
-    return is_html
+        return _is_html(ct_hdrs, url, self._allow_xhtml)
# idea for this argument-processing trick is from Peter Otten
class Args:
@@ -437,8 +439,8 @@
"""
def __init__(self, forms_factory, links_factory, title_factory,
- get_encoding=encoding_finder(DEFAULT_ENCODING),
- is_html_p=make_is_html(allow_xhtml=False),
+ encoding_finder=EncodingFinder(DEFAULT_ENCODING),
+ response_type_finder=ResponseTypeFinder(allow_xhtml=False),
):
"""
@@ -454,8 +456,8 @@
self._forms_factory = forms_factory
self._links_factory = links_factory
self._title_factory = title_factory
- self._get_encoding = get_encoding
- self._is_html_p = is_html_p
+ self._encoding_finder = encoding_finder
+ self._response_type_finder = response_type_finder
self.set_response(None)
@@ -490,10 +492,11 @@
try:
if name == "encoding":
- self.encoding = self._get_encoding(self._response)
+ self.encoding = self._encoding_finder.encoding(self._response)
return self.encoding
elif name == "is_html":
- self.is_html = self._is_html_p(self._response, self.encoding)
+ self.is_html = self._response_type_finder.is_html(
+ self._response, self.encoding)
return self.is_html
elif name == "title":
if self.is_html:
@@ -526,7 +529,8 @@
forms_factory=FormsFactory(),
links_factory=LinksFactory(),
title_factory=TitleFactory(),
- is_html_p=make_is_html(allow_xhtml=i_want_broken_xhtml_support),
+ response_type_finder=ResponseTypeFinder(
+ allow_xhtml=i_want_broken_xhtml_support),
)
def set_response(self, response):
@@ -551,7 +555,8 @@
forms_factory=RobustFormsFactory(),
links_factory=RobustLinksFactory(),
title_factory=RobustTitleFactory(),
- is_html_p=make_is_html(allow_xhtml=i_want_broken_xhtml_support),
+ response_type_finder=ResponseTypeFinder(
+ allow_xhtml=i_want_broken_xhtml_support),
)
if soup_class is None:
soup_class = MechanizeBs
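
The motivation given in the log (closures don't pickle, class instances do) can be checked with a small standalone sketch. This is not the mechanize code: the two finders below are simplified stand-ins that treat the response as a plain dict with an optional "charset" key, and can_pickle is a hypothetical helper added only for the demonstration. It is written for a current Python 3 rather than the Python 2.3 the commit targets.

```python
import pickle


def encoding_finder(default_encoding):
    # Old style: the returned function is a closure over default_encoding.
    # It is defined inside another function, so pickle cannot find it by name.
    def encoding(response):
        return response.get("charset", default_encoding)
    return encoding


class EncodingFinder:
    # New style: the state lives on an instance of a module-level class,
    # which pickle can look up by name and rebuild from its attributes.
    def __init__(self, default_encoding):
        self._default_encoding = default_encoding

    def encoding(self, response):
        # Stand-in for the real header parsing: response is assumed to be a dict.
        return response.get("charset", self._default_encoding)


def can_pickle(obj):
    # Hypothetical helper: report whether pickle.dumps accepts the object.
    # The exception type raised for local functions differs between versions.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("closure picklable:", can_pickle(encoding_finder("latin-1")))
    restored = pickle.loads(pickle.dumps(EncodingFinder("latin-1")))
    print("restored finder default:", restored.encoding({}))
```

Because the Factory default arguments are now instances rather than closures, an object that holds them no longer fails to pickle on account of these two attributes; whether the rest of a Factory pickles is a separate question this sketch does not address.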