[wwwsearch-commits] r27866 - in wwwsearch/mechanize/trunk: . mechanize

jjlee at codespeak.net
Mon May 29 18:24:29 CEST 2006


Author: jjlee
Date: Mon May 29 18:24:27 2006
New Revision: 27866

Modified:
   wwwsearch/mechanize/trunk/0.1-changes.txt
   wwwsearch/mechanize/trunk/README.html.in
   wwwsearch/mechanize/trunk/mechanize/_html.py
Log:
Replace the encoding_finder / make_is_html closures with classes (EncodingFinder and ResponseTypeFinder) so that they can be pickled, and change the corresponding constructor arguments of mechanize.Factory to take instances of those classes instead of functions

Modified: wwwsearch/mechanize/trunk/0.1-changes.txt
==============================================================================
--- wwwsearch/mechanize/trunk/0.1-changes.txt	(original)
+++ wwwsearch/mechanize/trunk/0.1-changes.txt	Mon May 29 18:24:27 2006
@@ -1,5 +1,9 @@
 Recent public API changes:
 
+- Since 0.1.2b beta release: Factory now takes EncodingFinder and
+  ResponseTypeFinder class instances instead of functions (since
+  closures don't play well with module pickle).
+
 - ClientCookie has been moved into the mechanize package and is no
   longer a separate package.  The ClientCookie interface is still
   supported, but all names must be imported from module mechanize

Modified: wwwsearch/mechanize/trunk/README.html.in
==============================================================================
--- wwwsearch/mechanize/trunk/README.html.in	(original)
+++ wwwsearch/mechanize/trunk/README.html.in	Mon May 29 18:24:27 2006
@@ -283,23 +283,21 @@
 <em>This is <strong>very</strong> roughly in order of priority</em>
 
 <ul>
+  <li>Implement RFC 3986 URL absolutization.
   <li>Test <code>.any_response()</code> two handlers case: ordering.
   <li>Test referer bugs (frags and don't add in redirect unless orig
     req had Referer)
-  <li>Implement RFC 3986 URL absolutization.
   <li>Strip fragments before retrieving URLs (this should probably be
     considered a bug in urllib2).
   <li>Proper XHTML support!
-  <li>Make encoding_finder public, I guess (but probably improve it first).
-    (For example: support Mark Pilgrim's universal encoding detector?)
-    Use class, not closure (closures don't pickle).
-  <li>Continue with the de-crufting enabled by requirement for Python 2.3.
   <li>Fix BeautifulSoup support to use a single BeautifulSoup instance
     per page.
   <li>Test BeautifulSoup support better / fix encoding issue.
   <li>Add another History implementation or two and finalise interface.
   <li>History cache expiration.
   <li>Investigate possible leak (see Balazs Ree's list posting).
+  <li>Make encoding_finder public, I guess (but probably improve it first).
+    (For example: support Mark Pilgrim's universal encoding detector?)
   <li>Add two-way links between BeautifulSoup & ClientForm object models.
   <li>In 0.2: fork urllib2 &#8212; easier maintenance.
   <li>In 0.2: switch to Python unicode strings everywhere appropriate

Modified: wwwsearch/mechanize/trunk/mechanize/_html.py
==============================================================================
--- wwwsearch/mechanize/trunk/mechanize/_html.py	(original)
+++ wwwsearch/mechanize/trunk/mechanize/_html.py	Mon May 29 18:24:27 2006
@@ -47,8 +47,10 @@
             cache.append(item)
             yield item
 
-def encoding_finder(default_encoding):
-    def encoding(response):
+class EncodingFinder:
+    def __init__(self, default_encoding):
+        self._default_encoding = default_encoding
+    def encoding(self, response):
         # HTTPEquivProcessor may be in use, so both HTTP and HTTP-EQUIV
         # headers may be in the response.  HTTP-EQUIV headers come last,
         # so try in order from first to last.
@@ -56,16 +58,16 @@
             for k, v in split_header_words([ct])[0]:
                 if k == "charset":
                     return v
-        return default_encoding
-    return encoding
+        return self._default_encoding
 
-def make_is_html(allow_xhtml):
-    def is_html(response, encoding):
+class ResponseTypeFinder:
+    def __init__(self, allow_xhtml):
+        self._allow_xhtml = allow_xhtml
+    def is_html(self, response, encoding):
         ct_hdrs = response.info().getheaders("content-type")
         url = response.geturl()
         # XXX encoding
-        return _is_html(ct_hdrs, url, allow_xhtml)
-    return is_html
+        return _is_html(ct_hdrs, url, self._allow_xhtml)
 
 # idea for this argument-processing trick is from Peter Otten
 class Args:
@@ -437,8 +439,8 @@
     """
 
     def __init__(self, forms_factory, links_factory, title_factory,
-                 get_encoding=encoding_finder(DEFAULT_ENCODING),
-                 is_html_p=make_is_html(allow_xhtml=False),
+                 encoding_finder=EncodingFinder(DEFAULT_ENCODING),
+                 response_type_finder=ResponseTypeFinder(allow_xhtml=False),
                  ):
         """
 
@@ -454,8 +456,8 @@
         self._forms_factory = forms_factory
         self._links_factory = links_factory
         self._title_factory = title_factory
-        self._get_encoding = get_encoding
-        self._is_html_p = is_html_p
+        self._encoding_finder = encoding_finder
+        self._response_type_finder = response_type_finder
 
         self.set_response(None)
 
@@ -490,10 +492,11 @@
 
         try:
             if name == "encoding":
-                self.encoding = self._get_encoding(self._response)
+                self.encoding = self._encoding_finder.encoding(self._response)
                 return self.encoding
             elif name == "is_html":
-                self.is_html = self._is_html_p(self._response, self.encoding)
+                self.is_html = self._response_type_finder.is_html(
+                    self._response, self.encoding)
                 return self.is_html
             elif name == "title":
                 if self.is_html:
@@ -526,7 +529,8 @@
             forms_factory=FormsFactory(),
             links_factory=LinksFactory(),
             title_factory=TitleFactory(),
-            is_html_p=make_is_html(allow_xhtml=i_want_broken_xhtml_support),
+            response_type_finder=ResponseTypeFinder(
+                allow_xhtml=i_want_broken_xhtml_support),
             )
 
     def set_response(self, response):
@@ -551,7 +555,8 @@
             forms_factory=RobustFormsFactory(),
             links_factory=RobustLinksFactory(),
             title_factory=RobustTitleFactory(),
-            is_html_p=make_is_html(allow_xhtml=i_want_broken_xhtml_support),
+            response_type_finder=ResponseTypeFinder(
+                allow_xhtml=i_want_broken_xhtml_support),
             )
         if soup_class is None:
             soup_class = MechanizeBs
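
The motivation given in the log ("closures don't play well with module pickle") can be illustrated with a small standalone sketch. It does not import mechanize: the two definitions below are cut-down stand-ins that mirror the removed encoding_finder closure and the new EncodingFinder class, with the header parsing omitted, and the "latin-1" default is an arbitrary value chosen for the demonstration. The exact exception raised for the closure is assumed to vary between Python versions, so several types are caught.

```python
import pickle


def encoding_finder(default_encoding):
    # Closure style, as removed by this revision (header parsing omitted).
    # The inner function only exists inside this call, so pickle cannot
    # look it up by name when asked to serialise it.
    def encoding(response):
        return default_encoding
    return encoding


class EncodingFinder:
    # Class style, as added by this revision (header parsing omitted).
    # The state lives in an instance attribute, which pickle can store.
    def __init__(self, default_encoding):
        self._default_encoding = default_encoding

    def encoding(self, response):
        return self._default_encoding


# Attempt to pickle the closure; this is expected to fail.
try:
    pickle.dumps(encoding_finder("latin-1"))
    closure_pickles = True
except (pickle.PicklingError, AttributeError, TypeError):
    closure_pickles = False

# Round-trip the class instance through pickle and use the copy.
restored = pickle.loads(pickle.dumps(EncodingFinder("latin-1")))
restored_encoding = restored.encoding(None)

print("closure pickles:", closure_pickles)
print("restored default encoding:", restored_encoding)
```

The same reasoning would apply to ResponseTypeFinder, which keeps allow_xhtml as an instance attribute rather than as a variable captured by make_is_html.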

