[ftputil] Making path.walk go faster.

Ido Abramovich idoa01 at yahoo.com
Wed Nov 30 13:26:08 CET 2005


Hi,

First, thanks for ftputil; it saved me a bunch of
time in my current project :)

I use path.walk a lot in my project (I traverse an
FTP directory tree and decide for each file whether
it needs to be downloaded), but I found that
path.walk is too slow for my needs.

So I poked around in the source a bit and found that
walk gets the listing of each directory and then
performs a separate stat on each entry, so you make
roughly N=D+F network connections (D = number of
directories, F = number of files). You could lower
this to only D by remembering the stat results that
listdir already parses from the listing and reusing
them in walk.

I've written a small patch that adds this without
changing anything else:

--- ftp_path.py 2005-11-30 12:59:47.896733885 +0200
+++ ftp_path.py.new     2005-11-30 12:52:35.442914494 +0200
@@ -156,11 +156,6 @@
             return
         func(arg, top, names)
         for name in names:
-            name = self.join(top, name)
-            try:
-                st = self._host.lstat(name)
-            except OSError:
-                continue
-            if stat.S_ISDIR(st[stat.ST_MODE]):
-                self.walk(name, func, arg)
+            if stat.S_ISDIR(name.stat_result[stat.ST_MODE]):
+                self.walk(self.join(top, name), func, arg)
  


--- ftp_stat.py 2005-11-30 12:59:51.773522353 +0200
+++ ftp_stat.py.new     2005-11-30 12:51:09.631739330 +0200
@@ -51,11 +51,18 @@
     class __InheritanceTest(tuple):
         pass
     _StatResultBase = tuple
+    _StatStringBase = str
 except TypeError:
     # "base is not a class object"
-    import UserTuple
+    import UserTuple, UserString
     _StatResultBase = UserTuple.UserTuple
+    _StatStringBase = UserString.UserString
  
+class _StatString(_StatStringBase):
+    """
+    Support class resembling a string to include metadata
+    """
+    stat_result = None
  
 class _StatResult(_StatResultBase):
     """
@@ -118,8 +125,9 @@
         for line in lines:
             try:
                 stat_result = self.parse_line(line)
-                st_name = stat_result._st_name
+                st_name = _StatString(stat_result._st_name)
                 if st_name not in (self._host.curdir, self._host.pardir):
+                    st_name.stat_result = stat_result
                     names.append(st_name)
             except ftp_error.ParserError:
                 # ignore things like ".", "..", "total 17"
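The `_StatString` trick relies on a `str` subclass accepting extra attributes. A minimal stand-alone sketch in present-day Python (the `StatString` and `make_name` names are hypothetical, and a local `os.lstat` result stands in for a parsed FTP listing line) shows that such a name still behaves like a plain string:

```python
import os
import stat

class StatString(str):
    """A str that also carries the stat result of the entry it names.

    Hypothetical stand-alone version of the _StatString idea in the patch.
    """
    stat_result = None  # class-level default; set per instance below

def make_name(name, stat_result):
    """Wrap a listing name and attach its (already parsed) stat result."""
    result = StatString(name)
    result.stat_result = stat_result
    return result

# A local lstat result stands in for a parsed FTP listing line.
name = make_name("subdir", os.lstat("."))

print(name == "subdir")                              # True: compares like str
print(name not in (".", ".."))                       # True: curdir/pardir check
print({"subdir": 1}[name])                           # 1: hashes like str
print(stat.S_ISDIR(name.stat_result[stat.ST_MODE]))  # True: "." is a dir
joined = "/top/" + name
print(hasattr(joined, "stat_result"))                # False: plain str again
```

One consequence: string operations such as concatenation or path joining return a plain `str`, so the attribute is lost. That is why the patched walk reads `name.stat_result` before calling `self.join(top, name)`.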


A few last notes:
1) Each name string gets a stat_result attribute. You
might prefer a more general "metadata" attribute
holding a dictionary (which would contain the stat
object), but I couldn't think of any other metadata to
put there.
2) In a quick test I ran, this little hack was about 8
times faster than the current implementation.
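The dictionary variant from note 1 might look roughly like this. `MetaString` is a hypothetical name, not ftputil API; the point of building the dict in `__new__` is that each name gets its own dictionary instead of sharing one mutable class attribute.

```python
import os

class MetaString(str):
    """A str carrying an arbitrary metadata dict (hypothetical variant)."""

    def __new__(cls, value, **metadata):
        self = super().__new__(cls, value)
        # Fresh dict per instance; a class-level {} would be shared.
        self.metadata = dict(metadata)
        return self

# A local lstat result stands in for a parsed FTP listing line.
st = os.lstat(".")
name = MetaString("report.txt", stat_result=st)

print(name == "report.txt")                # True
print(name.metadata["stat_result"] is st)  # True
print(MetaString("other").metadata)        # {}
```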

Thanks,
Ido.
