[ftputil] Making path.walk go faster.
Ido Abramovich
idoa01 at yahoo.com
Wed Nov 30 13:26:08 CET 2005
Hi,
First, thanks for ftputil; it saved me a bunch of
time in my current project :)
In my project I use path.walk a lot (I traverse an FTP
directory tree and decide for each file whether it
needs to be downloaded), but I found that path.walk is
a bit too slow for my needs.
So I poked around in the source and found that a walk
first gets the listing of a directory and then performs
a stat on each entry, so you perform N=D+F network
operations (D=number of dirs, F=number of files). You
could lower this number to only D if listdir remembered
the stat result it already parses for each entry and
walk reused it.
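To make the saving concrete, here is a rough toy model in modern Python. It is not ftputil code: CountingHost, TREE and both walk functions are made up for illustration, and the model simply charges one request per listing and one per stat, which simplifies what really happens on the wire.

```python
import posixpath
import stat

# Hypothetical remote tree: directory path -> names it contains.
TREE = {"/": ["a", "b.txt"], "/a": ["c.txt", "d.txt"]}


class CountingHost:
    """Toy stand-in for an FTP host that counts simulated requests."""

    def __init__(self, tree):
        self.tree = tree
        self.requests = 0

    def _mode(self, path):
        # Anything that is a key in the tree is treated as a directory.
        return stat.S_IFDIR if path in self.tree else stat.S_IFREG

    def listdir(self, path):
        # One request; the listing already tells us each entry's type.
        self.requests += 1
        return [(name, self._mode(posixpath.join(path, name)))
                for name in self.tree[path]]

    def lstat(self, path):
        # One additional request per entry.
        self.requests += 1
        return self._mode(path)


def walk_stat_each(host, top):
    """Ignore the modes from the listing and stat every entry again."""
    for name, _mode in host.listdir(top):
        path = posixpath.join(top, name)
        if stat.S_ISDIR(host.lstat(path)):
            walk_stat_each(host, path)


def walk_reuse_listing(host, top):
    """Reuse the mode that came back with the listing."""
    for name, mode in host.listdir(top):
        if stat.S_ISDIR(mode):
            walk_reuse_listing(host, posixpath.join(top, name))


def count_requests(walk):
    host = CountingHost(TREE)
    walk(host, "/")
    return host.requests
```

With this tiny tree, count_requests(walk_stat_each) gives 6 and count_requests(walk_reuse_listing) gives 2, that is, one request per directory once the listing is reused.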
I've written a small patch that adds this functionality
without changing anything else:
--- ftp_path.py 2005-11-30 12:59:47.896733885 +0200
+++ ftp_path.py.new 2005-11-30 12:52:35.442914494 +0200
@@ -156,11 +156,6 @@
             return
         func(arg, top, names)
         for name in names:
-            name = self.join(top, name)
-            try:
-                st = self._host.lstat(name)
-            except OSError:
-                continue
-            if stat.S_ISDIR(st[stat.ST_MODE]):
-                self.walk(name, func, arg)
+            if stat.S_ISDIR(name.stat_result[stat.ST_MODE]):
+                self.walk(self.join(top,name), func, arg)
--- ftp_stat.py 2005-11-30 12:59:51.773522353 +0200
+++ ftp_stat.py.new 2005-11-30 12:51:09.631739330 +0200
@@ -51,11 +51,18 @@
     class __InheritanceTest(tuple):
         pass
     _StatResultBase = tuple
+    _StatStringBase = str
 except TypeError:
     # "base is not a class object"
-    import UserTuple
+    import UserTuple, UserString
     _StatResultBase = UserTuple.UserTuple
+    _StatStringBase = UserString.UserString
+class _StatString(_StatStringBase):
+    """
+    Support class resembling a string to include metadata
+    """
+    stat_result = None
 class _StatResult(_StatResultBase):
     """
@@ -118,8 +125,9 @@
         for line in lines:
             try:
                 stat_result = self.parse_line(line)
-                st_name = stat_result._st_name
+                st_name = _StatString(stat_result._st_name)
                 if st_name not in (self._host.curdir, self._host.pardir):
+                    st_name.stat_result = stat_result
                     names.append(st_name)
             except ftp_error.ParserError:
                 # ignore things like ".", "..", "total 17"
A few last notes:
1) The string receives a stat_result attribute. You
might want to change it to a "metadata" attribute that
is a dictionary (holding the stat object), but I
couldn't think of other metadata to add there.
2) In a small check I did, this little hack was about 8
times faster than the current implementation.
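The two options from note 1 can be tried in isolation. The following is only a sketch in modern Python, not the patch itself: StatString mirrors the patch's _StatString, while MetadataString and the one-element dir_stat tuple are hypothetical names invented for the example.

```python
import stat


class StatString(str):
    """A str that can carry the stat result it was listed with."""
    stat_result = None  # same attribute name as in the patch


class MetadataString(str):
    """Variant from note 1: a per-instance dict for arbitrary metadata."""

    def __new__(cls, value, **metadata):
        self = super().__new__(cls, value)
        # Created per instance; a class-level dict would be shared
        # by every name.
        self.metadata = dict(metadata)
        return self


# Hypothetical stat tuple; the mode sits at index stat.ST_MODE (0).
dir_stat = (stat.S_IFDIR | 0o755,)

plain = StatString("docs")
plain.stat_result = dir_stat

tagged = MetadataString("docs", stat_result=dir_stat)

is_dir_plain = stat.S_ISDIR(plain.stat_result[stat.ST_MODE])
is_dir_tagged = stat.S_ISDIR(tagged.metadata["stat_result"][stat.ST_MODE])

# Strings derived from a subclass instance are plain str objects,
# so the extra attribute does not survive concatenation.
joined = "top/" + plain
```

Both objects still compare equal to "docs", so callers that only expect names should keep working. Note that joined no longer has a stat_result attribute, which matters if a name is joined with its parent path before the attribute is read.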
Thanks,
Ido.