[ftputil] Ftputil seems really slow to distinguish file and folders
Stefan Schwarzer
sschwarzer at sschwarzer.net
Fri May 1 10:52:54 CEST 2009
Hi Nicola,
On 2009-04-30 23:50, MailingList SVR wrote:
>> What I like about your reports is that you always provide
>> concrete examples with working test code. Great! :-)
>
> and your answers are always rich in valuable information, thanks!
Thanks a lot :-)
>> Running this with ftputil 2.4 on my computer takes about
>> 25 minutes. When increasing the cache size to 2000, the code
>> runs in about 40 seconds. :)
>
> It runs in about 40 seconds on my box too. This time is acceptable,
> however on the same directory a standard FTP client gives the directory
> listing in a few seconds. I haven't looked at the ftputil code, but
> host.listdir is as fast as FileZilla and company (2-3 seconds). What
> other tasks does ftputil do in the remaining 35 seconds?
First, I'd like to add that if I leave the print statements
out of the loop, i. e.
    >>> def f():
    ...     for i in lista:
    ...         isd = host.path.isdir(folder+i)
    ...         isf = host.path.isfile(folder+i)
I'm down to 25 seconds (with a cache size of 2000). :)
Using only one cache access per name (as listdir does) in the
loop, I get the loop done in 11 seconds. The following function,
including connecting to the server, runs in about 13 seconds:
    >>> def h():
    ...     host = ftputil.FTPHost('ftp.nluug.nl', 'anonymous', 'pippo@pippo.com')
    ...     host.stat_cache.resize(2000)
    ...     for i in host.listdir(folder):
    ...         s = host.lstat(folder + i)
    >>> %time h() # built into IPython
    CPU times: user 9.38 s, sys: 0.08 s, total: 9.46 s
    Wall time: 12.59 s
However, listdir (needing 1.2 seconds for the directory) only
_stores_ values in the cache while lstat _retrieves_ the info. So
I suspect the retrieval is significantly slower than the storage.
A little test:
    >>> def g():
    ...     for i in lista:
    ...         s = host.lstat(folder+i)
    >>> %prun g() # built into IPython
             11358799 function calls in 54.934 CPU seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      5661503   25.974    0.000   38.745    0.000 lrucache.py:110(__cmp__)
         2386   15.896    0.007   54.641    0.023 {_heapq.heapify}
      5661503   12.771    0.000   12.771    0.000 {cmp}
      ...
The total time of the other calls seems negligible, so ftputil's
own code seems rather innocent. ;-)
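A back-of-the-envelope check on the profile above: 5661503
comparisons over 2386 heapify calls is roughly 2373 comparisons
per call, which is on the order of the cache size of 2000. That
would fit a cache that re-heapifies its whole entry list on each
access, though I haven't verified that this is what lrucache does.
The sketch below is _not_ lrucache's code; `Entry` and
`comparisons_per_heapify` are made-up names, and it assumes
Python 3 (where heapq uses `__lt__` instead of `__cmp__`). It only
shows that a single heapify call costs a number of comparisons
proportional to the number of entries:

```python
import heapq


class Entry(object):
    """Hypothetical cache entry, ordered by last access time."""

    comparisons = 0  # class-wide counter of __lt__ calls

    def __init__(self, atime):
        self.atime = atime

    def __lt__(self, other):
        Entry.comparisons += 1
        return self.atime < other.atime


def comparisons_per_heapify(size):
    """Count the comparisons of one heapify call after touching one entry."""
    entries = [Entry(atime) for atime in range(size)]
    # Pretend the oldest entry was just accessed.
    entries[0].atime = size
    Entry.comparisons = 0
    # heapify re-establishes heap order over the whole list.
    heapq.heapify(entries)
    return Entry.comparisons


for size in (500, 1000, 2000):
    print(size, comparisons_per_heapify(size))
```

A plain dictionary lookup, by contrast, needs no ordering work
at all, which is the motivation for the idea further down.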
If you're writing an end-user client and need to give the user a
directory listing fast, including the stat'ed information for
each file/dir, the following - untested - approach _might_ work:
- Write a cache class with an interface like that in
ftp_stat_cache.py. However, use a raw Python dictionary to
store the stat results. This cache will have no (automatic)
means to prune old entries, but see below. You can use the
present cache class as a template.
- Derive a stat class from ftp_stat._Stat:
    class MyStat(ftp_stat._Stat):
        def __init__(self, *args, **kwargs):
            super(MyStat, self).__init__(*args, **kwargs)
            self._lstat_cache = MyCache()
- Derive a class from FTPHost:
    class MyFTPHost(ftputil.FTPHost):
        def __init__(self, *args, **kwargs):
            super(MyFTPHost, self).__init__(*args, **kwargs)
            self._stat = MyStat(self)
            self.stat_cache = self._stat._lstat_cache
- To get a listing, instantiate your custom FTPHost class. Clear
the cache with host.stat_cache.clear() before retrieving a
directory listing. Also clear the cache explicitly after you no
longer need the directory data.
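The `MyCache` class from the first step could look roughly like
the following. This is an untested-against-ftputil sketch: the
method names are my assumption of what the cache interface in
ftp_stat_cache.py looks like, so please compare them with the
actual file before relying on this. In particular, the real cache
may raise a different exception than KeyError on a cache miss.

```python
class MyCache(object):
    """Unbounded stat cache backed by a plain dict.

    Sketch only: the method names are assumed to mirror the cache
    class in ftp_stat_cache.py and should be checked against it.
    """

    def __init__(self):
        self._cache = {}
        self._enabled = True

    def enable(self):
        self._enabled = True

    def disable(self):
        self._enabled = False

    def resize(self, new_size):
        # A plain dict has no size limit, so there is nothing to do.
        pass

    def clear(self):
        # No automatic pruning: the caller must clear explicitly.
        self._cache.clear()

    def invalidate(self, path):
        # Silently ignore paths that aren't cached.
        self._cache.pop(path, None)

    def __getitem__(self, path):
        # Assumption: a miss is signalled with KeyError here.
        if not self._enabled:
            raise KeyError(path)
        return self._cache[path]

    def __setitem__(self, path, stat_result):
        if self._enabled:
            self._cache[path] = stat_result

    def __contains__(self, path):
        return self._enabled and path in self._cache

    def __len__(self):
        return len(self._cache)
```

Since every operation is a single dictionary access, storing and
retrieving should cost about the same. The price is that the
cache grows without bound until you call clear() yourself.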
Keep in mind that if your software isn't interactive, you most
probably don't need to worry about tuning at all. [1] ... And
_with_ interactivity, I just measured: Nautilus 2.24.1 needs
about 8 seconds to show the directory listing, so I think the
13 seconds I got with the pure lstat calls (see above) are not so
bad! Also remember that most users won't have so many directory
items frequently. For most directories, you won't notice any
difference.
If you nevertheless tried out the above idea, I (and maybe other
readers of the list) would be very thankful if you shared your
results. :-)
> Is it possible to have a faster
> listing, or is the code already optimized?
As far as I remember, I haven't optimized the code at all because
I haven't had a use case yet where the code was too slow for me. I
don't know how much tuning has gone into lrucache [2], though.
[1] Please read
http://sschwarzer.com/download/optimization_europython2006.pdf
if you haven't already done so. :)
[2] http://pypi.python.org/pypi/lrucache/0.2 (It's listed there
as alpha code, but when I asked over two years ago, the
author replied that it had been used successfully in several
projects and that it had comprehensive unit tests. I haven't had any
complaints either.)
Best regards,
Stefan