[ftputil] Ftputil seems really slow to distinguish file and folders
MailingList SVR
lists at svrinformatica.it
Fri May 1 17:44:20 CEST 2009
Hi Stefan,
thanks for your suggestions, I'll try them this weekend.
By "fast mode" I mean using only host.listdir to distinguish between files and directories. Is it possible to retrieve this basic information with listdir alone, and so in about 1.2 seconds? That way we could show a very fast directory listing and, for example, fetch the full information only if the listdir result has fewer than x entries.
regards
Nicola
On Friday, 1 May 2009 10:52:54, Stefan Schwarzer wrote:
> Hi Nicola,
>
> On 2009-04-30 23:50, MailingList SVR wrote:
> >> What I like about your reports is that you always provide
> >> concrete examples with working test code. Great! :-)
> >
> > and your answers are always rich in valuable information, thanks!
>
> Thanks a lot :-)
>
> >> Running this with ftputil 2.4 on my computer takes about
> >> 25 minutes. When increasing the cache size to 2000, the code
> >> runs in about 40 seconds. :)
> >
> > It runs in about 40 seconds on my box too. This time is acceptable;
> > however, on the same directory a standard FTP client gives the directory
> > listing in a few seconds. I haven't looked at the ftputil code, but
> > host.listdir is as fast as FileZilla and company (2-3 seconds). What other
> > tasks does ftputil perform that take about 35 seconds?
>
> First, I'd like to add that if I leave the print statements
> out of the loop, i. e.
>
> >>> def f():
> ...     for i in lista:
> ...         isd = host.path.isdir(folder+i)
> ...         isf = host.path.isfile(folder+i)
>
> I'm down to 25 seconds (with a cache size of 2000). :)
>
> Using only one cache access per name (as listdir does) in the
> loop, I get the loop done in 11 seconds. The following function,
> including connecting to the server, runs in about 13 seconds:
>
> >>> def h():
> ...     host = ftputil.FTPHost('ftp.nluug.nl','anonymous','pippo at pippo.com')
> ...     host.stat_cache.resize(2000)
> ...     for i in host.listdir(folder):
> ...         s = host.lstat(folder + i)
> >>> %time h() # built into IPython
> CPU times: user 9.38 s, sys: 0.08 s, total: 9.46 s
> Wall time: 12.59 s
>
> However, listdir (needing 1.2 seconds for the directory) only
> _stores_ values in the cache while lstat _retrieves_ the info. So
> I suspect the retrieval is significantly slower than the storage.
>
> A little test:
>
> >>> def g():
> ...     for i in lista:
> ...         s = host.lstat(folder+i)
> >>> %prun g() # built into IPython
> 11358799 function calls in 54.934 CPU seconds
>
> Ordered by: internal time
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>   5661503   25.974    0.000   38.745    0.000 lrucache.py:110(__cmp__)
>      2386   15.896    0.007   54.641    0.023 {_heapq.heapify}
>   5661503   12.771    0.000   12.771    0.000 {cmp}
> ...
>
> The total time of the other calls seems negligible, so ftputil's
> own code seems rather innocent. ;-)
>
> If you're writing an end-user client and need to give the user a
> directory listing fast, including the stat'ed information for
> each file/dir, the following - untested - approach _might_ work:
>
> - Write a cache class with an interface like that in
> ftp_stat_cache.py. However, use a raw Python dictionary to
> store the stat results. This cache will have no (automatic)
> means to prune old entries, but see below. You can use the
> present cache class as a template.
>
> - Derive a stat class from ftp_stat._Stat:
>
>   class MyStat(ftp_stat._Stat):
>       def __init__(self, *args, **kwargs):
>           super(MyStat, self).__init__(*args, **kwargs)
>           self._lstat_cache = MyCache()
>
> - Derive a class from FTPHost:
>
>   class MyFTPHost(ftputil.FTPHost):
>       def __init__(self, *args, **kwargs):
>           super(MyFTPHost, self).__init__(*args, **kwargs)
>           self._stat = MyStat(self)
>           self.stat_cache = self._stat._lstat_cache
>
> - To get a listing, instantiate your custom FTPHost class. Clear
> the cache with host.stat_cache.clear() before retrieving a
> directory listing. Also clear the cache explicitly after you no
> longer need the directory data.
>
> Keep in mind that if your software isn't interactive, you most
> probably don't need to worry about tuning at all. [1] ... And
> _with_ interactivity, I just measured: Nautilus 2.24.1 needs
> about 8 seconds to show the directory listing, so I think the
> 13 seconds I got with the pure lstat calls (see above) are not so
> bad! Also remember that most users won't have so many directory
> items frequently. For most directories, you won't notice any
> difference.
>
> If you nevertheless tried out the above idea, I (and maybe other
> readers of the list) would be very thankful if you shared your
> results. :-)
>
> > Is it possible to get a faster
> > listing, or is the code already optimized?
>
> As far as I remember, I haven't optimized the code at all because
> I haven't had a use case yet where the code was too slow for me. I
> don't know how much tuning has gone into lrucache [2], though.
>
> [1] Please read
> http://sschwarzer.com/download/optimization_europython2006.pdf
> if you haven't already done so. :)
>
> [2] http://pypi.python.org/pypi/lrucache/0.2 (It's listed there
> as alpha code, but to my question over two years ago the
> author replied that it was used successfully in several projects
> and that it had comprehensive unit tests. I haven't had any
> complaints either.)
>
> Best regards,
> Stefan
> _______________________________________________
> ftputil mailing list
> ftputil at codespeak.net
> http://codespeak.net/mailman/listinfo/ftputil
>
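
The dictionary-backed cache suggested in the quoted message might look roughly like the sketch below. This is only an illustration and has not been tried against ftputil: the class name DictStatCache is made up, and the method set (enable, disable, resize, clear, invalidate and the mapping methods) is an assumption about what the cache in ftp_stat_cache.py offers, so check it against the real class before using it as the MyCache from the message. The real cache may also raise its own exception type on a miss; KeyError is used here for simplicity.

```python
class DictStatCache(object):
    """Hypothetical unbounded stat cache backed by a plain dict.

    There is no LRU bookkeeping, so stores and lookups are plain dict
    operations. The caller has to clear the cache explicitly.
    """

    def __init__(self):
        self._cache = {}
        self._enabled = True

    def enable(self):
        self._enabled = True

    def disable(self):
        self._enabled = False

    def resize(self, new_size):
        # Accepted for interface compatibility only (assumption):
        # a plain dict has no size limit, so nothing is pruned here.
        pass

    def clear(self):
        # Drop all entries, e.g. before and after listing a directory.
        self._cache.clear()

    def invalidate(self, path):
        # Forget a single path; a missing path is not an error.
        self._cache.pop(path, None)

    def __getitem__(self, path):
        # A disabled cache behaves as if it were empty.
        if not self._enabled:
            raise KeyError(path)
        return self._cache[path]

    def __setitem__(self, path, stat_result):
        # Stores are ignored while the cache is disabled.
        if self._enabled:
            self._cache[path] = stat_result

    def __contains__(self, path):
        return self._enabled and path in self._cache

    def __len__(self):
        return len(self._cache)
```

Because nothing is ever evicted, memory use grows with the number of entries stored, which is why the message recommends clearing the cache before and after each directory listing.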