[ftputil] My ftputil extensions (caching/mirror script)
Stefan Schwarzer
sschwarzer at sschwarzer.net
Thu Jul 20 00:19:18 CEST 2006
Hello Martin,
On 2006-07-19 12:08, Martin Wilck wrote:
> (List members: I had sent my code based on ftputil mentioned in the
> "Call for votes on new features" thread to Stefan. If anyone else wants
> to see it, drop me an email, or tell me to post it to the list).
The code is already in a branch on ftputil.sschwarzer.net :-)
http://ftputil.sschwarzer.net/trac/browser/branches/add_stat_caching/ftpsync-0.1
>>> The caching is implemented in a very simplistic manner (just store the
>>> _dir() results), but that saves a lot of execution time for me. I wanted
>>> to re-implement as little ftputil code as possible.
>>
>> Understandable. I'm not yet sure if I'll add caching of _dir()
>> results or of individual stat results.
>
> Caching stat results will be more efficient because, in my current
> implementation, if one entry changes, the stat info of all other files in
> the same directory will have to be invalidated.
The approach and the implementation have to work together. We can
use an existing implementation but we don't _have to_, so we are
free to choose whichever approach we want. :-)
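
To illustrate the difference in invalidation granularity discussed above,
here is a minimal sketch. Both classes and all their method names are
hypothetical and made up for this illustration; neither is part of ftputil
or of the ftpsync code.

```python
# Hypothetical illustration only; these classes are not ftputil code.

class PerDirectoryCache:
    """Cache whole directory listings, keyed by directory path."""

    def __init__(self):
        self._listings = {}  # directory -> {name: stat info}

    def store(self, directory, entries):
        self._listings[directory] = dict(entries)

    def lookup(self, directory, name):
        return self._listings.get(directory, {}).get(name)

    def invalidate(self, directory, name):
        # One changed entry discards the listing of the whole directory.
        self._listings.pop(directory, None)


class PerEntryCache:
    """Cache individual stat results, keyed by (directory, name)."""

    def __init__(self):
        self._stats = {}  # (directory, name) -> stat info

    def store(self, directory, entries):
        for name, info in entries.items():
            self._stats[(directory, name)] = info

    def lookup(self, directory, name):
        return self._stats.get((directory, name))

    def invalidate(self, directory, name):
        # Only the changed entry is dropped; its siblings stay cached.
        self._stats.pop((directory, name), None)


entries = {"a.txt": {"size": 1}, "b.txt": {"size": 2}}
dir_cache = PerDirectoryCache()
entry_cache = PerEntryCache()
for cache in (dir_cache, entry_cache):
    cache.store("/pub", entries)
    cache.invalidate("/pub", "a.txt")
```

After invalidating `a.txt`, the per-directory cache no longer knows about
`b.txt` either, whereas the per-entry cache still does.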
>>> I haven't figured out yet which parts of this code could be integrated
>>> into ftputil, and how. If you are interested, I expect that we'll
>>> consider that together.
>>
>> I don't have a firm idea at this point yet, either. I think there will
>> be a module stat_cache.py and an interface to set cache
>> parameters in the FTPHost class. This interface should be careful
>> to not expose implementation details of the caching, because
>>
>> - the interface should provide a high-level view of the caching
>> (being high-level is IMHO the main strength of ftputil w.r.t.
>> ftplib, after all);
>>
>> - exposing implementation details would make it more difficult to
>> change the caching implementation or details of it later
>>
>> An interface might be
>>
>>     host = ftputil.FTPHost(...)
>>     # set number of cached entries and their maximum age
>>     host.cache_control(max_entries=1000, max_age=10*60)
>>
>> and perhaps an interface for invalidating the cache
>>
>>     host.empty_cache()
>>
>> with
>>
>>     def cache_control(self, max_entries=0, max_age=0):
>>         """
>>         Set parameters to control the cache for dir/file stat
>>         results. There's one cache per `FTPHost` instance.
>>
>>         `max_entries` is the maximum number of entries to store in
>>         the cache; i.e. there will never be more than `max_entries`
>>         stat result items in the cache.
>>
>>         `max_age` is the maximum age of the cache entries; if an
>>         entry is older it will _not_ be reused from the cache but the
>>         information will be fetched from the FTP host.
>>
>>         The defaults for both `max_entries` and `max_age` are 0, which
>>         implicitly disables the cache. This is for backwards
>>         compatibility with old versions of ftputil. These defaults
>>         also avoid possibly unwanted side effects when items on the
>>         server are changed by means other than this `FTPHost` instance.
>>         """
>>
>>     def empty_cache(self):
>>         """
>>         Empty the cache for stat results. After that call, caching
>>         will restart for new stat queries.
>>
>>         To switch off caching for the `FTPHost` instance completely,
>>         use
>>
>>             host.cache_control(max_age=0)
>>         """
>>
>> The names of the methods and their parameters aren't carved in
>> stone.
>
> Sounds very reasonable to me. I am just wondering whether or not you
> should put this functionality into a derived class 'CachingFTPHost', as
> I needed to, or just extend 'FTPHost' itself.
Interface-wise, I would prefer to have the caching in `FTPHost`.
As I wrote, by default the caching will (effectively) be switched
off (to prevent unexpected side effects) so a user of `FTPHost`
can start using the caching feature at any time without having to
change the class.
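
As a rough sketch of the semantics described in the docstrings above, a
cache limited by `max_entries` and `max_age` might look like the following.
The class name `StatCache`, its methods, the `clock` parameter and the
"evict the oldest entry first" policy are all assumptions for illustration;
nothing here is decided, and this is not actual ftputil code.

```python
import time
from collections import OrderedDict


class StatCache:
    """Hypothetical stat cache; not ftputil's actual implementation."""

    def __init__(self, max_entries=0, max_age=0, clock=time.time):
        self.max_entries = max_entries
        self.max_age = max_age  # seconds
        self._clock = clock     # injectable so tests can fake the time
        self._entries = OrderedDict()  # path -> (timestamp, stat result)

    @property
    def enabled(self):
        # Assumption: a zero for either limit switches caching off.
        return self.max_entries > 0 and self.max_age > 0

    def get(self, path):
        """Return the cached stat result, or None if absent or too old."""
        if not self.enabled or path not in self._entries:
            return None
        timestamp, stat_result = self._entries[path]
        if self._clock() - timestamp > self.max_age:
            # Too old: drop it so the caller asks the FTP host again.
            del self._entries[path]
            return None
        return stat_result

    def put(self, path, stat_result):
        """Store a stat result, evicting the oldest entries if full."""
        if not self.enabled:
            return
        self._entries.pop(path, None)
        self._entries[path] = (self._clock(), stat_result)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # oldest insertion first

    def clear(self):
        """Counterpart of the proposed `empty_cache` method."""
        self._entries.clear()
```

With the default arguments the cache stores nothing, so existing code that
uses `FTPHost` would behave as before until the limits are raised.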
Best wishes
Stefan