[ftputil] My ftputil extensions (caching/mirror script)

Stefan Schwarzer sschwarzer at sschwarzer.net
Thu Jul 20 00:19:18 CEST 2006


Hello Martin,

On 2006-07-19 12:08, Martin Wilck wrote:
> (List members: I had sent my code based on ftputil mentioned in the
> "Call for votes on new features" thread to Stefan. If anyone else wants
> to see it, drop me an email, or tell me to post it to the list).

The code is already in a branch on ftputil.sschwarzer.net :-)
http://ftputil.sschwarzer.net/trac/browser/branches/add_stat_caching/ftpsync-0.1

>>> The caching is implemented in a very simplistic manner (just store the
>>> _dir() results), but that saves a lot of execution time for me. I wanted
>>> to re-implement as little as possible ftputil code.
>>
>> Understandable. I'm not yet sure if I'll add caching of _dir()
>> results or of individual stat results.
>
> Caching stat results will be more efficient because, in my current
> implementation, if one entry changes, the stat info of all other files in
> the same directory will have to be invalidated.

The approach and the implementation have to work together. We can
build on an existing implementation but we don't _have to_, so we
are free to choose whichever approach fits best. :-)
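
To make the difference in invalidation granularity concrete, here is
a rough sketch of the two approaches. The class and method names are
made up for this illustration; neither class is part of ftputil or of
Martin's code.

```python
# Hypothetical illustration only; not ftputil code.

class DirListingCache:
    """Cache keyed by directory: one entry holds the stat info of
    all items in that directory."""

    def __init__(self):
        self._dirs = {}

    def store(self, dirname, stats_by_name):
        self._dirs[dirname] = dict(stats_by_name)

    def lookup(self, dirname, name):
        return self._dirs.get(dirname, {}).get(name)

    def invalidate(self, dirname, name):
        # A change to a single item discards the whole listing,
        # so the stat info of its siblings is lost as well.
        self._dirs.pop(dirname, None)


class PerPathCache:
    """Cache keyed by full path: entries are independent."""

    def __init__(self):
        self._stats = {}

    def store(self, path, stat_result):
        self._stats[path] = stat_result

    def lookup(self, path):
        return self._stats.get(path)

    def invalidate(self, path):
        # Only the changed item is dropped; siblings stay cached.
        self._stats.pop(path, None)
```

With the directory-based cache, invalidating one file forces a new
listing for every sibling; with the per-path cache, the siblings
remain usable.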

>>> I haven't figured out yet which parts of this code could be integrated
>>> into ftputil, and how. If you are interested, I expect that we'll
>>> consider that together.
>>
>> I don't have a firm idea at this point (yet), either. I think there will
>> be a module stat_cache.py and an interface to set cache
>> parameters in the FTPHost class. This interface should be careful
>> to not expose implementation details of the caching, because
>>
>> - the interface should provide a high-level view of the caching
>>   (being high-level is IMHO the main strength of ftputil w.r.t.
>>   ftplib, after all);
>>
>> - exposing implementation details would make it more difficult to
>>   change the caching implementation or details of it later.
>>
>> An interface might be
>>
>> host = ftputil.FTPHost(...)
>> # set number of cached entries and their maximum age
>> host.cache_control(max_entries=1000, max_age=10*60)
>>
>> and perhaps an interface for invalidating the cache
>>
>> host.empty_cache()
>>
>> with
>>
>> def cache_control(self, max_entries=0, max_age=0):
>>     """
>>     Set parameters to control the cache for dir/file stat
>>     results. There's one cache per `FTPHost` instance.
>>
>>     `max_entries` is the maximum number of entries to store in
>>     the cache; i.e. there will never be more than `max_entries`
>>     stat result items in the cache.
>>
>>     `max_age` is the maximum age of the cache entries; if an
>>     entry is older it will _not_ be reused from the cache but the
>>     information will be fetched from the FTP host.
>>
>>     The default for both `max_entries` and `max_age` is 0, which
>>     implicitly disables the cache. This is for backward
>>     compatibility with old versions of ftputil. These settings
>>     also avoid possibly unwanted side effects when items on the
>>     server are changed by means other than this `FTPHost` instance.
>>     """
>>
>> def empty_cache(self):
>>     """
>>     Empty the cache for stat results. After that call, caching
>>     will restart for new stat queries.
>>
>>     To switch off caching for the `FTPHost` instance completely,
>>     use
>>
>>     host.cache_control(max_age=0)
>>     """
>>
>> The names of the methods and their parameters aren't carved in
>> stone.
>
> Sounds very reasonable to me. I am just wondering whether or not you
> should put this functionality into a derived class 'CachingFTPHost', as
> I needed to, or just extend 'FTPHost' itself.

Interface-wise, I would prefer to have the caching in `FTPHost`.
As I wrote, by default the caching will (effectively) be switched
off (to prevent unexpected side effects) so a user of `FTPHost`
can start using the caching feature at any time without having to
change the class.
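
As a minimal sketch of what could sit behind such an interface, here
is a cache that honors `max_entries` and `max_age` and is inactive
while either is 0. Everything in it is an assumption for illustration:
the class name `StatCache`, the `clock` parameter (only there to make
the age check testable) and the policy of evicting the oldest stored
entry first. It is not the eventual ftputil implementation.

```python
import time
from collections import OrderedDict


class StatCache:
    """Hypothetical sketch of a stat cache; not ftputil code."""

    def __init__(self, max_entries=0, max_age=0, clock=time.time):
        self.max_entries = max_entries
        self.max_age = max_age  # seconds
        self._clock = clock
        # Maps path -> (time stored, stat result), oldest first.
        self._entries = OrderedDict()

    @property
    def enabled(self):
        # With the default of 0 for either setting, nothing is cached.
        return self.max_entries > 0 and self.max_age > 0

    def store(self, path, stat_result):
        if not self.enabled:
            return
        # Re-inserting moves the path to the "newest" end.
        self._entries.pop(path, None)
        self._entries[path] = (self._clock(), stat_result)
        while len(self._entries) > self.max_entries:
            # Assumed policy: drop the entry stored longest ago.
            self._entries.popitem(last=False)

    def lookup(self, path):
        """Return the cached stat result, or None on a miss."""
        if not self.enabled:
            return None
        item = self._entries.get(path)
        if item is None:
            return None
        stored_at, stat_result = item
        if self._clock() - stored_at > self.max_age:
            # Too old: forget it so the caller asks the server again.
            del self._entries[path]
            return None
        return stat_result

    def clear(self):
        self._entries.clear()
```

In this sketch, `cache_control` would only have to set the two
attributes and `empty_cache` would call `clear()`, so none of the
internals leak into the `FTPHost` interface.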

Best wishes
Stefan

