[ftputil] My ftputil extensions (caching/mirror script)
Martin Wilck
martin.wilck at fujitsu-siemens.com
Wed Jul 19 12:08:16 CEST 2006
Hi Stefan,
> you sent your mail in private. I don't know for sure if the
> mailing list strips attachments, so the private mail might
> actually have been sensible. ;-) What do you think about
> continuing the discussion on the mailing list?
Fine with me. I thought it'd be good to let you see the code first, and
that not all of it would be of interest to the list members. CC'ing the
list now.
(List members: I had sent Stefan the ftputil-based code that I mentioned
in the "Call for votes on new features" thread. If anyone else wants
to see it, drop me an email, or tell me to post it to the list.)
>>The caching is implemented in a very simplistic manner (just store the
>>_dir() results), but that saves a lot of execution time for me. I wanted
>>to re-implement as little ftputil code as possible.
>
>
> Understandable. I'm not yet sure if I'll add caching of _dir()
> results or of individual stat results.
Caching stat results will be more efficient because, in my current
implementation, if one entry changes, the stat info of all other files in
the same directory has to be invalidated as well.
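A minimal sketch of that difference, assuming stat info can be stored as
plain values. The class names `PerDirectoryCache` and `PerEntryCache` are
hypothetical and are not taken from ftputil or from the code discussed
here:

```python
class PerDirectoryCache:
    """Hypothetical cache that stores whole directory listings."""

    def __init__(self):
        self._listings = {}  # directory path -> {entry name: stat info}

    def store(self, directory, entries):
        self._listings[directory] = dict(entries)

    def lookup(self, directory, name):
        return self._listings.get(directory, {}).get(name)

    def invalidate(self, directory, name):
        # One changed entry discards the listing for the whole directory,
        # so the stat info of every sibling is lost as well.
        self._listings.pop(directory, None)


class PerEntryCache:
    """Hypothetical cache that stores one stat result per entry."""

    def __init__(self):
        self._stats = {}  # (directory path, entry name) -> stat info

    def store(self, directory, entries):
        for name, info in entries.items():
            self._stats[(directory, name)] = info

    def lookup(self, directory, name):
        return self._stats.get((directory, name))

    def invalidate(self, directory, name):
        # Only the changed entry is dropped; siblings stay cached.
        self._stats.pop((directory, name), None)
```

With the per-directory variant, invalidating one file forces a new
directory listing for every later lookup in that directory; with the
per-entry variant, only the changed file has to be fetched again.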
>>The handling of case-insensitive FTP servers took me much more work, a
>>lot of the code is actually devoted to that (you might figure that I
>>came up with that because I had a lot of trouble with such a server).
>
>
> I won't approach case-insensitive servers for now; I'll tackle
> the caching first. In fact, I try to solve this in a rather
> minimalistic way, without adding code that is unnecessary for
> the caching task.
Agreed. I suspect that the case-insensitive-server thing is a very
specific problem of mine. And it is pretty hard to say what is The Right
Thing(TM) when uploading from a case-sensitive system to a
case-insensitive system.
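One part of the problem can at least be made visible before an upload:
names that differ only in case will collide on the target. The sketch
below is only an illustration. The function name `find_case_collisions`
is hypothetical, and using `str.casefold()` is an assumption; a real
server may fold case by different rules.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by case.

    Assumption: the target server treats two names as equal when their
    casefolded forms are equal.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups in which more than one local name maps to
    # the same folded name.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

What to do with each reported group (refuse, rename, or let the last
upload win) is exactly the open policy question.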
>>I haven't figured out yet which parts of this code could be integrated
>>into ftputil, and how. If you are interested, I expect that we'll
>>consider that together.
>
>
> I don't have a firm idea at this point yet, either. I think there will
> be a module stat_cache.py and an interface to set cache
> parameters in the FTPHost class. This interface should be careful
> to not expose implementation details of the caching, because
>
> - the interface should provide a high-level view of the caching
> (being high-level is IMHO the main strength of ftputil w.r.t.
> ftplib, after all);
>
> - exposing implementation details would make it more difficult to
> change the caching implementation or details of it later
>
> An interface might be
>
>     host = ftputil.FTPHost(...)
>     # set number of cached entries and their maximum age
>     host.cache_control(max_entries=1000, max_age=10*60)
>
> and perhaps an interface for invalidating the cache
>
>     host.empty_cache()
>
> with
>
>     def cache_control(self, max_entries=0, max_age=0):
>         """
>         Set parameters to control the cache for dir/file stat
>         results. There's one cache per `FTPHost` instance.
>
>         `max_entries` is the maximum number of entries to store in
>         the cache; i.e. there will never be more than `max_entries`
>         stat result items in the cache.
>
>         `max_age` is the maximum age of the cache entries; if an
>         entry is older it will _not_ be reused from the cache but the
>         information will be fetched from the FTP host.
>
>         The defaults for both `max_entries` and `max_age` are 0, which
>         implicitly disables the cache. This is for backwards
>         compatibility with old versions of ftputil. These settings
>         also avoid possibly unwanted side effects on items changing
>         on the server by other ways than this `FTPHost` instance.
>         """
>
>     def empty_cache(self):
>         """
>         Empty the cache for stat results. After that call, caching
>         will restart for new stat queries.
>
>         To switch off caching for the `FTPHost` instance completely,
>         use
>
>             host.cache_control(max_age=0)
>         """
>
> The names of the methods and their parameters aren't carved in
> stone.
Sounds very reasonable to me. I am just wondering whether or not you
should put this functionality into a derived class 'CachingFTPHost', as
I needed to, or just extend 'FTPHost' itself.
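Either way, the cache itself could live in a small separate object that
the host class merely holds. The following is a sketch of the proposed
`max_entries`/`max_age` semantics only, not ftputil code: the class
`BoundedStatCache`, its method names, and the injectable `clock`
parameter are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class BoundedStatCache:
    """Hypothetical stat cache limited by entry count and entry age."""

    def __init__(self, max_entries=0, max_age=0, clock=time.monotonic):
        self.max_entries = max_entries
        self.max_age = max_age
        self._clock = clock            # injectable to make testing easy
        self._entries = OrderedDict()  # path -> (timestamp, stat result)

    def enabled(self):
        # As in the proposal, the defaults of 0 disable caching.
        return self.max_entries > 0 and self.max_age > 0

    def put(self, path, stat_result):
        if not self.enabled():
            return
        # Re-inserting moves the path to the "newest" end.
        self._entries.pop(path, None)
        self._entries[path] = (self._clock(), stat_result)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry

    def get(self, path):
        if not self.enabled():
            return None
        item = self._entries.get(path)
        if item is None:
            return None
        timestamp, stat_result = item
        if self._clock() - timestamp > self.max_age:
            # Too old: forget it so the caller asks the server again.
            del self._entries[path]
            return None
        return stat_result

    def clear(self):
        self._entries.clear()
```

A `cache_control()` method would then only set `max_entries` and
`max_age` on such an object, and `empty_cache()` would call `clear()`,
regardless of whether this is done in `FTPHost` or in a derived class.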
>>I hope you like the stuff, looking forward to your feedback,
>
>
> I read your code without trying to understand every detail. So
> far, it seems very understandable for what it's trying to solve.
> _For ftputil_, though, I'll try _not_ to write general-purpose
> code, but just enough to implement the caching to avoid fetching
> directory lists for each single file/directory entry. I like the
> YAGNI and DTSTTCPW principles
> ( http://c2.com/xp/YouArentGonnaNeedIt.html and
> http://c2.com/cgi/wiki?DoTheSimplestThingThatCouldPossiblyWork ).
I am not sure I fully understand what you mean, but with these
principles in mind and with the goal you're stating, simply caching
_dir() results might be the best thing you can do.
Regards
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck at Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy