[ftputil] Concurrent cache accesses
Stefan Schwarzer
sschwarzer at sschwarzer.net
Sat Oct 14 21:36:34 CEST 2006
Hello,
The cache functionality in ftputil 2.2a1 should work fine if all
file system manipulations are done via a single FTPHost
instance. However, changes to files or directories made directly
from a shell account on the remote host may lead to cache
inconsistencies. For most applications, this shouldn't be a
problem.
On the other hand, since each FTPHost object has its own cache,
one object's cache may become inconsistent with the actual state
of the remote file system if that file system is changed via
another FTPHost instance. (Using multiple FTPHost instances is a
natural choice when working with multiple threads, since a single
FTPHost object isn't thread-safe.) For example, consider this
code:
import ftputil
# same server
host1 = ftputil.FTPHost('myhost', 'user', 'password')
host2 = ftputil.FTPHost('myhost', 'user', 'password')
# get stat data
stat1 = host1.stat("some_file")
# remove file via the other FTPHost instance
host2.remove("some_file")
# still gets the old stat data instead of raising an
# ftp_error.PermanentError because remote access is avoided
# and the caches aren't shared!
stat1 = host1.stat("some_file")
host1.close()
host2.close()
A workaround might be to invalidate the cache entry explicitly:
import ftputil
# same server
host1 = ftputil.FTPHost('myhost', 'user', 'password')
host2 = ftputil.FTPHost('myhost', 'user', 'password')
# get stat data
stat1 = host1.stat("some_file")
# remove file via the other FTPHost instance
host2.remove("some_file")
# invalidate cache entry
host1.stat_cache.invalidate("some_file")
# now raises an ftp_error.PermanentError
stat1 = host1.stat("some_file")
host1.close()
host2.close()
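To experiment with both the stale cache entry and the explicit
invalidation without an FTP server, the following toy model may
help. It is only a sketch, not ftputil's implementation:
FakeRemote and CachingClient are hypothetical classes, and
FileNotFoundError stands in for ftp_error.PermanentError.

```python
class FakeRemote:
    """Hypothetical stand-in for the FTP server's file system: path -> size."""
    def __init__(self):
        self.files = {}


class CachingClient:
    """Hypothetical stand-in for an FTPHost with a per-instance stat cache."""
    def __init__(self, remote):
        self.remote = remote
        self.stat_cache = {}  # private to this instance, never shared

    def stat(self, path):
        if path in self.stat_cache:
            return self.stat_cache[path]  # served from cache, no remote access
        if path not in self.remote.files:
            raise FileNotFoundError(path)  # plays the role of PermanentError
        size = self.remote.files[path]
        self.stat_cache[path] = size
        return size

    def remove(self, path):
        del self.remote.files[path]
        self.stat_cache.pop(path, None)  # only *this* instance's cache is updated

    def invalidate(self, path):
        self.stat_cache.pop(path, None)


remote = FakeRemote()
remote.files["some_file"] = 42
host1 = CachingClient(remote)
host2 = CachingClient(remote)

print(host1.stat("some_file"))  # 42, now cached in host1
host2.remove("some_file")       # the file is gone on the "server"
print(host1.stat("some_file"))  # still 42: stale entry in host1's cache

host1.invalidate("some_file")   # the explicit workaround
try:
    host1.stat("some_file")
except FileNotFoundError:
    print("some_file no longer exists")
```

The model makes the root cause visible: remove() only cleans up
the cache of the instance it is called on.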
One might consider using one cache per _host_. This has several
problems, though, if more than one account is used on a single
host. When different accounts are used, e.g. with
host1 = ftputil.FTPHost('myhost', 'user1', 'password1')
host2 = ftputil.FTPHost('myhost', 'user2', 'password2')
it depends on the server configuration whether two seemingly
identical paths actually refer to the same file or directory.
Example: the login directory for user1 may be /home/user1 when
seen from a shell account on the remote host, but appear to be /
when seen by the FTP client (ftputil); the login directory for
user2 may be /home/user2 but _also_ appear to be / when seen by
the FTP client. So, the same path "/some_file" for both logins
will actually refer to _different_ files!
On the other hand, two _different_ paths for two logins may
actually point to the _same_ file, again depending on the server
configuration.
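Both cases can be illustrated with a small path-mapping sketch.
The LOGIN_ROOTS table and real_path() are hypothetical, and the
"admin" login whose "/" is the server's real root is an assumed
configuration, not something from ftputil. The client itself
normally has no way to obtain this mapping.

```python
import posixpath

# Hypothetical server configuration: the real directory each login
# sees as "/". Only the server knows this; the FTP client does not.
LOGIN_ROOTS = {
    ("myhost", "user1"): "/home/user1",
    ("myhost", "user2"): "/home/user2",
    ("myhost", "admin"): "/",
}


def real_path(host, user, ftp_path):
    """Map an absolute path as seen by the FTP client to the server-side path."""
    root = LOGIN_ROOTS[(host, user)]
    # Drop the leading slash, otherwise join() would discard the root.
    return posixpath.normpath(posixpath.join(root, ftp_path.lstrip("/")))


# Same FTP path, different files:
print(real_path("myhost", "user1", "/some_file"))  # /home/user1/some_file
print(real_path("myhost", "user2", "/some_file"))  # /home/user2/some_file
# Different FTP paths, same file:
print(real_path("myhost", "admin", "/home/user1/some_file"))  # /home/user1/some_file
```

A cache keyed only by host and FTP path would therefore mix up
user1's and user2's "/some_file", yet fail to notice that admin's
"/home/user1/some_file" is the same file as user1's "/some_file".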
Sharing the cache between logins with the same host and account
data might be possible, but it would turn out to be very
confusing if the caches of those logins were kept consistent with
each other but not with another login on the same host using
different account data. Happy bug hunting! ;-)
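The following toy model makes this partial consistency concrete.
It assumes dict-based stat caches shared per (host, user) pair;
cache_for() is a hypothetical helper, and the "admin" login
(whose "/" is the server's real root) is an assumed
configuration.

```python
# Hypothetical registry: one stat cache per (host, user) pair.
_shared_caches = {}


def cache_for(host, user):
    """Return the stat cache shared by all logins with this host and user."""
    return _shared_caches.setdefault((host, user), {})


# Two logins as user1 share a cache; admin gets a separate one.
user1_a = cache_for("myhost", "user1")
user1_b = cache_for("myhost", "user1")
admin = cache_for("myhost", "admin")

user1_a["/some_file"] = "stat data"
admin["/home/user1/some_file"] = "stat data"  # same file, different path

# The second user1 login removes the file and invalidates its entry.
user1_b.pop("/some_file", None)

print("/some_file" in user1_a)           # False: shared cache stays consistent
print("/home/user1/some_file" in admin)  # True: admin's entry is now stale
```

Both user1 logins see the removal, while the admin login keeps
returning cached data for a file that no longer exists.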
What do you think about the matter? How should ftputil handle it?
Should ftputil just ignore the issue, requiring "manual"
invalidations? Or should ftputil be made thread-safe, which might
be quite difficult? And even if that worked, it would not protect
against changes by a different Python process on the same local
host, or against changes made directly on the remote host.
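One coarse way to approach thread safety would be to serialize
all method calls on a single instance with a lock, so that threads
share one connection and thus one cache. The sketch below is
generic and untested against ftputil; LockedProxy and Counter are
hypothetical names. Objects returned from calls (for example
remote file objects) would not be protected, all FTP operations
would be serialized, and changes made by other processes or on the
remote host would still go unnoticed.

```python
import threading


class LockedProxy:
    """Hypothetical wrapper that serializes every method call on `target`."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()  # one lock guards all calls

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)

        return locked_call


class Counter:
    """Deliberately non-atomic read-modify-write, like a cache update."""

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value
        self.value = current + 1


counter = LockedProxy(Counter())


def worker():
    for _ in range(1000):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```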
I hope this was understandable. ;-) If not, please ask.
Stefan