Re: flock (fcntl) with QFS Shared FS
Mark Crispin <markrcrispin <at> panda.com>
2009-04-03 22:12:46 GMT
You've stepped on a hornet's nest...
I understand what you want to do. At Messaging Architects (MA), we're
busy at work on a new generation product that offers a clustered message
store. But in the context of UW imapd, you're setting yourself up for a
world of hurt.
 Almost all network filesystems are completely unsuitable for use with
With few exceptions -- one being the Cluster File System (CFS) that we
developed at MA -- network filesystems do NOT, repeat, NOT!! maintain the
data/inode synchronization semantics of a real filesystem.
mbx is utterly dependent upon full filesystem synchronization semantics.
mbx uses random access updates which MUST be in COMPLETE synchronization
at all times. Using mbx with any network filesystem that does not have
those semantics is like playing Russian roulette; you WILL eventually lose
mix format is less dependent -- in fact, we're working on a variant of mix
that uses CAstor! -- but I still wouldn't want to risk a network
filesystem that has not been specifically certified for mix.
By using Solaris, and hence SVR4, you are stuck with the fcntl form of
locking. fcntl locking is broken by design.
To understand the insanity of using fcntl, consider the following
statement from the BSD man pages on fcntl locking:
This interface follows the completely stupid semantics of System
V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all
locks associated with a file for a given process are removed when
ANY file descriptor for that file is closed by that process.
This semantic means that applications must be aware of any files
that a subroutine library may access. For example if an
application for updating the password file locks the password
file database while making the update, and then calls
getpwnam(3) to retrieve a record, the lock will be lost because
getpwnam(3) opens, reads, and closes the password database. The
database close will release all locks that the process has
associated with the database, even if the library routine never
requested a lock on the database. Another minor semantic problem
with this interface is that locks are not inherited by a child
process created using the fork(2) function. The flock(2)
interface has much more rational last close semantics and allows
locks to be inherited by child processes. Flock(2) is
recommended for applications that want to ensure the integrity of
their locks when using library routines or wish to pass locks to
their children. Note that flock(2) and fcntl(2) locks may be
safely used concurrently.
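To make the quoted behavior concrete, here is a minimal sketch in Python (not code from c-client or imapd). The function names and the temporary file are made up for illustration. It assumes a Unix system with a local filesystem, and relies on Python's fcntl.lockf() being a wrapper around fcntl(2) record locks. A forked child probes the lock, because a process never conflicts with its own fcntl locks.

```python
import fcntl
import os
import tempfile


def another_process_can_lock(path):
    """Fork a child; return True if it can take an exclusive fcntl lock."""
    pid = os.fork()
    if pid == 0:
        status = 1  # report failure unless the lock is granted
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)  # never return into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def fcntl_lock_demo():
    """Return (held_before, held_after) around an unrelated open/close."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name  # hypothetical stand-in for a mailbox file
    try:
        fd = os.open(path, os.O_RDWR)
        fcntl.lockf(fd, fcntl.LOCK_EX)  # fcntl-style exclusive lock
        held_before = not another_process_can_lock(path)

        # Stand-in for a library routine that touches the same file:
        # it opens and closes its own descriptor and never asks for a lock.
        other = os.open(path, os.O_RDONLY)
        os.close(other)

        held_after = not another_process_can_lock(path)
        os.close(fd)
        return held_before, held_after
    finally:
        os.unlink(path)


if __name__ == "__main__":
    before, after = fcntl_lock_demo()
    print("lock held before library open/close:", before)
    print("lock held after library open/close: ", after)
```

On a system with the POSIX semantics described in the man page, the lock should be reported as held before the extra open/close and as gone afterwards, even though the descriptor that took the lock is still open.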
Not only must the application be aware of any files that a subroutine
library may access, but it must also be aware of file access interactions
in multiple subroutine libraries and in multiple instances in the same
To put it in a more vivid way: by using Solaris -- even with a local
filesystem -- you are playing the same game of Russian roulette, with
every cylinder in the revolver loaded and the safety set. You're
pulling strenuously and repeatedly on the trigger, all the time assuring
yourself that the safety will hold.
In this case, the safety is the flocksim code that I wrote, including its
elaborate master/slave communication mechanism. I banged my head against
the wall for years in an only partially-successful attempt to keep that
I've seen the safety fail on that revolver many times, and I keep on
fixing it; but every couple of years something new comes up to make it
I do not recommend Solaris or any other form of SVR4 -- including HP-UX
and AIX -- for use with UW or Panda IMAP. Linux and BSD (including Mac OS
X) are suitable operating systems.
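For contrast, a minimal sketch of the flock(2) behavior that the BSD man page recommends, again in Python with made-up function names and a temporary file. It assumes Linux or BSD with a local filesystem. Note that on some network filesystems flock is emulated with fcntl locks, so this says nothing about QFS or NFS.

```python
import fcntl
import os
import tempfile


def another_process_can_flock(path):
    """Fork a child; return True if it can take an exclusive flock lock."""
    pid = os.fork()
    if pid == 0:
        status = 1  # report failure unless the lock is granted
        try:
            fd = os.open(path, os.O_RDWR)  # new open file description
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)  # never return into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def flock_lock_demo():
    """Return (held_before, held_after) around an unrelated open/close."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name  # hypothetical stand-in for a mailbox file
    try:
        fd = os.open(path, os.O_RDWR)
        fcntl.flock(fd, fcntl.LOCK_EX)  # flock-style exclusive lock
        held_before = not another_process_can_flock(path)

        # Same "library routine" pattern: open and close a second
        # descriptor for the file without requesting any lock.
        other = os.open(path, os.O_RDONLY)
        os.close(other)

        held_after = not another_process_can_flock(path)
        os.close(fd)
        return held_before, held_after
    finally:
        os.unlink(path)


if __name__ == "__main__":
    before, after = flock_lock_demo()
    print("lock held before library open/close:", before)
    print("lock held after library open/close: ", after)
```

Here the lock belongs to the open file, not to the (process, file) pair, so closing an unrelated descriptor should leave it in place. Under those assumptions both values come back true.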
 IMAP is a NAS. Network filesystems are also a NAS.
This is a fundamental collision. It usually does not make sense; and
you're trying to do this in a multi-vendor environment.
The new server that I've developed at MA layers IMAP not only over CFS,
but over a stateless store protocol. This was a HUGE undertaking, and was
accomplished only by divorcing mix from IMAP (the layering goes imap ->
store -> mix -> CFS). What's more, numerous changes were made in CFS
specifically to accommodate mix; and store was built from the outset to
None of these lower-level technologies -- store, mix, CFS -- are intended
to be consumed by end users. The consumers are our end-user facing
products, such as the IMAP server. So it's less a "layer NAS on top of
NAS" than it is a "how we implemented the NAS".
And even with all those advantages, it took us MONTHS to get it working.
You don't have any of those advantages. You are attempting to layer one
end-user facing NAS on top of another end-user facing NAS. QFS, NFS, AFS,
blurdybloopFS, etc. ad nauseam know nothing about what mix wants to do.
You're crippled by the fcntl form of locking that requires an elaborate
(ever wonder why imapd is always forking??) kludge to keep locks from
being dropped. And you don't have all the engineers responsible for every
piece of the system working together in one room.
 Last, but not least, you're dealing with SUN.
SUN sells an email product. They are hostile to open-source email software
that competes with their product.
There is more that I can say about that subject.
...As I said, you're setting yourself up for a world of hurt...
With that said, and to answer your specific question:
In the mbx format, the same /tmp directory must be used for all agents
that access the mbx format file.
mix does not depend upon the /tmp directory.
Nonetheless, I don't think that it will help. You have lots of things
working to break you; and for the most part, you are going to be on your
own. I don't support the UW code at all any more (and there are several
known bugs in UW), and I only support the Panda code in specific
Good luck. You are going to need it.
-- Mark --
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Imap-uw mailing list
Imap-uw <at> u.washington.edu