1 Apr 2011 01:40
Re: [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock
Dave Chinner <david <at> fromorbit.com>
2011-03-31 23:40:50 GMT
On Mon, Mar 28, 2011 at 05:06:28PM +0900, Toshiyuki Okajima wrote:
> Hi.
>
> On Thu, 17 Feb 2011 11:45:52 +0100
> Jan Kara <jack <at> suse.cz> wrote:
> > On Thu 17-02-11 12:50:51, Toshiyuki Okajima wrote:
> > > (2011/02/16 23:56), Jan Kara wrote:
> > > >On Wed 16-02-11 08:17:46, Toshiyuki Okajima wrote:
> > > >>On Tue, 15 Feb 2011 18:29:54 +0100
> > > >>Jan Kara<jack <at> suse.cz> wrote:
> > > >>>On Tue 15-02-11 12:03:52, Ted Ts'o wrote:
> > > >>>>On Tue, Feb 15, 2011 at 05:06:30PM +0100, Jan Kara wrote:
> > > >>>>>Thanks for detailed analysis. Indeed this is a bug. Whenever we do IO
> > > >>>>>under s_umount semaphore, we are prone to deadlock like the one you
> > > >>>>>describe above.
> > > >>>>
> > > >>>>One of the fundamental problems here is that the freeze and thaw
> > > >>>>routines are using down_write(&sb->s_umount) for two purposes. The
> > > >>>>first is to prevent the resume/thaw from racing with a umount (which
> > > >>>>it could do just as well by taking a read lock), but the second is to
> > > >>>>prevent the resume/thaw code from racing with itself. That's the core
> > > >>>>fundamental problem here.
> > > >>>>
> > > >>>>So I think we can solve this by introduce a new mutex, s_freeze, and
> > > >>>>having the the resume/thaw first take the s_freeze mutex and then
> > > >>>>second take a read lock on the s_umount.
> > > >>> Sadly this does not quite work because even down_read(&sb->s_umount)
> > > >>>in thaw_super() can block if there is another process that tries to acquire
> > > >>>s_umount for writing - a situation like:
> > > >>> TASK 1 (e.g. flusher) TASK 2 (e.g. remount) TASK 3 (unfreeze)