Stewart Smith | 1 Jul 2005 03:09

Re: XFS corruption during power-blackout

On Thu, 2005-06-30 at 11:46 -0700, Chris Wedgwood wrote:
> Yes, but POSIX is broken in places.  The linux implementation (now and
> for sometime but not always) won't return until all dirty data is
> flushed.

POSIX, with regard to fsync(), provides "flexibility for the
implementation": maybe your environment is special and you don't buffer
anything, so fsync() is a no-op. Or perhaps you cannot control some of the
disk caches, so fsync() is a no-op.

In newer systems, you can check for the option _POSIX_SYNCHRONIZED_IO
which, if supported, guarantees that fsync() synchronously flushes
buffers to disk. However, this only came into the spec in 1999 or 2000, I
think, so there are still a lot of systems on which you have to know the
behaviour.
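A minimal sketch of that check, assuming a POSIX system: `_POSIX_SYNCHRONIZED_IO` in `<unistd.h>` answers at compile time (a value above 0 means supported, -1 means not), and when it is 0 or absent the answer has to come from `sysconf(_SC_SYNCHRONIZED_IO)` at run time. The helper name `synchronized_io_supported` is hypothetical. Even a "yes" only describes what the OS promises; as noted elsewhere in this thread, a drive with write-back caching can still be holding the data.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical helper: returns 1 if the implementation claims the
 * POSIX Synchronized I/O option, 0 otherwise. */
int synchronized_io_supported(void)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return 1;   /* option is always supported where this was compiled */
#elif defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO == -1
    return 0;   /* option is explicitly not supported */
#elif defined(_SC_SYNCHRONIZED_IO)
    /* Value 0 or undefined: the answer is only known at run time. */
    return sysconf(_SC_SYNCHRONIZED_IO) > 0;
#else
    return 0;   /* no way to ask; assume the weaker guarantee */
#endif
}
```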

> > and some 'sync' programs do multiple sync()s.
> 
> Such programs are arguably broken (grub maybe?).  If one doesn't work,
> then why should doing it <n>-times?

It's a legacy from the days when sync was an asynchronous operation. The
idea was that the time it took to type sync and press Enter three times
(typing it out each time, not using up-arrow and Enter) would be long enough
for the buffers whose flush started with the first sync to hit disk.

> > And it's also filesystem-type-dependent.
> 
> If a filesystem doesn't flush reliably with sync, I would call that a
> bug.
[...]

Yura Pakhuchiy | 1 Jul 2005 04:03

Print inode number if name cannot be represented in the local charset

Hi Anton,

This patch changes the message in unistr.c from ntfs_error to ntfs_debug
when a Unicode filename contains characters that cannot be converted into
the local charset, because the message is not useful to the user and the
condition is not actually an error. Instead, a warning is printed in dir.c;
this warning includes the inode number for further investigation (e.g.
with ntfsinfo).

Signed-off-by: Yura Pakhuchiy <pakhuchiy <at> gmail.com>

-- 
Best regards,
        Yura
Suparna Bhattacharya | 1 Jul 2005 09:56

aio-stress throughput regressions from 2.6.11 to 2.6.12

Has anyone else noticed major throughput regressions for random
reads/writes with aio-stress in 2.6.12?
Or have there been any other FS/IO regressions lately?

On one test system I see a degradation from around 17+ MB/s to 11 MB/s
for random O_DIRECT AIO (aio-stress -o3 testext3/rwfile5) from 2.6.11
to 2.6.12. It doesn't seem filesystem specific. Not good :(
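For scale, taking 17 MB/s as the baseline (the "17+" makes it a lower bound), the random O_DIRECT AIO throughput fell by roughly a third:

```latex
\frac{17\,\text{MB/s} - 11\,\text{MB/s}}{17\,\text{MB/s}} = \frac{6}{17} \approx 0.35
```

A higher baseline only makes this ratio larger, so the loss on that system is at least about 35%.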

BTW, Chris/Ben, it doesn't look like the changes to aio.c have had an impact
(I copied those back to my 2.6.11 tree and tried the runs with no effect),
so it is something else...

Ideas/thoughts/observations ?

Regards
Suparna

-- 
Suparna Bhattacharya (suparna <at> in.ibm.com)
Linux Technology Center
IBM Software Lab, India

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo <at> kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: aart <at> kvack.org

David Masover | 1 Jul 2005 10:17

Re: XFS corruption during power-blackout

Chris Wedgwood wrote:
> On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote:
> 
> 
>>What I found were 4 things in the dest dir:
>>1. Missing Dirs,Files. That's OK.
>>2. Files of size 0. That's acceptable.
>>3. Corrupted Files. That's unacceptable.
>>4. Corrupted Files with original fingerprint. That's ABSOLUTELY
>>unacceptable.
> 
> 
> disk usually default to caching these days and can lose data as a
> result, disable that

Not always possible.  Some disks lie and leave caching on anyway.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Pavel Fedin | 1 Jul 2005 19:04

NLS for HFS

  Hello Roman!
  Previously I agreed to the inclusion of your version of the NLS patch
into the HFS filesystem. I see that 2.6.12 has been released, but the
patch is not in it. What's wrong with it?
  I'd like to finish the work.

-- 
  Kind regards, Pavel Fedin

Jens Axboe | 1 Jul 2005 11:24

Re: XFS corruption during power-blackout

On Fri, Jul 01 2005, David Masover wrote:
> Chris Wedgwood wrote:
> >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote:
> >
> >
> >>What I found were 4 things in the dest dir:
> >>1. Missing Dirs,Files. That's OK.
> >>2. Files of size 0. That's acceptable.
> >>3. Corrupted Files. That's unacceptable.
> >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY
> >>unacceptable.
> >
> >
> >disk usually default to caching these days and can lose data as a
> >result, disable that
> 
> Not always possible.  Some disks lie and leave caching on anyway.

And the same (and others) disks will not honor a flush anyways. Moral of
that story - avoid bad hardware.

-- 
Jens Axboe

Ric Wheeler | 1 Jul 2005 14:36

Re: XFS corruption during power-blackout

Chris Wedgwood wrote:

>On Thu, Jun 30, 2005 at 09:44:37PM +0200, Jörn Engel wrote:
>
>>Or do you rather mean that a single sync() should block until all data
>>currently present is hardened?
>
>Logically sync() should return only after all dirty buffers that
>existed before sync() was called are flushed.
>
>Anything more than this (i.e. waiting on newly (since sync was called
>but before it returns) dirtied buffers) could live-lock (actually,
>this used to happen sometimes, I don't know if that's the case).
>
I think that we need one more stage in sync() behavior to make sure that 
the data is safely on the platter.  For file systems with supported 
write barriers, the last IO should be wrapped with a barrier to flush 
the disk cache.

It doesn't seem that sync() does that in today's code.
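Chris's rule quoted above (flush everything that was dirty when sync() was called, but do not chase buffers dirtied afterwards) can be illustrated with a toy model. This is not kernel code: `struct model_buf`, `mark_dirty`, `redirty_first` and `model_sync` are hypothetical, and "flushing" just clears a flag. The snapshot taken at entry bounds the pass even while a writer keeps re-dirtying a buffer; a loop that instead ran until no dirty buffers remained would never return against such a writer. The model says nothing about the drive's own cache, which is the extra barrier step Ric is asking for.

```c
#include <stddef.h>

/* Toy model only: each buffer remembers when it went from clean to dirty. */
struct model_buf {
    int dirty;
    unsigned long dirtied_at;
};

static unsigned long epoch;  /* advances on every write */

void mark_dirty(struct model_buf *b)
{
    ++epoch;
    if (!b->dirty) {          /* re-dirtying a dirty buffer keeps its age */
        b->dirty = 1;
        b->dirtied_at = epoch;
    }
}

/* A writer that never stops: re-dirties buffer 0 on every call. */
void redirty_first(struct model_buf *bufs, size_t n)
{
    if (n > 0)
        mark_dirty(&bufs[0]);
}

/* Flush only buffers that were already dirty when the call began.
 * `concurrent_writer` (may be NULL) simulates other processes writing
 * while the flush is in progress. Returns the number of buffers flushed. */
size_t model_sync(struct model_buf *bufs, size_t n,
                  void (*concurrent_writer)(struct model_buf *, size_t))
{
    const unsigned long snapshot = epoch;  /* "before sync() was called" */
    size_t flushed = 0;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].dirty && bufs[i].dirtied_at <= snapshot) {
            bufs[i].dirty = 0;             /* stands in for write + wait */
            flushed++;
            if (concurrent_writer != NULL)
                concurrent_writer(bufs, n);
        }
    }
    /* Buffers dirtied after `snapshot` are left for the next sync, so the
     * loop runs at most n times however busy the writers are. */
    return flushed;
}
```

With three dirty buffers and `redirty_first` as the writer, `model_sync` flushes all three and returns, leaving buffer 0 dirty again for the next call.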

Ric Wheeler | 1 Jul 2005 14:53

Re: XFS corruption during power-blackout

Bryan Henderson wrote:

>It's because of the words before that:  "everything that was buffered when
>sync() started is hardened before the next sync() returns."  The point is that
>the second sync() is the one that waits (it actually waits for the 
>previous one to finish before it starts).  By the way, I'm not talking 
>about Linux at this point.  I'm talking about so-called POSIX systems in 
>general.
>
>But it does sound like Linux has a pretty firm philosophy of synchronous 
>sync (I see it documented in an old man page), so I guess it's OK to rely 
>on it.
>
>There are scenarios where you'd rather not have a process tied up while 
>syncing takes place.  Stepping back, I would guess the primary original 
>purpose of sync() was to allow you to make a sync daemon.  Early Unix 
>systems did not have in-kernel safety clean timers.  A user space process 
>did that.
>
>--
>Bryan Henderson                     IBM Almaden Research Center
>San Jose CA                         Filesystems
>
We have been playing around with various sync techniques that allow you 
to get good data safety for a large batch of files (think of a restore 
of a file system or a migration of lots of files from one server to 
another).  You can always restart a restore if the box goes down in the 
[...]
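Ric's actual techniques are cut off above. For reference, one common pattern for a batch of files on a POSIX system is to fsync() each file after writing it and then fsync() the containing directory once, so the new directory entries are durable too. The sketch below assumes that pattern; `write_batch_durably` and `write_all` are hypothetical names, and the only batching is the single directory fsync. It can only guarantee what the kernel guarantees: as discussed in this thread, if the drive acknowledges writes from its cache and no flush or barrier reaches it, data can still be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: create n files in dir, fsync() each one, then
 * fsync() the directory once. Returns 0 on success, -1 on first error. */
int write_batch_durably(const char *dir, const char *const names[],
                        const char *const contents[], size_t n)
{
    char path[4096];

    for (size_t i = 0; i < n; i++) {
        int k = snprintf(path, sizeof path, "%s/%s", dir, names[i]);
        if (k < 0 || (size_t)k >= sizeof path)
            return -1;
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write_all(fd, contents[i], strlen(contents[i])) != 0 ||
            fsync(fd) != 0) {              /* file data and metadata */
            close(fd);
            return -1;
        }
        if (close(fd) != 0)
            return -1;
    }
    /* One fsync on the directory makes the new entries durable. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```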

Jens Axboe | 1 Jul 2005 14:56

Re: XFS corruption during power-blackout

On Fri, Jul 01 2005, Ric Wheeler wrote:
> Chris Wedgwood wrote:
>
> >On Thu, Jun 30, 2005 at 09:44:37PM +0200, Jörn Engel wrote:
> >
> >>Or do you rather mean that a single sync() should block until all data
> >>currently present is hardened?
> >
> >Logically sync() should return only after all dirty buffers that
> >existed before sync() was called are flushed.
> >
> >Anything more than this (i.e. waiting on newly (since sync was called
> >but before it returns) dirtied buffers) could live-lock (actually,
> >this used to happen sometimes, I don't know if that's the case).
> >
> I think that we need one more stage in sync() behavior to make sure that 
> the data is safely on the platter.  For file systems with supported 
> write barriers, the last IO should be wrapped with a barrier to flush 
> the disk cache.
> 
> It doesn't seem that sync() does that in today's code.

That is true: sync() really only guarantees that the I/O has been issued
and that the drive has signalled completion. With write-back caching on,
the data might not be on the platter yet.
[...]

Ric Wheeler | 1 Jul 2005 15:57

Re: XFS corruption during power-blackout

Rogério Brito wrote:

>On Jul 01 2005, Jens Axboe wrote:
>
>>On Fri, Jul 01 2005, David Masover wrote:
>>
>>>Not always possible.  Some disks lie and leave caching on anyway.
>>
>>And the same (and others) disks will not honor a flush anyways.
>>Moral of that story - avoid bad hardware.
>
>But how does the end-user know what hardware is "good hardware"? Which
>vendors don't lie (or, at least, lie less than others) regarding HDs?
>
>Thanks, Rogério Brito.
>
The only real way is to test the drive (and retest when you get a new
version of firmware) and the whole fsync -> write barrier code path.

We use a bus analyzer to make sure that when you fsync() a file, you 
will see a cache flush command coming across the bus. Of course, that is 
the easy step ;-)
[...]
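A bus analyzer is the reliable way to see the cache flush. A much cruder software-only heuristic, and not what Ric describes, is to time repeated small fsync() calls: a 7200 rpm disk that really writes each one to the platter can complete roughly one per rotation (60/7200 s, about 8.3 ms), so averages far below that suggest the acknowledgement came from a cache, or that no flush was sent at all. `mean_fsync_usec` is a hypothetical name, and the result also depends on the filesystem and its journal, so treat it as a hint rather than proof.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical probe: overwrite one byte at offset 0 and fsync() it
 * `iters` times. Returns the mean latency per write+fsync in
 * microseconds, or -1.0 on error. */
double mean_fsync_usec(const char *path, int iters)
{
    struct timespec t0, t1;
    int fd;

    if (iters <= 0)
        return -1.0;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        close(fd);
        return -1.0;
    }
    for (int i = 0; i < iters; i++) {
        char c = (char)('a' + i % 26);
        if (pwrite(fd, &c, 1, 0) != 1 || fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
        close(fd);
        return -1.0;
    }
    close(fd);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e6 +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e3) / iters;
}
```

Running it on the filesystem under test with the drive's write cache enabled and then disabled, and comparing the two averages, is more telling than any single number.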

