Helge Hafting | 1 Jan 14:39 2005

Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard

On Thu, Dec 30, 2004 at 10:50:55PM +0300, Michael Tokarev wrote:
> Peter T. Breuer wrote:
> >In gmane.linux.raid Georg C. F. Greve <greve@...> wrote:
> >
> >Yes, well, don't put the journal on the raid partition. Put it
> >elsewhere (anyway, journalling and raid do not mix, as write ordering
> >is not - deliberately - preserved in raid, as far as I can tell).
> 
> This is a sort of a nonsense, really.  Both claims, it seems.
> I can't say for sure whenever write ordering is preserved by
> raid -- it should, and if it isn't, it's a bug and should be
> fixed.  Nothing else is wrong with placing journal into raid
> (the same as the filesystem in question).  Suggesting to remove
> journal just isn't fair: the journal is here for a reason.
> And, finally, the kernel should not crash.  If something like
> this is unsupported, it should refuse to do so, instead of
> crashing randomly.

As I understand it, write-ordering trouble shouldn't crash the kernel.
If the machine crashes for some other reason, bad write ordering could
leave your journalled fs lost or inconsistent.  But the ordering trouble
itself shouldn't cause a crash, and all should be fine as soon as all
the writes complete without other incidents.

Helge Hafting

jan | 1 Jan 17:31 2005

Software RAID 1 boot procedure hangs after reset

Hi there,
I have a Debian sarge system with kernel 2.6.6. After a hard reset the
boot process hangs at recognising/mounting the root file system, but says
it can start /dev/mdX with 2 out of 2 disks.
The error message is:
 pivot_root: No such file or directory
    /sbin/init: 424: cannot open dev/console: no such file
Exactly the same as
http://lists.debian.org/debian-kernel/2004/08/msg01301.html.
I am using an initrd with SATA and raid1 modules.
The partition type is fd (Linux raid autodetect) and autofs is compiled into
the kernel.
I tried the Debian default kernel (2.6.3); it detects a corrupted RAID
(starting with 1 out of 2 devices) and gives a different error message.
I repaired it by using dd and an old image of the root filesystem. But I
do not understand the big picture here. Why does one kernel detect a fully
synchronised RAID and the other kernel doesn't?
Actually it happened twice. The first time there was a message that
some filesystem had been mounted too many times and needed a check; after
that the system started normally. But before this message it couldn't
find the root fs. A friend of mine pointed out to me that the time the
kernel waits to autodetect the fs is/was probably too short.
In general, I googled for information about the problem but couldn't
find a good resource. Every howto explains how to set up software RAID,
but none explains how to repair a corrupted root fs.

jan <at>          _ __
 _    _____ (_) /____ ____   ___  _______ _
| |/|/ / -_) / __/ _ `/ _ \_/ _ \/ __/ _ `/
|__,__/\__/_/\__/\_,_/_//_(_)___/_/  \_, /

Axel Christiansen | 2 Jan 19:03 2005

raid5 resync 2.6.10 freez

Hello,

I recently upgraded to a 4-disk IDE RAID5. Since then the array
has not been able to resync itself.

linux 2.6.10 with software-raid
mdadm - v1.5.0 - 22 Jan 2004
4 X IDE DISK WDC WD1600JB-00FUA0 (160 GB)
2 X HPT374 (rev 07) IDE Controller
Epox 8K5-A3+
Duron 1500 MHz CPU

After 5 to 10 minutes the OS freezes. The system console does
not show anything.

#
# Kernel hacking
#
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_SCHEDSTATS=y
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
CONFIG_DEBUG_KOBJECT=y
# CONFIG_DEBUG_HIGHMEM is not set
# CONFIG_DEBUG_INFO is not set
CONFIG_FRAME_POINTER=y
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_STACKOVERFLOW is not set

Andy Smith | 2 Jan 20:42 2005

ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

On Thu, Dec 30, 2004 at 10:39:42PM +0100, Peter T. Breuer wrote:
> In gmane.linux.raid Michael Tokarev <mjt <at> tls.msk.ru> wrote:
> > Peter T. Breuer wrote:
> > > In gmane.linux.raid Georg C. F. Greve <greve <at> fsfeurope.org> wrote:
> > > 
> > > Yes, well, don't put the journal on the raid partition. Put it
> > > elsewhere (anyway, journalling and raid do not mix, as write ordering
> > > is not - deliberately - preserved in raid, as far as I can tell).
> > 
> > This is a sort of a nonsense, really.  Both claims, it seems.
> 
> It's perfectly correct, as far as I know!

Not really wishing to get into the middle of a flame war, but I
didn't really see how this could be true so I asked for more info on
ext3-users.

I got the following response:

https://listman.redhat.com/archives/ext3-users/2005-January/msg00003.html

Peter T. Breuer | 2 Jan 21:18 2005

Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

Andy Smith <andy <at> strugglers.net> wrote:
> [-- text/plain, encoding quoted-printable, charset: us-ascii, 22 lines --]
> 
> On Thu, Dec 30, 2004 at 10:39:42PM +0100, Peter T. Breuer wrote:
> > In gmane.linux.raid Michael Tokarev <mjt <at> tls.msk.ru> wrote:
> > > Peter T. Breuer wrote:
> > > > In gmane.linux.raid Georg C. F. Greve <greve <at> fsfeurope.org> wrote:
> > > > 
> > > > Yes, well, don't put the journal on the raid partition. Put it
> > > > elsewhere (anyway, journalling and raid do not mix, as write ordering
> > > > is not - deliberately - preserved in raid, as far as I can tell).
> > > 
> > > This is a sort of a nonsense, really.  Both claims, it seems.
> > 
> > It's perfectly correct, as far as I know!
> 
> Not really wishing to get into the middle of a flame war, but I
> didn't really see how this could be true so I asked for more info on
> ext3-users.
> 
> I got the following response:
> 
> https://listman.redhat.com/archives/ext3-users/2005-January/msg00003.html

Interesting - I'll post it (there is no flame war):

>     * From: "Stephen C. Tweedie" <sct redhat com>
>     * To: Andy Smith <andy lug org uk>
>     * Cc: Stephen Tweedie <sct redhat com>, ext3 users list <ext3-users
>     * redhat com>

Andy Smith | 3 Jan 01:30 2005

Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

On Sun, Jan 02, 2005 at 09:18:13PM +0100, Peter T. Breuer wrote:
> Stephen Tweedie wrote:
> > Umm, if soft raid is expected to have silent invisible corruptions in
> > normal use,
> 
> It is, just as is all types of RAID.  This is a very strange thing for
> Stephen to say - I cannot believe that he is as naive as he makes
> himself out to be about RAID here and I don't know why he should say
> that (presuming that he really knows better).
> 
> > then you shouldn't be using it, period.  That's got zero to
> > do with journaling.
> 
> It implies that one should not be doing journalling on top of it.
> 
> (The logic for why RAID corrupts silently is that errors accumulate at
> n times the normal rate per sector, but none of them are detected by
> RAID (no crc), and when a disk drops out then you get a good chance of
> picking up a corrupted copy instead of a good copy, because nobody
> has checked the copy meanwhiles to see if it matches the original).

I have no idea which of you to believe now. :(

I currently only have one system using software raid, and several of
my employer's machines using hardware raid, all of which have
various raid-1, -5 and -10 setups and all use only ext3.

Let's focus on my personal machine for now, since it uses
Linux software RAID and is therefore on-topic here.  It has /boot on a
small RAID-1, and the rest of the system is on RAID-5 with an

Neil Brown | 3 Jan 06:49 2005

Re: Regarding RAID0 Code

On Saturday January 1, poonamsbox <at> yahoo.com wrote:
> Respected Sir,
>  
>             I am a student from India. I am studying the RAID 0 code
>             in the linux kernel 2.4.20. I got some problems in
>             understanding the code, especially in the
>             raid0_make_request function. I didnt understand the
>             following statements.. like why the anding is done, and
>             what is sect_in_chunk. Please if you could tell me the
>             details, it would be a very great help for me. Please
>             sir, hoping for some help. 
>  
>         sect_in_chunk = bh->b_rsector & ((chunk_size<<1) -1);
>         chunk = (block - zone->zone_offset) / (zone->nb_dev << chunksize_bits);
>         tmp_dev = zone->dev[(block >> chunksize_bits) % zone->nb_dev];
>         rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
>                 + sect_in_chunk;

"sect_in_chunk" is the number of the target sector within its chunk,
i.e. the remainder when the target sector is divided by the chunk
size.

If it still doesn't make sense, try writing the code for mapping an
array sector number to a (device number + device sector number).  Then
ask specifically about differences.
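
A minimal sketch of such a mapping, under simplifying assumptions that
are not taken from the kernel source: a single zone, nb_dev equal-size
devices, and a chunk size (in sectors) that is a power of two.  The
names raid0_map and struct raid0_target are made up for illustration.
It also shows why the AND works: for a power-of-two chunk size,
sector & (chunk_sectors - 1) is the remainder of sector / chunk_sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which member device, and which sector on it. */
struct raid0_target {
    unsigned dev;        /* index of the member device */
    uint64_t dev_sector; /* sector number on that device */
};

/*
 * Map an array sector to (device, device sector) for a simplified RAID 0:
 * one zone, nb_dev equal-size devices, chunk_sectors a power of two.
 */
struct raid0_target raid0_map(uint64_t sector, unsigned nb_dev,
                              uint64_t chunk_sectors)
{
    struct raid0_target t;
    uint64_t sect_in_chunk, chunk_nr, stripe;

    assert(nb_dev > 0);
    assert(chunk_sectors > 0 && (chunk_sectors & (chunk_sectors - 1)) == 0);

    /* Remainder of sector / chunk_sectors; the mask is valid only
     * because chunk_sectors is a power of two. */
    sect_in_chunk = sector & (chunk_sectors - 1);

    chunk_nr = sector / chunk_sectors;      /* chunk index across the array */
    t.dev = (unsigned)(chunk_nr % nb_dev);  /* chunks rotate over the devices */
    stripe = chunk_nr / nb_dev;             /* chunk index on that device */
    t.dev_sector = stripe * chunk_sectors + sect_in_chunk;
    return t;
}
```

With 4 devices and 8-sector chunks, array sector 35 falls in chunk 4,
which wraps around to device 0, at device sector 11.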

NeilBrown

> 
>                Thanking you in anticipation.

Neil Brown | 3 Jan 07:41 2005

Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

On Monday January 3, andy <at> strugglers.net wrote:
> 
> I have no idea which of you to believe now. :(

Well, how about I wade in.....

(almost*) No block storage device will guarantee that write ordering
is maintained.  Neither will read requests necessarily be ordered.

Any SCSI, IDE, or similar disc drive in Linux (or any other non-toy
OS) will have requests managed by an "elevator algorithm" which
coalesces adjacent blocks and tries to re-order requests to make
optimal use of the device.
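
As a rough illustration of what coalescing and re-ordering mean, here
is a toy sketch in C.  It is not the kernel's elevator: struct io_req
and elevator_merge are hypothetical names, only exactly adjacent
requests are merged, and overlaps, request direction, and fairness are
all ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request: a run of `count` sectors starting at `start`. */
struct io_req {
    uint64_t start;
    uint64_t count;
};

/* qsort comparator: order requests by starting sector. */
static int cmp_start(const void *a, const void *b)
{
    const struct io_req *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/*
 * Sort requests by starting sector, then merge runs that touch.
 * Works in place and returns the new number of requests.
 */
size_t elevator_merge(struct io_req *reqs, size_t n)
{
    size_t out = 0;

    assert(reqs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(reqs, n, sizeof(*reqs), cmp_start);
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].start + reqs[out].count == reqs[i].start)
            reqs[out].count += reqs[i].count;  /* adjacent: coalesce */
        else
            reqs[++out] = reqs[i];             /* gap: keep separate */
    }
    return out + 1;
}
```

Requests submitted as sectors 100, 0, 8, 200 come out as 0 (now twice
as long, having absorbed the request at 8), 100, 200, so the device
sees them in a different order from the one in which they were issued.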

A RAID controller, whether software, firmware, or hardware, will also
re-order requests to make best use of the devices.

Any filesystem that assumes that requests will not be re-ordered is
broken, as the assumption is wrong.
I would be *very* surprised if Reiserfs makes this assumption.

Until relatively recently, the only assumption that could be made is
that a write request will be handled sometime between when it is made,
and when the request completes (i.e. the end_io callback is called).
If several requests are concurrent they could commit in any order.
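
A userspace analogue of this rule, as a sketch only: the one way to be
sure that one write lands before another is to wait for the first to
complete before issuing the second.  write_in_order is a hypothetical
helper; fsync() stands in for waiting on request completion, which is
an analogy rather than the block-layer interface, and whether fsync()
also flushes a drive's own write cache depends on the system.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `second` only after `first` has been reported complete.
 * Returns 0 on success, -1 on any error or short write.
 */
int write_in_order(int fd,
                   const void *first, size_t first_len, off_t first_off,
                   const void *second, size_t second_len, off_t second_off)
{
    assert(fd >= 0);

    if (pwrite(fd, first, first_len, first_off) != (ssize_t)first_len) {
        perror("write_in_order: first write");
        return -1;
    }
    /* Wait for completion; no ordering is assumed until this returns. */
    if (fsync(fd) != 0) {
        perror("write_in_order: fsync");
        return -1;
    }
    if (pwrite(fd, second, second_len, second_off) != (ssize_t)second_len) {
        perror("write_in_order: second write");
        return -1;
    }
    return fsync(fd) == 0 ? 0 : -1;
}
```

Issuing both writes and then waiting once would be faster, but then
either could reach the disk first.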

With only this guarantee, the simplest approach for a journalling
filesystem is to write the content of a journal entry, wait for the
writes to complete, and then write a single block "header" which
describes and hence commits that journal entry.  The journal entry is

Peter T. Breuer | 3 Jan 09:03 2005

Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

Andy Smith <andy <at> strugglers.net> wrote:
> [-- text/plain, encoding quoted-printable, charset: us-ascii, 45 lines --]
> 
> On Sun, Jan 02, 2005 at 09:18:13PM +0100, Peter T. Breuer wrote:
> > Stephen Tweedie wrote:
> > > Umm, if soft raid is expected to have silent invisible corruptions in
> > > normal use,
> > 
> > It is, just as is all types of RAID.  This is a very strange thing for
> > Stephen to say - I cannot believe that he is as naive as he makes
> > himself out to be about RAID here and I don't know why he should say
> > that (presuming that he really knows better).
> > 
> > > then you shouldn't be using it, period.  That's got zero to
> > > do with journaling.
> > 
> > It implies that one should not be doing journalling on top of it.
> > 
> > (The logic for why RAID corrupts silently is that errors accumulate at
> > n times the normal rate per sector, but none of them are detected by
> > RAID (no crc), and when a disk drops out then you get a good chance of
> > picking up a corrupted copy instead of a good copy, because nobody
> > has checked the copy meanwhiles to see if it matches the original).
> 
> I have no idea which of you to believe now. :(

Both of us. We have not disagreed fundamentally. Read closely! Stephen
says "IF (my caps) soft raid is expected to have ...". Well, it is,
just like any RAID.


Peter T. Breuer | 3 Jan 09:37 2005

Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

Neil Brown <neilb <at> cse.unsw.edu.au> wrote:
> Well, how about I wade in.....

Sure! :-)

> A RAID controller, whether software, firmware, or hardware, will also
> re-order requests to make best use of the devices.

Possibly.  I have written block device drivers that maintain write
order, however (or at least do so if you ask them to, with the right
switch), because ...

> Any filesystem that assumes that requests will not be re-ordered is
> broken, as the assumption is wrong.
> I would be *very* surprised if Reiserfs makes this assumption.

... because that is EXACTLY what Hans Reiser has said to me. I don't
think I've kept the mail, but I remember it.  A quick Google search for
reiserfs + write ordering shows up some suggestive quotes:

  > We cannot use the buffer.c dirty list anyway because bdflush can write
  > those buffers to disk at any time.  Transactions have to control the
  > write ordering  ...

(hey, that was Hans quoting Stephen). From the Linux High Availability
website (http://linuxha.trick.ca/DataRedundancyByDrbd):

   Since later WRITE requests might depend on successful finished
   previous ones, this is needed to assure strict write ordering on
   both nodes. ...

