Jason Sydes | 3 May 2005 04:20

several ext3 and mysql kernel crashes

Hi Ext3!

I'm running about 30 dedicated MySQL machines under quite decent loads,
and they are occasionally crashing.  I've been logging console messages
recently in an effort to find the cause, and some appear to be related
to 

I perused your lists and found the message I'm replying to.

If you don't mind, I've included messages and ksymoops from two crashes
that I had recently.  Both were different.  I'm not sure if you have
fixes for them in the new kernel, so I'll be upgrading a few machines
tonight.

I'm running 2.6.10 with the "data=journal" mount option.
Is that the best / safest option for running with MySQL?
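
For reference, ext3 accepts three data= journaling modes (journal, ordered and writeback) and uses data=ordered when none is given. A minimal sketch, assuming a standard whitespace-separated /etc/fstab entry, that reports which mode an entry selects. The device and mount point in the example are hypothetical.

```python
def ext3_data_mode(fstab_line: str) -> str:
    """Return the ext3 data= journaling mode selected by an /etc/fstab entry.

    Assumes the usual fstab fields: device, mount point, type, options, ...
    If no data= option is present, ext3 falls back to data=ordered.
    """
    fields = fstab_line.split()
    if len(fields) < 4 or fields[2] != "ext3":
        raise ValueError("not an ext3 fstab entry: %r" % fstab_line)
    for opt in fields[3].split(","):
        if opt.startswith("data="):
            mode = opt[len("data="):]
            if mode not in ("journal", "ordered", "writeback"):
                raise ValueError("unknown ext3 data mode: %r" % mode)
            return mode
    return "ordered"  # ext3's default when data= is not given


# Hypothetical entry for a MySQL data partition.
print(ext3_data_mode("/dev/sda3  /var/lib/mysql  ext3  defaults,data=journal  1 2"))
```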

In any case, I'm logging all console messages now, so hopefully I can
have more ksymoops output for you soon enough.

I've included the output for each below.

Thank you for your time!
Jason

First Machine ("Scratchy")
==========================

Assertion failure in __journal_drop_transaction() at 
fs/jbd/checkpoint.c:613: "transaction->t_forget == NULL"

Christian | 8 May 2005 03:20

2.6.12-rc3-mm2 benchmarks


[!! i've Cc'ed several fs lists, please remove when replying !!]

hi all,

from time to time i do some benchmarks for several filesystems and several
crypto-algorithms too, details here:

http://nerdbynature.de/bench/

latest results here:

http://nerdbynature.de/bench/prinz/2.6.12-rc3-mm2/bonnie.html
http://nerdbynature.de/bench/prinz/2.6.12-rc3-mm2/tiobench.txt

Christian.
--
BOFH excuse #173:

Recursive traversal of loopback mount points
Hans Yperman | 13 May 2005 00:35

Smashing EXT3 for fun and profit (or: how to loose all your data)

Hello everyone,

I've just lost my whole EXT3 Linux partition to what was probably a
bug.  For your reading pleasure, and in the hope that there is enough
information to fix this problem in the future, here is the story of a
violent ending:

This tragic history actually starts on Windows: MS Word had wiped out
an important file on a floppy, and I got the task of retrieving what
was possible.  Using Linux, I made an image with dd, and put it on the
now extinct EXT3 partition. I used an undelete program, and then
mounted the image with a loopback device:
mount -o loop /tmp/image.img /floppy
As it turns out, the undeleter managed to screw up the FAT, and the
loopback device complains about reading past the end of the device. 
After fixing the floppy on another computer, I come back to the linux
computer.  The console is full of error messages.

What happened?  A first bug: Linux remounted the loopback device
read-only because of the bad FAT on the image.  BUT this did not work
out right: not only the loopback device, but the whole EXT3 partition
was now read-only. Every little write action resulted in an error,
hence all the messages.  I did not really think much of it at that
point, and just did a
mount -o remount,rw /

At this point, I am already screwed, but I don't realize it yet:  The
computer works completely normally from here on.  The problem happens
the next time I boot: fsck complains about problems (weird, fsck is
not supposed to run for EXT3).  Specifically, fsck complains about

Joseph D. Wagner | 13 May 2005 21:55

RE: Smashing EXT3 for fun and profit (or: how to loose all your data)

> I guess these 2 facts need fixing:
> 1) loopback devices should not pass errors over
> to their underlying filesystems.

I have a test partition set up for these circumstances.  I'll try to reproduce the read-write/read-only
error spreading to an underlying file system when the loopback file system has the error.  However, I will
have to double check with the file system designers.  There may be a good reason it behaves this way.

> 2) ext3 suicidally allows remounting read-write
> when parts of its data are invalid.

When you are logged in as root, it will let you do whatever suicidal -- or imho stupid -- things you tell it to do. 
That is not going to change.

It actually takes something serious to bring down a file system mid-stride, not just an atime update.  In
other words, by the time Linux is remounting your file system as read-only, something is already fubar. 
The remount as read-only is really only a stop-gap measure to prevent further damage while you save your
work -- on other partitions -- and reboot.

If all you have is one honkin' / (root) partition, you may just want to change that behavior to panic (the errors=panic
mount option, or tune2fs -e panic to make it the default).  After all, if you only have 1 partition, there's nowhere else to save your work.

So long as you're redoing your partitions, be sure to separate out /tmp, /var, and just to be safe /home too,
so next time all you lose is the one bad partition.

Joseph D. Wagner
Theodore Ts'o | 14 May 2005 04:28

Re: Smashing EXT3 for fun and profit (or: how to loose all your data)

On Fri, May 13, 2005 at 12:35:16AM +0200, Hans Yperman wrote:
> This tragic history starts actually on windows: MS Word had wiped out
> an important file on a floppy, and I got the task of retrieving what
> was possible.  Using Linux, I made an image with dd,and put it on the
> now extinct EXT3 partition. I used an undelete programma ,  and then
> mounted the image with a loopback device:
> mount -o loop /tmp/image.img /floppy
>   As it turns out,the undeleter managed to screw up the FAT, and the
> loopback device complains about reading past the end of the device. 
> After fixing the floppy on another computer, I come back to the linux
> computer.  The console is full of error messages.

What version of the kernel are you using?  What undelete program were
you using?  Most undelete programs don't require that you mount the
filesystem; in fact, they often require that you *don't* mount them.

> What happened?  A first bug: Linux remounted the loopback-device
> read-only because  of the bad FAT on the image.  BUT this did not work
> out right: not only the loopback device, but the whole EXT3-partition
> were now read-only. Every little write action results in an error,
> hence all the messages.  I did not really think much of it at that
> point, and just did a
> mount -o remount,rw /

Without the logs, it sounds like the ext3 filesystem got corrupted,
and so it was remounted read-only.  How this happened is not
clear, and you didn't give us enough information to determine that;
but it's consistent with e2fsck displaying errors.

> At this point, I am already screwed, but I don't realize it yet:  The

David Clunie | 15 May 2005 15:56

Intermittent ext3 corruption on external firewire Micronet 1.5Tb RAID on FC3

Hi

I have a Firewire connected Micronet 1.5TB RAID with a single
large ext3 filesystem on one partition on a dual Xeon system.

I am checking out from an extremely large cvs repository
(don't ask) to this drive over the course of many days, and
intermittently I get bad blocks and the filesystem goes
read-only. This is not related to any power failure or
anything similar. The RAID is currently about 40% full;
this started to happen around the 15% mark as I recall.

I checked the RAID firmware setup, found that caching was
set to write-back, and changed it to write-through to
see if that would help (since I gather the Linux kernel
presumes write-through, though why it should make a
difference in the absence of a reboot or power failure
I don't understand).

This reduced the frequency of the error from once a night
to once every couple of nights; interestingly mostly at
about 04:03 AM or so. Looking at cron.daily, only mrtg
and sa seem to be starting up at about that time.

I suspect the timing is related to a change in the pattern
of disk activity rather than anything else.
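
To see whether the errors really cluster at one time of night, one can tally them per minute from syslog. A minimal sketch in Python, assuming one unwrapped syslog line per error in the same format as the log excerpt quoted in the reply below; the three sample lines are taken from that excerpt.

```python
import re
from collections import Counter

# One syslog line per error, for example:
# May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343526: bad block 165510584
EXT3_BAD_BLOCK = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d\d):(\d\d):\d\d\s+\S+\s+kernel:\s+"
    r"EXT3-fs error \(device ([^)]+)\):\s+\w+:\s+inode \d+:\s+bad block \d+"
)


def errors_per_minute(lines):
    """Count EXT3-fs 'bad block' errors per (device, 'Mon DD', 'HH:MM')."""
    counts = Counter()
    for line in lines:
        m = EXT3_BAD_BLOCK.match(line)
        if m:
            month, day, hh, mm, dev = m.groups()
            counts[(dev, f"{month} {day}", f"{hh}:{mm}")] += 1
    return counts


SAMPLE = [
    "May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343526: bad block 165510584",
    "May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343381: bad block 141623810",
    "May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63947123: bad block 203323361",
]

for (dev, day, minute), n in sorted(errors_per_minute(SAMPLE).items()):
    print(dev, day, minute, n)
```

On a real system, pass the lines of whichever file syslog writes kernel messages to (commonly /var/log/messages) instead of SAMPLE; unrelated lines are simply skipped.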

I have no reason to suspect that there is anything actually
wrong with the RAID itself, which just appears as a really
big firewire external disk. It is new however, so this

Joseph D. Wagner | 16 May 2005 00:48

RE: Intermittent ext3 corruption on external firewire Micronet 1.5Tb RAID on FC3

> May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63343526: bad block 165510584
> May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63343381: bad block 141623810
> May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63947123: bad block 203323361

These errors cannot be caused by a bug in the file system.  It is possible, although highly unlikely, that a
bug in the device driver could generate these errors.

The most likely cause is that there actually are bad blocks on your new 1.5TB file system.

Do us all a favor and run:

badblocks -v -b block_size /dev/device

And let us know about the results.
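
The numbers in the "bad block" messages are filesystem block numbers, while badblocks counts in units of its -b argument (1024 bytes by default), so the two only line up if -b matches the filesystem's block size, which `dumpe2fs -h` prints on its "Block size:" line. A minimal sketch that reads that value, builds the badblocks command (a read-only scan unless -n or -w is added), and converts the reported blocks to byte offsets. The header text is a hypothetical excerpt, not output from this disk.

```python
import re


def fs_block_size(dumpe2fs_header: str) -> int:
    """Return the value of the 'Block size:' line in `dumpe2fs -h` output."""
    m = re.search(r"^Block size:\s+(\d+)\s*$", dumpe2fs_header, re.MULTILINE)
    if m is None:
        raise ValueError("no 'Block size:' line found")
    return int(m.group(1))


def badblocks_argv(device: str, block_size: int) -> list:
    """Read-only badblocks scan whose block numbers use the filesystem block size."""
    return ["badblocks", "-v", "-b", str(block_size), device]


def byte_offset(block: int, block_size: int) -> int:
    """Byte offset of a filesystem block from the start of the partition."""
    return block * block_size


# Hypothetical `dumpe2fs -h /dev/sdd1` excerpt; the real value must come from the disk.
HEADER = "Block size:               4096\n"

bs = fs_block_size(HEADER)
print(" ".join(badblocks_argv("/dev/sdd1", bs)))
for blk in (165510584, 141623810, 203323361):  # block numbers from the quoted log
    print(blk, "->", round(byte_offset(blk, bs) / 2**30, 1), "GiB into sdd1")
```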

Joseph D. Wagner
anandtiwari | 17 May 2005 01:39

Ext3 journal corruption

Hi all, 

I had an ext3 filesystem in writeback mode. Yesterday my system crashed, 
and now when I try to mount it, it gives me "Invalid argument". This is 
the command line:
#mount -t ext3 /dev/hda1 /mnt/home 

I tried debugging it and later found out it was complaining about the 
journal inode. Is there any way to recover my files? I did clone the disk 
and, after a few tries, mounted it as ext2, but there was nothing in it.
Any help or pointers will be appreciated, 

Thanks
anand
Theodore Ts'o | 17 May 2005 03:08

Re: Ext3 journal corruption

On Mon, May 16, 2005 at 05:39:00PM -0600, anandtiwari <at> softhome.net wrote:
> Hi all, 
> 
> I was having a ext3 filesystem with writeback. yesterday my system crashed 
> and now when i try to mount it, it gives me "Invalid argument". Following 
> is the command line
> #mount -t ext3 /dev/hda1 /mnt/home 
> 
> i tried debugging it and later i found out, its was complaining about 
> journaling inode. Is there any way to recover my files, i did clone the 
> disk and mounted it as ext2 after few tries but there was nothing in it.
> any help or pointers will be appreciated, 

1)  Run e2fsck to correct any filesystem errors.  This may remove the journal inode.

2)  If it didn't, to be safe, remove the journal: 
	"tune2fs -O ^has_journal /dev/hdXX" 

3)  Then recreate the journal:  "tune2fs -j /dev/hdXX"
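
Ted's three steps can be scripted with a small helper. As a minimal sketch, assuming `dumpe2fs -h` is run after step 1 to see whether e2fsck left the journal in place (its "Filesystem features:" line lists has_journal when a journal exists), the following only prints the commands unless dry_run is turned off. /dev/hdXX remains a placeholder, the -f flag on e2fsck (force a full check) is an addition not in Ted's mail, and all of these commands are meant for an unmounted filesystem.

```python
import re
import subprocess


def fs_features(dumpe2fs_header: str) -> set:
    """Parse the 'Filesystem features:' line of `dumpe2fs -h` output."""
    m = re.search(r"^Filesystem features:\s+(.*)$", dumpe2fs_header, re.MULTILINE)
    return set(m.group(1).split()) if m else set()


def journal_steps(device: str, dumpe2fs_header: str) -> list:
    """Ted's steps 2 and 3, decided from `dumpe2fs -h` output taken after e2fsck."""
    steps = []
    if "has_journal" in fs_features(dumpe2fs_header):
        # Step 2: e2fsck left the journal in place, so remove it explicitly.
        steps.append(["tune2fs", "-O", "^has_journal", device])
    # Step 3: create a fresh journal.
    steps.append(["tune2fs", "-j", device])
    return steps


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print("+", " ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical `dumpe2fs -h` excerpt in which the journal survived e2fsck.
SAMPLE = "Filesystem features:      has_journal filetype sparse_super\n"

run(["e2fsck", "-f", "/dev/hdXX"])  # step 1 (printed only: dry run)
for cmd in journal_steps("/dev/hdXX", SAMPLE):
    run(cmd)
```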

					 Ted
Anand Tiwari | 17 May 2005 04:05

Re: Ext3 journal corruption

OK, but just curious: if it was not cleanly unmounted, mount shouldn't be able
to mount it as ext2fs.

----- Original Message ----- 
From: "Theodore Ts'o" <tytso <at> mit.edu>
To: <anandtiwari <at> softhome.net>
Cc: <ext3-users <at> redhat.com>
Sent: Monday, May 16, 2005 7:08 PM
Subject: Re: Ext3 journal corruption

> On Mon, May 16, 2005 at 05:39:00PM -0600, anandtiwari <at> softhome.net wrote:
> > Hi all,
> >
> > I was having a ext3 filesystem with writeback. yesterday my system crashed
> > and now when i try to mount it, it gives me "Invalid argument". Following
> > is the command line
> > #mount -t ext3 /dev/hda1 /mnt/home
> >
> > i tried debugging it and later i found out, its was complaining about
> > journaling inode. Is there any way to recover my files, i did clone the
> > disk and mounted it as ext2 after few tries but there was nothing in it.
> > any help or pointers will be appreciated,
>
> 1)  Run e2fsck to correct any filesystem errors.  This may remove the journal inode.
>
> 2)  If it didn't, to be safe, remove the journal:
> "tune2fs -O ^has_journal /dev/hdXX"