1 Jan 2005 18:24
Advice for dealing with bad sectors on /
Steve Listopad <listopad <at> yahoo.com>
2005-01-01 17:24:57 GMT
All,
I'm trying to figure out how to deal with what I assume is a dying disk that
unfortunately holds / (ext3).
I'm getting errors similar to:
Dec 31 20:44:30 mybox kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 31 20:44:30 mybox kernel: hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=163423, high=0, low=163423, sector=163360
Dec 31 20:44:30 mybox kernel: end_request: I/O error, dev 03:41 (hdb), sector 163360
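A minimal sketch of pulling the failing LBA out of log lines like these and mapping it to a filesystem block. The helper names are hypothetical, and so are the layout numbers: it assumes hdb1 starts at sector 63 (check the real value with `fdisk -lu /dev/hdb`) and that the ext3 filesystem uses 4096-byte blocks. A block number obtained this way could then be checked with debugfs's `icheck` command to see which inode, if any, owns it.

```python
import re

# Pulls the whole-disk LBA out of IDE-driver error lines such as
# "hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=163423, ..."
LBA_RE = re.compile(r"\bLBAsect=(\d+)")

SECTOR_SIZE = 512  # bytes per sector as reported by the kernel


def failing_lbas(log_lines):
    """Return the sorted, de-duplicated LBAs reported in the given log lines."""
    found = set()
    for line in log_lines:
        match = LBA_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def lba_to_fs_block(lba, partition_start, fs_block_size):
    """Map a whole-disk LBA to a block number inside the filesystem.

    partition_start (in sectors) and fs_block_size (in bytes) are
    assumptions the caller must supply; neither appears in the log.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * SECTOR_SIZE // fs_block_size


log = [
    "Dec 31 20:44:30 mybox kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "Dec 31 20:44:30 mybox kernel: hdb: dma_intr: error=0x40 { UncorrectableError }, "
    "LBAsect=163423, high=0, low=163423, sector=163360",
    "Dec 31 20:44:30 mybox kernel: end_request: I/O error, dev 03:41 (hdb), sector 163360",
]

for lba in failing_lbas(log):
    # Hypothetical layout: hdb1 starting at sector 63, 4096-byte ext3 blocks.
    print(lba, lba_to_fs_block(lba, partition_start=63, fs_block_size=4096))  # 163423 20420
```

Under that assumed start of 63, LBA 163423 sits 163360 sectors into the partition, which happens to match the `sector=` field in the log, but the real partition table is what counts.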
When I rebooted, the system dropped me into a shell so that I could "fix" things.
So I ran e2fsck -c -v /dev/hdb1 to try to repair the filesystem. The badblocks
check took 20 hours (it's a 200GB disk). Then I went through the interactive
question/answer session, hoping to get past the problems...
I rebooted after this, and the machine is running, but I'm betting that failure
is close. I've been trying to use smartctl to see if the bad locations were
actually remapped, but the Current_Pending_Sector count is 227. I think this
means that these are sectors that are queued to be remapped, but have not been.
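A small sketch for tracking that count over time by parsing the attribute table printed by `smartctl -A`. The sample table is illustrative: only the raw value 227 comes from the post, and the other columns are made up. The parser assumes attribute rows start with a numeric ID followed by the attribute name, and that the raw value is the last field and a plain integer, which holds for counters like Current_Pending_Sector but not for every attribute.

```python
import subprocess

# Illustrative `smartctl -A` excerpt; only the raw count (227) is from the post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       227
"""


def smart_raw_value(attr_table, attribute_name):
    """Return the integer RAW_VALUE for attribute_name, or None if absent."""
    for line in attr_table.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID, then the attribute name.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == attribute_name:
            return int(fields[-1])  # assumes a plain-integer raw value
    return None


def read_attr_table(device):
    """Run `smartctl -A` on a device (needs root and smartmontools installed)."""
    return subprocess.run(["smartctl", "-A", device],
                          capture_output=True, text=True).stdout


print(smart_raw_value(SAMPLE, "Current_Pending_Sector"))  # 227
```

On a live system one would call `smart_raw_value(read_attr_table("/dev/hdb"), "Current_Pending_Sector")` periodically; a count that keeps climbing is a reasonable sign the disk is still degrading.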
So... I have a / disk that's flaky, and believe it or not, it's under
warranty, so I can replace it.
Some questions:
1) Is there a way to copy data off of this disk, so that I could simply swap
Some of the facts are. The conclusions are not.
> There is nothing that attempts explicitly to maintain the ordering in
> RAID (talking about mirroring here).
Disks and IO subsystems in general don't preserve IO ordering, and ext3 is
designed not to care. As long as the RAID device tells the truth about
when the data for a given IO is actually committed to disk (that is, when
all of the mirror volumes are up to date), ext3 should be quite happy.
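To make "telling the truth about when the data is committed" concrete, here is a toy RAID-1 model. It is hypothetical and says nothing about how Linux md is actually implemented: each leg finishes a write on its own schedule, and the mirror reports the block as committed only after every leg holds the new data. A filesystem that waits for that report before relying on the write does not need the legs to complete in any particular order.

```python
class ToyMirror:
    """Hypothetical RAID-1 model: a block counts as committed only when
    every leg of the mirror has finished writing it."""

    def __init__(self, n_legs=2):
        self.legs = [{} for _ in range(n_legs)]
        self.outstanding = {}  # block -> (data, legs that have not finished yet)
        self.committed = set()

    def submit(self, block, data):
        # A new write to the block is not committed until all legs finish it.
        self.committed.discard(block)
        self.outstanding[block] = (data, set(range(len(self.legs))))

    def leg_completes(self, leg, block):
        data, remaining = self.outstanding[block]
        self.legs[leg][block] = data
        remaining.discard(leg)
        if not remaining:
            # Every mirror volume is up to date, so completion may be reported.
            del self.outstanding[block]
            self.committed.add(block)

    def is_committed(self, block):
        return block in self.committed


m = ToyMirror(n_legs=2)
m.submit(7, b"journal commit record")
m.leg_completes(0, 7)
print(m.is_committed(7))  # False: leg 1 is still stale
m.leg_completes(1, 7)
print(m.is_committed(7))  # True
```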
> What's wrong is that the journal will be mirrored (if it's a mirror).
> That means that (1) its data will be written twice, which is a big deal
> since ALL the i/o goes through the journal first
Not true; by default, only metadata goes through the journal, not data.
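A rough back-of-the-envelope model of the point above, comparing how many blocks land in the journal under ext3's three data modes (data=ordered, the default; data=writeback; data=journal). It is deliberately simplified: it ignores descriptor, revoke and commit blocks, counts only journal traffic (not the writes to the data's final location), and treats a mirror as multiplying every physical write by the number of legs. The workload numbers are hypothetical.

```python
def journal_blocks(data_blocks, metadata_blocks, mode="ordered", mirror_legs=1):
    """Rough count of physical block writes that land in the ext3 journal.

    Simplified model: ignores descriptor, revoke and commit blocks.
    data=ordered and data=writeback journal metadata only;
    data=journal sends file data through the journal as well.
    """
    if mode == "journal":
        journaled = data_blocks + metadata_blocks
    elif mode in ("ordered", "writeback"):
        journaled = metadata_blocks
    else:
        raise ValueError(f"unknown ext3 data mode: {mode!r}")
    return journaled * mirror_legs


# Hypothetical workload: 1000 data blocks and 10 metadata blocks on a 2-way mirror.
print(journal_blocks(1000, 10, "ordered", mirror_legs=2))  # 20
print(journal_blocks(1000, 10, "journal", mirror_legs=2))  # 2020
```

Only in data=journal mode does the journal see anything like "ALL the i/o"; in the default mode the mirrored journal traffic is limited to metadata.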
> and (2) the journal
> is likely to be inconsistent (since it is so active) if you get one of
> those creeping, invisible RAID corruptions that inevitably crop up
> in normal RAID use.