Eric Anderson | 2 Jan 2008 05:44

ZFS i/o errors - which disk is the problem?

I created a zpool with two new identical (500GB) SATA disks.  I rsync'ed 
a bunch of data over to the new ZFS file systems, and started seeing i/o 
errors.

Here's how I created the file systems:

zpool create tank mirror ad6 ad8
zfs create tank/media
zfs create tank/documents
zfs set sharenfs=on tank/media
zfs set sharenfs=on tank/documents
zfs set atime=off tank
zfs set mountpoint=/media tank/media
zfs set mountpoint=/documents tank/documents

Here's what zpool status says:

# zpool status
   pool: tank
  state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: scrub completed with 731 errors on Tue Jan  1 15:17:08 2008
config:

         NAME        STATE     READ WRITE CKSUM
         tank        ONLINE       0     0 1.47K

Bernd Walter | 2 Jan 2008 08:01

Re: ZFS i/o errors - which disk is the problem?

On Tue, Jan 01, 2008 at 10:44:43PM -0600, Eric Anderson wrote:
> I created a zpool with two new identical (500GB) SATA disks.  I rsync'ed 
> a bunch of data over to the new ZFS file systems, and started seeing i/o 
> errors.
> 
> Here's how I created the file systems:
> 
> zpool create tank mirror ad6 ad8
> zfs create tank/media
> zfs create tank/documents
> zfs set sharenfs=on tank/media
> zfs set sharenfs=on tank/documents
> zfs set atime=off tank
> zfs set mountpoint=/media tank/media
> zfs set mountpoint=/documents tank/documents
> 
> 
> Here's what zpool status says:
> 
> # zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed with 731 errors on Tue Jan  1 15:17:08 2008
> config:
> 

Kris Kennaway | 2 Jan 2008 12:09

Re: [PATCH] ZFS not caching on i386 with kmem_size >1GB

David Taylor wrote:
> Hi,
> 
> About 2 months ago I reported that I found ZFS extremely slow for
> some tasks (specifically upgrading ports).  This was because ZFS
> was only using the absolute minimum cache size at all times.
> 
> The problem is here in /sys/contrib/opensolaris/uts/common/fs/zfs/arc.c:
> 
> static int
> arc_reclaim_needed(void)
> {
> ...
>         if (kmem_used() > (kmem_size() * 4) / 5)
>                 return (1);
> }
> 
> I'm running on i386 with kmem_size set to 1GB.  As a result, the
> multiplication overflows and the test becomes (kmem_used() > 0).  ZFS then
> always tries to shrink the cache, and never grows it above the absolute
> minimum size (about 30MB for each of c and p).
> 
> The patch I have attached fixes the problem for me, although there is probably
> a better way to avoid the overflow (without calling kmem_size() twice).
> Best of all, portupgrade is now an order of magnitude faster!
> 
> Of course, I'm now worried that my previously rock-solid settings will actually
> trigger the "kmem_map too small" panics when the cache actually fills up.
> 
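The overflow David describes can be reproduced in user space. This is a minimal sketch, not the attached patch: kmem_size() and kmem_used() here are hypothetical stand-ins that return 32-bit unsigned values (matching the i386 case in the report), and fake_kmem_size/fake_kmem_used are made-up knobs for illustration. The reordered test divides before multiplying, which avoids the overflow and calls kmem_size() only once.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel's kmem_size() and
 * kmem_used(). They return 32-bit values, as on the i386 setup in the
 * report; the numbers are illustrative only. */
static uint32_t fake_kmem_size = UINT32_C(1) << 30;  /* 1GB kmem_size   */
static uint32_t fake_kmem_used = UINT32_C(64) << 20; /* 64MB in use     */

static uint32_t kmem_size(void) { return fake_kmem_size; }
static uint32_t kmem_used(void) { return fake_kmem_used; }

/* The test as quoted from arc.c: 2^30 * 4 == 2^32 wraps to 0 in 32-bit
 * unsigned arithmetic, so the condition degenerates to kmem_used() > 0. */
static int reclaim_needed_original(void)
{
    return kmem_used() > (kmem_size() * 4) / 5;
}

/* One overflow-free rewrite: divide first, then multiply. The threshold
 * never exceeds kmem_size(), so it fits in 32 bits, and kmem_size() is
 * called only once. Truncating before the multiply lowers the threshold
 * by at most 3 bytes compared with the exact value. */
static int reclaim_needed_reordered(void)
{
    return kmem_used() > (kmem_size() / 5) * 4;
}
```

With kmem_size at 1GB and 64MB in use, reclaim_needed_original() returns 1 (the product wraps to 0) while reclaim_needed_reordered() returns 0, because the threshold is 858993456 bytes. Widening to 64 bits, as in `(uint64_t)kmem_size() * 4 / 5`, is another way to keep the exact threshold.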


Eric Anderson | 2 Jan 2008 13:32

Re: ZFS i/o errors - which disk is the problem?

Bernd Walter wrote:
> On Tue, Jan 01, 2008 at 10:44:43PM -0600, Eric Anderson wrote:
>> I created a zpool with two new identical (500GB) SATA disks.  I rsync'ed 
>> a bunch of data over to the new ZFS file systems, and started seeing i/o 
>> errors.
>>
>> Here's how I created the file systems:
>>
>> zpool create tank mirror ad6 ad8
>> zfs create tank/media
>> zfs create tank/documents
>> zfs set sharenfs=on tank/media
>> zfs set sharenfs=on tank/documents
>> zfs set atime=off tank
>> zfs set mountpoint=/media tank/media
>> zfs set mountpoint=/documents tank/documents
>>
>>
>> Here's what zpool status says:
>>
>> # zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub completed with 731 errors on Tue Jan  1 15:17:08 2008
>> config:

Gore Jarold | 3 Jan 2008 16:34

moving slots with a 3ware raid controller ... danger ?

I am running a 3ware 9650SE-16ML on FreeBSD
6.1-RELEASE.

I think that the card is in a 4x PCI-E slot, and I
think that is causing my system to crash periodically
under very high IO load.

I am planning on moving the card from the 4x slot to
an 8x slot, which is the speed recommended for it.

Before I do this, I would like a sanity check - is
there ANY reason that moving slots with this card
would be dangerous ?  Will FreeBSD care that the card
comes in on a new slot ?  Will the card ?  The arrays ?

Is there anything at all I should know about this
proposed slot move that would put my filesystems in
danger in any way ?  War stories and speculation
appreciated...

Thanks.

_______________________________________________
freebsd-fs <at> freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe <at> freebsd.org"

Peter Schuller | 3 Jan 2008 17:50

Re: ZFS i/o errors - which disk is the problem?

> I can believe a problematic SATA controller (it's an add-on PCI board),
> but does anyone know of a way to ask ZFS which devices in a pool it
> thinks have issues?

That is exactly what zpool status is intended to tell you. That is, the disks 
that you are seeing checksum errors on are the ones seeing the faults. In 
your case both drives show checksum errors (for some reason).
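Peter's point can be turned into a small filter. The sketch below is hypothetical (row_has_errors() and its helpers are not part of any ZFS tool): it inspects rows of the `zpool status` config table (NAME, STATE, READ, WRITE, CKSUM) and reports any vdev whose error counters are non-zero. It only tests for non-zero, so abbreviated counts such as "1.47K" need no unit handling, and it skips rows whose second column is not a vdev state, so prose lines such as the scrub summary are ignored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Vdev states that zpool status prints in the STATE column. */
static int is_vdev_state(const char *s)
{
    static const char *const states[] = {
        "ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"
    };
    for (size_t i = 0; i < sizeof states / sizeof states[0]; i++)
        if (strcmp(s, states[i]) == 0)
            return 1;
    return 0;
}

/* strtod() parses the numeric prefix, so "1.47K" reads as 1.47 (> 0)
 * and "0" reads as 0. Only the zero/non-zero distinction is used. */
static int counter_nonzero(const char *field)
{
    return strtod(field, NULL) > 0.0;
}

/* Hypothetical helper: if `row` is a config-table row with a non-zero
 * READ, WRITE or CKSUM counter, copy the vdev name into `name` and
 * return 1; otherwise return 0. */
static int row_has_errors(const char *row, char *name, size_t name_len)
{
    char vdev[64], state[16], rd[16], wr[16], ck[16];

    if (sscanf(row, "%63s %15s %15s %15s %15s", vdev, state, rd, wr, ck) != 5)
        return 0;                      /* not a five-column row */
    if (!is_vdev_state(state))
        return 0;                      /* header or prose line */
    if (!counter_nonzero(rd) && !counter_nonzero(wr) && !counter_nonzero(ck))
        return 0;                      /* healthy vdev */
    snprintf(name, name_len, "%s", vdev);
    return 1;
}
```

Fed the `tank  ONLINE  0  0 1.47K` row from the output above, it reports tank. Since Peter notes that both drives show checksum errors, the rows for both mirror members would be reported as well.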

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller <at> infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey <at> scode.org
E-Mail: peter.schuller <at> infidyne.com Web: http://www.scode.org

Scott Long | 3 Jan 2008 17:13

Re: moving slots with a 3ware raid controller ... danger ?

Gore Jarold wrote:
> I am running a 3ware 9650SE-16ML on FreeBSD
> 6.1-RELEASE.
> 
> I think that the card is in a 4x PCI-E slot, and I
> think that is causing my system to crash periodically
> under very high IO load.

Can't help ya if you don't say what the crash is.

> 
> I am planning on moving the card from the 4X slot to
> an 8x slot, which is the speed recommended for it.

If this is what 3ware actually recommends, I'd find it
highly amusing, not to mention highly unlikely that they have
a controller that can actually push more than 1GB/sec of
disk bandwidth, even if it had 16 disks hooked up to it.
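The 1GB/sec figure matches the raw bandwidth of a four-lane first-generation PCI Express link. A quick check, assuming PCIe 1.x signalling (2.5 GT/s per lane with 8b/10b encoding) and counting one direction only:

```latex
\begin{aligned}
\text{per lane} &= \frac{2.5\times10^{9}\ \text{transfers/s}\times\tfrac{8}{10}}{8\ \text{bits/byte}} = 250\ \text{MB/s}\\
\text{x4 link} &= 4\times 250\ \text{MB/s} = 1000\ \text{MB/s}\approx 1\ \text{GB/s}\\
\text{x8 link} &= 8\times 250\ \text{MB/s} = 2000\ \text{MB/s}\approx 2\ \text{GB/s}
\end{aligned}
```

These are raw figures before packet overhead, so real throughput is somewhat lower; the x8 slot only helps if the controller can actually sustain more than the x4 ceiling.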

> 
> Before I do this, I would like a sanity check - is
> there ANY reason that moving slots with this card
> would be dangerous ?  Will FreeBSD care that the card
> comes in on a new slot ?

No

>  Will the card ?

I'd certainly hope not.

Peter Schuller | 3 Jan 2008 17:53

Re: moving slots with a 3ware raid controller ... danger ?

> Is there anything at all I should know about this
> proposed slot move that would put my filesystems in
> danger in any way ?  War stories and speculation
> appreciated...

I don't have experience with that particular card. That said, the only likely
issue I can think of offhand would be if you had multiple cards and moving
it redefined the order in which they, and thus the drives on them, are
detected. This would mix up drive naming, which I suppose could be classified
as putting filesystems in danger...

Eric Anderson | 3 Jan 2008 18:10

Re: ZFS i/o errors - which disk is the problem?

Peter Schuller wrote:
>> I can believe a problematic SATA controller (it's an add-on PCI board),
>> but does anyone know of a way to ask ZFS which devices in a pool it
>> thinks have issues?
> 
> That is exactly what zpool status is intended to tell you. That is, the disks 
> that you are seeing checksum errors on are the ones seeing the faults. In 
> your case both drives show checksum errors (for some reason).
> 

Yeah, I suspect it's the cheesy SATA controller I stuck in the system.  I
suppose I will rebuild my NFS server with different hardware :(

Thanks,
Eric

Brooks Davis | 3 Jan 2008 18:18

Re: ZFS i/o errors - which disk is the problem?

On Thu, Jan 03, 2008 at 11:10:06AM -0600, Eric Anderson wrote:
> Peter Schuller wrote:
>>> I can believe a problematic SATA controller (it's an add-on PCI board),
>>> but does anyone know of a way to ask ZFS which devices in a pool it
>>> thinks have issues?
>> That is exactly what zpool status is intended to tell you. That is, the 
>> disks that you are seeing checksum errors on are the ones seeing the 
>> faults. In your case both drives show checksum errors (for some reason).
> 
> Yeah, I suspect it's the cheesy SATA controller I stuck in the system.  I
> suppose I will rebuild my NFS server with different hardware :(

We've definitely seen cases where hardware changes fixed ZFS checksum errors.
In one case, a firmware upgrade on the RAID controller fixed it.  In another
case, we'd been connecting to an external array with a SCSI card that didn't
have a PCI bracket and the errors went away when the replacement one arrived
and was installed.  The fact that there were significant errors caught by ZFS
was quite disturbing since we wouldn't have found them with UFS.

-- Brooks
