Nikolay Denev | 1 Mar 06:24 2010

Re: should zfs care so much about device name changes

On 27 Feb, 2010, at 22:47 , Pawel Jakub Dawidek wrote:

> On Wed, Feb 24, 2010 at 03:58:58PM +0200, Nikolay Denev wrote:
>> 
>> On Feb 24, 2010, at 11:53 AM, Nikolay Denev wrote:
>> 
>>> Hello,
>>> 
>>> I wanted to test the new option ATA_CAM, but that would require a boot from
>>> another medium (USB drive/CD) and a zpool export/import to update the vdev
>>> names (actually shown as "path" in the zpool.cache file), because otherwise
>>> the system would refuse to open/mount the pool.
>>> But is that really necessary, given that all the devices are here and have
>>> GUIDs matching those in the zpool.cache file?
>>> Shouldn't ZFS just import the pool?
>>> 
>>> In the current state, what would one have to do to, for example, test
>>> ATA_CAM on a remote machine, where export/import from a rescue medium is
>>> not possible?
>>> 
>>> Thanks,
>>> Niki
>> 
>> 
>> I have now looked at sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>> and it seems that when a vdev isn't found by path name, it is searched for
>> by GUID. But that doesn't seem to happen here: the loader sees the pool and
>> boots the kernel, but then the kernel does not see the pool. Any ideas?
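
For reference, the export/import cycle described above boils down to roughly
the following; this is only a sketch, the pool name "tank" is an assumption,
and when booting from rescue media the cache file on the pool's own root may
need to be refreshed as well:

  zpool export tank
  zpool import tank    # re-records the vdev device names ("path") in zpool.cache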

James R. Van Artsdalen | 1 Mar 06:58 2010

[zfs] attach by name/uuid still attaches wrong device

FreeBSD bigtex.housenet.jrv 9.0-CURRENT FreeBSD 9.0-CURRENT #2 r200727M:
Tue Dec 22 23:25:56 CST 2009    
james <at> bigtex.housenet.jrv:/usr/obj/usr/src/sys/BIGTEX  amd64

It appears that zfs/vdev_geom.c can still attach to the wrong device in
some cases.  Note in the zpool status output how ada10 appears in two
different vdevs.

What happened is that a disk failed completely (scbus3 target 3) and is
no longer detected by the driver.  At boot time:

1. ZFS fails to attach by path and UUID, since what was at ada11 is now
at ada10 and has a different UUID.
2. ZFS fails to attach by UUID since that UUID is on a dead drive and
can no longer be found anywhere.
3. ZFS then attaches by path blindly, even though that drive is in a
different part of the pool and has a different UUID.

I don't think it's possible to do this right in vdev_geom.c: there's no
way to guess what is intended without a hint from higher ZFS layers as
to which drives should be found and which are new.

The best fixes I can think of are to expose drives by serial number in
GEOM, or perhaps, as a fall-back, to expose names that are geographic
locations, e.g., "/dev/scbus0/target3/lun0".
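
A workaround people sometimes use when (re)building a pool, given only as a
sketch here (the label, device and pool names are made up, and glabel writes
its metadata to the provider's last sector, so it is meant for disks being
provisioned, not for disks already carrying data), is to put stable GEOM
labels on the disks so the pool members are no longer tied to adaX numbering:

  glabel label disk00 /dev/ada8      # one stable label per member disk
  glabel label disk01 /dev/ada9      # ...and so on for the remaining disks
  zpool create tank raidz2 label/disk00 label/disk01 label/disk02 label/disk03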

# zpool status   
  pool: bigtex
 state: DEGRADED
status: One or more devices could not be used because the label is

FreeBSD bugmaster | 1 Mar 12:06 2010

Current problem reports assigned to freebsd-fs <at> FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/144330  fs         [nfs] mbuf leakage in nfsd with zfs
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o bin/144214   fs         zfsboot fails on gang block after upgrade to zfs v14
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o kern/143345  fs         [ext2fs] [patch] extfs minor header cleanups to better
o kern/143343  fs         [zfs] bug in sunlink flag on directories
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs         [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142872  fs         [zfs] ZFS ZVOL Lockmgr Deadlock
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142594  fs         [zfs] Modification time reset to 1 Jan 1970 after fsyn
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs         [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141950  fs         [unionfs] [lor] ufs/unionfs/ufs Lock order reversal
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng

Willem Jan Withagen | 1 Mar 12:52 2010

Re: mbuf leakage with nfs

On 28-2-2010 18:55, Gerrit Kühn wrote:
> On Sun, 28 Feb 2010 12:21:28 +0000 "Robert N. M. Watson"
> <rwatson <at> freebsd.org>  wrote about Re: mbuf leakage with nfs/zfs? :
>
> RNMW>  It's almost certainly one or a small number of very specific RPCs
> RNMW>  that are triggering it -- maybe OpenBSD does an extra lookup, or
> RNMW>  stat, or something, on a name that may not exist anymore, or does it
> RNMW>  sooner than the other clients. Hard to say, other than to wave hands
> RNMW>  at the possibilities.
> RNMW>
> RNMW>  And it may well be we're looking at two bugs: Danny may see one bug,
> RNMW>  perhaps triggered by a race condition, but it may be different from
> RNMW>  the OpenBSD client-triggered bug (to be clear: it's definitely a
> RNMW>  FreeBSD bug, although we might only see it when an OpenBSD client is
> RNMW>  used because perhaps OpenBSD also has a bug or feature).
>
> In my case it is the Linux client causing the problems (cannot tell yet if
> it is only with udp, but I would think so). If I understand Daniel
> correctly, his latest tests were performed with a FreeBSD client and udp. So
> it may very well be a general issue with udp?! Would this help narrow
> down the problem?

I'm off till Thursday, at which time I'm willing to run more tests. I've got
plenty of boxes here, both FreeBSD and Linux, and otherwise I will boot more
in VirtualBox.

--WjW


joerg | 1 Mar 14:27 2010

Re: kern/141257: [gvinum] Cannot create a software RAID5 with gvinum

Synopsis: [gvinum] Cannot create a software RAID5 with gvinum

State-Changed-From-To: open->closed
State-Changed-By: joerg
State-Changed-When: Mon Mar 1 14:23:19 MET 2010
State-Changed-Why: 
In order to create a RAID-5 (or striped) plex, you have
to also provide the stripe size:

volume ...
plex org raid5 256k
sd ...
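
A fuller, purely hypothetical example of such a configuration file with the
stripe size present (the drive devices and names are assumptions, not taken
from the report):

drive d1 device /dev/ada1
drive d2 device /dev/ada2
drive d3 device /dev/ada3
volume vol0
  plex org raid5 256k
    sd drive d1
    sd drive d2
    sd drive d3

A file like this can then be loaded with "gvinum create <file>".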

(Please submit bug reports in English, as this allows more
people to understand them.  I hope my poor Spanish was
good enough to understand your actual problem.)

http://www.freebsd.org/cgi/query-pr.cgi?pr=141257

Gerrit Kühn | 1 Mar 16:32 2010

Re: mbuf leakage with nfs

On Mon, 01 Mar 2010 12:52:32 +0100 Willem Jan Withagen <wjw <at> digiware.nl>
wrote about Re: mbuf leakage with nfs:

WJW> > In my case it is the Linux client causing the problems (cannot tell
WJW> > yet if it is only with udp, but I would think so). If I understand
WJW> > Daniel correctly, his latest tests were performed with a FreeBSD
WJW> > client and udp. So it may very well be a general issue with udp?!
WJW> > Would this help narrow down the problem?
WJW> 
WJW> I'm off 'till thursday.
WJW> At which time I'm willing to run more tests. Got plenty of boxes here.
WJW> Both FreeBSD and Linux. And otherwise will boot more in VirtualBox.

I finally took an axe and restarted nfsd without "-u". Now my mbuf usage is
flat, as it should be. I guess some people with udp mounts will complain,
but this can be fixed easily by converting their connections to tcp.
However, I am still interested in having the issue fixed, so I will be
following the thread and will contribute if possible.
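
For the archives, "without -u" amounts to roughly the following in rc.conf
(the other flag values are just an example, not taken from this thread), plus
forcing the clients over tcp:

  nfs_server_enable="YES"
  nfs_server_flags="-t -n 4"          # no -u, so only tcp is served

  # on a client:
  mount -t nfs -o tcp server:/export /mnt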

cu
  Gerrit

Brandon Gooch | 1 Mar 19:37 2010

Re: ZFS and sh(1) panic: spin lock [lock addr] (smp rendezvous) held by [sh(1) proc tid] too long

On Sat, Feb 20, 2010 at 4:19 PM, Attilio Rao <attilio <at> freebsd.org> wrote:
> 2010/1/27 Brandon Gooch <jamesbrandongooch <at> gmail.com>:
>> The machine, a Dell Optiplex 755, has been locking up recently. The
>> situation usually occurs while using VirtualBox (running a 64-bit
>> Windows 7 instance) and doing anything else in another xterm (such as
>> rebuilding a port).  I've been unable to reliably reproduce it (I'm in
>> an X session and the machine will not panic "properly").
>>
>> However, while rebuilding Xorg today at ttyv0 and running
>> VBoxHeadless on ttyv1, I managed to trigger what I believe is the
>> lockup.
>>
>> I've attached a textdump in hopes that someone may be able to take a
>> look and provide clues or instruction on debugging this.
>
> I think that jhb <at> saw a similar problem while working on the nVidia
> driver or the like.
> Not sure if he made any progress debugging this.
>

The situation has improved slightly, although attempting to run two
VirtualBox guests at the same time inevitably leads to a lock-up. I've
just taken to running one at a time. Not ideal, but until more
debugging can be done, it's the only option I have.

I ran into this using both nvidia and radeon. I can't really find a
pattern, but I do see it when Windows is trying to draw a new window,
or dim the screen when UAC kicks in...

BTW, anyone know how to get a good dump when running Xorg? I'm not
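
One way to pre-arrange a usable dump for panics that happen under X, given
here only as a sketch (the ddb script contents and the dump device are
assumptions, not something from this thread), is to combine textdump(4) with
a ddb(8) script that fires on panic:

  dumpon /dev/ada0s1b                  # whatever the local swap device is
  sysctl debug.ddb.textdump.pending=1
  ddb script kdb.enter.panic="textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset"

savecore(8) should then pick the textdump up from the dump device on the next
boot.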

Freddie Cash | 1 Mar 20:57 2010

HAST, ucarp, and ZFS

Perhaps it's just a misunderstanding on my part of the layering involved,
but I'm having an issue with the sample ucarp_up.sh script on the HAST wiki
page.

Here's the test setup that I have:
  hast1:
      glabel 4x 2 GB virtual disks (label/disk01 --> label/disk04)
      hast.conf create 4 resources (disk01 --> disk04, using the
          glabelled disks)
      zpool create hapool raidz1 hast/disk01 .. hast/disk04

  hast2:
      glabel 4x 2 GB virtual disks (label/disk01 --> label/disk04)
      hast.conf create 4 resources (disk01 --> disk04)

So far so good.  On hast1, I have a working ZFS pool: I can create data,
filesystems, etc., and watch network traffic as it syncs to hast2.

I can manually down hast1 and switch hast2 to "primary" and import the
hapool.  I can create data, filesystems, etc.  And I can manually bring
hast1 online and set it to secondary, and watch it sync back.

Where I'm stuck is how to modify the ucarp_up.sh script to work with
multiple hast resources.  Do I just edit it to handle each of the 4 hast
resources in turn, or am I missing something simple, like that there should
only be a single hast resource?  I'm guessing it's a simple "edit the script
to suit my setup" issue, but I wanted to double-check.
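
A minimal sketch of the "handle each resource in turn" variant of the up
script (resource and pool names as in the test setup above; everything else
is an assumption, not a tested script):

  #!/bin/sh
  resources="disk01 disk02 disk03 disk04"

  # switch every HAST resource to the primary role
  for res in ${resources}; do
      hastctl role primary ${res}
  done

  # wait for the /dev/hast/* providers to show up, then import the pool
  for res in ${resources}; do
      while [ ! -c /dev/hast/${res} ]; do
          sleep 1
      done
  done

  zpool import -f hapool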

The production server I want to use this with has 24 hard drives in it,
configured into multiple raidz2 vdevs, as part of a single ZFS pool.  Which

Pawel Jakub Dawidek | 1 Mar 21:43 2010

Re: HAST, ucarp, and ZFS

On Mon, Mar 01, 2010 at 11:57:15AM -0800, Freddie Cash wrote:
> Perhaps it's just a misunderstanding on my part of the layering involved,
> but I'm having an issue with the sample ucarp_up.sh script on the HAST wiki
> page.
> 
> Here's the test setup that I have:
>   hast1:
>       glabel 4x 2 GB virtual disks (label/disk01 --> label/disk04)
>       hast.conf create 4 resources (disk01 --> disk04, using the glabelled
> disks)
>       zpool create hapool raidz1 hast/disk01 .. hast/disk04
> 
>   hast2:
>       glabel 4x 2 GB virtual disks (label/disk01 --> label/disk04)
>       hast.conf create 4 resources (disk01 --> disk04)
> 
> So far so good.  On hast1, I have a working ZFS pool, I can create data,
> filesystems, etc., and watch network traffic as it syncs to hast2.
> 
> I can manually down hast1 and switch hast2 to "primary" and import the
> hapool.  I can create data, filesystems, etc.  And I can manually bring
> hast1 online and set it to secondary, and watch it sync back.
> 
> Where I'm stuck is how to modify the ucarp_up.sh script to work with
> multiple hast resources.  Do I just edit it to handle each of the 4 hast
> resources in turn, or am I missing something simple, like that there should
> only be a single hast resource?  I'm guess it's a simple "edit the script to
> suit my setup" issue, but wanted to double-check.

The scripts in share/examples/hast/ are well... just examples and

Rick Macklem | 1 Mar 23:21 2010

Re: mbuf leakage with nfs/zfs?


On Sat, 27 Feb 2010, Jeremy Chadwick wrote:

>> I concur.
>> Everything in my network is now on TCP, and there is no mbuf leakage.
>> I just don't get over the 5500 mark, no matter what I throw at it.
>>
>> I do feel that TCP does not perform as well on a local net with Linux,
>> hence the choice of UDP. But TCP is workable as the next best thing.
>
> NFS; Rick Macklem would be a better choice, but as reported, he's MIA.
>

Not exactly MIA, but only able to read email from time to time at this
point. I don't know when I'll be able to do more than that.

So, it does sound like it is UDP-specific. Robert mentioned one scenario:
an infrequently executed code path that is being tickled and has a missing
m_freem().

One thing someone could try is switching to the experimental nfs server
("-e" on both mountd and nfsd) and see if the leak goes away. If it does
go away, it is almost certainly the above in the regular nfs server code.
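
(For anyone wanting to try that, the switch is roughly the following; the
extra flags are only an example of a typical setup, not taken from this
thread:

  killall nfsd mountd
  mountd -r -e /etc/exports
  nfsd -e -u -t -n 4

The equivalent flags can of course go into mountd_flags and nfs_server_flags
in rc.conf instead.)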

If it doesn't go away, the problem is more likely in the krpc or the
generic udp code. (When I looked at svc_dg.c, I could only spot one
possible leak and you've already determined that patch doesn't help.
The other big difference when using udp on the FreeBSD8 krpc is the
reply cache code. I seem to recall it's an lru cache with a fixed upper
bound, but it might be broken and leaking.

