Mark Davies | 1 Oct 01:20 2008

wapbl causing hangs

On a Dell PowerEdge 1950 running current from a few days ago I have a 
couple of filesystems that are identical except one is mounted with "-o 
log" and the other with "-o softdep".  If I try to copy approx 34GB from 
another machine to the softdep filesystem (via "tar | ssh tar" or rsync) 
it works fine, but if I try to do the same to the wapbl filesystem the 
whole machine hangs after a couple of GB.  Unfortunately the machine just 
hangs solidly without any indication of what happened.  Any suggestions on 
what I can do to try and get more info on what's going wrong?

cheers
mark

Thor Lancelot Simon | 1 Oct 01:20 2008

Re: hardlink to symlink behaviour [was: Re: MKXORG=yes on Linux: nbmtree: existing entry for `libXaw.so', type `link' does not match type `file']

On Tue, Sep 30, 2008 at 02:59:32PM +0200, Joerg Sonnenberger wrote:
> On Tue, Sep 30, 2008 at 01:59:15PM +0200, Hubert Feyrer wrote:
> > Digging a bit more into this, it's related to the difference in handling  
> > hardlink to symlinks. On NetBSD, it seems that you get a link to the  
> > target file these days[1], while on Linux you get a link to the  
> > symlink[2]. ISTR that NetBSD used to behave like Linux there - is this  
> > really intended?
> 
> NetBSD behaves like POSIX and always has.

Does POSIX actually specify this?  The difference probably stems from
whether or not the symlink consumes an inode: on 4.3BSD and prior, it
always did, and on 4.4 and NetBSD it does not (or is arranged to look
as if it does not, in a few uncommon cases, IIRC).

-- 
Thor Lancelot Simon	                                   tls <at> rek.tjls.com
    "Even experienced UNIX users occasionally enter rm *.* at the UNIX
     prompt only to realize too late that they have removed the wrong
     segment of the directory structure." - Microsoft WSS whitepaper

Greg Oster | 1 Oct 01:47 2008

Re: wapbl causing hangs

Mark Davies writes:
> On a Dell PowerEdge 1950 running current from a few days ago I have a 
> couple of filesystems that are identical except one is mounted with "-o 
> log" and the other with "-o softdep".  If I try to copy approx 34GB from 
> another machine to the softdep filesystem (via "tar | ssh tar" or rsync) 
> it works fine, but if I try to do the same to the wapbl filesystem the 
> whole machine hangs after a couple of GB.  Unfortunately the machine just 
> hangs solidly without any indication of what happened.  Any suggestions on 
> what I can do to try and get more info on what's going wrong?

That Dell uses the mfi driver like the Dell 2950, yes?  
If so, see PR#39297 :(

(Basically WAPBL initiates a flush on the disk/raid cache, and, well, 
let's just say calling ltsleep() from mfi_intr() context is Bad..)

Later...

Greg Oster

NetBSD source update | 1 Oct 02:38 2008

triweekly CVS update output


Updating release-3-0 src tree (netbsd-3-0):

Running the SUP scanner:
SUP Scan for release-3-0 starting at Wed Oct  1 00:07:21 2008
SUP Scan for release-3-0 completed at Wed Oct  1 00:09:56 2008

Updating release-3-1 src tree (netbsd-3-1):

Running the SUP scanner:
SUP Scan for release-3-1 starting at Wed Oct  1 00:21:35 2008
SUP Scan for release-3-1 completed at Wed Oct  1 00:24:49 2008

Updating release-4-0 src tree (netbsd-4-0):
U doc/CHANGES-4.0.1

Running the SUP scanner:
SUP Scan for release-4-0 starting at Wed Oct  1 00:35:35 2008
SUP Scan for release-4-0 completed at Wed Oct  1 00:38:18 2008

Mark Davies | 1 Oct 05:40 2008

Re: wapbl causing hangs

On Wednesday 01 October 2008 12:47:15 Greg Oster wrote:
> That Dell uses the mfi driver like the Dell 2950, yes?
> If so, see PR#39297 :(
>
> (Basically WAPBL initiates a flush on the disk/raid cache, and, well,
> let's just say calling ltsleep() from mfi_intr() context is Bad..)

Hmm, that will be it.

cheers
mark

NetBSD source update | 1 Oct 05:41 2008

daily CVS update output


Updating src tree:
P src/distrib/sets/lists/man/mi
U src/external/bsd/fetch/Makefile
U src/external/bsd/fetch/Makefile.inc
U src/external/bsd/fetch/dist/libfetch/common.c
U src/external/bsd/fetch/dist/libfetch/common.h
U src/external/bsd/fetch/dist/libfetch/errlist.sh
U src/external/bsd/fetch/dist/libfetch/fetch.3
U src/external/bsd/fetch/dist/libfetch/fetch.c
U src/external/bsd/fetch/dist/libfetch/fetch.cat3
U src/external/bsd/fetch/dist/libfetch/fetch.h
U src/external/bsd/fetch/dist/libfetch/file.c
U src/external/bsd/fetch/dist/libfetch/ftp.c
U src/external/bsd/fetch/dist/libfetch/ftp.errors
U src/external/bsd/fetch/dist/libfetch/http.c
U src/external/bsd/fetch/dist/libfetch/http.errors
U src/external/bsd/fetch/lib/Makefile
U src/external/bsd/fetch/lib/shlib_version
U src/external/bsd/pkg_install/Makefile
U src/external/bsd/pkg_install/Makefile.inc
U src/external/bsd/pkg_install/prepare-import.sh
U src/external/bsd/pkg_install/dist/add/add.h
U src/external/bsd/pkg_install/dist/add/extract.c
U src/external/bsd/pkg_install/dist/add/futil.c
U src/external/bsd/pkg_install/dist/add/main.c
U src/external/bsd/pkg_install/dist/add/perform.c
U src/external/bsd/pkg_install/dist/add/pkg_add.1
U src/external/bsd/pkg_install/dist/add/verify.c
U src/external/bsd/pkg_install/dist/add/verify.h

Carlos Linares | 1 Oct 05:37 2008

i386/amd64 kernels hanging on boot

Hello all.  For a little while now, successive amd64 -current kernels have been hanging at boot time on one of my machines.  I tried compiling an i386 kernel or two today, and I got the same error.  The last kernel that works for me is an amd64 4.99.72 from late August.  I had one from slightly later that worked, but it got truncated, along with the kernel I had booted, during one of these hanging boots (though more recent kernels have not inflicted such damage).  Here are the error messages:

ixpide0:0:0: recal drive fault
wd0d: device fault reading fsbn 0 of 0-3 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
ixpide0:0:0: recal drive fault
wd0d: device fault reading fsbn 0 of 0-3 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
ixpide0:0:0: recal drive fault
wd0d: device fault reading fsbn 0 of 0-3 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
...
wd0: dos partition I/O error
ixpide0:0:0: recal drive fault
wd0d: device fault reading fsbn 0 of 0-3 (wd0 bn 0; cn 0 tn 0 sn 0), retrying


Interestingly, the messages always state wd0, even when booting from wd1a (my i386 test partition).  [Annotated] Dmesg follows.

TIA,

<<carlos>>
<<qvidnvnc>>


Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 4.99.72 (LUDO) #7: Mon Aug 25 14:50:55 UTC 2008
        carlos <at> leviathan:/mnt/huge/obj/sys/arch/amd64/compile/LUDO
total memory = 3838 MB
avail memory = 3705 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
SMBIOS rev. 2.4 <at> 0xf0100 (57 entries)
Gigabyte Technology Co., Ltd. GA-MA78GPM-DS2H ( )
mainbus0 (root)
cpu0 at mainbus0 apid 0: AMD 686-class, 2505MHz, id 0x60fb2
cpu0: AMD PowerNow! Technology 2500 MHz
cpu0: available frequencies (Mhz): 1000 1800 2000 2200 2400 2500
cpu1 at mainbus0 apid 1: AMD 686-class, 2505MHz, id 0x60fb2
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 21, 24 pins
acpi0 at mainbus0: Intel ACPICA 20080321
acpi0: X/RSDT: OemId <GBT   ,GBTUACPI,42302e31>, AslId <GBTU,01010101>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 32-bit timer
acpibut0 at acpi0 (PWRB, PNP0C0C): ACPI Power Button
attimer1 at acpi0 (TMR, PNP0100): AT Timer
attimer1: io 0x40-0x43
hpet0 at acpi0 (HPET, PNP0103)
hpet0: mem 0xfed00000-0xfed003ff irq 0,8
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
pcppi1 at acpi0 (SPKR, PNP0800)
pcppi1: io 0x61
midi0 at pcppi1: PC speaker (CPU-intensive output)
sysbeep0 at pcppi1
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: vendor 0x1022 product 0x9600 (rev. 0x00)
ppb0 at pci0 dev 1 function 0: vendor 0x1022 product 0x9602 (rev. 0x00)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga0 at pci1 dev 5 function 0: vendor 0x1002 product 0x9610 (rev. 0x00)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
radeondrm0 at vga0: ATI Radeon HD 3200 (AMD 780G) (unit 0) <<my fruitless patchwork, don't get excited ;-) >>
[drm:pid0:drm_load]
[drm:pid0:radeon_driver_load] PCI card detected
[drm:pid0:drm_ctxbitmap_next] drm_ctxbitmap_next bit : 0
[drm:pid0:drm_ctxbitmap_init] drm_ctxbitmap_init : 0
radeondrm0: Initialized radeon 1.26.0 20060524
azalia0 at pci1 dev 5 function 1: Generic High Definition Audio Controller
azalia0: interrupting at ioapic0 pin 19
azalia0: host: 0x1002/0x960f (rev. 0), HDA rev. 1.0
ppb1 at pci0 dev 4 function 0: vendor 0x1022 product 0x9604 (rev. 0x00) <<my unsupported atheros 802.11n card?>>
ppb1: unsupported PCI Express version
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
vendor 0x168c product 0x0024 (miscellaneous network, revision 0x01) at pci2 dev 0 function 0 not configured
ppb2 at pci0 dev 10 function 0: vendor 0x1022 product 0x9609 (rev. 0x00)
ppb2: unsupported PCI Express version
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
re0 at pci3 dev 0 function 0: RealTek 8168B/8111B PCIe Gigabit Ethernet (rev. 0x02)
re0: interrupting at ioapic0 pin 18
re0: Unknown revision (0x3c400000)
re0: Ethernet address 00:00:00:00:00:00 <<does anyone know why the MAC address is all zeroes?  PXE sees the real MAC>>
re0: using 256 tx descriptors
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 2
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
pciide0 at pci0 dev 17 function 0
pciide0: vendor 0x1002 product 0x4390 (rev. 0x00)
pciide0: bus-master DMA support present, but unused (no driver support)
pciide0: primary channel configured to native-PCI mode
pciide0: using ioapic0 pin 22 for native-PCI interrupt
atabus0 at pciide0 channel 0
pciide0: secondary channel configured to native-PCI mode
atabus1 at pciide0 channel 1
ohci0 at pci0 dev 18 function 0: vendor 0x1002 product 0x4397 (rev. 0x00)
ohci0: interrupting at ioapic0 pin 16
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
ohci1 at pci0 dev 18 function 1: vendor 0x1002 product 0x4398 (rev. 0x00)
ohci1: interrupting at ioapic0 pin 16
ohci1: OHCI version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
ehci0 at pci0 dev 18 function 2: vendor 0x1002 product 0x4396 (rev. 0x00)
ehci0: interrupting at ioapic0 pin 17
ehci0: dropped intr workaround enabled
ehci0: EHCI version 1.0
ehci0: companion controllers, 3 ports each: ohci0 ohci1
usb2 at ehci0: USB revision 2.0
ohci2 at pci0 dev 19 function 0: vendor 0x1002 product 0x4397 (rev. 0x00)
ohci2: interrupting at ioapic0 pin 18
ohci2: OHCI version 1.0, legacy support
usb3 at ohci2: USB revision 1.0
ohci3 at pci0 dev 19 function 1: vendor 0x1002 product 0x4398 (rev. 0x00)
ohci3: interrupting at ioapic0 pin 18
ohci3: OHCI version 1.0, legacy support
usb4 at ohci3: USB revision 1.0
ehci1 at pci0 dev 19 function 2: vendor 0x1002 product 0x4396 (rev. 0x00)
ehci1: interrupting at ioapic0 pin 19
ehci1: dropped intr workaround enabled
ehci1: EHCI version 1.0
ehci1: companion controllers, 3 ports each: ohci2 ohci3
usb5 at ehci1: USB revision 2.0
piixpm0 at pci0 dev 20 function 0
piixpm0: vendor 0x1002 product 0x4385 (rev. 0x3a)
piixpm0: interrupting at SMI
iic0 at piixpm0: I2C bus
pciide1 at pci0 dev 20 function 1
pciide1: vendor 0x1002 product 0x439c (rev. 0x00)
pciide1: bus-master DMA support present, but unused (no driver support)
pciide1: primary channel configured to compatibility mode
pciide1: primary channel ignored (not responding; disabled or no drives?)
pciide1: secondary channel configured to compatibility mode
pciide1: secondary channel ignored (not responding; disabled or no drives?)
azalia1 at pci0 dev 20 function 2: Generic High Definition Audio Controller
azalia1: interrupting at ioapic0 pin 16
azalia1: host: 0x1002/0x4383 (rev. 0), HDA rev. 1.0
pcib0 at pci0 dev 20 function 3
pcib0: vendor 0x1002 product 0x439d (rev. 0x00)
ppb3 at pci0 dev 20 function 4: vendor 0x1002 product 0x4384 (rev. 0x00)
pci4 at ppb3 bus 4
pci4: i/o space, memory space enabled
ex0 at pci4 dev 7 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x30)
ex0: interrupting at ioapic0 pin 21
ex0: MAC address 00:01:03:e9:78:5a
ukphy0 at ex0 phy 24: Generic IEEE 802.3u media interface
ukphy0: OUI 0x0006b8, model 0x0035, rev. 0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fwohci0 at pci4 dev 14 function 0: vendor 0x104c product 0x8024 (rev. 0x00)
fwohci0: interrupting at ioapic0 pin 22
fwohci0: OHCI version 1.10 (ROM=0)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:74:c8:0a:00:00:1f:d0
fwohci0: Phy 1394a available S400, 3 ports.
fwohci0: Link S400, max_rec 2048 bytes.
ieee1394if0 at fwohci0: IEEE1394 bus
fwip0 at ieee1394if0: IP over IEEE1394
fwohci0: Initiate bus reset
ohci4 at pci0 dev 20 function 5: vendor 0x1002 product 0x4399 (rev. 0x00)
ohci4: interrupting at ioapic0 pin 18
ohci4: OHCI version 1.0, legacy support
usb6 at ohci4: USB revision 1.0
pchb1 at pci0 dev 24 function 0
pchb1: vendor 0x1022 product 0x1100 (rev. 0x00)
pchb2 at pci0 dev 24 function 1
pchb2: vendor 0x1022 product 0x1101 (rev. 0x00)
pchb3 at pci0 dev 24 function 2
pchb3: vendor 0x1022 product 0x1102 (rev. 0x00)
amdtemp0 at pci0 dev 24 function 3
amdtemp0: AMD CPU Temperature Sensors (K8: core rev BH-G2)
isa0 at pcib0
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
ieee1394if0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
ieee1394if0: bus manager 0 (me)
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
azalia0: codec[0]: ATI RS690/780 HDMI (rev. 0.0), HDA rev. 1.0 <<no audio over HDMI? >>
audio0 at azalia0: full duplex, independent
azalia1: codec[0]: Realtek ALC885 (rev. 1.1), HDA rev. 1.0
audio1 at azalia1: full duplex, independent
uhub0 at usb0: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
uhub1 at usb1: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
uhub2 at usb2: vendor 0x1002 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 6 ports with 6 removable, self powered
uhub3 at usb3: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 3 ports with 3 removable, self powered
uhub4 at usb4: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub4: 3 ports with 3 removable, self powered
uhub5 at usb5: vendor 0x1002 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub5: 6 ports with 6 removable, self powered
uhub6 at usb6: vendor 0x1002 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub6: 2 ports with 2 removable, self powered
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 1: <PIONEER BD-ROM  BDC-202, GGDL021853WL, 1.04> cdrom removable <<blu-ray playback?>>
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd0 at atabus0 drive 0: <WDC WD10EACS-00D6B0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 931 GB, 1938018 cyl, 16 head, 63 sec, 512 bytes/sect x 1953523055 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
ugen0 at uhub2 port 3
ugen0: EPSON EPSON Scanner, rev 2.00/1.00, addr 2 <<I've gotta bug the SANE people about V500 Photo support>>
uhidev0 at uhub0 port 1 configuration 1 interface 0
uhidev0: Gyration Gyration RF Technology Receiver, rev 1.10/2.20, addr 2, iclass 3/1
uhidev0: 4 report ids
ukbd0 at uhidev0 reportid 1
wd1 at atabus1 drive 0: <WDC WD10EACS-00D6B0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio2 at pad0: half duplex
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhid0 at uhidev0 reportid 2: input=2, output=0, feature=0
uhid1 at uhidev0 reportid 3: input=1, output=0, feature=0
uhid2 at uhidev0 reportid 4: input=7, output=0, feature=0
uhidev1 at uhub0 port 1 configuration 1 interface 1
uhidev1: Gyration Gyration RF Technology Receiver, rev 1.10/2.20, addr 2, iclass 3/1
uhidev1: 6 report ids
ums0 at uhidev1 reportid 1: 5 buttons and Z dir.
wsmouse0 at ums0 mux 0
uhid3 at uhidev1 reportid 2: input=2, output=0, feature=0
uhid4 at uhidev1 reportid 3: input=1, output=0, feature=0
uhid5 at uhidev1 reportid 5: input=2, output=0, feature=0
uhid6 at uhidev1 reportid 6: input=2, output=0, feature=0
cdce0 at uhub1 port 1 configuration 1 interface 0
cdce0: Linux 2.6.24/s3c2410_udc RNDIS/Ethernet Gadget, rev 2.00/2.12, addr 2
cdce0: could not find data bulk in
boot device: wd0
root on wd0a dumps on wd0b
/: replaying log to memory <<wapbl is WAY more reliable than softdeps...  yay!>>
root file system type: ffs
/: replaying log to disk
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
cd0(pciide0:0:1):  Check Condition on CDB: 0x00 00 00 00 00 00
    SENSE KEY:  Not Ready
     ASC/ASCQ:  Medium Not Present

Joerg Sonnenberger | 1 Oct 08:50 2008

Re: hardlink to symlink behaviour [was: Re: MKXORG=yes on Linux: nbmtree: existing entry for `libXaw.so', type `link' does not match type `file']

On Tue, Sep 30, 2008 at 07:20:56PM -0400, Thor Lancelot Simon wrote:
> On Tue, Sep 30, 2008 at 02:59:32PM +0200, Joerg Sonnenberger wrote:
> > On Tue, Sep 30, 2008 at 01:59:15PM +0200, Hubert Feyrer wrote:
> > > Digging a bit more into this, it's related to the difference in handling  
> > > hardlink to symlinks. On NetBSD, it seems that you get a link to the  
> > > target file these days[1], while on Linux you get a link to the  
> > > symlink[2]. ISTR that NetBSD used to behave like Linux there - is this  
> > > really intended?
> > 
> > NetBSD behaves like POSIX and always has.
> 
> Does POSIX actually specify this?  The difference probably stems from
> whether or not the symlink consumes an inode: on 4.3BSD and prior, it
> always did, and on 4.4 and NetBSD it does not (or is arranged to look
> as if it does not, in a few uncommon cases, IIRC).

POSIX allows this behavior in some places. But the general rules for
system calls apply to symlink(2) as well, so POSIX doesn't really specify a
mechanism to create. The POSIX rules are intended as you said for
implementing symlinks without separate inodes.

Joerg
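
[Editor's sketch, not part of the original thread.] The two behaviours
discussed above (a hard link to the symlink's target versus a hard link to
the symlink itself) can be selected explicitly with linkat(2) from
POSIX.1-2008, assuming the platform provides it. That interface postdates
some of the systems mentioned in this thread, so this is only an
illustration of the distinction, not a statement of what any particular
link(2) does. The helper name link_is_symlink() is made up for this
example.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_FOLLOW */
#include <sys/stat.h>   /* lstat(), S_ISLNK() */
#include <unistd.h>     /* linkat() */

/*
 * Hypothetical helper: create the hard link `newpath` for `path`.
 * If `follow` is non-zero and `path` is a symlink, link to the symlink's
 * target; otherwise link to the symlink itself.
 *
 * Returns 1 if the new directory entry is a symlink, 0 if it is not,
 * and -1 if linkat() or lstat() failed.
 */
int
link_is_symlink(const char *path, const char *newpath, int follow)
{
	struct stat st;

	if (linkat(AT_FDCWD, path, AT_FDCWD, newpath,
	    follow ? AT_SYMLINK_FOLLOW : 0) == -1)
		return -1;

	/* lstat() does not follow symlinks, so it describes the new entry. */
	if (lstat(newpath, &st) == -1)
		return -1;

	return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

Given a symlink "sym" pointing at a regular file, calling
link_is_symlink("sym", "a", 1) should yield a link to the file, while
link_is_symlink("sym", "b", 0) should yield a second name for the symlink,
on file systems that permit hard links to symlinks.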

Martin Husemann | 1 Oct 10:39 2008

Re: wapbl causing hangs

Should we add

	KASSERT(!cpu_intr_p());

at the start of ltsleep()?

Martin
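
[Editor's sketch, not part of the original thread.] The idea behind the
proposed assertion can be modelled in userland: track whether code is
running inside an interrupt handler and refuse to sleep if so. Everything
below is a hypothetical mock (mock_intr_depth, mock_cpu_intr_p(),
mock_sleep(), mock_run_handler()); it is not the kernel's ltsleep() or
cpu_intr_p(), and it returns an error where a kernel built with
diagnostics would panic via KASSERT.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's interrupt nesting state. */
static int mock_intr_depth;

/* Mock of the predicate: true while an interrupt handler is running. */
static bool
mock_cpu_intr_p(void)
{
	return mock_intr_depth > 0;
}

/*
 * Mock sleep entry point. Returns 0 if sleeping would be allowed and -1
 * if it was called from (mock) interrupt context, which is the condition
 * the proposed KASSERT(!cpu_intr_p()) is meant to catch.
 */
int
mock_sleep(void)
{
	if (mock_cpu_intr_p())
		return -1;
	return 0;
}

/* Run `handler` as if it were an interrupt handler. */
int
mock_run_handler(int (*handler)(void))
{
	int rv;

	mock_intr_depth++;
	rv = handler();
	mock_intr_depth--;
	return rv;
}
```

Here mock_sleep() succeeds when called directly, while
mock_run_handler(mock_sleep) reports the violation, which corresponds to
the pattern described in this thread (a sleep reached from an interrupt
handler).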

Johan Ihren | 1 Oct 11:40 2008

Re: NFS lockups in Xen/amd64 -currentish?


Ok, so I'm back again from two weeks of travel and just to sort of  
wrap up this issue I'd like to report that

a) I tried Martin's suggestion of TCP mounts and/or smaller packets.  
No difference.

b) I was unable to use Sarton's suggestion of using a DOMU as the NFS  
server because of other constraints. But in general I agree that's  
probably the right thing regardless of my particular problem.

c) In the end I worked around my acute problems with hanging NFS  
mounts by making my DOMU images larger and just pre-populate them with  
all the stuff I usually access via NFS. That worked fine of course,  
but the underlying problem with NFS for that particular August cut of
-current remains for me.

I will not spend more time on this but will rather bring the involved  
machines closer to -current and hope that it resolves itself.

Thanks for your responses,

Johan

On 16 Sep 2008, Martin Husemann wrote:

> On Mon, Sep 15, 2008 at 10:00:47PM +0200, Johan Ihren wrote:
>> Does anyone know of any recent NFS issues that cause hangs?
>
> Could you try if using TCP mounts helps?
> Or if reducing read/write packet size to, say, 1024 helps?
>
> Martin

On 16 Sep 2008, Sarton O'Brien wrote:

> On Tuesday 16 September 2008 06:00:47 Johan Ihren wrote:
>> Hi,
>>
>> In a Xen environment where I run an amd64 DOM0 with either amd64 or
>> i386pae DOMUs I ran into the problem that amd64 DOMUs could not NFS
>> mount anything from the DOM0. As soon as they touched NFS they hang
>> hard. Now, I know that NFS problems tend to make things hang, and it
>> is sometimes hard to distinguish between "hang" as in "kernel is hung"
>> or as in "kernel is waiting for something".
>
> This may not help any but I have run into similar issues whenever I run
> certain services on dom0. Running the same service from a domu and working
> in reverse has always worked.
>
> It all started for me when I noticed my dom0 couldn't see broadcast packets
> and therefore wouldn't respond to broadcast queries. I didn't bother going
> any further and the interest was limited.
>
> I didn't like moving these services to a domu as they were central but
> admittedly it has helped prevent the dom0 from panicking as regularly ;)
>
> Sarton
