Eric Haszlakiewicz | 14 Dec 07:31 2004

Problems with a dual Dell SC1420


	I've got a Dell SC1420 that has two Intel Xeon processors.  So far
I haven't been able to get it to work using both processors.  It seems
to run ok with just a single processor.  Turning on "options MULTIPROCESSOR"
makes it not work.  The reason appears to be some variation on 
interrupts not getting through.
	After a little bit of playing around I noticed what seems to be
an interrupt getting to the ioapic that the CPU doesn't notice:
 ioapic0: int9 f1a0<vector=a0,delmode=1,pending,actlo,irrpending,level,dest=0> 0<target=0>
                                        ^^^^^^^       ^^^^^^^^^^
int9 seems to be wired up to the ide controller:
 allocated pic ioapic0 type level pin 18 level 6 to cpu0 slot 9 idt entry 98
 piixide1: using ioapic0 pin 18 (irq 9) for native-PCI interrupt

and I get several timeouts before the drives are detected (cd0, wd0 and wd1)
but they eventually show up.  Further access doesn't work.  Trying to
netboot off wm0 appears to run into a similar problem: I get output on the
wire but input interrupts are never received.
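
For readers decoding the ioapic0 line above: the flags correspond to
bit fields in the low word of an I/O APIC redirection entry.  The sketch
below is a standalone illustration, not NetBSD's own code; the macro and
function names are made up for this example, and the bit positions are
taken from the Intel 82093AA layout.  It shows that f1a0 decodes to
vector a0, delmode 1 (lowest priority), pending, active low, remote IRR
set, level triggered and not masked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper macros; bit positions follow the 82093AA layout. */
#define REDLO_VECTOR(x)   ((x) & 0xffu)          /* bits 0-7: IDT vector */
#define REDLO_DELMODE(x)  (((x) >> 8) & 0x7u)    /* bits 8-10: 0 fixed, 1 lowest priority */
#define REDLO_LOGICAL(x)  (((x) >> 11) & 1u)     /* bit 11: 0 physical, 1 logical dest */
#define REDLO_PENDING(x)  (((x) >> 12) & 1u)     /* bit 12: delivery status */
#define REDLO_ACTLO(x)    (((x) >> 13) & 1u)     /* bit 13: active low polarity */
#define REDLO_IRR(x)      (((x) >> 14) & 1u)     /* bit 14: remote IRR */
#define REDLO_LEVEL(x)    (((x) >> 15) & 1u)     /* bit 15: level triggered */
#define REDLO_MASKED(x)   (((x) >> 16) & 1u)     /* bit 16: interrupt masked */
#define REDHI_DEST(x)     (((x) >> 24) & 0xffu)  /* high word, bits 24-31: destination */

/* Print one redirection entry in roughly the style of the dump above. */
static void
redir_print(uint32_t lo, uint32_t hi)
{
	printf("vector=%x delmode=%u dest=%u%s%s%s%s%s target=%u\n",
	    (unsigned)REDLO_VECTOR(lo), (unsigned)REDLO_DELMODE(lo),
	    (unsigned)REDLO_LOGICAL(lo),
	    REDLO_PENDING(lo) ? " pending" : "",
	    REDLO_ACTLO(lo) ? " actlo" : "",
	    REDLO_IRR(lo) ? " irrpending" : "",
	    REDLO_LEVEL(lo) ? " level" : " edge",
	    REDLO_MASKED(lo) ? " masked" : "",
	    (unsigned)REDHI_DEST(hi));
}

/* Worked example: the int9 entry quoted above (low word f1a0, high word 0). */
static void
redir_check_example(void)
{
	const uint32_t lo = 0xf1a0u, hi = 0x0u;

	assert(REDLO_VECTOR(lo) == 0xa0);
	assert(REDLO_DELMODE(lo) == 1);
	assert(REDLO_LOGICAL(lo) == 0);
	assert(REDLO_PENDING(lo) && REDLO_ACTLO(lo));
	assert(REDLO_IRR(lo) && REDLO_LEVEL(lo));
	assert(!REDLO_MASKED(lo));
	assert(REDHI_DEST(hi) == 0);
	redir_print(lo, hi);
}
```

Calling redir_check_example() from a small main() is enough to try it.
The pending and irrpending bits being set while nothing happens on the
CPU side is what suggests the interrupt reached the ioapic but was never
delivered or acknowledged.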

I've determined that _some_ interrupts get through, since setting a
breakpoint on either schedcpu or hardclock drops me into ddb.  However,
hitting ctrl-alt-esc does not work.

So, before I spend too much time playing around with stuff that I don't
entirely understand, does anyone have some ideas on what's going wrong?
(and hopefully what I can do to figure out how to fix it?)

eric


Brett Lymn | 14 Dec 12:21 2004

Re: Problems with a dual Dell SC1420

On Tue, Dec 14, 2004 at 12:31:31AM -0600, Eric Haszlakiewicz wrote:
> 
> So, before I spend too much time playing around with stuff that I don't
> entirely understand, does anyone have some ideas on what's going wrong?
> (and hopefully what I can do to figure out how to fix it?)
> 

Do you have ACPI_PCI_FIXUP in your options when you built the kernel?  Try
enabling/disabling that and see if it makes a difference.

-- 
Brett Lymn

Thor Lancelot Simon | 14 Dec 20:18 2004

Re: Problems with a dual Dell SC1420

On Tue, Dec 14, 2004 at 12:31:31AM -0600, Eric Haszlakiewicz wrote:
> 
> 	I've got a Dell SC1420 that has two Intel Xeon processors.  So far
> I haven't been able to get it to work using both processors.  It seems
> to run ok with just a single processor.  Turning on "options MULTIPROCESSOR"
> makes it not work.  The reason appears to be some variation on 
> interrupts not getting through.

Have you tried MPACPI instead of MPBIOS?  Some Dell machines make our ACPI
code freak out -- but that is a bug that should be fixed, and in general
the ACPI code has a much greater chance of working on machines with
complicated interrupt routing and many buses, I think.  Some machines will
probably not even *have* MPBIOS in the future.

Your time would probably be better spent making MPACPI and the various
ACPI_XYZ_FIXUP options work on this machine than trying to get the MPBIOS
code working. :-/

Of course you could always run NetBSD under Xen ;-) ;-) ;-)

Thor

Eric Haszlakiewicz | 14 Dec 22:57 2004

Re: Problems with a dual Dell SC1420

On Tue, Dec 14, 2004 at 02:18:21PM -0500, Thor Lancelot Simon wrote:
> Your time would probably be better spent making MPACPI and the various
> ACPI_XYZ_FIXUP options work on this machine than trying to get the MPBIOS
> code working. :-/

	I tried both MPBIOS and MPACPI.  Neither one seemed to behave much differently.
With MPBIOS I get an extra "ioapic0: int0 attached to ExtINT ..." and
"local apic: int0 attached to ExtINT ..." but otherwise it looks about
the same.
	"various" ACPI_*_FIXUP?  What else is there besides PCI?  I tried with
ACPI_PCI_FIXUP on and off, didn't help.

> Of course you could always run NetBSD under Xen ;-) ;-) ;-)
	I tried that.  It didn't work: the keyboard only responded sporadically
and garbage got written to the boot disk. :(

eric

Peter O'Kane | 15 Dec 18:08 2004

MP interrupt problems with PRIMERGY RX300

I have a Fujitsu-Siemens PRIMERGY RX300 dual Xeon server (ServerWorks GC LE 
533 chipset) with an adaptec 2005 zcr raid card.

2.0 GENERIC.MPACPI or GENERIC.MPBIOS kernel boots ok with only one 
(physical) processor enabled in the bios. With hyperthreading enabled the 
ACPI kernel sees and uses the two virtual processors at apid 6 (BSP) and 
apid 7 (AP).
With both physical processors enabled either MP kernel fails to configure 
the iop device and reports lost interrupts from the rccide0 while probing 
the atapi bus. The cd drive on the atapi bus is correctly recognized. 
Kernels are built with INTRDEBUG and the dmesg output before the iop 
failure is essentially identical except for the extra cpus.

Anyone got any suggestions for further debugging? Dmesg output from the 
failing case follows.

NetBSD 2.0 (GENERIC.MPACPI) #2: Tue Dec 14 18:17:47 GMT 2004
	peter <at> tantor.it.nuigalway.ie:/usr/src/sys/arch/i386/compile/GENERIC.MPACPI
total memory = 3071 MB
avail memory = 2962 MB
BIOS32 rev. 0 found at 0xfd907
mainbus0 (root)
cpu0 at mainbus0: apid 6 (boot processor)
cpu0: Intel Xeon (686-class), 2400.21 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries

Eric Haszlakiewicz | 15 Dec 21:51 2004

Re: MP interrupt problems with PRIMERGY RX300

On Wed, Dec 15, 2004 at 05:08:02PM +0000, Peter O'Kane wrote:
> I have a Fujitsu-Siemens PRIMERGY RX300 dual Xeon server (ServerWorks GC LE 
> 533 chipset) with an adaptec 2005 zcr raid card.
> 
> 2.0 GENERIC.MPACPI or GENERIC.MPBIOS kernel boots ok with only one 
> (physical) processor enabled in the bios. With hyperthreading enabled the 
> ACPI kernel sees and uses the two virtual processors at apid 6 (BSP) and 
> apid 7 (AP).
> With both physical processors enabled either MP kernel fails to configure 
> the iop device and reports lost interrupts from the rccide0 while probing 
> the atapi bus. The cd drive on the atapi bus is correctly recognized. 
> Kernels are built with INTRDEBUG and the dmesg output before the iop 
> failure is essentially identical except for the extra cpus.
> 
> Anyone got any suggestions for further debugging? Dmesg output from the 
> failing case follows.

	hmm.. I'm having similar problems.  My current random guess is that
the local apic's aren't being correctly enabled, possibly because it
doesn't seem like cpu_hatch() is being called.  (There are no "CPU X running"
messages).  However, the boot processor gets the lapic enabled during
initial attach, and it looks like your interrupts are being routed
to the boot processor (apid 6), so it should work.
	Try booting with -d, set a breakpoint on schedcpu, then continue.
When you drop into ddb, "call ioapic_dump", and see if anything is
listed as pending (especially int14, aka rccide0).
	That's about as far as I've gotten, although I haven't tried
single cpu but with hyperthreading.  I'll need to see if that works for me.

eric

Peter O'Kane | 16 Dec 18:54 2004

Re: MP interrupt problems with PRIMERGY RX300

Interesting, ioapic_dump shows:
 ioapic1: dump8 f163<vector=63,delmode=1,pending,actlo,irrpending,level,dest=0> 6000000<target=6>
That's the interrupt from the iop. Continuing (many times) from the break 
on schedcpu and dumping the ioapics shows the probe of the devices on the 
atapibus and the interrupt lost messages but the interrupts for the rccide0 
never show as pending.

--On 15 December 2004 14:51 -0600 Eric Haszlakiewicz <erh <at> nimenees.com> 
wrote:

> On Wed, Dec 15, 2004 at 05:08:02PM +0000, Peter O'Kane wrote:
>> I have a Fujitsu-Siemens PRIMERGY RX300 dual Xeon server (ServerWorks GC
>> LE  533 chipset) with an adaptec 2005 zcr raid card.
>>
>> 2.0 GENERIC.MPACPI or GENERIC.MPBIOS kernel boots ok with only one
>> (physical) processor enabled in the bios. With hyperthreading enabled
>> the  ACPI kernel sees and uses the two virtual processors at apid 6
>> (BSP) and  apid 7 (AP).
>> With both physical processors enabled either MP kernel fails to
>> configure  the iop device and reports lost interrupts from the rccide0
>> while probing  the atapi bus. The cd drive on the atapi bus is correctly
>> recognized.  Kernels are built with INTRDEBUG and the dmesg output
>> before the iop  failure is essentially identical except for the extra
>> cpus.
>>
>> Anyone got any suggestions for further debugging? Dmesg output from the
>> failing case follows.
>

Peter O'Kane | 17 Dec 18:01 2004

Re: MP interrupt problems with PRIMERGY RX300 --- Success

O.K. I have got my RX300 running with all four logical processors.
The problem appears to be that, at the point where the ioapics are enabled,
the local apic on the second physical processor is not properly initialized
to take part in the negotiations needed for lowest priority delivery mode,
and that the LOPRI delivery mode is being specified even though only the BSP
is specified as the interrupt target.

As a quick and dirty hack I just changed the definition of 
IOAPIC_REDLO_DEL_LOPRI to 0 (i.e. the same as IOAPIC_REDLO_DEL_FIXED) in 
sys/arch/x86/include/i82093.h
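
A minimal sketch of what that change amounts to, for illustration only.
The two IOAPIC_REDLO_DEL_* names are the ones mentioned above; the value
0x100 for LOPRI is an assumption inferred from "delmode=1" in the dumps
and the 82093AA delivery-mode field (bits 8-10), and redlo_compose() is
a hypothetical helper, not a function from the NetBSD tree.

```c
#include <assert.h>
#include <stdint.h>

/* Delivery-mode field of the redirection entry low word (bits 8-10). */
#define IOAPIC_REDLO_DEL_MASK	0x00000700u
#define IOAPIC_REDLO_DEL_FIXED	0x00000000u	/* deliver to the listed target */
#define IOAPIC_REDLO_DEL_LOPRI	0x00000100u	/* assumed stock value: lowest priority */

/*
 * Hypothetical helper: build the low word of a redirection entry from a
 * vector, a delivery mode, and the trigger/polarity flags.
 */
static uint32_t
redlo_compose(uint8_t vector, uint32_t delmode, int level, int actlo)
{
	uint32_t redlo = vector;

	redlo |= delmode & IOAPIC_REDLO_DEL_MASK;
	if (actlo)
		redlo |= 1u << 13;	/* active low polarity */
	if (level)
		redlo |= 1u << 15;	/* level triggered */
	return redlo;
}

/* Compare the entry with lowest priority delivery against fixed delivery. */
static void
redlo_example(void)
{
	/* Same vector and flags as the stuck int9 entry, minus status bits. */
	assert(redlo_compose(0xa0, IOAPIC_REDLO_DEL_LOPRI, 1, 1) == 0xa1a0u);
	/* With LOPRI redefined to 0, the same call yields fixed delivery. */
	assert(redlo_compose(0xa0, IOAPIC_REDLO_DEL_FIXED, 1, 1) == 0xa0a0u);
}
```

Redefining IOAPIC_REDLO_DEL_LOPRI to 0 makes every entry that asks for
lowest priority delivery come out as fixed delivery instead, so no
arbitration between local apics is needed.  It is a workaround rather
than a fix for the local apic initialization order.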

At least on the RX300 ACPI_PCI_FIXUP is not needed.

Peter O'Kane                            E-mail:peter.okane <at> it.nuigalway.ie
Information Technology Department,      Voice: +353 91 492527
National University of Ireland, Galway. Fax: +353 91 494501

Eric Haszlakiewicz | 17 Dec 21:57 2004

Re: MP interrupt problems with PRIMERGY RX300 --- Success

On Fri, Dec 17, 2004 at 05:01:30PM +0000, Peter O'Kane wrote:
> O.K. I have got my RX300 running with all four logical processors.
> Problem appears to be that, at the point where the ioapics are enabled, the 
> local apic on the second physical processor is not properly initialized to 
> take part in the negotiations needed for lowest priority delivery mode and 
> that, even though only the BSP is specified as the interrupt target the 
> LOPRI delivery mode is being specified.
> 
> As a quick and dirty hack I just changed the definition of 
> IOAPIC_REDLO_DEL_LOPRI to 0 (i.e. the same as IOAPIC_REDLO_DEL_FIXED) in 
> sys/arch/x86/include/i82093.h

	cool!  Thanks!  That worked for me too.

eric

Frank van der Linden | 20 Dec 10:29 2004

Re: MP interrupt problems with PRIMERGY RX300 --- Success

On Fri, Dec 17, 2004 at 05:01:30PM +0000, Peter O'Kane wrote:
> O.K. I have got my RX300 running with all four logical processors.
> Problem appears to be that, at the point where the ioapics are enabled, the 
> local apic on the second physical processor is not properly initialized to 
> take part in the negotiations needed for lowest priority delivery mode and 
> that, even though only the BSP is specified as the interrupt target the 
> LOPRI delivery mode is being specified.

That's an interesting observation.. I wasn't actually aware that this
could happen. Where in the Intel documentation is this described? It
may explain some other problems, and I'd like to fix it.

- Frank

