Havard Eidnes | 3 Sep 14:30 2008

HP ProLiant BL460c G1 boot failure

Hi,

I tried to boot a recent 4.99.72/amd64 bootable ISO image on an
HP ProLiant BL460c G1 "blade" in a blade server chassis.

This failed as follows:

...
bnx0 at pci4 dev 0 function 0: Broadcom NetXtreme II BCM5708 1000Base-SX
bnx0: SerDes controllers are not supported!
uvm_fault(0xffffffff811089a0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff804cb1d4 cs 8 rflags 10202 cr2  30 cpl 8 rsp ffffffff81157c30
kernel: page fault trap, code=0
Stopped in pid 0.1 (system) at  0xffffffff804cb1d4: movq 0x30(%rax),%r11
db{0}> tra
?() at 0xffffffff804cb1d4 (bus_dmamap_load_mbuf)
?() at 0xffffffff805ac260 (bnx_attach)
?() at 0xffffffff8042554a (config_attach_loc)
?() at 0xffffffff8050f764 (pci_probe_device)
?() at 0xffffffff8050f8fb (pci_enumerate_bus)
?() at 0xffffffff8050f9e0 (pcirescan)
?() at 0xffffffff8050fd87 (pciattach)
?() at 0xffffffff8042554a (config_attach_loc)
?() at 0xffffffff8054853c (ppbattach)
?() at 0xffffffff8042554a (config_attach_loc)
?() at 0xffffffff8050f764 (pci_probe_device)
?() at 0xffffffff8050f8fb (pci_enumerate_bus)
?() at 0xffffffff8050f9e0 (pcirescan)

Havard Eidnes | 3 Sep 15:33 2008

Re: HP ProLiant BL460c G1 boot failure

Hi,

due to the stack decode being a manual process, I made a mistake:

> db{0}> tra
> ?() at 0xffffffff804cb1d4 (bus_dmamap_load_mbuf)

That's supposed to be (bus_dmamap_destroy).

> ?() at 0xffffffff805ac260 (bnx_attach)
> ?() at 0xffffffff8042554a (config_attach_loc)
> ?() at 0xffffffff8050f764 (pci_probe_device)
> ?() at 0xffffffff8050f8fb (pci_enumerate_bus)
> ?() at 0xffffffff8050f9e0 (pcirescan)
> ?() at 0xffffffff8050fd87 (pciattach)
> ?() at 0xffffffff8042554a (config_attach_loc)
> ?() at 0xffffffff8054853c (ppbattach)
> ?() at 0xffffffff8042554a (config_attach_loc)
> ?() at 0xffffffff8050f764 (pci_probe_device)
> ?() at 0xffffffff8050f8fb (pci_enumerate_bus)
> ?() at 0xffffffff8050f9e0 (pcirescan)
> ?() at 0xffffffff8050fd87 (pciattach)
> ...
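
For readers unfamiliar with decoding a trace by hand: each address is matched against the kernel symbol table by picking the symbol with the greatest start address that is still at or below it. Reading one row off in a sorted listing gives a neighbouring function, which is the kind of slip corrected above. A minimal sketch of that lookup follows; the table contents and the names `struct sym`, `symtab` and `resolve` are made up for illustration and do not come from the kernel in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry; names and addresses are made up. */
struct sym {
	uintptr_t   start;	/* address where the function begins */
	const char *name;
};

/* Must be sorted by ascending start address. */
static const struct sym symtab[] = {
	{ 0x1000, "func_a" },
	{ 0x1200, "func_b" },
	{ 0x1500, "func_c" },
};

/*
 * Return the name of the symbol with the greatest start address that is
 * still <= addr, or NULL if addr lies below the first symbol.
 */
static const char *
resolve(const struct sym *tab, size_t n, uintptr_t addr)
{
	size_t lo = 0, hi = n;

	assert(tab != NULL);

	/* Binary search for the first entry whose start is > addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].start <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo == 0 ? NULL : tab[lo - 1].name;
}
```

With this table, an address such as 0x1234 resolves to the second entry, while anything below 0x1000 resolves to nothing.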

Regards,

- Håvard

David Young | 3 Sep 17:55 2008

Re: HP ProLiant BL460c G1 boot failure

On Wed, Sep 03, 2008 at 03:33:18PM +0200, Havard Eidnes wrote:
> Hi,
> 
> due to the stack decode being a manual process, I made a mistake:
> 
> > db{0}> tra
> > ?() at 0xffffffff804cb1d4 (bus_dmamap_load_mbuf)
> 
> That's supposed to be (bus_dmamap_destroy).

I think that the crash happens because bnx_release_resources() tries to
free rx mbuf DMA maps, whether they were allocated or not:

        for (i = 0; i < TOTAL_RX_BD; i++)
                bus_dmamap_destroy(sc->bnx_dmatag, sc->rx_mbuf_map[i]);
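
The trap output is consistent with this reading: the faulting instruction is `movq 0x30(%rax),%r11` and cr2 is 0x30, so %rax must have held 0, which is what a member access through a NULL map pointer would look like. The arithmetic can be sketched as below. `struct fake_dmamap` is a hypothetical stand-in whose padding merely places a member at offset 0x30; the real structure layout is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a DMA map structure. Only the offset of
 * `member` matters here; the real layout is an unknown.
 */
struct fake_dmamap {
	char  pad[0x30];
	void *member;
};

/* Base register value implied by a fault address and a displacement. */
static uintptr_t
base_from_fault(uintptr_t fault_addr, uintptr_t disp)
{
	assert(fault_addr >= disp);
	return fault_addr - disp;
}

/* Address touched when reading `member` through a pointer equal to base. */
static uintptr_t
member_addr(uintptr_t base)
{
	return base + offsetof(struct fake_dmamap, member);
}
```

Feeding in the values from the trap line (fault address 0x30, displacement 0x30) gives a base of 0, and a base of 0 gives back a fault address of 0x30.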

Dave

-- 
David Young             OJC Technologies
dyoung <at> ojctech.com      Urbana, IL * (217) 278-3933 ext 24

Christos Zoulas | 3 Sep 19:42 2008

Re: HP ProLiant BL460c G1 boot failure

In article <20080903155517.GB1493 <at> che.ojctech.com>,
David Young  <dyoung <at> pobox.com> wrote:
>On Wed, Sep 03, 2008 at 03:33:18PM +0200, Havard Eidnes wrote:
>> Hi,
>> 
>> due to the stack decode being a manual process, I made a mistake:
>> 
>> > db{0}> tra
>> > ?() at 0xffffffff804cb1d4 (bus_dmamap_load_mbuf)
>> 
>> That's supposed to be (bus_dmamap_destroy).
>
>I think that the crash happens because bnx_release_resources() tries to
>free rx mbuf DMA maps, whether they were allocated or not:
>
>        for (i = 0; i < TOTAL_RX_BD; i++)
>                bus_dmamap_destroy(sc->bnx_dmatag, sc->rx_mbuf_map[i]);
>

I think you are right. The problem is that the driver does not keep track
of what was allocated, so it will require a bit of coding to fix.

christos

Manuel Bouyer | 3 Sep 20:06 2008

Re: HP ProLiant BL460c G1 boot failure

On Wed, Sep 03, 2008 at 02:30:05PM +0200, Havard Eidnes wrote:
> Hi,
> 
> I tried to boot a recent 4.99.72/amd64 bootable ISO image on an
> HP ProLiant BL460c G1 "blade" in a blade server chassis.
> 
> This failed as follows:
> 
> ...
> bnx0 at pci4 dev 0 function 0: Broadcom NetXtreme II BCM5708 1000Base-SX
> bnx0: SerDes controllers are not supported!
> uvm_fault(0xffffffff811089a0, 0x0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip ffffffff804cb1d4 cs 8 rflags 10202 cr2  30 cpl 8 rsp ffffffff81157c30
> kernel: page fault trap, code=0
> Stopped in pid 0.1 (system) at  0xffffffff804cb1d4: movq 0x30(%rax),%r11
> db{0}> tra
> ?() at 0xffffffff804cb1d4 (bus_dmamap_load_mbuf)
> ?() at 0xffffffff805ac260 (bnx_attach)
> ?() at 0xffffffff8042554a (config_attach_loc)
> ?() at 0xffffffff8050f764 (pci_probe_device)
> ?() at 0xffffffff8050f8fb (pci_enumerate_bus)
> ?() at 0xffffffff8050f9e0 (pcirescan)
> ?() at 0xffffffff8050fd87 (pciattach)
> ?() at 0xffffffff8042554a (config_attach_loc)
> ?() at 0xffffffff8054853c (ppbattach)
> ?() at 0xffffffff8042554a (config_attach_loc)
> ?() at 0xffffffff8050f764 (pci_probe_device)
> ?() at 0xffffffff8050f8fb (pci_enumerate_bus)

Michael L. Hitch | 3 Sep 20:19 2008

Re: HP ProLiant BL460c G1 boot failure

On Wed, 3 Sep 2008, Manuel Bouyer wrote:

>> One: I assume "SerDes controllers are not supported!" means that
>> the driver doesn't support fibre-optic versions of this device.
>> Is there a particular reason this isn't supported in our driver?  It
>
> I guess, because this is a port from the OpenBSD driver, and it didn't
> support SerDes at that time.

   The following changes (also from the OpenBSD driver) fixed this for me
on a Dell M600 blade server:

Index: sys/dev/pci/if_bnx.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.18
diff -u -r1.18 if_bnx.c
--- sys/dev/pci/if_bnx.c	7 Feb 2008 01:21:55 -0000	1.18
+++ sys/dev/pci/if_bnx.c	3 Sep 2008 15:24:08 -0000
@@ -40,7 +40,7 @@
  /*
   * The following controllers are supported by this driver:
   *   BCM5706C A2, A3
- *   BCM5708C B1
+ *   BCM5708C B1, B2
   *
   * The following controllers are not supported by this driver:
   * (These are not "Production" versions of the controller.)
@@ -412,6 +412,7 @@
  	u_int32_t		command;

Havard Eidnes | 3 Sep 20:32 2008

Re: HP ProLiant BL460c G1 boot failure

> > > db{0}> tra
> > > ?() at 0xffffffff804cb1d4 (bus_dmamap_load_mbuf)
> >
> > That's supposed to be (bus_dmamap_destroy).
>
> I think that the crash happens because bnx_release_resources() tries to
> free rx mbuf DMA maps, whether they were allocated or not:
>
>         for (i = 0; i < TOTAL_RX_BD; i++)
>                 bus_dmamap_destroy(sc->bnx_dmatag, sc->rx_mbuf_map[i]);

Right.  I've inserted the obvious if() before that
bus_dmamap_destroy locally, but because of PR#39454 I didn't get
to test this before leaving the office for the day.

I'll test it tomorrow and commit it if it avoids the crash.
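
A guarded release loop of that general shape might look like the following userland sketch. It is not the change described above, only an illustration under stated assumptions: `struct fake_dmamap`, `struct fake_softc`, `fake_dmamap_destroy()` and `release_rx_maps()` are hypothetical stand-ins for the kernel types and functions, and `TOTAL_RX_BD` is set to a small illustrative value rather than the driver's.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_RX_BD 4	/* illustrative only, not the driver's value */

/* Hypothetical userland stand-ins for the kernel types. */
struct fake_dmamap {
	int in_use;
};

struct fake_softc {
	struct fake_dmamap *rx_mbuf_map[TOTAL_RX_BD];
};

static int destroy_calls;	/* counts calls, for demonstration only */

/*
 * Stand-in for bus_dmamap_destroy(). It dereferences its argument, as
 * the faulting code in the trace appears to, so NULL must not reach it.
 */
static void
fake_dmamap_destroy(struct fake_dmamap *map)
{
	assert(map != NULL);
	map->in_use = 0;
	destroy_calls++;
}

/* Destroy only the maps that were actually created. */
static void
release_rx_maps(struct fake_softc *sc)
{
	for (int i = 0; i < TOTAL_RX_BD; i++) {
		if (sc->rx_mbuf_map[i] == NULL)
			continue;
		fake_dmamap_destroy(sc->rx_mbuf_map[i]);
		/* Clear the slot so a second release is harmless. */
		sc->rx_mbuf_map[i] = NULL;
	}
}
```

Clearing each slot after destroying it is a design choice of this sketch: it makes the release routine safe to call again from a later error path.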

Regards,

- Håvard

Havard Eidnes | 3 Sep 21:09 2008

Re: HP ProLiant BL460c G1 boot failure

> > One: I assume "SerDes controllers are not supported!" means that
> > the driver doesn't support fibre-optic versions of this device.
> > Is there a particular reason this isn't supported in our driver?
>
> I guess, because this is a port from the OpenBSD driver, and it
> didn't support SerDes at that time.

That's sort of an obvious implication, yes.

By the looks of it, the code initially came to OpenBSD from
FreeBSD.  Our driver has both $FreeBSD ... $ and $OpenBSD ... $
strings embedded in it, both of them pointing to rather old
versions / dates.  Unfortunately, it's somewhat unclear what
these strings signify: "this driver was adapted from this
version", or "this driver is in feature-sync with this particular
version".

...which brings up another question: how good are we at adopting
features from other OSes after initial import and adaptation
("not good enough", would be my answer to that one), and what
can we do to improve the situation?

Regards,

- Håvard

Michael L. Hitch | 3 Sep 21:30 2008

Re: HP ProLiant BL460c G1 boot failure

On Wed, 3 Sep 2008, Michael L. Hitch wrote:

>  The following changes (also from the OpenBSD driver) fixed this for me
> on a Dell M600 blade server:

  Oops, this also needs another change:

Index: sys/net/if_media.h
===================================================================
RCS file: /cvsroot/src/sys/net/if_media.h,v
retrieving revision 1.50
diff -u -r1.50 if_media.h
--- sys/net/if_media.h	15 Jun 2008 16:33:58 -0000	1.50
+++ sys/net/if_media.h	3 Sep 2008 19:27:03 -0000
@@ -178,6 +178,7 @@
  #define	IFM_10G_LR	18		/* 10GbaseLR - single-mode fiber */
  #define	IFM_10G_SR	19		/* 10GBase-SR 850nm Multi-mode */
  #define	IFM_10G_CX4	20		/* 10GBase CX4 copper */
+#define	IFM_2500_SX	21		/* 2500baseSX - multi-mode fiber */

  #define	IFM_ETH_MASTER	0x00000100	/* master mode (1000baseT) */
  #define	IFM_ETH_RXPAUSE	0x00000200	/* receive PAUSE frames */
@@ -423,6 +424,8 @@
  	{ IFM_ETHER | IFM_10G_CX4,	"10GbaseCX4" },			\
  	{ IFM_ETHER | IFM_10G_CX4,	"10GCX4" },			\
  	{ IFM_ETHER | IFM_10G_CX4,	"10GBASE-CX4" },		\
+	{ IFM_ETHER | IFM_2500_SX,	"2500baseSX" },			\
+	{ IFM_ETHER | IFM_2500_SX,	"2500SX" },			\
  									\
  	{ IFM_TOKEN | IFM_TOK_STP4,	"DB9/4Mbit" },			\
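
To illustrate what the two added table rows do: each row pairs a media word (media type ORed with a subtype) with a name, so a name lookup for "2500baseSX" or its alias "2500SX" yields the same word. The sketch below is a simplified model, not the kernel's code. `struct media_desc`, `descs` and `media_word_by_name()` are hypothetical names, and the value used for `IFM_ETHER` is an assumption; the two subtype numbers are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IFM_ETHER	0x20	/* assumed value, for illustration */
#define IFM_10G_CX4	20	/* from the diff context */
#define IFM_2500_SX	21	/* added by the diff */

/* Hypothetical, simplified description entry. */
struct media_desc {
	int         word;
	const char *name;
};

/* Table terminated by an entry with a NULL name. */
static const struct media_desc descs[] = {
	{ IFM_ETHER | IFM_10G_CX4, "10GbaseCX4" },
	{ IFM_ETHER | IFM_2500_SX, "2500baseSX" },
	{ IFM_ETHER | IFM_2500_SX, "2500SX" },
	{ 0, NULL },
};

/* Return the media word for a name, or -1 if the name is unknown. */
static int
media_word_by_name(const char *name)
{
	assert(name != NULL);
	for (const struct media_desc *d = descs; d->name != NULL; d++) {
		if (strcmp(d->name, name) == 0)
			return d->word;
	}
	return -1;
}
```

Both spellings of the new media type map to the same word, and an unknown name is reported with -1 rather than a valid-looking value.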

Havard Eidnes | 3 Sep 23:32 2008

Re: HP ProLiant BL460c G1 boot failure

Hi,

and thanks for the diffs.  I'm wondering if we'll need some
change to brgphy.c similar to

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/mii/brgphy.c.diff?r1=1.64;r2=1.65;f=h

  or

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/mii/brgphy.c.diff?r1=1.68;r2=1.69;f=h

in order to attach the correct phy driver?

Regards,

- Håvard

