Andrew Morton | 1 Jun 2005 01:12

Re: 2.6.12-rcx networking oops

Phil Oester <kernel <at> linuxace.com> wrote:
>
> At Andrew's suggestion, I tested the latest 2.6.12-rc5-gitx, and am still
>  hitting an oops on a gateway box under load.  From comparing the various
>  oops, it seems like a dev is disappearing while one CPU is in the middle
>  of processing traffic.  At least that's what my naive analysis leads
>  me to believe.

Are you _sure_ the hardware is good?

Are you running anything which would cause netdevs to be destroyed? 
Bringing virtual devices up and down?  TUN/TAP driver?  Bonding driver? 
Anything like that?

Have you tried CONFIG_DEBUG_SLAB and/or CONFIG_DEBUG_PAGEALLOC?

Jon Mason | 1 Jun 2005 01:20

Re: r8169 802.1q/MTU bug

On Monday 30 May 2005 07:27 pm, James Harr wrote:
> Hi,
> The driver you sent gives me an error when I try to compile and load it
> with 2.6.11.11:
>
> # make
> [...]
>   CC [M]  drivers/net/r8169.o
>   drivers/net/r8169.c: In function `rtl8169_down':
>   drivers/net/r8169.c:2589: warning: implicit declaration of function
>   synchronize_sched
> [...]
>
> # insmod drivers/net/r8169.ko
> insmod: error inserting 'drivers/net/r8169.ko': -1 Unknown symbol in module
>
>
> It does a similar thing when I try to install it:
>
> # make modules_install
> [...]
> if [ -r System.map ]; then /sbin/depmod -ae -F System.map  2.6.11.11; fi
> WARNING: /lib/modules/2.6.11.11/kernel/drivers/net/r8169.ko needs unknown
> symbol synchronize_sched
>
> As I noted in a previous email, it doesn't give me this problem when I have
> jumbo frames enabled on my switch.
>
> Since I found out my switch supported jumbo frames, I started to toy around
> with larger MTUs. When a VLAN's MTU was set to 7200, my system locked up. I
> [...]

Edgar E Iglesias | 1 Jun 2005 01:23

Re: ipv4 ipsec

On Tue, May 31, 2005 at 03:56:42PM -0700, David S. Miller wrote:
> From: Edgar E Iglesias <edgar.iglesias <at> axis.com>
> Date: Wed, 1 Jun 2005 00:47:17 +0200
> 
> > I'm not sure this is the correct list for ipsec issues, but shouldn't
> > the size check at the bottom of net/ipv4/esp4.c be the other way
> > around (2.6.11)?
> 
> You are right, good catch.  Luckily the size of esp_decap_data
> is exactly 20 bytes, so the incorrect test happens to be harmless.

mostly harmless..
But for gcc ports that create packed structs per default, it is 19
bytes.
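
The size argument can be sketched in isolation. This is only an illustration under stated assumptions: the struct layouts, the names `demo_decap_data`, `demo_decap_data_packed` and `SCRATCH_LEN`, and the 20-byte buffer size are hypothetical stand-ins, not the kernel's definitions. It shows why a 20-byte struct hides the inverted comparison while a 19-byte packed one trips it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the generic scratch buffer (hypothetical). */
#define SCRATCH_LEN 20

/* 19 bytes of fields; normal alignment pads the struct to 20. */
struct demo_decap_data {
	uint32_t addr[4];
	uint16_t port;
	uint8_t  proto;
};

/* Same fields, packed (GCC/Clang extension): exactly 19 bytes. */
struct demo_decap_data_packed {
	uint32_t addr[4];
	uint16_t port;
	uint8_t  proto;
} __attribute__((packed));

/* Inverted test: complains when the private data is SMALLER than the buffer. */
static inline int inverted_check_fires(size_t priv, size_t buf)
{
	return priv < buf;
}

/* Corrected test: complains only when the private data would not fit. */
static inline int correct_check_fires(size_t priv, size_t buf)
{
	return priv > buf;
}
```

With equal sizes neither comparison fires, which is why the inverted test went unnoticed. With the packed layout the inverted test fires even though the data fits.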

> 
> > --- /usr/src/linux-2.6.11-gentoo-r8/net/ipv4/esp4.c     2005-05-11 10:05:03.000000000 +0200
> > +++ esp4.c      2005-06-01 00:38:55.000000000 +0200
> 
> Please properly -p1 root your patch so I can apply it, and also
> please provide a "Signed-off-by: " line for yourself as well.
> 
> It may seem pointless for a one-line patch, but I want to get you
> and others into the habit of submitting patches properly in the
> future.

oh sorry, I hope I get it right this time :)

Best regards
--

-- 

Phil Oester | 1 Jun 2005 01:23

Re: 2.6.12-rcx networking oops

On Tue, May 31, 2005 at 04:12:20PM -0700, Andrew Morton wrote:
> Are you _sure_ the hardware is good?

Well, it lasts on 2.6.10 indefinitely (since 1/1/5 minus the recent
upgrade attempts).  And the hardware itself has been in service for
a few years without failure.  It will last on 2.6.11 or 12-rc over
the weekend fine, but as soon as traffic picks up during the workday
it keels over.

> Are you running anything which would cause netdevs to be destroyed? 
> Bringing virtual devices up and down?  TUN/TAP driver?  Bonding driver? 
> Anything like that?

The box runs keepalived (for VRRP), and quagga (for OSPF).  Neither should
be destroying netdevs during normal operation AFAIK.

> Have you tried CONFIG_DEBUG_SLAB and/or CONFIG_DEBUG_PAGEALLOC?

No - do you think it would reveal anything given the above?

Phil

Jon Mason | 1 Jun 2005 01:28

Re: RFC: NAPI packet weighting patch

On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote:
> From: Jon Mason <jdmason <at> us.ibm.com>
> Date: Tue, 31 May 2005 17:07:54 -0500
>
> > Of course some performance analysis would have to be done to determine the
> > optimal numbers for each speed/duplexity setting per driver.
>
> per cpu speed, per memory bus speed, per I/O bus speed, and add in other
> complications such as NUMA
>
> My point is that whatever experimental number you come up with will be
> good for that driver on your systems, not necessarily for others.
>
> Even within a system, whatever number you select will be the wrong
> thing to use if one starts a continuous I/O stream to the SATA
> controller in the next PCI slot, for example.
>
> We keep getting bitten by this, as the Altix perf data continually shows,
> and we need to absolutely stop thinking this way.
>
> The way to go is to make selections based upon observed events and
> measurements.

I'm not arguing against a /proc entry to tune dev->weight for those sysadmins 
advanced enough to do that.  I am arguing that we can make the driver smarter 
(at little/no cost)  for "out of the box" users.
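
As a rough sketch of what "selections based upon observed events" could look like, the helper below adjusts a poll weight from what the previous poll actually processed, rather than from a fixed per-speed table. Everything here is an assumption for illustration: `next_weight`, `WEIGHT_MIN`, `WEIGHT_MAX`, the doubling/halving policy and even the direction of adjustment are hypothetical, not taken from any driver or from the patch under discussion.

```c
#include <assert.h>

/* Hypothetical bounds; not taken from any real driver. */
#define WEIGHT_MIN 8
#define WEIGHT_MAX 64

/*
 * Pick the next poll weight from the last observation.
 *   processed >= weight     : the poll used its whole quota, so raise it
 *   processed <  weight / 2 : the poll was mostly idle, so lower it
 *   otherwise               : leave the weight alone
 * The result is clamped to [WEIGHT_MIN, WEIGHT_MAX].
 */
static inline int next_weight(int weight, int processed)
{
	if (processed >= weight)
		weight *= 2;
	else if (processed < weight / 2)
		weight /= 2;

	if (weight < WEIGHT_MIN)
		weight = WEIGHT_MIN;
	if (weight > WEIGHT_MAX)
		weight = WEIGHT_MAX;
	return weight;
}
```

A driver would call something like this once per poll; whether growing or shrinking the weight under load is the better reaction is exactly the kind of question that would need measurement.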

Andrew Morton | 1 Jun 2005 01:28

Re: 2.6.12-rcx networking oops

Phil Oester <kernel <at> linuxace.com> wrote:
>
> On Tue, May 31, 2005 at 04:12:20PM -0700, Andrew Morton wrote:
> > Are you _sure_ the hardware is good?
> 
> Well, it lasts on 2.6.10 indefinitely (since 1/1/5 minus the recent
> upgrade attempts).  And the hardware itself has been in service for
> a few years without failure.  It will last on 2.6.11 or 12-rc over
> the weekend fine, but as soon as traffic picks up during the workday
> it keels over.

hm, OK.  So I assume the machine has recently been running 2.6.10.  So it's
unlikely to be a hardware problem.

> > Are you running anything which would cause netdevs to be destroyed? 
> > Bringing virtual devices up and down?  TUN/TAP driver?  Bonding driver? 
> > Anything like that?
> 
> The box runs keepalived (for VRRP), and quagga (for OSPF).  Neither should
> be destroying netdevs during normal operation AFAIK.

OK.  It would need more than a very-ex-net person to work out how those
things affect the networking stack ;)

> > Have you tried CONFIG_DEBUG_SLAB and/or CONFIG_DEBUG_PAGEALLOC?
> 
> No - do you think it would reveal anything given the above?

It might catch the failure at an earlier stage.


Phil Oester | 1 Jun 2005 01:34

Re: 2.6.12-rcx networking oops

On Tue, May 31, 2005 at 04:28:37PM -0700, Andrew Morton wrote:
> hm, OK.  So I assume the machine has recently been running 2.6.10.  So it's
> unlikely to be a hardware problem.

It's running 2.6.10 as we speak -- I typically reboot it at night into 
2.6.12-rc and between 8-10am the next day it panics itself back into
2.6.10.

> > > Have you tried CONFIG_DEBUG_SLAB and/or CONFIG_DEBUG_PAGEALLOC?
> > 
> > No - do you think it would reveal anything given the above?
> 
> It might catch the failure at an earlier stage.

I'll try it out tomorrow.

Phil

Francois Romieu | 1 Jun 2005 01:44

Re: r8169 802.1q/MTU bug

Jon Mason <jdmason <at> us.ibm.com> :
[...]
> I wonder if this is related to the adapter breaking large frames into
> multiple descriptors.  On normal (non-VLAN) frames, this happens at MTU
> 8169. I wonder if enabling VLAN and jumbo frames (a combination I never
> tried) brings down the threshold to MTU 7200.  

Testing suggests that there is at least a size filtering issue (fixed
in netdev-2.6.git but not in 2.6.11.xx nor in 2.6.12-rc). It is unrelated
to hardware vlan support and happens a few bytes above the MTU on the vlan
device (which is set to the same value as the adapter). Of course different
issues could hide in the dark.

--
Ueimor

David S. Miller | 1 Jun 2005 01:47

Re: ipv4 ipsec

From: Edgar E Iglesias <edgar.iglesias <at> axis.com>
Date: Wed, 1 Jun 2005 01:23:40 +0200

> oh sorry, I hope I get it right this time :)

Your email client has mangled the tab characters into spaces
in the patch, so the patch still will not apply correctly.

Please fix this.

Edgar E Iglesias | 1 Jun 2005 02:02

Re: ipv4 ipsec

On Tue, May 31, 2005 at 04:47:41PM -0700, David S. Miller wrote:
> From: Edgar E Iglesias <edgar.iglesias <at> axis.com>
> Date: Wed, 1 Jun 2005 01:23:40 +0200
> 
> > oh sorry, I hope I get it right this time :)
> 
> Your email client has mangled the tab characters into spaces
> in the patch, so the patch still will not apply correctly.
> 
> Please fix this.

One more try.. sorry

Best regards
--

-- 
        Programmer
        Edgar E Iglesias <edgar <at> axis.com> 46.46.272.1946

Signed-off-by: Edgar E Iglesias <edgar <at> axis.com>

-----

--- linux-2.6.11-gentoo-r6/net/ipv4/esp4.c	2005-04-14 21:39:32.000000000 +0200
+++ linux-2.6.11-gentoo-r9/net/ipv4/esp4.c	2005-06-01 00:38:55.000000000 +0200
@@ -480,7 +480,7 @@
 {
 	struct xfrm_decap_state decap;

-	if (sizeof(struct esp_decap_data)  <
+	if (sizeof(struct esp_decap_data)  >
[...]

