Fermín Galán Márquez | 7 May 20:32 2007

Performance and limitations of virtual bridges

Hi,

Is there a limit on the number of interfaces a virtual bridge (created with
brctl) can support without a severe impact on performance?

I guess there is no absolute answer to that question :), but maybe
there is some procedure or tool to measure the "stress" or "load" that a
virtual bridge is supporting at a given moment (similar to the way
"top" shows the CPU load).

I ask because I'm using a virtual bridge with 14 interfaces (each
interface corresponds to a Xen virtual machine on the same physical host)
and, given that I'm experiencing transmission delays in the network
supported by the bridge, I suspect the bridge itself is losing performance.

Thanks in advance!

Best regards,

--------------------
Fermín Galán Márquez
CTTC - Centre Tecnològic de Telecomunicacions de Catalunya
Parc Mediterrani de la Tecnologia, Av. del Canal Olímpic s/n, 08860
Castelldefels, Spain
Room 1.02
Tel : +34 93 645 29 12
Fax : +34 93 645 29 01
Email address: fermin dot galan at cttc dot es

Stephen Hemminger | 8 May 20:11 2007

Re: two fields are missing in brctl output when using /sys

On Thu, 05 Apr 2007 16:10:59 -0400
Jeremy Jackson <jerj <at> coplanar.net> wrote:

> I've noticed for a while that 
> 
> # brctl showstp
> 
> output is showing 0 for port_no and  port_id
> 
> It seems that somewhere in 2.6 sysfs land the following items got
> printed in hexadecimal, and brctl code was parsing for decimal only
> 
> doug:/sys/class/net/eth0/brport# cat port_id
> 0x8001
> doug:/sys/class/net/eth0/brport# cat port_no
> 0x1
> 
> The following patch to bridge-utils (git and 1.2 release) lets it do the
> right thing:
> 
> diff -ur bridge-utils-1.2/libbridge/libbridge_devif.c bridge-utils/libbridge/libbridge_devif.c
> --- bridge-utils-1.2/libbridge/libbridge_devif.c        2007-04-05 16:02:58.287022220 -0400
> +++ bridge-utils/libbridge/libbridge_devif.c    2007-04-05 14:51:19.362040447 -0400
> @@ -56,7 +56,7 @@
(Continue reading)
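
The symptom described above (a value such as 0x8001 being reported as 0) is
what a decimal-only parser produces when it stops at the "x". A minimal
sketch of the difference follows, in Python rather than the C of
bridge-utils. It is not the truncated patch above: both function names are
hypothetical, and the decimal-only behaviour is an assumption modelled on a
C "%d"-style conversion.

```python
def parse_decimal_only(text):
    """Assumed old behaviour: read leading decimal digits, stop at the rest."""
    digits = ""
    for ch in text.strip():
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def parse_sysfs_number(text):
    """Accept both decimal ("1") and 0x-prefixed hexadecimal ("0x8001")."""
    # base=0 tells int() to honour a 0x prefix and otherwise read decimal.
    return int(text.strip(), 0)


if __name__ == "__main__":
    for raw in ("0x8001\n", "0x1\n"):
        print(repr(raw), parse_decimal_only(raw), parse_sysfs_number(raw))
```

With the sysfs values quoted in the message, the decimal-only reader yields 0
for both port_id and port_no, which matches what brctl showstp was printing.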

Kyle Moffett | 10 May 04:54 2007

[BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

I've managed to fairly reliably trigger a deadlock in some portion of  
the linux networking code on my Debian test box (using the debian  
kernel linux-image-2.6.20-1-686).  I'm pretty sure that it's a race  
condition of some sort as it doesn't trigger if I ifdown the  
interfaces one by one, but if I run "ifdown -a" then it triggers  
halfway through reliably (although not with the same reference count
numbers: once it was 6, this time it was 2).

The message I get is: "unregister_netdevice: waiting for world0 to  
become free. Usage count = 2"

My /etc/network/interfaces file uses a couple custom-made if-pre-up.d  
and if-post-down.d scripts to set up the bridges and VLANs a little  
more cleanly than the standard debian scripts do, but the general  
configuration is as follows:

net0: Tigon-3 onboard gigabit NIC, hooked to managed switch, untagged  
packets
   wfi0: (net0.2 before renaming) WAP-connected VLAN 2 on the managed  
switch
   world0: (net0.4094 before renaming) Internet connection, runs DHCP

lan: Local Area Network bridge of "net0" and "wfi0" (current box has
lowest STP priority).  This will eventually also have another
untagged-only ethernet port attached to it.
   lan:0: Alias for acting as primary nameserver

world: pseudo-bridge of "world0" for highly-available DHCP client.

Just for a bit of background on why this is so complex:  When I get  
(Continue reading)

Ben Greear | 10 May 06:25 2007

Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

Kyle Moffett wrote:
>
> vconfig       D 83CCD8CE     0 16564  16562                     (NOTLB)
>        efdd7e7c 00000086 ee120afb 83ccd8ce 98f00788 b7083ffa 5384b49a c76c0b05
>        9ebaf791 00000004 efdd7e4e 00000007 f1468a90 2ab74174 00000362 00000326
>        f1468b9c c180e420 00000001 00000286 c012933c efdd7e8c df98a000 c180e468
> Call Trace:
> [<c012933c>] lock_timer_base+0x15/0x2f
> [<c0129445>] __mod_timer+0x91/0x9b
> [<c02988f5>] schedule_timeout+0x70/0x8d
> [<f8b75209>] vlan_device_event+0x13/0xf8 [8021q]

Looks like a deadlock in the vlan code.  Any chance you can run this test
with lockdep enabled?

You could also add a printk in vlan_device_event() to check which event it
is hanging on, and the netdevice that is passed in.

Since the vlan code holds RTNL at this point, then most other network tasks
will eventually hang as well.

Thanks,
Ben


Kyle Moffett | 10 May 06:34 2007

Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

On May 10, 2007, at 00:25:54, Ben Greear wrote:
> Kyle Moffett wrote:
>> vconfig       D 83CCD8CE     0 16564  16562                     (NOTLB)
>>        efdd7e7c 00000086 ee120afb 83ccd8ce 98f00788 b7083ffa 5384b49a c76c0b05
>>        9ebaf791 00000004 efdd7e4e 00000007 f1468a90 2ab74174 00000362 00000326
>>        f1468b9c c180e420 00000001 00000286 c012933c efdd7e8c df98a000 c180e468
>> Call Trace:
>> [<c012933c>] lock_timer_base+0x15/0x2f
>> [<c0129445>] __mod_timer+0x91/0x9b
>> [<c02988f5>] schedule_timeout+0x70/0x8d
>> [<f8b75209>] vlan_device_event+0x13/0xf8 [8021q]
>
> Looks like a deadlock in the vlan code.  Any chance you can run  
> this test with lockdep enabled?
>
> You could also add a printk in vlan_device_event() to check which  
> event it is hanging on, and the netdevice that is passed in.

Ok, I'll try building a 2.6.21 kernel with lockdep and some debugging  
printk()s in the vlan_device_event() function and get back to you  
tomorrow.  Thanks for the quick response!

> Since the vlan code holds RTNL at this point, then most other  
> network tasks will eventually hang as well.

Well, it's less of an "eventually" and more of an "almost  
(Continue reading)

Kyle Moffett | 11 May 07:49 2007

Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

On May 10, 2007, at 00:34:11, Kyle Moffett wrote:
> On May 10, 2007, at 00:25:54, Ben Greear wrote:
>> Looks like a deadlock in the vlan code.  Any chance you can run  
>> this test with lockdep enabled?
>>
>> You could also add a printk in vlan_device_event() to check which  
>> event it is hanging on, and the netdevice that is passed in.
>
> Ok, I'll try building a 2.6.21 kernel with lockdep and some  
> debugging printk()s in the vlan_device_event() function and get  
> back to you tomorrow.  Thanks for the quick response!

Progress!!!  I built a 2.6.21.1 kernel with a 1MB dmesg buffer,  
almost all of the locking debugging options on (as well as a few  
others just for kicks), a VLAN debug #define turned on in the
net/8021q/vlan.h file, and lots of extra debugging messages added to the
functions in vlan.c.  My initial interpretation is that due to the  
funny order in which "ifdown -a" takes down interfaces, it tries to  
delete the VLAN interfaces before the bridges running atop them have  
been taken down.  Ordinarily this seems to work, but when the  
underlying physical ethernet is down already, the last VLAN to be  
deleted seems to hang somehow.  The full results are as follows:

The lock dependency validator at startup passes all 218 testcases,  
indicating that all the locking crap is probably working correctly  
(those debug options chew up another meg of RAM).

ifup -a brings up the interfaces in this order (See previous email  
for configuration details):
lo net0 wfi0 world0 lan lan:0 world
(Continue reading)
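
The initial interpretation above is about ordering: VLAN interfaces being
deleted while bridges still sit on top of them. As a hedged illustration
only (this is not what ifdown does, and it is not offered as a fix for the
hang), the sketch below computes a teardown order in which every interface
goes down before anything it is stacked on. The BUILT_ON table is an
assumption read off the configuration described earlier in the thread, and
teardown_order is a hypothetical helper. It needs Python 3.9 or later for
graphlib.

```python
from graphlib import TopologicalSorter

# Assumed stacking, taken from the configuration described in the thread:
# each key is an interface, each value the lower interfaces it is built on.
BUILT_ON = {
    "net0": set(),              # physical NIC
    "wfi0": {"net0"},           # VLAN 2 on net0
    "world0": {"net0"},         # VLAN 4094 on net0
    "lan": {"net0", "wfi0"},    # bridge of net0 and wfi0
    "lan:0": {"lan"},           # alias on the lan bridge
    "world": {"world0"},        # pseudo-bridge of world0
}


def teardown_order(built_on):
    """Return an order in which upper interfaces go down before lower ones."""
    # Invert the table: for each interface, collect what is stacked on top
    # of it. Those upper interfaces must be taken down first.
    stacked_on_top = {name: set() for name in built_on}
    for upper, lowers in built_on.items():
        for lower in lowers:
            stacked_on_top.setdefault(lower, set()).add(upper)
    # TopologicalSorter emits a node only after all of its predecessors.
    return list(TopologicalSorter(stacked_on_top).static_order())


if __name__ == "__main__":
    print(" ".join(teardown_order(BUILT_ON)))
```

Several valid orders exist, so the exact output may vary; the only guarantee
is that "world" precedes "world0", "lan" precedes "wfi0" and "net0", and
"net0" comes after everything built on it.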

Kyle Moffett | 12 May 14:13 2007

Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

On May 11, 2007, at 01:49:27, Kyle Moffett wrote:
> On May 10, 2007, at 00:34:11, Kyle Moffett wrote:
>> On May 10, 2007, at 00:25:54, Ben Greear wrote:
>>> Looks like a deadlock in the vlan code.  Any chance you can run  
>>> this test with lockdep enabled?
>>>
>>> You could also add a printk in vlan_device_event() to check which  
>>> event it is hanging on, and the netdevice that is passed in.
>>
>> Ok, I'll try building a 2.6.21 kernel with lockdep and some  
>> debugging printk()s in the vlan_device_event() function and get  
>> back to you tomorrow.  Thanks for the quick response!

[snip]

> ifup -a brings up the interfaces in this order (See previous email  
> for configuration details):
> lo net0 wfi0 world0 lan lan:0 world
>
> ifdown -a appears to bring them down in the same order (at least,  
> until it gets stuck).

Hmm, turns out that it always hung downing this entry in my  
interfaces file, independent of ordering:

iface world0 inet manual
	mac-address 8b:8d:cb:91:e2:4c
	minimally-up yes
	vlan-dev net0
	vlan-id 4094
(Continue reading)

Stephen Hemminger | 14 May 18:04 2007

Re: Performance and limitations of virtual bridges

On Mon, 7 May 2007 20:32:17 +0200
Fermín Galán Márquez <fermin.galan <at> cttc.es> wrote:

> Hi,
> 
> Is there a limit on the number of interfaces a virtual bridge (created with
> brctl) can support without a severe impact on performance?

The problem with lots of interfaces is that if the destination address is not
known (or is multicast/broadcast), the packet has to be copied and sent N times.
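
A toy model of that flooding rule, as a sketch rather than the kernel's
bridge code: LearningBridge, its port names and the MAC addresses are all
hypothetical. It learns source addresses and returns the list of ports a
frame would be copied to. In this model a flooded frame goes to every port
except the one it arrived on.

```python
def is_group_address(mac):
    """True for multicast and broadcast MACs (low bit of first octet set)."""
    return bool(int(mac.split(":")[0], 16) & 1)


class LearningBridge:
    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # forwarding database: MAC address -> port

    def forward(self, in_port, src_mac, dst_mac):
        """Return the ports that receive a copy of the frame."""
        self.fdb[src_mac] = in_port  # learn where the sender lives
        if is_group_address(dst_mac) or dst_mac not in self.fdb:
            # Unknown unicast, multicast or broadcast: flood.
            return [p for p in self.ports if p != in_port]
        out_port = self.fdb[dst_mac]
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    bridge = LearningBridge("vif%d" % i for i in range(14))
    copies = bridge.forward("vif0", "00:16:3e:00:00:01", "ff:ff:ff:ff:ff:ff")
    print(len(copies), "copies for one broadcast frame")
```

With 14 ports, as in the original question, each broadcast or unknown
destination frame is copied 13 times in this model, while a frame to an
already learned address is sent once.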

> I guess there is no absolute answer to that question :), but maybe
> there is some procedure or tool to measure the "stress" or "load" that a
> virtual bridge is supporting at a given moment (similar to the way
> "top" shows the CPU load).

Worst case is the flooding problem.
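
The thread does not name a "top"-like tool for bridges. One rough way to
approximate a load figure, offered here only as a sketch, is to sample the
per-interface packet counters that Linux exposes under
/sys/class/net/<iface>/statistics/ for every port listed in the bridge's
brif directory, and turn two samples into rates. The helper names and the
choice of counters are assumptions, not something the thread recommends.

```python
import os


def read_port_counters(bridge, sysfs_root="/sys/class/net"):
    """Read packet counters for every port attached to the given bridge."""
    counters = {}
    brif_dir = os.path.join(sysfs_root, bridge, "brif")
    for port in sorted(os.listdir(brif_dir)):
        stats_dir = os.path.join(sysfs_root, port, "statistics")
        counters[port] = {}
        for name in ("rx_packets", "tx_packets", "tx_dropped"):
            with open(os.path.join(stats_dir, name)) as handle:
                counters[port][name] = int(handle.read())
    return counters


def packet_rates(before, after, seconds):
    """Convert two counter samples into per-second rates for each port."""
    rates = {}
    for port, new in after.items():
        old = before.get(port, {})
        rates[port] = {
            name: (value - old.get(name, value)) / seconds
            for name, value in new.items()
        }
    return rates
```

Taking two samples a second apart, for example with a time.sleep(1) between
two read_port_counters("xenbr0") calls (the bridge name is hypothetical),
and passing them to packet_rates gives a per-port view. Ports whose transmit
rate climbs together with overall traffic are a hint that flooding dominates.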

> I ask because I'm using a virtual bridge with 14 interfaces (each
> interface corresponds to a Xen virtual machine on the same physical host)
> and, given that I'm experiencing transmission delays in the network
> supported by the bridge, I suspect the bridge itself is losing performance.

Probably when flooding it has to wake up all the guest machines and
that is sucking your performance on hypervisor switches.

> Thanks in advance!
> 
> Best regards,
> 

chunhui_true | 15 May 15:01 2007

test

 sorry
-- 
                                    ,        ,
                                   /(        )`
                                   \ \___   / |
                                   /- _  `-/  '
                                  (/\/ \ \   /\
                                  / /   | `    \
                                  O O   ) /    |
                                  `-^--'`<     '
                                 (_.)  _  )   /
                                  `.___/`    /
                                    `-----' /
                       <----.     __ / __   \
                       <----|====O)))==) \) /====(O
                       <----'    `--' `.__,' \
                                    |        |
                                     \       /       /\
                                ______( (_  / \______/
                              ,'  ,-----'   |
                              `--{__________)

                              Shut up and code
                              chunhui_true <at> 163.com
