Ben Greear | 1 Apr 01:00 2005

Re: RFC: Redirect-Device

Thomas Graf wrote:
> * Ben Greear <424C7813.4000101 <at> candelatech.com> 2005-03-31 14:22
> 
>>My personal opinion is that netlink sockets are a pain in the ass to deal
>>with, and there is no way I want to try to programmatically parse the tc
>>input or output.
> 
> 
> libnl will make your life a lot easier. The tc support is not yet
> finished, but I'll be releasing a version this week which at least
> fully supports all the basics (link, neigh, address, routes, rules).
> It's only a matter of time until one can use the tc interface with
> only a few library calls.

Some time I will revisit netlink.  There are some things I want to do,
like allowing routing table IDs wider than 8 bits, and perhaps watching
for network link up/down events and the like.  Please don't bust ass to support
tc on my account, as it's quite unlikely I'll use it any time soon.

Thanks,
Ben

--

-- 
Ben Greear <greearb <at> candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

Nivedita Singhvi | 1 Apr 01:03 2005

Re: [RFC] netif_rx: receive path optimization

Rick Jones wrote:

>> Take the following scenario in non-NAPI:
>> - packet 1 arrives
>> - interrupt happens, NIC bound to CPU0
>> - in the meantime packets 2,3 arrive
>> - 3 packets put on queue for CPU0
>> - interrupt processing done
>>
>> - packet 4 arrives, interrupt, CPU1 is bound to NIC
>> - in the meantime packets 5,6 arrive
>> - CPU1 backlog queue used
>> - interrupt processing done
>>
>> Assume CPU0 is overloaded with other system work and CPU1 rx processing
>> kicks in first ... TCP sees packet 4, 5, 6 before 1, 2, 3 ..
> 
> 
> I "never" see that because I always bind a NIC to a specific CPU :)  
> Just about every networking-intensive benchmark report I've seen has 
> done the same.

Just a reminder that the networking-benchmark world and
the real networking deployment world have a less than desirable
intersection (which I know you know only too well, Rick ;)).
How often do people use affinity? How often do they really tune
the system for their workloads? How often do they turn off things
like SACK etc? Not very often in the real world. Designing OSs to
do better at benchmarks is a different proposition than designing
OSs to do well in the real world.


Herbert Xu | 1 Apr 01:19 2005

Re: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (126)

Hi Dave:

On Wed, Mar 30, 2005 at 05:02:36PM -0800, David S. Miller wrote:
> On Wed, 30 Mar 2005 18:26:40 +1000
> Herbert Xu <herbert <at> gondor.apana.org.au> wrote:
>
> > The solution is to hold a ref count on the socket before we drop
> > the cb lock.
> 
> Applied, thanks Herbert.

Unfortunately my patch only closed half the race.  There is still
a chunk of code between netlink_dump_start and netlink_dump that runs
outside the cb lock which isn't protected by an sk reference.

Here is a better patch which protects the entire netlink_dump function
with a sk reference.

The other call to netlink_dump by recvmsg is safe as the open file
descriptor already holds a reference.  As such the final sock_put
in netlink_dump can be turned into a __sock_put since there is at
least one reference held by the caller.

Signed-off-by: Herbert Xu <herbert <at> gondor.apana.org.au>

Cheers,
--

-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert <at> gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/

jamal | 1 Apr 01:20 2005

Re: RFC: Redirect-Device

On Thu, 2005-03-31 at 17:22, Ben Greear wrote:
> jamal wrote:
> > On Thu, 2005-03-31 at 16:26, Ben Greear wrote:

> 
> My personal opinion is that netlink sockets are a pain in the ass to deal
> with, and there is no way I want to try to programmatically parse the tc
> input or output.
> 

Take a look at the libraries I mentioned. 

> And probably not so easy to manipulate from a kernel module.
> 
> And BNF cannot be more powerful than a C/C++ program with a byte-buffer
> representing the entire ethernet frame.
> 

For that level you write a program. In any language you want;->
I don't think you can beat the u32 classifier interface for describing
a packet. 

> >>I can also create a nice little set of virtual interfaces
> >>and connections  rdd0 <-> rdd1  |bridge|  rdd2 <-> rdd3.  I can then send traffic
> >>from rdd0 to rdd3 across the bridge, etc.  Now, this last bit is fairly
> >>contrived, but it happens to help me with some testing on my laptop which
> >>lacks a lot of external ethernet interfaces :)
> > 
> > So your goal is to define a path that the packet takes inside the kernel
> > across multiple devices? i.e. some form of loose source routing?

Herbert Xu | 1 Apr 01:23 2005

Re: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (126)

Hi Dave:

Here is the version for 2.4.

Unfortunately my patch only closed half the race.  There is still
a chunk of code between netlink_dump_start and netlink_dump that runs
outside the cb lock which isn't protected by an sk reference.

Here is a better patch which protects the entire netlink_dump function
with a sk reference.

The other call to netlink_dump by recvmsg is safe as the open file
descriptor already holds a reference.  As such the final sock_put
in netlink_dump can be turned into a __sock_put since there is at
least one reference held by the caller.

Signed-off-by: Herbert Xu <herbert <at> gondor.apana.org.au>

Cheers,
--

-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert <at> gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--- linux-2.4/net/netlink/af_netlink.c.orig	2005-04-01 09:20:05.000000000 +1000
+++ linux-2.4/net/netlink/af_netlink.c	2005-04-01 09:21:06.000000000 +1000
@@ -981,11 +981,9 @@
 	len = cb->dump(skb, cb);

jamal | 1 Apr 01:26 2005

Re: RFC: Redirect-Device

On Thu, 2005-03-31 at 17:54, Ben Greear wrote:

> No.  I can't imagine a way to make it work with my application.
> 

I think you are more comfortable using netdevices, ioctls and
/proc. 
If the action stuff can't do what you need, I will make a donation
to the EFF on your behalf ;->

> I obviously can't force you to accept the redirect module, so
> if no one else sees any reason for it, then we can simply
> drop the matter and I'll carry it in my own patch set like
> I do my other stuff.  No hard feelings, and if someone decides
> they could use something like it in the future, then perhaps
> we can take another look at it.
> 

Why don't we try to help you migrate away from the approach you are
currently taking?

cheers,
jamal

Rick Jones | 1 Apr 01:28 2005

Re: [RFC] netif_rx: receive path optimization

>> I "never" see that because I always bind a NIC to a specific CPU :)  
>> Just about every networking-intensive benchmark report I've seen has 
>> done the same.
> 
> 
> Just a reminder that the networking-benchmark world and
> the real networking deployment world have a less than desirable
> intersection (which I know you know only too well, Rick ;)).

Touche :)

> How often do people use affinity? How often do they really tune
> the system for their workloads? 

Not as often as they should.

> How often do they turn off things like SACK etc?

Well, I'm in an email discussion with someone who seems to bump their TCP 
windows quite large, and disable timestamps...

> Not very often in the real world. Designing OSs to
> do better at benchmarks is a different proposition than designing
> OSs to do well in the real world.

BTW, what is the real-world purpose of giving NIC interrupts affinity to
multiple CPUs?  I have to admit it seems rather alien to me (in the context of
no onboard NIC smarts being involved, that is).

>>> Note Linux is quite resilient to reordering compared to other OSes (as

Ben Greear | 1 Apr 01:35 2005

Re: RFC: Redirect-Device

jamal wrote:

> One thing you probably haven't understood is that all the action stuff
> that happens on ingress happens before dev.c pkt receive.

Could you point me to where this is, or give me something to
search for?  I'm curious how/where it does the hook, and if I
understand that better, maybe I can start thinking about how
to make use of it for future hacks...

Thanks,
Ben

--

-- 
Ben Greear <greearb <at> candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

jamal | 1 Apr 01:36 2005

Re: [RFC] netif_rx: receive path optimization

On Thu, 2005-03-31 at 17:42, Rick Jones wrote:

> I "never" see that because I always bind a NIC to a specific CPU :)  Just about 
> every networking-intensive benchmark report I've seen has done the same.
> 

Do you have to be so clever? ;->

> > Note Linux is quite resilient to reordering compared to other OSes (as
> > you may know) but avoiding this is a better approach - hence my
> > suggestion to use NAPI when you want to do serious TCP.
> 
> Would the same apply to NIC->CPU interrupt assignments? That is, bind the NIC to 
> a single CPU.
> 

No reordering there.

> > Don't think we can do that, unfortunately: we are screwed by the APIC
> > architecture on x86.
> 
> The IPS and TOPS stuff was/is post-NIC-interrupt. Low-level driver processing 
> still happened/s on a specific CPU, it is the higher-level processing which is 
> done on another CPU.  The idea - with TOPS at least, is to try to access the ULP 
> (TCP, UDP etc) structures on the same CPU as last accessed by the app to 
> minimize that cache to cache migration.
> 

But if the interrupt happens on the "wrong" CPU - and you decide higher-level
processing is to be done on the "right" CPU (I assume queueing on some

jamal | 1 Apr 01:46 2005

Re: RFC: Redirect-Device

On Thu, 2005-03-31 at 18:35, Ben Greear wrote:
> jamal wrote:
> 
> > One thing you probably haven't understood is that all the action stuff
> > that happens on ingress happens before dev.c pkt receive.
> 
> Could you point me to where this is, or give me something to
> search for?  I'm curious how/where it does the hook, and if I
> understand that better, maybe I can start thinking about how
> to make use of it for future hacks...

In dev.c, search for CONFIG_NET_CLS_ACT.

To see how simple an action can be, take a look at
net/sched/gact.c (which does simple drop/accept etc.); I have some
patches I need to submit to Dave that would make it even simpler to use
(and require less code). Attached is an example. 

Actually gact may have gotten a little bit complex because it now allows
you to add randomness to accepting, dropping, going to next action etc.

The one thing you have to understand is that filters and actions are
separate. So what I am pointing you to is a simple action that is
executed after a packet matches a specified filter. Thomas has been
working on providing what are known as ematches, which are very simple
filters that you could program.
You can match a packet and pass it through a series of actions of your
choice.


