Robert Watson | 1 Dec 2006 02:39

Re: a proposed callout API


On Thu, 30 Nov 2006, Ivan Voras wrote:

> Not trying to take sides here, but for those of us willing to learn, what
> exactly are the problems in Matt Dillon's suggestions? From a novice's POV,
> having per-cpu queues looks (emphasis: looks) very scalable and performant.

The implications of adopting the model Matt proposes are quite far-reaching: 
callouts don't exist in isolation, but occur in the context of data structures 
and work occurring in many threads.  If callouts are pinned to a particular 
CPU, and can only be scheduled, rescheduled, and cancelled from that CPU, that 
implies either that all work associated with that callout is also pinned to 
the CPU, or that migration or message-passing be involved if the requirement 
comes up in a thread on another CPU.

Consider the case of TCP timers: a number of TCP timers get regularly 
rescheduled (delack, retransmit, etc).  If they can only be manipulated from 
cpu0 (i.e., protected by a synchronization primitive that can't be acquired 
from another CPU -- i.e., critical sections instead of mutexes), how do you 
handle the case where a TCP packet for that connection is processed on 
cpu1 and needs to change the scheduling of the timer?  In a strict work/data 
structure pinning model, you would pin the TCP connection to cpu0, and only 
process any data leading to timer changes on that CPU.  Alternatively, you 
might pass a message from cpu1 to cpu0 to change the scheduling.
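
A minimal sketch of the message-passing alternative described above, written
in Python rather than kernel C purely for illustration.  The class name
CpuCalloutQueue, its methods, and the timer names are hypothetical and are not
taken from any FreeBSD or DragonFly API; the "CPUs" are simulated by integer
ids.

```python
from collections import deque


class CpuCalloutQueue:
    """Callouts owned by one simulated CPU (hypothetical illustration).

    Only the owning CPU may modify `deadlines`; other CPUs must post a
    request to `inbox` instead of touching the timer state directly.
    """

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.deadlines = {}   # callout name -> expiry tick (owner-only state)
        self.inbox = deque()  # reschedule requests posted by other CPUs

    def reschedule(self, current_cpu, name, expiry):
        if current_cpu == self.cpu_id:
            # Local case: the owner is the only writer, so no lock is taken.
            self.deadlines[name] = expiry
            return "direct"
        # Remote case: the change is deferred until the owner drains its inbox.
        self.inbox.append((name, expiry))
        return "queued"

    def drain_inbox(self):
        """Runs on the owning CPU: apply requests queued by other CPUs."""
        applied = 0
        while self.inbox:
            name, expiry = self.inbox.popleft()
            self.deadlines[name] = expiry
            applied += 1
        return applied


cpu0 = CpuCalloutQueue(cpu_id=0)
local_result = cpu0.reschedule(current_cpu=0, name="tcp_rexmt", expiry=100)
remote_result = cpu0.reschedule(current_cpu=1, name="tcp_delack", expiry=40)
pending_before_drain = "tcp_delack" in cpu0.deadlines
applied_count = cpu0.drain_inbox()
```

The sketch makes the cost visible: a reschedule issued from cpu1 does not take
effect until cpu0 next drains its inbox, whereas a mutex-protected callout
could be changed immediately from either CPU at the price of lock traffic.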

The idea of processing timers in multiple threads and pinning them to multiple 
CPUs clearly isn't a bad idea: we could likely benefit from parallelism (and 
generally, concurrency) in timer processing.  One of the things we discussed 
at the recent developer summit was subsystem callout threads (introducing the 
opportunity for parallelism without committing to a particular CPU scheduling 
[...]

George V. Neville-Neil | 1 Dec 2006 04:38

Re: a proposed callout API

At Thu, 30 Nov 2006 21:57:01 +0000,
Poul-Henning Kamp wrote:
> I'm not going to dissect Matt's emails because that will just lead 
> to a long and pointless flamewar.
> 
> Most of Matt's emails focus on the specifics of implementation whereas
> I have repeatedly stressed that my focus is on defining a good API
> for programmers to use which will allow us to isolate the implementation
> so we can experiment with different strategies.
> 
> I will work with John to write up our spec and we will publish that
> along with the reasoning soon.
> 

Excellent.  That will give us something more substantial to chew on.

Thanks,
George
_______________________________________________
freebsd-arch <at> freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe <at> freebsd.org"

Matthew Dillon | 1 Dec 2006 10:30

Re: a proposed callout API

:The implications of adopting the model Matt proposes are quite far-reaching: 
:callouts don't exist in isolation, but occur in the context of data structures 
:and work occurring in many threads.  If callouts are pinned to a particular 
:...
:Consider the case of TCP timers: a number of TCP timers get regularly 
:rescheduled (delack, retransmit, etc).  If they can only be manipulated from 
:cpu0 (i.e., protected by a synchronization primitive that can't be acquired 
:from another CPU -- i.e., critical sections instead of mutexes), how do you 
:handle the case where a TCP packet for that connection is processed on 
:cpu1 and needs to change the scheduling of the timer?  In a strict work/data 
:structure pinning model, you would pin the TCP connection to cpu0, and only 
:process any data leading to timer changes on that CPU.  Alternatively, you 
:might pass a message from cpu1 to cpu0 to change the scheduling.

    Yes, this is all very true.  One could think of this in a more abstract
    way if that would make things more clear:  All the work processing 
    related to a particular TCP connection is accumulated into a single
    'hopper'.  The hopper is what is being serialized with a mutex, or by
    cpu-locality, or even simply by thread-locality (dedicating a single
    thread to process a single hopper).  This means that all the work
    that has accumulated in the hopper can be processed while holding a
    single serializer instead of having to acquire and release a serializer
    for each work item within the hopper.

    That's the gist of it.  If you have enough hoppers, statistics takes
    care of the rest.  There is nothing that says the hoppers have to
    be pinned to particular cpu's, it just makes it easier for other
    system APIs if they are.
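
    A minimal sketch of the hopper idea as described above, in Python for
    illustration only.  The Hopper class, the hopper_for() helper, and the
    choice of four hoppers are assumptions made for this example; they do not
    come from DragonFly or FreeBSD code.  A mutex is used as the serializer
    here, though the text notes cpu-locality or thread-locality would serve
    the same role.

```python
import threading
from collections import deque


class Hopper:
    """Accumulates work for a group of connections (hypothetical sketch)."""

    def __init__(self):
        self._serializer = threading.Lock()  # one serializer per hopper
        self._pending = deque()              # accumulated work items
        self.acquisitions = 0                # how often the serializer was taken
        self.processed = []

    def add(self, item):
        # Accumulate only; deque.append is safe to call from any thread.
        self._pending.append(item)

    def run(self, handler):
        """Process everything accumulated so far under ONE acquisition."""
        count = 0
        with self._serializer:
            self.acquisitions += 1
            while self._pending:
                item = self._pending.popleft()
                self.processed.append(handler(item))
                count += 1
        return count


hoppers = [Hopper() for _ in range(4)]


def hopper_for(conn_id):
    # All work for one connection lands in the same hopper (assumed mapping).
    return hoppers[conn_id % len(hoppers)]


for seq in range(5):
    hopper_for(7).add(("conn7", seq))

handled = hopper_for(7).run(lambda item: item[1])
```

    Five work items are handled with a single serializer acquisition, which is
    the saving the paragraph above describes; a per-item locking design would
    have acquired and released five times.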

    For FreeBSD, I think the hopper abstraction might be the way to
    [...]

Matthew Dillon | 1 Dec 2006 11:09

Re: a proposed callout API

:
:     http://www.ece.rice.edu/~willmann/pubs/paranet_tr06-872.pdf
:
:Robert N M Watson

   Oh, that paper.  You know, I talked to Alan about that paper a while
   back, but it isn't really possible to compare DragonFly side by side
   with FreeBSD yet in an SMP environment because we still have a lot 
   of BGL junk in the network path, and because our interrupts are
   still going to cpu #0.  The code itself is mostly MP safe, and Jeff
   has actually turned off the BGL in some of his own testing, but I
   can't do it officially yet.  In any case, that is why DragonFly wasn't
   used.

   I like the paper but I'm not sure it is possible to compare particulars
   of the network implementation in such different operating systems
   (FreeBSD vs Linux).

   The basic issue is, in a nutshell:

   per-packet                  per-connection
   ----------                  --------------
   long code paths             short code paths
   long data paths             shorter data paths
   (more cache contention)     (less cache contention, work aggregation
                               has a better chance of fitting in the L1/L2,
                               much less 'shared' data between cpus)

   lots of mutexes             fewer mutexes or no mutexes
   (eats time, potential
   [...]

Poul-Henning Kamp | 1 Dec 2006 11:21

Re: a proposed callout API

In message <200612011009.kB1A9VA8064231 <at> apollo.backplane.com>,
Matthew Dillon writes:
>:
>:     http://www.ece.rice.edu/~willmann/pubs/paranet_tr06-872.pdf
>:
>:Robert N M Watson
>
>   Oh, that paper.  You know, I talked to Alan about that paper a while
>   back, but it isn't really possible to compare DragonFly side by side
>   with FreeBSD yet in an SMP environment because we still have a lot 
>   of BGL junk in the network path, and because our interrupts are
>   still going to cpu #0.  The code itself is mostly MP safe, and Jeff
>   has actually turned off the BGL in some of his own testing, but I
>   can't do it officially yet.  In any case, that is why DragonFly wasn't
>   used.

So,  like, why don't you work on that, instead of annoying us with your
long lectures about how "The World Shall Be Ordered According To Me" ?

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk <at> FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Ivan Voras | 1 Dec 2006 13:04

Re: What is the PREEMPTION option good for?

Robert Watson wrote:

> 
> They're independent twiddles, and can be frobbed separately.  If you can
> easily measure performance in the different configurations, seeing a
> table of permutations and results would be very nice to see what happens
> :-).

Ok, this is what I found:

- ipiwakeup doesn't produce differences as calculated by ministat
- turning off preemption produces visible differences, which are
calculated by ministat to be up to 10%.

x nopreempt+ipiwakeup
+ preempt+ipiwakeup
+--------------------------------------------------------------------------+
|+                + +        +                     x  x    xx    xx       x|
|    |___________A__M________|                       |______MA_______|     |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   7         99.92        104.19        101.48     101.78429     1.4606717
+   4          90.5         95.78         94.12         93.53     2.2081365
Difference at 95.0% confidence
	-8.25429 +/- 2.4751
	-8.10959% +/- 2.43172%
	(Student's t, pooled s = 1.74576)
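
The difference line in the ministat output above can be reproduced by hand
from the two summary rows.  The sketch below does so in Python; the function
name pooled_difference is made up for this example, and the critical value
2.262 is the standard two-sided 95% Student's t value for 9 degrees of freedom
taken from a table (an assumption about how the figure was obtained, not
something read out of ministat's source).

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Two-sample Student's t comparison using a pooled standard deviation."""
    dof = n1 + n2 - 2
    # Pooled standard deviation, weighting each variance by its own dof.
    pooled_s = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / dof)
    diff = mean2 - mean1
    # Half-width of the confidence interval around the difference.
    margin = t_crit * pooled_s * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, margin, pooled_s


T_CRIT_95_DOF9 = 2.262  # table value, 9 = 7 + 4 - 2 degrees of freedom

# "x" row: nopreempt+ipiwakeup, "+" row: preempt+ipiwakeup
diff, margin, pooled_s = pooled_difference(
    7, 101.78429, 1.4606717,
    4, 93.53, 2.2081365,
    T_CRIT_95_DOF9,
)
diff_pct = 100.0 * diff / 101.78429
margin_pct = 100.0 * margin / 101.78429

print(f"{diff:.5f} +/- {margin:.4f}")
print(f"{diff_pct:.5f}% +/- {margin_pct:.5f}%")
print(f"pooled s = {pooled_s:.5f}")
```

Because the interval around the difference does not include zero, ministat
reports the difference as significant at the 95% level despite the small
sample counts.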

Sorry about the small number of samples - these are collected from the
system in the same state and product version (the machine was otherwise
[...]

Matthew Dillon | 1 Dec 2006 19:39

Re: a proposed callout API

:So,  like, why don't you work on that, instead of annoying us with your
:long lectures about how "The World Shall Be Ordered According To Me" ?
:
:Poul-Henning

    Because, fortunately, the lives of DragonFly developers aren't governed
    by big-dick contests from people who clearly have no clue what DragonFly
    is all about.  I said from the outset that we wouldn't be taking
    shortcuts to 'compete' with other distributions, and we haven't.  The
    BGL will be turned off in the networking threads when we are good and
    ready to do it.  #1 on my priority list is achieving the major goals
    of the project and keeping the system stable while we do it.  MP is only
    part of those goals.  A big part, but still only a part.   Frankly, I
    think we have done a better job on the stability front than you guys
    have.

    My personal agenda for the January release is to finish the userland
    kernel support - basically features which allow a kernel to be built 
    as a userland application and to control independent VM spaces (its 'user'
    processes) with the help of the real kernel.  Userland kernels are linked
    against libc and the kernel APIs have to be simple enough and bullet
    proof enough for it to work as a non-root userland application.  The
    intent is to be able to do so with as little supporting overhead from
    the real kernel as possible.  This is going to be a very cool feature,
    similar in scope to usermode linux, and it is also a necessary prereq
    to reduce engineering cycle times for development of the big ticket items
    next year.

    My goals for all of next year are to make good progress on the two
    biggest ticket items in the DragonFly goal list -- SYSLINK and CCMS.
    [...]

Julian Elischer | 1 Dec 2006 19:52

Re: a proposed callout API

Poul-Henning Kamp wrote:
> In message <200612011009.kB1A9VA8064231 <at> apollo.backplane.com>,
> Matthew Dillon writes:
>> :
>> :     http://www.ece.rice.edu/~willmann/pubs/paranet_tr06-872.pdf
>> :
>> :Robert N M Watson
>>
>>   Oh, that paper.  You know, I talked to Alan about that paper a while
>>   back, but it isn't really possible to compare DragonFly side by side
>>   with FreeBSD yet in an SMP environment because we still have a lot 
>>   of BGL junk in the network path, and because our interrupts are
>>   still going to cpu #0.  The code itself is mostly MP safe, and Jeff
>>   has actually turned off the BGL in some of his own testing, but I
>>   can't do it officially yet.  In any case, that is why DragonFly wasn't
>>   used.
> 
> So,  like, why don't you work on that, instead of annoying us with your
> long lectures about how "The World Shall Be Ordered According To Me" ?
Matt,
Ignore Poul-Henning's email on this.. He certainly speaks for
himself but his use of  "Us" or "We" doesn't include everyone....
  some of us ARE interested to hear intelligent comments.

Julian

> 
> Poul-Henning
> 


M. Warner Losh | 1 Dec 2006 20:17

Re: a proposed callout API

Gentlemen:

Please, if you don't have something nice to say, please STFU.  At
least in public.  I know that insults are hard to ignore, and that
wrongs must be righted, but not here.  Not now.  Not anymore.  You've
had multiple back and forths to get it out of your system.  If it
isn't out, please channel the energy elsewhere.  If it is out, thank
you for not polluting this list further.

If the post is off topic, ignore it. If you are offended by it, take
it up with the person making the post, not the whole list.  If you are
talking about off-topic stuff, please consider a different forum.

In short, please return to civility and make liberal use of the
'delete' button (or key sequence) rather than the 'Reply All' button
(or key sequence).  If you can't resist, please confine yourself to
the 'Reply to sender' functionality only.

Warner

Gary Palmer | 1 Dec 2006 23:33

Re: a proposed callout API

On Fri, Dec 01, 2006 at 10:39:15AM -0800, Matthew Dillon wrote:
> :So,  like, why don't you work on that, instead of annoying us with your
> :long lectures about how "The World Shall Be Ordered According To Me" ?
> :
> :Poul-Henning
> 
>     Because, fortunately, the lives of DragonFly developers aren't governed
>     by big-dick contests from people who clearly have no clue what DragonFly
>     is all about.

Can we please lose the attitudes and get on with the discussion?  This
has nothing to do with the original discussion and is no longer constructive.

I'm not singling Matt out by following up to his e-mail directly.  But
this is now way off topic and needs to be put back ON topic.

Thank you