Matthew Dobson | 2 Jun 2003 20:26

Re: Simple NUMA library for AMD64

I'd like to have commented on this sooner, but I've been busy with some
other work.  This looks really interesting, and I'd love to help any way
I can.

Now, on to more specific comments.

Andi Kleen wrote:
 > Hallo,
 >
<SNIP>
 >
 > It only deals with nodes, not CPUs. One reason for this is that it is
 > AMD64 centric where CPU equals node, but even on other architectures with
 > multiple CPUs per node more fine-grained settings than nodes do not seem
 > to be commonly used. Inside a node conventional SMP tunings can be used,
 > no need for a NUMA library.
 >
 > The only possible exception is the CPU binding (numa_run_on_node*), but
 > node granularity seems to be enough for that too. If it should be a
 > problem the application can call sched_setaffinity directly.

I like this.  Working with CPUs can be a pain because they tend to be
used in groups anyway, i.e. nodes (as you pointed out), they tend not to
have physical memory directly associated with them (which makes memory
binding tricky), they force bitmasks to be much larger (i.e. a 32-bit
bitmask of nodes covers a larger array of systems than a 32-bit bitmask
of CPUs), etc.
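
As a rough illustration of the "call sched_setaffinity directly" fallback
mentioned in the quoted text, the sketch below binds the calling thread to
all CPUs of one node. It assumes the present-day glibc interface (cpu_set_t
and the CPU_* macros, which need _GNU_SOURCE) rather than whatever the 2003
patch exposed, and it assumes a hypothetical layout in which node n owns a
contiguous block of cpus_per_node CPUs. The helper names are made up for
this example.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Assumed layout (hypothetical): node n owns CPUs
 * [n * cpus_per_node, (n + 1) * cpus_per_node). */
void build_node_cpuset(int node, int cpus_per_node, cpu_set_t *set)
{
    CPU_ZERO(set);
    for (int i = 0; i < cpus_per_node; i++)
        CPU_SET(node * cpus_per_node + i, set);
}

/* Bind the calling thread to every CPU of one node.
 * Returns 0 on success, -1 on failure with errno set. */
int run_on_node_sketch(int node, int cpus_per_node)
{
    cpu_set_t set;

    build_node_cpuset(node, cpus_per_node, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

With four CPUs per node, one bit of node mask stands for four bits of CPU
mask, which is the size argument made above.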

William Lee Irwin III | 2 Jun 2003 20:49

Re: Simple NUMA library for AMD64

On Mon, Jun 02, 2003 at 11:26:21AM -0700, Matthew Dobson wrote:
> Ermm... I have to disagree here.  Right now, I don't think leaving out
> distance is too big a deal, just assume everything is either local or 
> remote, but in the future, there *will* be a need for it.  As machines 
> consistently get larger, we're going to have multiple hops to go from 
> one CPU to another (some machines have this now), and we'll need a 
> distance metric.  I guess only time will tell... ;)

Well there are two dimensions to this:
(1) For strict bindings, relative distances are irrelevant because
	things are merely bound to one point or another.
(2) For interleaving, distance and other topological considerations
	have more importance, but there are no proposed algorithms
	to utilize it.

Getting a notion of what to do in-kernel once the kernel has deeper
knowledge of the topology would make a stronger argument for its
relevance and support, but I'm not hearing much about algorithms that
would utilize that information for effective striping policies yet.
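
To make the two cases concrete, here is a minimal sketch of how a page might
be assigned to a node under each policy. The function names are hypothetical
and the interleave shown is plain round-robin, chosen only because it is the
simplest striping rule; it is not taken from any patch in this thread.
Neither function consults a distance table.

```c
#include <stddef.h>

/* Strict binding: every page lands on the one bound node, so the
 * distance to any other node never enters the decision. */
int strict_node_for_page(size_t page_index, int bound_node)
{
    (void)page_index;   /* unused: the binding alone decides */
    return bound_node;
}

/* Plain round-robin interleave across nnodes nodes (nnodes > 0).
 * A topology-aware policy would need some other rule here. */
int interleave_node_for_page(size_t page_index, int nnodes)
{
    return (int)(page_index % (size_t)nnodes);
}
```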

-- wli

Matthew Dobson | 2 Jun 2003 21:07

Re: Simple NUMA library for AMD64

William Lee Irwin III wrote:
> On Mon, Jun 02, 2003 at 11:26:21AM -0700, Matthew Dobson wrote:
> 
>>Ermm... I have to disagree here.  Right now, I don't think leaving out
>>distance is too big a deal, just assume everything is either local or 
>>remote, but in the future, there *will* be a need for it.  As machines 
>>consistently get larger, we're going to have multiple hops to go from 
>>one CPU to another (some machines have this now), and we'll need a 
>>distance metric.  I guess only time will tell... ;)
> 
> 
> Well there are two dimensions to this:
> (1) For strict bindings, relative distances are irrelevant because
> 	things are merely bound to one point or another.
> (2) For interleaving, distance and other topological considerations
> 	have more importance, but there are no proposed algorithms
> 	to utilize it.
> 
> Getting a notion of what to do in-kernel once the kernel has deeper
> knowledge of the topology would make a stronger argument for its
> relevance and support, but I'm not hearing much about algorithms that
> would utilize that information for effective striping policies yet.

Mm...  Good point.  I hadn't thought of that.  This API is just in 
charge of the actual bindings.  Something else (syscalls, sysfs, another 
library/API) will be in charge of reporting useful topology distances 
and other info.  Withdrawn. :)

Cheers!


Andi Kleen | 3 Jun 2003 10:12

Re: Simple NUMA library for AMD64

Hallo!

Thanks for the detailed review. This was very useful. Comments below.

I will try to post a 2.5 implementation of this using Erich Focht's 
new 2.5 homenode scheduler soon.

On Mon, Jun 02, 2003 at 11:26:21AM -0700, Matthew Dobson wrote:
> > Possible distance between nodes is ignored. On current AMD64 it doesn't
> > exist and it seems like a very big complication for little gain even
> > on other architectures. If it should be needed it can be read from
> > sysfs in 2.5.
> 
> Ermm... I have to disagree here.  Right now, I don't think leaving out
> distance is too big a deal, just assume everything is either local or 
> remote, but in the future, there *will* be a need for it.  As machines 
> consistently get larger, we're going to have multiple hops to go from 
> one CPU to another (some machines have this now), and we'll need a 
> distance metric.  I guess only time will tell... ;)

If applications need it they can read it from sysfs. 
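
A minimal sketch of what reading it from sysfs could look like. The path
used here, /sys/devices/system/node/nodeN/distance, is the one current
kernels expose and is an assumption for this example; the 2.5 sysfs layout
under discussion may have differed. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a whitespace-separated row of integers such as "10 20 20 10\n".
 * Stores at most max values in out and returns how many were stored. */
int parse_distance_row(const char *line, int *out, int max)
{
    int n = 0;
    char *end;

    while (n < max) {
        long v = strtol(line, &end, 10);
        if (end == line)        /* no further number on the line */
            break;
        out[n++] = (int)v;
        line = end;
    }
    return n;
}

/* Read the distance row of one node from the ASSUMED sysfs path.
 * Returns the number of entries read, or -1 if the file is unavailable. */
int read_node_distances(int node, int *out, int max)
{
    char path[64];
    char buf[512];
    FILE *f;
    int n = -1;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/distance", node);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        n = parse_distance_row(buf, out, max);
    fclose(f);
    return n;
}
```

Entry i of the row for node N is the relative distance from node N to
node i, so the library itself never has to carry a distance API.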

> I don't like this either, but I see that several others have already
> mentioned the badness of this, so I'll leave it alone... :)

It's already fixed now. I introduced an fd_set-like type for it.
Actually my kernel patch still has the limitation (it uses prctl and only
the first argument is used), but prctl has 4 arguments left (all together
320 nodes on 64-bit) which should be enough for now. The kernel internally 
doesn't support more than 64 anyway.
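
For readers unfamiliar with the fd_set idiom, a node mask built the same way
might look roughly like this. It is a sketch, not the type from the patch:
the type name, the helper names and the MAX_NODES value (taken from the
320-node figure above) are all assumptions made for illustration.

```c
#include <limits.h>
#include <string.h>

#define MAX_NODES      320   /* illustrative, from the figure quoted above */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define NODEMASK_WORDS ((MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Like fd_set: an opaque array of words, one bit per node. */
typedef struct {
    unsigned long bits[NODEMASK_WORDS];
} nodemask_sketch_t;

/* Callers must keep node < MAX_NODES in all helpers below. */
void nodemask_zero(nodemask_sketch_t *m)
{
    memset(m, 0, sizeof(*m));
}

void nodemask_set(nodemask_sketch_t *m, unsigned node)
{
    m->bits[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
}

void nodemask_clr(nodemask_sketch_t *m, unsigned node)
{
    m->bits[node / BITS_PER_WORD] &= ~(1UL << (node % BITS_PER_WORD));
}

int nodemask_isset(const nodemask_sketch_t *m, unsigned node)
{
    return (int)((m->bits[node / BITS_PER_WORD] >> (node % BITS_PER_WORD)) & 1UL);
}
```

Because callers only touch the mask through the helpers, the word count can
grow later without changing application code.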

Paul Larson | 6 Jun 2003 23:19

Gcov-kernel patch updates for 2.4.20 and 2.5.70

The Linux Kernel GCOV patch has been updated for 2.4.20 and 2.5.70.  The
patches can be downloaded from:
https://sourceforge.net/project/showfiles.php?group_id=3382

Major changes in this release:
* ppc32 support - Many thanks to Nigel Hinds, Paul Mackerras, and
everyone else who worked on this.
* The gcov-proc module is no longer external; it can now be built with
CONFIG_GCOV_PROC turned on.  So the tarball is no longer needed, just
the patch.

For more information about this patch, please see:
http://ltp.sourceforge.net/coverage/gcov-kernel.php

Thanks,
Paul Larson

Hanna Linder | 10 Jun 2003 21:04

Next week's topic is the K42 Operating System Research Project


This will be presented by Orran Krieger June 18.

K42 is an operating-system research project (available under LGPL) focused in large part on scalability in
the implementation of core OS services.  We will give a brief overview of the project, discuss some of the
recent performance results running 64-bit PPC Linux binaries and discuss the major ways that our
implementation of Linux services departs from the norm (hopefully making sure to bring up all the truly
offensive and contentious ways).  We will leave most of the time for questions and discussion.

 http://www.research.ibm.com/K42/ 

Date:	Wednesday June 18th

Time: 	1300 PDT
	1600 EDT
	2000 UTC/GMT

Call-in: All the lines will be open so please use mute if you aren't talking.
	North America: 1-877-849-9636
	Int'l: 1-719-457-5110
	Passcode: 372406

I will send out a reminder closer to the time of the call.

Thanks.

Hanna

	

Hanna Linder | 11 Jun 2003 18:15

[2.4.21-rc8] Lockmeter port available (fwd)


---------- Forwarded Message ----------
Date: Tuesday, June 10, 2003 05:36:38 PM -0700
From: Hanna Linder <hannal <at> us.ibm.com>
To: linux-kernel <at> vger.kernel.org
Cc: hannal <at> us.ibm.com
Subject: [2.4.21-rc8] Lockmeter port available

As part of my testing of the fastwalk patch I ported the lockmeter
patch that John Hawkes wrote to 2.4.21-rc8. The patch now applies
cleanly; there were minimal merge conflicts, which I fixed for the i386 arch.
This allows you to see which functions are contending for which locks,
among other great things.

Available on Sourceforge at:

http://osdn.dl.sourceforge.net/sourceforge/lse/lockmeter1.5-2.4.21-rc8.diff

Hanna 


---------- End Forwarded Message ----------


Hanna Linder | 11 Jun 2003 18:15

[PATCH 2.4.21-rc8] Fastwalk (fwd)


---------- Forwarded Message ----------
Date: Tuesday, June 10, 2003 05:32:54 PM -0700
From: Hanna Linder <hannal <at> us.ibm.com>
To: marcelo <at> conectiva.com.br
Cc: hannal <at> us.ibm.com, linux-kernel <at> vger.kernel.org, hch <at> infradead.org, Herbert Poetzl <herbert <at> 13thfloor.at>, andrea <at> suse.de
Subject: [PATCH 2.4.21-rc8] Fastwalk

Marcelo,

We have talked about putting the fastwalk patch into 2.4 before.
Very little has changed and it should apply cleanly to 2.4.22-pre
too ;)

This patch reduces cacheline bouncing due to numerous atomic increments 
and decrements of the d_count reference counter during path walking by 
holding the dcache_lock as long as the dentries are in the cache. 
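
The effect can be pictured with a small userspace analogy. This is not the
kernel patch: the struct, the spinlock built from a C11 atomic_flag and both
function names are stand-ins invented for this sketch. It only counts how
many atomic updates hit the shared reference counters in each style of walk.

```c
#include <stdatomic.h>
#include <stddef.h>

struct dentry_sketch {
    atomic_int d_count;              /* reference count shared between CPUs */
};

/* Stand-in for dcache_lock: a trivial spinlock. */
static atomic_flag dcache_lock_sketch = ATOMIC_FLAG_INIT;

/* Per-component walk: take a reference on each component and drop the
 * previous one. Returns the number of atomic updates made to d_count. */
int walk_refcounted(struct dentry_sketch **path, size_t n)
{
    int updates = 0;

    for (size_t i = 0; i < n; i++) {
        atomic_fetch_add(&path[i]->d_count, 1);
        updates++;
        if (i > 0) {
            atomic_fetch_sub(&path[i - 1]->d_count, 1);
            updates++;
        }
    }
    return updates;
}

/* Fastwalk-style walk: hold one lock across the whole walk and take a
 * reference only on the final component. */
int walk_locked(struct dentry_sketch **path, size_t n)
{
    int updates = 0;

    if (n == 0)
        return 0;
    while (atomic_flag_test_and_set(&dcache_lock_sketch))
        ;                            /* spin until the lock is free */
    /* Intermediate components are visited without touching d_count. */
    atomic_fetch_add(&path[n - 1]->d_count, 1);
    updates++;
    atomic_flag_clear(&dcache_lock_sketch);
    return updates;
}
```

In the per-component version the number of counter updates grows with path
depth; in the locked version it stays at one, at the cost of holding the
lock for the duration of the walk.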

Linus included the original patch in 2.5.11. Al Viro made a few great 
changes in 2.5.12 and Paul Menage added one fix. The following patch includes 
their changes as well. The last time I had access to an 8-way this patch 
showed a throughput improvement while running dbench. Here is the output:

http://west.dl.sourceforge.net/sourceforge/lse/2419fw.png

Today I tested this patch on a 2-way with dbench and found a similar 
throughput increase, albeit on a smaller scale. I can send out the details 
of that if anyone is interested. 


Martin J. Bligh | 12 Jun 2003 16:53

2.5.70-mjb2

The patchset contains mainly scalability and NUMA stuff, and anything 
else that stops things from irritating me. It's meant to be pretty stable, 
not so much a testing ground for new stuff.

I'd be very interested in feedback from anyone willing to test on any 
platform, however large or small.

ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.69/patch-2.5.69-mjb1.bz2

additional patches that can be applied if desired:

(these two form the qlogic feral driver)
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm1/broken-out/linux-isp.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm1/broken-out/isp-update-1.patch

Since 2.5.70-mjb1 (~ = changed, + = added, - = dropped)

Notes: I've left out some shiny new bits people have sent me because there
       were a few things that were just broken. Fixing those up before
       piling more on. Various floating fixes merged back into their main
       elements to make maintenance easier (possible?)

Now in Linus' tree:

New:
+ numaq_apic_handling				Martin J. Bligh
	Fix numaq code to use phys apic ids
+ remove_x86_summit				Martin J. Bligh
	remove the magic switch - genarch is better
+ target_cpus					Martin J. Bligh

Martin J. Bligh | 12 Jun 2003 16:55

Re: 2.5.70-mjb2

Pah. Wrong URL, sorry.

ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.70/patch-2.5.70-mjb2.bz2
