Lee Schermerhorn | 4 Mar 2010 18:07

[PATCH/RFC 2/8] numa: add generic percpu var implementation of numa_node_id()

From: Christoph Lameter <cl@linux-foundation.org>

Against:  2.6.33-mmotm-100302-1838

Rework the generic version of the numa_node_id() function to use the
new generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implementation will default this option
to 'y' when NUMA is configured.  This config option could be removed
if/when all archs switch over to the generic percpu implementation
of numa_node_id().  Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) defining USE_PERCPU_NUMA_NODE_ID in the arch dependent Kconfig--e.g.,
     defaulting it to 'y' when NUMA is configured.

Subsequent patches will convert x86_64 and ia64 to use this
implementation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

V0:
[...]

Lee Schermerhorn | 4 Mar 2010 18:08

[PATCH/RFC 6/8] numa: ia64: support numa_mem_id() for memoryless nodes

PATCH/RFC numa: ia64:  support memoryless nodes

Against:  2.6.33-mmotm-100302-1838

Enable 'HAVE_MEMORYLESS_NODES' by default when NUMA configured
on ia64.  Initialize percpu 'numa_mem' variable when starting
secondary cpus.  Generic initialization will handle the boot
cpu.

Nothing uses 'numa_mem_id()' yet.  A subsequent patch will modify
slab to use this.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

New in V2

 arch/ia64/Kconfig          |    4 ++++
 arch/ia64/kernel/smpboot.c |    1 +
 2 files changed, 5 insertions(+)

Index: linux-2.6.33-mmotm-100302-1838/arch/ia64/Kconfig
===================================================================
--- linux-2.6.33-mmotm-100302-1838.orig/arch/ia64/Kconfig
+++ linux-2.6.33-mmotm-100302-1838/arch/ia64/Kconfig
@@ -502,6 +502,10 @@ config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA

+config HAVE_MEMORYLESS_NODES
+	def_bool y
[...]

Lee Schermerhorn | 4 Mar 2010 18:08

[PATCH/RFC 5/8] numa: Introduce numa_mem_id()- effective local memory node id

Against:  2.6.33-mmotm-100302-1838

Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "effective local memory node" for archs that support memoryless
nodes.

Define the API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES is
defined; otherwise provide stubs.  Architectures will define
HAVE_MEMORYLESS_NODES if/when they support them.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpu's] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

if they don't want to use the generic version, but want to support
memoryless nodes.
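
As a rough illustration of the three interfaces listed above, here is a minimal userspace model in C.  It is not the kernel implementation: the plain array standing in for the per cpu variable, the 'current_cpu' variable, and the NR_CPUS value are all assumptions made so the sketch can be compiled outside the kernel.

```c
#define NR_CPUS 4	/* hypothetical CPU count, for the model only */

/* Stand-in for the per cpu variable 'numa_mem': one slot per CPU. */
static int numa_mem[NR_CPUS];

/* Stand-in for "the CPU this code runs on" (an assumption of the model). */
static int current_cpu;

/* Expands to the array element itself, so it can be read or assigned. */
#define cpu_to_mem(cpu)	(numa_mem[(cpu)])

/* Record the local memory node for the current CPU. */
static void set_numa_mem(int node)
{
	numa_mem[current_cpu] = node;
}

/* Return the local memory node recorded for the current CPU. */
static int numa_mem_id(void)
{
	return numa_mem[current_cpu];
}
```

In this model, 'cpu_to_mem(3) = 0;' is a valid assignment because the macro expands to an array element, which is what "may be used as lvalue" means here.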

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist rebuild.
Archs that use this implementation will need to initialize 'numa_mem' for
secondary cpus as they're brought on-line.

Question:  Is it worth adding a generic initialization of per cpu numa_mem?
E.g.,  built only when CONFIG_HAVE_MEMORYLESS_NODES defined?  Or leave it
to the archs?

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
[...]

Lee Schermerhorn | 4 Mar 2010 18:08

[PATCH/RFC 7/8] numa: slab: use numa_mem_id() for slab local memory node

[PATCH] numa:  Slab handle memoryless nodes

Against:  2.6.33-mmotm-100302-1838

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~ 2.6.19, does not handle memoryless
nodes well.  Specifically, the "fast path"--____cache_alloc()--will
never succeed as slab doesn't cache offnode objects on the per cpu
queues, and for memoryless nodes, all memory will be "off node"
relative to numa_node_id().  This adds significant overhead to all
kmem cache allocations, incurring a significant regression relative
to earlier kernels [from before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to
return the "effective local memory node" for the calling context.
This is the first node in the local node's generic fallback zonelist--
i.e., the same node that "local" mempolicy-based allocations would
use.  This lets slab cache these "local" allocations and avoid
fallback/refill on every allocation.
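
The idea of an "effective local memory node" can be sketched with a small userspace model in C.  This is only an approximation under stated assumptions: the kernel takes the first node in the local node's fallback zonelist, whereas this sketch picks the closest node that has memory from a hypothetical distance table.  The function name, the NR_NODES value, and both tables are invented for illustration.

```c
#define NR_NODES 4	/* hypothetical node count, for the model only */

/*
 * Return the node with memory that is closest to 'node', or -1 if no
 * node has memory.  A node that has memory is closest to itself, so it
 * is returned unchanged; a memoryless node maps to its nearest neighbor
 * with memory.  On equal distance, the lowest-numbered node wins.
 */
static int nearest_memory_node(int node, int has_memory[NR_NODES],
			       int distance[NR_NODES][NR_NODES])
{
	int best = -1;
	int n;

	for (n = 0; n < NR_NODES; n++) {
		if (!has_memory[n])
			continue;
		if (best < 0 || distance[node][n] < distance[node][best])
			best = n;
	}
	return best;
}
```

With such a mapping, every cpu on a memoryless node sees one stable "local" node, which is what lets an allocator keep objects from that node on its per cpu queues.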

N.B.:  Slab will need to handle node and memory hotplug events that
could change the value returned by numa_mem_id() for any given
node.  E.g., flush all per cpu slab queues before rebuilding the
zonelists.  Andi Kleen and David Rientjes are currently working on
patch series to improve slab support for memory hotplug.  When that
effort settles down, and if there is general agreement on this
approach, I'll prepare another patch to address possible change in
"local memory node", if still necessary.

[...]

Lee Schermerhorn | 4 Mar 2010 21:42

Re: [PATCH/RFC 3/8] numa: x86_64: use generic percpu var for numa_node_id() implementation

On Thu, 2010-03-04 at 12:47 -0600, Christoph Lameter wrote: 
> On Thu, 4 Mar 2010, Lee Schermerhorn wrote:
> 
> > Index: linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
> > ===================================================================
> > --- linux-2.6.33-mmotm-100302-1838.orig/arch/x86/include/asm/percpu.h
> > +++ linux-2.6.33-mmotm-100302-1838/arch/x86/include/asm/percpu.h
> > @@ -208,10 +208,12 @@ do {									\
> >  #define percpu_or(var, val)		percpu_to_op("or", var, val)
> >  #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
> >
> > +#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >  #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >  #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >  #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
> >
> > +#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
> >  #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
> >  #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
> >  #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)
> 
> 
> The functions added are already defined in linux/percpu.h and their
> definition here is wrong since the u64 case is not handled (percpu.h does
> that correctly).

Well, in linux/percpu-defs.h after the first patch in this series, but
x86 is overriding it with the percpu_to_op() implementation.  You're
saying that the x86 percpu_to_op() macro doesn't handle 8-byte 'pcp'
operands?  It appears to handle sizes 1, 2, 4 and 8.
[...]

Tejun Heo | 9 Mar 2010 09:46

Re: [PATCH/RFC 1/8] numa: prep: move generic percpu interface definitions to percpu-defs.h

Hello,

On 03/05/2010 02:07 AM, Lee Schermerhorn wrote:
> To use the generic percpu infrastructure for the numa_node_id() interface,
> defined in linux/topology.h, we need to break the circular header dependency
> that results from including <linux/percpu.h> in <linux/topology.h>.  The
> circular dependency:
> 
> 	percpu.h -> slab.h -> gfp.h -> topology.h
> 
> percpu.h includes slab.h to obtain the definition of kzalloc()/kfree() for
> inlining __alloc_percpu() and free_percpu() in !SMP configurations.  One could
> un-inline these functions in the !SMP case, but a large number of files depend
> on percpu.h to include slab.h.  Tejun Heo suggested moving the definitions to
> percpu-defs.h and requested that this be separated from the remainder of the
> generic percpu numa_node_id() preparation patch.

Hmmm... I think uninlining !SMP case would be much cleaner.  Sorry
that you had to do it twice.  I'll break the dependency in the percpu
devel branch and let you know.

For other patches, except for what Christoph has already pointed out,
everything looks good to me.

Thank you.

-- 
tejun
Lee Schermerhorn | 9 Mar 2010 15:13

Re: [PATCH/RFC 1/8] numa: prep: move generic percpu interface definitions to percpu-defs.h

On Tue, 2010-03-09 at 17:46 +0900, Tejun Heo wrote:
> Hello,
> 
> On 03/05/2010 02:07 AM, Lee Schermerhorn wrote:
> > To use the generic percpu infrastructure for the numa_node_id() interface,
> > defined in linux/topology.h, we need to break the circular header dependency
> > that results from including <linux/percpu.h> in <linux/topology.h>.  The
> > circular dependency:
> > 
> > 	percpu.h -> slab.h -> gfp.h -> topology.h
> > 
> > percpu.h includes slab.h to obtain the definition of kzalloc()/kfree() for
> > inlining __alloc_percpu() and free_percpu() in !SMP configurations.  One could
> > un-inline these functions in the !SMP case, but a large number of files depend
> > on percpu.h to include slab.h.  Tejun Heo suggested moving the definitions to
> > percpu-defs.h and requested that this be separated from the remainder of the
> > generic percpu numa_node_id() preparation patch.
> 
> Hmmm... I think uninlining !SMP case would be much cleaner.  Sorry
> that you had to do it twice.  I'll break the dependency in the percpu
> devel branch and let you know.

OK, I'll do that for V4.  It'll be one big ugly patch because of all the
dependencies.  But, it's really just a mechanical change.

> 
> For other patches, except for what Christoph has already pointed out,
> everything looks good to me.
> 
> Thank you.
[...]

Martin Vogt | 10 Mar 2010 18:10

numa_num_configured_cpus off by 2 on 2.6.33 ?


Hello list,

currently my numa library reports two cpus more than I actually have:

nCPUs=numa_num_configured_cpus();
printf("Currently available CPUs: %d\n",nCPUs);

Currently available CPUs: 34 (but the machine has only 32)

looking in the source:

static void
set_configured_cpus(void)
{
        int             filecount=0;
        char            *dirnamep = "/sys/devices/system/cpu";
        struct dirent   *dirent;
        DIR             *dir;
        dir = opendir(dirnamep);

        if (dir == NULL) {
                /* fall back to using the online cpu count */
                maxconfiguredcpu = sysconf(_SC_NPROCESSORS_CONF) - 1;
                return;
        }
        while ((dirent = readdir(dir)) != 0) {
                if (!strncmp("cpu", dirent->d_name, 3)) {
                        filecount++;
                } else {
[...]

Andi Kleen | 11 Mar 2010 23:48

Re: numa_num_configured_cpus off by 2 on 2.6.33 ?

On Wed, Mar 10, 2010 at 06:10:43PM +0100, Martin Vogt wrote:
> 
> Hello list,
> 
> currently my numa library reports two cpus more than I actually have:

The code is broken anyway; it should be looking for the highest CPU
number, otherwise it would not handle CPU hotplug.  Something like:

	int max = 0;

...
	int n;
	if (sscanf(dirent->d_name, "cpu%d", &n) == 1 && n > max)
		max = n;

...
	use max 

-Andi
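
The suggestion above can be fleshed out into a small, self-contained C sketch.  It is not the libnuma code: the function names are hypothetical, and the extra "%c" in the format string is an addition that rejects names with trailing characters after the number.  Matching on "cpu<number>" rather than on the "cpu" prefix also skips entries such as "cpufreq" and "cpuidle", which may be what inflates a prefix-based count by two.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Return N if 'name' is exactly "cpuN" with N >= 0, otherwise -1.
 * sscanf() returns 1 only when the number is parsed and no character
 * follows it, so "cpufreq", "cpuidle" and "cpu12x" are all rejected.
 */
static int cpu_index_from_name(const char *name)
{
	int n;
	char trailing;

	if (sscanf(name, "cpu%d%c", &n, &trailing) == 1 && n >= 0)
		return n;
	return -1;
}

/*
 * Scan 'dirname' (e.g. "/sys/devices/system/cpu") and return the
 * highest cpu index found, or -1 if the directory cannot be opened
 * or contains no "cpuN" entries.
 */
static int highest_cpu_index(const char *dirname)
{
	DIR *dir = opendir(dirname);
	struct dirent *ent;
	int max = -1;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		int n = cpu_index_from_name(ent->d_name);

		if (n > max)
			max = n;
	}
	closedir(dir);
	return max;
}
```

A caller would treat the result of highest_cpu_index("/sys/devices/system/cpu") as the highest configured cpu number, and fall back to sysconf() when it returns -1.
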
Lee Schermerhorn | 19 Mar 2010 19:59

[PATCH 0/6] Mempolicy: additional cleanups

Here is a series of proposed memory policy cleanup patches, mostly
in the 'mpol' mount option parsing function 'mpol_parse_str()'.  I
came across these cleanup opportunities reviewing and testing
Kosaki Motohiro's 5-patch tmpfs series from 16 Mar.  This series applies
atop Kosaki-san's series.

Patch 5 of the series is more of a bug fix to get_mempolicy() discovered
while testing the other patches.
