Trevor Kramer | 6 Feb 2012 09:57

kernel crash when using libnuma

I have a program which can use libnuma to allocate memory using
numa_alloc_onnode() or using malloc. When running in malloc mode
everything works fine but when running under libnuma mode I get
consistent kernel panics with the following traces. This only occurs
when multiple threads are running. Has anyone seen this before or have
any recommendations on how to debug further?

crash> bt
PID: 62333  TASK: ffff883ff5698b40  CPU: 17  COMMAND: "test"
 #0 [ffff883ff58378f0] machine_kexec at ffffffff810310cb
 #1 [ffff883ff5837950] crash_kexec at ffffffff810b6392
 #2 [ffff883ff5837a20] oops_end at ffffffff814de670
 #3 [ffff883ff5837a50] die at ffffffff8100f2eb
 #4 [ffff883ff5837a80] do_trap at ffffffff814ddf64
 #5 [ffff883ff5837ae0] do_invalid_op at ffffffff8100ceb5
 #6 [ffff883ff5837b80] invalid_op at ffffffff8100bf5b
    [exception RIP: split_huge_page+2021]
    RIP: ffffffff8116c605  RSP: ffff883ff5837c38  RFLAGS: 00010297
    RAX: 0000000000000001  RBX: ffff880ff704bc38  RCX: 000000000000fe9e
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000246
    RBP: ffff883ff5837d08   R8: 0000000000000000   R9: 0000000000000004
    R10: 0000000000000001  R11: ffff880ff6fb7906  R12: ffff880ff84b7aa8
    R13: fffffffffffffff2  R14: ffffea006c34c000  R15: ffffea006c34c000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff883ff5837c30] split_huge_page at ffffffff8116c5aa
 #8 [ffff883ff5837d10] __split_huge_page_pmd at ffffffff8116c6d1
 #9 [ffff883ff5837d40] unmap_vmas at ffffffff8113559e
#10 [ffff883ff5837e80] unmap_region at ffffffff8113cce1
#11 [ffff883ff5837ef0] do_munmap at ffffffff8113d3a6
#12 [ffff883ff5837f50] sys_munmap at ffffffff8113d4e6

Andi Kleen | 6 Feb 2012 23:23

Re: kernel crash when using libnuma

On Mon, Feb 06, 2012 at 03:57:52AM -0500, Trevor Kramer wrote:
> I have a program which can use libnuma to allocate memory using
> numa_alloc_onnode() or using malloc. When running in malloc mode
> everything works fine but when running under libnuma mode I get
> consistent kernel panics with the following traces. This only occurs
> when multiple threads are running. Has anyone seen this before or have
> any recommendations on how to debug further?

Looks like a THP problem.

For RHEL issues you normally need to talk to RedHat, these lists
are more for mainline.

-Andi

> 
> crash> bt
> PID: 62333  TASK: ffff883ff5698b40  CPU: 17  COMMAND: "test"
>  #0 [ffff883ff58378f0] machine_kexec at ffffffff810310cb
>  #1 [ffff883ff5837950] crash_kexec at ffffffff810b6392
>  #2 [ffff883ff5837a20] oops_end at ffffffff814de670
>  #3 [ffff883ff5837a50] die at ffffffff8100f2eb
>  #4 [ffff883ff5837a80] do_trap at ffffffff814ddf64
>  #5 [ffff883ff5837ae0] do_invalid_op at ffffffff8100ceb5
>  #6 [ffff883ff5837b80] invalid_op at ffffffff8100bf5b
>     [exception RIP: split_huge_page+2021]
>     RIP: ffffffff8116c605  RSP: ffff883ff5837c38  RFLAGS: 00010297
>     RAX: 0000000000000001  RBX: ffff880ff704bc38  RCX: 000000000000fe9e
>     RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000246
>     RBP: ffff883ff5837d08   R8: 0000000000000000   R9: 0000000000000004

Andrea Arcangeli | 6 Feb 2012 23:43

Re: kernel crash when using libnuma

On Mon, Feb 06, 2012 at 11:23:18PM +0100, Andi Kleen wrote:
> On Mon, Feb 06, 2012 at 03:57:52AM -0500, Trevor Kramer wrote:
> > I have a program which can use libnuma to allocate memory using
> > numa_alloc_onnode() or using malloc. When running in malloc mode
> > everything works fine but when running under libnuma mode I get
> > consistent kernel panics with the following traces. This only occurs
> > when multiple threads are running. Has anyone seen this before or have
> > any recommendations on how to debug further?
> 
> 
> Looks like a THP problem.
> 
> For RHEL issues you normally need to talk to RedHat, these lists
> are more for mainline.

Well at this point we don't know yet if this affects mainline too or
not.

To be sure, you can file a bugzilla.redhat.com and we'll fix it ASAP
and submit the fix upstream if it happens there too.

Best of all is if you can send the source of the program that triggers
this attached to the bugzilla (or by email). Or create a small source
testcase that can reproduce it.

A BUG_ON is triggering, probably the rmap mapcount vs page->mapcount
one. I'm unsure why this is related to libnuma only, and a wild guess
could be that the vma policy does something wrong over the vmas to the
point rmap won't find the pmds (like wrong vma splitting or
something). But thanks to the BUG_ON there is close to zero risk of

Petr Holasek | 16 Feb 2012 14:54

[PATCH] numademo: msize check for ptrchase test

From: Petr Holasek <pholasek <at> redhat.com>

This patch fixes a ptrchase test segfault when numademo is
called with an argument lower than sizeof(union node)
(8 bytes on x86_64).

--
diff -up numactl-2.0.8-rc3/numademo.c.orig numactl-2.0.8-rc3/numademo.c
--- numactl-2.0.8-rc3/numademo.c.orig	2011-12-19 15:51:35.000000000 +0100
+++ numactl-2.0.8-rc3/numademo.c	2012-02-16 14:44:34.510249987 +0100
@@ -529,7 +529,13 @@ int main(int ac, char **av)
 #ifdef HAVE_STREAM_LIB
 		test(STREAM);
 #endif
-		test(PTRCHASE);
+		if (msize >= sizeof(union node)) {
+			test(PTRCHASE);
+		} else {
+			fprintf(stderr, "You must set msize at least %lu bytes for ptrchase test.\n",
+				sizeof(union node));
+			exit(1);
+		}
 	} else {
 		int k;
 		for (k = 2; k < ac; k++) {
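
To illustrate why a too-small msize breaks a pointer-chasing test, here is a minimal standalone sketch. It is not numademo's code: the `union node` layout and the helpers `build_chain` and `chase` are assumptions made for illustration. If the buffer cannot hold even one node, writing the first `next` pointer would already run past the end of the buffer, so the builder refuses, mirroring the size check added by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: a single pointer (8 bytes on x86_64). */
union node {
	union node *next;
};

/*
 * Link every node that fits into buf as a ring and return the node count.
 * Returns 0 when msize cannot hold a single node (the guarded case).
 */
static size_t build_chain(void *buf, size_t msize)
{
	size_t n = msize / sizeof(union node);
	union node *nodes = buf;
	size_t i;

	if (msize < sizeof(union node))
		return 0;
	for (i = 0; i < n; i++)
		nodes[i].next = &nodes[(i + 1) % n];
	return n;
}

/* Follow the ring for the given number of hops; return the final index. */
static size_t chase(const union node *start, size_t steps)
{
	const union node *p = start;

	assert(start != NULL);
	while (steps--)
		p = p->next;
	return (size_t)(p - start);
}
```

With a buffer of four nodes, `build_chain` returns 4 and `chase(buf, 5)` ends on index 1; with a buffer one byte short of a node, `build_chain` returns 0 and nothing is written.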

Cliff Wickman | 16 Feb 2012 16:21

Re: [PATCH] numademo: msize check for ptrchase test


Thanks Petr.  I've included your patch in numactl-2.0.8-rc4.tar.gz
( ftp://oss.sgi.com/www/projects/libnuma/download/ )

To all interested in libnuma and numactl:

These are the changes since 2.0.7, released in April 2011:

2.0.8-rc1
- 110818 Checking of successful allocations in numademo (Petr Holasek)
2.0.8-rc2
- 110823 Fix of numactl (--touch) warnings and man page (Cliff W.)
2.0.8-rc3
- 111214 Add "same" nodemask alias to numactl (Andi Kleen)
- 111214 Add constructors for numa_init/exit (Andi Kleen)
- 111214 Add use of glibc syscall stub where possible (Andi Kleen)
- 111214 Fix regress1 to show all the problems before exiting (Andi Kleen)
- 111214 Add IO affinity support (Andi Kleen)
- 111214 Clean regression test temp files (Andi Kleen)
- 111214 Add an option to memhog to disable transparent huge pages (Andi Kleen)
- 111214 Fix the test suite on systems that force THP, disable them (Andi Kleen)
2.0.8-rc4
- 120106 Install man pages migspeed, migratepages and numastat (Petr Holasek)
- 120106 Warnings in numa_node_to_cpus_v1 to be more verbose (Petr Holasek)
- 120216 Fix for numademo: msize check for ptrchase test (Petr Holasek)

Version 2.0.8 should probably be released soon.  If you have any plans in hand
or in mind for fixes or enhancements, please send them or let me know that
they are coming.

Any testing of the current (2.0.8-rc4) would also be appreciated.

Brice Goglin | 20 Feb 2012 15:18

bitmask/nodemask strangeness with sparse node ids

Hello,

I am debugging some problems on a strange power7 virtual node that
contains 3 sparse-numbered NUMA nodes:
* node#0 has NO memory but many CPUs
* node#2 and #3 have memory but no CPUs

I am trying to understand what numa_all_nodes_ptr and friends are
supposed to contain. I see:
    numa_all_nodes_ptr = bits 2 & 3
This seems to match the manpage when it says "representing all nodes on
which the calling task may allocate memory". But it doesn't really match
when it says "a bit mask that has all available nodes set" above (should
be bits 0&2&3).

If I look at numa_all_nodes instead, I get bits 0 & 1. libnuma.c seems
to set as many first bits as there are nodes I can allocate from. But I
don't see anywhere in the code something that can deal with such
nodemask on sparse-numbered platforms. For instance,
copy_bitmask_to_nodemask(numa_all_nodes_ptr, foo) puts bits 2 & 3 in foo
while foo should be equal to numa_all_nodes.
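
The mismatch can be sketched without libnuma, using a plain 64-bit word as the mask. Both helpers below are hypothetical and only model the two behaviours described above: `mask_from_ids` sets the bit of each real node id, while `mask_first_n` sets as many low bits as there are nodes, regardless of their ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set one bit per real node id, e.g. ids {2, 3} -> bits 2 and 3. */
static uint64_t mask_from_ids(const unsigned *ids, size_t count)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		assert(ids[i] < 64);	/* this sketch covers one 64-bit word only */
		mask |= UINT64_C(1) << ids[i];
	}
	return mask;
}

/* Set the first count bits, ignoring the real ids (bits 0..count-1). */
static uint64_t mask_first_n(size_t count)
{
	assert(count <= 64);
	return count == 64 ? UINT64_MAX : (UINT64_C(1) << count) - 1;
}
```

For the two memory nodes 2 and 3, `mask_from_ids` yields 0xC while `mask_first_n(2)` yields 0x3. The two representations agree only when node ids are dense and start at 0.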

So:
* is the nodemask API known to be broken when node ids are sparse?
* is the nodemask API officially deprecated? If so, could this be
clarified in the header and manpage? The definition of nodemask_t is
very close to the header of numa.h without anything saying "don't use it".
I actually think that everything nodemask_t related should be out of
numa.h and out of the manpage. If people should not use them, they
should be hidden. It could be moved into something like numacompat1.h for

Ajay Anandan | 23 Feb 2012 21:50

Libnuma: Memory allocation problem

Hi,
I am trying to allocate memory on my NUMA-compatible AMD Opteron
machine using numa_alloc_onnode.  I am getting a Memory not
available error (errno 12).  But the nodes have far more memory than
what is being allocated.

I am trying to allocate a simple structure

typedef struct
{
    int N;
    int nz;
    float *value;      /* scalar values */
    int *index;     /* offsets */
} sparseVector;

using a for loop like this:
    for(long int i=0; i<100000000; i++)
    {
        ret = alloc_onnode(sizeof(sparseVector) ,0, "niceee");
        if(ret == NULL)
        {
            fprintf (stderr, "Error during memory allocation. ErrNo:%d - %s\n Memory allocated %ld ",
                     errno, strerror(errno), i*1000000);
            exit(0);
        }
    }

I am always getting an error once I allocate 1572264 bytes of memory.
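
A quick sanity check on that number, as a sketch under stated assumptions. On an LP64 system such as x86_64, sizeof(sparseVector) is 24 bytes, so 1572264 bytes corresponds to 65511 successful calls. numa(3) documents that numa_alloc_onnode() rounds each size up to a multiple of the system page size, so with 4096-byte pages (an assumption) every 24-byte request would cost a full page. The helper names below are hypothetical. That 65511 sits just under the Linux default vm.max_map_count of 65530 may be relevant if each call creates its own mapping, but that link is a guess, not something established here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
	int N;
	int nz;
	float *value;	/* scalar values */
	int *index;	/* offsets */
} sparseVector;

/* Number of allocations of elem_size bytes that add up to total_bytes. */
static size_t alloc_count(size_t total_bytes, size_t elem_size)
{
	assert(elem_size > 0);
	return total_bytes / elem_size;
}

/* Bytes consumed when every request is rounded up to whole pages. */
static size_t page_rounded_bytes(size_t count, size_t elem_size,
				 size_t page_size)
{
	size_t pages_per_alloc;

	assert(page_size > 0);
	pages_per_alloc = (elem_size + page_size - 1) / page_size;
	return count * pages_per_alloc * page_size;
}
```

Here `alloc_count(1572264, 24)` gives 65511, and `page_rounded_bytes(65511, 24, 4096)` gives 268333056 bytes, roughly 256 MiB of pages for about 1.5 MB of payload.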

