Cliff Wickman | 1 Aug 2012 01:13

Re: [PATCH v2] list corruption by gather_surp


On Mon, Jul 30, 2012 at 02:22:24PM +0200, Michal Hocko wrote:
> On Fri 27-07-12 17:32:15, Cliff Wickman wrote:
> > From: Cliff Wickman <cpw <at> sgi.com>
> > 
> > v2: diff'd against linux-next
> > 
> > I am seeing list corruption occurring from within gather_surplus_pages()
> > (mm/hugetlb.c).  The problem occurs in a RHEL6 kernel under a heavy load,
> > and seems to be because this function drops the hugetlb_lock.
> > The list_add() in gather_surplus_pages() seems to need to be protected by
> > the lock.
> > (I don't have a similar test for a linux-next kernel)
> 
> Because you cannot reproduce or you just didn't test it with linux-next?
> 
> > I have CONFIG_DEBUG_LIST=y, and am running an MPI application with 64 threads
> > and a library that creates a large heap of hugetlbfs pages for it.
> > 
> > The below patch fixes the problem.
> > The gist of this patch is that gather_surplus_pages() does not have to drop
> 
> But you cannot hold spinlock while allocating memory because the
> allocation is not atomic and you could deadlock easily.
> 
> > the lock if alloc_buddy_huge_page() is told whether the lock is already held.
> 
> The changelog doesn't actually explain how the list gets corrupted.
> alloc_buddy_huge_page doesn't provide the freshly allocated page to use
> so nobody could get and free it. enqueue_huge_page happens under hugetlb_lock.
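
A minimal userspace sketch of the constraint Michal raises above: a sleeping allocation must not run while a spinlock is held, so the usual shape is to drop the lock, allocate onto a private list, and retake the lock before touching the shared list. This is an illustration under stated assumptions, not the patch under discussion. A pthread mutex stands in for hugetlb_lock and malloc() for alloc_buddy_huge_page(); pool_lock, pool_head, pool_count and gather_pages_locked() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a huge page sitting on a list. */
struct page_node {
    struct page_node *next;
};

/* Hypothetical stand-ins: pool_lock plays the role of hugetlb_lock. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct page_node *pool_head;
static int pool_count;

/*
 * Call with pool_lock held; returns with pool_lock held.
 * Returns 0 if all pages were gathered, -1 if an allocation failed.
 */
int gather_pages_locked(int needed)
{
    struct page_node *local = NULL; /* private list: no lock required */
    int got = 0;

    /* Drop the lock: the allocator may block. */
    pthread_mutex_unlock(&pool_lock);
    while (got < needed) {
        struct page_node *p = malloc(sizeof(*p));
        if (!p)
            break;
        p->next = local;
        local = p;
        got++;
    }
    /* Retake the lock before any shared list is modified. */
    pthread_mutex_lock(&pool_lock);
    assert(got <= needed);

    /* Splice the private list into the shared pool under the lock. */
    while (local) {
        struct page_node *p = local;
        local = p->next;
        p->next = pool_head;
        pool_head = p;
        pool_count++;
    }
    return got == needed ? 0 : -1;
}
```

The point of the private list is that only the final splice touches shared state, and it does so with the lock held, so concurrent callers cannot observe a half-linked node.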

James Morris | 1 Aug 2012 03:34

Re: [PATCH v3 07/10] mm: use mm->exe_file instead of first VM_EXECUTABLE vma->vm_file

On Tue, 31 Jul 2012, Konstantin Khlebnikov wrote:

> Some security modules and oprofile still use VM_EXECUTABLE for retrieving
> a task's executable file; after this patch they will use mm->exe_file directly.
> mm->exe_file is protected by mm->mmap_sem, so locking stays the same.
> 

Acked-by: James Morris <james.l.morris <at> oracle.com>

-- 
James Morris
<jmorris <at> namei.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont <at> kvack.org"> email <at> kvack.org </a>

Larry Woodman | 1 Aug 2012 04:45

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

On 07/31/2012 04:06 PM, Michal Hocko wrote:
> On Tue 31-07-12 13:49:21, Larry Woodman wrote:
>> On 07/31/2012 08:46 AM, Mel Gorman wrote:
>>> Fundamentally I think the problem is that we are not correctly detecting
>>> that page table sharing took place during huge_pte_alloc(). This patch is
>>> longer and makes an API change but if I'm right, it addresses the underlying
>>> problem. The first VM_MAYSHARE patch is still necessary but would you mind
>>> testing this on top please?
>> Hi Mel, yes this does work just fine.  It ran for hours without a panic so
>> I'll Ack this one if you send it to the list.
> Hi Larry, thanks for testing! I have a different patch which tries to
> address this very same issue. I am not saying it is better or that it
> should be merged instead of Mel's one but I would be really happy if you
> could give it a try. We can discuss (dis)advantages of both approaches
> later.
>
> Thanks!

Hi Michal, the system hung when I tested this patch on top of the
latest 3.5 kernel.  I won't have AltSysrq access to the system until
tomorrow AM.  I'll retry this kernel and get AltSysrq output and let
you know what's happening in the morning.

Larry

> ---
>  From 8cbf3bd27125fc0a2a46cd5b1085d9e63f9c01fd Mon Sep 17 00:00:00 2001
> From: Michal Hocko<mhocko <at> suse.cz>
> Date: Tue, 31 Jul 2012 15:00:26 +0200
> Subject: [PATCH] mm: hugetlbfs: Correctly populate shared pmd

David Rientjes | 1 Aug 2012 05:55

Re: [PATCH 2/2] memcg, oom: Clarify some oom dump messages

On Wed, 25 Jul 2012, Sha Zhengju wrote:

> From: Sha Zhengju <handai.szj <at> taobao.com>
> 
> Revise some oom dump messages to avoid misleading admin.
> 

The only place the oom killer emits information on what it does is the
kernel log, so changing this has the potential to break a number of
scripts that people are using for parsing it (this would break some of our
log scraping code, for instance).

This adds nothing except a bogus message that is emitted when 
select_bad_process() races with oom_kill_process() and no kill occurs 
because all threads of the selected process have detached their mm.

Nack.


Michael Kerrisk | 1 Aug 2012 07:15

Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall

On Mon, Jul 30, 2012 at 9:29 PM, Christoph Lameter <cl <at> linux.com> wrote:
> On Sat, 28 Jul 2012, JoonSoo Kim wrote:
>
>> 2012/7/28 Christoph Lameter <cl <at> linux.com>:
>> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>> >
>> >> move_pages() syscall may return success in the case where
>> >> do_move_page_to_node_array() returns a positive value, which means migration failed.
>> >
>> > Nope. It only means that the migration for some pages has failed. This may
>> > still be considered successful for the app if it moves 10000 pages and one
>> > failed.
>> >
>> > This patch would break the move_pages() syscall because an error code
>> > return from do_move_page_to_node_array() will cause the status byte for
>> > each page move to not be updated anymore. The application will not be able
>> > to tell anymore which pages were successfully moved and which were not.
>>
>> In case of returning non-zero, valid status is not required according
>> to man page.
>
> Cannot find a statement like that in the man page. The return code
> description is incorrect. It should say that it returns the number of
> pages not moved, otherwise an error code (Michael please fix the manpage).

Hi Christoph,

Is the patch below acceptable? (I've attached the complete page as well.)

See you in San Diego (?),
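
A small sketch of how an application might read the per-page results that Christoph refers to above. It relies only on the move_pages(2) convention that each status entry is either a node number (zero or positive) or a negated errno value. It does not call move_pages() itself, and struct move_summary and summarize_status() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one move_pages()-style status array. */
struct move_summary {
    size_t on_node; /* entries >= 0: page resides on that node */
    size_t failed;  /* entries < 0: negated errno for that page */
};

/* Tally the per-page status values filled in by the kernel. */
struct move_summary summarize_status(const int *status, size_t count)
{
    struct move_summary s = { 0, 0 };

    assert(status != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (status[i] >= 0)
            s.on_node++;
        else
            s.failed++;
    }
    return s;
}
```

With a tally like this, an application that moves 10000 pages can treat a single failed page as acceptable, which is the case Christoph describes, but only if the status array is still filled in when some pages fail.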

Wen Congyang | 1 Aug 2012 08:06

Re: [RFC PATCH v5 12/19] memory-hotplug: introduce new function arch_remove_memory()

At 08/01/2012 10:44 AM, jencce zhou Wrote:
> 2012/7/27 Wen Congyang <wency <at> cn.fujitsu.com>:
>> We don't call __add_pages() directly in the function add_memory()
>> because some other architecture related things need to be done
>> before or after calling __add_pages(). So we should introduce
>> a new function arch_remove_memory() to revert the things
>> done in arch_add_memory().
>>
>> Note: the function for s390 is not implemented (I don't know how to
>> implement it for s390).
>>
>> CC: David Rientjes <rientjes <at> google.com>
>> CC: Jiang Liu <liuj97 <at> gmail.com>
>> CC: Len Brown <len.brown <at> intel.com>
>> CC: Benjamin Herrenschmidt <benh <at> kernel.crashing.org>
>> CC: Paul Mackerras <paulus <at> samba.org>
>> CC: Christoph Lameter <cl <at> linux.com>
>> Cc: Minchan Kim <minchan.kim <at> gmail.com>
>> CC: Andrew Morton <akpm <at> linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro <at> jp.fujitsu.com>
>> CC: Yasuaki Ishimatsu <isimatu.yasuaki <at> jp.fujitsu.com>
>> Signed-off-by: Wen Congyang <wency <at> cn.fujitsu.com>
>> ---
>>  arch/ia64/mm/init.c                  |   16 ++++
>>  arch/powerpc/mm/mem.c                |   14 +++
>>  arch/s390/mm/init.c                  |    8 ++
>>  arch/sh/mm/init.c                    |   15 +++
>>  arch/tile/mm/init.c                  |    8 ++
>>  arch/x86/include/asm/pgtable_types.h |    1 +
>>  arch/x86/mm/init_32.c                |   10 ++

Wen Congyang | 1 Aug 2012 08:09

Re: [RFC PATCH v5 16/19] memory-hotplug: free memmap of sparse-vmemmap

At 07/31/2012 08:22 PM, Gerald Schaefer Wrote:
> On Fri, 27 Jul 2012 18:34:38 +0800
> Wen Congyang <wency <at> cn.fujitsu.com> wrote:
> 
>> From: Yasuaki Ishimatsu <isimatu.yasuaki <at> jp.fujitsu.com>
>>
>> All pages of virtual mapping in removed memory cannot be freed, since
>> some pages used as PGD/PUD include not only removed memory but also
>> other memory. So the patch checks whether page can be freed or not.
>>
>> How to check whether page can be freed or not?
>>  1. When removing memory, the page structs of the removed memory are
>> filled with 0xFD.
>>  2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD
>> can be cleared. In this case, the page used as PT/PMD can be freed.
>>
>> Applying patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
>> integrated into one. So __remove_section() of
>> CONFIG_SPARSEMEM_VMEMMAP is deleted.
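
The two-step check in the changelog above can be sketched in plain C as follows. This is an illustration of the idea only, not the code from the patch: the 0xFD value comes from the changelog, while FILL_MARKER, mark_removed() and page_is_all_marked() are hypothetical names, and an ordinary byte buffer stands in for the page that backs the page structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Marker value taken from the changelog; the macro name is hypothetical. */
#define FILL_MARKER 0xFD

/* Step 1: overwrite the page structs of removed memory with the marker. */
void mark_removed(unsigned char *start, size_t len)
{
    assert(start != NULL);
    memset(start, FILL_MARKER, len);
}

/* Step 2: a backing page may be freed only if every byte is the marker. */
bool page_is_all_marked(const unsigned char *page, size_t page_size)
{
    for (size_t i = 0; i < page_size; i++) {
        if (page[i] != FILL_MARKER)
            return false;
    }
    return true;
}
```

A backing page that is only partly covered by the removed range keeps some unmarked bytes, so the check fails and the page stays allocated, which matches the changelog's point that such pages also describe other memory.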
> 
> There should also be generic or dummy versions of the functions
> vmemmap_free_bootmem(), vmemmap_kfree() and
> register_page_bootmem_memmap(). It doesn't compile on other
> architectures than x86 as it is now:
> 
> mm/built-in.o: In function `sparse_remove_one_section':
> (.text+0x49fa6): undefined reference to `vmemmap_free_bootmem'
> mm/built-in.o: In function `sparse_remove_one_section':
> (.text+0x49fcc): undefined reference to `vmemmap_kfree'
> mm/built-in.o: In function `register_page_bootmem_info_node':

Michal Hocko | 1 Aug 2012 08:51

Re: [PATCH v2] list corruption by gather_surp

On Tue 31-07-12 18:13:06, Cliff Wickman wrote:
> 
> On Mon, Jul 30, 2012 at 02:22:24PM +0200, Michal Hocko wrote:
> > On Fri 27-07-12 17:32:15, Cliff Wickman wrote:
> > > From: Cliff Wickman <cpw <at> sgi.com>
> > > 
> > > v2: diff'd against linux-next
> > > 
> > > I am seeing list corruption occurring from within gather_surplus_pages()
> > > (mm/hugetlb.c).  The problem occurs in a RHEL6 kernel under a heavy load,
> > > and seems to be because this function drops the hugetlb_lock.
> > > The list_add() in gather_surplus_pages() seems to need to be protected by
> > > the lock.
> > > (I don't have a similar test for a linux-next kernel)
> > 
> > Because you cannot reproduce or you just didn't test it with linux-next?
> > 
> > > I have CONFIG_DEBUG_LIST=y, and am running an MPI application with 64 threads
> > > and a library that creates a large heap of hugetlbfs pages for it.
> > > 
> > > The below patch fixes the problem.
> > > The gist of this patch is that gather_surplus_pages() does not have to drop
> > 
> > But you cannot hold spinlock while allocating memory because the
> > allocation is not atomic and you could deadlock easily.
> > 
> > > the lock if alloc_buddy_huge_page() is told whether the lock is already held.
> > 
> > The changelog doesn't actually explain how the list gets corrupted.
> > alloc_buddy_huge_page doesn't provide the freshly allocated page to use

Michal Hocko | 1 Aug 2012 10:20

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

On Tue 31-07-12 22:45:43, Larry Woodman wrote:
> On 07/31/2012 04:06 PM, Michal Hocko wrote:
> >On Tue 31-07-12 13:49:21, Larry Woodman wrote:
> >>On 07/31/2012 08:46 AM, Mel Gorman wrote:
> >>>Fundamentally I think the problem is that we are not correctly detecting
> >>>that page table sharing took place during huge_pte_alloc(). This patch is
> >>>longer and makes an API change but if I'm right, it addresses the underlying
> >>>problem. The first VM_MAYSHARE patch is still necessary but would you mind
> >>>testing this on top please?
> >>Hi Mel, yes this does work just fine.  It ran for hours without a panic so
> >>I'll Ack this one if you send it to the list.
> >Hi Larry, thanks for testing! I have a different patch which tries to
> >address this very same issue. I am not saying it is better or that it
> >should be merged instead of Mel's one but I would be really happy if you
> >could give it a try. We can discuss (dis)advantages of both approaches
> >later.
> >
> >Thanks!
> 
> Hi Michal, the system hung when I tested this patch on top of the
> latest 3.5 kernel.  I won't have AltSysrq access to the system until
> tomorrow AM.  

Please hold on. The patch is crap. I forgot about 
if (!vma_shareable(vma, addr))
	return;

case so somebody got an uninitialized pmd. The patch below handles
that.

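
Michal's revised patch is cut off in this archive, so the following is not his fix. It is only a generic C sketch of the bug class he describes: an early return that leaves the caller's output unset. struct entry, shared_entry, lookup_shared_buggy() and lookup_shared() are all hypothetical names, and a boolean flag stands in for the vma_shareable() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared page-table entry. */
struct entry {
    int value;
};

static struct entry shared_entry = { 42 };

/* Buggy shape: on the early return, *out is never written. */
void lookup_shared_buggy(bool shareable, struct entry **out)
{
    assert(out != NULL);
    if (!shareable)
        return; /* caller's pointer keeps whatever garbage it held */
    *out = &shared_entry;
}

/* Safer shape: every path produces a defined result. */
struct entry *lookup_shared(bool shareable)
{
    if (!shareable)
        return NULL;
    return &shared_entry;
}
```

Returning the pointer (or NULL) forces the caller to handle the not-shareable case explicitly instead of reading a value that was never assigned.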

Glauber Costa | 1 Aug 2012 10:42

Re: Common [2/9] slub: Use kmem_cache for the kmem_cache structure

On 07/31/2012 09:36 PM, Christoph Lameter wrote:
> Do not use kmalloc() but kmem_cache_alloc() for the allocation
> of the kmem_cache structures in slub.
> 
> This is the way it's supposed to be. Recent merges lost
> the freeing of the kmem_cache structure and so this is also
> fixing a memory leak on kmem_cache_destroy() by adding
> the missing free action to sysfs_slab_remove().

This patch seems incomplete to say the least.

1) You are still not touching the !SYSFS version of the function,
that still reads:

static inline void sysfs_slab_remove(struct kmem_cache *s)
{
        kfree(s->name);
        kfree(s);
}

and it is then inconsistent with its SYSFS version.

2) kmem_cache_release still reads:

static void kmem_cache_release(struct kobject *kobj)
{
        struct kmem_cache *s = to_slab(kobj);

        kfree(s->name);
        kfree(s);

