Gary Hade | 1 Oct 01:29 2008

Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs

On Tue, Sep 30, 2008 at 05:06:08PM +0900, Yasunori Goto wrote:
> 
> > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> > +int register_mem_sect_under_node(struct memory_block *mem_blk)
>         :
> 
> I think this patch is convenient even when memory hotplug is disabled.
> CONFIG_SPARSEMEM seems better than CONFIG_MEMORY_HOTPLUG_SPARSE.

Yes, this would be nice, but unfortunately the presence of the
memory section directories that are referenced by the symlinks
also depends on CONFIG_MEMORY_HOTPLUG_SPARSE being enabled.  Removing
the memory hotplug dependency from the code in drivers/base/memory.c
will require more than a simple CONFIG_MEMORY_HOTPLUG_SPARSE to
CONFIG_SPARSEMEM dependency change.  I am still looking at this.

Thanks for the review and testing.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade <at> us.ibm.com
http://www.ibm.com/linux/ltc
Matt Mackall | 1 Oct 02:02 2008

Re: [PATCH] slub: reduce total stack usage of slab_err & object_err


On Tue, 2008-09-30 at 17:20 +0100, Richard Kennedy wrote:
> Yes, using vprintk is better, but you still have this path
> (with your patch applied):
> 
> 	object_err -> slab_bug(208) -> printk(216)
> instead of 
> 	object_err -> slab_bug_message(8) -> printk(216)
> 
> unfortunately the overhead of having varargs is pretty big, at least
> on x86_64. I haven't measured it on 32-bit yet.

Looks like this got fixed for x86_64 in gcc 4.4. For most other arches,
the overhead should be reasonable.

-- 
Mathematics is the supreme nostalgia of our time.

Yasunori Goto | 1 Oct 04:48 2008

Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs

> On Tue, Sep 30, 2008 at 08:50:37AM -0700, Dave Hansen wrote:
> > On Tue, 2008-09-30 at 17:06 +0900, Yasunori Goto wrote:
> > > > +#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
> > > >  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> > > 
> > > If the first page of the section is not valid, then this section_nr_to_nid()
> > > doesn't return the correct value.
> > > 
> > > I tested this patch. On my box, the start_pfn of node 1 is 1200400, but 
> > > section_nr_to_pfn(mem_blk->phys_index) returns 1200000. As a result,
> > > the section is linked to node 0.
> > 
> > Crap, I was worried about that.
> > 
> > Gary, this means that we have a N:1 relationship between NUMA nodes and
> > sections.  This normally isn't a problem because sections don't really
> > care about nodes and they layer underneath them.
> 
> So, using Yasunori-san's example, the memory section starting at
> pfn 1200000 actually resides on both node 0 and node 1.

In theory, it may be possible that one section is divided across different nodes.
(I don't know whether that really happens...)

But the cause of my trouble is different:
there is a memory hole which is occupied by firmware.
So the memory map of my box looks like this.

----
early_node_map[3] active PFN ranges

Ulrich Drepper | 1 Oct 05:13 2008

Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes

On Tue, Sep 30, 2008 at 3:55 AM, Eric Dumazet <dada1 <at> cosmosbay.com> wrote:
> I am not sure how it could be converted to private futexes, since
> old binaries (static glibc) will use FUTEX_WAKE-like calls.

We considered this back then, but any approach seemed like too much
effort.  We'd either need a clone flag (a scarce resource) or have to
replace the set_tid_address syscall.  Given that the futex is woken only
once per thread lifetime, it shouldn't be an issue.  If the semaphore
still shows up even after this patch, feel free to introduce a new
set_tid_address syscall.
Balbir Singh | 1 Oct 05:49 2008

Re: [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages

KAMEZAWA Hiroyuki wrote:
> This is the coding-style-fixed version. Thank you, Nishimura-san.
> -Kame
> ==
> There are not-on-LRU pages which can be mapped, and they are not worth
> accounting (because we can't shrink them and would need dirty code to
> handle the special cases). We'd like to make use of the usual
> objrmap/radix-tree protocol and don't want to account pages outside the
> VM's control.
> 
> When special_mapping_fault() is called, page->mapping tends to be NULL
> and the page is charged as an anonymous page.
> insert_page() also handles some special pages from drivers.
> 
> This patch avoids accounting such special pages.
> 
> Changelog: v5 -> v6
>   - modified Documentation.
>   - fixed to charge only when a page is newly allocated.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com>
> 

[snip]
> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
>  	struct page *page;
>  	pte_t entry;
>  	int anon = 0;
> +	int charged = 0;
>  	struct page *dirty_page = NULL;
>  	struct vm_fault vmf;

Balbir Singh | 1 Oct 05:50 2008

Re: [PATCH 2/4] memcg: set page->mapping NULL before uncharge

KAMEZAWA Hiroyuki wrote:
> This patch tries to make page->mapping NULL before
> mem_cgroup_uncharge_cache_page() is called.
> 
> "page->mapping == NULL" is a good check for whether the page is
> still on the radix-tree or not.
> This patch also adds a BUG_ON() to mem_cgroup_uncharge_cache_page().
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com>

Looks good to me

Acked-by: Balbir Singh <balbir <at> linux.vnet.ibm.com>

-- 
	Balbir
Balbir Singh | 1 Oct 06:03 2008

Re: [PATCH 9/12] memcg allocate all page_cgroup at boot

KAMEZAWA Hiroyuki wrote:
> Allocate all page_cgroup at boot and remove the page_cgroup pointer
> from struct page. This patch adds an interface:
> 
>  struct page_cgroup *lookup_page_cgroup(struct page*)
> 
> All of FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG are supported.
> 
> Removing the page_cgroup pointer reduces the amount of memory by
>  - 4 bytes per PAGE_SIZE, or
>  - 8 bytes per PAGE_SIZE,
> when the memory controller is disabled (even if it is configured in).
> The metadata overhead of this is no problem on FLATMEM/DISCONTIGMEM.
> On SPARSEMEM, this makes mem_section[] twice as large.
> 
> On a usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
> On my x86-64 server with 48GB of memory, this saves 96MB of memory.
> (and uses xx kbytes for mem_section.)
> I think this reduction makes sense.
> 
> With pre-allocation, the kmalloc/kfree calls in charge/uncharge are removed. 
> This means
>   - we no longer need to be afraid of kmalloc failure.
>     (this can happen because of the gfp_mask type.)
>   - we can avoid calling kmalloc/kfree.
>   - we can avoid allocating tons of small objects which can cause fragmentation.
>   - we can know what amount of memory will be used for this extra LRU handling.
> 
> I added a printk message as
> 

KAMEZAWA Hiroyuki | 1 Oct 06:50 2008

Re: [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages

On Wed, 01 Oct 2008 09:19:10 +0530
Balbir Singh <balbir <at> linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > This is the coding-style-fixed version. Thank you, Nishimura-san.
> > -Kame
> > ==
> > There are not-on-LRU pages which can be mapped, and they are not worth
> > accounting (because we can't shrink them and would need dirty code to
> > handle the special cases). We'd like to make use of the usual
> > objrmap/radix-tree protocol and don't want to account pages outside the
> > VM's control.
> > 
> > When special_mapping_fault() is called, page->mapping tends to be NULL
> > and the page is charged as an anonymous page.
> > insert_page() also handles some special pages from drivers.
> > 
> > This patch avoids accounting such special pages.
> > 
> > Changelog: v5 -> v6
> >   - modified Documentation.
> >   - fixed to charge only when a page is newly allocated.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com>
> > 
> 
> [snip]
> > @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
> >  	struct page *page;
> >  	pte_t entry;
> >  	int anon = 0;

KAMEZAWA Hiroyuki | 1 Oct 07:07 2008

Re: [PATCH 9/12] memcg allocate all page_cgroup at boot

On Wed, 01 Oct 2008 09:33:53 +0530
Balbir Singh <balbir <at> linux.vnet.ibm.com> wrote:

> > +int __meminit init_section_page_cgroup(unsigned long pfn)
> > +{
> > +	struct mem_section *section;
> > +	struct page_cgroup *base, *pc;
> > +	unsigned long table_size;
> > +	int nid, index;
> > +
> > +	section = __pfn_to_section(pfn);
> > +
> > +	if (section->page_cgroup)
> > +		return 0;
> > +
> > +	nid = page_to_nid(pfn_to_page(pfn));
> > +
> > +	table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> > +	base = kmalloc_node(table_size, GFP_KERNEL, nid);
> > +	if (!base)
> > +		base = vmalloc_node(table_size, nid);
> 
> Should we check for slab_is_available() before calling kmalloc_node? Otherwise,
> we might need to fallback to alloc_bootmem_node.
> 
That check is not necessary,
because we use kmalloc()/vmalloc() in mem_cgroup_create() after this point.

(We assume the usual page allocator is available here. For FLATMEM, the size of 
 the array can be too big for kmalloc(), etc... I use alloc_bootmem().

KAMEZAWA Hiroyuki | 1 Oct 07:32 2008

Re: [PATCH 9/12] memcg allocate all page_cgroup at boot

On Wed, 1 Oct 2008 14:07:48 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com> wrote:
> > Can we make this patch independent of the flags changes and push it in ASAP?
> > 
> Needs much work.... Hmm.. rewrite it all again? 
> 

BTW, do you have a good idea for modifying flag bits without affecting the
LOCK bit in page_cgroup->flags?

At least, we'll have to set the ACTIVE/INACTIVE/UNEVICTABLE flags dynamically.
Take lock_page_cgroup() always?
__mem_cgroup_move_lists() will need some amount of change, and we should
check for deadlocks again.

Thanks,
-Kame

