Dave Hansen | 1 Oct 2011 02:08

[RFCv3][PATCH 2/4] add string_get_size_pow2()


This is a specialized version of string_get_size().

It only works on powers-of-two, and only outputs in
KiB/MiB/etc...  Doing it this way means that we do
not have to do any division like string_get_size()
does.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/include/linux/string_helpers.h |    1 
 linux-2.6.git-dave/lib/string_helpers.c           |   23 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff -puN include/linux/string_helpers.h~string_get_size-pow2-1 include/linux/string_helpers.h
--- linux-2.6.git/include/linux/string_helpers.h~string_get_size-pow2-1	2011-09-30 17:03:00.511708995 -0700
+++ linux-2.6.git-dave/include/linux/string_helpers.h	2011-09-30 17:03:00.535708956 -0700
@@ -10,6 +10,7 @@ enum string_size_units {
 	STRING_UNITS_2,		/* use binary powers of 2^10 */
 };

+u64 string_get_size_pow2(u64 size, char *unit_ret);
 int string_get_size(u64 size, enum string_size_units units,
 		    char *buf, int len);

diff -puN lib/string_helpers.c~string_get_size-pow2-1 lib/string_helpers.c
--- linux-2.6.git/lib/string_helpers.c~string_get_size-pow2-1	2011-09-30 17:03:00.515708988 -0700
+++ linux-2.6.git-dave/lib/string_helpers.c	2011-09-30 17:03:00.535708956 -0700
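The body of string_get_size_pow2() is not visible in this excerpt, so the following is only a userspace sketch of the idea described in the changelog: for a power-of-two size, each unit step is a 10-bit shift and no division is needed. The name size_pow2_sketch() and the prefix table are assumptions made for illustration, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix table; index 0 stands for plain bytes. */
static const char pow2_prefixes[] = "_KMGTPE";

/*
 * Sketch only, not the patch's implementation.
 * Scales 'size' down by 10 bits per unit step while the low 10 bits
 * are clear, then writes "B", "KiB", "MiB", ... into unit_ret, which
 * must have room for at least 4 bytes.  Returns the scaled value.
 */
uint64_t size_pow2_sketch(uint64_t size, char *unit_ret)
{
	size_t index = 0;
	size_t place = 0;

	while (size >= 1024 && (size & 1023) == 0 &&
	       index < sizeof(pow2_prefixes) - 2) {
		size >>= 10;	/* exact for powers of two, no division */
		index++;
	}

	if (index) {
		unit_ret[place++] = pow2_prefixes[index];
		unit_ret[place++] = 'i';
	}
	unit_ret[place++] = 'B';
	unit_ret[place] = '\0';

	return size;
}
```

Called with 4096 this would yield the value 4 and the unit "KiB", which matches the pagesize=4KiB style of output shown in patch 4/4.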

Dave Hansen | 1 Oct 2011 02:09

[RFCv3][PATCH 4/4] show page size in /proc/$pid/numa_maps


The output of /proc/$pid/numa_maps is in terms of number of pages
like anon=22 or dirty=54.  Here's some output:

7f4680000000 default file=/hugetlb/bigfile anon=50 dirty=50 N0=50
7f7659600000 default file=/anon_hugepage\040(deleted) anon=50 dirty=50 N0=50
7fff8d425000 default stack anon=50 dirty=50 N0=50

Looks like we have a stack and a couple of anonymous hugetlbfs
areas which all use the same amount of memory.  They don't.

The 'bigfile' uses 1GB pages and takes up ~50GB of space.  The
anon_hugepage uses 2MB pages and takes up ~100MB of space while
the stack uses normal 4k pages.  You can go over to smaps to
figure out what the page size _really_ is with KernelPageSize
or MMUPageSize.  But, I think this is a pretty nasty and
counterintuitive interface as it stands.

The following patch adds a pagesize= field.  Note that this only
shows the kernel's notion of page size.  For transparent
hugepages, it still shows the base page size.  Here's some real
output.  Note the anon_hugepage in there.

# cat /proc/`pidof memknobs`/numa_maps
00400000 default file=/root/memknobs pagesize=4KiB dirty=3 active=2 N0=3
00602000 default file=/root/memknobs pagesize=4KiB anon=1 dirty=1 N0=1
00603000 default file=/root/memknobs pagesize=4KiB anon=1 dirty=1 N0=1
00604000 default heap pagesize=4KiB anon=6 dirty=6 N0=6
7f6766216000 default file=/lib/libc-2.9.so pagesize=4KiB mapped=98 mapmax=25 active=97 N0=98
7f676637e000 default file=/lib/libc-2.9.so
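With a pagesize= field present, the real footprint of a mapping is the page count multiplied by the page size. The following userspace sketch shows that arithmetic for a single numa_maps line. The helper names are hypothetical, and the parser assumes the pagesize= value is spelled as in the sample output above (a number followed by B, KiB, MiB or GiB).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the "pagesize=" value of a numa_maps
 * line in bytes, or 0 if the field is missing or the unit is unknown.
 */
static uint64_t parse_pagesize(const char *line)
{
	const char *p = strstr(line, " pagesize=");
	char *end;
	uint64_t val;

	if (!p)
		return 0;

	val = strtoull(p + strlen(" pagesize="), &end, 10);
	if (strncmp(end, "KiB", 3) == 0)
		return val << 10;
	if (strncmp(end, "MiB", 3) == 0)
		return val << 20;
	if (strncmp(end, "GiB", 3) == 0)
		return val << 30;
	if (*end == 'B')
		return val;
	return 0;
}

/* Bytes of anonymous memory on one line: anon= count times pagesize=. */
uint64_t anon_bytes(const char *line)
{
	/* The leading space keeps "file=/anon_hugepage" from matching. */
	const char *p = strstr(line, " anon=");

	if (!p)
		return 0;
	return strtoull(p + strlen(" anon="), NULL, 10) * parse_pagesize(line);
}
```

For the heap line above (pagesize=4KiB anon=6) this gives 24576 bytes. The same anon=50 count would give about 50GB with 1GiB pages and about 100MB with 2MiB pages, which is the difference the old output hides.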

Dave Hansen | 1 Oct 2011 02:08

[RFCv3][PATCH 1/4] replace string_get_size() arrays


Instead of explicitly storing the entire string for each
possible unit, just store the thing that varies: the
first character.

We have to special-case the 'B' unit (index==0).

This shaves about 100 bytes off of my .o file.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/lib/string_helpers.c |   30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff -puN lib/string_helpers.c~string_get_size-pow2 lib/string_helpers.c
--- linux-2.6.git/lib/string_helpers.c~string_get_size-pow2	2011-09-30 16:50:31.628981352 -0700
+++ linux-2.6.git-dave/lib/string_helpers.c	2011-09-30 17:04:02.211607364 -0700
@@ -8,6 +8,23 @@
 #include <linux/module.h>
 #include <linux/string_helpers.h>

+static const char byte_units[] = "_KMGTPEZY";
+
+static char *__units_str(enum string_size_units unit, char *buf, int index)
+{
+	int place = 0;
+
+	/* index=0 is plain 'B' with no other unit */
+	if (index) {
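The rest of __units_str() is cut off in this excerpt. As a rough userspace sketch of the approach the changelog describes (store one prefix character per unit and assemble the string on demand, with index 0 special-cased to plain "B"), something like the following could work. The names, the enum, and the bounds check are assumptions of this sketch, and it does not try to reproduce the kernel's exact spelling of the decimal units.

```c
#include <stddef.h>

/* Hypothetical stand-in for enum string_size_units. */
enum units_sketch {
	UNITS_10_SKETCH,	/* decimal: KB, MB, ... */
	UNITS_2_SKETCH,		/* binary: KiB, MiB, ... */
};

static const char byte_units_sketch[] = "_KMGTPEZY";

/*
 * Sketch only.  Builds the unit string for 'index' into buf (at least
 * 4 bytes).  Returns buf, or NULL if index is past the prefix table.
 */
char *units_str_sketch(enum units_sketch unit, char *buf, size_t index)
{
	size_t place = 0;

	if (index >= sizeof(byte_units_sketch) - 1)
		return NULL;

	/* index == 0 is plain 'B' with no prefix */
	if (index) {
		buf[place++] = byte_units_sketch[index];
		if (unit == UNITS_2_SKETCH)
			buf[place++] = 'i';
	}
	buf[place++] = 'B';
	buf[place] = '\0';

	return buf;
}
```

The saving comes from replacing two arrays of string pointers plus their string literals with one short character array and a few instructions.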

Minchan Kim | 1 Oct 2011 08:59

Re: [patch 1/2]vmscan: correct all_unreclaimable for zone without lru pages

On Fri, Sep 30, 2011 at 10:12:23AM +0800, Shaohua Li wrote:
> On Thu, 2011-09-29 at 17:18 +0800, Minchan Kim wrote:
> > On Thu, Sep 29, 2011 at 09:14:51AM +0800, Shaohua Li wrote:
> > > On Thu, 2011-09-29 at 01:57 +0800, Minchan Kim wrote:
> > > > On Wed, Sep 28, 2011 at 03:08:31PM +0800, Shaohua Li wrote:
> > > > > On Wed, 2011-09-28 at 14:57 +0800, Minchan Kim wrote:
> > > > > > On Tue, Sep 27, 2011 at 03:23:04PM +0800, Shaohua Li wrote:
> > > > > > > I saw DMA zone always has ->all_unreclaimable set. The reason is the high zones
> > > > > > > are big, so zone_watermark_ok/_safe() will always return false with a high
> > > > > > > classzone_idx for DMA zone, because DMA zone's lowmem_reserve is big for a high
> > > > > > > classzone_idx. When kswapd runs into DMA zone, it doesn't scan/reclaim any
> > > > > > > pages(no pages in lru), but mark the zone as all_unreclaimable. This can
> > > > > > > happen in other low zones too.
> > > > > > 
> > > > > > Good catch!
> > > > > > 
> > > > > > > This is confusing and can potentially cause oom. Say a low zone has
> > > > > > > all_unreclaimable when high zone hasn't enough memory. Then allocating
> > > > > > > some pages in low zone(for example reading blkdev with highmem support),
> > > > > > > then run into direct reclaim. Since the zone has all_unreclaimable set,
> > > > > > > direct reclaim might reclaim nothing and an oom reported. If
> > > > > > > all_unreclaimable is unset, the zone can actually reclaim some pages.
> > > > > > > If all_unreclaimable is unset, in the inner loop of balance_pgdat we always have
> > > > > > > all_zones_ok 0 when checking a low zone's watermark. If high zone watermark isn't
> > > > > > > good, there is no problem. Otherwise, we might loop one more time in the outer
> > > > > > > loop, but since high zone watermark is ok, the end_zone will be lower, then low
> > > > > > > zone's watermark check will be ok and the outer loop will break. So looks this
> > > > > > > doesn't bring any problem.
> > > > > > 
> > > > > > I think it would be better to correct zone_reclaimable.
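The two effects discussed in this thread can be modeled in a few lines of userspace C. This is a simplified, hypothetical model and not the kernel code: struct zone_sketch and both functions are invented for illustration, and the factor of 6 in the reclaimable check is an assumption modeled on kernels of that era.

```c
#include <stdbool.h>

/* Simplified, hypothetical zone model; not the kernel's struct zone. */
struct zone_sketch {
	unsigned long free_pages;
	unsigned long min_watermark;
	unsigned long lowmem_reserve[4];	/* indexed by classzone_idx */
	unsigned long lru_pages;
	unsigned long pages_scanned;
};

/*
 * The watermark check adds the zone's lowmem_reserve for the requested
 * classzone_idx.  A small zone with a large reserve for high zones can
 * therefore never pass the check for a high classzone_idx.
 */
bool watermark_ok_sketch(const struct zone_sketch *z, int classzone_idx)
{
	return z->free_pages >
	       z->min_watermark + z->lowmem_reserve[classzone_idx];
}

/*
 * With no pages on the LRU, "scanned < lru * 6" is "0 < 0", so the
 * zone looks unreclaimable even though nothing was ever scanned.
 */
bool reclaimable_sketch(const struct zone_sketch *z)
{
	return z->pages_scanned < z->lru_pages * 6;
}
```

In this model a DMA-like zone with an empty LRU fails both checks for a high classzone_idx, which is the combination that leaves all_unreclaimable set in the report above.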

Joe Perches | 1 Oct 2011 21:29

Re: [RFCv3][PATCH 1/4] replace string_get_size() arrays

On Fri, 2011-09-30 at 17:08 -0700, Dave Hansen wrote:
> Instead of explicitly storing the entire string for each
> possible unit, just store the thing that varies: the
> first character.

trivia

> diff -puN lib/string_helpers.c~string_get_size-pow2 lib/string_helpers.c
> --- linux-2.6.git/lib/string_helpers.c~string_get_size-pow2	2011-09-30 16:50:31.628981352 -0700
> +++ linux-2.6.git-dave/lib/string_helpers.c	2011-09-30 17:04:02.211607364 -0700
> @@ -8,6 +8,23 @@
>  #include <linux/module.h>
>  #include <linux/string_helpers.h>
>  
> +static const char byte_units[] = "_KMGTPEZY";

u64 could be up to ~1.8 * 10^19 decimal
zetta and yotta are not possible or necessary.
u128 maybe someday, but then other changes
would be necessary.

> +static char *__units_str(enum string_size_units unit, char *buf, int index)
> +{
> +	int place = 0;
> +
> +	/* index=0 is plain 'B' with no other unit */
> +	if (index) {
> +		buf[place++] = byte_units[index];

unbounded index (doesn't matter currently, it will for u128)
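The claim that zetta and yotta are unreachable with a u64 can be checked with a short userspace sketch. The function name is hypothetical; it simply counts how many times the largest u64 value can be divided by the unit step before it drops below one step.

```c
#include <stdint.h>

/*
 * Sketch only.  Largest prefix index a u64 can reach when each unit
 * step divides by 'divisor' (1000 for decimal, 1024 for binary).
 * Returns 0 for a divisor below 2, which would never terminate.
 */
unsigned int max_unit_index_u64(uint64_t divisor)
{
	uint64_t v = UINT64_MAX;	/* about 1.8 * 10^19 */
	unsigned int index = 0;

	if (divisor < 2)
		return 0;

	while (v >= divisor) {
		v /= divisor;
		index++;
	}
	return index;
}
```

Both 1000 and 1024 give index 6, which is 'E' in "_KMGTPEZY", so the 'Z' and 'Y' entries at indexes 7 and 8 cannot be reached from a u64.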

Joe Perches | 1 Oct 2011 21:33

Re: [RFCv3][PATCH 1/4] replace string_get_size() arrays

On Fri, 2011-09-30 at 17:08 -0700, Dave Hansen wrote:
> Instead of explicitly storing the entire string for each
> possible unit, just store the thing that varies: the
> first character.

trivia

> diff -puN lib/string_helpers.c~string_get_size-pow2 lib/string_helpers.c
> --- linux-2.6.git/lib/string_helpers.c~string_get_size-pow2	2011-09-30 16:50:31.628981352 -0700
> +++ linux-2.6.git-dave/lib/string_helpers.c	2011-09-30 17:04:02.211607364 -0700
> @@ -8,6 +8,23 @@
>  #include <linux/module.h>
>  #include <linux/string_helpers.h>
>  
> +static const char byte_units[] = "_KMGTPEZY";

u64 could be up to ~1.8 * 10^19 decimal
zetta and yotta are not possible or necessary.
u128 maybe someday, but then other changes
would be necessary too.

> +static char *__units_str(enum string_size_units unit, char *buf, int index)
> +{
> +	int place = 0;
> +
> +	/* index=0 is plain 'B' with no other unit */
> +	if (index) {
> +		buf[place++] = byte_units[index];

index is unbounded (doesn't matter currently, it will for u128)

Mikael Abrahamsson | 2 Oct 2011 09:25

page allocation failures, now in 2.6.38


Hello.

I just upgraded my Ubuntu 10.10 amd64 (2.6.35) to 11.04 (2.6.38) and 14
hours after the upgrade, it oopsed, related to nf_conntrack (I've sent an
email to netdev about it). I don't know if it's related, but my page 
allocation failures have returned (I've had them quite a lot in the past) 
so I thought I'd report them again now that I'm running a newer kernel.

2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Oct  1 22:45:19 ub kernel: [38857.002540] __alloc_pages_slowpath: 12 callbacks suppressed
Oct  1 22:45:19 ub kernel: [38857.002545] swapper: page allocation failure. order:0, mode:0x4020
Oct  1 22:45:19 ub kernel: [38857.002551] Pid: 0, comm: swapper Tainted: G          I 2.6.38-11-generic #50-Ubuntu
Oct  1 22:45:19 ub kernel: [38857.002555] Call Trace:
Oct  1 22:45:19 ub kernel: [38857.002558]  <IRQ>  [<ffffffff811147e4>] ? __alloc_pages_nodemask+0x604/0x840
Oct  1 22:45:19 ub kernel: [38857.002577]  [<ffffffff81149f85>] ? alloc_pages_current+0xa5/0x110
Oct  1 22:45:19 ub kernel: [38857.002583]  [<ffffffff81153762>] ? new_slab+0x282/0x290
Oct  1 22:45:19 ub kernel: [38857.002589]  [<ffffffff81155222>] ? __slab_alloc+0x262/0x390
Oct  1 22:45:19 ub kernel: [38857.002597]  [<ffffffff814c6134>] ? __netdev_alloc_skb+0x24/0x50
Oct  1 22:45:19 ub kernel: [38857.002603]  [<ffffffff8115852b>] ? __kmalloc_node_track_caller+0x9b/0x1a0
Oct  1 22:45:19 ub kernel: [38857.002609]  [<ffffffff814c6134>] ? __netdev_alloc_skb+0x24/0x50
Oct  1 22:45:19 ub kernel: [38857.002615]  [<ffffffff814c5b93>] ? __alloc_skb+0x83/0x170
Oct  1 22:45:19 ub kernel: [38857.002620]  [<ffffffff814c6134>] ? __netdev_alloc_skb+0x24/0x50

Please see attached file for all the page allocation failures I got 
yesterday.

On 2.6.35, the failures seem to have been helped by
"vm.min_free_kbytes=4096", but no longer, it seems. I just now changed it

Gilad Ben-Yossef | 2 Oct 2011 10:44

Re: [PATCH 0/5] Reduce cross CPU IPI interference

On Wed, Sep 28, 2011 at 3:00 PM, Chris Metcalf <cmetcalf@tilera.com> wrote:
> On 9/25/2011 4:54 AM, Gilad Ben-Yossef wrote:
>
> I strongly concur with your motivation in looking for and removing sources
> of unnecessary cross-cpu interrupts.

Thanks for the support :-)

> We have some code in our tree (not yet
> returned to the community) that tries to deal with some sources of interrupt
> jitter on tiles that are running isolcpu and want to be 100% in user space.

Yes, I think this work will benefit this kind of use case (CPU/user
space bound on a dedicated CPU) the most, although other use cases can
benefit as well (e.g. power management with idle cores).

Btw, do you have any plan to share the patches you mentioned? It could
save me a lot of time. Not wanting to re-invent the wheel and all that...

>> This first version creates an on_each_cpu_mask infrastructure API (derived
>> from existing arch specific versions in Tile and Arm) and uses it to turn
>> two global IPI invocations into per CPU group invocations.
>
> The global version looks fine; I would probably make on_each_cpu() an inline
> in the !SMP case now that you are (correctly, I suspect) disabling
> interrupts when calling the function.
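The patches themselves are not part of this excerpt. As a loose userspace analogy for the idea (run a function only on the CPUs named in a mask instead of on every CPU), the following sketch walks a 64-bit mask and calls a callback once per set bit. All names are hypothetical, and the calls here run sequentially on the calling thread, whereas the kernel API would send IPIs to the selected CPUs.

```c
#include <stdint.h>

/* Hypothetical callback type: receives the CPU number and caller data. */
typedef void (*cpu_func_sketch)(unsigned int cpu, void *info);

/*
 * Sketch only.  Calls 'func' once for every CPU whose bit is set in
 * 'mask' and returns the number of calls made.  CPUs outside the mask
 * are left alone, which is the point of a masked variant.
 */
unsigned int for_each_cpu_in_mask_sketch(uint64_t mask, cpu_func_sketch func,
					 void *info)
{
	unsigned int cpu;
	unsigned int calls = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (mask & (UINT64_C(1) << cpu)) {
			func(cpu, info);
			calls++;
		}
	}
	return calls;
}

/* Example callback: records which CPUs were visited in *info. */
void record_cpu_sketch(unsigned int cpu, void *info)
{
	*(uint64_t *)info |= UINT64_C(1) << cpu;
}
```

With a mask of 0x5 only CPUs 0 and 2 are visited, so a CPU that is isolated or idle and not in the mask sees no interruption at all.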

Shi, Alex | 2 Oct 2011 14:47

RE: [PATCH] slub Discard slab page only when node partials > minimum setting

> > I tested aim9 and netperf, both of which are said to be related to memory
> > allocation, but found no performance change with or without PCP. It seems
> > only hackbench is sensitive to this. As for aim9, with either our own
> > configuration or Mel Gorman's aim9 configuration from his mmtest, there
> > is no clear performance change for PCP slub.
> 
> AIM9 tests are usually single threaded so I would not expect any differences.
> Try AIM7? And concurrent netperfs?

I used the aim7+aim9 patch and set up 2000 processes to run concurrently.
But in fact aim9 cannot put much pressure on slab.
As for concurrent netperf, I'd like to try it after my vacation, if you can wait. :)
> 
> The PCP patch helps only if there is node lock contention. Meaning
> simultaneous allocations/frees from multiple processor from the same cache.
> 
> > Checking the kernel function call graph via perf record/perf report,
> > the slab functions are only used heavily in the hackbench benchmark.
> 
> Then the question arises if it's worthwhile merging if it only affects this
> benchmark.
> 

From my viewpoint, the patch is still helpful on server machines, and no clear
regression was found on desktop machines. So it is useful.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

Shi, Alex | 2 Oct 2011 14:55

RE: [PATCH] slub: remove a minus instruction in get_partial_node


> -----Original Message-----
> From: Christoph Lameter [mailto:cl@gentwo.org]
> Sent: Thursday, September 29, 2011 10:19 PM
> To: Shi, Alex
> Cc: Pekka Enberg; linux-mm@kvack.org; Chen, Tim C; Huang, Ying
> Subject: Re: [PATCH] slub: remove a minus instruction in get_partial_node
> 
> On Thu, 29 Sep 2011, Alex,Shi wrote:
> 
> > Don't do a minus action in get_partial_node function here, since
> > it is always zero.
> 
> A slab on the partial lists always has objects available. Why would it be
> zero?

Um, my mistake. The reason should be: if the code gets here, the slab will be the per cpu slab.
It has no chance of being in per cpu partial and has no relationship with per cpu partial. So
there is no reason to use this value as a criterion for filling per cpu partial.


