Andrea Arcangeli | 25 May 19:02

[PATCH 00/35] AutoNUMA alpha14

Hello everyone,

It's time for a new autonuma-alpha14 milestone.

I removed the [RFC] from the Subject because 1) this is a release I'm
quite happy with (on the implementation side, it allows the same kernel
image to boot optimally on NUMA and non-NUMA hardware, and it avoids
altering the scheduler runtime most of the time) and 2) the benchmark
results we have so far are great and show this design performing best.

I believe (realistically speaking) nobody is going to change
applications to specify which thread is using which memory (for
threaded apps) with the only exception of QEMU and a few others.

For non-threaded apps that fit in a NUMA node, there's no way a blind
home node can perform nearly as well as AutoNUMA: AutoNUMA monitors the
whole status of the memory of the running processes and optimizes
the memory placement and CPU placement dynamically
accordingly. There's a small memory and CPU cost in collecting so much
information to be able to make smart decisions, but the benefits
largely outweigh those costs.

If a big task was idle for a long while but suddenly starts
computing, AutoNUMA may totally change the memory and CPU placement of
the other running tasks according to what's best, because it has
enough information to make optimal NUMA placement decisions.

git clone --reference linux -b autonuma-alpha14 \
    git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git autonuma-alpha14

Jan Kara | 24 May 18:59

[PATCH 0/2 v4] Flexible proportions


  Hello,

  here is the next iteration of my flexible proportions code. I've addressed
all of Peter's comments.
Changes since v3:
  * changed fprop_fraction_foo() to avoid using percpu_counter_sum()
  * changed __fprop_inc_percpu_max() to avoid 64-bit division (the maximum
    allowed fraction is now expressed as max_frac/FPROP_FRAC_BASE)
  * avoid drifting of the period timer
  * better handle cases where the period timer fires long after the intended
    time, by aging by the number of periods that actually passed
Changes since v2:
  * use timer instead of workqueue for triggering period switch
  * arm timer only if aging didn't zero out all fractions, re-arm timer when
    new event arrives again
  * set period length to 3s

  Some introduction for first time readers:

  The idea of this patch set is to provide code for computing event proportions
where the aging period does not depend on the number of events happening (so
that aging works well both with fast storage and slow USB sticks in the same
system).

  The basic idea is that we compute proportions as:
p_j = (\Sum_{i>=0} x_{i,j}/2^{i+1}) / (\Sum_{i>=0} x_i/2^{i+1})

  Where x_{i,j} is j's number of events in i-th last time period and x_i is
total number of events in i-th last time period.
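
  For readers who want to check the formula by hand, here is a minimal
floating-point sketch of it. It is not the code from this patch set (which,
per the changelog above, uses per-cpu counters and avoids 64-bit division);
NR_TYPES, prop_direct() and the prop_model_* helpers are hypothetical names
used only for illustration. It evaluates the sum directly and also shows the
equivalent incremental form, where every counter is halved at each period
switch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TYPES 2	/* hypothetical number of event types */

/*
 * Direct evaluation of the formula. x holds periods * NR_TYPES counters:
 * x[i * NR_TYPES + j] is the number of events of type j in the i-th last
 * period (i = 0 is the newest period).
 */
double prop_direct(const unsigned long *x, size_t periods, size_t j)
{
	double num = 0.0, den = 0.0;
	double weight = 0.5;		/* 1 / 2^(i+1) for i = 0 */

	assert(j < NR_TYPES);
	for (size_t i = 0; i < periods; i++) {
		unsigned long total = 0;	/* x_i: all events in period i */

		for (size_t k = 0; k < NR_TYPES; k++)
			total += x[i * NR_TYPES + k];
		num += weight * (double)x[i * NR_TYPES + j];
		den += weight * (double)total;
		weight /= 2.0;
	}
	return den > 0.0 ? num / den : 0.0;
}

/* Incremental form: count events, halve all counters on a period switch. */
struct prop_model {
	double events[NR_TYPES];
	double total;
};

void prop_model_inc(struct prop_model *m, size_t j)
{
	assert(j < NR_TYPES);
	m->events[j] += 1.0;
	m->total += 1.0;
}

void prop_model_new_period(struct prop_model *m)
{
	for (size_t k = 0; k < NR_TYPES; k++)
		m->events[k] /= 2.0;
	m->total /= 2.0;
}

double prop_model_fraction(const struct prop_model *m, size_t j)
{
	assert(j < NR_TYPES);
	return m->total > 0.0 ? m->events[j] / m->total : 0.0;
}
```

  The incremental form weights the newest period by 1 instead of 1/2, but
that constant factor appears in both numerator and denominator and cancels,
so both forms give the same fraction. Because prop_model_new_period() would
be driven by a timer rather than by an event count, a slow device ages at
the same rate as a fast one.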

majianpeng | 24 May 15:38

the max size of block device on 32bit os,when using do_generic_file_read() proceed.

  Hi all,

  I read from a RAID5 array whose size is 30T. The OS is 32-bit RHEL6.
  I read the array as a whole (not partitioned) and found that the reads went
to addresses other than the ones I wanted.
  So I tested the newest kernel code; the problem is still there.
  I reviewed the code. In the function do_generic_file_read():

		index = *ppos >> PAGE_CACHE_SHIFT;

  index is a u32 and *ppos is a long long.
  So when *ppos is larger than 0xFFFFFFFF << PAGE_CACHE_SHIFT (16T bytes), the index is wrong.

  I wonder about this: on a 32-bit OS, must block devices be no larger than 16T? In other words, if a block
device is larger than 16T, must it be partitioned?

  Thanks all.

--------------
majianpeng
2012-05-24
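
The truncation described above can be reproduced in user space. This is a
small sketch, not kernel code: SKETCH_PAGE_SHIFT is an assumed 4 KiB page
shift standing in for PAGE_CACHE_SHIFT, and the function names are
hypothetical. It shows how a 32-bit page index wraps once the byte offset
reaches 16 TiB.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Page index as computed when the index type is only 32 bits wide. */
uint32_t page_index_32(int64_t pos)
{
	assert(pos >= 0);
	return (uint32_t)((uint64_t)pos >> SKETCH_PAGE_SHIFT);
}

/* Page index with a 64-bit index type, for comparison. */
uint64_t page_index_64(int64_t pos)
{
	assert(pos >= 0);
	return (uint64_t)pos >> SKETCH_PAGE_SHIFT;
}

/* Largest byte offset whose page index still fits in 32 bits. */
int64_t max_offset_32(void)
{
	uint64_t last_page = (uint64_t)UINT32_MAX << SKETCH_PAGE_SHIFT;
	uint64_t in_page = ((uint64_t)1 << SKETCH_PAGE_SHIFT) - 1;

	return (int64_t)(last_page | in_page);
}
```

With these definitions, an offset of exactly 16 TiB (1 << 44) maps to page
index 0 in the 32-bit version, so a read at that offset would hit the first
page of the device instead of the intended one.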


Satoru Moriya | 23 May 22:41

[PATCH RESEND] avoid swapping out with swappiness==0

Hi Andrew,

This patch has been under review for a couple of months.

This patch *only* improves the behavior when the kernel has
enough filebacked pages. It does not change the behavior when
the kernel has a small number of filebacked pages.

Kosaki-san pointed out that the threshold we use to decide
whether there are enough filebacked pages is not
appropriate (*).

(*) http://www.spinics.net/lists/linux-mm/msg32380.html

As I described in (**), I believe that the threshold discussion
should be held in a separate thread, because it affects more than
the swappiness=0 case, and below the threshold the kernel behaves
the same way with or without this patch.

(**) http://www.spinics.net/lists/linux-mm/msg34317.html

The patch may not be perfect but, at least, it improves
the kernel behavior when there is enough filebacked memory.
I believe it's better than nothing.

Do you have any comments about it?

NOTE: I updated the patch with Acked-by tags

---

Christoph Lameter | 23 May 22:34

Common 00/22] Sl[auo]b: Common functionality V3

V2->V3:
- Incorporate more feedback from Joonsoo Kim and Glauber Costa
- And a couple more patches to deal with slab duping and move
  more code to slab_common.c

V1->V2:
- Incorporate glommer's feedback.
- Add 2 more patches dealing with common code in kmem_cache_destroy

This is a series of patches that extracts common functionality from
slab allocators into a common code base. The intent is to standardize
as much of the allocator behavior as possible while keeping the
distinctive features of each allocator, which are mostly due to their
storage format and serialization approaches.

This patchset makes a beginning by extracting common functionality in
kmem_cache_create() and kmem_cache_destroy(). However, there are
numerous other areas where such work could be beneficial:

1. Extract the sysfs support from SLUB and make it common. That way
   all allocators have a common sysfs API and can be handled in the same
   way regardless of the allocator chosen.

2. Extract the error reporting and checking from SLUB and make
   it available for all allocators. This means that all allocators
   will gain the resiliency and error handling capabilities.

3. Extract the memory hotplug and cpu hotplug handling. It seems that
   SLAB may be more sophisticated here. Having common code here will
   make it easier to maintain the special code.

Nathan Zimmer | 23 May 15:28

[PATCH] tmpfs not interleaving properly


When tmpfs has an interleaved memory policy, it always starts allocating each file at node 0.
When there are many small files the lower nodes fill up disproportionately.
My proposed solution is to start a file at a randomly chosen node.
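
The imbalance can be seen in a simplified model. This is only a sketch under
assumptions: it treats interleaving as a plain modulo over the node count,
which is not how the kernel's mempolicy code is actually written, and
interleave_node() and place_small_files() are hypothetical helpers. The
node_offset parameter mirrors the per-inode field added by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Node backing page `pgoff` of a file in this simplified interleave model. */
unsigned int interleave_node(unsigned long pgoff, unsigned int node_offset,
			     unsigned int nr_nodes)
{
	assert(nr_nodes > 0);
	return (unsigned int)((pgoff + node_offset) % nr_nodes);
}

/*
 * Place nr_files one-page files and count the pages landing on each node.
 * offsets[f] is the per-file starting node; NULL means every file starts
 * at node 0, which models the current behavior.
 */
void place_small_files(const unsigned int *offsets, size_t nr_files,
		       unsigned int nr_nodes, unsigned long *pages_per_node)
{
	for (unsigned int n = 0; n < nr_nodes; n++)
		pages_per_node[n] = 0;

	for (size_t f = 0; f < nr_files; f++) {
		unsigned int off = offsets ? offsets[f] : 0;

		/* A one-page file only ever allocates page 0. */
		pages_per_node[interleave_node(0, off, nr_nodes)]++;
	}
}
```

With eight one-page files on four nodes, a NULL offsets array puts all eight
pages on node 0, while offsets spread over 0..3 give two pages per node. In
the proposal the offset would be picked at random per file, so the spread is
even only on average.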

Cc: Christoph Lameter <cl <at> linux.com>
Cc: Nick Piggin <npiggin <at> gmail.com>
Cc: Hugh Dickins <hughd <at> google.com>
Cc: Lee Schermerhorn <lee.schermerhorn <at> hp.com>
Cc: stable <at> vger.kernel.org
Signed-off-by: Nathan T Zimmer <nzimmer <at> sgi.com>

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 79ab255..38eda26 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -17,6 +17,7 @@ struct shmem_inode_info {
 		char		*symlink;	/* unswappable short symlink */
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
+	int			node_offset;	/* bias for interleaved nodes */
 	struct list_head	swaplist;	/* chain of maybes on swap */
 	struct list_head	xattr_list;	/* list of shmem_xattr */
 	struct inode		vfs_inode;
diff --git a/mm/shmem.c b/mm/shmem.c
index f99ff3e..58ef512 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -819,7 +819,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,

 	/* Create a pseudo vma that just contains the policy */

[PATCH 2/2] vmevent: pass right attribute to vmevent_sample_attr()

From: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Subject: [PATCH] vmevent: pass right attribute to vmevent_sample_attr()

Pass "config attribute" (&watch->config->attrs[i]) not "sample
attribute" (&watch->sample_attrs[i]) to vmevent_sample_attr() to
allow use of the original attribute value in vmevent_attr_sample_fn().

Cc: Anton Vorontsov <anton.vorontsov <at> linaro.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park <at> samsung.com>
---
Without this patch vmevent_attr_lowmem_pages() always returns 0.

 mm/vmevent.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: b/mm/vmevent.c
===================================================================
--- a/mm/vmevent.c	2012-05-22 17:55:02.000000000 +0200
+++ b/mm/vmevent.c	2012-05-22 18:10:40.075231798 +0200
@@ -31,6 +31,7 @@
 	 */
 	unsigned long			nr_attrs;
 	struct vmevent_attr		*sample_attrs;
+	struct vmevent_attr		*config_attrs[VMEVENT_CONFIG_MAX_ATTRS];

 	/* sampling */
 	struct timer_list		timer;
@@ -104,6 +105,7 @@
 	 */

[PATCH] vmevent: add arm support

From: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Subject: [PATCH] vmevent: add arm support

Tested on ARM EXYNOS4210 (Universal C210 board).

Cc: Anton Vorontsov <anton.vorontsov <at> linaro.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park <at> samsung.com>
---
 arch/arm/include/asm/unistd.h        |    1 +
 arch/arm/kernel/calls.S              |    1 +
 tools/testing/vmevent/vmevent-test.c |    3 +++
 3 files changed, 5 insertions(+)

Index: b/arch/arm/include/asm/unistd.h
===================================================================
--- a/arch/arm/include/asm/unistd.h	2012-05-22 15:17:15.590826904 +0200
+++ b/arch/arm/include/asm/unistd.h	2012-05-22 15:17:43.990826872 +0200
@@ -404,6 +404,7 @@
 #define __NR_setns			(__NR_SYSCALL_BASE+375)
 #define __NR_process_vm_readv		(__NR_SYSCALL_BASE+376)
 #define __NR_process_vm_writev		(__NR_SYSCALL_BASE+377)
+#define __NR_vmevent_fd		(__NR_SYSCALL_BASE+378)

 /*
  * The following SWIs are ARM private.
Index: b/arch/arm/kernel/calls.S
===================================================================
--- a/arch/arm/kernel/calls.S	2012-05-22 15:16:31.646826898 +0200
+++ b/arch/arm/kernel/calls.S	2012-05-22 15:17:02.850825441 +0200

[PATCH 1/2] vmevent: don't leak unitialized data to userspace

From: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Subject: [PATCH] vmevent: don't leak unitialized data to userspace

Remember to initialize all attrs[nr] fields in vmevent_setup_watch().

Cc: Anton Vorontsov <anton.vorontsov <at> linaro.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park <at> samsung.com>
---
 mm/vmevent.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: b/mm/vmevent.c
===================================================================
--- a/mm/vmevent.c	2012-05-22 17:51:13.195231958 +0200
+++ b/mm/vmevent.c	2012-05-22 17:51:40.991231956 +0200
@@ -350,7 +350,10 @@

 		attrs = new;

-		attrs[nr++].type = attr->type;
+		attrs[nr].type = attr->type;
+		attrs[nr].value = 0;
+		attrs[nr].state = 0;
+		nr++;
 	}

 	watch->sample_attrs	= attrs;

--

[PATCH] cma: cached pageblock type fixup

From: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Subject: [PATCH] cma: cached pageblock type fixup

CMA pages added to per-cpu page lists in free_hot_cold_page()
have their private field set to the MIGRATE_CMA pageblock type.  If this
happens just before start_isolate_page_range() in alloc_contig_range()
changes the pageblock type of the page to MIGRATE_ISOLATE, the cached
pageblock type may be stale in free_pcppages_bulk()
(which may be triggered by drain_all_pages() in alloc_contig_range()).
The page is then added to the MIGRATE_CMA free list instead of the
MIGRATE_ISOLATE one in __free_one_page() and, if it is reused just
before the test_pages_isolated() check, causes alloc_contig_range() to fail.

Fix such situation by checking whether pageblock type of the page
changed to MIGRATE_ISOLATE for MIGRATE_CMA type pages in
free_pcppages_bulk() and if so fixup the pageblock type to
MIGRATE_ISOLATE (so the page will be added to MIGRATE_ISOLATE free
list in __free_one_page() and won't be used).
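
The check described above can be modeled in a few lines. This is a sketch of
the decision only, not the patch's code: the enum is a simplified stand-in
for the kernel's pageblock types (its values are arbitrary), and
free_list_for() is a hypothetical helper.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's pageblock migrate types. */
enum sketch_migratetype {
	SKETCH_MOVABLE,
	SKETCH_CMA,
	SKETCH_ISOLATE,
};

/*
 * Choose the free list for a page leaving a per-cpu list.
 * cached: type stored in the page when it was queued
 * now:    type its pageblock has at the time it is freed
 */
enum sketch_migratetype free_list_for(enum sketch_migratetype cached,
				      enum sketch_migratetype now)
{
	assert(cached <= SKETCH_ISOLATE && now <= SKETCH_ISOLATE);

	/* The cached type went stale: the block was isolated meanwhile. */
	if (cached == SKETCH_CMA && now == SKETCH_ISOLATE)
		return SKETCH_ISOLATE;

	return cached;
}
```

Only pages cached as CMA are rechecked, matching the description above;
pages of other types keep their cached type in this model.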

A similar situation can happen if rmqueue_bulk() sets the cached pageblock
type of the page to MIGRATE_CMA and start_isolate_page_range() is called
before buffered_rmqueue() completes (so the page may be used by
get_page_from_freelist() and cause a test_pages_isolated() check
failure in alloc_contig_range()).  Fix it in buffered_rmqueue() by
changing the pageblock type of the affected page if needed, freeing
the page back to the buddy allocator and retrying the allocation.

Please note that even with this patch applied some page allocation
vs alloc_contig_range() races are still possible and may result in
rare test_pages_isolated() failures.

[PATCH] cma: retry on test_pages_isolated() failure

From: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Subject: [PATCH] cma: retry on test_pages_isolated() failure

Retry (once) migration on test_pages_isolated() failure.
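
The control flow of a single retry can be sketched as follows. This is an
illustration under assumptions, not the body of alloc_contig_range(): struct
fake_range, fake_migrate() and fake_test_isolated() are hypothetical
stand-ins for the real migration and test_pages_isolated() steps, and only
the `retry` counter mirrors the variable added by the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the state of one allocation attempt. */
struct fake_range {
	int failures_left;	/* isolation checks that will still fail */
	int migrations;		/* migration passes run so far */
};

/* Stand-in for the migration step; always succeeds in this model. */
static int fake_migrate(struct fake_range *r)
{
	r->migrations++;
	return 0;
}

/* Stand-in for the isolation check; fails while failures_left > 0. */
static int fake_test_isolated(struct fake_range *r)
{
	if (r->failures_left > 0) {
		r->failures_left--;
		return -1;
	}
	return 0;
}

/* Run migration and the isolation check, retrying both at most once. */
int alloc_range_retry_once(struct fake_range *r)
{
	int ret, retry = 0;

	assert(r != NULL);
again:
	ret = fake_migrate(r);
	if (ret)
		return ret;

	ret = fake_test_isolated(r);
	if (ret && !retry++)
		goto again;	/* first failure: migrate and check again */

	return ret;
}
```

A transient failure (one failed check) is absorbed by the second pass, while
a persistent one is still reported after two migration passes.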

Cc: Michal Nazarewicz <mina86 <at> mina86.com>
Cc: Marek Szyprowski <m.szyprowski <at> samsung.com>
Cc: Mel Gorman <mgorman <at> suse.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie <at> samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park <at> samsung.com>
---
 mm/page_alloc.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2012-05-15 12:40:54.199127705 +0200
+++ b/mm/page_alloc.c	2012-05-15 12:41:25.335127686 +0200
@@ -5796,7 +5796,7 @@
 {
 	struct zone *zone = page_zone(pfn_to_page(start));
 	unsigned long outer_start, outer_end;
-	int ret = 0, order;
+	int ret = 0, order, retry = 0;

 	/*
 	 * What we do here is we mark all pageblocks in range as
@@ -5826,7 +5826,7 @@
 				       pfn_max_align_up(end), migratetype);
 	if (ret)

