Mel Gorman | 23 May 2013 11:26

[PATCH 0/2] Reduce system disruption due to kswapd followup

Further testing of the "Reduce system disruption due to kswapd" series
uncovered a few problems.  First, as pages were not being swapped, the file
LRU was being scanned faster and clean file pages were being reclaimed,
resulting in some cases in larger amounts of read IO to re-read data from
disk.  Second, more pages were being written from kswapd context, which can
adversely affect IO performance.  Lastly, it was observed that PageDirty
pages are not necessarily dirty on all filesystems (buffers can be clean
while PageDirty is set and ->writepage generates no IO) and not all
filesystems set PageWriteback when the page is being written (e.g. ext3).
This disconnect confuses the reclaim stalling logic.  This follow-up series
is aimed at these problems.

The tests were based on three kernels:

vanilla:	kernel 3.9 as that is what the current mmotm uses as a baseline
mmotm-20130522	is mmotm as of 22nd May with "Reduce system disruption due to
		kswapd" applied on top as per what should be in Andrew's tree
		right now
lessdisrupt-v5r4 is this follow-up series on top of the mmotm kernel

The first test used memcached+memcachetest while some background IO
was in progress, as implemented by the parallel IO tests in
MM Tests. memcachetest benchmarks how many operations/second memcached
can service. It starts with no background IO on a freshly created ext4
filesystem and then re-runs the test with larger amounts of IO in the
background to roughly simulate a large copy in progress. The expectation
is that the IO should have little or no impact on memcachetest which is
running entirely in memory.

                                             3.9.0                       3.9.0                       3.9.0

Wanpeng Li | 23 May 2013 10:42

[PATCH v2 1/4] mm/memory-hotplug: fix lowmem count overflow when offline pages

Changelog:
 v1 -> v2:
	* show number of HighTotal before hotremove 
	* remove CONFIG_HIGHMEM
	* cc stable kernels
	* add Michal reviewed-by

The logical memory-remove code fails to correctly account for Total High
Memory when a memory block which contains High Memory is offlined, as shown
in the example below. The following patch fixes it.

Stable for 2.6.24+.

Before logical memory remove:

MemTotal:        7603740 kB
MemFree:         6329612 kB
Buffers:           94352 kB
Cached:           872008 kB
SwapCached:            0 kB
Active:           626932 kB
Inactive:         519216 kB
Active(anon):     180776 kB
Inactive(anon):   222944 kB
Active(file):     446156 kB
Inactive(file):   296272 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       7294672 kB
HighFree:        5704696 kB

HATAYAMA Daisuke | 23 May 2013 07:24

[PATCH v8 0/9] kdump, vmcore: support mmap() on /proc/vmcore

Currently, reads of /proc/vmcore are done by read_oldmem(), which calls
ioremap/iounmap once per page. For example, if memory is 1GB,
ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes a large performance degradation due to repeated page
table changes, TLB flushes and build-up of VM-related objects.
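
The call count quoted above can be reproduced with a few lines of C. This
is only an arithmetic sketch, assuming binary units (1GB = 2^30 bytes) and
a 4KB page; the function name per_page_map_calls() is hypothetical and does
not appear in the patch series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of single-page mappings needed to read `bytes` of old memory
 * when every ioremap/iounmap pair covers exactly one page.
 * Hypothetical helper, for illustration only.
 */
uint64_t per_page_map_calls(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    /* Round up so that a trailing partial page still counts as one call. */
    return (bytes + page_size - 1) / page_size;
}
```

With these assumptions, per_page_map_calls(1ULL << 30, 4096) gives 262144,
matching the figure in the text, and a 2TB system (2ULL << 40 bytes) would
need 536870912 such calls.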

To address the issue, this patch implements mmap() on /proc/vmcore to
improve read performance when the mapping size is sufficiently large.

In particular, the current main user of this mmap() is makedumpfile,
which not only reads memory from /proc/vmcore but also does other
processing like filtering, compression and I/O work.

Benchmark
=========

Two benchmarks on terabyte-scale memory systems are available. Both show
about 40 seconds on a 2TB system, which is almost equal to the performance
of experimental kernel-side memory filtering.

- makedumpfile mmap() benchmark, by Jingbai Ma
  https://lkml.org/lkml/2013/3/27/19

- makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
  https://lkml.org/lkml/2013/3/26/914

ChangeLog
=========

v7 => v8)

akpm | 23 May 2013 01:42

mmotm 2013-05-22-16-40 uploaded

The mm-of-the-moment snapshot 2013-05-22-16-40 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (3.x
or 3.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the

Ralf Baechle | 22 May 2013 12:18

[PATCH] mm: Fix warning

virt_to_page() is typically implemented as a macro containing a cast so
it'll accept both pointers and unsigned long without causing a warning.
MIPS virt_to_page() uses virt_to_phys which is a function so passing an
unsigned long will cause a warning:

  CC      mm/page_alloc.o
/home/ralf/src/linux/linux-mips/mm/page_alloc.c: In function ‘free_reserved_area’:
/home/ralf/src/linux/linux-mips/mm/page_alloc.c:5161:3: warning: passing argument 1 of
‘virt_to_phys’ makes pointer from integer without a cast [enabled by default]
In file included from /home/ralf/src/linux/linux-mips/arch/mips/include/asm/page.h:153:0,
                 from /home/ralf/src/linux/linux-mips/include/linux/mmzone.h:20,
                 from /home/ralf/src/linux/linux-mips/include/linux/gfp.h:4,
                 from /home/ralf/src/linux/linux-mips/include/linux/mm.h:8,
                 from /home/ralf/src/linux/linux-mips/mm/page_alloc.c:18:
/home/ralf/src/linux/linux-mips/arch/mips/include/asm/io.h:119:100: note: expected ‘const
volatile void *’ but argument is of type ‘long unsigned int’

All other users of virt_to_page() in mm/ are passing a void *.
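
The difference between the two styles can be seen in a small userspace
sketch. The names addr_to_pfn_macro() and addr_to_pfn_func() and the
PAGE_SHIFT value of 12 are assumptions made for illustration; they are not
the kernel's virt_to_page() or virt_to_phys().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4KB pages */

/* Macro style: the cast lets it take a pointer or an integer silently. */
#define addr_to_pfn_macro(addr) ((uintptr_t)(addr) >> PAGE_SHIFT)

/*
 * Function style: the parameter is a pointer type, so passing a plain
 * unsigned long draws "makes pointer from integer without a cast".
 */
static inline uintptr_t addr_to_pfn_func(const volatile void *addr)
{
    return (uintptr_t)addr >> PAGE_SHIFT;
}

/* Both forms agree once the caller supplies the cast explicitly. */
void check_both_forms(unsigned long addr)
{
    void *p = (void *)(uintptr_t)addr;

    assert(addr_to_pfn_macro(addr) == addr_to_pfn_macro(p));
    assert(addr_to_pfn_func(p) == addr_to_pfn_macro(addr));
}
```

Calling addr_to_pfn_func(addr) with an unsigned long, without the explicit
cast, is the pattern that triggers the warning quoted above.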

Signed-off-by: Ralf Baechle <ralf <at> linux-mips.org>
Reported-by: Eunbong Song <eunb.song <at> samsung.com>
Cc: Linus Torvalds <torvalds <at> linux-foundation.org>
Cc: linux-kernel <at> vger.kernel.org
Cc: linux-mm <at> kvack.org
Cc: linux-mips <at> linux-mips.org
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c

Wanpeng Li | 22 May 2013 11:29

[PATCH 1/4] mm/memory-hotplug: fix lowmem count overflow when offline pages

The logical memory-remove code fails to correctly account for Total High
Memory when a memory block which contains High Memory is offlined, as shown
in the example below. The following patch fixes it.

cat /proc/meminfo 
MemTotal:        7079452 kB
MemFree:         5805976 kB
Buffers:           94372 kB
Cached:           872000 kB
SwapCached:            0 kB
Active:           626936 kB
Inactive:         519236 kB
Active(anon):     180780 kB
Inactive(anon):   222944 kB
Active(file):     446156 kB
Inactive(file):   296292 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       7294672 kB
HighFree:        5181024 kB
LowTotal:       4294752076 kB
LowFree:          624952 kB
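
The LowTotal value in the dump above is what an unsigned 32-bit
subtraction produces when HighTotal is larger than MemTotal. The sketch
below models that on the kB figures directly; this is a simplification,
since the kernel keeps these counters in pages, and the function name
low_total_kb_32bit() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * LowTotal modelled as MemTotal - HighTotal in a 32-bit unsigned type.
 * When HighTotal is stale and exceeds MemTotal, the result wraps
 * modulo 2^32 instead of going negative.
 */
uint32_t low_total_kb_32bit(uint32_t mem_total_kb, uint32_t high_total_kb)
{
    return mem_total_kb - high_total_kb;
}
```

Feeding in the values from the dump, low_total_kb_32bit(7079452, 7294672)
yields 4294752076, which is 2^32 - 215220 and matches the reported
LowTotal.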

Signed-off-by: Wanpeng Li <liwanp <at> linux.vnet.ibm.com>
---
 mm/page_alloc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98cbdf6..80474b2 100644

HATAYAMA Daisuke | 22 May 2013 04:55

[PATCH v7 0/8] kdump, vmcore: support mmap() on /proc/vmcore

Currently, reads of /proc/vmcore are done by read_oldmem(), which calls
ioremap/iounmap once per page. For example, if memory is 1GB,
ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes a large performance degradation due to repeated page
table changes, TLB flushes and build-up of VM-related objects.

In particular, the current main user of this mmap() is makedumpfile,
which not only reads memory from /proc/vmcore but also does other
processing like filtering, compression and IO work.

To address the issue, this patch implements mmap() on /proc/vmcore to
improve read performance.
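
From the reader's side, the access pattern this enables looks roughly like
the sketch below: one mapping, then plain memory reads, instead of one
system call per chunk. It is written against an ordinary file and only
uses standard POSIX calls; the function name sum_bytes_via_mmap() is
hypothetical, and whether a given /proc/vmcore accepts such a mapping
depends on the kernel carrying this series.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sum every byte of a file through a single read-only mapping.
 * Returns the sum, or -1 on error.
 */
int64_t sum_bytes_via_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    int64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap((void *)p, len);
    return sum;
}
```

A real dump tool would map the file in windows rather than all at once,
since a multi-terabyte vmcore may not fit in the address space available
to the capture kernel's userspace.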

Benchmark
=========

Two benchmarks on terabyte-scale memory systems are available. Both show
about 40 seconds on a 2TB system, which is almost equal to the performance
of experimental kernel-side memory filtering.

- makedumpfile mmap() benchmark, by Jingbai Ma
  https://lkml.org/lkml/2013/3/27/19

- makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
  https://lkml.org/lkml/2013/3/26/914

ChangeLog
=========

v6 => v7)

Toshi Kani | 21 May 2013 23:33

[PATCH] mm: Change normal message to use pr_debug

During early boot-up, iomem_resource is set up from the boot
descriptor table, such as EFI Memory Table and e820.  Later,
acpi_memory_device_add() calls add_memory() for each ACPI
memory device object as it enumerates ACPI namespace.  This
add_memory() call is expected to fail in register_memory_resource()
at boot since iomem_resource has been set up from EFI/e820.
As a result, add_memory() returns -EEXIST, which
acpi_memory_device_add() handles as the normal case.

This scheme works fine, but the following error message is
logged for every ACPI memory device object during boot-up.

  "System RAM resource %pR cannot be added\n"

This patch changes register_memory_resource() to use pr_debug()
for the message as it shows up under the normal case.

Signed-off-by: Toshi Kani <toshi.kani <at> hp.com>
---
 mm/memory_hotplug.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ea1287..90ebc91 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -75,7 +75,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
 	res->end = start + size - 1;
 	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 	if (request_resource(&iomem_resource, res) < 0) {

Oskar Andero | 21 May 2013 09:13

[PATCH v2] mm: vmscan: add VM_BUG_ON on illegal return values from scan_objects

Add a VM_BUG_ON to catch any illegal value from the shrinkers. It's a
potential bug if scan_objects returns a negative value other than -1, and
it would lead to undefined behaviour.

Cc: Glauber Costa <glommer <at> openvz.org>
Cc: Dave Chinner <dchinner <at> redhat.com>
Cc: Andrew Morton <akpm <at> linux-foundation.org>
Cc: Hugh Dickins <hughd <at> google.com>
Cc: Greg Kroah-Hartman <gregkh <at> linuxfoundation.org>
Signed-off-by: Oskar Andero <oskar.andero <at> sonymobile.com>
---
 mm/vmscan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6bac41e..63fec86 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -293,6 +293,7 @@ shrink_slab_one(struct shrinker *shrinker, struct shrink_control *shrinkctl,
 		ret = shrinker->scan_objects(shrinker, shrinkctl);
 		if (ret == -1)
 			break;
+		VM_BUG_ON(ret < -1);
 		freed += ret;

 		count_vm_events(SLABS_SCANNED, nr_to_scan);
--

-- 
1.8.1.5
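
The effect of the added check can be tried out in userspace with a loop of
the same shape. This is a sketch only: assert() stands in for VM_BUG_ON(),
and shrink_in_batches() and the two callbacks are hypothetical names, not
the kernel's shrink_slab_one() or any real shrinker.

```c
#include <assert.h>

/* Stand-in for a shrinker's scan_objects callback. */
typedef long (*scan_objects_fn)(unsigned long nr_to_scan);

/*
 * Call `scan` in batches and accumulate the number of freed objects.
 * A return of -1 means "cannot make progress, stop"; any other
 * negative value is treated as a bug in the callback.
 */
long shrink_in_batches(scan_objects_fn scan, unsigned long total_scan,
                       unsigned long batch_size)
{
    long freed = 0;

    assert(batch_size > 0);     /* otherwise the loop would never end */
    while (total_scan >= batch_size) {
        long ret = scan(batch_size);

        if (ret == -1)
            break;
        assert(ret >= 0);       /* same condition as VM_BUG_ON(ret < -1) */
        freed += ret;
        total_scan -= batch_size;
    }
    return freed;
}

/* Hypothetical callback that frees half of what it is asked to scan. */
long scan_frees_half(unsigned long nr_to_scan)
{
    return (long)(nr_to_scan / 2);
}

/* Hypothetical callback that always reports it cannot make progress. */
long scan_cannot_progress(unsigned long nr_to_scan)
{
    (void)nr_to_scan;
    return -1;
}
```

Without the check, a callback returning, say, -5 would silently reduce
`freed`; with it, the bad return value is caught at the point where it
occurs.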


Bob Liu | 21 May 2013 08:26

[RFC PATCH] zswap: add zswap shrinker

As I understand it, zswap currently has a few problems.
1. The zswap pool size is fixed at 20% of total memory, which is arbitrary,
and once the pool is full performance may get even worse, because every
pageout() of an anon page may trigger two disk write operations instead of one.

2. The reclaim hook is only triggered in frontswap_store().
As a result the zswap pool size may not be adjusted in time, which may
cost other users 20% of memory.
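
The fixed cap described in point 1 amounts to a simple percentage check.
The sketch below is an assumption about the general shape of such a check,
not code from zswap; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when the compressed pool has grown past a fixed
 * percentage of total RAM. Dividing before multiplying avoids
 * overflow at the cost of a little rounding.
 */
bool pool_over_limit(unsigned long pool_pages,
                     unsigned long total_ram_pages,
                     unsigned int max_pool_percent)
{
    assert(max_pool_percent <= 100);
    return pool_pages > total_ram_pages / 100 * max_pool_percent;
}
```

With a cap of 20 and 1000000 pages of RAM, the limit sits at 200000 pages
no matter how much anonymous memory is actually in use, which is the
rigidity the message objects to.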

This patch introduces a zswap shrinker. It keeps the zswap pool balanced so
that its size matches the number of anon pages in use.
This is more flexible, and the size of the zswap pool can change dynamically
under different memory conditions.

This patch is based on Seth's zswap v12. It is a very early draft and has
only been compile tested so far.

Signed-off-by: Bob Liu <bob.liu <at> oracle.com>
---
 include/linux/zbud.h |    2 +-
 mm/zbud.c            |   17 ++++++++--
 mm/zswap.c           |   84 +++++++++++++++++++++++++++++++++++---------------
 3 files changed, 74 insertions(+), 29 deletions(-)

diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 2571a5c..afd2eb2 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -14,7 +14,7 @@ void zbud_destroy_pool(struct zbud_pool *pool);
 int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,

Rafael Aquini | 21 May 2013 02:04

[RFC PATCH 00/02] swap: allowing a more flexible DISCARD policy

Howdy folks,

While working on a backport of the following changes:

3399446 swap: discard while swapping only if SWAP_FLAG_DISCARD
052b198 swap: don't do discard if no discard option added

we found ourselves in an interesting discussion about how limiting the
user-visible swap area configuration has become after applying the
aforementioned changesets.

Before commit 3399446, if the swap backing device supported DISCARD,
then a batched discard was issued at swapon(8) time, and fine-grained DISCARDs
were issued in between freeing swap page-clusters and re-writing to them.
As noted in 3399446's commit message, the fine-grained discards often
did not improve performance as expected, and were potentially causing
more trouble than desired. So commit 3399446 introduced a new swapon flag
to make the fine-grained discards while swapping conditional.
However, a batched discard was still issued every time swapon(8) made
a new swap area available.

The batched operation that remained in sys_swapon was considered troublesome
for some setups, especially because the sysadmin had not flagged swapon(8)
to do discards -- http://www.spinics.net/lists/linux-mm/msg31741.html
Commit 052b198 was then merged to address the scenario described above.
After this last commit, we can either do both batched and fine-grained
discards for swap, or neither of them.
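
The three behaviours described so far can be summarised as a small decision
function. This is a sketch of the policy as narrated in this message only;
the enum and function names are hypothetical and do not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Which of the two commits named in the text have been applied. */
enum kernel_state { BEFORE_3399446, AFTER_3399446, AFTER_052B198 };

/* Bitmask of the discard types that end up being issued. */
enum { DISCARD_NONE = 0, DISCARD_BATCHED = 1, DISCARD_FINE = 2 };

unsigned int discards_performed(enum kernel_state state,
                                bool dev_supports_discard,
                                bool discard_flag_set)
{
    if (!dev_supports_discard)
        return DISCARD_NONE;

    switch (state) {
    case BEFORE_3399446:
        /* Both kinds, whatever the caller asked for. */
        return DISCARD_BATCHED | DISCARD_FINE;
    case AFTER_3399446:
        /* Batched always; fine-grained only when flagged. */
        return DISCARD_BATCHED | (discard_flag_set ? DISCARD_FINE : 0);
    case AFTER_052B198:
        /* All or nothing, depending on the flag. */
        return discard_flag_set ? (DISCARD_BATCHED | DISCARD_FINE)
                                : DISCARD_NONE;
    }
    assert(!"unknown kernel state");
    return DISCARD_NONE;
}
```

No combination of inputs after 052b198 returns DISCARD_BATCHED or
DISCARD_FINE on its own.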

As described above, this is not as flexible as it could be,
and the whole discussion we had (internally) left us wondering if does upstream