Jan Kara | 1 Oct 01:27 2010

Re: block cache replacement strategy?

  Hi,

On Tue 07-09-10 15:34:29, Johannes Stezenbach wrote:
> during some simple disk read throughput testing I observed
> caching behaviour that doesn't seem right.  The machine
> has 2G of RAM and AMD Athlon 4850e, x86_64 kernel but 32bit
> userspace, Linux 2.6.35.4.  It seems that contents of the
> block cache are not evicted to make room for other blocks.
> (Or something like that, I have no real clue about this.)
> 
> Since this is a rather artificial test I'm not too worried,
> but it looks strange to me so I thought I better report it.
> 
> 
> zzz:~# echo 3 >/proc/sys/vm/drop_caches 
> zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 13.9454 s, 75.2 MB/s
> zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 0.92799 s, 1.1 GB/s
> 
> OK, seems like the blocks are cached. But:
> 
> zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 13.8375 s, 75.8 MB/s

Daisuke Nishimura | 1 Oct 03:07 2010

Re: [BUGFIX][PATCH v2] memcg: fix thresholds with use_hierarchy == 1

On Thu, 30 Sep 2010 13:16:32 +0300
"Kirill A. Shutemov" <kirill <at> shutemov.name> wrote:

> From: Kirill A. Shutemov <kirill <at> shutemov.name>
> 
> We need to check parent's thresholds if parent has use_hierarchy == 1 to
> be sure that parent's threshold events will be triggered even if parent
> itself is not active (no MEM_CGROUP_EVENTS).
> 
> Signed-off-by: Kirill A. Shutemov <kirill <at> shutemov.name>

	Reviewed-by: Daisuke Nishimura <nishimura <at> mxp.nes.nec.co.jp>

Thanks,
Daisuke Nishimura.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email <at> kvack.org

KAMEZAWA Hiroyuki | 1 Oct 05:38 2010

Re: [BUGFIX][PATCH v2] memcg: fix thresholds with use_hierarchy == 1

On Thu, 30 Sep 2010 13:16:32 +0300
"Kirill A. Shutemov" <kirill <at> shutemov.name> wrote:

> From: Kirill A. Shutemov <kirill <at> shutemov.name>
> 
> We need to check parent's thresholds if parent has use_hierarchy == 1 to
> be sure that parent's threshold events will be triggered even if parent
> itself is not active (no MEM_CGROUP_EVENTS).
> 
> Signed-off-by: Kirill A. Shutemov <kirill <at> shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com>


Michel Lespinasse | 1 Oct 07:04 2010

[PATCH 0/2] Reduce mmap_sem hold times during file backed page faults

Linus, I would appreciate your comments on this since you shot down the
previous proposal. I hope you'll find this approach is sane, but I would
be interested to hear if you have specific objections.

mmap_sem is very coarse grained (per process) and has long read-hold times
(disk latencies); this breaks down rapidly for workloads that use both
read and write mmap_sem acquires. This short patch series tries to reduce
mmap_sem hold times when faulting in file backed VMAs.

First patch creates a single place to lock the page in filemap_fault().
There should be no behavior differences.

Second patch modifies that lock_page() so that, if trylock_page() fails,
we consider releasing the mmap_sem while waiting for the page to be unlocked.
This is controlled by a new FAULT_FLAG_RELEASE flag. If the mmap_sem gets
released, we return the VM_FAULT_RELEASED status; the caller is then expected
to re-acquire mmap_sem and retry the page fault. Chances are that the same
page will be accessed and will now be unlocked, so the mmap_sem hold time
will be short.

Michel Lespinasse (2):
  Unique path for locking page in filemap_fault()
  Release mmap_sem when page fault blocks on disk transfer.

 arch/x86/mm/fault.c |   35 ++++++++++++++++++++++++++---------
 include/linux/mm.h  |    2 ++
 mm/filemap.c        |   38 +++++++++++++++++++++++++++++---------
 mm/memory.c         |    3 ++-
 4 files changed, 59 insertions(+), 19 deletions(-)


Michel Lespinasse | 1 Oct 07:04 2010

[PATCH 2/2] Release mmap_sem when page fault blocks on disk transfer.

This change reduces mmap_sem hold times that are caused by waiting for
disk transfers when accessing file mapped VMAs. It introduces the
FAULT_FLAG_RELEASE flag, which indicates that the call site holds mmap_sem
and wishes for it to be released if blocking on a pending disk transfer.
In that case, filemap_fault() returns the VM_FAULT_RELEASED status bit
and do_page_fault() will then re-acquire mmap_sem and retry the page fault.
It is expected that the retry will hit the same page, which will now be
cached, and thus complete with a short mmap_sem hold time.

Signed-off-by: Michel Lespinasse <walken <at> google.com>
---
 arch/x86/mm/fault.c |   35 ++++++++++++++++++++++++++---------
 include/linux/mm.h  |    2 ++
 mm/filemap.c        |   20 +++++++++++++++++++-
 mm/memory.c         |    3 ++-
 4 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4c4508e..58109ba 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -954,6 +954,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	struct mm_struct *mm;
 	int write;
 	int fault;
+	unsigned int release_flag = FAULT_FLAG_RELEASE;

 	tsk = current;
 	mm = tsk->mm;
@@ -1064,6 +1065,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)

Michel Lespinasse | 1 Oct 07:04 2010

[PATCH 1/2] Unique path for locking page in filemap_fault()

Note that if the file got truncated before we locked the page, we retry
at find_get_page(). This is similar to what find_lock_page() would do.
Linus's commit ef00e08e introduced a goto no_cached_page in that situation,
which seems correct as well but possibly less efficient?

----------------------- >8 ------------------------

This change introduces a single location where filemap_fault() locks
the desired page. There used to be two such places, depending if the
initial find_get_page() was successful or not.

Signed-off-by: Michel Lespinasse <walken <at> google.com>
---
 mm/filemap.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 3d4df44..8ed709a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1539,25 +1539,27 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		 * waiting for the lock.
 		 */
 		do_async_mmap_readahead(vma, ra, file, page, offset);
-		lock_page(page);
-
-		/* Did it get truncated? */
-		if (unlikely(page->mapping != mapping)) {
-			unlock_page(page);
-			put_page(page);

Balbir Singh | 1 Oct 11:22 2010

Re: [BUGFIX][PATCH v2] memcg: fix thresholds with use_hierarchy == 1

* Kirill A. Shutemov <kirill <at> shutemov.name> [2010-09-30 13:16:32]:

> From: Kirill A. Shutemov <kirill <at> shutemov.name>
> 
> We need to check parent's thresholds if parent has use_hierarchy == 1 to
> be sure that parent's threshold events will be triggered even if parent
> itself is not active (no MEM_CGROUP_EVENTS).
> 
> Signed-off-by: Kirill A. Shutemov <kirill <at> shutemov.name>
> ---
>  mm/memcontrol.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3eed583..df40eaf 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3587,9 +3587,13 @@ unlock:
> 
>  static void mem_cgroup_threshold(struct mem_cgroup *memcg)
>  {
> -	__mem_cgroup_threshold(memcg, false);
> -	if (do_swap_account)
> -		__mem_cgroup_threshold(memcg, true);
> +	while (memcg) {
> +		__mem_cgroup_threshold(memcg, false);
> +		if (do_swap_account)
> +			__mem_cgroup_threshold(memcg, true);
> +
> +		memcg = parent_mem_cgroup(memcg);

Steven J. Magnani | 1 Oct 12:35 2010

[PATCH][RESEND] nommu: add anonymous page memcg accounting

Add the necessary calls to track VM anonymous page usage.

Signed-off-by: Steven J. Magnani <steve <at> digidescorp.com>
---
diff -uprN a/mm/nommu.c b/mm/nommu.c
--- a/mm/nommu.c	2010-09-02 19:47:43.000000000 -0500
+++ b/mm/nommu.c	2010-09-02 20:07:02.000000000 -0500
@@ -524,8 +524,10 @@ static void delete_nommu_region(struct v
 /*
  * free a contiguous series of pages
  */
-static void free_page_series(unsigned long from, unsigned long to)
+static void free_page_series(unsigned long from, unsigned long to,
+			     const struct file *file)
 {
+	mem_cgroup_uncharge_start();
 	for (; from < to; from += PAGE_SIZE) {
 		struct page *page = virt_to_page(from);

@@ -534,8 +536,12 @@ static void free_page_series(unsigned lo
 		if (page_count(page) != 1)
 			kdebug("free page %p: refcount not one: %d",
 			       page, page_count(page));
+		if (!file)
+			mem_cgroup_uncharge_page(page);
+
 		put_page(page);
 	}
+	mem_cgroup_uncharge_end();
 }

Rik van Riel | 1 Oct 14:07 2010

Re: [PATCH 0/2] Reduce mmap_sem hold times during file backed page faults

On 10/01/2010 01:04 AM, Michel Lespinasse wrote:
> Linus, I would appreciate your comments on this since you shot down the
> previous proposal. I hope you'll find this approach is sane, but I would
> be interested to hear if you have specific objections.
>
> mmap_sem is very coarse grained (per process) and has long read-hold times
> (disk latencies); this breaks down rapidly for workloads that use both
> read and write mmap_sem acquires. This short patch series tries to reduce
> mmap_sem hold times when faulting in file backed VMAs.

The changes make sense to me, but it would be good to know
what kind of benefits you have seen with these patches.

Especially performance numbers :)

-- 
All rights reversed


Johannes Stezenbach | 1 Oct 15:05 2010

Re: block cache replacement strategy?

Hi,

On Fri, Oct 01, 2010 at 01:27:59AM +0200, Jan Kara wrote:
> On Tue 07-09-10 15:34:29, Johannes Stezenbach wrote:
> > 
> > zzz:~# echo 3 >/proc/sys/vm/drop_caches 
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.9454 s, 75.2 MB/s
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 0.92799 s, 1.1 GB/s
> > 
> > OK, seems like the blocks are cached. But:
> > 
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.8375 s, 75.8 MB/s
> > zzz:~# dd if=/dev/sda2 of=/dev/null bs=1M count=1000 skip=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 13.8429 s, 75.7 MB/s
>   I took a look at this because it looked strange at the first sight to me.
> After some code reading the result is that everything is working as
> designed.
>   The first dd fills up memory with 1GB of data. Pages with data just freshly
> read from disk are in "Inactive" state. When these pages are read again by

