Paul Menage | 1 Dec 08:39 2007

Re: What can we do to get ready for memory controller merge in 2.6.25

On Nov 29, 2007 6:11 PM, Nick Piggin <nickpiggin <at> yahoo.com.au> wrote:
> And also some
> results or even anecdotes of where this is going to be used would be
> interesting...

We want to be able to run multiple isolated jobs on the same machine.
So being able to limit how much memory each job can consume, in terms
of anonymous memory and page cache, is useful. I've not had much time
to look at the patches in great detail, but they seem to provide a
sensible way to assign and enforce static limits on a bunch of jobs.

Some of our requirements are a bit beyond this, though:

In our experience, users are not good at figuring out how much memory
they really need. In general they tend to massively over-estimate
their requirements. So we want some way to determine how much of its
allocated memory a job is actively using, and how much could be thrown
away or swapped out without bothering the job too much.

Of course, the definition of "active use" is tricky - one possibility
that we're looking at is "has been accessed within the last N
seconds", where N can be configured appropriately for different jobs
depending on the job's latency requirements. Active use should also be
reported for pages that can't be easily freed quickly, e.g. mlocked or
dirty pages, or anon pages on a swapless system. Inactive pages should
be easily freeable, and be the first ones to go in the event of memory
pressure. (From a scheduling point of view we can treat them as free
memory, and schedule more jobs on the machine.)

The existing active/inactive distinction doesn't really capture this,

Balbir Singh | 1 Dec 10:50 2007

Re: What can we do to get ready for memory controller merge in 2.6.25

Paul Menage wrote:
> On Nov 29, 2007 6:11 PM, Nick Piggin <nickpiggin <at> yahoo.com.au> wrote:
>> And also some
>> results or even anecdotes of where this is going to be used would be
>> interesting...
> 
> We want to be able to run multiple isolated jobs on the same machine.
> So being able to limit how much memory each job can consume, in terms
> of anonymous memory and page cache, is useful. I've not had much time
> to look at the patches in great detail, but they seem to provide a
> sensible way to assign and enforce static limits on a bunch of jobs.
> 
> Some of our requirements are a bit beyond this, though:
> 
> In our experience, users are not good at figuring out how much memory
> they really need. In general they tend to massively over-estimate
> their requirements. So we want some way to determine how much of its
> allocated memory a job is actively using, and how much could be thrown
> away or swapped out without bothering the job too much.
> 

One would prefer that the kernel provide the mechanism and user space
provide the policy. The algorithms to assign limits can exist in user
space, supported by a good set of statistics.

> Of course, the definition of "active use" is tricky - one possibility
> that we're looking at is "has been accessed within the last N
> seconds", where N can be configured appropriately for different jobs
> depending on the job's latency requirements. Active use should also be
> reported for pages that can't be easily freed quickly, e.g. mlocked or

Rik van Riel | 1 Dec 19:36 2007

Re: What can we do to get ready for memory controller merge in 2.6.25

On Sat, 01 Dec 2007 15:20:29 +0530
Balbir Singh <balbir <at> linux.vnet.ibm.com> wrote:

> > In our experience, users are not good at figuring out how much memory
> > they really need. In general they tend to massively over-estimate
> > their requirements. So we want some way to determine how much of its
> > allocated memory a job is actively using, and how much could be thrown
> > away or swapped out without bothering the job too much.
> 
> One would prefer that the kernel provide the mechanism and user space
> provide the policy. The algorithms to assign limits can exist in user
> space, supported by a good set of statistics.

With the /proc/refaults info, we can measure how much extra
memory each process group needs, if any.

As for how much memory a process group needs, at pageout time
we can check the fraction of pages that are accessed.  If 60%
of the pages were recently accessed at pageout time and this
process group is spending little or no time waiting for refaults,
40% of the pages are *not* recently accessed and we can probably
reduce the amount of memory assigned to this group.

Page cache that has only been accessed once can also be
counted as "not recently accessed", since streaming file
IO should not increase the working set of the process group.

--

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan

Paul Menage | 1 Dec 20:02 2007

Re: What can we do to get ready for memory controller merge in 2.6.25

On Dec 1, 2007 10:36 AM, Rik van Riel <riel <at> redhat.com> wrote:
>
> With the /proc/refaults info, we can measure how much extra
> memory each process group needs, if any.

What's the status of that? It looks as though it would be better than
the "accessed in the last N seconds" metric that we've been playing
with, although it's possibly more intrusive?

Would it be practical to keep a non-resident set for each cgroup?

>
> As for how much memory a process group needs, at pageout time
> we can check the fraction of pages that are accessed.  If 60%
> of the pages were recently accessed at pageout time and this
> process group is spending little or no time waiting for refaults,
> 40% of the pages are *not* recently accessed and we can probably
> reduce the amount of memory assigned to this group.

It would probably be better to reduce its background-reclaim high
watermark than to reduce its limit. If you do the latter, you risk
triggering an OOM in the cgroup if it turns out that it did need all
that memory after all.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Rik van Riel | 1 Dec 20:26 2007

Re: What can we do to get ready for memory controller merge in 2.6.25

On Sat, 1 Dec 2007 11:02:32 -0800
"Paul Menage" <menage <at> google.com> wrote:

> On Dec 1, 2007 10:36 AM, Rik van Riel <riel <at> redhat.com> wrote:
> >
> > With the /proc/refaults info, we can measure how much extra
> > memory each process group needs, if any.
> 
> What's the status of that? It looks as though it would be better than
> the "accessed in the last N seconds" metric that we've been playing
> with, although it's possibly more intrusive?
> 
> Would it be practical to keep a non-resident set for each cgroup?

I have an implementation with a global array, but will have to
change it over to a per-radix tree implementation (not that
hard, with the slab reclaiming code) and per-cgroup reclaiming
information.

That way we can figure out per mapping, per cgroup or system
wide reclaim info (though not all at the same time).

> > As for how much memory a process group needs, at pageout time
> > we can check the fraction of pages that are accessed.  If 60%
> > of the pages were recently accessed at pageout time and this
> > process group is spending little or no time waiting for refaults,
> > 40% of the pages are *not* recently accessed and we can probably
> > reduce the amount of memory assigned to this group.
> 
> It would probably be better to reduce its background-reclaim high

Daniel Phillips | 1 Dec 23:39 2007

Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6

Hmm, I wonder if this had something to do with it:

> [   25.856573] VFS: Disk quotas dquot_6.5.1

Was the system still pingable?

Regards,

Daniel

Tetsuo Handa | 2 Dec 11:39 2007

[BUG 2.6.24-rc3-git6] SLUB's ksize() fails for size > 2048.

Hello.

I can't pass memory allocated by kmalloc() to ksize()
if it is allocated by SLUB allocator and
size is larger than (I guess) PAGE_SIZE / 2.

Regards.

---------- Kernel config (grep CONFIG_SLUB .config) ----------
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
# CONFIG_SLUB_DEBUG_ON is not set

---------- Testing program (ksize_test.c) ----------
#include <linux/module.h>
#include <linux/slab.h>

static int __init init_ksize_test(void)
{
        void *p = kmalloc(2049, GFP_KERNEL);
        printk("ksize(%p) = %d\n", p, ksize(p));
        kfree(p);
        return -ENOMEM;
}

module_init(init_ksize_test)

MODULE_LICENSE("GPL");

------------[ cut here ]------------

Denis Cheng | 2 Dec 12:59 2007

[PATCH] mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init

this call should use the array index j, not i:

	--- mm/backing-dev.c.orig	2007-12-02 19:42:57.000000000 +0800
	+++ mm/backing-dev.c	2007-12-02 19:43:14.000000000 +0800
	@@ -22,7 +22,7 @@ int bdi_init(struct backing_dev_info *bd
		if (err) {
	 err:
			for (j = 0; j < i; j++)
	-			percpu_counter_destroy(&bdi->bdi_stat[i]);
	+			percpu_counter_destroy(&bdi->bdi_stat[j]);
		}

		return err;

But with this approach, just one int i is enough; int j is not needed.

Signed-off-by: Denis Cheng <crquan <at> gmail.com>
---
 mm/backing-dev.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index b0ceb29..e8644b1 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -7,7 +7,7 @@

 int bdi_init(struct backing_dev_info *bdi)
 {
-	int i, j;

Mark Lord | 2 Dec 16:56 2007

Re: [BUG 2.6.24-rc3-git6] SLUB's ksize() fails for size > 2048.

Tetsuo Handa wrote:
> Hello.
> 
> I can't pass memory allocated by kmalloc() to ksize()
> if it is allocated by SLUB allocator and
> size is larger than (I guess) PAGE_SIZE / 2.
> 
> Regards.
> 
> ---------- Kernel config (grep CONFIG_SLUB .config) ----------
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> # CONFIG_SLUB_DEBUG_ON is not set
> 
> ---------- Testing program (ksize_test.c) ----------
> #include <linux/module.h>
> #include <linux/slab.h>
> 
> static int __init init_ksize_test(void)
> {
>         void *p = kmalloc(2049, GFP_KERNEL);
>         printk("ksize(%p) = %d\n", p, ksize(p));
>         kfree(p);
>         return -ENOMEM;
> }
> 
> module_init(init_ksize_test)
> 
> MODULE_LICENSE("GPL");
> 

Mark Lord | 2 Dec 17:03 2007

Re: [BUG 2.6.24-rc3-git6] SLUB's ksize() fails for size > 2048.

Mark Lord wrote:
> Tetsuo Handa wrote:
>> Hello.
>>
>> I can't pass memory allocated by kmalloc() to ksize()
>> if it is allocated by SLUB allocator and
>> size is larger than (I guess) PAGE_SIZE / 2.
>>
>> Regards.
>>
>> ---------- Kernel config (grep CONFIG_SLUB .config) ----------
>> CONFIG_SLUB_DEBUG=y
>> CONFIG_SLUB=y
>> # CONFIG_SLUB_DEBUG_ON is not set
>>
>> ---------- Testing program (ksize_test.c) ----------
>> #include <linux/module.h>
>> #include <linux/slab.h>
>>
>> static int __init init_ksize_test(void)
>> {
>>         void *p = kmalloc(2049, GFP_KERNEL);
>>         printk("ksize(%p) = %d\n", p, ksize(p));
>>         kfree(p);
>>         return -ENOMEM;
>> }
>>
>> module_init(init_ksize_test)
>>
>> MODULE_LICENSE("GPL");

