kamezawa.hiroyu | 1 Jun 2008 02:35

Re: Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy

----- Original Message -----
>kamezawa.hiroyu <at> jp.fujitsu.com wrote:
>> ----- Original Message -----
>> 
>>>> One more problem is that it's hard to implement various kinds of hierarchy
>>>> policy. I believe there are hierarchy policies other than the one OpenVZ
>>>> wants to use. Kicking functions out to middleware AMAP is what I'm thinking
>>>> now.
>>> One way to manage hierarchies other than via limits is to use shares (please see
>>> the shares used by the cpu controller). Basically, what you've done with limits
>>> is done with shares
>>>
>> Yes, I like _share_ rather than limits.
>> 
>>> If a parent has 100 shares, then it can decide how many to pass on to its children
>>> based on the shares of the child and your logic would work well. I propose
>>> assigning top level (high resolution) shares to the root of the cgroup and in a
>>> hierarchy passing them down to children and sharing it with them. Based on the

YAMAMOTO Takashi | 2 Jun 2008 04:15

Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy

> @@ -135,13 +138,118 @@ ssize_t res_counter_write(struct res_cou
>  		if (*end != '\0')
>  			goto out_free;
>  	}
> -	spin_lock_irqsave(&counter->lock, flags);
> -	val = res_counter_member(counter, member);
> -	*val = tmp;
> -	spin_unlock_irqrestore(&counter->lock, flags);
> -	ret = nbytes;
> +	if (member != RES_LIMIT || !callback) {

is there any reason to check member != RES_LIMIT here,
rather than in callers?

> +/*
> + * Move resource to its parent.
> + *   child->limit -= val.
> + *   parent->usage -= val.
> + *   parent->limit -= val.

s/limit/for_children/

> + */
> +
> +int res_counter_repay_resource(struct res_counter *child,
> +				struct res_counter *parent,
> +				unsigned long long val,
> +				res_shrink_callback_t callback, int retry)

can you reduce gratuitous differences between

Andreas Dilger | 2 Jun 2008 05:16

Re: [patch 22/23] fs: check for statfs overflow

On May 30, 2008  03:14 +0200, Nick Piggin wrote:
> On Thu, May 29, 2008 at 05:56:07PM -0600, Andreas Dilger wrote:
> > On May 28, 2008  11:02 +0200, Nick Piggin wrote:
> > > > @@ -197,8 +197,8 @@ static int put_compat_statfs(struct comp
> > >  	if (sizeof ubuf->f_blocks == 4) {
> > > +		if ((kbuf->f_blocks | kbuf->f_bfree | kbuf->f_bavail |
> > > +		     kbuf->f_bsize | kbuf->f_frsize) & 0xffffffff00000000ULL)
> > >  			return -EOVERFLOW;
> > 
> > Hmm, doesn't this check break every filesystem > 16TB on 4kB PAGE_SIZE
> > nodes?  It would be better, IMHO, to scale down f_blocks, f_bfree, and
> > f_bavail and correspondingly scale up f_bsize to fit into the 32-bit
> > statfs structure.
> 
> Oh? Hmm, from my reading, such filesystems will already overflow f_blocks
> check which is already there. Jon's patch only adds checks for f_bsize
> and f_frsize.

Sorry, you are right - I meant that the whole f_blocks check is broken
for filesystems > 16TB.  Scaling f_bsize is easy, and prevents gratuitous
breakage of old applications at the cost of a few kB of accuracy.
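
A minimal sketch of the scaling idea, under assumptions that are not taken from the thread: the struct `statfs64_sketch` and the function `scale_statfs_to_32bit` are hypothetical names, block counts are assumed to be in units of `f_bsize`, and `f_frsize` is left out for brevity. It halves the counts and doubles the block size until every count fits in 32 bits, so only the low-order bits of the counts are lost.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit fields of the kernel's statfs data. */
struct statfs64_sketch {
	uint64_t f_bsize;
	uint64_t f_blocks;
	uint64_t f_bfree;
	uint64_t f_bavail;
};

/*
 * Halve the block counts and double the block size until every count
 * fits in 32 bits. Returns 0 on success, or -1 if f_bsize itself can
 * no longer be represented in 32 bits.
 */
static int scale_statfs_to_32bit(struct statfs64_sketch *s)
{
	while ((s->f_blocks | s->f_bfree | s->f_bavail) > UINT32_MAX) {
		if (s->f_bsize > UINT32_MAX / 2)
			return -1;	/* doubling would overflow 32 bits */
		s->f_bsize <<= 1;
		s->f_blocks >>= 1;
		s->f_bfree >>= 1;
		s->f_bavail >>= 1;
	}
	return s->f_bsize > UINT32_MAX ? -1 : 0;
}
```

For a 32TB filesystem with 4kB blocks (2^33 blocks), this would report 2^31 blocks of 16kB, which describes the same total size.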

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


Balbir Singh | 2 Jun 2008 08:16

Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy

Hi, Kamezawa-san,

kamezawa.hiroyu <at> jp.fujitsu.com wrote:
> 
> It's not a problem. We're not developing a world-wide ecosystem.
> It's good that there are several development groups. That's how things evolve.
> Whatever becomes popular will be the de facto standard.
> What we have to do is provide proper interfaces to allow a fair race.
> 

I did not claim that we were developing an ecosystem either :)
My point is that we should not confuse *Linux* users. Let's do the common/useful
stuff in the kernel and make it easy for users to use the cgroup subsystem.

>>> Here is an example. (just an example...)
>>> Please point out if I'm misunderstanding "share".
>>>
>>> root_level/                   = limit 1G.
>>>           /child_A = share=30
>>>           /child_B = share=15
>>>           /child_C = share=5
>>> (and assume there is no process under root_level, to make the explanation easy.)
>>> 0. At first, before starting to use memory, set all kernel_memory_limit.
>>> root_level.limit = 1G
>>>   child_A.limit=64M,usage=0
>>>   child_B.limit=64M,usage=0
>>>   child_C.limit=64M,usage=0
>>>   free_resource=808M 
>>>
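
The quoted example breaks off before showing how the shares are applied. As one plausible reading only, assuming a plain proportional split like the cpu controller's shares, the parent's limit could be divided as sketched below. The function name `split_by_shares` is hypothetical, and values are in MB with 1G taken as 1000M, which matches the free_resource=808M figure above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split parent_limit among n children in proportion to their shares.
 * Assumption: parent_limit * shares[i] does not overflow 64 bits.
 */
static void split_by_shares(uint64_t parent_limit, const unsigned int *shares,
			    uint64_t *limits, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += shares[i];

	for (i = 0; i < n; i++)
		limits[i] = total ? parent_limit * shares[i] / total : 0;
}
```

With shares of 30, 15 and 5 under a 1000M parent, this split would give child_A 600M, child_B 300M and child_C 100M.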

kamezawa.hiroyu | 2 Jun 2008 11:48

Re: Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy

>Why don't we add soft limits, so that we don't have to go to the kernel and
>change limits frequently. One missing piece in the memory controller is that we
>don't shrink the memory controller when limits change or when tasks move. I
>think soft limits is a better solution.
>
My code adds shrinking_at_limit_change. I'm now trying to write
migrate_resources_at_task_move. (But it doesn't seem easy to implement in a
clean/fast way.)

I have no objection to soft limits if they're easy to implement. (As I wrote,
my explanation was just an example and we could add more knobs.)
_But_ I think that anything that controls multiple cgroups with regard to
hierarchy under some policy will never be simple. Adding some knobs to each
cgroup for soft limits would be simple if there were no hierarchy.

The memory controller differs from the scheduler's hierarchy in that we have to
do multilevel page reclaim with feedback under some policy (and not only one).
Even without hierarchy, we _did_ make the kernel's LRU logic more complicated.
But we can get help from the middleware here, I think.

My goal is never to make cgroup slow or complicated. If it's slow,
I'd like to say "ok, please use VMware. It's simpler and fast enough for you."
"How much faster it works than hardware virtualization" is the most
important thing for me. It should be much faster.

>Thanks for patiently explaining all of this.
>
Thanks, I'm sorry for my poor explanation skill.


kamezawa.hiroyu | 2 Jun 2008 11:52

Re: Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy

----- Original Message -----

>> @@ -135,13 +138,118 @@ ssize_t res_counter_write(struct res_cou
>>  		if (*end != '\0')
>>  			goto out_free;
>>  	}
>> -	spin_lock_irqsave(&counter->lock, flags);
>> -	val = res_counter_member(counter, member);
>> -	*val = tmp;
>> -	spin_unlock_irqrestore(&counter->lock, flags);
>> -	ret = nbytes;
>> +	if (member != RES_LIMIT || !callback) {
>
>is there any reason to check member != RES_LIMIT here,
>rather than in callers?

Hmm...ok. This is messy. I'll rearrange this.

>
>> +/*
>> + * Move resource to its parent.
>> + *   child->limit -= val.
>> + *   parent->usage -= val.
>> + *   parent->limit -= val.
>
>s/limit/for_children/
>
>> + */
>> +
>> +int res_counter_repay_resource(struct res_counter *child,

Nick Piggin | 2 Jun 2008 12:15

Re: [patch 3/5] x86: lockless get_user_pages_fast

BTW. I do plan to ask Linus to merge this as soon as 2.6.27 opens.
Hope nobody objects (or if they do please speak up before then)

On Thu, May 29, 2008 at 12:20:59PM -0500, Dave Kleikamp wrote:
> On Thu, 2008-05-29 at 22:20 +1000, npiggin <at> suse.de wrote:
>  
> > +int get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages)

Ingo Molnar | 2 Jun 2008 12:15

Re: [PATCH] 2.6.26-rc: x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY


* Miquel van Smoorenburg <mikevs <at> xs4all.net> wrote:

> Okay, so how about this then ?
> 
> --- linux-2.6.26-rc4.orig/arch/x86/kernel/pci-dma.c	2008-05-26 20:08:11.000000000 +0200
> +++ linux-2.6.26-rc4/arch/x86/kernel/pci-dma.c	2008-05-28 10:27:41.000000000 +0200
> @@ -397,9 +397,6 @@
>  	if (dev->dma_mask == NULL)
>  		return NULL;
>  
> -	/* Don't invoke OOM killer */
> -	gfp |= __GFP_NORETRY;
> -
>  #ifdef CONFIG_X86_64
>  	/* Why <=? Even when the mask is smaller than 4GB it is often
>  	   larger than 16MB and in this case we have a chance of
> @@ -410,7 +407,9 @@
>  #endif
>  
>   again:
> -	page = dma_alloc_pages(dev, gfp, get_order(size));
> +	/* Don't invoke OOM killer or retry in lower 16MB DMA zone */
> +	page = dma_alloc_pages(dev,
> +		(gfp & GFP_DMA) ? gfp | __GFP_NORETRY : gfp, get_order(size));
>  	if (page == NULL)
>  		return NULL;

applied to tip/pci-for-jesse for more testing. Thanks,
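
The flag selection in the quoted hunk can be sketched in isolation as follows. This is only an illustration: the type `gfp_sketch_t`, the `SKETCH_GFP_*` constants, and the function `dma_alloc_flags` are hypothetical, and the bit values are placeholders rather than the kernel's real definitions.

```c
#include <stdint.h>

typedef uint32_t gfp_sketch_t;

/* Placeholder bit values for illustration only; not the kernel's. */
#define SKETCH_GFP_DMA		0x01u
#define SKETCH_GFP_NORETRY	0x1000u

/*
 * Add the no-retry flag only when the allocation targets the low DMA
 * zone; all other requests keep their flags unchanged, mirroring the
 * conditional expression in the patch.
 */
static gfp_sketch_t dma_alloc_flags(gfp_sketch_t gfp)
{
	if (gfp & SKETCH_GFP_DMA)
		return gfp | SKETCH_GFP_NORETRY;
	return gfp;
}
```

Compared with the removed unconditional `gfp |= __GFP_NORETRY`, only requests that already carry the DMA flag are affected.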


Stephen Rothwell | 2 Jun 2008 13:28

Re: [patch 3/5] x86: lockless get_user_pages_fast

Hi Nick,

On Mon, 2 Jun 2008 12:15:30 +0200 Nick Piggin <npiggin <at> suse.de> wrote:
>
> BTW. I do plan to ask Linus to merge this as soon as 2.6.27 opens.
> Hope nobody objects (or if they do please speak up before then)

Any chance of getting this into linux-next then to see if it
conflicts with/kills anything else?

If this is posted/reviewed/tested enough to be "finished" then put it in
a tree (or quilt series) and submit it.

Thanks.
-- 
Cheers,
Stephen Rothwell                    sfr <at> canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

Yasunori Goto | 2 Jun 2008 14:34

Re: [PATCH -mm 08/14] bootmem: clean up alloc_bootmem_core

Hello.

> +		/*
> +		 * Reserve the area now:
> +		 */
> +		for (i = PFN_DOWN(new_start) + merge; i < PFN_UP(new_end); i++)
> +			if (test_and_set_bit(i, bdata->node_bootmem_map))
> +				BUG();
> +
> +		region = phys_to_virt(bdata->node_boot_start + new_start);
> +		memset(region, 0, size);
> +		return region;

bdata->last_success doesn't seem to be updated in alloc_bootmem_core();
it is updated only in __free().
Is that intended? If not, I suppose it should be updated.

Bye.

-- 
Yasunori Goto 

