Balbir Singh | 1 Jun 2009 01:51

Re: [RFC] Low overhead patches for the memory cgroup controller (v2)

* KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir <at> linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir <at> linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > > 		+patch		mmtom-08-may-2009

Hisashi Hifumi | 1 Jun 2009 03:39

Re: [PATCH] readahead:add blk_run_backing_dev


At 11:23 09/05/28, KOSAKI Motohiro wrote:
>> Hi Andrew.
>> Please merge following patch.
>> Thanks.
>> 
>> ---
>> 
>> I added blk_run_backing_dev on page_cache_async_readahead
>> so readahead I/O is unplugged to improve throughput,
>> especially in RAID environments.
>> 
>> Following is the test result with dd.
>> 
>> #dd if=testdir/testfile of=/dev/null bs=16384
>> 
>> -2.6.30-rc6
>> 1048576+0 records in
>> 1048576+0 records out
>> 17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
>> 
>> -2.6.30-rc6-patched
>> 1048576+0 records in
>> 1048576+0 records out
>> 17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
>> 
>> My testing environment is as follows:
>> Hardware: HP DL580 
>> CPU:Xeon 3.2GHz *4 HT enabled
>> Memory:8GB
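
The dd figures quoted above can be cross-checked with a little arithmetic. The sketch below is not part of the original mail; the function names are made up for illustration, and the only inputs are the byte count and elapsed times printed by dd. It separates the reduction in elapsed time from the gain in throughput, which are two different percentages.

```python
# Cross-check of the dd results quoted in the message above.
# Function names are illustrative; the inputs come from the dd output.

TOTAL_BYTES = 1048576 * 16384  # records * bs, as reported by dd


def mb_per_s(total_bytes, seconds):
    """Throughput in decimal megabytes per second, the unit dd prints."""
    return total_bytes / seconds / 1e6


def time_reduction_pct(before_s, after_s):
    """How much shorter the patched run is, as a percentage of the old time."""
    return (before_s - after_s) / before_s * 100


def throughput_gain_pct(before_s, after_s):
    """How much faster the patched run is, as a percentage of the old rate."""
    return (before_s / after_s - 1) * 100


baseline_s = 224.182  # 2.6.30-rc6
patched_s = 206.465   # 2.6.30-rc6-patched

print(round(mb_per_s(TOTAL_BYTES, baseline_s), 1))
print(round(mb_per_s(TOTAL_BYTES, patched_s), 1))
print(round(time_reduction_pct(baseline_s, patched_s), 1))
print(round(throughput_gain_pct(baseline_s, patched_s), 1))
```

With these inputs the elapsed time drops by a little under 8%, while throughput rises by a little under 9%.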

KOSAKI Motohiro | 1 Jun 2009 04:23

Re: [PATCH] readahead:add blk_run_backing_dev

> 
> At 11:23 09/05/28, KOSAKI Motohiro wrote:
> >> Hi Andrew.
> >> Please merge following patch.
> >> Thanks.
> >> 
> >> ---
> >> 
> >> I added blk_run_backing_dev on page_cache_async_readahead
> >> so readahead I/O is unplugged to improve throughput,
> >> especially in RAID environments.
> >> 
> >> Following is the test result with dd.
> >> 
> >> #dd if=testdir/testfile of=/dev/null bs=16384
> >> 
> >> -2.6.30-rc6
> >> 1048576+0 records in
> >> 1048576+0 records out
> >> 17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
> >> 
> >> -2.6.30-rc6-patched
> >> 1048576+0 records in
> >> 1048576+0 records out
> >> 17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
> >> 
> >> My testing environment is as follows:
> >> Hardware: HP DL580 
> >> CPU:Xeon 3.2GHz *4 HT enabled
> >> Memory:8GB

Wu Fengguang | 1 Jun 2009 04:37

Re: [PATCH] readahead:add blk_run_backing_dev

On Wed, May 27, 2009 at 11:06:37AM +0800, Hisashi Hifumi wrote:
> 
> At 11:57 09/05/27, Wu Fengguang wrote:
> >On Wed, May 27, 2009 at 10:47:47AM +0800, Hisashi Hifumi wrote:
> >> 
> >> At 11:36 09/05/27, Wu Fengguang wrote:
> >> >On Wed, May 27, 2009 at 10:21:53AM +0800, Hisashi Hifumi wrote:
> >> >>
> >> >> At 11:09 09/05/27, Wu Fengguang wrote:
> >> >> >On Wed, May 27, 2009 at 08:25:04AM +0800, Hisashi Hifumi wrote:
> >> >> >>
> >> >> >> At 08:42 09/05/27, Andrew Morton wrote:
> >> >> >> >On Fri, 22 May 2009 10:33:23 +0800
> >> >> >> >Wu Fengguang <fengguang.wu <at> intel.com> wrote:
> >> >> >> >
> >> >> >> >> > I tested above patch, and I got same performance number.
> >> >> >> >> > I wonder why if (PageUptodate(page)) check is there...
> >> >> >> >>
> >> >> >> >> Thanks!  This is an interesting micro timing behavior that
> >> >> >> >> demands some research work.  The above check is to confirm if it's
> >> >> >> >> the PageUptodate() case that makes the difference. So why that case
> >> >> >> >> happens so frequently so as to impact the performance? Will it also
> >> >> >> >> happen in NFS?
> >> >> >> >>
> >> >> >> >> The problem is readahead IO pipeline is not running smoothly, which is
> >> >> >> >> undesirable and not well understood for now.
> >> >> >> >
> >> >> >> >The patch causes a remarkably large performance increase.  A 9%
> >> >> >> >reduction in time for a linear read? I'd be surprised if the workload
> >> >> >>

Hisashi Hifumi | 1 Jun 2009 04:51

Re: [PATCH] readahead:add blk_run_backing_dev


At 11:37 09/06/01, Wu Fengguang wrote:
>On Wed, May 27, 2009 at 11:06:37AM +0800, Hisashi Hifumi wrote:
>> 
>> At 11:57 09/05/27, Wu Fengguang wrote:
>> >On Wed, May 27, 2009 at 10:47:47AM +0800, Hisashi Hifumi wrote:
>> >> 
>> >> At 11:36 09/05/27, Wu Fengguang wrote:
>> >> >On Wed, May 27, 2009 at 10:21:53AM +0800, Hisashi Hifumi wrote:
>> >> >>
>> >> >> At 11:09 09/05/27, Wu Fengguang wrote:
>> >> >> >On Wed, May 27, 2009 at 08:25:04AM +0800, Hisashi Hifumi wrote:
>> >> >> >>
>> >> >> >> At 08:42 09/05/27, Andrew Morton wrote:
>> >> >> >> >On Fri, 22 May 2009 10:33:23 +0800
>> >> >> >> >Wu Fengguang <fengguang.wu <at> intel.com> wrote:
>> >> >> >> >
>> >> >> >> >> > I tested above patch, and I got same performance number.
>> >> >> >> >> > I wonder why if (PageUptodate(page)) check is there...
>> >> >> >> >>
>> >> >> >> >> Thanks!  This is an interesting micro timing behavior that
>> >> >> >> >> demands some research work.  The above check is to confirm if it's
>> >> >> >> >> the PageUptodate() case that makes the difference. So why that case
>> >> >> >> >> happens so frequently so as to impact the performance? Will it also
>> >> >> >> >> happen in NFS?
>> >> >> >> >>
>> >> >> >> >> The problem is readahead IO pipeline is not running smoothly, which is
>> >> >> >> >> undesirable and not well understood for now.
>> >> >> >> >

Wu Fengguang | 1 Jun 2009 05:02

Re: [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 01, 2009 at 10:51:56AM +0800, Hisashi Hifumi wrote:
> 
> At 11:37 09/06/01, Wu Fengguang wrote:
> >On Wed, May 27, 2009 at 11:06:37AM +0800, Hisashi Hifumi wrote:
> >> 
> >> At 11:57 09/05/27, Wu Fengguang wrote:
> >> >On Wed, May 27, 2009 at 10:47:47AM +0800, Hisashi Hifumi wrote:
> >> >> 
> >> >> At 11:36 09/05/27, Wu Fengguang wrote:
> >> >> >On Wed, May 27, 2009 at 10:21:53AM +0800, Hisashi Hifumi wrote:
> >> >> >>
> >> >> >> At 11:09 09/05/27, Wu Fengguang wrote:
> >> >> >> >On Wed, May 27, 2009 at 08:25:04AM +0800, Hisashi Hifumi wrote:
> >> >> >> >>
> >> >> >> >> At 08:42 09/05/27, Andrew Morton wrote:
> >> >> >> >> >On Fri, 22 May 2009 10:33:23 +0800
> >> >> >> >> >Wu Fengguang <fengguang.wu <at> intel.com> wrote:
> >> >> >> >> >
> >> >> >> >> >> > I tested above patch, and I got same performance number.
> >> >> >> >> >> > I wonder why if (PageUptodate(page)) check is there...
> >> >> >> >> >>
> >> >> >> >> >> Thanks!  This is an interesting micro timing behavior that
> >> >> >> >> >> demands some research work.  The above check is to confirm if it's
> >> >> >> >> >> the PageUptodate() case that makes the difference. So why that case
> >> >> >> >> >> happens so frequently so as to impact the performance? Will it also
> >> >> >> >> >> happen in NFS?
> >> >> >> >> >>
> >> >> >> >> >> The problem is readahead IO pipeline is not running smoothly, which is
> >> >> >> >> >> undesirable and not well understood for now.

KOSAKI Motohiro | 1 Jun 2009 05:06

Re: [PATCH] readahead:add blk_run_backing_dev

> > >> >I mean, you should get >300MB/s throughput with 7 disks, and you
> > >> >should seek ways to achieve that before testing out this patch :-)
> > >> 
> > >> Throughput numbers of storage arrays vary from one product to another.
> > >> On my hardware environment I think this number is valid and
> > >> my patch is effective.
> > >
> > >What's your readahead size? Is it large enough to cover the stripe width?
> > 
> > Do you mean strage's readahead size?
> 
> What's strage? I mean if your RAID's block device file is /dev/sda, then

I guess it's typo :-)
but I recommend he use a sane test environment...

> 
>         blockdev --getra /dev/sda
> 
> will tell its readahead size in units of 512 bytes.
> 
> Thanks,
> Fengguang
> 
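
To make the unit conversion above concrete, here is a minimal sketch of how a value printed by `blockdev --getra` could be turned into KiB and compared against a stripe width. It is not from the thread: the function names, the 256-sector readahead value, and the 64 KiB chunk size are hypothetical, and treating all 7 disks as data disks is an assumption about the array layout.

```python
# Convert a `blockdev --getra` value (512-byte sectors) to KiB and compare
# it with a RAID stripe width. All example numbers below are hypothetical.

SECTOR_BYTES = 512  # unit used by blockdev --getra, per the mail above


def readahead_kib(getra_sectors):
    """Readahead window in KiB for a given --getra value."""
    return getra_sectors * SECTOR_BYTES // 1024


def covers_stripe(getra_sectors, chunk_kib, data_disks):
    """True if one readahead window spans at least one full stripe."""
    stripe_kib = chunk_kib * data_disks
    return readahead_kib(getra_sectors) >= stripe_kib


# Hypothetical setup: --getra prints 256, 64 KiB chunks, 7 data disks.
print(readahead_kib(256))
print(covers_stripe(256, chunk_kib=64, data_disks=7))
```

Under these assumed numbers the readahead window is smaller than a full stripe, so a single readahead request would not keep every disk busy, which is the concern behind the question about stripe width.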

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hisashi Hifumi | 1 Jun 2009 05:07

Re: [PATCH] readahead:add blk_run_backing_dev


At 12:02 09/06/01, Wu Fengguang wrote:
>On Mon, Jun 01, 2009 at 10:51:56AM +0800, Hisashi Hifumi wrote:
>> 
>> At 11:37 09/06/01, Wu Fengguang wrote:
>> >On Wed, May 27, 2009 at 11:06:37AM +0800, Hisashi Hifumi wrote:
>> >> 
>> >> At 11:57 09/05/27, Wu Fengguang wrote:
>> >> >On Wed, May 27, 2009 at 10:47:47AM +0800, Hisashi Hifumi wrote:
>> >> >> 
>> >> >> At 11:36 09/05/27, Wu Fengguang wrote:
>> >> >> >On Wed, May 27, 2009 at 10:21:53AM +0800, Hisashi Hifumi wrote:
>> >> >> >>
>> >> >> >> At 11:09 09/05/27, Wu Fengguang wrote:
>> >> >> >> >On Wed, May 27, 2009 at 08:25:04AM +0800, Hisashi Hifumi wrote:
>> >> >> >> >>
>> >> >> >> >> At 08:42 09/05/27, Andrew Morton wrote:
>> >> >> >> >> >On Fri, 22 May 2009 10:33:23 +0800
>> >> >> >> >> >Wu Fengguang <fengguang.wu <at> intel.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> >> > I tested above patch, and I got same performance number.
>> >> >> >> >> >> > I wonder why if (PageUptodate(page)) check is there...
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks!  This is an interesting micro timing behavior that
>> >> >> >> >> >> >> demands some research work.  The above check is to confirm if it's
>> >> >> >> >> >> >> the PageUptodate() case that makes the difference. So why that case
>> >> >> >> >> >> >> happens so frequently so as to impact the performance? Will it also

Wu Fengguang | 1 Jun 2009 06:30

Re: [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 01, 2009 at 11:07:42AM +0800, Hisashi Hifumi wrote:
> 
> At 12:02 09/06/01, Wu Fengguang wrote:
> >On Mon, Jun 01, 2009 at 10:51:56AM +0800, Hisashi Hifumi wrote:
> >> 
> >> At 11:37 09/06/01, Wu Fengguang wrote:
> >> >On Wed, May 27, 2009 at 11:06:37AM +0800, Hisashi Hifumi wrote:
> >> >> 
> >> >> At 11:57 09/05/27, Wu Fengguang wrote:
> >> >> >On Wed, May 27, 2009 at 10:47:47AM +0800, Hisashi Hifumi wrote:
> >> >> >> 
> >> >> >> At 11:36 09/05/27, Wu Fengguang wrote:
> >> >> >> >On Wed, May 27, 2009 at 10:21:53AM +0800, Hisashi Hifumi wrote:
> >> >> >> >>
> >> >> >> >> At 11:09 09/05/27, Wu Fengguang wrote:
> >> >> >> >> >On Wed, May 27, 2009 at 08:25:04AM +0800, Hisashi Hifumi wrote:
> >> >> >> >> >>
> >> >> >> >> >> At 08:42 09/05/27, Andrew Morton wrote:
> >> >> >> >> >> >On Fri, 22 May 2009 10:33:23 +0800
> >> >> >> >> >> >Wu Fengguang <fengguang.wu <at> intel.com> wrote:
> >> >> >> >> >> >
> >> >> >> >> >> >> > I tested above patch, and I got same performance number.
> >> >> >> >> >> >> > I wonder why if (PageUptodate(page)) check is there...
> >> >> >> >> >> >>
> >> >> >> >> >> >> Thanks!  This is an interesting micro timing behavior that
> >> >> >> >> >> >> >> demands some research work.  The above check is to confirm if it's
> >> >> >> >> >> >> >> the PageUptodate() case that makes the difference. So why that case
> >> >> >> >> >> >> happens so frequently so as to impact the performance? 

Daisuke Nishimura | 1 Jun 2009 06:25

Re: [RFC] Low overhead patches for the memory cgroup controller (v2)

I'm sorry for my very late reply.

I've been working on the stale swap cache problem for a long time as you know :)

On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir <at> linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu <at> jp.fujitsu.com> [2009-05-16 02:45:03]:
> 
> > I think set/clear flag here adds a race condition....because pc->flags is
> > modified by
> >   pc->flags = pcg_default_flags[ctype] in commit_charge()
> > you have to modify above lines to be
> > 
> >   SetPageCgroupCache(pc) or some..
> >   ...
> >   SetPageCgroupUsed(pc)
> > 
> > Then, you can use set_bit() without lock_page_cgroup().
> > (Currently, pc->flags is modified only under lock_page_cgroup(), so,
> >  non atomic code is used.)
> >
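
The concern in the quoted review can be illustrated with a small model. This is only a sketch in Python, not kernel code: the bit positions are hypothetical, PCG_ROOT is a stand-in name for the new flag the patch adds, and plain integer operations here merely model the effect of a whole-word store versus per-bit updates (they are not atomic the way set_bit() is).

```python
# Model of why a whole-word store to pc->flags can lose a flag that was
# set elsewhere, while per-bit updates keep it.
# Bit positions and PCG_ROOT are hypothetical, for illustration only.

PCG_CACHE = 1 << 0
PCG_USED = 1 << 1
PCG_ROOT = 1 << 2  # stand-in for the flag introduced by the patch


def commit_by_assignment(flags, default_flags):
    """Models `pc->flags = pcg_default_flags[ctype]`: overwrites every bit."""
    return default_flags


def commit_by_bit_sets(flags, bits):
    """Models a sequence of SetPageCgroup*() calls: touches only named bits."""
    for bit in bits:
        flags |= bit
    return flags


# Another code path has already set the new flag without taking the lock.
flags = PCG_ROOT

lost = commit_by_assignment(flags, PCG_CACHE | PCG_USED)
kept = commit_by_bit_sets(flags, [PCG_CACHE, PCG_USED])

print(bool(lost & PCG_ROOT))  # flag wiped by the whole-word store
print(bool(kept & PCG_ROOT))  # flag preserved by per-bit updates
```

In this model the assignment variant drops the previously set flag, which is the kind of lost update the review warns about when flags are also changed outside lock_page_cgroup().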
> 
> Here is the next version of the patch
> 
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir <at> linux.vnet.ibm.com>
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can

