Dan Williams | 1 Jul 2008 03:57
Picon
Favicon

[problem] raid performance loss with 2.6.26-rc8 on 32-bit x86 (bisected)

Hello,

Prompted by a report from a user I have bisected a performance loss
apparently introduced by commit 54a6eb5c (mm: use two zonelist that are
filtered by GFP mask).  The test is simple sequential writes to a 4 disk
raid5 array.  Performance should be about 20% greater than 2.6.25 due to
commit 8b3e6cdc (md: introduce get_priority_stripe() to improve raid456
write performance).  The sample data below shows sporadic performance
starting at 54a6eb5c.  The '+' indicates where I hand applied 8b3e6cdc.

revision   2.6.25.8-fc8 2.6.25.9+ dac1d27b+ 18ea7e71+ 54a6eb5c+ 2.6.26-rc1 2.6.26-rc8
           138          168       169       167       177       149        144
           140          168       172       170       109       138        142
           142          165       169       164       119       138        129
           144          168       169       171       120       139        135
           142          165       174       166       165       122        154
MB/s (avg) 141          167       171       168       138       137        141
% change   0%           18%       21%       19%       -2%       -3%        0%
result     base         good      good      good      [bad]     bad        bad

Notable observations:
1/ This problem does not reproduce when ARCH=x86_64, i.e. 2.6.26-rc8 and
54a6eb5c show consistent performance at 170MB/s.
2/ Single drive performance appears to be unaffected
3/ A quick test shows that raid0 performance is also sporadic:
   2147483648 bytes (2.1 GB) copied, 7.72408 s, 278 MB/s
   2147483648 bytes (2.1 GB) copied, 7.78478 s, 276 MB/s
   2147483648 bytes (2.1 GB) copied, 11.0323 s, 195 MB/s
   2147483648 bytes (2.1 GB) copied, 8.41244 s, 255 MB/s
   2147483648 bytes (2.1 GB) copied, 30.7649 s, 69.8 MB/s
(Continue reading)

KOSAKI Motohiro | 1 Jul 2008 09:01
Favicon

[PATCH -mm] split_lru: fix pagevec_move_tail() doesn't treat unevictable page

even under writebacking, page can move to unevictable list.
so shouldn't pagevec_move_tail() check unevictable?

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro <at> jp.fujitsu.com>

---
 mm/swap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/mm/swap.c
===================================================================
--- a/mm/swap.c
+++ b/mm/swap.c
 <at>  <at>  -116,7 +116,7  <at>  <at>  static void pagevec_move_tail(struct pag
 			zone = pagezone;
 			spin_lock(&zone->lru_lock);
 		}
-		if (PageLRU(page) && !PageActive(page)) {
+		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 			int lru = page_is_file_cache(page);
 			list_move_tail(&page->lru, &zone->lru[lru].list);
 			pgmoved++;

MinChan Kim | 1 Jul 2008 09:34
Picon

Re: [PATCH -mm] split_lru: fix pagevec_move_tail() doesn't treat unevictable page

On Tue, Jul 1, 2008 at 4:01 PM, KOSAKI Motohiro
<kosaki.motohiro <at> jp.fujitsu.com> wrote:
> even under writebacking, page can move to unevictable list.
> so shouldn't pagevec_move_tail() check unevictable?
>
Hi, Kosaki-san.

I can't understand this race situation.
How the page can move to unevictable list while it is under writeback?

Could you explain for me ? :)

> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro <at> jp.fujitsu.com>
>
> ---
>  mm/swap.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: b/mm/swap.c
> ===================================================================
> --- a/mm/swap.c
> +++ b/mm/swap.c
>  <at>  <at>  -116,7 +116,7  <at>  <at>  static void pagevec_move_tail(struct pag
>                        zone = pagezone;
>                        spin_lock(&zone->lru_lock);
>                }
> -               if (PageLRU(page) && !PageActive(page)) {
> +               if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
>                        int lru = page_is_file_cache(page);
>                        list_move_tail(&page->lru, &zone->lru[lru].list);
(Continue reading)

KOSAKI Motohiro | 1 Jul 2008 09:56
Favicon

Re: [PATCH -mm] split_lru: fix pagevec_move_tail() doesn't treat unevictable page

Hi Kim-san,

Thank you for good question.

> > even under writebacking, page can move to unevictable list.
> > so shouldn't pagevec_move_tail() check unevictable?
> >
> Hi, Kosaki-san.
> 
> I can't understand this race situation.
> How the page can move to unevictable list while it is under writeback?
> 
> Could you explain for me ? :)

Actually, I added below assertion and tested on stress workload.
then system crashed after 4H runnings.

----------------------------------------------
static void pagevec_move_tail(struct pagevec *pvec)
{
(snip)
                if (PageLRU(page) && !PageActive(page)) {
                        int lru = page_is_file_cache(page);
                        list_move_tail(&page->lru, &zone->lru[lru].list);
                        BUG_ON(page_lru(page) != lru);  // !!here
                        pgmoved++;
                }
        }
----------------------------------------------------

(Continue reading)

Mel Gorman | 1 Jul 2008 10:09
Picon

Re: [problem] raid performance loss with 2.6.26-rc8 on 32-bit x86 (bisected)

(Christoph's address corrected and Andys added to cc)

On (30/06/08 18:57), Dan Williams didst pronounce:
> Hello,
> 
> Prompted by a report from a user I have bisected a performance loss
> apparently introduced by commit 54a6eb5c (mm: use two zonelist that are
> filtered by GFP mask).  The test is simple sequential writes to a 4 disk
> raid5 array.  Performance should be about 20% greater than 2.6.25 due to
> commit 8b3e6cdc (md: introduce get_priority_stripe() to improve raid456
> write performance).  The sample data below shows sporadic performance
> starting at 54a6eb5c.  The '+' indicates where I hand applied 8b3e6cdc.
> 
> revision   2.6.25.8-fc8 2.6.25.9+ dac1d27b+ 18ea7e71+ 54a6eb5c+ 2.6.26-rc1 2.6.26-rc8
>            138          168       169       167       177       149        144
>            140          168       172       170       109       138        142
>            142          165       169       164       119       138        129
>            144          168       169       171       120       139        135
>            142          165       174       166       165       122        154
> MB/s (avg) 141          167       171       168       138       137        141
> % change   0%           18%       21%       19%       -2%       -3%        0%
> result     base         good      good      good      [bad]     bad        bad
> 

That is not good at all as this patch is not a straight-forward revert but
the second time it's come under suspicion.

> Notable observations:
> 1/ This problem does not reproduce when ARCH=x86_64, i.e. 2.6.26-rc8 and
> 54a6eb5c show consistent performance at 170MB/s.
(Continue reading)

MinChan Kim | 1 Jul 2008 10:16
Picon

Re: [PATCH -mm] split_lru: fix pagevec_move_tail() doesn't treat unevictable page

On Tue, Jul 1, 2008 at 4:56 PM, KOSAKI Motohiro
<kosaki.motohiro <at> jp.fujitsu.com> wrote:
> Hi Kim-san,
>
> Thank you for good question.

Thanks for good explaining.
I guess your scenario have a possibility.

If I don't have a test HPC, I will dig in source. :)

>> > even under writebacking, page can move to unevictable list.
>> > so shouldn't pagevec_move_tail() check unevictable?
>> >
>> Hi, Kosaki-san.
>>
>> I can't understand this race situation.
>> How the page can move to unevictable list while it is under writeback?
>>
>> Could you explain for me ? :)
>
> Actually, I added below assertion and tested on stress workload.
> then system crashed after 4H runnings.
>
> ----------------------------------------------
> static void pagevec_move_tail(struct pagevec *pvec)
> {
> (snip)
>                if (PageLRU(page) && !PageActive(page)) {
>                        int lru = page_is_file_cache(page);
(Continue reading)

KOSAKI Motohiro | 1 Jul 2008 10:26
Favicon

[resend][PATCH -mm] split_lru: fix pagevec_move_tail() doesn't treat unevictable page


Agghh!
I am really stupid. I forgot CCed Andrew ;-)

So, I resend this.

--------------------------------------
even under writebacking, page can move to unevictable list.
so shouldn't pagevec_move_tail() check unevictable?

if pagevec_move_tail() doesn't PageUnevictable(), 
below race can occur.

    CPU1                                       CPU2
==================================================================
1. rotate_reclaimable_page()
2. PageUnevictable(page) return 0
3. local_irq_save()
4. pagevec_move_tail()
                                       SetPageUnevictable()   //mlock?
                                       move to unevictable list
5. spin_lock(&zone->lru_lock);
6. list_move_tail(); (move to inactive list)

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro <at> jp.fujitsu.com>

---
 mm/swap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(Continue reading)

Andrew Morton | 1 Jul 2008 10:53

Re: [patch 1/6] mm: Allow architectures to define additional protection bits

On Wed, 18 Jun 2008 17:32:55 -0500 shaggy <at> linux.vnet.ibm.com wrote:

> This patch allows architectures to define functions to deal with
> additional protections bits for mmap() and mprotect().
> 
> arch_calc_vm_prot_bits() maps additonal protection bits to vm_flags
> arch_vm_get_page_prot() maps additional vm_flags to the vma's vm_page_prot
> arch_validate_prot() checks for valid values of the protection bits

It'd be simpler if Paul were to merge this.  It doesn't conflict with
any pending work.

Acked-by: Andrew Morton <akpm <at> linux-foundation.org>

> Note: vm_get_page_prot() is now pretty ugly.

It is.  But afacit it generates the same code for non-powerpc.

> Suggestions?

nfi.  Let us rub the Hugh-summoning lamp.

> Signed-off-by: Dave Kleikamp <shaggy <at> linux.vnet.ibm.com>
> ---
> 
>  include/linux/mman.h |   28 +++++++++++++++++++++++++++-
>  mm/mmap.c            |    5 +++--
>  mm/mprotect.c        |    2 +-
>  3 files changed, 31 insertions(+), 4 deletions(-)
> 
(Continue reading)

Benny Halevy | 1 Jul 2008 13:53
Favicon

Re: [PATCH] mm: fix uninitialized variables for find_vma_prepare callers

On Jun. 30, 2008, 22:00 +0300, Hugh Dickins <hugh <at> veritas.com> wrote:
> On Mon, 30 Jun 2008, Benny Halevy wrote:
>> gcc 4.3.0 correctly emits the following warnings.
>> When a vma covering addr is found, find_vma_prepare indeed returns without
>> setting pprev, rb_link, and rb_parent.
> 
> That's amusing, thank you.
> 
> You may wonder how the vma rb_tree has been working all these years
> despite that.  The answer is that we only use find_vma_prepare when
> about to insert a new vma: if there's anything already there, it's
> either an error condition, or we go off and unmap the overlap without
> taking any interest in those uninitialized values for linking.
> 
> It would be nicer to initialize them, and your patch is certainly
> nice and simple.  Would it have the effect, that it returns with
> vma == *pprev when addr falls within an existing vma?
> That would be a sensible outcome, I think.

No, *pprev will be set to the last parent encountered with this
patch, but doing what you suggested is possible, though I doubt
it matters since apparently nobody is using pprev in this path
otherwise stuff would've just broken.

Benny

> 
> Hugh
> 
>> [warnings snipped]
(Continue reading)

Rik van Riel | 1 Jul 2008 15:38
Picon
Favicon

Re: [resend][PATCH -mm] split_lru: fix pagevec_move_tail() doesn't treat unevictable page

On Tue, 01 Jul 2008 17:26:51 +0900
KOSAKI Motohiro <kosaki.motohiro <at> jp.fujitsu.com> wrote:

> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro <at> jp.fujitsu.com>

Acked-by: Rik van Riel <riel <at> redhat.com>

Good catch!

>  <at>  <at>  -116,7 +116,7  <at>  <at>  static void pagevec_move_tail(struct pag
>  			zone = pagezone;
>  			spin_lock(&zone->lru_lock);
>  		}
> -		if (PageLRU(page) && !PageActive(page)) {
> +		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
>  			int lru = page_is_file_cache(page);
>  			list_move_tail(&page->lru, &zone->lru[lru].list);
>  			pgmoved++;

--

-- 
All rights reversed.

Gmane