KOSAKI Motohiro | 1 Mar 2008 08:02

Re: [patch 03/21] use an array for the LRU pagevecs

Hi Andy

Sorry, most of these mistakes were made by me.

> >  #define for_each_lru(l) for (l = 0; l < NR_LRU_LISTS; l++)
> >  
> > +static inline int is_active_lru(enum lru_list l)
> > +{
> > +	if (l == LRU_ACTIVE)
> > +		return 1;
> > +	return 0;
> 
> Can this not be:
> 
> 	return (l == LRU_ACTIVE);

Yes, your code is better.

Thanks.
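
A minimal sketch of the simplification agreed on above, assuming a stand-in enum (the real enum lru_list comes from the kernel headers, and its members here are illustrative only). In C the == operator already yields an int of 1 or 0, so the if/return pair adds nothing:

```c
/* Stand-in for the kernel's enum lru_list; members are illustrative only. */
enum lru_list {
	LRU_INACTIVE,
	LRU_ACTIVE,
	NR_LRU_LISTS
};

/* The comparison itself evaluates to 1 or 0, so it can be returned directly. */
static inline int is_active_lru(enum lru_list l)
{
	return (l == LRU_ACTIVE);
}
```

Callers are unaffected: is_active_lru(LRU_ACTIVE) is 1 and any other list gives 0, the same as the longer form.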

> > @@ -98,6 +97,19 @@ void put_pages_list(struct list_head *pa
> >  EXPORT_SYMBOL(put_pages_list);
> >  
> >  /*
> > + * Returns the LRU list a page should be on.
> > + */
> > +enum lru_list page_lru(struct page *page)
> > +{
> > +	enum lru_list lru = LRU_BASE;
> > +

Jared Hulbert | 1 Mar 2008 09:14

Re: [patch 4/6] xip: support non-struct page backed memory

>  (The kaddr->pfn conversion may not be quite right for all architectures or XIP
>  memory mappings, and the cacheflushing may need to be added for some archs).
>
>  This scheme has been tested and works for Jared's work-in-progress filesystem,

Oops.  I screwed up testing this.  It doesn't work with MTD devices and ARM...

The problem is that virt_to_phys() gives a bogus answer for a
mtd->point()'ed address.  It's an ioremap()'ed address, which doesn't
work with the ARM virt_to_phys().  I can get a physical address from
mtd->point() with a patch I dropped a little while back.

So I was thinking how about instead of:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
void * get_xip_address(struct address_space *mapping, pgoff_t pgoff,
int create);

xip_mem = mapping->a_ops->get_xip_address(mapping, vmf->pgoff, 0);
pfn = virt_to_phys((void *)xip_mem) >> PAGE_SHIFT;
err = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address, pfn);
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Could we do?

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
int get_xip_address(struct address_space *mapping, pgoff_t pgoff, int
create, unsigned long *address);

if(mapping->a_ops->get_xip_address(mapping, vmf->pgoff, 0, &xip_mem)){
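
The rest of the proposal is cut off in this archive, so the following is only a sketch of the calling convention it seems to point at, not the posted code. It assumes the int return is 0 on success or a negative errno, and that the out-parameter carries a physical address, so the caller can derive the pfn without virt_to_phys(). All demo_* names, the base address, and the page size are made up for illustration:

```c
#include <errno.h>

#define DEMO_PAGE_SHIFT 12	/* assumed 4 KiB pages for this sketch */

/*
 * Hypothetical stand-in for a ->get_xip_address() implementation with the
 * proposed shape: status in the return value, address via @address.
 * Assumption: the stored value is a physical address.
 */
static int demo_get_xip_address(unsigned long pgoff, int create,
				unsigned long *address)
{
	const unsigned long demo_phys_base = 0x10000000UL; /* made-up base */
	const unsigned long demo_nr_pages = 16;            /* made-up size */

	(void)create;	/* unused in this sketch */
	if (pgoff >= demo_nr_pages)
		return -EINVAL;
	*address = demo_phys_base + (pgoff << DEMO_PAGE_SHIFT);
	return 0;
}

/* Caller side: shift the returned address down to a pfn, no virt_to_phys(). */
static int demo_lookup_pfn(unsigned long pgoff, unsigned long *pfn)
{
	unsigned long addr;
	int err = demo_get_xip_address(pgoff, 0, &addr);

	if (err)
		return err;
	*pfn = addr >> DEMO_PAGE_SHIFT;
	return 0;
}
```

The point of the out-parameter form is that the filesystem, which knows how its memory is mapped, supplies the address directly instead of the generic fault path guessing it from a kernel virtual address.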

KOSAKI Motohiro | 1 Mar 2008 13:13

Re: [patch 06/21] split LRU lists into anon & file sets

Hi

> @@ -1128,64 +1026,65 @@ static void shrink_active_list(unsigned
(snip)
> +	/*
> +	 * For sorting active vs inactive pages, we'll use the 'anon'
> +	 * elements of the local list[] array and sort out the file vs
> +	 * anon pages below.
> +	 */

IMHO this comment implies the code is not so good...

I think shrink_active_list should not be changed to use an indexed array,
because this function uses almost no indexed array operations.

The following patch is only meant to explain my intention.

---
 mm/vmscan.c |   29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c	2008-03-01 21:11:03.000000000 +0900
+++ b/mm/vmscan.c	2008-03-01 21:13:13.000000000 +0900
@@ -1023,14 +1023,12 @@ static void shrink_active_list(unsigned
 	int pgdeactivate = 0;
 	unsigned long pgscanned;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
-	struct list_head list[NR_LRU_LISTS];

KOSAKI Motohiro | 1 Mar 2008 13:46

Re: [patch 06/21] split LRU lists into anon & file sets

Hi

> @@ -153,43 +153,47 @@ static int meminfo_read_proc(char *page,
>  	 * Tagged format, for easy grepping and expansion.
>  	 */
>  	len = sprintf(page,
> -		"MemTotal:     %8lu kB\n"
> -		"MemFree:      %8lu kB\n"
> -		"Buffers:      %8lu kB\n"
> -		"Cached:       %8lu kB\n"
> -		"SwapCached:   %8lu kB\n"
> -		"Active:       %8lu kB\n"
> -		"Inactive:     %8lu kB\n"
> +		"MemTotal:       %8lu kB\n"
> +		"MemFree:        %8lu kB\n"
> +		"Buffers:        %8lu kB\n"
> +		"Cached:         %8lu kB\n"
> +		"SwapCached:     %8lu kB\n"
> +		"Active(anon):   %8lu kB\n"
> +		"Inactive(anon): %8lu kB\n"
> +		"Active(file):   %8lu kB\n"
> +		"Inactive(file): %8lu kB\n"

Unfortunately this change breaks "vmstat -a".
Could we add fields instead of replacing the existing ones?

-kosaki

---
 fs/proc/proc_misc.c |   21 +++++++++++++++++----
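
The patch body is cut off here, so this is only a userspace sketch of the "add, don't replace" idea, not the posted change. It assumes a hypothetical struct demo_meminfo holding the four counters in kB, keeps the aggregate "Active:" and "Inactive:" lines that existing parsers look for, and appends the per-type lines after them:

```c
#include <stdio.h>

/* Hypothetical container for the counters, in kB; not a kernel structure. */
struct demo_meminfo {
	unsigned long active_anon, inactive_anon;
	unsigned long active_file, inactive_file;
};

/*
 * Keep the aggregate lines that existing tools parse and add the split
 * anon/file lines after them. Returns what snprintf() returns.
 */
static int demo_format_meminfo(char *buf, size_t size,
			       const struct demo_meminfo *mi)
{
	return snprintf(buf, size,
		"Active:         %8lu kB\n"
		"Inactive:       %8lu kB\n"
		"Active(anon):   %8lu kB\n"
		"Inactive(anon): %8lu kB\n"
		"Active(file):   %8lu kB\n"
		"Inactive(file): %8lu kB\n",
		mi->active_anon + mi->active_file,
		mi->inactive_anon + mi->inactive_file,
		mi->active_anon, mi->inactive_anon,
		mi->active_file, mi->inactive_file);
}
```

A tool that only knows the old tags still finds them with the summed values, while newer tools can read the split counters.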

KOSAKI Motohiro | 1 Mar 2008 14:35

Re: [patch 09/21] (NEW) improve reclaim balancing

hi

> +	/*
> +	 * Even if we did not try to evict anon pages at all, we want to
> +	 * rebalance the anon lru active/inactive ratio.
> +	 */
> +	if (inactive_anon_low(zone))
> +		shrink_list(NR_ACTIVE_ANON, SWAP_CLUSTER_MAX, zone, sc,
> +								priority);
> +

You want to check the global zone status, right?
If so, this statement should only do that during a global scan.

- kosaki

---
 mm/vmscan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c       2008-03-01 22:18:42.000000000 +0900
+++ b/mm/vmscan.c       2008-03-01 22:42:42.000000000 +0900
@@ -1319,9 +1319,9 @@ static unsigned long shrink_zone(int pri
         * Even if we did not try to evict anon pages at all, we want to
         * rebalance the anon lru active/inactive ratio.
         */
-       if (inactive_anon_low(zone))
+       if (scan_global_lru(sc) && inactive_anon_low(zone))
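
A small userspace sketch of the guard added in the hunk above. The demo_* types and the "low" heuristic are assumptions made for illustration; only the shape of the condition follows the patch. Because && short-circuits, the zone-wide ratio check is skipped entirely when the scan is not a global one:

```c
/* Hypothetical stand-ins; the real scan_control and zone are kernel types. */
struct demo_scan_control {
	int global;	/* nonzero for a global (non-cgroup) scan */
};

struct demo_zone {
	unsigned long nr_active_anon;
	unsigned long nr_inactive_anon;
};

static int demo_scan_global_lru(const struct demo_scan_control *sc)
{
	return sc->global != 0;
}

/* Assumed heuristic for the sketch: "low" means fewer inactive than active. */
static int demo_inactive_anon_low(const struct demo_zone *zone)
{
	return zone->nr_inactive_anon < zone->nr_active_anon;
}

/* Rebalance only on a global scan; && skips the zone check otherwise. */
static int demo_should_rebalance_anon(const struct demo_scan_control *sc,
				      const struct demo_zone *zone)
{
	return demo_scan_global_lru(sc) && demo_inactive_anon_low(zone);
}
```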

KOSAKI Motohiro | 2 Mar 2008 11:35

Re: [patch 11/21] (NEW) more aggressively use lumpy reclaim

Hi

I think this patch is a very good improvement,
but it is not related to the split LRU.

Why don't you separate this patch?
IMHO treating it as an independent patch is better.

Thanks.

> During an AIM7 run on a 16GB system, fork started failing around
> 32000 threads, despite the system having plenty of free swap and
> 15GB of pageable memory.
> 
> If normal pageout does not result in contiguous free pages for
> kernel stacks, fall back to lumpy reclaim instead of failing fork
> or doing excessive pageout IO.
> 
> I do not know whether this change is needed due to the extreme
> stress test or because the inactive list is a smaller fraction
> of system memory on huge systems.

Rik van Riel | 2 Mar 2008 15:23

Re: [patch 11/21] (NEW) more aggressively use lumpy reclaim

On Sun, 02 Mar 2008 19:35:44 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> I think this patch is a very good improvement,
> but it is not related to the split LRU.
> 
> Why don't you separate this patch?
> IMHO treating it as an independent patch is better.

Agreed, I should probably pull this to the start of the patch series
and submit it to Andrew Morton soon.

The arrayification of the LRU lists and pagevecs should probably go
into -mm soon, as well.  That code is ready and it can be merged
independently of the split VM code.

-- 
All rights reversed.

Andrea Arcangeli | 2 Mar 2008 16:54

[PATCH] mmu notifiers #v8

Difference between #v7 and #v8:

1) s/age_page/clear_flush_young/ (Nick's suggestion)
2) macro fix (Andrew)
3) move release before final unmap_vmas (for GRU, Jack/Christoph)
4) microoptimize mmu_notifier_unregister (Christoph)
5) use mmap_sem for registration serialization (Christoph)

The (void)xxx in macros doesn't work with "args". Christoph's solution
looks best for avoiding warnings, even if it forces us to make the mmu
notifier operation structure visible even when MMU_NOTIFIER=n (that's
the only downside).
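
The macro itself is not shown in this excerpt, so the following is only one way a disabled-config stub of this kind could look, with hypothetical demo_* names throughout. The call sits inside "if (0)": the compiler still type-checks the method name and every argument, which keeps call sites free of unused-variable warnings, but nothing is ever executed. That is also why the ops structure has to stay visible when the feature is compiled out:

```c
struct demo_mm;		/* hypothetical opaque stand-in for struct mm_struct */
struct demo_notifier;

/* The ops structure must be declared even when notifiers are disabled. */
struct demo_notifier_ops {
	void (*invalidate_page)(struct demo_notifier *mn, struct demo_mm *mm,
				unsigned long address);
};

struct demo_notifier {
	const struct demo_notifier_ops *ops;
};

/*
 * Disabled-config stub: arguments are parsed and type-checked, but the
 * branch is dead code, so no method is called and no pointer is followed.
 */
#define demo_mmu_notifier(function, mm, ...)				\
	do {								\
		if (0) {						\
			struct demo_notifier *mn_ =			\
				(struct demo_notifier *)0;		\
			mn_->ops->function(mn_, mm, __VA_ARGS__);	\
		}							\
	} while (0)
```

A call such as demo_mmu_notifier(invalidate_page, mm, address) then compiles to nothing, yet a misspelled method name or a wrong argument type is still a build error in both configurations.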

I didn't drop invalidate_page, because invalidate_range_begin/end
would be slower for usages like KVM/GRU (we don't need a begin/end
there because where invalidate_page is called, the VM holds a
reference on the page). do_wp_page should also use invalidate_page
since it can free the page after dropping the PT lock without losing
any performance (that's not true for the places where invalidate_range
is called).

It'd be nice if everyone involved can agree to converge on this API
for .25. KVM/GRU (and perhaps Quadrics) and similar usages will be
fully covered in .25. This is a kernel internal API so there's no
problem if all the methods become sleep capable only starting
in .26. The more brain-intensive part of the VM work needed to make it
sleep capable is pretty much orthogonal to this patch.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

Andrea Arcangeli | 2 Mar 2008 17:03

[ofa-general] Re: [PATCH] mmu notifiers #v8 + xpmem

Here is an example of the further orthogonal work to do on top of #v8
during .26-rc to make the whole mmu notifier API sleep capable.

1) Every single ptep_clear_flush_young_notify and
ptep_clear_flush_notify must be converted like the below. The below is
the conversion of a single one. do_wp_page has been converted by
Christoph already but with invalidate_range (should be changed to
invalidate_page by releasing the refcount on the page after calling
invalidate_page). Hope it's clear why I'd rather not depend on these
changes to be merged in .25 in order to have the mmu notifier included
in .25.

2) Then after all this conversion work is finished, it's trivial to
delete ptep_clear_flush_young_notify and ptep_clear_flush_notify from
mmu_notifier.h (they will be unused macros once the conversion is
complete).

3) After that the VM has to be changed to convert anon_vma lock and
i_mmap_lock spinlocks to mutex/rwsemaphore.

4) Then finally the mmu_notifier_unregister must be dropped to make the
mmu notifier sleep capable with RCU in the mmu_notifier() fast path.

It's unclear at this point if 3/4 should be switchable and happen
under a CONFIG_XPMEM or similar, or if everyone will benefit from those
spinlocks becoming mutexes (the only one that is certain to appreciate
such a change is preempt-rt; for the rest of the userbase I don't know
for sure, and I'd be more comfortable with a TPC number comparison before
making such a change by default, but I leave the commentary on such a
change to linux-mm in a separate thread).

Peter Zijlstra | 2 Mar 2008 17:23

[ofa-general] Re: [PATCH] mmu notifiers #v8 + xpmem


On Sun, 2008-03-02 at 17:03 +0100, Andrea Arcangeli wrote:

> 4) Then finally the mmu_notifier_unregister must be dropped to make the
> mmu notifier sleep capable with RCU in the mmu_notifier() fast path.

Or require PREEMPTIBLE_RCU, that can handle sleeps..

