Dave Hansen | 1 Sep 01:12 2004

Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [1/3]

On Tue, 2004-08-31 at 15:55, Hiroyuki KAMEZAWA wrote:
> Dave Hansen wrote:
> 
> > On Tue, 2004-08-31 at 03:41, Hiroyuki KAMEZAWA wrote:
> > 
> >>+static void __init calculate_aligned_end(struct zone *zone,
> >>+					 unsigned long start_pfn,
> >>+					 int nr_pages)
> > 
> > ...
> > 
> >>+		end_address = (zone->zone_start_pfn + end_idx) << PAGE_SHIFT;
> >>+#ifndef CONFIG_DISCONTIGMEM
> >>+		reserve_bootmem(end_address,PAGE_SIZE);
> >>+#else
> >>+		reserve_bootmem_node(zone->zone_pgdat,end_address,PAGE_SIZE);
> >>+#endif
> >>+	}
> >>+	return;
> >>+}
> > 
> > 
> > What if someone has already reserved that address?  You might not be
> > able to grow the zone, right?
> > 
> 1) If someone has already reserved that address, it (the page) will not join
>    the buddy allocator, so there is no problem.
> 
> 2) No, I can grow the zone.
>    A reserved page is the last page of a "not aligned contiguous mem_map", not of the zone.

Andrew Morton | 1 Sep 01:24 2004

Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [0/3]

Hiroyuki KAMEZAWA <kamezawa.hiroyu <at> jp.fujitsu.com> wrote:
>
> Because I had to record some information about the shape of mem_map, I used a PG_xxx bit.
> One bit is probably the minimum consumption.

The point is that we're running out of bits in page.flags.

You should be able to reuse an existing bit for this application.  PG_lock would suit.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart <at> kvack.org

Hiroyuki KAMEZAWA | 1 Sep 01:36 2004

Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [1/3]

Dave Hansen wrote:

> On Tue, 2004-08-31 at 15:55, Hiroyuki KAMEZAWA wrote:
> 
>>Dave Hansen wrote:
>>
>>
>>>On Tue, 2004-08-31 at 03:41, Hiroyuki KAMEZAWA wrote:
>>>
>>>
>>>>+static void __init calculate_aligned_end(struct zone *zone,
>>>>+					 unsigned long start_pfn,
>>>>+					 int nr_pages)
>>>
>>>...
>>>
>>>
>>>>+		end_address = (zone->zone_start_pfn + end_idx) << PAGE_SHIFT;
>>>>+#ifndef CONFIG_DISCONTIGMEM
>>>>+		reserve_bootmem(end_address,PAGE_SIZE);
>>>>+#else
>>>>+		reserve_bootmem_node(zone->zone_pgdat,end_address,PAGE_SIZE);
>>>>+#endif
>>>>+	}
>>>>+	return;
>>>>+}
>>>
>>>
>>>What if someone has already reserved that address?  You might not be
>>>able to grow the zone, right?

Hiroyuki KAMEZAWA | 1 Sep 01:53 2004

Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [0/3]

Andrew Morton wrote:

> Hiroyuki KAMEZAWA <kamezawa.hiroyu <at> jp.fujitsu.com> wrote:
> 
>>Because I had to record some information about the shape of mem_map, I used a PG_xxx bit.
>>One bit is probably the minimum consumption.
> 
> 
> The point is that we're running out of bits in page.flags.
> 
yes.

> You should be able to reuse an existing bit for this application.  PG_lock would suit.

Hmm... PG_buddyend pages at the top of mem_map can be allocated and used as
normal pages, which can be used for disk I/O.

If I make them victims of the buddy allocator and don't allow them to be used,
I can reuse an existing bit.

I'll consider more.

--Kame



Christoph Lameter | 1 Sep 22:11 2004

page fault scalability patch v6: fallback to page_table_lock, s390 support

On Wed, 1 Sep 2004, Martin Schwidefsky wrote:

> P.S. You should send patches against the memory management to the linux-mm
> list.

Thanks for all the feedback. I hope I got the s390 support right in the
following patch. PowerPC falls back to the use of the page_table_lock now.
The following patch is against 2.6.9-rc1 (last one was against mm1).

Signed-off-by: Christoph Lameter <clameter <at> sgi.com>

This is the sixth release of the page fault scalability patches.
The scalability patches avoid locking during the creation of page table entries for anonymous
memory in a threaded application running on an SMP system. Performance
increases significantly with more than 2 threads running concurrently.

Changes from V5:
- Provide fallback routines in asm-generic/pgtable.h for platforms that do not
  support atomic operations on pte/pmd/pgds.
  The fallback routines will use the page_table_lock for very short periods on
  those platforms so that as much as possible of the performance gain from
  concurrent page faults is preserved.
- Provide s390 support for atomic pte operations.
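The lock-free idea behind the fast path can be sketched in userspace with C11 atomics. This is illustrative only (the name pte_install and the pte_t typedef are my stand-ins, not the patch's API): the point is that two threads faulting on the same address race safely, and exactly one installs its entry without taking a lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for a page table entry; not the kernel's pte_t. */
typedef atomic_ulong pte_t;

/* Install a new pte only if the entry still holds the expected old value.
 * The compare-and-exchange guarantees that of two racing faults, exactly
 * one wins and the loser sees the winner's entry. */
static bool pte_install(pte_t *ptep, unsigned long expected, unsigned long newval)
{
	return atomic_compare_exchange_strong(ptep, &expected, newval);
}
```

On architectures without such atomic operations, the fallback described above would wrap the same compare-and-set in a briefly held page_table_lock.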

Typical performance increases in the page fault rate are:
2 CPUs -> 10%
4 CPUs -> 30%
8 CPUs -> 50%

With a high number of CPUs (16..512) we are seeing the page fault rate roughly doubling.

Hiroyuki KAMEZAWA | 2 Sep 09:55 2004

[RFC] buddy allocator without bitmap(3) [0/3]

Hi, this is new patch for removing bitmaps from the buddy allocator.

In the previous version, I used an additional PG_xxx flag. In this one, I don't
use any new flag.

For dealing with the special case of an unaligned discontiguous mem_map,
I removed some troublesome pages from the system instead of using a PG_xxx flag.
Note: if the memmap is aligned, no pages are removed.

"What pages are removed ?" is explained in patch[1/3].
Please draw a picture of the buddy system when you read calculate_aligned_end(),
which finds pages to be removed.

(Special Case Example)
Results of calculate_aligned_end() on Tiger4
(Itanium2, 8GB memory, discontiguous, virtual mem_map) are shown below.
There are 5 mem_maps across 2 zones, and 19 pages are removed.

mem_map(1) from  36e    length 1fb6d  --- ZONE_DMA
mem_map(2) from  1fedc  length   124  --- ZONE_DMA
mem_map(3) from  40000  length 40000  --- ZONE_NORMAL (this mem_map is aligned)
mem_map(4) from  a0000  length 20000  --- ZONE_NORMAL
mem_map(5) from  bfedc  length   124  --- ZONE_NORMAL

ZONE_NORMAL has a memory hole of 2 Gbytes.
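The boundary arithmetic involved can be illustrated with a small sketch. This is NOT the patch's calculate_aligned_end() itself, only the rounding it reasons about, assuming MAX_ORDER = 11 as in 2.6-era kernels (0x400-page maximum blocks); pages in the unaligned head or tail of a mem_map are the candidates whose buddies can fall outside it.

```c
/* Boundary arithmetic sketch, assuming MAX_ORDER = 11. Helper names are
 * illustrative, not from the patch. */
#define MAX_ORDER   11
#define BLOCK_PAGES (1UL << (MAX_ORDER - 1))   /* 0x400 pages per max block */

/* First MAX_ORDER-aligned pfn at or above pfn. */
static unsigned long align_up(unsigned long pfn)
{
	return (pfn + BLOCK_PAGES - 1) & ~(BLOCK_PAGES - 1);
}

/* Last MAX_ORDER-aligned pfn at or below pfn. */
static unsigned long align_down(unsigned long pfn)
{
	return pfn & ~(BLOCK_PAGES - 1);
}
```

For mem_map(1) above (start 0x36e), the first aligned boundary is 0x400, which is consistent with the victim pages the log below reports in the unaligned head.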

==================
Sep  2 15:23:35 casares kernel: calculate_aligned_end() 36e 1fb6d
Sep  2 15:23:35 casares kernel: victim top page 36e
Sep  2 15:23:35 casares kernel: victim top page 370

Hiroyuki KAMEZAWA | 2 Sep 10:00 2004

[RFC] buddy allocator without bitmap(3) [1/3]

This part is for initialization.

-- Kame

======

This is the 2nd patch.
It implements initialization for the buddy allocator.

New function: calculate_aligned_end()

calculate_aligned_end() removes some pages from the system to eliminate invalid
mem_map accesses in the main loop of __free_pages_bulk(). (The loop is in the 4th patch.)

See below
================== main loop in __free_pages_bulk(page,order) ===========
while (order < MAX_ORDER) {
	struct page *buddy;
	......
	buddy_idx = page_idx ^ (1 << order);
	buddy = zone->zone_mem_map + buddy_idx;
	if (page_count(buddy) !=0  --------------------(*)
	.......
}
===============================================================
At (*), we have to guarantee that "buddy" is a valid page struct
in a valid zone.

Let MASK = (1 << (MAX_ORDER - 1)) - 1.
A page at index 'X' can be coalesced with pages from (X & ~MASK) to (X | MASK).
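The index arithmetic above can be made concrete with a small sketch (MAX_ORDER = 11 is assumed; the helper names are mine, not the patch's):

```c
/* Sketch of the buddy index arithmetic. MAX_ORDER = 11 is assumed; helper
 * names are illustrative, not from the patch. */
#define MAX_ORDER     11
#define COALESCE_MASK ((1UL << (MAX_ORDER - 1)) - 1)

/* The buddy of a 2^order block at page_idx differs only in bit 'order'. */
static unsigned long buddy_index(unsigned long page_idx, int order)
{
	return page_idx ^ (1UL << order);
}

/* Lowest and highest indices page X can ever be coalesced with. */
static unsigned long coalesce_low(unsigned long x)  { return x & ~COALESCE_MASK; }
static unsigned long coalesce_high(unsigned long x) { return x | COALESCE_MASK; }
```

Every index in [coalesce_low(X), coalesce_high(X)] must therefore have a valid page struct, which is exactly why the unaligned edge pages are removed in advance.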

Hiroyuki KAMEZAWA | 2 Sep 10:04 2004

[RFC] buddy allocator without bitmap [2/3]

This part is unchanged from the previous version.

I'm sorry I forgot to say that these patches are against 2.6.9-rc1-mm2.

--Kame
====

This is the 3rd patch.
It removes the bitmap operation from alloc_pages().

Instead of using the MARK_USED() bitmap operation,
this patch records a page's order in the page struct itself, in the page->private field.
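The bookkeeping change can be sketched as follows ('struct page_stub' and the helpers are stand-ins, not the kernel's struct page or the patch's exact code):

```c
/* Sketch: instead of toggling a bitmap bit via MARK_USED(), the order of a
 * free block is recorded in the page structure itself. */
struct page_stub {
	unsigned long private;   /* holds the order while the block is free */
};

static void set_page_order(struct page_stub *page, unsigned long order)
{
	page->private = order;
}

static unsigned long page_order(const struct page_stub *page)
{
	return page->private;
}
```

This trades the external bitmap for a field the allocator already has in hand when it touches the page.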

---

 test-kernel-kamezawa/mm/page_alloc.c |   17 +++++++----------
 1 files changed, 7 insertions(+), 10 deletions(-)

diff -puN mm/page_alloc.c~eliminate-bitmap-alloc mm/page_alloc.c
--- test-kernel/mm/page_alloc.c~eliminate-bitmap-alloc	2004-09-02 15:46:01.135746384 +0900
+++ test-kernel-kamezawa/mm/page_alloc.c	2004-09-02 15:46:01.140745624 +0900
@@ -288,9 +288,6 @@ void __free_pages_ok(struct page *page,
 	free_pages_bulk(page_zone(page), 1, &list, order);
 }

-#define MARK_USED(index, order, area) \
-	__change_bit((index) >> (1+(order)), (area)->map)
-
 /*
  * The order of subdivision here is critical for the IO subsystem.

Hiroyuki KAMEZAWA | 2 Sep 10:11 2004

[RFC] buddy allocator without bitmap(3) [3/3]

This is the last part.
There is no big change from the previous version.

--Kame
===

This is the 4th patch.
It removes the bitmap operation from free_pages().

In the main loop of __free_pages_bulk(), we access a "buddy" page.
It is guaranteed that there is no invalid "buddy" in the buddy system, because
all dangerous pages which could be coalesced with
out-of-range pages are removed in advance by calculate_aligned_end().
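A userspace sketch of the merge check this relies on (all names are stand-ins; the real patch works on struct page, page_count(), and the zone's free lists): a buddy can be merged when it is free and currently sits in the buddy system at the matching order.

```c
/* Sketch of the "is this a valid buddy" test: free (refcount 0), on a free
 * list, and recorded at the same order. Fields are stand-ins for kernel state. */
struct page_stub {
	int count;               /* page_count() stand-in               */
	unsigned long private;   /* order recorded while on a free list */
	int in_buddy_system;     /* stand-in for free-list membership   */
};

static int page_is_buddy(const struct page_stub *buddy, unsigned long order)
{
	return buddy->count == 0 &&
	       buddy->in_buddy_system &&
	       buddy->private == order;
}
```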

---

 test-kernel-kamezawa/mm/page_alloc.c |   81 ++++++++++++++++++++++-------------
 1 files changed, 51 insertions(+), 30 deletions(-)

diff -puN mm/page_alloc.c~eliminate-bitmap-free mm/page_alloc.c
--- test-kernel/mm/page_alloc.c~eliminate-bitmap-free	2004-09-02 17:03:32.648373272 +0900
+++ test-kernel-kamezawa/mm/page_alloc.c	2004-09-02 17:03:32.653372512 +0900
@@ -157,6 +157,27 @@ static void destroy_compound_page(struct
 #endif		/* CONFIG_HUGETLB_PAGE */

 /*
+ * This function checks whether a page is free && is the buddy
+ * we can do coalesce if
+ * (a) the buddy is free and
+ * (b) the buddy is on the buddy system

Rik van Riel | 2 Sep 11:05 2004

Re: Kernel 2.6.8.1: swap storm of death - nr_requests > 1024 on swap partition

On Tue, 31 Aug 2004, Marcelo Tosatti wrote:
> On Tue, Aug 31, 2004 at 08:24:31PM +0200, Karl Vogel wrote:
> > On Tuesday 31 August 2004 18:52, Marcelo Tosatti wrote:
> > > I've seen extreme decreases in performance (interactivity) with hungry
> > > memory apps with Rik's swap token code.
> > 
> > Decrease?!
> 
> Yep, it's odd. Rik knows the exact reason.

Yes, it appears that the swap token patch works great on
systems where the workload consists of similar applications.
If you have a desktop, the swap token makes switching between
apps faster.  If you have a server, the swap token helps
increase throughput.

However, if you have one app that needs more memory than the
system has and the rest of the apps are all "friendly", then
the swap token can help the system hog steal resources from
the other processes.

This needs to be fixed somehow, but I'm at a conference now
so I don't think I'll get around to it this week ;)

--

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
