Nishanth Aravamudan | 4 Sep 00:22 2007

Re: [RFC][PATCH] stack_grow_into_huge: fix-up testcase for 64-bit binaries

On 31.08.2007 [12:53:37 +1000], David Gibson wrote:
> On Thu, Aug 30, 2007 at 09:59:00AM -0700, Nishanth Aravamudan wrote:
> > On 30.08.2007 [13:06:51 +1000], David Gibson wrote:
> > > On Wed, Aug 29, 2007 at 04:20:10PM -0700, Nishanth Aravamudan wrote:
> > > > stack_grow_into_huge: fix-up testcase for 64-bit binaries
> > > > 
> > > > Currently, we mmap in a huge page arbitrarily and then start growing the
> > > > stack down. For 64-bit binaries, though, the stack may be rather high up
> > > > in the address space, and there is no guarantee our huge page mapping is
> > > > anywhere near it. So, instead, try to mmap() in a hugepage very close to
> > > > the stack. If we fail, just keep moving down a hugepage at a time, until
> > > > we succeed. For x86/x86_64, this should happen relatively quickly. For
> > > > ppc64, this should put us in the segment just below the one containing
> > > > the stack.
> > > > 
> > > > Then, in the child process, only grow the stack down to the point where
> > > > it is below the mmap() address. If we're still running at that point,
> > > > we've clearly not been signalled.
> > > > 
> > > > Tested on x86_64, but wanted to get some review from others.
> > > 
> > > Heh.  I was thinking about doing something vaguely like this for
> > > ppc64, since the 64-bit version of the test takes so long to run.
> > 
> > Yep.
> > 
> > > I think what you have is good - certainly makes the test more robust.
> > > However, it doesn't actually speed the test up all that much for
> > > ppc64.  There, the hugepage has to go not in the previous segment from
> > > the stack but in the previous "address space slice" since we only keep
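
A rough sketch of the probing approach described in the patch description
above (an illustration only, not the actual testcase code; the hugetlbfs fd
handling, the hugepage-size lookup and the use of MAP_FIXED are assumptions
of this sketch):

#include <sys/mman.h>

/*
 * Try to place a hugepage mapping as close below the current stack as
 * possible, walking down one hugepage at a time until the kernel accepts
 * the placement.  'fd' is an open file on a hugetlbfs mount.
 */
static void *probe_hugepage_below_stack(int fd, unsigned long hpage_size)
{
	char marker;		/* lives on the stack */
	unsigned long addr;
	void *p;

	/* Start one hugepage below the hugepage-aligned stack address. */
	addr = ((unsigned long)&marker & ~(hpage_size - 1)) - hpage_size;

	for (; addr >= hpage_size; addr -= hpage_size) {
		p = mmap((void *)addr, hpage_size, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, fd, 0);
		if (p != MAP_FAILED)
			return p;	/* mapping landed just below the stack */
	}
	return MAP_FAILED;
}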

Nishanth Aravamudan | 4 Sep 00:26 2007

[RFC][PATCH v3] stack_grow_into_huge: fix-up testcase for 64-bit binaries

On 31.08.2007 [12:55:04 +1000], David Gibson wrote:
> On Thu, Aug 30, 2007 at 02:03:53PM -0700, Nishanth Aravamudan wrote:
> > On 30.08.2007 [13:06:51 +1000], David Gibson wrote:
> > > On Wed, Aug 29, 2007 at 04:20:10PM -0700, Nishanth Aravamudan wrote:
> > > > stack_grow_into_huge: fix-up testcase for 64-bit binaries

<snip>

> > diff --git a/tests/stack_grow_into_huge.c b/tests/stack_grow_into_huge.c
> > index 8507df9..9a63d6b 100644
> > --- a/tests/stack_grow_into_huge.c
> > +++ b/tests/stack_grow_into_huge.c
> > @@ -48,13 +48,19 @@
> >   * 0d59a01bc461bbab4017ff449b8401151ef44cf6.
> >   */
> >  
> > -void do_child()
> > +#ifdef __LP64__
> > +#define STACK_ALLOCATION_SIZE	(256*1024*1024)
> > +#else
> > +#define STACK_ALLOCATION_SIZE	(16*1024*1024)
> > +#endif
> 
> Uh... you don't appear to be actually using this constant below....

D'oh!

> More subtly, I had been thinking of actually computing a chunksize
> based on the distance between the initial and target stack addresses.

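For reference, a minimal sketch of how the STACK_ALLOCATION_SIZE constant
from the hunk above could drive the child's stack growth (an illustration of
the idea under discussion, not the v3 patch itself):

#include <alloca.h>

#ifdef __LP64__
#define STACK_ALLOCATION_SIZE	(256*1024*1024)
#else
#define STACK_ALLOCATION_SIZE	(16*1024*1024)
#endif

/*
 * Grow the stack in fixed-size steps until it has passed the hugepage
 * mapping at 'stop_address'.  If we are still running by then, the child
 * was not signalled and the test can pass.
 */
static void do_child(void *stop_address)
{
	volatile char *buf;

	do {
		buf = alloca(STACK_ALLOCATION_SIZE);
		/* Touch the allocation so the stack VMA really grows. */
		buf[0] = 0;
		buf[STACK_ALLOCATION_SIZE - 1] = 0;
	} while ((void *)buf > stop_address);
}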

Nishanth Aravamudan | 5 Sep 02:01 2007

libhugetlbfs 1.2 released

Hello all,

libhugetlbfs 1.2 is out the door on sf.net. Adam is out of town, so the
ozlabs git tree won't be updated until Monday. This release brings:

New features

* Partial segment remapping. This allows non-relinked binaries to try
  to take advantage of libhugetlbfs' segment remapping code. Large
  segments are required, especially on Power. This feature is useful for
  estimating huge page performance; however, full relinking will still
  perform better.
* Add extra debugging for binaries that may run out of address
  space.
* Log library version when HUGETLB_VERBOSE is enabled.
* Beginning support for ia64 and sparc64.
* New test to check handling of misaligned mmap() parameters.

Bug fixes

* Fix EH_FRAME segment. Fixes some C++ applications.
* Rework PLT detection to work better on Power.
* Add per-target-arch syscall stubs to the library. These
  provide reliable error messages from elflink.c if errors occur while the
  program segments are unmapped.
* Add proper SONAME to shared libs.
* Makefile respects CFLAGS/LDFLAGS/CPPFLAGS environment
  variables.
* Make mlock() failure non-fatal.


David Gibson | 4 Sep 02:08 2007

Re: [RFC][PATCH v3] stack_grow_into_huge: fix-up testcase for 64-bit binaries

On Mon, Sep 03, 2007 at 03:26:04PM -0700, Nishanth Aravamudan wrote:
> On 31.08.2007 [12:55:04 +1000], David Gibson wrote:
> > On Thu, Aug 30, 2007 at 02:03:53PM -0700, Nishanth Aravamudan wrote:
> > > On 30.08.2007 [13:06:51 +1000], David Gibson wrote:
> > > > On Wed, Aug 29, 2007 at 04:20:10PM -0700, Nishanth Aravamudan wrote:
> > > > > stack_grow_into_huge: fix-up testcase for 64-bit binaries
> 
> <snip>
> 
> > > diff --git a/tests/stack_grow_into_huge.c b/tests/stack_grow_into_huge.c
> > > index 8507df9..9a63d6b 100644
> > > --- a/tests/stack_grow_into_huge.c
> > > +++ b/tests/stack_grow_into_huge.c
> > > @@ -48,13 +48,19 @@
> > >   * 0d59a01bc461bbab4017ff449b8401151ef44cf6.
> > >   */
> > >  
> > > -void do_child()
> > > +#ifdef __LP64__
> > > +#define STACK_ALLOCATION_SIZE	(256*1024*1024)
> > > +#else
> > > +#define STACK_ALLOCATION_SIZE	(16*1024*1024)
> > > +#endif
> > 
> > Uh... you don't appear to be actually using this constant below....
> 
> D'oh!
> 
> > More subtly, I had been thinking of actually computing a chunksize
> > based on the distance between the initial and target stack addresses.

Nishanth Aravamudan | 5 Sep 03:22 2007

Re: [RFC][PATCH v3] stack_grow_into_huge: fix-up testcase for 64-bit binaries

On 04.09.2007 [10:08:44 +1000], David Gibson wrote:
> On Mon, Sep 03, 2007 at 03:26:04PM -0700, Nishanth Aravamudan wrote:
> > On 31.08.2007 [12:55:04 +1000], David Gibson wrote:
> > > On Thu, Aug 30, 2007 at 02:03:53PM -0700, Nishanth Aravamudan wrote:
> > > > On 30.08.2007 [13:06:51 +1000], David Gibson wrote:
> > > > > On Wed, Aug 29, 2007 at 04:20:10PM -0700, Nishanth Aravamudan wrote:
> > > > > > stack_grow_into_huge: fix-up testcase for 64-bit binaries
> > 
> > <snip>
> > 
> > > > diff --git a/tests/stack_grow_into_huge.c b/tests/stack_grow_into_huge.c
> > > > index 8507df9..9a63d6b 100644
> > > > --- a/tests/stack_grow_into_huge.c
> > > > +++ b/tests/stack_grow_into_huge.c
> > > > @@ -48,13 +48,19 @@
> > > >   * 0d59a01bc461bbab4017ff449b8401151ef44cf6.
> > > >   */
> > > >  
> > > > -void do_child()
> > > > +#ifdef __LP64__
> > > > +#define STACK_ALLOCATION_SIZE	(256*1024*1024)
> > > > +#else
> > > > +#define STACK_ALLOCATION_SIZE	(16*1024*1024)
> > > > +#endif
> > > 
> > > Uh... you don't appear to be actually using this constant below....
> > 
> > D'oh!
> > 
> > > More subtly, I had been thinking of actually computing a chunksize

Adam Litke | 13 Sep 19:59 2007

[PATCH 1/5] hugetlb: Account for hugepages as locked_vm


Hugepages allocated to a process are pinned into memory and are not
reclaimable.  Currently they do not contribute towards the process' locked
memory.  This patch counts those pages towards the process' 'locked_vm' total.

NOTE: The locked_vm counter is only updated at fault and unmap time.  Huge
pages are different from regular mlocked memory which is faulted in all at
once.  Therefore, it does not make sense to charge at mmap time for huge
page mappings.  This difference results in a deviation from normal mlock
accounting which cannot be trivially reconciled given the inherent
differences with huge pages.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 mm/hugetlb.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de4cf45..1dfeafa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -428,6 +428,7 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			continue;

 		page = pte_page(pte);
+		mm->locked_vm -= HPAGE_SIZE >> PAGE_SHIFT;
 		if (pte_dirty(pte))
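
The hunk above shows the unmap-time decrement; per the patch description, the
fault path gets the complementary increment once a huge page has been
instantiated.  Roughly (a paraphrase, not a verbatim hunk from this patch):

	/* In the hugetlb fault path, after a huge page has been allocated
	 * and mapped for this mm, charge it against locked_vm in units of
	 * base pages. */
	mm->locked_vm += HPAGE_SIZE >> PAGE_SHIFT;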

Adam Litke | 13 Sep 19:58 2007

[PATCH 0/5] [hugetlb] Dynamic huge page pool resizing


In most real-world scenarios, configuring the size of the hugetlb pool
correctly is a difficult task.  If too few pages are allocated to the pool,
applications using MAP_SHARED may fail to mmap() a hugepage region and
applications using MAP_PRIVATE may receive SIGBUS.  Isolating too much memory
in the hugetlb pool means it is not available for other uses, especially those
programs not using huge pages.

The obvious answer is to let the hugetlb pool grow and shrink in response to
the runtime demand for huge pages.  The work Mel Gorman has been doing to
establish a memory zone for movable memory allocations makes dynamically
resizing the hugetlb pool reliable within the limits of that zone.  This patch
series implements dynamic pool resizing for private and shared mappings while
being careful to maintain existing semantics.  Please reply with your comments
and feedback; even just to say whether it would be a useful feature to you.
Thanks.

How it works
============

Upon depletion of the hugetlb pool, rather than reporting an error immediately,
first try to allocate the needed huge pages directly from the buddy allocator.
Care must be taken to avoid unbounded growth of the hugetlb pool, so the
hugetlb filesystem quota is used to limit overall pool size.  Additionally we
classify hugepages as locked memory (since that is what they actually are), and
only grow the pool when the process would not exceed its locked_vm limit.

The real work begins when we decide there is a shortage of huge pages.  What
happens next depends on whether the pages are for a private or shared mapping.
Private mappings are straightforward.  At fault time, if alloc_huge_page()
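
The private-mapping case that the cover letter begins to describe can be
summarised as a sketch of the fault-time allocation policy (paraphrased from
the description above; within_hugetlbfs_quota() and within_locked_vm_limit()
are hypothetical stand-ins for the checks the patches actually perform, and
alloc_buddy_huge_page() stands for the buddy-backed allocation path the
series adds):

static struct page *alloc_huge_page_sketch(struct vm_area_struct *vma,
					   unsigned long addr)
{
	struct page *page;

	/* Fast path: take a page from the statically configured pool. */
	page = dequeue_huge_page(vma, addr);
	if (page)
		return page;

	/* Pool depleted: grow it from the buddy allocator, bounded by the
	 * hugetlbfs quota and the process' locked memory limit. */
	if (!within_hugetlbfs_quota(vma) || !within_locked_vm_limit(vma->vm_mm))
		return NULL;

	return alloc_buddy_huge_page(vma, addr);	/* accounted as surplus */
}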

Adam Litke | 13 Sep 19:59 2007

[PATCH 2/5] hugetlb: Move update_and_free_page


This patch simply moves update_and_free_page() so that it can be reused
later in this patch series.  The implementation is not changed.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 mm/hugetlb.c |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1dfeafa..50195a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -90,6 +90,21 @@ static struct page *dequeue_huge_page(struct vm_area_struct *vma,
 	return page;
 }

+static void update_and_free_page(struct page *page)
+{
+	int i;
+	nr_huge_pages--;
+	nr_huge_pages_node[page_to_nid(page)]--;
+	for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
+		page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
+				1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
+				1 << PG_private | 1<< PG_writeback);
+	}
+	set_compound_page_dtor(page, NULL);

Adam Litke | 13 Sep 19:59 2007

[PATCH 3/5] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings


Because we overcommit hugepages for MAP_PRIVATE mappings, it is possible
that the hugetlb pool will be exhausted or completely reserved when a
hugepage is needed to satisfy a page fault.  Before killing the process in
this situation, try to allocate a hugepage directly from the buddy
allocator.  Only do this if the process would remain within its locked_vm
memory limits.

The explicitly configured pool size becomes a low watermark.  When the pool
is dynamically grown, the allocated huge pages are accounted as a surplus over
the watermark.  As huge pages are freed on a node, surplus pages are
released to the buddy allocator so that the pool will shrink back to the
watermark.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 mm/hugetlb.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 50195a2..ec5207e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -27,6 +27,7 @@ unsigned long max_huge_pages;
 static struct list_head hugepage_freelists[MAX_NUMNODES];
 static unsigned int nr_huge_pages_node[MAX_NUMNODES];
 static unsigned int free_huge_pages_node[MAX_NUMNODES];
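
The "shrink back to the watermark" behaviour described above amounts to the
following on the free path (a paraphrase of the commit message, not a hunk
from the patch; locking and the free_huge_pages counters are omitted):

static void free_huge_page_sketch(struct page *page)
{
	int nid = page_to_nid(page);

	if (surplus_huge_pages_node[nid]) {
		/* The pool is above its configured size: hand the page back
		 * to the buddy allocator so the pool shrinks toward the
		 * watermark. */
		surplus_huge_pages_node[nid]--;
		update_and_free_page(page);
	} else {
		/* At or below the watermark: keep the page in the pool. */
		enqueue_huge_page(page);
	}
}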

Adam Litke | 13 Sep 19:59 2007

[PATCH 4/5] hugetlb: Try to grow hugetlb pool for MAP_SHARED mappings


Shared mappings require special handling because the huge pages needed to
fully populate the VMA must be reserved at mmap time.  If not enough pages
are available when making the reservation, allocate all of the shortfall at
once from the buddy allocator and add the pages directly to the hugetlb
pool.  If they cannot be allocated, then fail the mapping.  The page
surplus is accounted for in the same way as for private mappings; faulted
surplus pages will be freed at unmap time.  Reserved surplus pages that were
never faulted in must be freed separately when their reservation is released.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 mm/hugetlb.c |  161 ++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 138 insertions(+), 23 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ec5207e..0cedcd0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -28,6 +28,7 @@ static struct list_head hugepage_freelists[MAX_NUMNODES];
 static unsigned int nr_huge_pages_node[MAX_NUMNODES];
 static unsigned int free_huge_pages_node[MAX_NUMNODES];
 static unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+static unsigned long unused_surplus_pages;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;

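The mmap-time behaviour described above reduces to something like the
following in the reservation path (again a paraphrase; gather_surplus_pages()
is used here as an illustrative name, and locking and counter updates are
omitted):

static int hugetlb_reserve_sketch(long npages)
{
	long needed = npages - free_huge_pages;	/* shortfall, if any */

	if (needed <= 0)
		return 0;	/* the existing pool covers the reservation */

	/* Allocate the whole shortfall from the buddy allocator at once and
	 * add the pages to the pool as surplus; otherwise fail the mmap(). */
	if (gather_surplus_pages(needed) < 0)
		return -ENOMEM;

	return 0;
}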

