Nobuhiko Yoshida | 1 Apr 2004 11:10

Re: Hugetlbpages in very large memory machines.......

Nobuhiko Yoshida <n-yoshida <at> pst.fujitsu.com> wrote:
> Hello,
> 
> > > +/*      update_mmu_cache(vma, address, *pte); */
> > 
> > I have not studied low level IA64 VM in detail, but don't you need
> > some kind of TLB flush here?
> 
> Oh! Yes.
> Perhaps, TLB flush is needed here.

- Below is a revised version of the patch I contributed before.
- I added a flush of the TLB and icache.

How To Use:
   1. Download linux-2.6.0 source tree
   2. Apply the below patch for linux-2.6.0

Thank you,
Nobuhiko Yoshida

diff -dupr linux-2.6.0/arch/i386/mm/hugetlbpage.c linux-2.6.0.HugeTLB/arch/i386/mm/hugetlbpage.c
--- linux-2.6.0/arch/i386/mm/hugetlbpage.c  2003-12-18 11:59:38.000000000 +0900
+++ linux-2.6.0.HugeTLB/arch/i386/mm/hugetlbpage.c  2004-04-01 11:48:56.000000000 +0900
@@ -142,8 +142,10 @@ int copy_hugetlb_page_range(struct mm_st
            goto nomem;
        src_pte = huge_pte_offset(src, addr);
        entry = *src_pte;
-       ptepage = pte_page(entry);
-       get_page(ptepage);

Andy Whitcroft | 1 Apr 2004 23:15

RE: [PATCH] [0/6] HUGETLB memory commitment

--On 31 March 2004 00:51 -0800 "Chen, Kenneth W" <kenneth.w.chen <at> intel.com> wrote:

> Under common case, worked perfectly!  But there are always corner cases.
>
> I can think of two ugliness:
> 1. very sparse hugetlb file.  I can mmap one hugetlb page, at offset
>    512 GB.  This would account 512GB + 1 hugetlb page as committed_AS.
>    But I only asked for one page mapping.  One can say it's a feature,
>    but I think it's a bug.
>
> 2. There is no error checking (to undo the committed_AS accounting) after
>    hugetlb_prefault(). hugetlb_prefault doesn't always succeed in allocat-
>    ing all the pages user asked for due to disk quota limit.  It can have
>    partial allocation which would put the committed_AS in a wedged state.

O.k. Here is the latest version of the hugetlb commitment tracking patch
(hugetlb_tracking_R4).  This now understands the difference between shm
allocated and mmap allocated and handles them differently.  This should
fix 1.  We now handle the commitments correctly under quota failures.
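To make corner case 1 concrete, here is a minimal arithmetic sketch in
user-space C.  It is not taken from the patch: the function names are
hypothetical, and the 4 MB huge page size is an assumption (i386 without
PAE); other architectures use different sizes.  It only contrasts charging
for the whole file extent with charging for the mapped range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 MB huge pages (i386 without PAE). */
#define HPAGE_SHIFT_ASSUMED 22
#define HPAGE_SIZE_ASSUMED  (UINT64_C(1) << HPAGE_SHIFT_ASSUMED)

/*
 * Hypothetical helper: pages charged when accounting follows the file
 * extent, i.e. everything from offset 0 to the end of the mapping.
 */
static uint64_t pages_charged_by_extent(uint64_t offset, uint64_t len)
{
    assert(offset % HPAGE_SIZE_ASSUMED == 0);
    assert(len % HPAGE_SIZE_ASSUMED == 0);
    return (offset + len) >> HPAGE_SHIFT_ASSUMED;
}

/* Hypothetical helper: pages charged when only the mapped range counts. */
static uint64_t pages_charged_by_mapping(uint64_t len)
{
    assert(len % HPAGE_SIZE_ASSUMED == 0);
    return len >> HPAGE_SHIFT_ASSUMED;
}
```

With these assumptions, one huge page mapped at offset 512 GB is charged as
the full extent under the first scheme but as a single page under the
second, which is the gap Ken describes.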

Please review.

-apw

---
 arch/i386/mm/hugetlbpage.c |   30 +++++++++++++------
 file                       |    1
 fs/hugetlbfs/inode.c       |   69 +++++++++++++++++++++++++++++++++++++++++++--
 fs/proc/proc_misc.c        |    1
 include/linux/hugetlb.h    |    5 +++

Andy Whitcroft | 2 Apr 2004 00:50

RE: [PATCH] [0/6] HUGETLB memory commitment

--On 01 April 2004 22:15 +0100 Andy Whitcroft <apw <at> shadowen.org> wrote:

> O.k. Here is the latest version of the hugetlb commitment tracking patch
> (hugetlb_tracking_R4).  This now understands the difference between shm
> allocated and mmap allocated and handles them differently.  This should
> fix 1.  We now handle the commitments correctly under quota failures.

Ok.  Here is R5, including all of the architectures hooked to the new
interface.  Plus the spurious debug is gone.

-apw

---
 arch/i386/mm/hugetlbpage.c    |   28 +++++++++++------
 arch/ia64/mm/hugetlbpage.c    |   28 +++++++++++------
 arch/ppc64/mm/hugetlbpage.c   |   28 +++++++++++------
 arch/sh/mm/hugetlbpage.c      |   28 +++++++++++------
 arch/sparc64/mm/hugetlbpage.c |   28 +++++++++++------
 fs/hugetlbfs/inode.c          |   66 ++++++++++++++++++++++++++++++++++++++++--
 fs/proc/proc_misc.c           |    1
 include/linux/hugetlb.h       |    5 +++
 8 files changed, 160 insertions(+), 52 deletions(-)

diff -X /home/apw/lib/vdiff.excl -rupN reference/arch/i386/mm/hugetlbpage.c current/arch/i386/mm/hugetlbpage.c
--- reference/arch/i386/mm/hugetlbpage.c	2004-04-02 00:38:24.000000000 +0100
+++ current/arch/i386/mm/hugetlbpage.c	2004-04-01 22:58:48.000000000 +0100
@@ -334,6 +334,7 @@ int hugetlb_prefault(struct address_spac
 	struct mm_struct *mm = current->mm;
 	unsigned long addr;
 	int ret = 0;

Chen, Kenneth W | 2 Apr 2004 01:09

RE: [PATCH] [0/6] HUGETLB memory commitment

>>>>> Andy Whitcroft wrote on Thu, April 01, 2004 1:16 PM
> --On 31 March 2004 00:51 -0800 "Chen, Kenneth W" <kenneth.w.chen <at> intel.com> wrote:
>
> > Under common case, worked perfectly!  But there are always corner cases.
> >
> > I can think of two ugliness:
> > 1. very sparse hugetlb file.  I can mmap one hugetlb page, at offset
> >    512 GB.  This would account 512GB + 1 hugetlb page as committed_AS.
> >    But I only asked for one page mapping.  One can say it's a feature,
> >    but I think it's a bug.
> >
> > 2. There is no error checking (to undo the committed_AS accounting) after
> >    hugetlb_prefault(). hugetlb_prefault doesn't always succeed in allocat-
> >    ing all the pages user asked for due to disk quota limit.  It can have
> >    partial allocation which would put the committed_AS in a wedged state.
>
> O.k. Here is the latest version of the hugetlb commitment tracking patch
> (hugetlb_tracking_R4).  This now understands the difference between shm
> allocated and mmap allocated and handles them differently.  This should
> fix 1.
>
> diff -X /home/apw/lib/vdiff.excl -rupN reference/arch/i386/mm/hugetlbpage.c current/arch/i386/mm/hugetlbpage.c
> --- reference/arch/i386/mm/hugetlbpage.c	2004-04-01 13:37:14.000000000 +0100
> +++ current/arch/i386/mm/hugetlbpage.c	2004-04-01 21:54:54.000000000 +0100
> @@ -355,30 +357,38 @@ int hugetlb_prefault(struct address_spac
>  			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
>  		page = find_get_page(mapping, idx);
>  		if (!page) {
> -			/* charge the fs quota first */
> +			/* charge against commitment */

Martin J. Bligh | 2 Apr 2004 19:54

2.6.5-rc3-mjb2

The patchset is meant to be pretty stable, not so much a testing ground.
Main differences from mainline are:

1. Better performance & resource consumption, particularly on larger machines.
2. Diagnosis tools (kgdb, early_printk, etc).
3. Kexec support.
4. ivtv drivers

I'd be very interested in feedback from anyone willing to test on any 
platform, however large or small.

ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.6.5-rc3/patch-2.6.5-rc3-mjb2.bz2

Since 2.6.5-rc3-mjb1 (~ = changed, + = added, - = dropped)

Notes: 
Resurrects most of the stuff that dropped out in -mjb1

-----------------------------------------------------------------------

Now in Linus' tree:

Dropped:

New:

+ 4g4g							Ingo Molnar
+ 4g_zap_low_mappings					Martin Lorenz
+ 4g4g_locked_copy					Dave McCracken
+ vgtod1						John Stultz

Ray Bryant | 3 Apr 2004 05:57

Re: [PATCH] HUGETLB memory commitment

Chen, Kenneth W wrote:

> 
> Can we just RIP this whole hugetlb page overcommit?
> 
> - Ken
> 
> 
>

Ken et al,

Perhaps the following patch might be more to your liking.  I'm sorry I haven't
been contributing to this discussion -- I've been off doing this code first
for Altix under 2.4.21 (one's got to eat, after all).  Now I've ported the
changes forward to Linux 2.6.5-rc3 and tested them.  The patch below is
relative to that version of Linux.

A few points to be made about this patch:

(1)  This patch includes "allocate on fault" and "hugetlb memory commit"
changes.  One can argue that this is mixing two changes into a single patch,
but the two changes seem intertwined to me -- one doesn't make sense without
the other.

(2)  I've only done the ia64 version.  I've not yet tackled Andrew's
suggestion that we move the common parts of the arch dependent hugetlbpage.c
up into ./mm.  So, since hugetlbfs_file_mmap() in this patch no longer calls
hugetlb_prefault(), this patch will break hugetlbpage support on architectures
other than ia64 until those architectures are fixed or we move the common code

Chen, Kenneth W | 4 Apr 2004 05:31

RE: [PATCH] HUGETLB memory commitment

>>>>> Ray Bryant wrote on Fri, April 02, 2004 7:57 PM
> Chen, Kenneth W wrote:
> >
> > Can we just RIP this whole hugetlb page overcommit?
> >
>
> Ken et al,
>
> Perhaps the following patch might be more to your liking.  I'm
> sorry I haven't been contributing to this discussion -- I've been
> off doing this code first for Altix under 2.4.21 (one's got to eat,
> after all).  Now I've ported the changes forward to Linux 2.6.5-rc3
> and tested them.  The patch below is relative to that version of Linux.

Somehow the patch came through with extra white space at the beginning of
each line, but s/^  / / fixes that up.

> The hugetlb memory commit code does this with a single global counter:
> htlbzone_reserved, and a per inode reserved page count.  The latter is
> used to decrement the global reserved page count when the inode is
> deleted or the file is truncated.

A simple counter won't work for different file offset mapping.  It has to
be some sort of per-inode, per-block reservation tracking.  I think we are
steering in the right direction though.
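A minimal user-space sketch of what per-inode, per-block tracking could
look like follows.  It is only an illustration under stated assumptions:
the struct, the function name and the fixed MAX_BLOCKS cap are all
hypothetical and come from no posted patch, and a kernel version would
need locking and a sparse structure rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 1024   /* hypothetical cap on huge pages per file */

/* Hypothetical per-inode state: one flag per huge-page-sized block. */
struct inode_resv {
    unsigned char block_reserved[MAX_BLOCKS]; /* 1 if block is reserved */
    unsigned long nr_reserved;                /* total reserved blocks  */
};

/*
 * Reserve blocks [first, first + count).  Returns how many blocks were
 * newly reserved (the amount to charge), or -1 if the range does not fit.
 * Blocks already reserved by an earlier mapping are not charged again.
 */
static long resv_add(struct inode_resv *r, unsigned long first,
                     unsigned long count)
{
    unsigned long i;
    long added = 0;

    if (first > MAX_BLOCKS || count > MAX_BLOCKS - first)
        return -1;

    for (i = first; i < first + count; i++) {
        if (!r->block_reserved[i]) {
            r->block_reserved[i] = 1;
            added++;
        }
    }
    r->nr_reserved += (unsigned long)added;
    return added;
}
```

Calling resv_add() once per mapping, with the block index derived from
the file offset, charges each block exactly once no matter how the
mappings overlap or where they start.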

> diff -Nru a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> --- a/fs/hugetlbfs/inode.c	Fri Apr  2 19:31:56 2004
> +++ b/fs/hugetlbfs/inode.c	Fri Apr  2 19:31:56 2004
> @@ -59,19 +58,34 @@

Ray Bryant | 5 Apr 2004 00:15

Re: [PATCH] HUGETLB memory commitment

Ken,

If you have user space code that tests this and can send it to me, I'll use it
to fix up the reservation and quota code to handle this case as well.

Thanks,

Chen, Kenneth W wrote:
>>>
> 
> 
> This assumes all mmap start from the same file offset. IMO, it's not
> generic enough. This code will only reserve 1 page for the following
> case, but actually there are 4 mapping totaling 4 pages:
> 
> mmap 1 page at file offset 0
> mmap 1 page at file offset HPAGE_SIZE,
> mmap 1 page at file offset HPAGE_SIZE*2,
> mmap 1 page at file offset HPAGE_SIZE*3,
> 
> Oh, this code broke file system quota accounting as well.
> 
> - Ken
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo <at> vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Protasevich, Natalie | 5 Apr 2004 18:12

[PATCH] 2.6.5- es7000 subarch update

Hello,

ES7000 has been failing to boot since the first couple of revisions of 2.6. This patch fixes the boot problem.
The patch also does some maintenance and cleanup for the es7000 subarch: APIC destinations were
corrected, a missing variable initialization was added, an extraneous file was removed, etc.
The patch was created against 2.6.5, compiled cleanly, and tested on an ES7000 system.

Thanks,
--Natalie
------------------------------------

diff -Naur linux6.5/arch/i386/mach-es7000/Makefile linux-2.6.5/arch/i386/mach-es7000/Makefile
--- linux6.5/arch/i386/mach-es7000/Makefile	2004-04-04 18:22:39.000000000 -0400
+++ linux-2.6.5/arch/i386/mach-es7000/Makefile	2004-04-05 00:07:13.000000000 -0400
@@ -2,4 +2,4 @@
 # Makefile for the linux kernel.
 #

-obj-y		:= setup.o topology.o es7000.o
+obj-y		:= setup.o es7000.o
diff -Naur linux6.5/arch/i386/mach-es7000/es7000.c linux-2.6.5/arch/i386/mach-es7000/es7000.c
--- linux6.5/arch/i386/mach-es7000/es7000.c	2004-04-04 18:22:39.000000000 -0400
+++ linux-2.6.5/arch/i386/mach-es7000/es7000.c	2004-04-05 00:07:13.000000000 -0400
@@ -82,6 +82,7 @@
 			host_addr = val;
 			host = (struct mip_reg *)val;
 			host_reg = __va(host);
+			mip_port = MIP_PORT(mi->mip_info);
 			val = MIP_RD_LO(mi->mip_reg);
 			mip_addr = val;

Ray Bryant | 5 Apr 2004 17:26

Re: [Lse-tech] RE: [PATCH] HUGETLB memory commitment

Ken,

Chen, Kenneth W wrote:

> 
> A simple counter won't work for different file offset mapping.  It has to
> be some sort of per-inode, per-block reservation tracking.  I think we are
> steering in the right direction though.
> 
> 
> 

OK, pardon my question about test code, that is trivial enough I guess.

Anyway, the only way I can see to make this work with non-zero offset is to 
hang a list of segment descriptors (offset and size) for each reserved segment 
off of the inode.  Then when a new mapping comes in, we search the segment 
list to see if the new offset and size overlaps with any of the existing 
reserved segments.  If it doesn't, then we make a new reservation (and request 
file system quota) for the current size, and add the current request to the 
reserved segment list.  If it does, and it fits entirely in a previously 
reserved segment, then no change to reservation/quota needs to be made.  If 
it only partially fits, then we need to make a new reservation/quota request 
for the number of new huge pages required and update the overlapping segment's 
length to reflect the new reservation.

Then in truncate_hugepages() we can search the segment list again, discarding 
full or partial segments that occur either entirely or partially beyond 
"lstart", as appropriate, and doing hugetlb_unreserve() and 
hugetlbfs_put_quota() for the appropriate number of pages.
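
The segment-list idea above can be sketched in user-space C roughly as
follows.  This is an illustration under assumptions, not kernel code:
the struct and function names and the fixed-size array are hypothetical,
offsets and sizes are in units of huge pages, and overlapping or adjacent
segments are merged into one (a slight generalization of "update the
overlapping segment's length").  The return values are the page counts
one would pass on to the reservation and quota calls.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGS 64   /* hypothetical cap on segments per inode */

struct resv_seg { unsigned long start, end; };  /* [start, end) */

struct resv_list {
    struct resv_seg seg[MAX_SEGS];  /* sorted by start, never overlapping */
    size_t n;
};

/* Reserve [start, start + len).  Returns pages newly reserved, or -1
 * if the list is full. */
static long resv_reserve(struct resv_list *l, unsigned long start,
                         unsigned long len)
{
    unsigned long end = start + len, covered = 0;
    unsigned long new_start = start, new_end = end;
    size_t i, first = 0, last, drop;

    if (len == 0)
        return 0;
    /* Skip segments that end strictly before the request. */
    while (first < l->n && l->seg[first].end < start)
        first++;
    /* Walk segments that overlap or touch the request. */
    for (last = first; last < l->n && l->seg[last].start <= end; last++) {
        struct resv_seg *s = &l->seg[last];
        unsigned long lo = s->start > start ? s->start : start;
        unsigned long hi = s->end < end ? s->end : end;

        if (hi > lo)
            covered += hi - lo;          /* already reserved pages */
        if (s->start < new_start)
            new_start = s->start;
        if (s->end > new_end)
            new_end = s->end;
    }
    if (first == last) {                 /* no overlap: insert a segment */
        if (l->n == MAX_SEGS)
            return -1;
        for (i = l->n; i > first; i--)
            l->seg[i] = l->seg[i - 1];
        l->n++;
    } else {                             /* merge [first, last) into one */
        drop = last - first - 1;
        for (i = last; i < l->n; i++)
            l->seg[i - drop] = l->seg[i];
        l->n -= drop;
    }
    l->seg[first].start = new_start;
    l->seg[first].end = new_end;
    return (long)(len - covered);
}

/* Drop everything at or beyond page index lstart; returns pages released. */
static unsigned long resv_truncate(struct resv_list *l, unsigned long lstart)
{
    unsigned long released = 0;

    while (l->n > 0) {
        struct resv_seg *s = &l->seg[l->n - 1];

        if (s->end <= lstart)
            break;
        if (s->start >= lstart) {        /* segment entirely beyond lstart */
            released += s->end - s->start;
            l->n--;
        } else {                         /* segment straddles lstart: trim */
            released += s->end - lstart;
            s->end = lstart;
            break;
        }
    }
    return released;
}
```

A request that fits inside an existing segment returns 0, a partial
overlap returns only the uncovered pages, and truncation returns the
number of pages to hand back.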