William Lee Irwin III | 5 Sep 2002 08:22

Re: statm_pgd_range() sucks!

On Thu, Aug 29, 2002 at 07:51:44PM -0700, Andrew Morton wrote:
>> BTW, Rohit's hugetlb patch touches proc_pid_statm(), so a diff on -mm3
>> would be appreciated.

On Wed, Sep 04, 2002 at 08:20:35PM -0700, William Lee Irwin III wrote:
> I lost track of what the TODO's were but this is of relatively minor
> import, and I lagged long enough this is against 2.5.33-mm2:

doh! I dropped a line merging by hand and broke VSZ

on top of the prior one:

diff -u linux-wli/fs/proc/array.c linux-wli/fs/proc/array.c
--- linux-wli/fs/proc/array.c		2002-09-02 23:37:17.000000000 -0700
+++ linux-wli/fs/proc/array.c		2002-09-02 23:37:17.000000000 -0700
@@ -409,6 +409,7 @@
 	resident = mm->rss;
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		int pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+		size += pages;
 		if (is_vm_hugetlb_page(vma)) {
 			if (!(vma->vm_flags & VM_DONTCOPY))
 				shared += pages;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/
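
For readers following the fix, here is a minimal user-space sketch of the accounting loop the hunk above touches. It is not the kernel code: struct vma_model, struct statm_model, statm_walk() and the VM_DONTCOPY value are hypothetical stand-ins, a 4 KiB page size is assumed, and only the lines visible in the hunk are modeled.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT  12        /* assumption: 4 KiB pages */
#define VM_DONTCOPY 0x1UL     /* hypothetical flag value, for this model only */

/* Hypothetical stand-in for the kernel's vm_area_struct. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	int is_hugetlb;           /* models is_vm_hugetlb_page(vma) */
	const struct vma_model *vm_next;
};

struct statm_model {
	unsigned long size;       /* total mapped pages, i.e. VSZ in pages */
	unsigned long shared;     /* mapped pages counted as shared */
};

/* Walk the VMA list and accumulate extent-based counters. */
static struct statm_model statm_walk(const struct vma_model *vma)
{
	struct statm_model st = { 0, 0 };

	for (; vma; vma = vma->vm_next) {
		assert(vma->vm_end >= vma->vm_start);
		unsigned long pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;

		st.size += pages;     /* the line the follow-up patch restores */
		if (vma->is_hugetlb && !(vma->vm_flags & VM_DONTCOPY))
			st.shared += pages;
		/* the rest of the per-VMA accounting is not shown in the hunk */
	}
	return st;
}
```

Without the `size += pages` line the walk still visits every VMA, but the size counter stays at zero, which is the VSZ breakage described above.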

Andrew Morton | 5 Sep 2002 08:49

Re: statm_pgd_range() sucks!

William Lee Irwin III wrote:
> 
> William Lee Irwin III wrote:
> >> I lost track of what the TODO's were but this is of relatively minor
> >> import, and I lagged long enough this is against 2.5.33-mm2:
> 
> On Wed, Sep 04, 2002 at 09:48:07PM -0700, Andrew Morton wrote:
> > Well the TODO was to worry about the (very) incorrect reporting of
> > mapping occupancy.  mmap(1G file), touch one byte of it (or none)
> > and the thing will report 1G?
> 
> I don't know of anything actually meant to report mapping occupancy
> (except full RSS) before or after this patch. Or have I blundered?

statm_pgd_range(pgd, vma->vm_start, vma->vm_end, &pages, &shared, &dirty, &total);
                                                  ^^^^^

`pages' there is the number of actually resident pages, yes?
And it gets fed into trs, drs and lrs.

But converting it to this:

+               int pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
...
+               if (vma->vm_flags & VM_SHARED)
+                       shared += pages;

Will mean that `shared' can be vastly overestimated.   I think??
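
To put numbers on the concern, here is a small sketch contrasting the two ways of counting. Everything in it is a model: struct mapping_model, its resident_pages field, both helper functions and the VM_SHARED value are hypothetical, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define PAGE_SHIFT 12         /* assumption: 4 KiB pages */
#define VM_SHARED  0x8UL      /* hypothetical flag value, for this model only */

/* Hypothetical model of one mapping: its extent plus how much was touched. */
struct mapping_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	unsigned long resident_pages;   /* pages actually faulted in */
};

/* Extent-based accounting, as in the quoted "+" lines. */
static unsigned long shared_by_extent(const struct mapping_model *m)
{
	assert(m->vm_end >= m->vm_start);
	if (!(m->vm_flags & VM_SHARED))
		return 0;
	return (m->vm_end - m->vm_start) >> PAGE_SHIFT;
}

/* Occupancy-based accounting: count only pages that are resident. */
static unsigned long shared_by_residency(const struct mapping_model *m)
{
	return (m->vm_flags & VM_SHARED) ? m->resident_pages : 0;
}
```

For a 1G shared mapping with a single byte touched, the extent-based count is 262144 pages while the occupancy-based count is 1.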

> On Wed, Sep 04, 2002 at 09:48:07PM -0700, Andrew Morton wrote:
[...]

Andrew Morton | 5 Sep 2002 09:20

MAP_SHARED handling

One thing bugs me a little bit.

A program has a huge MAP_SHARED segment and dirties it.  The VM
walks the LRU, propagating the pte dirtiness into the pageframe
and *immediately* writes the page out:

	switch (try_to_unmap(page)) {
	case SWAP_SUCCESS:
		break;
	}

	if (PageDirty(page))
		vm_writeback(page->mapping);

This has a few small irritations.

- We'll be calling ->vm_writeback() once per page, and it'll only
  discover a single dirty page on swapper_space.dirty_pages.

  This is a little CPU-inefficient.  Be nicer to build up a few
  dirty pages on swapper_space before launching vm_writeback
  against it.

- My dirty page accounting tells lies.  In /proc/meminfo, `Dirty'
  is just a few tens of kilobytes, and `Writeback' is a meg or two.

  But in reality, there are a huge number of dirty pages - we just
  don't know about them yet.

  And there's some benefit in making `Dirty' more accurate, because
[...]
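
The first irritation is easy to see in a toy model. The sketch below only counts how often writeback gets launched; struct dirty_list_model and all three functions are hypothetical and stand in for nothing more than "a mapping with a dirty list".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a mapping's dirty list; not the kernel's address_space. */
struct dirty_list_model {
	size_t nr_dirty;            /* pages queued on the dirty list */
	size_t nr_writeback_calls;  /* how many times writeback was launched */
	size_t nr_written;          /* pages handed to I/O so far */
};

/* Launch writeback against everything currently queued on the mapping. */
static void writeback_model(struct dirty_list_model *m)
{
	m->nr_written += m->nr_dirty;
	m->nr_dirty = 0;
	m->nr_writeback_calls++;
}

/*
 * Queue one newly discovered dirty page. With batch == 1 this behaves like
 * the per-page case; a larger batch defers the launch until enough pages
 * have built up.
 */
static void note_dirty_page(struct dirty_list_model *m, size_t batch)
{
	assert(batch > 0);
	if (++m->nr_dirty >= batch)
		writeback_model(m);
}

/* Feed npages dirty pages through the model and flush any remainder. */
static size_t count_writeback_calls(size_t npages, size_t batch)
{
	struct dirty_list_model m = { 0, 0, 0 };

	for (size_t i = 0; i < npages; i++)
		note_dirty_page(&m, batch);
	if (m.nr_dirty)
		writeback_model(&m);
	assert(m.nr_written == npages);
	return m.nr_writeback_calls;
}
```

With 1000 dirty pages, a batch of 1 launches writeback 1000 times and a batch of 32 launches it 32 times. The batch size is an arbitrary illustration, not a tuning suggestion.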

William Lee Irwin III | 5 Sep 2002 09:07

Re: statm_pgd_range() sucks!

William Lee Irwin III wrote:
>> I don't know of anything actually meant to report mapping occupancy
>> (except full RSS) before or after this patch. Or have I blundered?

On Wed, Sep 04, 2002 at 11:49:21PM -0700, Andrew Morton wrote:
> statm_pgd_range(pgd, vma->vm_start, vma->vm_end, &pages, &shared, &dirty, &total);
>                                                   ^^^^^
> `pages' there is the number of actually resident pages, yes?
> And it gets fed into trs, drs and lrs.
> But converting it to this:
> +               int pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> ...
> +               if (vma->vm_flags & VM_SHARED)
> +                       shared += pages;
> Will mean that `shared' can be vastly overestimated.   I think??

Wildly guessing from the implementation, `shared' can only be accurately
estimated in one of two ways:

(1) maintaining RSS counters in the mm's updated on PG_direct split/coalesce
(2) walking the pagetables and adding up things with PG_direct clear

... both are too computationally expensive, so I deliberately changed
the semantics to "amount of mem mapped as MAP_SHARED".

Prior to this it was pure garbage because it checked page->count > 1.
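
One way to see why a `count > 1` test can mislead, as a sketch under a stated assumption: the reference count is taken to include holders other than mappings (for instance the page cache), so it can exceed one for a page that only a single process maps. struct page_model, its nr_mappers field and both functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical page model; the real struct page has many more fields. */
struct page_model {
	int count;        /* all references, assumed to include non-mapping holders */
	int nr_mappers;   /* address spaces that actually map the page */
};

/* The old heuristic: any page with more than one reference is "shared". */
static int old_shared_heuristic(const struct page_model *p)
{
	return p->count > 1;
}

/* What "shared" would ideally mean: mapped by more than one address space. */
static int mapped_by_many(const struct page_model *p)
{
	return p->nr_mappers > 1;
}
```

Under that assumption, a page with one mapper plus one other reference has count 2, so the old heuristic reports it as shared although only one process maps it.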

William Lee Irwin III wrote:
>> Hmm, that could get hairy depending on how we want them grouped. It
>> might be better just to maintain RSS counters for the kinds of mappings
[...]

Andrew Morton | 5 Sep 2002 10:22

2.5.33-mm3

+filemap-integration.patch

  Cleanup and code consolidation for readv and writev: generic_file_read()
  and generic_file_write() take an iovec, and tons of code goes away.

  A work in progress.

+direct-io-alignment.patch

  Allow finer-than-fs-blocksize alignment for O_DIRECT.

  A work in progress, which has a bit of a correctness problem, actually.

+mmap-fixes.patch

  Some cleanups and fixes from Christoph.

Also quite a lot of fiddling with the new non-blocking page reclaim
code.  This works well.

linus.patch
  cset-1.575-to-1.600.txt.gz

scsi_hack.patch
  Fix block-highmem for scsi

ext3-htree.patch
  Indexed directories for ext3

zone-pages-reporting.patch
[...]

Rik van Riel | 5 Sep 2002 14:35

Re: MAP_SHARED handling

On Thu, 5 Sep 2002, Andrew Morton wrote:

> One thing bugs me a little bit.

> - We'll be calling ->vm_writeback() once per page, and it'll only
>   discover a single dirty page on swapper_space.dirty_pages.

> So....  Could we do something like: if the try_to_unmap() call turned
> the page from !PageDirty to PageDirty, give it another go around the
> list?

FreeBSD is doing this and seems to be getting good results
with it, so I guess it'll improve our VM too ;)

Rik
--

-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


Andrew Morton | 5 Sep 2002 07:43

Re: nonblocking-vm.patch

For the record...

One thing we could do is to make the heavy write()r perform
blocking writeback in the page allocator:

generic_file_write()
{
	current->bdi = mapping->backing_dev_info;
	...
	current->bdi = NULL;
}

shrink_list()
{
	...
	if (PageDirty(page) && mapping->backing_dev_info == current->bdi)
		writeback(page->mapping);
	...
}

So when that writer allocates a page, he gets to clean up
his own mess, rather than scanning past those pages.

We have to write back just that queue; otherwise we get back to
the situation where one queue enters congested and that blocks the
whole world.
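
A user-space model of the idea, for illustration only. struct bdi_model, struct reclaim_page, struct task_model and shrink_list_model() are hypothetical stand-ins for backing_dev_info, struct page, current and shrink_list(); the NULL check on the task's bdi is a choice made for this model and is not in the pseudocode above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for backing_dev_info. */
struct bdi_model { int id; };

struct reclaim_page {
	int dirty;                     /* models PageDirty(page) */
	const struct bdi_model *bdi;   /* queue backing this page's mapping */
};

struct task_model {
	const struct bdi_model *bdi;   /* set while the task is inside write() */
};

/*
 * Reclaim pass run on behalf of `task`: write back only dirty pages that
 * belong to the queue the task is currently writing to, and scan past
 * dirty pages of other queues. Returns the number of pages cleaned.
 */
static size_t shrink_list_model(const struct task_model *task,
				struct reclaim_page *pages, size_t n)
{
	size_t cleaned = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].dirty)
			continue;
		if (task->bdi != NULL && pages[i].bdi == task->bdi) {
			pages[i].dirty = 0;    /* models blocking writeback */
			cleaned++;
		}
	}
	return cleaned;
}
```

A task that is not inside write() has no bdi recorded and cleans nothing; a heavy writer cleans only pages on its own queue, so a congested queue elsewhere does not stall it.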

It's just an idea to bear in mind - balance_dirty_pages() is
supposed to be the place where this happens, but the above would
perhaps mop up some mmapped dirty memory, stray dirty pages which
[...]

Daniel Phillips | 5 Sep 2002 18:52

Re: MAP_SHARED handling

On Thursday 05 September 2002 09:20, Andrew Morton wrote:
> One thing bugs me a little bit.
> 
> A program has a huge MAP_SHARED segment and dirties it.  The VM
> walks the LRU, propagating the pte dirtiness into the pageframe
> and *immediately* writes the page out:
> 
> 	switch (try_to_unmap(page)) {
> 	case SWAP_SUCCESS:
> 		break;
> 	}
> 
> 	if (PageDirty(page))
> 		vm_writeback(page->mapping);
> 
> This has a few small irritations.
> 
> - We'll be calling ->vm_writeback() once per page, and it'll only
>   discover a single dirty page on swapper_space.dirty_pages.
> 
>   This is a little CPU-inefficient.  Be nicer to build up a few
>   dirty pages on swapper_space before launching vm_writeback
>   against it.
> 
> - My dirty page accounting tells lies.  In /proc/meminfo, `Dirty'
>   is just a few tens of kilobytes, and `Writeback' is a meg or two.
> 
>   But in reality, there are a huge number of dirty pages - we just
>   don't know about them yet.
> 
[...]

Steven Cole | 5 Sep 2002 19:23

2.5.33-mm3 dbench hang and 2.5.33 page allocation failures

I booted 2.5.33-mm3 and ran dbench with increasing
numbers of clients: 1,2,3,4,6,8,10,12,16,etc. while
running vmstat -n 1 600 from another terminal.

After about 3 minutes, the output from vmstat stopped,
and the dbench 16 output stopped.  The machine would
respond to pings, but not to anything else. I had to 
hard-reset the box. Nothing interesting was saved in 
/var/log/messages. I have the output from vmstat if needed.

The test box is dual p3, 1GB, scsi, ext3 fs.
Kernels are SMP, HIGHMEM4G, no PREEMPT, no HIGHPTE.

Earlier this morning, I ran 2.5.33 and the dbench test and got many
page allocation failure messages before I terminated the test.

Steven

Sep  5 07:20:01 spc5 kernel: dbench: page allocation failure. order:0, mode:0x50
Sep  5 07:28:32 spc5 kernel: dbench: page allocation failure. order:0, mode:0x50
Sep  5 07:37:46 spc5 last message repeated 2 times
Sep  5 07:37:47 spc5 last message repeated 9 times
Sep  5 07:37:47 spc5 kernel: klogd: page allocation failure. order:0, mode:0x50
Sep  5 07:37:56 spc5 last message repeated 23 times
Sep  5 07:37:56 spc5 kernel: dbench: page allocation failure. order:0, mode:0x50
Sep  5 07:38:00 spc5 last message repeated 17 times
Sep  5 07:38:00 spc5 kernel: dbench: page allocation failure. order:0, mode:0x20
Sep  5 07:38:00 spc5 kernel: dbench: page allocation failure. order:0, mode:0x50
Sep  5 07:38:06 spc5 last message repeated 22 times
Sep  5 07:41:01 spc5 kernel: kjournald: page allocation failure. order:0, mode:0x0
[...]

Andrew Morton | 5 Sep 2002 19:47

Re: MAP_SHARED handling

Daniel Phillips wrote:
> 
> ...
> 
> Why not just ensure the page is scheduled for writing, sometime,
> we don't care exactly when as long as it's relatively soon.  Just bump
> the page's mapping to the hot end of your writeout list and let things
> take their course.

Good point.  Marking the pages dirty and not starting IO on them
exposes them to pdflush.  Chances are, by the time those pages
come around again, they'll all be under writeback or clean.

And, umm, yes.  If a pass across all the zones in the classzone
doesn't free enough stuff, we run wakeup_bdflush() and then
take an up-to-quarter-second nap.  So pdflush will immediately
start working on all those pages which we just marked dirty.
It looks about right.
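
The sequence described here can be sketched as a toy model: the first reclaim pass only propagates dirtiness and moves on, a background flusher cleans the pages, and a later pass frees them. struct lru_page, reclaim_pass() and flusher_pass() are hypothetical; the flusher is simplified to "every dirty page becomes clean", and the wakeup and nap between passes are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page model: pte dirtiness is tracked apart from PageDirty. */
struct lru_page {
	int pte_dirty;     /* dirtied through a MAP_SHARED mapping */
	int page_dirty;    /* models PageDirty(page) */
	int reclaimed;
};

/*
 * One reclaim pass. A page whose dirtiness was only just discovered is
 * marked dirty and left for a later pass instead of being written at once.
 */
static size_t reclaim_pass(struct lru_page *lru, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct lru_page *p = &lru[i];

		if (p->reclaimed)
			continue;
		if (p->pte_dirty) {        /* models try_to_unmap() */
			p->pte_dirty = 0;
			p->page_dirty = 1;
			continue;              /* another trip around the list */
		}
		if (p->page_dirty)
			continue;              /* still waiting for the flusher */
		p->reclaimed = 1;
		freed++;
	}
	return freed;
}

/* Models the background flusher (pdflush) cleaning every dirty page. */
static void flusher_pass(struct lru_page *lru, size_t n)
{
	for (size_t i = 0; i < n; i++)
		lru[i].page_dirty = 0;
}
```

With two freshly dirtied pages and one clean page, the first pass frees only the clean one; after the flusher has run, the next pass frees the other two.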

