Paul E. McKenney | 1 Jun 01:48 2003
Picon

Re: Always passing mm and vma down (was: [RFC][PATCH] Convert do_no_page() to a hook to avoid DFS race)

On Sat, May 31, 2003 at 10:46:18AM +0200, Ingo Oeser wrote:
> On Fri, May 30, 2003 at 04:41:50PM -0700, Paul E. McKenney wrote:
> > -struct page *
> > -ia32_install_shared_page (struct vm_area_struct *vma, unsigned long address, int no_share)
> > +int
> > +ia32_install_shared_page (struct mm_struct *mm, struct vm_area_struct *vma, unsigned long
address, int write_access, pmd_t *pmd)
> 
> Why do we always pass mm and vma down, even if vma->vm_mm
> contains the mm, where the vma belongs to? Is the connection
> between a vma and its mm also protected by the mmap_sem?
> 
> Is this really necessary or an oversight and we waste a lot of
> stack in a lot of places?
> 
> If we just need it for accounting: We need current->mm, if we
> need it to locate the next vma relatively to this vma, vma->vm_mm
> is the one.

Interesting point.  The original do_no_page() API does this
as well:

	static int
	do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
		   unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)

As does do_anonymous_page().  I assumed that there were corner
cases where this one-to-one correspondence did not exist, but
must confess that I did not go looking for them.

(Continue reading)

Paul E. McKenney | 1 Jun 01:51 2003
Picon

Re: [RFC][PATCH] Convert do_no_page() to a hook to avoid DFS race

On Fri, May 30, 2003 at 06:00:27PM -0700, Andrew Morton wrote:
> "Paul E. McKenney" <paulmck <at> us.ibm.com> wrote:
> > There
> > is still an inlined do_no_page() wrapper due to the fact that
> > do_anonymous_page() requires that the mm->page_table_lock be
> > held on entry, while the ->nopage callouts require that this
> > lock be dropped.
> 
> I sugest you change the ->nopage definition so that page_table_lock is held
> on entry to ->nopage, and ->nopage must drop it at some point.  This gives
> the nopage implementations some more flexibility and may perhaps eliminate
> that special case?

Will do!

> > This patch is untested.
> 
> I don't think there's a lot of point in making changes until the code which
> requires those changes is accepted into the tree.  Otherwise it may be
> pointless churn, and there's nothing in-tree to exercise the new features.

A GPLed use of these DFS features is expected Real Soon Now...

						Thanx, Paul
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart <at> kvack.org"> aart <at> kvack.org </a>

(Continue reading)

Helge Hafting | 1 Jun 14:06 2003
Picon
Picon

Re: 2.5.66-mm3 and raid still oopses, but later than mm1/mm2

2.5.70-mm3 with raid1 has improved to some extent,
the RAID crash now happens somewhat later.

I got a lot of kernel errors during RAID initialization,
with normal boot messages inbetween.  Nothing made it to
the logs.  I eventually got this:

Kernel BUG at mm/slab.c:1682
invalid operand 0000 [#1]
PREEMPTSMP
CPU:0
EIP at free_block+0x276/0x350
process fsck

Call trace:
drain_array
reap_timer_fnc
reap_timer_fnc
run_timer_softirq
do_softirq
smp_apic_timer
apic_timer_interrupt

<0> KErnel panic exception in interrupt
in interrupt - not syncing
reboot in 300 seconds

This is 2.5.70-mm3, with a patch that makes
matroxfb work so I could see the entire oops.

(Continue reading)

Andrea Arcangeli | 1 Jun 14:22 2003
Picon

Re: Always passing mm and vma down (was: [RFC][PATCH] Convert do_no_page() to a hook to avoid DFS race)

Hi Paul,

On Sat, May 31, 2003 at 04:48:16PM -0700, Paul E. McKenney wrote:
> On Sat, May 31, 2003 at 10:46:18AM +0200, Ingo Oeser wrote:
> > On Fri, May 30, 2003 at 04:41:50PM -0700, Paul E. McKenney wrote:
> > > -struct page *
> > > -ia32_install_shared_page (struct vm_area_struct *vma, unsigned long address, int no_share)
> > > +int
> > > +ia32_install_shared_page (struct mm_struct *mm, struct vm_area_struct *vma, unsigned long
address, int write_access, pmd_t *pmd)
> > 
> > Why do we always pass mm and vma down, even if vma->vm_mm
> > contains the mm, where the vma belongs to? Is the connection
> > between a vma and its mm also protected by the mmap_sem?
> > 
> > Is this really necessary or an oversight and we waste a lot of
> > stack in a lot of places?
> > 
> > If we just need it for accounting: We need current->mm, if we
> > need it to locate the next vma relatively to this vma, vma->vm_mm
> > is the one.
> 
> Interesting point.  The original do_no_page() API does this
> as well:
> 
> 	static int
> 	do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
> 		   unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
> 
> As does do_anonymous_page().  I assumed that there were corner
(Continue reading)

Ingo Oeser | 1 Jun 16:33 2003
Picon

Re: build problems on architectures where FIXADDR_* stuff is not constant

Hi Andrew,

I finally got around to handle this issue.

On Tue, May 13, 2003 at 06:12:27PM -0700, Andrew Morton wrote:
> Ingo Oeser <ingo.oeser <at> informatik.tu-chemnitz.de> wrote:
> >
> > What, if you fold a part of my page_walker code (the special case
> >  for one page actually) into the baseline?
> > 
> >  This will resolve the issue and should remove the vma argument
> >  from get_user_pages().
> > 
> >  It will also speed up that path a little.
> > 
> >  What do you think?
> 
> I'd be interested in seeing a diff.  Especially if it
> digs that crap out of get_user_pages().

Yes, but the change would be quite big:
   - follow_hugetlb_pages() is not needed anymore, since
     follow_page() can handle them and every user of
     get_user_pages() needing to get a vma back is just needing
     exactly one page.

   Required users of a possible get_single_user_page():  
      fs/binfmt_elf.c:elf_core_dump()
      kernel/ptrace.c:access_process_vm()

(Continue reading)

Ingo Oeser | 1 Jun 14:34 2003
Picon

Re: 2.5.70-bk4+: oops by mc -v /proc/bus/pci/00/00.0

Hi Andrew,

On Sat, May 31, 2003 at 07:54:14PM -0700, Andrew Morton wrote:
> It's pretty lame.  Really we need a proper vma constructor
> somewhere.

you mean sth. like this? (Just initialized the members, that I had useful
defaults for.)

--- linux-2.5.70/kernel/fork.c	Sun Jun  1 13:08:54 2003
+++ linux-2.5.70/kernel/fork.c	Sun Jun  1 14:26:19 2003
 <at>  <at>  -1147,6 +1147,35  <at>  <at> 
 /* SLAB cache for mm_struct structures (tsk->mm) */
 kmem_cache_t *mm_cachep;

+/* SLAB constructor for vm_area_struct objects */
+static void init_vm_area_struct(void *at, kmem_cache_t * dummy, 
+		unsigned long flags) 
+{
+	        struct vm_area_struct *t = at;
+
+		if (SLAB_CTOR_CONSTRUCTOR != 
+			(flags & (SLAB_CTOR_CONSTRUCTOR | SLAB_CTOR_VERIFY) ))
+			return;
+
+		/* these are NOT initialized, because they must be intialized
+		 * by the caller of kmem_cache_alloc():
+
+			t->vm_mm
+			t->vm_start
(Continue reading)

Paul E. McKenney | 1 Jun 21:33 2003
Picon

Re: [RFC][PATCH] Convert do_no_page() to a hook to avoid DFS race

On Fri, May 30, 2003 at 06:00:27PM -0700, Andrew Morton wrote:
> "Paul E. McKenney" <paulmck <at> us.ibm.com> wrote:
> > An alternative to this patch includes the nopagedone() patch posted
> > moments ago.  hch has also suggested that do_anonymous_page() be
> > converted to a ->nopage callout, but this would require that all
> > of the other ->nopage callouts drop mm->page_table_lock as their
> > first action.  If people believe that this is the right thing to
> > do, I will happily produce such a patch.
> 
> That sounds better to me.

Here is a patch, compiled on i386, untested.  I had to put the
pte_unmap() as well as the spin_unlock() into each ->nopage
function.  I would guess that this might be more attractive
if combined with a fix for the pagefault-truncate() race.  ;-)

					Thanx, Paul

diff -urN -X dontdiff linux-2.5.70-mm3/arch/i386/mm/hugetlbpage.c linux-2.5.70-mm3.install_new_page_hch/arch/i386/mm/hugetlbpage.c
--- linux-2.5.70-mm3/arch/i386/mm/hugetlbpage.c	2003-05-26 18:00:58.000000000 -0700
+++ linux-2.5.70-mm3.install_new_page_hch/arch/i386/mm/hugetlbpage.c	2003-06-01
10:43:06.000000000 -0700
 <at>  <at>  -487,11 +487,13  <at>  <at> 
  * hugegpage VMA.  do_page_fault() is supposed to trap this, so BUG is we get
  * this far.
  */
-static struct page *hugetlb_nopage(struct vm_area_struct *vma,
-				unsigned long address, int unused)
+static int hugetlb_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+			  unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
(Continue reading)

Andrew Morton | 1 Jun 21:58 2003

Re: 2.5.70-bk4+: oops by mc -v /proc/bus/pci/00/00.0

Ingo Oeser <ingo.oeser <at> informatik.tu-chemnitz.de> wrote:
>
> Hi Andrew,
> 
> On Sat, May 31, 2003 at 07:54:14PM -0700, Andrew Morton wrote:
> > It's pretty lame.  Really we need a proper vma constructor
> > somewhere.
> 
> you mean sth. like this? (Just initialized the members, that I had useful
> defaults for.)
> 
> ...
>  	vm_area_cachep = kmem_cache_create("vm_area_struct",
>  			sizeof(struct vm_area_struct), 0,
> -			0, NULL, NULL);
> +			0, init_vm_area_struct, NULL);
>  	if(!vm_area_cachep)
>  		panic("vma_init: Cannot alloc vm_area_struct SLAB cache");
>  

Well not really.  Yes, a slab-based ctor would be nice, but it requires that
all objects be kfreed in a "constructed" state.  So a full audit/fixup of
all users is needed.

For now I was thinking more along the lines of

struct vma_struct alloc_vma(gfp_flags)
{
	vma = kmem_cache_alloc();
	memset(vma);
(Continue reading)

Paul E. McKenney | 1 Jun 22:00 2003
Picon

Re: Always passing mm and vma down (was: [RFC][PATCH] Convert do_no_page() to a hook to avoid DFS race)

On Sun, Jun 01, 2003 at 02:22:00PM +0200, Andrea Arcangeli wrote:
> Hi Paul,

Hello, Andrea!

> On Sat, May 31, 2003 at 04:48:16PM -0700, Paul E. McKenney wrote:
[ . . . ]
> > Interesting point.  The original do_no_page() API does this
> > as well:
> > 
> > 	static int
> > 	do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > 		   unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
> > 
> > As does do_anonymous_page().  I assumed that there were corner
> > cases where this one-to-one correspondence did not exist, but
> > must confess that I did not go looking for them.
> > 
> > Or is this a performance issue, avoiding a dereference and
> > possible cache miss?
> 
> it's a performance microoptimization issue (the stack memory overhead is
> not significant), each new parameter to those calls will slowdown the
> kernel, especially with the x86 calling convetions it will hurt more
> than with other archs.  Jeff did something similar for UML (dunno if he
> also can use vma->vm_mm, anyways I #ifdeffed it out enterely for non UML
> compiles for exact this reason that I can't slowdown the whole
> production host kernel just for the uml compile) then there's also the
> additional extern call. So basically the patch would hurt until there
> are active users.
(Continue reading)

William Lee Irwin III | 1 Jun 23:53 2003

Re: 2.5.70-bk4+: oops by mc -v /proc/bus/pci/00/00.0

On Sun, Jun 01, 2003 at 12:58:09PM -0700, Andrew Morton wrote:
> Well not really.  Yes, a slab-based ctor would be nice, but it requires that
> all objects be kfreed in a "constructed" state.  So a full audit/fixup of
> all users is needed.
> For now I was thinking more along the lines of
> struct vma_struct alloc_vma(gfp_flags)
> {
> 	vma = kmem_cache_alloc();
> 	memset(vma);
> 	return vma;
> }
> And then deleting tons of open-coded init stuff elsewhere...

I'll add vma ctor bits to my TODO list, behind numerous other things..

-- wli
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart <at> kvack.org"> aart <at> kvack.org </a>


Gmane