Thibaut VARENE | 1 May 20:29 2010

Re: Patch for segfaults in minifail tests

On Fri, Apr 30, 2010 at 8:21 PM, James Bottomley
<James.Bottomley <at> hansenpartnership.com> wrote:
> T-bone is keeping a web page with all the failing tests on now:
>
> http://wiki.parisc-linux.org/TestCases

About this: I've had to blaze through hundreds of emails to get this
data together. TBH it's been quite hard to figure out which testcase
related to which bug and which patch. I'm not absolutely sure that my
segmentation is right (all the futex/fork/vfork/etc threads seem to
deal with the same kind of problems). I'm also not entirely certain
that I haven't listed bugs that are now fixed. I may also have
overlooked mails containing non-obvious testcases.

What I'm trying to say is that this page is useless without your
feedback. I'd especially need people to, short of updating it
themselves (^-^), tell me what can be added/removed/improved. Given a
Message-ID, I can do the dirty work ;-)

Comments are welcome; I hope this will be helpful somehow.

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thibaut VARENE | 1 May 20:34 2010

Re: threads and fork on machine with VIPT-WB cache

On Mon, Apr 19, 2010 at 6:26 PM, John David Anglin
<dave <at> hiauly1.hia.nrc.ca> wrote:
> Hi Helge,
>
> On Tue, 13 Apr 2010, Helge Deller wrote:
>
>> Still crashes.
>
> Can you try the patch below?  The change to cacheflush.h is the same
> as before.

For the record, while setting up the wiki's TestCases page, I
noticed that the initial large patch that you sent (see
https://patchwork.kernel.org/patch/91525/ ) contained bits that
weren't part of the split chunks you sent afterwards.

This patch (pte.d.2) seems to update some of those chunks and also
contains bits that weren't part of them either.

I'm mentioning this so that we do not lose track of potentially useful
code. Though maybe Kyle has all of this sorted out already and I'm
just unable to figure it out myself ;-)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

John David Anglin | 1 May 22:17 2010

Re: threads and fork on machine with VIPT-WB cache

> On Mon, Apr 19, 2010 at 6:26 PM, John David Anglin
> <dave <at> hiauly1.hia.nrc.ca> wrote:
> > Hi Helge,
> >
> > On Tue, 13 Apr 2010, Helge Deller wrote:
> >
> >> Still crashes.
> >
> > Can you try the patch below?  The change to cacheflush.h is the same
> > as before.
> 
> For the record, while setting up the wiki's TestCases page, I
> noticed that the initial large patch that you sent (see
> https://patchwork.kernel.org/patch/91525/ ) contained bits that
> weren't part of the split chunks you sent afterwards.
> 
> This patch (pte.d.2) seems to update some of those chunks and also
> contains bits that weren't part of them either.

The split chunks were mainly cleanups.  As far as I know, they are
obvious and provide no significant change in functionality.  I didn't
intentionally change any of the split hunks in patch4 (pte.d.2) although
this patch does touch some of the same files.  Possibly, the LWS fixes
should be split into two (obvious and UP locking).

Both the original patch and pte.d.2 were experimental.  Since sending them,
I have continued to experiment and arrived at a change that appears to fix the
minifail bug in a somewhat different manner than the one proposed by James.  However,
I'm still seeing some issues that appear to be PTE related (segmentation

Helge Deller | 2 May 00:25 2010

Re: Patch for segfaults in minifail tests

On 04/30/2010 08:21 PM, James Bottomley wrote:
> T-bone is keeping a web page with all the failing tests on now:
> 
> http://wiki.parisc-linux.org/TestCases
> 
> The patch below is what I've found fixes the minifail6 case, but it
> doesn't seem to fully fix the minifail one (although the frequency goes
> down).
> 
> It's the essential patch we need to fix up our kmapping.  Right at the
> moment kmap of a page with a dirty cache line in userspace sees stale
> data and kunmap of a page the kernel has modified will see likewise.
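
To make the kmap half of this concrete, here is a toy C model, written for this page and not taken from any posted patch: one physical page (shrunk to one cache line) is reached through a user alias and a kernel alias that land in different lines of a virtually indexed, write-back cache. All names (toy_system, alias_flush, toy_kmap_read) are hypothetical, and the model ignores real PA-RISC cache geometry; it only shows why the user alias's dirty line has to be written back before the kernel mapping can read current data.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model only: one physical "page" the size of one cache line, reachable
 * through two virtual aliases that index different lines of a virtually
 * indexed, write-back cache. All names are hypothetical. */
#define LINE_BYTES 8

struct cache_line {
    bool valid;
    bool dirty;
    unsigned char data[LINE_BYTES];
};

struct toy_system {
    unsigned char memory[LINE_BYTES]; /* physical memory backing the page */
    struct cache_line user_alias;     /* line selected by the user address */
    struct cache_line kernel_alias;   /* line selected by the kmap address */
};

/* Store through one alias: write-back, so memory is NOT updated yet. */
static void alias_write(struct cache_line *line, const unsigned char *src)
{
    memcpy(line->data, src, LINE_BYTES);
    line->valid = true;
    line->dirty = true;
}

/* Load through one alias: a hit returns the cached copy, a miss fills
 * the line from memory. */
static void alias_read(const unsigned char *memory, struct cache_line *line,
                       unsigned char *dst)
{
    if (!line->valid) {
        memcpy(line->data, memory, LINE_BYTES);
        line->valid = true;
        line->dirty = false;
    }
    memcpy(dst, line->data, LINE_BYTES);
}

/* Write back a dirty line to memory and invalidate it. */
static void alias_flush(unsigned char *memory, struct cache_line *line)
{
    if (line->valid && line->dirty)
        memcpy(memory, line->data, LINE_BYTES);
    line->valid = false;
    line->dirty = false;
}

/* The kernel reads the page through its own mapping, optionally flushing
 * the user alias first. */
static void toy_kmap_read(struct toy_system *s, bool flush_user_alias,
                          unsigned char *dst)
{
    if (flush_user_alias)
        alias_flush(s->memory, &s->user_alias);
    alias_read(s->memory, &s->kernel_alias, dst);
}
```

Without the flush, the kernel-side read returns whatever was in memory, not the user's dirty data. The kunmap direction is the mirror image: a dirty line under the kernel alias must be written back before the user alias can see the kernel's changes.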

Hi James,

I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
I still do see all the page faults as before. They even seem to trigger 
faster than with a few of Dave's patches.

I usually run this command
	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
in a few screen sessions in parallel.
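
If a count of crashing runs is more useful than watching the screens, the loop above can be mirrored by a small C harness. This is a hypothetical sketch (count_segfault_runs is not one of the posted testcases): it forks and execs the test binary repeatedly and counts the children that die from SIGSEGV. It runs the binary serially, so several copies would still have to be started in parallel to reproduce the multi-session load.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] `runs` times in sequence, like one of the shell loops,
 * and return how many runs were killed by SIGSEGV (-1 on fork/wait error).
 * argv[0] must be a path; the child exits with 127 if exec fails. */
static int count_segfault_runs(char *const argv[], int runs)
{
    int segfaults = 0;

    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execv(argv[0], argv);
            _exit(127); /* only reached if execv() failed */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            segfaults++;
    }
    return segfaults;
}
```

For example, passing { "./minifail_dave", NULL } with runs set to 1000 would report how many of those 1000 runs segfaulted.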

Thibaut: I've attached the testcase (I called it minifail_dave, since it
was changed by Dave). Maybe you can add it to your website if it's not
there yet...?

Helge
-----------------


John David Anglin | 2 May 01:13 2010

Re: Patch for segfaults in minifail tests

Hi Helge,

> I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
> I still do see all the page faults as before. They even seem to trigger 
> faster than with a few of Dave's patches.
> 
> I usually run this command
> 	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
> in a few screen sessions in parallel.

The reason running multiple screens in parallel exposes further problems
is that the implementation of ptep_set_wrprotect is broken.  It simply sets the
write protect bit in the pte and doesn't purge the existing translation.
So, the parent continues to merrily write to the write protected page until
the TLB entry is purged and reloaded.  More processes make it more likely the
entry will be replaced and trigger a COW break.

This is why my versions of the minifail test, which monitor the stack region
used by the thread, don't cause a COW break immediately after the fork.  When
compiled at -O0, the loop index is constantly being stored to the stack.
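
The mechanism described above can be pictured with a toy C model: a PTE with a write-enable bit and a one-entry TLB caching a copy of it. The structures and functions are hypothetical and are not the parisc kernel API; the model only illustrates the claim in this message and is not a verified account of what ptep_set_wrprotect does.

```c
#include <stdbool.h>

/* Toy model (hypothetical names, not the parisc kernel API): a PTE with a
 * write-enable bit and a one-entry TLB that caches a copy of it. */
struct toy_pte { bool present; bool writable; };
struct toy_tlb { bool valid; struct toy_pte cached; };

/* As described: clear the write bit in the page table only. */
static void wrprotect_no_purge(struct toy_pte *pte, struct toy_tlb *tlb)
{
    (void)tlb;              /* the stale translation is left in place */
    pte->writable = false;
}

/* Variant that also purges the cached translation. */
static void wrprotect_and_purge(struct toy_pte *pte, struct toy_tlb *tlb)
{
    pte->writable = false;
    tlb->valid = false;
}

/* A user store: uses the TLB entry if valid, else reloads it from the PTE.
 * Returns true if the store completes, false if it would fault (i.e. the
 * COW break the write protection is meant to trigger). */
static bool user_store(const struct toy_pte *pte, struct toy_tlb *tlb)
{
    if (!tlb->valid) {
        tlb->cached = *pte;
        tlb->valid = true;
    }
    return tlb->cached.present && tlb->cached.writable;
}
```

In this model, after wrprotect_no_purge() the parent's stores keep succeeding through the stale TLB entry; only once the entry is dropped (here by wrprotect_and_purge(), or in the real system by eviction under load from other processes) does the next store reload the PTE and fault.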

Dave
-- 
J. David Anglin                                  dave.anglin <at> nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

James Bottomley | 2 May 01:39 2010

Re: Patch for segfaults in minifail tests

On Sat, 2010-05-01 at 19:13 -0400, John David Anglin wrote:
> Hi Helge,
> 
> > I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
> > I still do see all the page faults as before. They even seem to trigger 
> > faster than with a few of Dave's patches.
> > 
> > I usually run this command
> > 	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
> > in a few screen sessions in parallel.
> 
> The reason running multiple screens in parallel exposes further problems
> is that the implementation of ptep_set_wrprotect is broken.  It simply sets the
> write protect bit in the pte and doesn't purge the existing translation.
> So, the parent continues to merrily write to the write protected page until
> the TLB entry is purged and reloaded.  More processes make it more likely the
> entry will be replaced and trigger a COW break.
> 
> This is why my versions of the minifail test, which monitor the stack region
> used by the thread, don't cause a COW break immediately after the fork.  When
> compiled at -O0, the loop index is constantly being stored to the stack.

Actually, no, this explanation isn't correct.

Here is how Linux works.  You can see roughly how this works in
copy_page_range(), where we prepare the COW.  If it's going to be a COW
range, we call MMU notifiers before and after the PTE settings.  The
"after" MMU notifier is supposed to flush the TLB.  Linux always does
memory operations in the form


Thibaut VARÈNE | 2 May 01:40 2010

Re: Patch for segfaults in minifail tests

Le 2 mai 10 à 00:25, Helge Deller a écrit :

> Thibaut: I've attached the testcase (I called it minifail_dave, since it
> was changed by Dave). Maybe you can add it to your website if it's not
> there yet...?

Err, please, you're not making this easy for me.

I've put on the website all the testcases I found in the emails, so if
Dave already posted it, chances are it's already there.

Could you please check the page and see if the testcase you're using  
is there? I tried to keep it simple by:
a) keeping the original name (whenever available)
b) linking to the corresponding message

If I'm the only one actually using that webpage, it's totally  
pointless...

HTH

-- 
Thibaut VARÈNE
http://www.parisc-linux.org/~varenet/


Helge Deller | 2 May 10:19 2010

Re: Patch for segfaults in minifail tests

On 05/02/2010 01:40 AM, Thibaut VARÈNE wrote:
> Le 2 mai 10 à 00:25, Helge Deller a écrit :
> 
>> Thibaut: I've attached the testcase (I called it minifail_dave, since it
>> was changed by Dave). Maybe you can add it to your website if it's not
>> there yet...?
> 
> 
> Err, please, you're not making this easy for me.
> 
> I've put on the website all the testcases I found in the emails, so if
> dave already posted it, chances are it's already there.
> 
> Could you please check the page and see if the testcase you're using is
> there? I tried to keep it simple by:
> a) keeping the original name (whenever available)
> b) linking to the corresponding message

Sorry Thibaut,

Thanks so much for putting all this stuff into the Wiki!
It's very useful to everybody, and of course I had already checked yesterday
whether this minifail test program was in the list. I didn't find it.

So, it would be great if you could add it.

In principle you could replace the minifail3 testcase with this one.
And I think this should be the main testcase that has to work in the
end, since it's the one without any tweaks. It should simply run
without segfaults out of the box once we have fixed the problem.

Thibaut VARÈNE | 2 May 12:53 2010

Re: threads and fork on machine with VIPT-WB cache

Le 1 mai 10 à 22:17, John David Anglin a écrit :
>
> Regarding the wiki, it's a useful summary.  However, #561203 (minifail
> bug) is not a "Futex wait failure".  We may have futex bugs, but I'm not
> aware of a testcase.  The minifail bug is a "Threads and fork" problem
> arising from cache corruption.  Mainly, copy_user_page is broken when
> copying memory shared by more than one process.  There are also issues
> in PTE/TLB management on SMP systems.  Probably, the vfork/execve bug
> is caused by the same problem.

Many thanks for the feedback. The reason why I initially put the  
minifail bug under "Futex wait failure" was because I found it  
discussed under such a thread ;-)

I've merged this section under "Threads & fork", and have quoted your  
summary at the top of the section.

HTH

T-Bone

-- 
Thibaut VARÈNE
http://www.parisc-linux.org/~varenet/


Thibaut VARÈNE | 2 May 12:59 2010

Re: Patch for segfaults in minifail tests

Le 2 mai 10 à 10:19, Helge Deller a écrit :

> Sorry Thibaut,
>
> Thanks so much for putting all this stuff into the Wiki!
> It's very useful to everybody, and of course I had already checked
> yesterday whether this minifail test program was in the list. I didn't
> find it.

Ah ok, sorry I didn't understand that.

> So, it would be great if you could add it.

Done.

> In principle you could replace the minifail3 testcase with this one.

I'll keep it for archiving purposes; it's just a line on the page after
all ;)

> And I think this should be the main testcase that has to work in the
> end, since it's the one without any tweaks. It should simply run
> without segfaults out of the box once we have fixed the problem.

I've marked it as "Final" until proven wrong.

HTH
