John David Anglin | 1 Jan 2012 01:02

Happy New Year PARISC

Hi all,

After a lot of trial and error, I believe that I have resolved the random
segmentation faults on SMP PA8800 and PA8900 systems.  This includes the
libgomp hpmc's which turned out to be caused by a non-equivalent alias
mapping in libattr (I hadn't rebuilt the attr package because there is a
build issue with the current version in unstable).  This was hard to find!

I have started working on setting up magnum for buildd.

Happy New Year,
Dave
--
John David Anglin	dave.anglin <at> bell.net

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Grant Grundler | 2 Jan 2012 07:23

Re: Happy New Year PARISC

On Sat, Dec 31, 2011 at 4:02 PM, John David Anglin <dave.anglin <at> bell.net> wrote:
> Hi all,
>
> After a lot of trial and error, I believe that I have resolved the random
> segmentation faults on SMP PA8800 and PA8900 systems.  This includes the
> libgomp hpmc's which turned out to be caused by a non-equivalent alias
> mapping in libattr (I hadn't rebuilt the attr package because there is a
> build issue with the current version in unstable).  This was hard to find!

Wow! Happy New Year!

Well Done!
That must have been incredibly hard to find. It's a bummer the only
way we can find those sorts of things is by unraveling crash dumps. :(

> I have started working on setting up magnum for buildd.

If I can download a bunch of .debs, I'd be happy to install and run
some tests on my j6k.

cheers,
grant

John David Anglin | 2 Jan 2012 16:12

Re: Happy New Year PARISC

On 2-Jan-12, at 1:23 AM, Grant Grundler wrote:

> Well Done!
> That must have been incredibly hard to find. It's a bummer the only
> way we can find those sorts of things is by unraveling crash dumps. :(
>

Sadly, the crash dumps were never very helpful.  The hpmc's arising
from cache corruption were usually significantly deferred.  Often, the
crashes would occur with the processors in the idle loop!  Probably,
HP had a JTAG box or something similar to analyze cache problems.

We have known for some time that non-equivalent aliases are not
supported on PA8800 and PA8900 processors.  I patched binutils
several months ago to fix this problem.  However, the entire runtime
needs to be recompiled to eliminate the problem.
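
A minimal sketch of what "equivalent" means for two virtual mappings of
the same physical page.  It assumes a 4 MB alias boundary (the value
parisc Linux has used for SHMLBA); the constant and the helper name
is_equivalent_alias are illustrative and are not taken from the patch
discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed alias boundary: 4 MB.  Illustrative value, not from this thread. */
#define ALIAS_BOUNDARY 0x00400000UL

/* The mask comparison below is only valid for a power-of-two boundary. */
static_assert((ALIAS_BOUNDARY & (ALIAS_BOUNDARY - 1)) == 0,
	      "alias boundary must be a power of two");

/*
 * Hypothetical helper: two virtual addresses are treated as equivalent
 * aliases when they are congruent modulo the alias boundary, i.e. when
 * all address bits below the boundary are identical.
 */
static bool is_equivalent_alias(uintptr_t va1, uintptr_t va2)
{
	return ((va1 ^ va2) & (ALIAS_BOUNDARY - 1)) == 0;
}
```

Under that assumption, a mapping at 0x40001000 and one at 0x40401000
would be equivalent, while 0x40001000 and 0x40402000 would not.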

The kernel routine that causes the hpmc's is flush_cache_range.
I wrote a modified version that can check the page mapping, but it
is very slow.  This is a very performance critical function.

Another source of instability was TLB purges.  We put lock/unlock
sequences around all the purges in the C code, but somehow we forgot
to do the same in the assembly code in pacache.S.

I believe that I fixed the COW/minifail bug yesterday.  I now have
copy_user_page doing copies via the temp-alias region.  We also
forgot to purge the TLB entries when we write protected the page
table for COW.  As a result, multithreaded applications could continue
to dirty a page after it was nominally write protected.

John David Anglin | 3 Jan 2012 00:12

Re: Happy New Year PARISC

On 2-Jan-12, at 10:12 AM, John David Anglin wrote:

> I'm going to work on the kernel patch some more today.  Hopefully,
> it will then be ready for testing on other machines.

None of this worked.  Attached patch as it stands.  Comments and testing
appreciated.

Regards,
Dave
--
John David Anglin	dave.anglin <at> bell.net

diff --git a/arch/parisc/hpux/wrappers.S b/arch/parisc/hpux/wrappers.S
index 58c53c8..bdcea33 100644
--- a/arch/parisc/hpux/wrappers.S
+++ b/arch/parisc/hpux/wrappers.S
@@ -88,7 +88,7 @@ ENTRY(hpux_fork_wrapper)

 	STREG	%r2,-20(%r30)
 	ldo	64(%r30),%r30
-	STREG	%r2,PT_GR19(%r1)	;! save for child
+	STREG	%r2,PT_SYSCALL_RP(%r1)	;! save for child
 	STREG	%r30,PT_GR21(%r1)	;! save for child

 	LDREG	PT_GR30(%r1),%r25
@@ -132,7 +132,7 @@ ENTRY(hpux_child_return)
 	bl,n	schedule_tail, %r2

Carlos O'Donell | 3 Jan 2012 12:50

Re: Happy New Year PARISC

On Mon, Jan 2, 2012 at 6:12 PM, John David Anglin <dave.anglin <at> bell.net> wrote:
> None of this worked.  Attached patch as it stands.  Comments and testing
> appreciated.

Could you clarify what you mean by "none of this worked?"

Cheers,
Carlos.

John David Anglin | 3 Jan 2012 16:13

Re: Happy New Year PARISC

On 1/3/2012 6:50 AM, Carlos O'Donell wrote:
> On Mon, Jan 2, 2012 at 6:12 PM, John David Anglin<dave.anglin <at> bell.net>  wrote:
>> None of this worked.  Attached patch as it stands.  Comments and testing
>> appreciated.
> Could you clarify what you mean by "none of this worked?"
>

I tried eliminating the flushes that occur in kunmap_atomic on PA8800 and
PA8900 after the calls to clear_user_page and copy_user_page by defining
clear_user_highpage and copy_user_highpage.  I had thought the flushes
weren't necessary.  There's something about this that I don't understand.
Why do we need to flush non-equivalent page mappings that aren't used?

Also tried:
#define flush_cache_dup_mm(mm)        do { } while (0)

In both cases, init died causing a panic at boot.  Maybe there's something
missing at startup.

Most arch's have the above define for flush_cache_dup_mm.  Our define really
hurts fork performance.  The GCC testsuite takes almost twice as long to run
on linux as hpux.  On the other hand, build times are fairly comparable.


James Bottomley | 3 Jan 2012 16:32

Re: Happy New Year PARISC

On Tue, 2012-01-03 at 10:13 -0500, John David Anglin wrote:
> On 1/3/2012 6:50 AM, Carlos O'Donell wrote:
> > On Mon, Jan 2, 2012 at 6:12 PM, John David Anglin<dave.anglin <at> bell.net>  wrote:
> >> None of this worked.  Attached patch as it stands.  Comments and testing
> >> appreciated.
> > Could you clarify what you mean by "none of this worked?"
> >
> 
> I tried eliminating the flushes that occur in kunmap_atomic on PA8800 and
> PA8900 after the calls to clear_user_page and copy_user_page by defining
> clear_user_highpage and copy_user_highpage.  I had thought the flushes
> weren't necessary.  There's something about this that I don't understand.
> Why do we need to flush non-equivalent page mappings that aren't used?

But they are used:  Your work makes sure that all user space mappings
are equivalent.  However, because of the way Linux sets up kernel
mappings (from the pfn array and offsets) the user virtual address and
kernel virtual address almost never are.  kmap is exclusively used so
the kernel can access a user page, and at that point, we need to flush
because we've set up an inequivalent alias (even if it's only done for
read)
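
A rough sketch of why a kernel address derived from the pfn rarely
lines up with the user address.  The page size, the base of the kernel
linear map (KERNEL_BASE) and the 4 MB alias boundary are all assumed
values chosen for illustration, and both helper names are hypothetical;
none of this is code from the kernel tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT     12            /* assumed 4 kB pages */
#define KERNEL_BASE    0x10000000UL  /* hypothetical base of the kernel linear map */
#define ALIAS_BOUNDARY 0x00400000UL  /* assumed 4 MB alias boundary */

/* The mask comparison below is only valid for a power-of-two boundary. */
static_assert((ALIAS_BOUNDARY & (ALIAS_BOUNDARY - 1)) == 0,
	      "alias boundary must be a power of two");

/* Kernel virtual address computed purely from the page frame number. */
static uintptr_t kernel_vaddr_of_pfn(uintptr_t pfn)
{
	return KERNEL_BASE + (pfn << PAGE_SHIFT);
}

/*
 * True only when the pfn-derived kernel address happens to be congruent
 * to the user address modulo the alias boundary.  When this is false,
 * the kernel mapping is an inequivalent alias of the user mapping.
 */
static bool kernel_alias_is_equivalent(uintptr_t pfn, uintptr_t user_vaddr)
{
	uintptr_t kva = kernel_vaddr_of_pfn(pfn);

	return ((kva ^ user_vaddr) & (ALIAS_BOUNDARY - 1)) == 0;
}
```

With these assumed constants, pfn 0x1234 gives a kernel address of
0x11234000, which is not equivalent to a user mapping at 0x40001000.
The two only agree if the user address happens to share the same low
22 bits, which nothing in the pfn-based calculation arranges.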

kmap/kmap_atomic is used in more than just copy/flush ... or did you
mean that you removed the kmap calls in copy/flush and the whole thing
doesn't work (rather than as you imply you removed the flush in kunmap?)

John David Anglin | 3 Jan 2012 17:26

Re: Happy New Year PARISC

On 1/3/2012 10:32 AM, James Bottomley wrote:
> On Tue, 2012-01-03 at 10:13 -0500, John David Anglin wrote:
>> On 1/3/2012 6:50 AM, Carlos O'Donell wrote:
>>> On Mon, Jan 2, 2012 at 6:12 PM, John David Anglin<dave.anglin <at> bell.net>   wrote:
>>>> None of this worked.  Attached patch as it stands.  Comments and testing
>>>> appreciated.
>>> Could you clarify what you mean by "none of this worked?"
>>>
>> I tried eliminating the flushes that occur in kunmap_atomic on PA8800 and
>> PA8900 after the calls to clear_user_page and copy_user_page by defining
>> clear_user_highpage and copy_user_highpage.  I had thought the flushes
>> weren't necessary.  There's something about this that I don't understand.
>> Why do we need to flush non-equivalent page mappings that aren't used?
> But they are used:  Your work makes sure that all user space mappings
> are equivalent.  However, because of the way Linux sets up kernel
> mappings (from the pfn array and offsets) the user virtual address and
> kernel virtual address almost never are.  kmap is exclusively used so
> the kernel can access a user page, and at that point, we need to flush
> because we've set up an inequivalent alias (even if it's only done for
> read)
>
> kmap/kmap_atomic is used in more than just copy/flush ... or did you
> mean that you removed the kmap calls in copy/flush and the whole thing
> doesn't work (rather than as you imply you removed the flush in kunmap?)
>
I didn't modify kmap/kunmap_atomic.  I wrote versions of 
[...]

John David Anglin | 3 Jan 2012 17:42

Re: Happy New Year PARISC

On 1/3/2012 11:26 AM, John David Anglin wrote:
> I replaced the kunmap_atomic calls with pagefault_enable to avoid the
> flush in the returns from clear/copy_user_page (actually, I only used one
> call to pagefault_enable, so maybe that was the issue).

It looks like I messed up the enable.  Will retry.

Dave

--
John David Anglin    dave.anglin <at> bell.net
