Manuel Bouyer | 5 Sep 2010 23:35

SMP/preempt bug in matt-nb5-mips64

Hello,
I found a problem in matt-nb5-mips64's mips/mips/spl.S regarding
curcpu() and preemption. In both _splraise and _splsw_splhigh,
curcpu() is loaded from L_CPU(MIPS_CURLWP) into a register early,
in particular before interrupts are disabled. If the current IPL is 0,
the current thread can be preempted and rescheduled on another CPU,
and the new SPL is then written back to the wrong cpu_info.
From there, bad things happen (what I've seen is an infinite loop
through the interrupt handler on the victim CPU, because _splsw_splhigh
thinks we're already at splhigh and does nothing, while interrupts are
actually enabled).
The attached patch seems to fix it for me: it's enough to reload
curcpu() before writing back the new IPL, because for the above scenario
to happen, the old IPL of both CPUs has to be 0.
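To make the window concrete, here is a small user-space model of the ordering problem described above. It is not the actual spl.S code: `struct cpu_info`, `struct lwp`, every `model_*` name, the IPL values and the preemption hook are hypothetical stand-ins. The hook plays the role of the scheduler moving the lwp to CPU 1 while the IPL is still 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real ones). */
struct cpu_info { int ci_cpl; };
struct lwp { struct cpu_info *l_cpu; };

enum { IPL_NONE = 0, IPL_HIGH = 7 };	/* illustrative values only */

struct cpu_info model_cpus[2];
struct lwp model_lwp = { &model_cpus[0] };
struct lwp *model_curlwp = &model_lwp;

/*
 * Called where, at IPL 0, the scheduler could preempt the lwp and
 * later resume it on another CPU.
 */
void (*model_preempt_hook)(void) = NULL;

void
model_migrate_to_cpu1(void)
{
	model_curlwp->l_cpu = &model_cpus[1];
}

void
model_reset(void)
{
	model_cpus[0].ci_cpl = IPL_NONE;
	model_cpus[1].ci_cpl = IPL_NONE;
	model_lwp.l_cpu = &model_cpus[0];
	model_preempt_hook = NULL;
}

/* Original ordering: curcpu() is cached in a register up front. */
int
model_splraise_cached(int newipl)
{
	struct cpu_info *ci = model_curlwp->l_cpu;	/* loaded early */
	int oldipl = ci->ci_cpl;

	assert(newipl >= IPL_NONE && newipl <= IPL_HIGH);
	if (oldipl == IPL_NONE && model_preempt_hook != NULL)
		model_preempt_hook();		/* lwp may migrate here */
	if (newipl > oldipl)
		ci->ci_cpl = newipl;		/* may update the old CPU */
	return oldipl;
}

/* First patch: reload curcpu() just before storing the new IPL. */
int
model_splraise_reload(int newipl)
{
	int oldipl = model_curlwp->l_cpu->ci_cpl;

	assert(newipl >= IPL_NONE && newipl <= IPL_HIGH);
	if (oldipl == IPL_NONE && model_preempt_hook != NULL)
		model_preempt_hook();
	if (newipl > oldipl)
		model_curlwp->l_cpu->ci_cpl = newipl;	/* reloaded */
	return oldipl;
}
```

With the hook set to `model_migrate_to_cpu1`, the cached variant leaves CPU 0 marked at `IPL_HIGH` while CPU 1 stays at `IPL_NONE`, which matches the victim-CPU symptom; the reload variant updates CPU 1 instead.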

I suspect there's a similar issue with the use of L_CPU(MIPS_CURLWP)
in stub_lock.S, but I haven't looked at it in detail, as right now I'm
running with LOCKDEBUG and that code isn't used.

--

-- 
Manuel Bouyer <bouyer <at> antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--
Index: spl.S
===================================================================
--- spl.S	(revision 100)
+++ spl.S	(revision 101)
@@ -76,6 +76,10 @@

Sandra | 11 Sep 2010 04:40

New fast teeth whitener

NEW FORMULA!

Bright white teeth with the new fast whitener. The only one with 36% carbamide
peroxide! Professional, long-lasting results for a dazzling smile!

For a detailed description, click http://sbiancantedenti.clickurl.net

Sandra Perini - DLM srl

neil chiao | 14 Sep 2010 05:30

init won't start on MIPS 74Kc NetBSD 4.0

Hi, all,

I'm porting NetBSD 4.0 to a MIPS 74Kc board.
The kernel boots fine, but it stops at the last step: init.

Using BDI and gdb, I found that the init process is listed in 'ps'.
But when I add abort() at the first lines of main() in init.c, it never runs.

Does anybody have any ideas?

Thank you,
Neil

Manuel Bouyer | 22 Sep 2010 14:39

Re: SMP/preempt bug in matt-nb5-mips64

On Sun, Sep 05, 2010 at 11:35:41PM +0200, Manuel Bouyer wrote:
> Hello,
> I found a problem in matt-nb5-mips64's mips/mips/spl.S regarding
> curcpu() and preemption. In both _splraise and _splsw_splhigh,
> curcpu() is loaded from L_CPU(MIPS_CURLWP) into a register early,
> in particular before interrupts are disabled. If the current IPL is 0,
> the current thread can be preempted and rescheduled on another CPU,
> and the new SPL is then written back to the wrong cpu_info.
> From there, bad things happen (what I've seen is an infinite loop
> through the interrupt handler on the victim CPU, because _splsw_splhigh
> thinks we're already at splhigh and does nothing, while interrupts are
> actually enabled).
> The attached patch seems to fix it for me: it's enough to reload
> curcpu() before writing back the new IPL, because for the above scenario
> to happen, the old IPL of both CPUs has to be 0.

Well, unfortunately this is not enough.
If we get preempted just after loading L_CPU(MIPS_CURLWP) and before
checking the SPL, we may be moved to another CPU and run at a later
time. When we resume running, we check the SPL of the previous CPU,
which may no longer be 0.
I couldn't find a better way than disabling interrupts before
checking the SPL (we could disable preemption instead, but I suspect
that costs more than just disabling interrupts). The attached patch now
seems to do the right thing for me.
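Extending the same kind of hypothetical model (every `model_*` name and IPL value here is invented for illustration), the sketch below shows why reloading curcpu() before the store is not sufficient, and how disabling interrupts before loading curcpu() closes the window. The flag `model_intr_enabled` is an assumed stand-in for the CPU's interrupt-enable state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real ones). */
struct cpu_info { int ci_cpl; };
struct lwp { struct cpu_info *l_cpu; };

enum { IPL_NONE = 0, IPL_VM = 4, IPL_HIGH = 7 };	/* illustrative */

struct cpu_info model_cpus[2];
struct lwp model_lwp = { &model_cpus[0] };
struct lwp *model_curlwp = &model_lwp;
bool model_intr_enabled = true;

/* Simulated preemption point: can only fire with interrupts enabled. */
void (*model_preempt_hook)(void) = NULL;

void
model_maybe_preempt(void)
{
	if (model_intr_enabled && model_preempt_hook != NULL)
		model_preempt_hook();
}

/* While we were off-CPU, another lwp raised CPU 0's IPL; we resume on CPU 1. */
void
model_migrate_late(void)
{
	model_cpus[0].ci_cpl = IPL_VM;
	model_curlwp->l_cpu = &model_cpus[1];
}

void
model_reset(void)
{
	model_cpus[0].ci_cpl = IPL_NONE;
	model_cpus[1].ci_cpl = IPL_NONE;
	model_lwp.l_cpu = &model_cpus[0];
	model_intr_enabled = true;
	model_preempt_hook = NULL;
}

/* Reload-only variant: preemption between the load and the check. */
int
model_splraise_reload(int newipl)
{
	struct cpu_info *ci = model_curlwp->l_cpu;

	assert(newipl >= IPL_NONE && newipl <= IPL_HIGH);
	model_maybe_preempt();			/* window still open */
	int oldipl = ci->ci_cpl;		/* checks the previous CPU */
	if (newipl > oldipl)
		model_curlwp->l_cpu->ci_cpl = newipl;
	return oldipl;
}

/* Second patch's ordering: interrupts off before touching curcpu(). */
int
model_splraise_intr_off(int newipl)
{
	bool was_enabled = model_intr_enabled;

	assert(newipl >= IPL_NONE && newipl <= IPL_HIGH);
	model_intr_enabled = false;		/* stands in for clearing IE (assumption) */
	model_maybe_preempt();			/* now a no-op */
	struct cpu_info *ci = model_curlwp->l_cpu;
	int oldipl = ci->ci_cpl;
	if (newipl > oldipl)
		ci->ci_cpl = newipl;
	model_intr_enabled = was_enabled;	/* the raised IPL now blocks preemption */
	return oldipl;
}
```

In the reload-only variant, the lwp checks CPU 0's IPL after being resumed on CPU 1, so `model_splraise_reload(IPL_VM)` returns `IPL_VM` and never raises CPU 1. With interrupts disabled first, the preemption point cannot fire, and the load, check and store all refer to the same CPU.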


Manuel Bouyer | 22 Sep 2010 14:48

matt-nb5-mips64 cpu_idle issue

Hi,
while working on an SMP platform using code based on the matt-nb5-mips64
branch, I ran into the following issue:
an lwp has been put to sleep on cpux, switching cpux to the idle lwp.
Another CPU wakes up the lwp while cpux is in mi_switch().
Now cpux is stuck in the idle loop, never noticing that another lwp is
runnable.

The reason is that cpu_idle() loops testing ci->ci_want_resched,
but this is not enough to detect whether another thread is ready to run
(several places call cpu_need_resched(ci, 0)). cpu_idle() is itself called
in a loop, which does more work and, in particular, calls
sched_curcpu_runnable_p() to see whether the idle lwp needs to switch to
another lwp.
So cpu_idle() should do whatever is needed to put the CPU to sleep
waiting for an interrupt (if possible), but should not do so in a loop.
The attached patch still checks ci->ci_want_resched, as x86 does, but
I'm not sure this is needed at all.
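A rough model of the difference, under stated assumptions: the run queue is a counter, the wakeup from another CPU is simulated inside the first wait, and, as the report describes for some cpu_need_resched(ci, 0) paths, it does not set `ci_want_resched`. All `model_*` names are hypothetical; `model_idle_loop()` only imitates the MI idle loop that calls sched_curcpu_runnable_p(), and the spin limit exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real cpu_info. */
struct cpu_info {
	volatile int ci_want_resched;
};

int model_runq_len;	/* lwps waiting to run on this CPU */
int model_wait_calls;	/* how many times the CPU "waited" */

void
model_reset(void)
{
	model_runq_len = 0;
	model_wait_calls = 0;
}

/* Another CPU makes an lwp runnable during the first wait, without
 * setting ci_want_resched. */
void
model_wait_for_interrupt(struct cpu_info *ci)
{
	(void)ci;
	model_wait_calls++;
	if (model_wait_calls == 1)
		model_runq_len = 1;
}

bool
model_sched_curcpu_runnable_p(void)
{
	return model_runq_len > 0;
}

/* Looping shape: only ci_want_resched ends the loop (bounded here). */
int
model_cpu_idle_looping(struct cpu_info *ci, int max_spins)
{
	int spins = 0;

	assert(max_spins > 0);
	while (!ci->ci_want_resched && spins < max_spins) {
		model_wait_for_interrupt(ci);
		spins++;
	}
	return spins;
}

/* Patched shape: wait at most once, then return to the caller. */
void
model_cpu_idle_once(struct cpu_info *ci)
{
	if (!ci->ci_want_resched)
		model_wait_for_interrupt(ci);
}

/* Caller loop: returns how many cpu_idle() calls were made. */
int
model_idle_loop(struct cpu_info *ci)
{
	int calls = 0;

	while (!model_sched_curcpu_runnable_p()) {
		model_cpu_idle_once(ci);
		calls++;
	}
	return calls;
}
```

The looping cpu_idle() exhausts its spin budget even though an lwp became runnable after the first wait, while the wait-once version returns after one call and lets the caller notice the runnable lwp.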

--
Index: cpu_subr.c
===================================================================
--- cpu_subr.c	(revision 66)
+++ cpu_subr.c	(working copy)
@@ -580,7 +580,7 @@
 	void (*const mach_idle)(void) = mips_locoresw.lsw_cpu_idle;

Gregory T. Andersen | 22 Sep 2010 18:29

MIPS 74K and execution hazards

Has anyone done any work for MIPS 74K CPUs (mips32r2)?  We have run into
an issue where it seems a lot of the low-level code (mipsX_subr.S,
locore.S) does not handle MIPS execution hazards.  The big
offenders seem to be the TLB/CP0 instructions (tlbp, tlbw, mtc0, mfc0,
etc.).

Rather than relying on nop slots after such instructions, MIPS32 requires
ssnop (mips32r1) or ehb (mips32r2) instructions to guarantee that
execution hazards are cleared in superscalar MIPS designs.

We have a heavy-handed solution in place to get the kernel booting, but
we were curious: has anyone implemented a more elegant, tested solution?
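As a sketch of selecting the barrier per ISA revision, the C fragment below is an illustration, not NetBSD's COP0_SYNC definition. `cp0_hazard_barrier()`, the `CP0_*_SEQ` strings and `CP0_HAZARD_BARRIER()` are hypothetical names; the macro assumes GCC's `__mips_isa_rev` predefine and an assembler that accepts `ssnop` and `ehb`, and the count of three `ssnop`s for Release 1 is only a placeholder, since the required number is implementation-specific.

```c
#include <assert.h>

/* Barrier sequences; the ssnop count is a placeholder (CPU-specific). */
const char CP0_EHB_SEQ[] = "ehb";
const char CP0_SSNOP_SEQ[] = "ssnop; ssnop; ssnop";

/* Hypothetical helper: barrier to emit after an mtc0/TLB operation. */
const char *
cp0_hazard_barrier(int isa_rev)
{
	assert(isa_rev >= 1);	/* MIPS32 releases start at 1 */
	return isa_rev >= 2 ? CP0_EHB_SEQ : CP0_SSNOP_SEQ;
}

/*
 * Compile-time form for kernel code.  On non-MIPS hosts the macro
 * expands to an empty statement so this file still builds.
 */
#if defined(__mips__) && defined(__mips_isa_rev) && __mips_isa_rev >= 2
#define CP0_HAZARD_BARRIER()	__asm__ __volatile__("ehb" ::: "memory")
#elif defined(__mips__)
#define CP0_HAZARD_BARRIER() \
	__asm__ __volatile__(".set push\n\t.set noreorder\n\t" \
	    "ssnop\n\tssnop\n\tssnop\n\t.set pop" ::: "memory")
#else
#define CP0_HAZARD_BARRIER()	do { } while (0)
#endif
```

In kernel code the macro would go directly after the CP0 write or TLB instruction whose result a following instruction depends on, for example between an mtc0 and a subsequent tlbp. Because ehb is encoded as a form of nop, it also executes harmlessly on pre-Release 2 CPUs.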

Greg Andersen
Cradlepoint Technology

Matt Thomas | 22 Sep 2010 20:23

Re: MIPS 74K and execution hazards


On Sep 22, 2010, at 9:29 AM, Gregory T. Andersen wrote:

> Has anyone done any work for MIPS 74K CPUs (mips32r2)?  We have run into an issue where it seems a lot of the low
> level code (mipsX_subr.S, locore.S) does not have support for MIPS execution hazards.  The big offenders
> seem to be the TLB/CP0 instructions (tlbp, tlbw, mtc0, mfc0, etc..).

I'd love to have a mipsXXr2 to play with, so I could actually make the r2 features work.

> Rather than use nop slots after instructions MIPS32 requires use of ssnop (mips32r1) or ehb (mips32r2)
> instructions to guarantee clearing of execution hazards in MIPS superscaler designs.

I find the COP0_SYNC stuff to be horrible.  I've been planning on adding mips32r2_subr.S and
mips64r2_subr.S so I can use those features.  But locore is more difficult since it's not CPU-specific.

> We have a heavy handed solution in place to get the kernel booting but were curious if anyone had
> implemented a more elegant/tested solution?

I have a few ideas in mind but haven't done anything with them due to lack of anything to test on.

Gregory T. Andersen | 22 Sep 2010 22:47

Re: MIPS 74K and execution hazards

> I find the COP0_SYNC stuff to be horrible.
Horrible, yes, and on a 74K it is in some cases insufficient to define
COP0_SYNC as 'ehb', because some hazards have no COP0_SYNC at all
between the instructions involved.
