Kolja Waschk | 1 Mar 2011 11:50

Re: More blackfin kernel oops under heavy load

Hi,

>> http://www.ixo.de/tmp/till20110228.tgz

After verifying that no one had checked it yet, I updated the code in the archive to better reflect the "real"
situation. The main thread prints a number on stdout every second, indefinitely, until it receives SIGINT. To
trigger the fault, contact the net server (with wget, as I described) in a fast loop;
after some time (about 2 seconds in my setup) the kernel fault occurs.

The parameters for tuning the load on the system are the frame size (MODT_BUFLEN in modt.h) and a nop loop in
rtdm_loop() in till.c (line 84). The BF537 here is running with a 133 MHz bus clock and a 533 MHz system clock.

Kolja
Gilles Chanteperdrix | 1 Mar 2011 13:13

Re: More blackfin kernel oops under heavy load

Kolja Waschk wrote:
>> I do not really understand what we are talking about. Are we talking
>> about Linux select/accept or Xenomai select/accept? Why not using the
> 
> Linux select/accept. Using the blocking accept() would have changed the
> behaviour somewhat compared to the original design. Anyway, I think that doesn't
> actually matter much.
> 
> I have meanwhile derived a much smaller RTDM driver kernel module, test
> application with blocking accept() ;) plus Makefile that do not depend on any
> particular external hardware anymore: A SPORT interface (SPORT1 receiver) is
> configured with internal clock and frame sync generation and so on itself
> generates a lot of interrupts, and this altogether quite quickly reproduces
> the problem on my system. The files together are less than 1000 lines, 20kb.
> 
> I'd really appreciate if you or someone could take a look at it and maybe try
> the code on his own bf537 system whether the same faults occur, and why. May I
> post the files here (as a zipped attachment? inline)? I've already uploaded
> a copy at
> 
>> http://www.ixo.de/tmp/till20110228.tgz

Thanks a lot. I do not know much about blackfin, Philippe is the
specialist. But having this code will certainly help find the issue.

Regards.

-- 
                                                                Gilles.

Jeff Weber | 1 Mar 2011 15:56

SMP load balancing

I am looking for guidance on how much I can rely upon the Linux kernel and Xenomai schedulers to perform automatic load balancing, vs. how much I must load balance manually.  My config:

x86 SMP system with 2 CPUs
Linux 2.6.35.10 + xenomai-2.5.5.2
ISA hardware device with 1 IRQ
Xenomai kernel module driver using old-style native API (may be ported to RTDM in the future)
user space application with multiple Linux and Xenomai threads
no effort [yet] to explicitly set CPU affinities for either Xenomai or Linux
cat /proc/xenomai/affinity output = 000000ff

question:  From /proc/xenomai/stat , I see all Xenomai kernel driver and userspace threads running on only CPU0, though my IRQ handler is being balanced on both CPUs.  Will Xenomai ever "balance" or "migrate" threads to other CPUs if the affinity mask allows this?

question: Does the Linux scheduler assume it owns 100% of the cycles on CPU0 (where all the Xenomai threads happen to be running), and thus make incorrect scheduling decisions for Linux threads on CPU0?

thanks,
Jeff

_______________________________________________
Xenomai-help mailing list
Xenomai-help <at> gna.org
https://mail.gna.org/listinfo/xenomai-help
Gilles Chanteperdrix | 1 Mar 2011 16:04

Re: SMP load balancing

Jeff Weber wrote:
> I am looking for guidance on how much I can rely upon the Linux kernel and
> Xenomai schedulers to perform automatic load balancing, vs. how much I must
> load balance manually.  My config:
> 
> x86 SMP system with 2 CPUs
> Linux 2.6.35.10 + xenomai-2.5.5.2
> ISA hardware device with 1 IRQ
> Xenomai kernel module driver using old-style native API (may be ported to
> RTDM in the future)
> user space application with multiple Linux and Xenomai threads
> no effort [yet] to explicitly set CPU affinities for either Xenomai or
> Linux
> cat /proc/xenomai/affinity output = 000000ff
> 
> question:  From /proc/xenomai/stat , I see all Xenomai kernel driver and
> userspace threads running on only CPU0, though my IRQ handler is being
> balanced on both CPUs.  Will Xenomai ever "balance" or "migrate" threads to
> other CPUs if the affinity mask allows this?

Xenomai does not do any load balancing, because a migration introduces
a huge latency. So what you have to do is set an affinity with just
one bit set. Changing the affinity during a thread's life is not
recommended.
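
As an illustration only, "an affinity with just one bit set" can be
sketched in plain C as below. The helper names (parse_affinity_mask,
is_single_cpu_mask, pin_to_cpu) are hypothetical and not part of any
Xenomai API; the sketch only builds and checks the mask value, assuming
the mask is a hexadecimal string like the one reported in
/proc/xenomai/affinity. How the mask is then applied to a task is not
shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a hexadecimal mask string such as "000000ff".
 * Returns 0 when no hex digits could be parsed. */
unsigned long parse_affinity_mask(const char *hex)
{
    char *end;
    unsigned long mask;

    assert(hex != NULL);
    mask = strtoul(hex, &end, 16);
    return (end == hex) ? 0UL : mask;
}

/* True when exactly one bit is set, i.e. the mask names a single CPU. */
int is_single_cpu_mask(unsigned long mask)
{
    return mask != 0UL && (mask & (mask - 1UL)) == 0UL;
}

/* Build a one-bit mask for `cpu`, or return 0 when that CPU is not
 * permitted by the `allowed` mask (hypothetical helper). */
unsigned long pin_to_cpu(unsigned long allowed, unsigned int cpu)
{
    unsigned long bit;

    if (cpu >= sizeof(unsigned long) * 8U)
        return 0UL;
    bit = 1UL << cpu;
    return (allowed & bit) ? bit : 0UL;
}
```

With the reported mask 000000ff, pin_to_cpu(parse_affinity_mask("000000ff"), 1)
yields 0x2, a mask that names only CPU1. Choosing the CPU once, before the
thread starts, matches the advice above not to change the affinity later.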

> 
> question: Does the Linux scheduler assume it owns 100% of the cycles on CPU0
> (where all the Xenomai threads happen to be running), and thus make
> incorrect scheduling decisions for Linux threads on CPU0?

I am not sure. What does top say?

-- 
					    Gilles.
Philippe Gerum | 1 Mar 2011 18:47

Re: More blackfin kernel oops under heavy load

On Tue, 2011-03-01 at 13:13 +0100, Gilles Chanteperdrix wrote:
> Kolja Waschk wrote:
> >> I do not really understand what we are talking about. Are we talking
> >> about Linux select/accept or Xenomai select/accept? Why not using the
> > 
> > Linux select/accept. Using the blocking accept() would have changed the
> > behaviour somewhat compared to the original design. Anyway, I think that doesn't
> > actually matter much.
> > 
> > I have meanwhile derived a much smaller RTDM driver kernel module, test
> > application with blocking accept() ;) plus Makefile that do not depend on any
> > particular external hardware anymore: A SPORT interface (SPORT1 receiver) is
> > configured with internal clock and frame sync generation and so on itself
> > generates a lot of interrupts, and this altogether quite quickly reproduces
> > the problem on my system. The files together are less than 1000 lines, 20kb.
> > 
> > I'd really appreciate if you or someone could take a look at it and maybe try
> > the code on his own bf537 system whether the same faults occur, and why. May I
> > post the files here (as a zipped attachment? inline)? I've already uploaded
> > a copy at
> > 
> >> http://www.ixo.de/tmp/till20110228.tgz
> 
> Thanks a lot. I do not know much about blackfin, Philippe is the
> specialist. But having this code will certainly help find the issue.
> 

Clearly, yes. Thanks. I'll do my best to find cycles to have a look at
this asap.

> Regards.
> 

-- 
Philippe.
Richard Cochran | 2 Mar 2011 16:16

Stuck MSI in normal Linux driver

I am running Xenomai 2.5 git master on a P2020 with kernel 2.6.36.

We have a custom PCIe card that raises an MSI, with a normal (not rtdm)
driver. It appears that the interrupt comes just once and then is
stuck. It works under plain Linux, but I cannot rule out a HW timing
bug either.

The wiki pages 

   http://www.xenomai.org/index.php/FAQs
   http://www.xenomai.org/index.php/Configuring_x86_kernels

say not to enable MSI on x86, with a link to a very old (2008)
discussion. I found a newer mail

   https://mail.gna.org/public/xenomai-help/2010-01/msg00095.html

claiming that MSI is okay.

Is there a known problem with MSI and Xenomai?

If so, where/how can I start to get working on fixing it?

(I don't need the MSI in a rtdm or adeos context.)

Thanks,

Richard
Gilles Chanteperdrix | 2 Mar 2011 16:26

Re: Stuck MSI in normal Linux driver

Richard Cochran wrote:
> I am running Xenomai 2.5 git master on a P2020 with kernel 2.6.36.
> 
> We have a custom PCIe card that raises an MSI, with a normal (not rtdm)
> driver. It appears that the interrupt comes just once and then is
> stuck. It works under plain Linux, but I cannot rule out a HW timing
> bug either.
> 
> The wiki pages 
> 
>    http://www.xenomai.org/index.php/FAQs
>    http://www.xenomai.org/index.php/Configuring_x86_kernels
> 
> say not to enable MSI on x86, with a link to a very old (2008)
> discussion. I found a newer mail
> 
>    https://mail.gna.org/public/xenomai-help/2010-01/msg00095.html
> 
> claiming that MSI is okay.

No. This is my fault. I had simply forgotten that MSI was not OK on x86.

> 
> Is there a known problem with MSI and Xenomai?

There is a known problem with MSI on x86 (explained by the FAQ link).

> 
> If so, where/how can I start to get working on fixing it?
> 
> (I don't need the MSI in a rtdm or adeos context.)

Given that the problem is that the interrupt controller ack/mask
functions (if I recall correctly) need, on x86, to use some Linux
primitives which can not be called from Xenomai domain, I am not sure it
is easy to fix. And the fact that you do not need the MSI in Xenomai
domain does not change anything, since ack/mask routines are called
ahead of the pipeline anyway.

-- 
					    Gilles.
Richard Cochran | 2 Mar 2011 16:34

Re: Stuck MSI in normal Linux driver

On Wed, Mar 02, 2011 at 04:26:46PM +0100, Gilles Chanteperdrix wrote:
> Richard Cochran wrote:
> > If so, where/how can I start to get working on fixing it?
> > 
> > (I don't need the MSI in a rtdm or adeos context.)
> 
> Given that the problem is that the interrupt controller ack/mask
> functions (if I recall correctly) need, on x86, to use some Linux
> primitives which can not be called from Xenomai domain, I am not sure it
> is easy to fix. And the fact that you do not need the MSI in Xenomai
> domain does not change anything, since ack/mask routines are called
> ahead of the pipeline anyway.

Okay, but how about on PowerPC, can I get it to work there?

Thanks,

Richard
Gilles Chanteperdrix | 2 Mar 2011 16:38

Re: Stuck MSI in normal Linux driver

Richard Cochran wrote:
> On Wed, Mar 02, 2011 at 04:26:46PM +0100, Gilles Chanteperdrix wrote:
>> Richard Cochran wrote:
>>> If so, where/how can I start to get working on fixing it?
>>>
>>> (I don't need the MSI in a rtdm or adeos context.)
>> Given that the problem is that the interrupt controller ack/mask
>> functions (if I recall correctly) need, on x86, to use some Linux
>> primitives which can not be called from Xenomai domain, I am not sure it
>> is easy to fix. And the fact that you do not need the MSI in Xenomai
>> domain does not change anything, since ack/mask routines are called
>> ahead of the pipeline anyway.
> 
> Okay, but how about on PowerPC, can I get it to work there?

I do not think powerpc has the same issue. Check that you do not have an
edge/level issue (MSIs are edge-triggered, I think).
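
To illustrate what an edge/level issue can look like, here is a toy
model in plain C. It is only a sketch under stated assumptions, not a
description of the P2020 interrupt controller or of any real device:
the toy_device structure and the toy_* functions are hypothetical, and
the model assumes a device that signals an edge-style interrupt only
when its pending count goes from 0 to 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device model (hypothetical, for illustration only). */
struct toy_device {
    int pending;        /* events raised but not yet serviced */
    int irqs_delivered; /* interrupts seen by the CPU */
    bool edge_latched;  /* edge mode: one message waiting for the handler */
};

/* The device raises an event. In edge mode a message is sent only on
 * the 0 -> 1 transition of `pending` (an assumption of this model). */
void toy_raise_event(struct toy_device *d)
{
    if (d->pending == 0)
        d->edge_latched = true;
    d->pending++;
}

/* Does the CPU currently see an interrupt request? */
bool toy_irq_visible(const struct toy_device *d, bool level_mode)
{
    return level_mode ? d->pending > 0 : d->edge_latched;
}

/* Run the handler for as long as an interrupt is visible.
 * With drain_all == false the handler services one event per interrupt.
 * Returns the number of events serviced. */
int toy_run(struct toy_device *d, bool level_mode, bool drain_all)
{
    int serviced = 0;

    while (toy_irq_visible(d, level_mode)) {
        d->irqs_delivered++;
        d->edge_latched = false;     /* the edge is consumed on delivery */
        do {
            assert(d->pending > 0);
            d->pending--;
            serviced++;
        } while (drain_all && d->pending > 0);
    }
    return serviced;
}
```

In this model, if two events are raised before the handler runs and the
handler services only one of them in edge mode, `pending` never returns
to 0, so no further edge is generated and the interrupt appears stuck
after the first delivery. A handler that drains all pending events, or
a level-triggered line, does not get stuck this way.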

-- 
					    Gilles.