Michael Sevakis | 1 Dec 08:45 2011

Re: Performance regression

> We should try to figure out if it also affects real hardware targets
> or just maemo / other RaaA targets.
>
> Cheers,
> Thomas

The software mixer and the PCM buffer are distinct entities. The PCM buffer 
is called by the mixer and the callback is very lightweight, much more so 
than before the timestamping revisions. The PCM buffer frames are 2048 
samples (to provide accurate elapsed-time information without much jitter, 
and accurate enough for A-B repeat). The software mixer frames are fixed 
length: always 256 samples by default, to keep latency very low. Certain 
targets that don't use DMA have shorter frames to deal with the short HW 
FIFOs and the fact that mixing is done in the ISR.

On real targets, the impact of the software mixer was pretty negligible and 
in the PortalPlayer 502x case, it actually reduced CPU load a bit since 
cache flushes weren't needed to maintain coherency for DMA (the buffers are 
in IRAM). On RaaA, I have no idea what the impact was. Some things have a 
cost after all.

Shorter or longer mixer frames are possible depending on the target, but the 
size should keep things reasonably responsive. Longer mixer frames mean each 
double-buffer frame is longer, thus needing more memory, which might be 
scarce IRAM.

Regards,
Mike

Thomas Jarosch | 1 Dec 17:25 2011

Re: Performance regression

On Thursday, 1. December 2011 08:45:36 Michael Sevakis wrote:
> On RaaA, I have no idea what the impact was. Some
> things have a cost after all.

Thanks for the detailed explanation, Mike.

I'll see if I can reproduce it with the SDL RaaA target;
debugging/bisecting on the workstation is much easier.

One can also use a CPU profiler like google-perftools there.
Who knows if it's related to software mixing at all.

Cheers,
Thomas

Michael Sevakis | 1 Dec 19:02 2011

Re: Performance regression

> On Thursday, 1. December 2011 08:45:36 Michael Sevakis wrote:
>
> I'll try if I can reproduce it with the SDL RaaA target,
> debugging/bisecting on the workstation is much easier.
>

One more thing. You mentioned maemo and really, I did not work on that so 
much. I just took my best guess at what to do with it in r30097 and no one 
was conveniently around to check it. I also noticed the lack of 
pcm_play_lock/unlock implementations.

Michael Sevakis | 1 Dec 22:07 2011

Re: Performance regression

I forgot something else too. The inclusion of the optimized mixer assembly 
is disabled for App targets. Something needs to be done there to get it to 
compile.

Thomas Jarosch | 1 Dec 22:18 2011

Re: Performance regression

On Thursday, 1 December 2011, 19:02:28, Michael Sevakis wrote:
> One more thing. You mentioned maemo and really, I did not work on that so
> much. I just took my best guess at what to do with it in r30097 and no one
> was conveniently around to check it. I also noticed the lack of
> pcm_play_lock/unlock implementations.

Actually I did a test build back then (still floating around on my phone)
and everything was working fine. I didn't notice the performance drop.
Maemo doesn't need pcm_play_lock() as the locking is done
on the gstreamer object level.

Anyhow, it turned out to be the software mixing and you already
gave the solution: After an increase of the software mixer buffer size
from 256 to 2048, CPU usage is back to normal.

Is that a size we still can live with?

Also have a look at FS #12421, which squeezes out
a bit more performance on RaaA.

Thanks for your effort not to break maemo when you committed r30097.

Cheers,
Thomas

Thomas Jarosch | 1 Dec 22:19 2011

Re: Performance regression

On Thursday, 1 December 2011, 22:07:33, Michael Sevakis wrote:
> I forgot something else too. The including of optimized mixer assembly for
> App targets is disabled. Something needs to be done there to get it to
> compile.

Almost fixed, see FS #12421.

Thomas

Michael Sevakis | 1 Dec 22:31 2011

Re: Performance regression

> Actually I did a test build back then (still floating around on my phone)
> and everything was working fine. I didn't notice the performance drop.
> Maemo doesn't need pcm_play_lock() as the locking is done
> on the gstreamer object level.
>

It's for locking out the callback for the rest of the system, like the 
mixer, pcmbuffer, etc., when it would not be desirable for the callback to 
happen. With the higher callback frequency, it's much more likely to happen 
when other code in the system expects an atomic operation. I don't see how 
the current code performs the intended function other than for its own 
internal integrity. The flag setting that I remember seeing is racy too.

> Anyhow, it turned out to be the software mixing and you already
> gave the solution: After an increase of the software mixer buffer size
> from 256 to 2048, CPU usage is back to normal.
>
> Is that a size we still can live with?
>

No comment on that in particular other than test it out and make sure 
channels are sufficiently responsive for things like keyclick and voice. One 
other thing that I've got brewing is using another channel for previews 
during seeking. I wanted it snappy on target but practicality has to have 
its say.

> Also have a look at FS #12421, that squeezes out
> a bit more performance on RaaA.
>

Michael Sevakis | 1 Dec 22:47 2011

Re: Performance regression

> Is that a size we still can live with?
>

I forgot to ask: were you proposing to increase it for just maemo or for all 
targets? I wouldn't recommend doing the latter. In fact it probably wouldn't 
compile on many targets anyway, and scrollwheels wouldn't be able to click 
frequently enough.

Michael Sevakis | 1 Dec 23:01 2011

Re: Performance regression

> On Thursday, 1 December 2011, 22:07:33, Michael Sevakis wrote:
>> I forgot something else too. The including of optimized mixer assembly
>> for App targets is disabled. Something needs to be done there to get it
>> to compile.
>
> Almost fixed, see FS #12421.
>

There's something else I left open for whenever Mr. Someone feels like doing 
it. If native multi-channel audio is available through some platform 
interface, it is possible to just use a different mixer that implements the 
same interfaces.

Thomas Martitz | 2 Dec 07:27 2011

Re: Performance regression

On 01.12.2011 22:47, Michael Sevakis wrote:
>> Is that a size we still can live with?
>>
>
> I forgot to ask: were you proposing to increase it for just maemo or 
> for all targets? I wouldn't recommend doing the latter. In fact it 
> probably wouldn't compile on many targets anyway and also scrollwheels 
> wouldn't be able to click frequently enough.

Speaking of keyclicks: they're not working that well on Android. They are 
delayed by about 0.5 s and repeated when using the touchscreen.

Best regards.

