Lee Revell | 1 Aug 04:07 2004

Re: spurious XRUNs?

On Sat, 2004-07-31 at 04:08, Steve Harris wrote:
> On Fri, Jul 30, 2004 at 08:04:35 -0400, Lee Revell wrote:
> > I have noticed when running with xrun_debug=2 that jackd usually
> > generates one or two spurious XRUNs on startup and shutdown.  I am
> > assuming they are spurious because jackd does not print out its usual
> > XRUN error message, and that these must be some artifact of the way
> > jackd initializes.
> 
> I think it's not that they are spurious XRUNs, just that jack chooses not
> to show them in its output - there is some XRUN suppression code to
> prevent showing the inevitable XRUN resulting from the opening of the PCM
> device.

OK, so there is an inevitable XRUN when you open a PCM device.  Is this
desirable?  It seems like it might be better to have a device state
'initializing', where you ignore any XRUNs until initialization is
finished.  This seems like a fundamentally different thing from a "real"
XRUN.

For now, I modified jackd to print a message when it gets to the startup
XRUN, to remind you to filter that one out of your logs.

Does the inevitable XRUN on startup only apply to capture devices, or
both ways?

Lee

-------------------------------------------------------
This SF.Net email is sponsored by OSTG. Have you noticed the changes on
Linux.com, ITManagersJournal and NewsForge in the past few weeks? Now,

Matt Mackall | 1 Aug 04:55 2004

Re: Statistical methods for latency profiling

On Sat, Jul 31, 2004 at 01:22:37AM -0400, Lee Revell wrote:
> Hey,
> 
> Recently Ingo Molnar asked in one of the voluntary-preempt threads for
> the minimum and average scheduling delay reported by jackd.  JACK does
> not currently maintain these statistics.
> 
> I realized that the distribution of maximum latencies reported on each
> process cycle is fairly normally distributed.

This is not at all what I would expect. Instead, I'd expect to see
something like a gamma distribution, where we have everything
clustered down close to zero, but with a very long tail in the
positive direction falling off exponentially and obviously a hard
limit on the other side.

-- 
Mathematics is the supreme nostalgia of our time.
Lee Revell | 1 Aug 05:24 2004

Re: [Jackit-devel] Re: Statistical methods for latency profiling

On Sat, 2004-07-31 at 22:55, Matt Mackall wrote:
> On Sat, Jul 31, 2004 at 01:22:37AM -0400, Lee Revell wrote:
> > Hey,
> > 
> > Recently Ingo Molnar asked in one of the voluntary-preempt threads for
> > the minimum and average scheduling delay reported by jackd.  JACK does
> > not currently maintain these statistics.
> > 
> > I realized that the distribution of maximum latencies reported on each
> > process cycle is fairly normally distributed.
> 
> This is not at all what I would expect. Instead, I'd expect to see
> something like a gamma distribution, where we have everything
> clustered down close to zero, but with a very long tail in the
> positive direction falling off exponentially and obviously a hard
> limit on the other side.

Right, it is a lot closer to a gamma distribution.  It's been years
since I have used any of this, and I took stat for psych majors rather
than stat for engineers.  I was a lot more interested in playing Doom
at the time...

This looks interesting:

http://www.itl.nist.gov/div898/handbook/eda/section3/ppccplot.htm
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4291.htm

I will have some numbers soon.

Lee

Lee Revell | 1 Aug 07:59 2004

Re: [Jackit-devel] Re: Statistical methods for latency profiling

On Sat, 2004-07-31 at 23:24, Lee Revell wrote:

> I will have some numbers soon.

Here is the histogram from running 500000 cycles (~5 minutes), with
max_sectors_kb -> 64, normal desktop usage.  Kernel is 2.6.8-rc2-M5, set
to 1:3, with all irqs threaded except the sound card.

Delay   Count
-----   -----
6       2257
7       3210
8       797
9       717
10      283
11      219
12      235
13      250
14      263
15      320
16      334
17      287
18      268
19      260
20      223
21      155
22      109
23      90
24      41
25      19
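The truncated histogram above can be summarized with a few lines of
Python (a post-hoc analysis sketch, not part of jackd; the delay/count
pairs are copied verbatim from the table, so the statistics only cover
the rows shown):

```python
# Summarize the (truncated) delay histogram posted above.
# Keys are delays in usecs, values are counts from the table.
hist = {
    6: 2257, 7: 3210, 8: 797, 9: 717, 10: 283,
    11: 219, 12: 235, 13: 250, 14: 263, 15: 320,
    16: 334, 17: 287, 18: 268, 19: 260, 20: 223,
    21: 155, 22: 109, 23: 90, 24: 41, 25: 19,
}

total = sum(hist.values())                          # samples covered by the table
mean = sum(d * c for d, c in hist.items()) / total  # weighted mean delay
mode = max(hist, key=hist.get)                      # most common delay

print(f"samples={total} mode={mode}us mean={mean:.2f}us")
```

The long right-hand tail Matt predicted is exactly what the rows cut
off past 25 usecs would show.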

Lee Revell | 1 Aug 08:06 2004

[PATCH] Display latency histogram on exit, add -n/--num-cycles argument

This patch displays a histogram of latency measurements from each
cycle.  The histogram is updated in jack_cpu_load, which runs every
JACK_ENGINE_ROLLING_COUNT frames.  The overhead is essentially zero, as
existing infrastructure is reused.

The patch also adds an -n/--num-cycles argument, which causes jack to
run that many cycles and then exit.  I find 500000 cycles useful; at 32
frames (with no clients) this is about 5 minutes.

It was pointed out to me that these numbers only represent the
scheduling delay when there are no clients running; otherwise you are
measuring the time the slowest client takes.  So this is only useful for
benchmarking scheduler latency with no clients.

Also, the patch replaces a few if/else blocks that unconditionally
assign a variable with the equivalent ternary operator.  I am not even
sure this is faster with a modern compiler.  Feel free to omit these
lines.

Lee 

Steve Harris | 1 Aug 09:08 2004

Re: Statistical methods for latency profiling

On Sat, Jul 31, 2004 at 01:48:25 -0400, Lee Revell wrote:
> > > I was actually looking for feedback on what types of statistics would be
> > > most useful.  The problem is that you can't just dump every number to an
> > > fd like this, because we are dealing with such high frequencies and low
> > > latencies that the output becomes voluminous, and the overhead of just
> > > printing to a log becomes greater than that of computing some running
> > > statistics in jackd.
> > 
> > The output is not as much as you think. 64 sample periods at 48k is < 1000
> > updates per sec, say, 20 bytes per period gives 20k/sec, ie. nothing.
> > I've profiled lots of software this way.
> 
> True, but if you are trying to estimate the worst latency you would see
> over 24 hours, in order to get a statistically significant result, your
> sample size needs to be a significant fraction of that 24 hours -
> probably half a billion interrupts.  'Five nines' is not helpful here
> because it only takes ~15 seconds to get 10000 samples.  I don't have
> the numbers handy right now, I will have to pull out the books today.

The output can be piped into a processor rather than stored.

- Steve 
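Steve's suggestion, streaming the per-period numbers into a processor
instead of storing them, might look like this (a hypothetical
stdlib-only sketch: it keeps running statistics in O(1) memory using
Welford's online algorithm, so a 24-hour run never has to store
hundreds of millions of samples):

```python
class RunningStats:
    """Online min/max/mean/variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.min = float("inf")
        self.max = float("-inf")

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two samples
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Usage idea: jackd prints one delay value per line, piped to this
# script; the list below stands in for sys.stdin.
stats = RunningStats()
for x in [1.0, 2.0, 3.0, 4.0]:
    stats.update(x)
print(stats.n, stats.min, stats.max, stats.mean)
```

The point is the same one Steve makes: the pipe only ever holds one
line at a time, so sample size is limited by patience, not memory.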

-------------------------------------------------------
This SF.Net email is sponsored by OSTG. Have you noticed the changes on
Linux.com, ITManagersJournal and NewsForge in the past few weeks? Now,
one more big change to announce. We are now OSTG- Open Source Technology
Group. Come see the changes on the new OSTG site. www.ostg.com
Lee Revell | 1 Aug 11:08 2004

Re: [PATCH] Display latency histogram on exit, add -n/--num-cycles argument

On Sun, 2004-08-01 at 02:06, Lee Revell wrote:
> This patch displays a histogram of latency measurements from each
> cycle.  The histogram is updated in jack_cpu_load, which runs every
> JACK_ENGINE_ROLLING_COUNT frames.  The overhead is essentially zero, as
> existing infrastructure is reused.
> 
> The patch also adds an -n/--num-cycles argument, which causes jack to
> run that many cycles and then exit.  I find 500000 cycles useful; at 32
> frames (with no clients) this is about 5 minutes.
> 
> It was pointed out to me that these numbers only represent the
> scheduling delay when there are no clients running; otherwise you are
> measuring the time the slowest client takes.  So this is only useful for
> benchmarking scheduler latency with no clients.
> 
> Also, the patch replaces a few if/else blocks that unconditionally
> assign a variable with the equivalent ternary operator.  I am not even
> sure this is faster with a modern compiler.  Feel free to omit these
> lines.
> 

The patch has an off-by-992 error.  This fixes it:

-#define JACK_ENGINE_ROLLING_COUNT 32
+#define JACK_ENGINE_ROLLING_COUNT 1024

Here are some results.  At least three distinct peaks are visible, and
they repeat across both trials of N=500000.  They are clearer with
N=1000000.

Trial 1, N=500000


Thomas Charbonnel | 1 Aug 13:09 2004

Re: [PATCH] Display latency histogram on exit, add -n/--num-cycles argument

Lee Revell wrote:
> This patch displays a histogram of latency measurements from each
> cycle. 

Could you please (re)send it?

Thanks,
Thomas

Lee Revell | 1 Aug 13:19 2004

Re: [PATCH] Display latency histogram on exit, add -n/--num-cycles argument

On Sun, 2004-08-01 at 07:09, Thomas Charbonnel wrote:
> Lee Revell wrote:
> > This patch displays a histogram of latency measurements from each
> > cycle. 
> 
> Could you please (re)send it?
> 
> Thanks,
> Thomas

Sure, it's combined with my max_usecs patch here.  I also fixed the
error referenced in my followup.

Lee

Index: jack/drivers/alsa/alsa_driver.c
===================================================================
RCS file: /cvsroot/jackit/jack/drivers/alsa/alsa_driver.c,v
retrieving revision 1.52
diff -u -r1.52 alsa_driver.c
--- jack/drivers/alsa/alsa_driver.c	26 Jul 2004 05:36:01 -0000	1.52
+++ jack/drivers/alsa/alsa_driver.c	1 Aug 2004 05:46:05 -0000
@@ -1093,6 +1093,8 @@
 		fprintf(stderr, "\n\n**** alsa_pcm: xrun of at least %.3f "
 			"msecs\n\n",
 			diff.tv_sec * 1000 + diff.tv_usec / 1000.0);
+	} else {

Ingo Molnar | 1 Aug 13:21 2004

Re: Re: Statistical methods for latency profiling


On Sun, 1 Aug 2004, Lee Revell wrote:

> So stressing the filesystem moves the center to the right a bit, from
> 6-7 to 9-10, and *drastically* lengthens the 'tail'.

basically each codepath has a typical latency distribution, and when a
workload uses multiple codepaths then the latencies get intermixed almost
linearly.

> These numbers suggest to me that a lot of the latencies from 47 usecs
> and up are caused by one code path, because they are so uniformly
> distributed over the upper part of the histogram.  The prime suspect of
> course being the ide io completions.  I tested this theory by lowering
> max_sectors_kb from 64 to 32:

> These numbers all point to the ide sg completion code as the only thing
> on the system generating latencies over ~42 usecs.

yep, that's a fair assumption. Once the IO-APIC irq-redirection problems
are solved i'll try to further thread the IDE completion IRQ to remove
that ~100 usecs latency.

	Ingo
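Ingo's observation, that each codepath contributes its own latency
distribution and a mixed workload combines them almost linearly, is
easy to simulate (a hypothetical sketch with made-up parameters,
stdlib only):

```python
import random

random.seed(42)  # deterministic for the example

# Two hypothetical codepaths: a fast one with a floor near 6 usecs and
# a slow one (think IDE completions) with a floor near 42 usecs, each
# modeled as a floor latency plus an exponential tail.
fast = [6.0 + random.expovariate(1 / 3.0) for _ in range(1000)]
slow = [42.0 + random.expovariate(1 / 5.0) for _ in range(1000)]

# A workload exercising both codepaths just pools the samples, so the
# combined histogram shows both humps rather than one shifted peak.
mixed = fast + slow

mean_fast = sum(fast) / len(fast)
mean_slow = sum(slow) / len(slow)
mean_mixed = sum(mixed) / len(mixed)
print(round(mean_fast, 1), round(mean_slow, 1), round(mean_mixed, 1))
```

This matches what Lee saw: stressing the filesystem does not move the
whole curve so much as add a second component to the mixture.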


