Neal H. Walfield | 3 Jul 13:19

Viengoos and PREMO pagers

I was chatting with a friend last night about Viengoos and he asked
what the advantages of Viengoos are over Mach's PREMO pagers.

The paper on PREMO pagers is:

  Extending The Mach External Pager Interface To Accommodate
  User-Level Page Replacement Policies by Dylan McNamee and Katherine
  Armstrong, 1990.

The main idea is to extend the external pager interface such that when
there is memory pressure, instead of Mach selecting a page and telling its
associated memory object manager to write that page to backing store
and then free it, Mach sends a notification to the memory object's
manager telling it to choose some pages associated with that memory
object to free.

One shortcoming is that a program may have many memory objects.  For
instance, in the Hurd, a file server associates each file with a
memory object.  Using PREMO pagers, Mach would ask the file server to
choose a page to evict from some specific memory object.  This
severely limits the file server's ability to choose the best page: it
has potentially thousands of memory objects but must choose a page
from the indicated memory object.
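To make the limitation concrete, here is a minimal sketch, not taken from Mach or the Hurd. It assumes a simple least-recently-used policy, and every name in it (Page, pick_within_object, pick_globally, the "file-a"/"file-b" objects and their access times) is hypothetical. It compares the victim a manager can pick when the kernel names the memory object with the victim it would pick if it could look across all of its objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    memory_object: str  # hypothetical: one memory object per file
    index: int          # page number within the memory object
    last_access: int    # hypothetical logical clock; larger = more recent


def pick_within_object(pages, memory_object):
    """Constrained choice: the kernel names the memory object, so the
    manager may only pick the coldest page inside that object."""
    candidates = [p for p in pages if p.memory_object == memory_object]
    return min(candidates, key=lambda p: p.last_access)


def pick_globally(pages):
    """Unconstrained choice: pick the coldest page across all objects."""
    return min(pages, key=lambda p: p.last_access)


# Illustrative data only: file-a is hot, file-b is cold.
pages = [
    Page("file-a", 0, last_access=90),
    Page("file-a", 1, last_access=95),
    Page("file-b", 0, last_access=10),
    Page("file-b", 1, last_access=50),
]

constrained = pick_within_object(pages, "file-a")
unconstrained = pick_globally(pages)
print("asked about file-a:", constrained)
print("free choice:       ", unconstrained)
```

With these made-up access times, a request that names file-a forces out a recently used page even though file-b holds a much colder one.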

Second, although Mach no longer decides which page to evict, in many
cases the decision is still not based on application-specific
knowledge.  On Viengoos, this is possible, even when using files.
This is because storage management is separated from memory
management.  When a thread accesses some storage, the *memory* may be
assigned to the activity under which it is running (we say that the

Ben Leslie | 8 Jul 02:00

Re: RPC overhead

On Mon Jul 07, 2008 at 17:00:58 +0200, Neal H. Walfield wrote:
>I ran an application benchmark on Viengoos.  Specifically, the
>application is derived from the GCbench program.  You can find it
>here:

>Each invocation includes approximately 12 words of payload and each
>reply contains 2 words.  This suggests an RPC overhead of 1350 cycles
>or 1.2 us.
>
>The 4.2 us represents approximately 5000 cycles.  This leaves 3650
>unaccounted cycles.  This seems to be a bit more than one can simply
>attribute to secondary cache effects; however, perhaps ping pong
>really measures the very hot case and I'm running with very cold
>caches.  I hope someone else can suggest how to figure out to what end
>these cycles are being put, has a theory, or can confirm that these
>cycle counts are not, in fact, too high.

That seems about right to me in terms of cache effects. Ping-pong runs
very hot. Next step would be to turn on performance monitoring counters
and get a count of cache misses etc.
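For reference, the cycle budget in the quoted message can be rechecked with a few lines of arithmetic. This is only a sketch using the four figures quoted above (1350 cycles, 1.2 us, 5000 cycles, 4.2 us); the variable names are mine, and the implied clock rates are derived from those rounded figures, not measured.

```python
# Figures taken from the quoted message; all are approximate there.
rpc_cycles = 1350       # estimated RPC overhead in cycles
rpc_us = 1.2            # the same overhead in microseconds
observed_cycles = 5000  # observed cost in cycles
observed_us = 4.2       # the same observed cost in microseconds

# Cycles not explained by the estimated RPC overhead.
unaccounted_cycles = observed_cycles - rpc_cycles
unaccounted_fraction = unaccounted_cycles / observed_cycles

# Cycles per microsecond is the clock rate in MHz. The two pairs of
# figures need not agree exactly because each was rounded.
implied_mhz_rpc = rpc_cycles / rpc_us
implied_mhz_observed = observed_cycles / observed_us

print(f"unaccounted: {unaccounted_cycles} cycles "
      f"({unaccounted_fraction:.0%} of the observed cost)")
print(f"implied clock: {implied_mhz_rpc:.0f} MHz vs "
      f"{implied_mhz_observed:.0f} MHz")
```

The subtraction reproduces the 3650 unaccounted cycles, which is a bit under three quarters of the observed cost.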

Cheers,

Benno

Re: RPC overhead

Neal:

If you are running on an IA32 implementation, you should be able to
confirm this using the hardware event tracing registers. Use them to
separately count S cache misses and U+S cache misses. First measure the
ping-pong test, then yours. Look at the S cache misses in particular.
While you are about it, check TLB behavior as well.
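The bookkeeping for that comparison might look like the sketch below. It assumes you already have two counter readings per run, one counting supervisor-mode (S) misses only and one counting user plus supervisor (U+S) misses. The function names and the numbers in the usage lines are hypothetical placeholders, not measurements, and reading the counters themselves is not shown.

```python
def split_misses(supervisor, total):
    """Split a U+S miss count into (user, supervisor) components.

    supervisor: misses counted in supervisor mode only (S)
    total:      misses counted in user and supervisor mode (U+S)
    """
    if supervisor > total:
        raise ValueError("S count cannot exceed the U+S count")
    return total - supervisor, supervisor


def extra_misses(baseline, workload):
    """Return the (user, supervisor) misses the workload incurs beyond
    the baseline. Each argument is a (supervisor, total) pair of
    counter readings for one run."""
    base_user, base_sup = split_misses(*baseline)
    work_user, work_sup = split_misses(*workload)
    return work_user - base_user, work_sup - base_sup


# Placeholder readings, for illustration only: (S, U+S) per run.
ping_pong = (40, 100)
application = (400, 1000)

extra_user, extra_sup = extra_misses(ping_pong, application)
print("extra user-mode misses:      ", extra_user)
print("extra supervisor-mode misses:", extra_sup)
```

The same subtraction applies to TLB miss counts if they are collected the same way.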

It was always Jochen's practice to count performance from trap
instruction to kernel exit. While the L4 trap interface is very well
designed from the IDL compiler point of view, the correct measurement
point is at the boundary of the IDL procedure (because the trap-layer
interface design can mandate copies that might be removed with a
different design).

Further, I have always felt that the microbenchmark approach on this was
a good way to measure the ukernel implementation, but a bad way to
measure systemic behavior. The I/D cache effects and TLB effects are a
sizable component of the cost of separation in real systems.

Not, mind you, that EROS or Coyotos does any better on these issues.
What I'm trying to say is that using microbenchmarks exclusively
constitutes "misleading by omission".

In this respect, it is my opinion that measuring l4linux does not help,
because it is my impression that the IPCs required in that
implementation do not do much string motion (therefore have minimal
cache footprint). That measurement is a fine measurement of hosted linux
performance, but if your goal was merely to run linux quickly, sticking
a microkernel under it is a step in the wrong direction. There are other