Fortes Marcelo | 1 Aug 21:49
Picon
Favicon

64-bit virtual adresses and registers

>> I agree with Tanenbaum that if an active program's working set
>> does not fit in core and must be swapped, there is a problem.
shap> if you believe that we will remain crippled by 32-bit address limits
shap> forever, then Andy is right. But if you believe that 64-bit virtual
shap> addresses are an inevitable transition (as I do), then I think Andy's
shap> assertion is questionable.
64-bit virtual addresses and 64-bit virtual registers can be a possible solution to context swtching
on a multi-server operating system  running on top of a microkernel ?
There is any literacture about it ?
 
Thanks in advance
Marcelo Fortes.

Alertas do Yahoo! Mail em seu celular. Saiba mais.

_______________________________________________
L4-hurd mailing list
L4-hurd <at> gnu.org
http://lists.gnu.org/mailman/listinfo/l4-hurd

Re: 64-bit virtual adresses and registers

On Wed, 2007-08-01 at 16:49 -0300, Fortes Marcelo wrote:
> >> I agree with Tanenbaum that if an active program's working set
> >> does not fit in core and must be swapped, there is a problem.
> shap> if you believe that we will remain crippled by 32-bit address
> limits
> shap> forever, then Andy is right. But if you believe that 64-bit
> virtual
> shap> addresses are an inevitable transition (as I do), then I think
> Andy's
> shap> assertion is questionable.
> 64-bit virtual addresses and 64-bit virtual registers can be a
> possible solution to context swtching
> on a multi-server operating system  running on top of a microkernel ?
> There is any literacture about it ?

More bits doesn't eliminate the need to context switching unless you
adopt a "single address space" model. There is lots of work on that. Try
a google search for "SASOS" and/or "single address space operating
system".

Adopting a single address space view is a religious issue.

shap
olafBuddenhagen | 2 Aug 21:05
Picon

Re: Resource Allocation Strategy

Hi,

On Fri, Jul 13, 2007 at 02:31:37PM +0200, Neal H. Walfield wrote:

> When an activity acts as a resource scheduler for its children, it has
> maximum control over how resources are allocated between them.  In
> principle, this allows the realization of most any conceivable
> allocation policy or mechanism a developer may desire.  This control
> has a number of costs.  There are computation costs in terms of
> interposition and bookkeeping.  There is also the cost of added
> complexity.  And, there are questions.  If components are often
> combined, how much freedom do developers really have?  Equally
> important, to what degree do developers require such control?

I totally agree that a decentralized approach causes a lot of overhead;
and in a traditional system, where everything interacts quite
intimately, it's probably not worth it because of other constraints.

Yet, I'm not sure the option of decentralized management should be
completely discarded. Think of virtual machines: In this case, it's
totally out of question to have a central resource manager that controls
all allocations within the guest systems. The guests want to keep total
control over how things are managed inside their VMs.

Traditionally, the resources for VMs are allocated statically, which is
a total waste of course. So it is desirable to have a mechanism where
the guests have full control over their VMs, yet can interact with the
host to dynamically manage resources for the whole VM. (A while back I
learned that Xen is working on something like that; no idea what became
of it...)

The current popularity of VMs shows that in many cases, the benefits of
the strong isolation outweigh the disadvantages of missing
centralization and integration. Of course, the ability to relatively
easily create custom sub-environments in the Hurd is not quite
comparable to full-blown VMs; yet it might be an interesting direction
to look in...

-antrik-
olafBuddenhagen | 2 Aug 20:22
Picon

Re: Resource Revocation

Hi,

On Fri, Jul 13, 2007 at 05:16:08PM +0200, Neal H. Walfield wrote:

> Just as we have activities assign preferences to composite activities,
> we can provide a mechanism to allow activities to assign preferences
> to pages.  We can also assign policies to pages. For instance, instead
> of sending a page to backing store, the application may mark a page as
> discardable.  If such a page is chosen for eviction, it should simply
> be freed.  (In this case, a fault needs to be triggered on the next
> access.)  This strategy removes the major source of destructive
> interference: the scheduler no longer waits on the task; the task acts
> beforehand on its own initiative or it reacts to a signal (not a
> request).

I'm not sure I fully understand your proposal; but it may be quite
ironic in some sense...

When I first saw some mention of market-based resource management (in
the context of Hurd/L4), there were very few details, and I thought up
some stuff on my own. My idea was that every task assigns priorities to
all it's physical pages. (In the market model, the maximum rent it's
willing to pay.) Whenever there is memery pressure in the system, the
rent goes up. A central pager then checks all tasks for pages with a
priority below the current rent, and evicts these.

Leaving the market stuff aside, this just means that tasks assign
priorities to their own pages, and a central pager evicts pages based on
these priorities when there is memory pressure.

I was quite sceptical about this design myself, fearing that it might
require too much management for the tasks to assign (and constantly
reassign as demand changes) page priorities, and for the pager to find
the current lowest-priority pages.

Later I learned that your original design was totally different: Tasks
simply get a certain number of physical pages (determined by a
market-based approach or some other rules), and have to do self-paging
within these. The number of available pages changes from time to time,
and the task has to adapt to that. (Where reducing the amount of pages
requires immediate action.)

While self-paging is appealing in a hurdish sense (it allows endless
flexibility changing the implementation), I never really warmed up to
it, because of the problems you described above with explicit
revocation, and suboptimal resource usage due to lack of a global
picture.

Last year you proposed a design where a global pager would evict pages
when the number of pages available to a task is reduced due to memory
pressure; but otherwise tasks have full control over paging by deciding
what happens on a page fault. This looked like a quite promising
compromise: Avoiding the problems of explicit revocation, but still
leaving very much flexibility.

Now you are proposing a design which -- unless I misunderstood your
description -- seems pretty much the same as my original idea. Could you
confirm whether this is indeed the case? And if so, how do you want to
implement it, to avoid too much management overhead?

> This solution is clearly more limiting: the appropriate strategies
> must be identified beforehand.  However, not that many common
> strategies come to mind.  In fact, I suspect that either sending to
> backing store or simply freeing cover the most common scenarios.

Well, I can think only of one other approach: Some applications might
work on large data sets, which can't be discarded because they are not
easy to regenerate, but can be converted to some more compact
representation temporarily at a low cost. This compacting is obviosly
application-specific, so it can't be handled by a generic mechanism.

However, this variant is pretty academic. It seems rather unlikely that
there are actually cases where an application would profit from such
compacting on memory pressure, but on the other hand it would be too
costly to just always do it. And even if such cases indeed exist, I very
much doubt anyone would bother to relly implement this...

What I'm saying is that I agree that swapping or discarding are probably
the only useful actions. I'm not sure how much a global paging mechanism
is limiting possible policies, though...

-antrik-
Fortes Marcelo | 6 Aug 21:52
Picon
Favicon

Re: 64-bit virtual adresses and registers

A Single address space for a multi server Operating on
top of a microkernel don looks like a good idea, how
can i  protect each process server from each other in
a single adress space ?

In fact my question was more related to if any kind of
cpu registers virtualizations can reduce impact over
performance due context change between servers.

In time,(pardon me by my ignorance)Is Eros/CoyotOS IPC
message passing sychronous like L4 or Assyinchronous
like Mach ?

There is any performance comparison between CoyotOS
and Modern L4 Implementations,(FIASCO, L4KA)?

Best wishes
Marcelo Fortes

> > 64-bit virtual addresses and 64-bit virtual
> registers can be a
> > possible solution to context swtching
> > on a multi-server operating system  running on top
> of a microkernel ?
> > There is any literacture about it ?
> 
> More bits doesn't eliminate the need to context
> switching unless you
> adopt a "single address space" model. There is lots
> of work on that. Try
> a google search for "SASOS" and/or "single address
> space operating
> system".
> 
> Adopting a single address space view is a religious
> issue.
> 
> shap
> 
> 
> 

      Alertas do Yahoo! Mail em seu celular. Saiba mais em http://br.mobile.yahoo.com/mailalertas/
Neal H. Walfield | 6 Aug 21:59

SASOS

At Mon, 6 Aug 2007 16:52:26 -0300 (ART),
Fortes Marcelo wrote:
> 
> A Single address space for a multi server Operating on
> top of a microkernel don looks like a good idea, how
> can i  protect each process server from each other in
> a single adress space ?

You do not necessarily sacrifice page-grain protection.  See Mungi as
an example of an SASOS.

  http://www.ertos.nicta.com.au/research/mungi/

In particular, for instance, this paper:

  Mungi: a distributed single address-space operating system by Gernot
  Heiser, Kevin Elphinstone, Stephen Russell and Jerry Vochteloo

  ftp://ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/9314.pdf
Neal H. Walfield | 6 Aug 22:01

Coyotos IPC

At Mon, 6 Aug 2007 16:52:26 -0300 (ART),
Fortes Marcelo wrote:
> In time,(pardon me by my ignorance)Is Eros/CoyotOS IPC
> message passing sychronous like L4 or Assyinchronous
> like Mach ?

It is primarily synchronous like L4.  There has been discussion
regarding the addition of an asynchronous notification feature.

> There is any performance comparison between CoyotOS
> and Modern L4 Implementations,(FIASCO, L4KA)?

If Coyotos' implementation is far enough that that is possible, then
it is only very recently so.  So no, I don't think there are any
benchmarks yet.

Neal

Re: 64-bit virtual adresses and registers

On Mon, 2007-08-06 at 16:52 -0300, Fortes Marcelo wrote:
> A Single address space for a multi server Operating on
> top of a microkernel don looks like a good idea, how
> can i  protect each process server from each other in
> a single adress space ?

The idea behind a single address space operating system (SASOS) is that
all processes agree about the address of every objects. They are not
required to agree about permissions. In real implementations, a context
switch generally means changing the mapping table pointer. This doesn't
change the effective va->pa bindings, but it does change the
permissions.

In real implementations, it is also possible that bindings are built
lazily. An object may be logically mapped at a location without any
hardware mapping existing in the page table of a particular process.

> In fact my question was more related to if any kind of
> cpu registers virtualizations can reduce impact over
> performance due context change between servers.

It doesn't really change anything that I can see in this regard.

> In time,(pardon me by my ignorance)Is Eros/CoyotOS IPC
> message passing sychronous like L4 or Assyinchronous
> like Mach ?

Synchronous. We did discuss asynchronous IPC for a while, but the idea
was much too complicated. It was eventually dropped.

> There is any performance comparison between CoyotOS
> and Modern L4 Implementations,(FIASCO, L4KA)?

Not yet. Coyotos only ran it's first rough IPC three weeks ago.

I do not expect that Coyotos IPC will be as fast as L4 IPC. EROS IPC was
comparable to L4. Experience at the application level led us to make the
Coyotos mechanism more flexible. This has a cost at the kernel level,
but it significantly reduces certain application overheads.

As you look at performance, remember that a fast IPC is not the goal.
The goal is a fast overall system. If you can make a faster IPC without
losing flexibility, that is good. When a proposed design change to the
IPC makes the IPC faster by penalizing the application, you are not
winning anymore.

Also, be skeptical of hot cache performance (which is what all of us
measure). Most processes run long enough to clear the cache, so IPCs are
bimodal; some are hot cache and some are cold cache. The balance depends
on the structure of the OS. As the frequency of cold-cache IPC rises,
the importance of a 10 cycle performance gain in the IPC path shrinks
rapidly -- a single cache miss on modern hardware can be 40 cycles or
more.

Performance is a systemic issue, and IPC is only one part of that issue.

shap
Espen Skoglund | 7 Aug 16:17
Picon
Favicon

Re: 64-bit virtual adresses and registers

[Jonathan S Shapiro]
> On Mon, 2007-08-06 at 16:52 -0300, Fortes Marcelo wrote:
>> In time,(pardon me by my ignorance)Is Eros/CoyotOS IPC message
>> passing sychronous like L4 or Assyinchronous like Mach ?

> Synchronous. We did discuss asynchronous IPC for a while, but the
> idea was much too complicated. It was eventually dropped.

I assume that you might still support some sort of asynchronous
notification mechanism?  This is very useful when one needs an
efficient and reliable signal delivery mechanism.  Some time ago I
implemented something similar to the async notification mechanism
described in the NICTA Nx APIs.  The following numbers where measured
on a quad-core Intel Xeon E5310 @ 1.60GHz:

                                           SMP-kern  SMP-kern  SMP-kern
                       Inter-AS  Intra-AS  Inter-AS  Inter-AS    XCPU

Send async notify:       99.13     97.15    113.67    113.66    123.13
Poll async notify:        1.55      1.55      1.55      1.55      1.55
Pingpong async notify:  543.07    233.57    618.21    315.42   2837.07
Pingpong (single MR):   523.26    207.33    453.07    208.28   8184.09

As one can see polling is extremely cheap since it can be done
completely in user-mode.  Sending a notification in enters the kernel,
updates a bitmask in the destination, and checks if the destination
needs to be woken up (in this case no wakeup).  Async pingpong is two
threads using the async notification mechanism to wake each other up,
and single MR pingpong is the standard pingpong measurement for a
single message register.

Some observations:

  - Sending a notification (i.e., with kernel entry and cheking for
    whom to schedule) takes less than 100 cycles (some more on SMP
    because we have to use bus-locking instructions and do some more
    tests).
  - Sending a notification has pretty much the same cost whether it be
    to a local or remote CPU.  The reason for the difference is that
    the remote thread is constantly polling for notifications, so the
    cache line will have to transition from modified to shared all the
    time.
  - Async pingpong (single CPU) is typically a little more costly than
    sending a 0-byte synchronous IPC.  This is to be expected because
    of some more extra work to be done.
  - Async pingpong between CPUs is about 1/3 the cost of synchronous
    IPC.  This is due to not having to wait for other CPU to synch up
    so that one can do a rendezvous between the threads.

I've also done some measurements on an AMD box.  These are quite a bit
faster (especially for inter-AS due to the TLB flush filter), but I
don't have the numbers at hand right now.

	eSk
Neal H. Walfield | 9 Aug 17:04

Support for Multiple Page Sizes

I'm curious how a memory manager should expose and handle multiple
page sizes.  In particular, I am curious what motivates large pages
and whether the resulting complexity of supporting them is thereby
justified.

Requirements
------------

The system must be reasonable reasonable efficiency, deal with
disruptive interference.  The first requirement precludes static
partitioning.  The second does not allow self-paging for reasons
explained in my last note.  It also requires that all resource be
accounted.  This includes, e.g., PTEs.  Thus, after the various
resource managers start, no further allocations are required on their
part: a principal always pays for all shared resources used on its
behalf.

Based on these requirements, I have tentatively concluded that memory
must be managed centrally and that this resource manager must do its
own paging.

Scenario
--------

Let's imagine that a task allocates a large page, the resource manager
admits this allocation and at some point decides to evict the page.
When the application next requires the page, there are a number of
possibilities:

 - There is sufficient, properly-aligned contiguous memory and
   resource manager is willing to allocate it to the principal

   In this case, the resource manager can just page the data back in
   and continue.

 - There is sufficient memory, however, it is not properly-aligned or
   not contiguous; the resource manager is willing to allocate all of
   it to the principal

   In this case, there are two strategies: defragmentation and
   splitting.

   Defragmentation:

   When the resource manager decides to defragment memory, who should
   pay for it?  The thread that caused the fault could in theory pay,
   however, the fragmentation may be the result of another task's
   usage pattern.  Is it right to penalize the faulting task?

   This approach also introduces two types of quality of service
   cross-talk.  Tasks that use large pages can induce the relocation
   of other tasks' memory.  During the copy, this suspends access to
   the page.  It also induces a page-fault on next access, which must
   be handled by the RM.  Second, tasks that cause physical memory
   fragmentation impact tasks that use large pages as the latter may
   have to defragment memory.

   Splitting:

   It is also possible to simply split the large page into smaller
   pages.  Again, the question is who should pay.  In this case, the
   required resource is primarily memory, e.g., PTEs.  In the case of
   breaking a 4MB page into 4KB pages, an extra 4kb is required.  This
   could be solved by simply including this in the fixed price of
   allocating a single large page.  This would still be cheaper than
   allocating an array of base-page sized pages.

 - And, the resource manager is willing to allocate the task less
   physical memory than is required to hold the entire page

   In this case, the only option to still permit the task to progress
   is splitting.  The same problems apply as above.

Guarded Page Tables
-------------------

To support multiple page sizes ala L4 while correctly accounting all
memory, seL4 has choosen to use a variation of guarded page tables
[1].  Looking up pages in a GPT is relatively straightforward.  I've
thought a bit about modifying a GPT and it appears that it can be
fairly complex and require some shuffling on insert if you don't know
beforehand how the address space will look.

Why Multiple Page Sizes
-----------------------

In Jochen's paper in which he describes guarded page tables he
discusses fitting objects into the minimum number of pages possible.
But is the common usage pattern to really access all bits of an
object?  Today we don't swap tasks; we page them.  The justification
is that only bits and pieces are used and the cost of the additional
faults, memory and TLBs is generally less than the cost of paging the
object as a whole.

An exception to this observation is a rendered image.  In this case,
the image is often either completely or wholly displayed on the screen
(think web browser); the entirety of the object is used at once.

Tentative Conclusion
--------------------

This analysis has led me to the tentative conclusion that for paged
data, the complexity of supporting multiple page-sizes may not be
justified.

Comments?

Thanks,
Neal

[1] Page Table Structures For Fine-Grain Virtual Memory by Jochen
    Liedtke (1994).

    http://citeseer.ist.psu.edu/liedtke94page.html

--

Gmane