olafBuddenhagen | 1 May 16:11
Picon

Drivers

Hi,

As you all know, our "official" proposal for hardware drivers is based
on a generic (portable) device driver framework for L4 based systems
(fabrica), and a low-level Hurd server (deva) to create the
Hurd-specific environment and interface for the framework.

However, I believe we gain nothing by working with a (somewhat) portable
framework. (If we really want portable drivers, we should probably go
the whole way and aim at UDI...) Portability only makes things more
complicated, rising the entry barrier and developement costs, without
any relevant benefit for us. IMHO, we should go for a native hurdish
framework instead, with full integration and all the benefits from using
Hurd-specific facilities. I'm sure this gives us many advantages: For
those creating the framework itself; for the driver developers; as well
as for the users/admins/system vendors using the framework in the end.

Thus I'm putting forward a different proposal, which is partially based
on the original one (many thanks to the authors for the valuable work,
without which I'd be totally lost), but introduces a completely
different method for integrating the drivers in the system.

I've already got some initial feedback (on IRC); so I hope the greatest
shortcomings are fixed now, and it's ready for exposal to stronger
scrutinity here. However, as I never worked on drivers or on Hurd/L4
before -- I am working on this proposal now because I believe there are
great advantages from doing it the way I'm suggesting, especially from a
user's point of view -- there are still many issues open, and I'm sure
there are also still errors and omissions in other parts as well. So if
you have any comments, suggentions, corrections, additions, whatever: Go
(Continue reading)

Picon
Favicon

Re: Drivers

> For one, with drivers at POSIX level, there is no need for a sophisticated,
> full-featured special driver framework: Library functions, loading and
> unloading drivers, communication amongst drivers and to the outside world,
> configuration management/parameter passing, resource management -- it's all
> already there, using the ordinary POSIX/Hurd interfaces. All we need are a few
> extensions to the standard POSIX/Hurd mechanisms, to allow for driver-specific
> stuff like IRQ handling and I/O port access.
> 

That depends on what you mean by 'POSIX level'. Drivers should be 'user
processes', but they don't need the full POSIX functionality. Actually
most of the posix functions are useless for driver programming. At most
you need basic C functions like memcpy etc. The standard posix
read/write interface is not suited for communication between drivers as
it's stream based. This means the driver has to figure out where the
message boundaries are which is a waste of CPU time and error prone. 

> Having no extra framework for drivers also means driver developement becomes
> much easier: No need to cope with some limited environment. No need to learn
> special APIs for a driver-specific library; driver registering/startup and
> shutdown; memory management; threading and locking; configuration management;
> communication to other drivers and the outside world; permission handling.
> Drivers are written just like any other translator, using all your normal
> progamming experience. All one has to know are a few fairly simple extensions
> to the ordinary POSIX/Hurd interfaces. The only specific APIs a driver handles,
> are access to lower level drivers via their filesystem interfaces, and
> exporting an own filesystem to the world.
> 

I never found the 'limited environment' a problem for doing drivers.
(Continue reading)

Daniel Godás | 6 May 19:17
Picon

problems with _L4_add_substring_address_to

string_item = _L4_string_item (13, string1);
comp_string_item = malloc (sizeof (_L4_string_item) + sizeof (_L4_word_t));
memcpy (comp_string_item, &string_item, sizeof (_L4_string_item));
_L4_add_substring_address_to (comp_string_item, string2);
tmp = _L4_substrings (comp_string_item);
check_nr ("[intern]", "_L4_add_substring_address_to", tmp, 2);

That check is failing (tmp=1). Can anybody tell me what im doing wrong?

cheers,
Daniel

_______________________________________________
L4-hurd mailing list
L4-hurd <at> gnu.org
http://lists.gnu.org/mailman/listinfo/l4-hurd
Matthieu Lemerre | 17 May 02:18
Picon
Favicon

Mechanism to request physical memory with certain properties


Hi,

Neal asked me to find a mechanism to request physical memory with
constraints on location, so here's my proposal.

I'm asking for your advises before writing code and precising some
things.  Maybe some of my explanations are also unclear.  Maybe
there's faster ways to achieve this. Please tell me!

Mechanism to request physical memory with certain properties
============================================================

Description:
------------

We need a mechanism to request physical memory with constraints on location.
Constraints are requirements on the bits on the address:

01X0X10X11X00X means that the bytes you require on the allocation must have 1
and 0 where specified, and can be either 1 or 0 when there is a 'X'. To do this,
we will pass a pair of two words like this:

*bits in the first word will indicate bits which have to be on in
 the address of the physical memory eventually choosen

*bits on in the second word will indicate bits which have to be off
 in the address of the physical memory eventually choosen

So with the example above, 01X0X10X11X00X is represented by
(01000100110000, 10010010000110).

The allocation unit is the frame, wich is 2^p bytes (p=12 on IA32)

Possible solution:
------------------

Here's my proposal to solve this problem.

Memory is represented by a table of bits, in which a bit is set to 1
if the corresponding frame is free, and 0 if not (or not existing).

The table is the following:

    	|2^(N-z)-1 | 2^(N-z)-2 | ... | 1 | 0 |
--------|------------------------------------|
0   	|    0           1             0   0 |
1	|    1           1             0   1 | 
2	|    0           1 (A)         0   1 |
...    	|                                    |
(2^z)-1	|    1           0             1   1 |

Frames are numbered from 0 to (2^N)-1. The number of a frame in the table is:
col_number * 2^z + row_number. 

For instance, frame (A) is free (its bit is set to 1), and its frame number is
(2^(N-z)-2) * 2^z + 2.

So, any frame number can be decomposed like this: 01001 101001010010010001001001
						  ^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^
				 	    (N-z) bits		z bits	      

Similarly, we decompose the constraint word like this:

01XX1 10X10XX11X00X1 XXXXXXXXXXXX
^^^^^ ^^^^^^^^^^^^^^ ^^^^^^^^^^^^
(N-z)bits  z bits    p undefined bits

(since the allocation unit is the frame, you can't impose constraints on the 12
last bits)

The general idea is:
1/To generate a mask of possible cols (made with the N-z bits of the constraint word)
2/To iterate over the possible rows (with respect of the z bits of the constraint words)
3/In each iteration, we apply a logical AND on the mask and the row.  This give
which frames were allocated. Then we clear the corresponding bytes.

It's simpler for the algorithm if 2^(N-z) is the size of a word ( N-z=5 on IA32).

Iteration over the possible rows:
---------------------------------

Here's the algorithm to iterate over the possible rows: (Pseudo-C code)

void
iterate_rows (word_t constraint1, word_t constraint0, function_t function)
{
  iterate_rows_rec (constraint1 >> p, constraint0 >> p,
  		    2^z, 0, function);
}

void
iterate_rows_rec (word_t constraint1, word_t constraint0,
		  word_t mask, word_t beginning_of_word, function)
{
  if (mask == 0)
    function(beginning_of_word);
  else
    {
      if !(constraint1 & mask)
        iterate_rows_rec (constraint1, constraint0, mask>>1,
			  beginning_of_word, function);
	
      if !(constraint0 & mask)
        iterate_rows_rec (constraint1, constraint0, mask>>1,
			  mask | beginning_of_word, function);
    }
}

Note: the problem with this algorithm is that allocation is not distributed,
so future allocations may take more time.  But there exist workarounds, like
the following modification to the algorithm:

void
iterate_rows_rec (word_t constraint1, word_t constraint0,
		  word_t mask, word_t beginning_of_word,
		  word_t pseudo_random_word, function)
{
  if (mask == 0)
    function(beginning_of_word);
  else
    {
      if (pseudo_random_word & mask)
        {
      	   if !(constraint1 & mask)
             iterate_rows_rec (constraint1, constraint0, mask>>1,
      	     		       beginning_of_word, function);

           if !(constraint0 & mask)
      	     iterate_rows_rec (constraint1, constraint0, mask>>1,
      			       mask | beginning_of_word, function);
        }
      else
        {
           if !(constraint0 & mask)
      	     iterate_rows_rec (constraint1, constraint0, mask>>1,
      			       mask | beginning_of_word, function);
			       
           if !(constraint1 & mask)
             iterate_rows_rec (constraint1, constraint0, mask>>1,
      	     		       beginning_of_word, function);
        }
    }
}

Pseudo-random word could be set so that we can indicate that some memory
is more desirable than other. (Like memory for DMA transfers should be allocated
less easily).

Generation of the column mask
-----------------------------

We have to generate a column mask which will make us able to check one row at one time.
If there is no constraints on the upper N-z cols, we can check 2^(N-z) frame at one time!

If N-z = 5: 

column_mask_maker_0 = {0b01010101010101010101010101010101,
		       0b00110011001100110011001100110011,
 		       0b00001111000011110000111100001111,
		       0b00000000111111110000000011111111,
		       0b00000000000000001111111111111111};
		       
column_mask_maker_1 = {0b10101010101010101010101010101010,
 		       0b11001100110011001100110011001100,
		       0b11110000111100001111000011110000,		       
		       0b11111111000000001111111100000000,
		       0b11111111111111110000000000000000};

column_mask = ~0;

/* Only the first N-z bits are interresting.  */
constraint1 = constraint1>>(p+z); 
constraint0 = constraint0>>(p+z); 

for(i=4; i>=0; i--)
  {
    test_mask = 1 << i;
    if(constraint1 & test_mask)
      column_mask &= column_mask_maker_1[i];
    else if(constraint0 & test_mask)
      column_mask &= column_mask_maker_0[i];
  }

Test of the row and allocation
------------------------------

Returns the number of allocated frames in the row.

int
allocate_frames(word_t row_number, word_t col_mask, allocation_t result)
{
   word_t allocated_frames = memory[row_number] & col_mask;

   /* Deallocates the allocated frames*/
   memory[row_number] &= ~col_mask;

   result.row_number = row_number;
   result.allocated_frames = allocated_frames;

   return NUMBER_OF_BITS (allocated_frames);
}

An allocation is a list of (row_numbers X allocated_frames):

struct allocation
{
  word_t row_number;
  word_t allocated_frames;
  struct allocation_unit *next;
};

Deallocation of frames
----------------------

Thus, deallocation is trivial: We just have to re-set the right bits
at the right place.

void
deallocate (allocation_t *allocation)
{
  allocation_t unit = allocation;
  do
   {
     word_t row_number = unit.row_number;
     memory[row_number] |= unit.allocated_frames;
    } while(unit = unit.next);
}

Conclusion
==========

Allocating memory should not be too long using this algorithm, provided that memory
is distributed over the different rows.  Deallocation is very fast.

One possible variant would be not to have an array of bits, but to
have an array (of size 2^z) of lists. But that would consume more
memory and would be slowest, I think.

Thanks,

Matthieu
olafBuddenhagen | 18 May 14:26
Picon

Re: Mechanism to request physical memory with certain properties

Hi,

> 01X0X10X11X00X means that the bytes you require on the allocation must
> have 1 and 0 where specified, and can be either 1 or 0 when there is a
> 'X'. To do this, we will pass a pair of two words like this:
> 
> *bits in the first word will indicate bits which have to be on in the
> address of the physical memory eventually choosen
> 
> *bits on in the second word will indicate bits which have to be off in
> the address of the physical memory eventually choosen

A more intuitive interface would be one word with the desired bits, and
one word with a mask which bits are actually relevant. This is more
robust too, as you never have conflicting information (some bit being
requested to be '0' and '1' at the same time) -- either a bit is
specified directly or it is ignored.

Also, this seems to be the more usual approach -- I believe I've seen it
in similar contexts already.

-Olaf-
Matthieu Lemerre | 21 May 01:13
Picon
Favicon

Mechanism to request physical memory with certain properties, new algorithm


Hi,

My previous previously proposed algorithm had the problem that I
didn't took into consideration superpages, which are groups of
power-of-two pages aligned on a power-of-two boundary.

My new proposal looks like the buddy system in the way allocation is
done; you have zones of size 2^n that you split into two equal parts
when needed.

The difference is that instead of keeping lists of zones of each size,
zones are arranged into a tree, which enable to search for zone with
constraints more efficiently.

The node of a tree is the following struct:

struct tree_node
{
  struct tree_node *parent;
  struct tree_node *left;
  struct tree_node *right;
  word_t present_sizes;
  };

  Each node represent a region in memory. Root node represents the whole memory
  (considering that memory is of size 2^M, if that's not the case, you can act as
  if memory was of size 2^M, and some of the memory was already allocated).

  When a node has sons, it means that it has ben cut in two: the left son is the
  lower region, the right son is the upper region.

  A node has no sons if it has been allocated as a whole, or if it's free. 

  present_size is word so that (present_size & (1 << i)) if the subtree contains
  at least one zone of size 2^i.

  (Presence of a zone of size 2^i doesn't imply presence of a zone of size 2^i-1)

  The goal of this present_sizes word is to find as quickly as possible
  if an allocation of a certain size can't be made, and thus to explore as little
  of the tree as  possible.

  Normally we have the relation:
  tree.present_sizes == tree.left.present_sizes | tree.right.present_sizes

  Thus, when modifying this field, you have to go up through the list of parents
  to modify the parent's present_sizes list consequently.

   Note: all of the following functions are tail-recursive.

   Allocation :
   ============

What allocation do is to explore the tree (choosing the branches according
to the constraints on the address).  If there is no possible allocation for
the requested zone size, then it search a larger zone that it splits in halfs,
and use for its allocation. It's quite similar in spirit to the buddy system.

struct allocation
{
  struct tree_node *node;
  word_t position;
  word_t mask_size;
};

   
   /* Find a node of the wanted size.
      MASK_SIZE is (1 << i) where 2^i is the number of wanted frames. */
   bool_t 
   find_node(word_t constraint1, word_t constraint0, word_t mask,
	     word_t mask_size, tree_node_t node, allocation_t result,
	     word_t current_pos)
  {
    if (node.present_sizes & mask_size)
      {
	/* Did we find a zone with the good size.  */
	if (node.present_sizes == mask_size)
	  return node;
	
	if !(constraint1 & mask)
	  result.node = find_node (constraint1, constraint0, mask>>1,
				   mask_size, node.left, result,
				   current_pos);
	if (result.node)
	  {
	    result.position = current_pos;
	    result.size = mask_size;
	    return true;
	  }
	
	if !(constraint0 & mask)
	  result.node = find_node (constraint1, constraint0, mask>>1,
			      mask_size, node.right, result,
			      current_pos | mask);
	if (result.node)
	  {
	    result.position = current_pos;
	    result.size = mask_size;
	    return true;
	  }
      }

    return false;
  }

   
   /* Update the present_sizes information in each node.  */
   void
   update_nodes (tree_node_t node)
  {
    new_sizes = node.left.present_sizes | node.right.present_sizes;
    if (node.present_sizes == new_sizes)
      return;

    node.present_sizes = new_sizes;
    if (node.parent)
      update_nodes (parent);
  }

   l4_word_t
   zalloc (word_t constraint1, word_t constraint0, size_t size)
  {

    assert (constraint1 & constraint0 == 0)

    struct allocation result;
    word_t initial_mask_size = (1 << size);
    word_t mask_size = initial_mask_size;
    word_t mask = (1 << 31); /* (1 << 63 on 64 bits processors) */

    while (!find_node (constraint1, constraint0, mask, mask_size,
		       root_node, result, 0))
      {
	/* Try with a larger zone.  */
	mask_size = mask_size << 1;
	
	/* Allocation was impossible. */
	if (mask_size == mask)
	  return 0;
	
      }

    tree_node_t current_node = allocation.node;

    /* Allocate the node according to the constraints.  */
    while (mask_size != initial_mask_size)
      {
	tree_node_t node1, node2;

	/* Allocate 2 struct nodes.  */
	allocate_nodes (&node1, &node2);
	
	mask_size = mask_size >> 1;
	
	node1->present_sizes = mask_size;
	node1->parent = current_node;
	node1->left = NULL;
	node1->right = NULL;

	node2->present_sizes = mask_size;
	node2->parent = current_node;
	node2->left = NULL;
	node2->right = NULL;

	current_node->left = node1;
	current_node->right = node2;

	/* New allocations are created.  */
	update_nodes (current_node);

	if (!(mask_size & constraint1))
	  current_node = left;
	else
	  {
	    current_node = right;
	    allocation.position |= mask_size;
	  }
      }
	
	
    /* Here, current node is a correct node of the good size.  */
    current_node.present_sizes = 0;
    update_nodes (current_node);

    return allocation.position;
  }

  /* We could instead store current_pos in each node, and allocation
     would be slightly faster, but would consume more memory */

Deallocation (general idea) 
============================	

   void
   zfree (word_t address, word_t size)
  {
    /* Find the node by following the bits in the address.  */

    
    /* Look if we can merge the node with its buddy.  Do this recursively.  */

    /* Call update_node on the last un-mergeable node.  */
  }

   This show the general idea.  Optimisations are possible;
   (for instance, I just saw that it would be better to change tree_node
   to store the address of the buddy node, and of the first child.
   This would be better for zfree.)

We could indicate that low memory should be allocated less easily for instance
by exploring the "right" buddy before the left one. We could also have seperate
trees for low memory and normal memory (like Linux does, I believe)

Comparaison with the buddy allocator
====================================

We can do a quick comparaison with the buddy allocator, so in the case where
there is no constraints.

In that case, this zfree takes as many operation as the buddy allocator: it's
a very similar algorithm.

zalloc is different: assuming the size (2^t) you want is present
(which you can know by just looking at the present_sizes word of the root node),
in the worst case you have to look to two nodes for each level to find the good node.

So this take 2*(N-t) operations to find the node.
Then, also for the worst case, you have (N-t) operations to update the node.

So you have 3*(N-t) "operations" maximum to do an allocation, so if N=20 and t=0
it's 60 operations. (In the worst case where you have only 1 single frame
located at position 0, which is a theorical case.)

In average, if you take t=5, you would have 1.5*(N-t)+ 3 ~= 25 operations to do
an allocation with no constraints. So I thing using this algorithm instead of the
buddy allocator would not be too slow (or it would not be visible).

Thanks,
Matthieu
Maurizio Boriani | 21 May 13:58
Picon

missing file


Hi,
        missing libl4/powerpc/l4/bits/arch.h in hurd-l4 cvs...

last './confgiure' output lines:

config.status: linking ./libl4/powerpc/l4/bits/misc.h to include/l4/bits/misc.h
config.status: linking ./libl4/powerpc/l4/bits/arch.h to include/l4/bits/arch.h
config.status: error: ./libl4/powerpc/l4/bits/arch.h: file not found

thanks,
baux

--

-- 
Maurizio Boriani 
GPG key: 0xCC0FBF8F
 => E429 A37C 5259 763C 9DEE  FC8B 5D61 C796 CC0F BF8F <=
Matthieu Lemerre | 21 May 15:34
Picon
Favicon

Re: missing file

Maurizio Boriani <baux <at> gnu.org> writes:

> Hi,
>         missing libl4/powerpc/l4/bits/arch.h in hurd-l4 cvs...
>
> last './confgiure' output lines:
>
> config.status: linking ./libl4/powerpc/l4/bits/misc.h to include/l4/bits/misc.h
> config.status: linking ./libl4/powerpc/l4/bits/arch.h to include/l4/bits/arch.h
> config.status: error: ./libl4/powerpc/l4/bits/arch.h: file not found
>
> thanks,
> baux

Hi,

This patch should correct the problem. 

2005-05-21  Matthieu Lemerre <racin <at> free.fr>

	* powerpc/Makefile.am (nobase_include_HEADERS): Add l4/bits/arch.h,
	l4/bits/compat/arch.h, l4/bits/compat.h. Replaced one
	l4/bits/compat/space.h with l4/bits/gnu/space.h
	* powerpc/l4/bits/arch.h, powerpc/l4/bits/compat/arch.h,
	powerpc/l4/bits/gnu/arch.h: New file.

Apply this patch:

Attachment (patch): application/octet-stream, 907 bytes

This is powerpc/l4/bits/arch.h

Attachment (arch.h): text/x-chdr, 1304 bytes

This is powerpc/l4/bits/compat/arch.h

Attachment (arch.h): text/x-chdr, 1087 bytes

This is powerpc/l4/bits/gnu/arch.h

Attachment (arch.h): text/x-chdr, 1085 bytes

Thanks,

Matthieu
_______________________________________________
L4-hurd mailing list
L4-hurd <at> gnu.org
http://lists.gnu.org/mailman/listinfo/l4-hurd
Martin Schaffner | 22 May 15:36
Picon

Re: Mechanism to request physical memory with certain properties, new algorithm

À Sat, 21 May 2005 01:13:03 +0200, From: Matthieu Lemerre 
<racin <at> free.fr> a écrit:

> struct tree_node
> {
>   struct tree_node *parent;
>   struct tree_node *left;
>   struct tree_node *right;
>   word_t present_sizes;
>   };
>
>   Each node represent a region in memory. Root node represents the 
> whole memory
>   (considering that memory is of size 2^M, if that's not the case, you 
> can act as
>   if memory was of size 2^M, and some of the memory was already 
> allocated).
>
>   When a node has sons, it means that it has ben cut in two: the left 
> son is the
>   lower region, the right son is the upper region.

I suggest calling "left" "lower" and "right" "upper" (or 
"upper_region", or "upper_half"), so that people do not need to keep 
this mapping in their heads. This makes it easier especially for people 
with right-to-left scripts.

Thank you,
Martin
Picon

booting l4-hurd leading to hang after starting ruth...

Hi,

After following instructions as they are in README of the cvs source
of the hurd-l4 module, including the same boot modules in the same
sequence under the grub menu:

.../laden -D
.../ia32-kernel
.../libexec/l4/sigma0
.../wortel -D
.../physmem
.../task
.../deva
.../task
.../ruth

I get no prompt and no message (both when adding -D or not to ruth):

...
physmem:frame_memory_alloc:  allocated physical memory: 30f000+1000
ruth:main: ruth 0.0
ruth:main: Hello, here is Ruth, your friendly root server!
physmem:frame_memory_alloc: allocated physical memory: 1e0000+10000
physmem:frame_memory_alloc: allocated physical memory: 41c000+1000

Then it just hangs there...

Well, only change is that on "make menuconfig" I just selected pentium
2/3, not the default pentium 4, since I'm using laptop dell inspiron
600m which is pentium mobile (neither 3, neither 4)....

Is this supposed to happen?  I don't see any terminal started neither
a prompt or something I'm familiar with...  It looks like the process
just hanged...

Thx,

--

-- 
Javier-Elias Vasquez-Vivas

Gmane