Masao Uebayashi | 26 Nov 08:41 2014

Critical section

The problem with the kpreempt_*() API is that its meaning is overloaded by
kernel internals (scheduler, sync primitives, ...).  This change separates the
internal use (the scheduler disables preemption) from other uses (kernel
subsystem code executing a critical section).  It also detects sleeping from
within a critical section in mi_switch().

The only problem I've seen is cprng_fast.c calling percpu_getref() inside a
KASSERT(); it's a kind of re-entrance.
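A minimal userland sketch of the separation described above, assuming hypothetical names (`struct lwp_state`, `crit_enter()`, `crit_exit()`, `may_sleep()`) that do not come from the patch. The scheduler's preemption counter and the subsystem critical-section depth are kept apart, and the sleep check only looks at the latter. Counting depth also means a nested entry (for example, if a percpu_getref() inside a KASSERT() entered such a section while one is already held) just raises the depth instead of being treated as an error.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-LWP state: the scheduler's own preemption counter is
 * kept apart from the critical-section depth used by subsystem code.
 */
struct lwp_state {
	int preempt_depth;	/* scheduler and sync primitives only */
	int crit_depth;		/* subsystem critical sections */
};

/* Hypothetical critical-section API for subsystem code; nesting is counted. */
static void
crit_enter(struct lwp_state *l)
{
	l->crit_depth++;
}

static void
crit_exit(struct lwp_state *l)
{
	assert(l->crit_depth > 0);
	l->crit_depth--;
}

/*
 * Analogue of the check added to mi_switch(): sleeping inside a subsystem
 * critical section is a bug.  preempt_depth is deliberately ignored,
 * because the scheduler itself disables preemption around a switch.
 */
static bool
may_sleep(const struct lwp_state *l)
{
	return l->crit_depth == 0;
}
```

Keeping two counters is the point of the design: a single shared counter cannot tell "the scheduler is switching" apart from "subsystem code must not sleep here".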

Index: sys/crypto/cprng_fast/cprng_fast.c
RCS file: /cvsroot/src/sys/crypto/cprng_fast/cprng_fast.c,v
retrieving revision 1.11
diff -p -u -r1.11 cprng_fast.c
--- sys/crypto/cprng_fast/cprng_fast.c	11 Aug 2014 22:36:49 -0000	1.11
+++ sys/crypto/cprng_fast/cprng_fast.c	26 Nov 2014 07:35:51 -0000
@@ -258,8 +258,10 @@ static inline void
 cprng_fast_put(struct cprng_fast *cprng, int s)

+#if 0
 	KASSERT((cprng == percpu_getref(cprng_fast_percpu)) &&
 	    (percpu_putref(cprng_fast_percpu), true));
Index: sys/kern/kern_synch.c
RCS file: /cvsroot/src/sys/kern/kern_synch.c,v
retrieving revision 1.308

Patrick Welche | 24 Nov 17:29 2014

drm trivial build fix

I found I needed this in the "no options SYSCTL_INCLUDE_DESCR" case to
avoid a -Werror build failure.
OK to commit?
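The diff body is cut off in this excerpt, so the exact change is not visible. One common cause of this kind of -Werror failure is a static helper that is only referenced when an option is enabled. A minimal sketch of guarding such a helper, with hypothetical names (`describe()`, `get_description()`) that are not from drm_sysctl.c:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical: describe() is only referenced when SYSCTL_INCLUDE_DESCR
 * is defined.  Without the guard, a kernel built with
 * "no options SYSCTL_INCLUDE_DESCR" would define an unused static
 * function, and -Wunused-function together with -Werror fails the build.
 */
#ifdef SYSCTL_INCLUDE_DESCR
static const char *
describe(int param)
{
	return param ? "enabled" : "disabled";
}
#endif

static const char *
get_description(int param)
{
#ifdef SYSCTL_INCLUDE_DESCR
	return describe(param);
#else
	(void)param;	/* silence -Wunused-parameter */
	return NULL;
#endif
}
```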


Index: drm_sysctl.c
RCS file: /cvsroot/src/sys/external/bsd/drm2/drm/drm_sysctl.c,v
retrieving revision 1.3
diff -u -r1.3 drm_sysctl.c
--- drm_sysctl.c	12 Nov 2014 04:53:13 -0000	1.3
+++ drm_sysctl.c	24 Nov 2014 16:18:57 -0000
@@ -40,6 +40,7 @@

 #include <drm/drm_sysctl.h>

 static const char *
 drm_sysctl_get_description(const struct linux_module_param_info *p,
     const struct drm_sysctl_def *def)
@@ -53,6 +54,7 @@
 	return NULL;


Nicolas Joly | 23 Nov 17:37 2014

posix_madvise(2) should fail with ENOMEM for invalid address ranges


According to the Open Group online documentation for posix_madvise[1], it
should fail with ENOMEM for invalid address ranges:

    Addresses in the range starting at addr and continuing for len
    bytes are partly or completely outside the range allowed for the
    address space of the calling process.

But we currently fail with EINVAL (the value returned from range_check()).

OK to apply the attached patch to fix posix_madvise/madvise?
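A minimal sketch of the behaviour the quoted text asks for, assuming a hypothetical helper `madvise_range_error()` and hypothetical bounds `vm_min`/`vm_max` standing in for the range the process's address space allows. It is not the code in range_check() or in the attached patch. The end check is written as a subtraction so that a huge `len` cannot make `addr + len` wrap around and look valid.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical range check: [vm_min, vm_max) is the allowed address
 * range.  Returns 0 if [addr, addr + len) lies inside it, else ENOMEM.
 */
static int
madvise_range_error(uintptr_t addr, size_t len, uintptr_t vm_min,
    uintptr_t vm_max)
{
	/* Start lies outside the allowed range. */
	if (addr < vm_min || addr > vm_max)
		return ENOMEM;
	/* End lies past vm_max; no addr + len, so nothing can wrap. */
	if (len > vm_max - addr)
		return ENOMEM;
	return 0;
}
```

An invalid advice value would still be reported separately as EINVAL.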




Nicolas Joly

Biology IT Center
Institut Pasteur, Paris.
Index: sys/uvm/uvm_mmap.c
RCS file: /cvsroot/src/sys/uvm/uvm_mmap.c,v

Emmanuel Dreyfus | 23 Nov 17:47 2014

Should sys_unmount() FOLLOW?


I ran into this strange bug with the glusterfs NFS server. It is possible
because that server allows the root vnode of the mounted filesystem to be
VLNK (NetBSD's native mountd prevents this situation, so the bug does not
happen with our native NFS server):

bacasel# mount on /mnt/nfs/0 type nfs

bacasel# ls -l /mnt/nfs
lrwxrwxrwx  1 root  wheel    4 Nov 23 10:03 0 -> dir1

That is possible because on the exported filesystem, symlink1 is a
symlink to dir1.

It looks funny and harmless, at least until one tries to unmount while the
NFS server is down. I do this the most reliable way: umount -f -R

umount(8) will quickly call unmount(2), passing the /mnt/nfs/0 path
without trying anything fancy. But at the beginning of sys_unmount() in
the kernel, we can find:

        if ((error = namei(&nd)) != 0) {
                return error;
        }

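The FOLLOW question can be illustrated with a small userland analogue (not kernel code): lstat() does not follow a trailing symlink, like a lookup without FOLLOW, while stat() does, like a lookup with FOLLOW. Only the following variant has to resolve the link target, which in the report above is the symlink served by an NFS server that is down. The helper `follow_demo()` is hypothetical, and /tmp is assumed to be writable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Builds "0 -> dir1" in a scratch directory, then looks up "0" both
 * without following the link (lstat) and with following it (stat).
 */
static int
follow_demo(int *nofollow_is_link, int *follow_is_dir)
{
	char base[] = "/tmp/followXXXXXX";
	char dir[64], lnk[64];
	struct stat st;
	int rc = -1;

	if (mkdtemp(base) == NULL)
		return -1;
	snprintf(dir, sizeof(dir), "%s/dir1", base);
	snprintf(lnk, sizeof(lnk), "%s/0", base);

	if (mkdir(dir, 0755) != 0)
		goto out_base;
	if (symlink("dir1", lnk) != 0)	/* relative target, as in "0 -> dir1" */
		goto out_dir;

	if (lstat(lnk, &st) != 0)	/* NOFOLLOW-like: sees the link itself */
		goto out_link;
	*nofollow_is_link = S_ISLNK(st.st_mode);
	if (stat(lnk, &st) != 0)	/* FOLLOW-like: sees the target directory */
		goto out_link;
	*follow_is_dir = S_ISDIR(st.st_mode);
	rc = 0;

out_link:
	unlink(lnk);
out_dir:
	rmdir(dir);
out_base:
	rmdir(base);
	return rc;
}
```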

Michael van Elst | 22 Nov 14:23 2014

Boot wedges

So far, the NetBSD kernel has supported several methods to determine the
root disk.

1. A literal device denoted by major/minor number in the
   kernel configuration.

2. A literal device name in the kernel configuration. The kernel
   accepts either a driver+unit string (e.g. "sd0") or the
   string "wedge:name-of-wedge" to look up a wedge by name (*).

3. Interactively query the device name on the console. The kernel
   accepts the same strings as in 2.

4. Use the boot device and partition number: MD code interprets
   information from the boot loader to determine a device (driver/unit)
   and partition number.

5. Use the boot device and an offset/size pair: MD code interprets
   information from the boot loader to determine a disk device
   and partition, or a wedge that matches the offset/size.

I have added another option. The bootloader may pass a string
that is interpreted as in 2. or 3. Like the other options, the
data is passed in a global variable from MD to MI code.

Previously defined variables are:

boothowto           - boot flags
booted_device       - the boot disk and unit
booted_partition    - the boot partition 
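The string passed by the boot loader is interpreted as in methods 2 and 3, i.e. either a driver+unit string such as "sd0" or "wedge:name-of-wedge". A minimal sketch of that interpretation, assuming a hypothetical `parse_root_spec()` and `struct root_spec`; this is not the kernel's code, and it simplifies by assuming driver names consist of letters only.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

enum root_kind { ROOT_DEVICE, ROOT_WEDGE };

struct root_spec {
	enum root_kind kind;
	char driver[16];	/* e.g. "sd" */
	int unit;		/* e.g. 0 */
	const char *wedge;	/* wedge name; points into the input string */
};

/* Returns 0 on success, -1 if the string is not recognized. */
static int
parse_root_spec(const char *s, struct root_spec *rs)
{
	static const char prefix[] = "wedge:";
	const size_t plen = sizeof(prefix) - 1;
	size_t n;
	char *end;
	long unit;

	memset(rs, 0, sizeof(*rs));
	if (strncmp(s, prefix, plen) == 0) {
		if (s[plen] == '\0')
			return -1;	/* empty wedge name */
		rs->kind = ROOT_WEDGE;
		rs->wedge = s + plen;
		return 0;
	}

	/* Driver name: leading letters, then a decimal unit number. */
	for (n = 0; isalpha((unsigned char)s[n]); n++)
		continue;
	if (n == 0 || n >= sizeof(rs->driver) || !isdigit((unsigned char)s[n]))
		return -1;
	unit = strtol(s + n, &end, 10);
	if (*end != '\0' || unit > INT_MAX)
		return -1;

	memcpy(rs->driver, s, n);
	rs->driver[n] = '\0';
	rs->unit = (int)unit;
	rs->kind = ROOT_DEVICE;
	return 0;
}
```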

Taylor R Campbell | 21 Nov 06:06 2014

pserialized queue(9)

The attached patch adds _PSZ variants of all the insert, remove, and
foreach operations in <sys/queue.h>, which issue the necessary store
barriers (for insert) and data-dependent load barriers (for foreach).

I made *_REMOVE*_PSZ an alias for *_REMOVE* so that if you're using
pserialize with a queue, all of the operations you perform on it after
initialization are marked with _PSZ.  That way it should be easy to
eyeball code for obvious mistakes.
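The barrier placement described above can be sketched with C11 <stdatomic.h> as a stand-in for the kernel's barrier primitives. This is not the attached patch, which works on the <sys/queue.h> macros; the list type and function names here are hypothetical. Insert publishes a fully initialized node with a release store, remove adds no barrier (matching *_REMOVE*_PSZ being an alias), and foreach uses consume ordering, C11's name for a data-dependent load barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked list whose links readers load concurrently. */
struct node {
	int value;
	_Atomic(struct node *) next;
};

struct list {
	_Atomic(struct node *) first;
};

/* Insert (writer lock held): fill in the node, then publish with release. */
static void
list_insert_head_psz(struct list *l, struct node *n)
{
	struct node *first = atomic_load_explicit(&l->first, memory_order_relaxed);

	atomic_store_explicit(&n->next, first, memory_order_relaxed);
	atomic_store_explicit(&l->first, n, memory_order_release);
}

/*
 * Remove (writer lock held): no extra barrier.  n->next is left intact so
 * a reader already at n can continue; n may only be freed after
 * pserialize_perform().
 */
static void
list_remove_psz(struct list *l, struct node *n)
{
	_Atomic(struct node *) *pp = &l->first;
	struct node *cur;

	while ((cur = atomic_load_explicit(pp, memory_order_relaxed)) != NULL) {
		if (cur == n) {
			atomic_store_explicit(pp,
			    atomic_load_explicit(&n->next, memory_order_relaxed),
			    memory_order_relaxed);
			return;
		}
		pp = &cur->next;
	}
}

/* Foreach (inside a read section); compilers usually treat consume as acquire. */
static int
list_sum_psz(struct list *l)
{
	int sum = 0;

	for (struct node *n = atomic_load_explicit(&l->first, memory_order_consume);
	    n != NULL;
	    n = atomic_load_explicit(&n->next, memory_order_consume))
		sum += n->value;
	return sum;
}
```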

There is no, e.g., LIST_NEXT_PSZ because there's no good way to
express that without expression blocks.  Most readers need to iterate
over a queue, anyway, rather than just fetch one entry's next pointer.
I omitted TAILQ_FOREACH_REVERSE_PSZ because, as Dennis observed, TAILQ
reverse traversal violates strict aliasing, so we need a new kind of
queue for that.  (It's also not clear it would be useful.)

I have only compile-tested so far -- I'm asking for review on the
concept before I spend a lot of my Copious Spare Time^TM engineering
and running stress tests.  Thoughts?
Index: share/man/man3/queue.3
RCS file: /cvsroot/src/share/man/man3/queue.3,v
retrieving revision 1.49
diff -p -u -r1.49 queue.3
--- share/man/man3/queue.3	18 May 2014 15:45:08 -0000	1.49
+++ share/man/man3/queue.3	21 Nov 2014 05:04:36 -0000
@@ -53,7 +53,7 @@

Masao Uebayashi | 19 Nov 06:38 2014

Re: struct ifnet and ifaddr handling [was: Re: Making global variables of if.c MPSAFE]

On Wed, Nov 19, 2014 at 11:24 AM, Ryota Ozaki <ozaki-r <at>> wrote:
> Weird :-/

I don't think so.  For fast paths to access data really fast, slow
paths have to take a detour (pre-allocation, etc.).  This is a fair
trade-off.  If you could achieve such a goal (e.g. lockless access to a
list) without restructuring, that would be really weird. :)

In this case, another approach might be to allocate a callout per
interface.  That should be considered and compared with the alternatives.

Masao Uebayashi | 18 Nov 15:13 2014

pserialize(9) vs. TAILQ

I thought I kind of understood how pserialize(9) works, but the manual
confused me:

              * Perform the updates (e.g. remove data items from a list).
              * At this point it is safe to destroy old data items.

My understanding is that, writers can update data structures between
mutex_enter() and pserialize_perform(), but that operation must be
done atomically, because at that point readers are still reading the
protected data.

In pserialize_perform(), context switches are made on all CPUs.  After
pserialize_perform(), all readers on all CPUs see the updated data, so
the old data item can be safely destroyed.

In the TAILQ case, where readers iterate a list by TAILQ_FOREACH(),
TAILQ_REMOVE() is safely used as the update operation, because:

- Readers only see tqe_next in TAILQ_FOREACH(), and
- Pointer assignment (done in TAILQ_REMOVE()) is atomic.
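A small single-threaded userland check of the first point, using the <sys/queue.h> TAILQ macros (available on the BSDs and glibc): after TAILQ_REMOVE(), the removed element's tqe_next still points at its old successor, so a reader that had already reached it can finish its walk. This assumes the queue debugging options that poison removed entries are disabled, and it says nothing about memory ordering. `struct item` and `sum_from()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int value;
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(itemq, item);

/*
 * Reader-style walk: follows only tqe_next (via TAILQ_NEXT), the same
 * field TAILQ_FOREACH reads; tqe_prev is never touched.
 */
static int
sum_from(struct item *start)
{
	int sum = 0;

	for (struct item *it = start; it != NULL; it = TAILQ_NEXT(it, link))
		sum += it->value;
	return sum;
}
```

In the test, `reader` plays the role of a reader that reached `b` just before the writer removed it: it still sees `b` and `c`, while new walks from the head see only `a` and `c`.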

If this is correct, pserialize(9) should be updated to be clearer;

Kamil Rytarowski | 16 Nov 23:03 2014

Tru64 AdvFS porting to NetBSD - 3. status 2014-11-16


This is the third status report [1] on porting AdvFS to NetBSD.

Thanks especially to the rump team for their help! It's a small world: I keep meeting new people who tried in the past to get
pieces of Tru64 or Alpha (like the SRM code) opened up. Lately I have been reading more than porting, as I have had the
occasion to learn new things about virtual memory subsystems.

1. What is done
- Basic locking is done [2] (I would say that 80% of the code has been adapted for locking, and this took 20% of the time
and effort)
2. What is in progress
- Studying the NetBSD-specific bits of VFS, UVM, virtual memory, the pager, UBC, etc., to be prepared for porting the
virtual-memory logic
- Analyzing Tru64 / AdvFS usage of VM (available documentation is helping here)
- Cleaning up the code to stop the flood of annoying compiler errors; it is already much cleaner compared to the
initial stage
3. Issues
- This time, mostly a shortage of time and knowledge (regarding VM and VFS internals)
4. Next steps
- Migrate the VM and VFS code to NetBSD's APIs
- Squash as many trivial compiler warnings as possible, to stop the flood of errors (GCC 4.8.x and clang
3.5.x do the job well)
5. Pushed to NetBSD
- sys/time.h patches waiting for review/comments (at problem-reports)
- Proposed small patches regarding improvement of documentation (NVNODE, uvn_findpages())

Help and encouragement are appreciated.

Code is here:

Subhashish Pradhan | 16 Nov 01:56 2014

NetBSD kernel project


I am Subhashish Pradhan, a third-year undergraduate in Computer Science
at IIIT Bhubaneswar[1].

I am interested in the project "Make /boot.cfg handling machine
independent" listed in the NetBSD wiki's projects page[2].

I would like to take up this or any kernel/networking project for my
undergraduate project/thesis.

Since I am a beginner, I'd like some initial guidance on how to attain a
level of familiarity with NetBSD. Are any books or papers required, or
will the project's (read: organization's) documentation be enough?

Also, do I need to contact the listed mentor directly? I guess that is
for GSoC students.

Subhashish Pradhan

1 -
2 -

Emmanuel Dreyfus | 15 Nov 12:48 2014

set a watchpoint programmatically


I am tracking a memory corruption problem that shows up in a field of a
struct in a linked list. I would like to set a watchpoint on the field,
but structures are added to and removed from the list, and I cannot
reliably reproduce the bug.

Is there a way to set a watchpoint programmatically, without having to do
it by hand at the ddb prompt? I would add it when a struct is added to the
list, and delete it when the struct is removed.


Emmanuel Dreyfus
manu <at>