Sean McGranaghan | 14 Apr 17:05 2004

Stuck in biowait() again....

Hello all,

I have a strange situation occurring where it looks like biowait() is waiting for a buffer that never gets loaded via strategy(). I have looked over the archives and found some similar situations. I found an old bug reference #3694/Kern that could explain my situation. What is the status of that bug?

Here is my situation:

I am building a diskless kernel on the evbarm platform. I am trying to debug a simple polled mode disk driver for a flash device. The driver is basically functional. The driver is implemented using a worker thread that is woken up by strategy() when new requests arrive. The worker thread processes all the requests on the queue, calling biodone() as needed, and goes back to sleep. This appears to be working fine. Basic read/write operations seem to be working. I can mount an MSDOS filesystem from the device and walk through the directory tree. I can create and delete files. The problem first appeared when I started using "cat" to read files larger than one block (512 bytes).

I can read small files (<512 bytes) perfectly. When trying to read a 513-byte file, "cat" gets stuck inside biowait() in tsleep(). I added debugging to trace the buffer requests through each biowait(), biodone() and strategy() call. It looks like the buffer that biowait() is waiting for was never loaded into the cache. Every buffer request I receive via strategy() I see processed in the worker thread. I cannot find a strategy() call for the buffer that causes the sleep in biowait()!

I must be missing something. This is my first NetBSD block driver, so maybe I am filling the buffer requests incorrectly? Are there other response fields in the buffer struct that I need to set? I set b_bcount, b_resid, b_error and b_flags. What exactly are the rules on responding to the buffer requests? (I based my driver on a combination of the memory (md) and floppy (fd) drivers.)

Any help is greatly appreciated,

Sean
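
For readers following the description above, here is a minimal sketch of the strategy() / worker / biodone() / biowait() handshake. It is a single-threaded userland model, not NetBSD kernel code and not the driver from this post: struct model_buf, struct model_disk and every model_* function are hypothetical names invented for illustration, and the worker thread's loop body is modelled as a plain function. The one rule it is meant to show is that every request queued by strategy() is completed exactly once, with resid and error filled in, including failed and short transfers.

```c
/* Single-threaded model of the strategy() / worker / biodone() / biowait()
 * handshake. Not NetBSD kernel code: every model_* name is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SECTOR 512u
#define MODEL_B_DONE 0x1     /* stands in for the kernel's B_DONE flag */

struct model_buf {
    struct model_buf *next;  /* queue linkage, owned by the driver */
    unsigned char *data;     /* caller's data area */
    size_t blkno;            /* starting sector */
    size_t bcount;           /* bytes requested */
    size_t resid;            /* bytes NOT transferred, set on completion */
    int error;               /* 0 or an errno value, set on completion */
    int flags;               /* MODEL_B_DONE once completed */
};

struct model_disk {
    const unsigned char *media;
    size_t media_size;
    struct model_buf *head, *tail;   /* pending requests, FIFO */
};

/* Completion: the only place a request is marked done. */
void model_biodone(struct model_buf *bp)
{
    bp->flags |= MODEL_B_DONE;
}

/* strategy(): queue the request; a real driver would also wake the worker. */
void model_strategy(struct model_disk *d, struct model_buf *bp)
{
    bp->flags &= ~MODEL_B_DONE;
    bp->next = NULL;
    if (d->tail != NULL)
        d->tail->next = bp;
    else
        d->head = bp;
    d->tail = bp;
}

/* One wakeup of the worker: drain the queue, completing every request. */
void model_worker_run(struct model_disk *d)
{
    struct model_buf *bp;

    while ((bp = d->head) != NULL) {
        size_t off = bp->blkno * MODEL_SECTOR;

        d->head = bp->next;
        if (d->head == NULL)
            d->tail = NULL;

        bp->error = 0;
        bp->resid = bp->bcount;
        if (off > d->media_size) {
            bp->error = EINVAL;          /* failed requests still complete */
        } else {
            size_t avail = d->media_size - off;
            size_t n = bp->bcount < avail ? bp->bcount : avail;

            memcpy(bp->data, d->media + off, n);
            bp->resid = bp->bcount - n;  /* nonzero on a short transfer */
        }
        model_biodone(bp);
    }
}

/* biowait() stand-in: returns -1 where the real one would sleep forever. */
int model_biowait(const struct model_buf *bp)
{
    if ((bp->flags & MODEL_B_DONE) == 0)
        return -1;
    assert(bp->resid <= bp->bcount);
    return bp->error;
}
```

In this model, calling model_biowait() on a request that was never passed to model_strategy() returns -1, which corresponds to the hang in the trace above: a waiter on a buffer that no strategy() call ever saw. The sketch does not say why that happens in the real kernel; it only makes the completion rule concrete.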

Valeriy E. Ushakov | 16 Apr 20:49 2004

Simple way to panic arm kernels

I've seen the panic triggered by assertion at uvm_bio.c:257 on both
netwinder and shark (both diskless):

         KASSERT(umap->refcount != 0);

I reliably get it running a test suite that executes a bunch of
commands with output to file.  I get it when the output file is either
on nfs or mfs.

I have just reproduced the same panic with:

$ yes | sed 1000000q | while read; do				\
    ( echo -n 1 >>XXX; echo -n 2 >>XXX; echo -n 3 >>XXX );	\
done

Anyone seen this?

SY, Uwe
-- 
uwe <at> ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen
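
The file access pattern of the shell loop above (open in append mode, write one byte, close, three times per iteration) can also be sketched in C. This is only an approximation under stated assumptions: append_byte(), append_stress() and file_size() are hypothetical helper names, the sketch runs in a single process where the shell loop forks a subshell per iteration, and it has not been confirmed to trigger the panic.

```c
/* Append-heavy workload: one open()/write()/close() cycle per byte.
 * Helper names are hypothetical; this mirrors the I/O pattern only. */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Append a single byte to path, creating the file if needed. */
int append_byte(const char *path, char c)
{
    ssize_t n;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd == -1)
        return -1;
    n = write(fd, &c, 1);
    if (close(fd) == -1 || n != 1)
        return -1;
    return 0;
}

/* Append "123" per iteration; returns iterations completed, or -1 on error. */
long append_stress(const char *path, long iterations)
{
    long i;

    for (i = 0; i < iterations; i++) {
        if (append_byte(path, '1') == -1 ||
            append_byte(path, '2') == -1 ||
            append_byte(path, '3') == -1)
            return -1;
    }
    return i;
}

/* Size of path in bytes, or -1 if it cannot be examined. */
long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_size;
}
```

Calling append_stress("XXX", 1000000) would issue the same number of appends as the shell loop; each iteration should grow the file by three bytes.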

Richard Earnshaw | 16 Apr 23:13 2004

Re: Simple way to panic arm kernels


> I've seen the panic triggered by assertion at uvm_bio.c:257 on both
> netwinder and shark (both diskless):
> 
>          KASSERT(umap->refcount != 0);
> 
> I reliably get it running a test suite that executes a bunch of
> commands with output to file.  I get it when the output file is either
> on nfs or mfs.
> 
> I have just reproduced the same panic with:
> 
> $ yes | sed 1000000q | while read; do				\
>     ( echo -n 1 >>XXX; echo -n 2 >>XXX; echo -n 3 >>XXX );	\
> done
> 
> 
> Anyone seen this?

Yep,  I get this almost every time I try to do a full bootstrap and 
testrun of gcc.  It's reported in port-arm/23581 as item 2 -- in that case 
the panic was on an FFS local disk.

Debugging this is complicated by the fact that getting a panic dump on ARM 
is currently broken.  I have a patch for that which I'll be testing out 
tomorrow (hopefully); my first attempt was somewhat buggy.

Ian Zagorskih | 20 Apr 08:42 2004

NetBSD and Atmel AT91RM9200


Any ideas about support for the Atmel AT91RM9200 on NetBSD?

Brief CPU description can be found here:
http://www.atmel.com/dyn/products/product_card.asp?part_id=2983

// wbr

Richard Earnshaw | 25 Apr 22:50 2004

Re: Simple way to panic arm kernels


> I've seen the panic triggered by assertion at uvm_bio.c:257 on both
> netwinder and shark (both diskless):
> 
>          KASSERT(umap->refcount != 0);
> 
> I reliably get it running a test suite that executes a bunch of
> commands with output to file.  I get it when the output file is either
> on nfs or mfs.
> 
> I have just reproduced the same panic with:
> 
> $ yes | sed 1000000q | while read; do				\
>     ( echo -n 1 >>XXX; echo -n 2 >>XXX; echo -n 3 >>XXX );	\
> done
> 

I've finally managed to track this down to revision 1.8 of 
arch/arm/include/arm32/frame.h

That's somewhat odd, since I can't really see why that patch would 
introduce any sort of race condition that wouldn't have existed previously 
(and presumably that is what is happening).  But the simple fact is that 
kernels built from code before that change are stable, and kernels built 
afterwards (or earlier kernels with that change applied) are not.

R.

Richard Earnshaw | 26 Apr 11:17 2004

Re: Simple way to panic arm kernels

On Sun, 2004-04-25 at 21:50, Richard Earnshaw wrote:
> > I've seen the panic triggered by assertion at uvm_bio.c:257 on both
> > netwinder and shark (both diskless):
> > 
> >          KASSERT(umap->refcount != 0);
> > 
> > I reliably get it running a test suite that executes a bunch of
> > commands with output to file.  I get it when the output file is either
> > on nfs or mfs.
> > 
> > I have just reproduced the same panic with:
> > 
> > $ yes | sed 1000000q | while read; do				\
> >     ( echo -n 1 >>XXX; echo -n 2 >>XXX; echo -n 3 >>XXX );	\
> > done
> > 
> 
> I've finally managed to track this down to revision 1.8 of 
> arch/arm/include/arm32/frame.h
> 
> That's somewhat odd, since I can't really see why that patch would 
> introduce any sort of race condition that wouldn't have existed previously 
> (and presumably that is what is happening).  But the simple fact is that 
> kernels built from code before that change are stable, and kernels built 
> afterwards (or earlier kernels with that change applied) are not.

A 'current' kernel with frame.h backed off to revision 1.7 ran the above
test for 9 hours last night without panicking.  That's about 9 times
longer than I've ever seen a broken kernel manage.

Steve Woodford | 26 Apr 11:42 2004

Re: Simple way to panic arm kernels

On Monday 26 April 2004 10:17 am, Richard Earnshaw wrote:
> On Sun, 2004-04-25 at 21:50, Richard Earnshaw wrote:

> > I've finally managed to track this down to revision 1.8 of
> > arch/arm/include/arm32/frame.h

> A 'current' kernel with frame.h backed off to revision 1.7 ran the
> above test for 9 hours last night without panicking.  That's about 9
> times longer than I've ever seen a broken kernel manage.

Thanks for tracking this down. I'll look into the problem RSN.

Cheers, Steve

Richard Earnshaw | 26 Apr 11:44 2004

Re: Simple way to panic arm kernels

On Mon, 2004-04-26 at 10:42, Steve Woodford wrote:
> On Monday 26 April 2004 10:17 am, Richard Earnshaw wrote:
> > On Sun, 2004-04-25 at 21:50, Richard Earnshaw wrote:
> 
> > > I've finally managed to track this down to revision 1.8 of
> > > arch/arm/include/arm32/frame.h
> 
> > A 'current' kernel with frame.h backed off to revision 1.7 ran the
> > above test for 9 hours last night without panicking.  That's about 9
> > times longer than I've ever seen a broken kernel manage.
> 
> Thanks for tracking this down. I'll look into the problem RSN.
> 
> Cheers, Steve

I think I've spotted the difference in behaviour.

The change is in the case where we are NOT returning to user mode.  In
that case the new code completes the sequence with the IRQ state
preserved, whereas the old code completed it with IRQs blocked.
R.

Lin.Colin | 26 Apr 02:35 2004

A question about printf in machdep.c


Hi there,
While tracing the kernel, I found that "printf" can be called before the serial port has been initialized.
I am curious: where do the printed strings go before a console is ready?

Thanks and regards,
Colin

