andrey mirtchovski | 1 Nov 2003 01:24

more fossil woes

I never thought I'd get to that point, but here it is:

	Fossil is unable to initialize a partition with flfmt.

Here's the whole story:

This morning, after successfully checking my email from home, I arrived at
school only to find that fossil had died with the familiar:

	 assert failed: b->nlock == 1                                             
	fossil 44: suicide: sys: trap: fault read addr=0x0 pc=0x0002b6b7

It was the first crash in a long time, but unfortunately I had no way of
finding out who or what had caused it, because Plan 9 does not let me
examine processes' activity based on their use of a particular resource.
(Interestingly enough, when I suggested that such "features" be added to the
system, there was outrage, especially from people who never use Plan 9,
telling me I was just polluting the beautiful system :)...

I didn't give much thought to the problem and ran fossil/flchk, which
surprisingly found many more errors than I had expected. Here's
how many blocks it couldn't access anymore (I run a 3-day wide epoch
window) and suggested that I bfree:

	mirtchov <at> fbsd$ cat flchk | sed '/^[^b]/d' | wc -l
	 365357
	mirtchov <at> fbsd$ 

that's 3 gigs of broken data... For comparison, my entire venti archive
weighs in at 1.3GB.
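
The sed filter above deletes every line that starts with a character other than b. As a result, wc -l counts the lines starting with b, which are flchk's bfree suggestions, plus any empty lines, since /^[^b]/ cannot match those. Below is a minimal C sketch of the same count, followed by the block-to-byte arithmetic behind "3 gigs". The 8 KB block size is an assumption; it is consistent with the figure quoted, since 365357 × 8192 = 2,993,004,544 bytes. count_b_lines and blocks_to_bytes are hypothetical helper names.

```c
/* Count lines whose first character is 'b', e.g. suggested "bfree"
   commands. Unlike sed '/^[^b]/d' | wc -l, empty lines are not counted. */
long
count_b_lines(const char *text)
{
	long n = 0;
	int at_start = 1;	/* are we at the beginning of a line? */

	for(; *text != '\0'; text++){
		if(at_start && *text == 'b')
			n++;
		at_start = (*text == '\n');
	}
	return n;
}

/* Bytes covered by nblocks blocks of blocksize bytes each
   (blocksize is an assumption, e.g. 8192). */
unsigned long long
blocks_to_bytes(long nblocks, unsigned long blocksize)
{
	return (unsigned long long)nblocks * blocksize;
}
```

With the numbers above, blocks_to_bytes(365357, 8192) is 2993004544 bytes, about 3 GB (roughly 2.8 GiB).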

David Presotto | 1 Nov 2003 04:35

kernels

I brought the kernel and libc sources up to date with what we're
running at the labs.  Send problems to me.

jmk | 1 Nov 2003 05:18

Re: more fossil woes

I'd say you had something more fundamental wrong, or else you're not telling
the whole story. If you do the 2nd flfmt as described below you should
get a message like
	fs header block already exists; are you sure? [y/n]:
unless you have the '-y' option.
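
The control flow jmk describes is: if a header block is already present, ask before formatting unless -y was given. The C sketch below illustrates only that logic and is not flfmt's source. proceed_with_format is a hypothetical name, and detecting an existing header is left to the caller as a plain flag.

```c
#include <stdio.h>

/*
 * Decide whether a (hypothetical) formatter should go ahead.
 * header_exists: nonzero if a file system header was found.
 * yflag: nonzero if -y was given on the command line.
 * Prompts on out and reads the answer from in only when neither
 * condition lets formatting proceed silently.
 */
int
proceed_with_format(int header_exists, int yflag, FILE *in, FILE *out)
{
	char answer[64];

	if(!header_exists || yflag)
		return 1;
	fprintf(out, "fs header block already exists; are you sure? [y/n]: ");
	fflush(out);
	if(fgets(answer, sizeof answer, in) == NULL)
		return 0;	/* EOF or read error: treat as "no" */
	return answer[0] == 'y';
}
```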

On Fri Oct 31 19:25:36 EST 2003, mirtchov <at> cpsc.ucalgary.ca wrote:
> I never thought I'd get to that point, but here it is:
> 
> 	Fossil is unable to initialize a partition with flfmt.
> 
> Here's the whole story:
> 
> 
> This morning after successfully checking my email from home I arrived at
> school just to find that fossil has died with the familiar:
> 
> 	 assert failed: b->nlock == 1                                             
> 	fossil 44: suicide: sys: trap: fault read addr=0x0 pc=0x0002b6b7
> 
> It was the first crash in a long time, but unfortunately I had no way of
> finding out who/what had caused it, because Plan 9 does not allow me to
> examine processes' activity based on utilization of a particular resource.
> (Interestingly enough, when I suggested that such "features" be added to the
> system there was an outrage, especially from people who never use Plan 9,
> telling me I'm just polluting the beautiful system :)...
> 
> I didn't give much thought to the problem and ran fossil/flchk, which
> surprisingly found many more errors than I had expected. Here's
> how many blocks it couldn't access anymore (I run a 3-day wide epoch

Russ Cox | 1 Nov 2003 06:35

Re: more fossil woes

There are definitely some problems with fossil that I'd like to fix,
but I have very little time these days.  The robustness of flchk is
high on that list.

In any event, if you zero the beginning of the fossil partition
you should be able to start afresh without problems.  Just
cp /dev/zero /dev/sdC0/fossil and then hit rubout after a few
seconds.
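
Russ's recipe depends on interrupting cp by hand. A bounded alternative is sketched below in POSIX C, assuming the partition can be opened and written like an ordinary file: it writes a fixed number of zero bytes and then stops. zero_prefix is a hypothetical name. The thread does not say how many bytes are enough to wipe fossil's header, so any figure you pass is a guess and should be generous.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Overwrite the first nbytes of path with zeros, without creating or
   truncating it. Returns 0 on success, -1 on error. DESTROYS DATA. */
int
zero_prefix(const char *path, long long nbytes)
{
	static char zeros[8192];	/* static storage is zero-initialized */
	size_t chunk;
	ssize_t w;
	int fd;

	fd = open(path, O_WRONLY);
	if(fd < 0)
		return -1;
	while(nbytes > 0){
		chunk = nbytes < (long long)sizeof zeros ? (size_t)nbytes : sizeof zeros;
		w = write(fd, zeros, chunk);
		if(w <= 0){
			close(fd);
			return -1;
		}
		nbytes -= w;
	}
	return close(fd);
}
```

For example, zero_prefix("/dev/sdC0/fossil", 16LL << 20) would clear the first 16 MB. That amount is an arbitrary illustrative choice.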

Sorry.
Russ

andrey mirtchovski | 1 Nov 2003 08:50

Re: more fossil woes

On Sat, 1 Nov 2003, Russ Cox wrote:

> In any event, if you zero the beginning of the fossil partition
> you should be able to start afresh without problems.  Just
> cp /dev/zero /dev/sdC0/fossil and then hit rubout after a few
> seconds.

i did exactly what you suggested and got:

	cacheLocalData: addr=7841 type got 0 exp 0: tag got 0 exp 6669fe74
	fossil 55: suicide: sys: trap: fault read addr=0x0 pc=0x0002b6b7
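
The cacheLocalData message reads as a comparison between the label found on disk and the one the cache expected. The type matched (got 0, exp 0) but the tag did not (got 0, exp 6669fe74), which is the kind of mismatch a freshly zeroed block would produce. The check is sketched below in simplified C. The Label struct, check_label and the message format are hypothetical stand-ins modelled on the log line, not fossil's actual types or code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an on-disk block label. */
typedef struct {
	unsigned int type;
	unsigned int tag;
} Label;

/* Returns 0 if the label read from disk matches the expected one,
   otherwise -1 with a diagnostic written into err. */
int
check_label(unsigned long addr, Label got, Label exp, char *err, size_t n)
{
	if(got.type == exp.type && got.tag == exp.tag)
		return 0;
	snprintf(err, n, "addr=%lu type got %u exp %u: tag got %x exp %x",
		addr, got.type, exp.type, got.tag, exp.tag);
	return -1;
}
```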

is there anything else i can do to debug? the last few retries were done
without the devfs, even though I'm normally using it to mirror two disks.

andrey

andrey mirtchovski | 1 Nov 2003 08:56

Re: more fossil woes

On Fri, 31 Oct 2003 jmk <at> plan9.bell-labs.com wrote:

> I'd say you had something more fundamental wrong, or else you're not telling
> the whole story. If you do the 2nd flfmt as described below you should
> get a message like
> 	fs header block already exists; are you sure? [y/n]:
> unless you have the '-y' option.
> 

there is no message, fossil dies absolutely immediately after I type the
command (and before I had done the 'cat /dev/zero > /dev/sdC0/fossil' there
was no corruption on the disk whatsoever, so after each crash I rebooted
safely from either devfs or the first disk on the box).

the only thing that looks suspicious to me is the earlier crash on the
b->nlock == 1 assertion, which I have no way of debugging anymore: there was
no log trace of that crash, except what I had logged on a different machine
from the serial console.

if you think i'm omitting something important -- tell me what it could be, so i
can try and help...

andrey

Charles Forsyth | 1 Nov 2003 11:05

Re: fun with rfork

even more than that!
i ought to have said that it's perhaps most interesting on
an x86.  (Hint: how and where could you call rendezvous?)

From: Bruce Ellis <brucee <at> chunder.com>
Subject: Re: [9fans] fun with rfork
Date: 2003-10-31 23:35:53 GMT

the spinptr code was replaced quite a while ago because it
just doesn't work.  a rendezvous replaced it.

Bruce Ellis | 1 Nov 2003 12:40

Re: fun with rfork

looks a bit like this ...

 targ.fd = fd[0];
 targ.cmd = cmd;
 targ.tag = &tag;

 switch(rfork(RFFDG|RFPROC)) {
 case -1:
  return -1;
 case 0:
  vstack(&targ);   /* Never returns */
 default:
  rendezvous(&tag, 0);
  break;
 }
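
In the fragment above, the parent blocks in rendezvous until the child, presumably somewhere under vstack since targ.tag = &tag is handed to it, rendezvouses on the same tag. POSIX has no rendezvous, so the sketch below swaps in a pipe. The child writes one byte when it is ready, and the parent blocks in read until that byte arrives. fork_and_wait_ready and the byte value 42 are illustrative choices only.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that announces it is ready by writing one byte on a pipe.
 * The parent blocks in read() until that byte arrives, playing the role
 * rendezvous() plays above. Returns the byte received, or -1 on error.
 */
int
fork_and_wait_ready(void)
{
	int p[2];
	unsigned char c = 0;
	ssize_t n;
	pid_t pid;

	if(pipe(p) < 0)
		return -1;
	switch(pid = fork()){
	case -1:
		close(p[0]);
		close(p[1]);
		return -1;
	case 0:
		/* child: do any setup here, then signal readiness */
		close(p[0]);
		c = 42;
		if(write(p[1], &c, 1) != 1)
			_exit(1);
		_exit(0);
	default:
		close(p[1]);
		n = read(p[0], &c, 1);	/* blocks until the child writes or exits */
		close(p[0]);
		waitpid(pid, NULL, 0);
		return n == 1 ? c : -1;
	}
}
```

Unlike a spin on a shared pointer, this handshake does not depend on the two processes sharing memory.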

----- Original Message ----- 
From: "Charles Forsyth" <forsyth <at> caldo.demon.co.uk>
To: <9fans <at> cse.psu.edu>
Sent: Saturday, November 01, 2003 10:03 AM
Subject: Re: [9fans] fun with rfork

> even more than that!
> i ought to have said that it's perhaps most interesting on
> an x86.  (Hint: how and where could you call rendezvous?)
> 

Charles Forsyth | 1 Nov 2003 13:43

Re: fun with rfork

yes i see now why you said the spinptr code didn't work, but
the code i posted was a little different.

the spinptr code really did work originally: the code was a little different
from what i posted earlier--it had
	switch(fork()){
which became
	switch(rfork(RFFDG|RFPROC)) {
in your version

BUT in Plan 9 when the code was originally written, an rfork with RFMEM
caused RFMEM to be implied(!) in all subsequent rforks (and thus forks),
so the original spinptr worked because although it said `fork', it was still
sharing the address space.  (i first noticed the feature when i
had a little trouble with the 8-1/2 event library.)

when Plan 9's implementation (fortunately) later changed
to eliminate that unpleasant effect, the fork made the memory unshared,
spinptr stopped working and a rendezvous would indeed
be needed.  on the other hand, the vstack probably isn't needed
then because the two processes are not sharing the stack.

my rfork call in the example was
	rfork(RFMEM|RFPROC|RFFDG|RFENVG|RFREND)
which kept the RFMEM (explicitly), and that does keep the spinptr working.
not that it's an efficient method, but there's a reason it's hard to
use rendezvous straightforwardly in that particular context.
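
The difference between shared and unshared memory can be demonstrated on a POSIX system. Here fork stands in for an rfork without RFMEM, and a MAP_SHARED anonymous mapping stands in for memory shared under RFMEM. This only approximates rfork's semantics and assumes a system that provides MAP_ANONYMOUS. The function names are hypothetical. The child stores 1 and exits. With private memory the parent still reads 0, so a spinptr-style loop waiting for the change would never finish. With shared memory the parent reads 1.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child stores 1 into *flag and exits; parent waits, then reports
   what it sees in *flag (or -1 if fork fails). */
static int
child_write_visible(volatile int *flag)
{
	pid_t pid;

	*flag = 0;
	pid = fork();
	if(pid < 0)
		return -1;
	if(pid == 0){
		*flag = 1;
		_exit(0);
	}
	waitpid(pid, NULL, 0);
	return *flag;
}

/* Private memory: fork gives the child its own copy, so the parent
   still sees 0; a spin on this flag would never end. */
int
unshared_result(void)
{
	static volatile int flag;

	return child_write_visible(&flag);
}

/* A MAP_SHARED mapping stays shared across fork, as memory does
   under RFMEM, so the parent sees the child's store. */
int
shared_result(void)
{
	volatile int *flag;
	int r;

	flag = mmap(NULL, sizeof *flag, PROT_READ|PROT_WRITE,
		MAP_SHARED|MAP_ANONYMOUS, -1, 0);
	if(flag == MAP_FAILED)
		return -1;
	r = child_write_visible(flag);
	munmap((void *)flag, sizeof *flag);
	return r;
}
```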

it was really some of the peculiar effects of RFMEM and consequent shared (malloc'd)
stack in this particular case that interested me, as an example of how plausible

lucio | 1 Nov 2003 15:09

pwd(1)

Does it really return the value of the pathname which it failed to
retrieve when it fails to retrieve it?

	if(getwd(pathname, sizeof(pathname)) == 0) {
		print("pwd: %r\n");
		exits(pathname);
	}

I hope I'm looking at old code.  Did somebody forget to put quotes
around the exits() argument?
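
In Plan 9, exits takes a status string, with nil meaning success. One way to avoid reporting a buffer that getwd failed to fill is to pass a fixed message, for example exits("getwd"). The POSIX analogue below uses getcwd in place of getwd. It returns the would-be status string instead of exiting. pwd_status is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Print the current directory on out. Returns NULL on success (cf. exits(nil))
 * or a fixed status string on failure (cf. exits("getwd")), never the
 * contents of buf, which getcwd may have left unfilled.
 */
const char *
pwd_status(char *buf, size_t n, FILE *out, FILE *err)
{
	if(getcwd(buf, n) == NULL){
		fprintf(err, "pwd: %s\n", strerror(errno));
		return "getwd";
	}
	fprintf(out, "%s\n", buf);
	return NULL;
}
```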

++L

