Niels Möller | 3 Apr 17:05 2006

Re: Arla 0.42-RC2

Tomas Olsson <tol <at> stacken.kth.se> writes:

> it's time for a release again, with universal binaries for the mac users,
> support for modern linux, etc. I've prepared a 0.42-RC2, and I hope you all
> will stress it as much as you can to help us get the last bugs out before
> we roll a final 0.42.

Hi,

I tried this with linux-2.6.16.1, and I got the following syslog
messages when I tried to start arla, using the supplied init script
(hacked slightly to use mount -o nosuid,nodev).

Apr  3 16:53:02 maskros kernel: Failed to find address of sys_call_table
Apr  3 16:53:03 maskros kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr  3 16:53:03 maskros kernel:  printing eip:
Apr  3 16:53:03 maskros kernel: c02afa1c
Apr  3 16:53:03 maskros kernel: *pde = 00000000
Apr  3 16:53:03 maskros kernel: Oops: 0002 [#1]
Apr  3 16:53:03 maskros kernel: PREEMPT
Apr  3 16:53:03 maskros kernel: Modules linked in: nnpfs isofs nls_base zlib_inflate usbhid radeon drm
pcmcia crc32 md5 ipv6 af_packet yenta_socket rsrc_nonstatic pcmcia_core 8250_pci 8250 serial_core
snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp
agpgart ehci_hcd uhci_hcd usbcore piix tg3 tsdev psmouse ide_cd cdrom genrtc unix
Apr  3 16:53:03 maskros kernel: CPU:    0
Apr  3 16:53:03 maskros kernel: EIP:    0060:[__down+92/288]    Not tainted VLI
Apr  3 16:53:03 maskros kernel: EFLAGS: 00010002   (2.6.16.1-nisse #1)
Apr  3 16:53:03 maskros kernel: EIP is at __down+0x5c/0x120
Apr  3 16:53:03 maskros kernel: eax: 00000000   ebx: cb27c000   ecx: e0b5f918   edx: cb27dda0
Apr  3 16:53:03 maskros kernel: esi: e0b5f910   edi: 00000286   ebp: d3f30030   esp: cb27dd90

Tomas Olsson | 4 Apr 09:24 2006

Re: Arla 0.42-RC2

nisse <at> lysator.liu.se (Niels Möller) writes:
> Apr  3 16:53:02 maskros kernel: Failed to find address of sys_call_table
>
I've got the same issue on my machine. I'll look into it, but this really
shouldn't have much of an impact.

> kernel: Unable to handle kernel NULL pointer dereference at virtual
> address 00000000
>
Not good. The attached patch should fix the problem; it's also in CVS.

> maskros kernel:  [do_exit+308/1184] do_exit+0x134/0x4a0
>
Ok, so what happened was that arlad for some reason died before mount,
which triggered the oops. You probably want to start arlad with
"-n --debug=alm" or so to see what happens, or under gdb.

Thanks
        /t
_______________________________________________
Arla-drinkers mailing list
Arla-drinkers <at> stacken.kth.se
https://lists.stacken.kth.se/mailman/listinfo/arla-drinkers
Jeffrey Hutzelman | 4 Apr 09:56 2006

Re: Arla 0.42-RC2


On Tuesday, April 04, 2006 09:24:10 AM +0200 Tomas Olsson 
<tol <at> stacken.kth.se> wrote:

> nisse <at> lysator.liu.se (Niels Möller) writes:
>> Apr  3 16:53:02 maskros kernel: Failed to find address of sys_call_table
>>
> I've got the same issue on my machine. I'll look into it, but this really
> shouldn't have much of an impact.

Yup; you're going to have that problem with 2.6.16, because they went off 
and moved things into different sections.  Worse, if the kernel is compiled 
with CONFIG_DEBUG_RODATA, then on some platforms the system call table will 
actually be stored in read-only pages, and attempts to update it will 
result in a panic.

We've been doing some work to update osi_probe to deal with this situation, 
but Chaskiel did most of the work and all of the code on this issue, and at 
4am I'm not awake enough to page in the current state.

We probably should do something about getting updated syscall table probing 
code back into Arla.

-- Jeff
Adam Megacz | 4 Apr 10:45 2006

is cache-only-prefixes an nnpfs limitation?


Hi,

I understand that, currently, Arla must retrieve bytes 0..n of a file
in order to access byte n+1.  Is this a limitation imposed by nnpfs,
or simply the way arlad currently works?

Specifically, I'm wondering about the possibilities of writing other
filesystems that run on top of nnpfs (including the Win32 nnpfs), and
I wanted to know if this limitation would be inherited.

  - a

-- 
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380
Jean-Damien Durand | 4 Apr 12:34 2006

nnpfs_readdir problem

I just caught the following loop: I pressed Tab to get completion in zsh, and it became very CPU-consuming.
fs arladebug showed nothing relevant:

Apr  4 12:24:29 pcitds04 arla[7348]: sending wakeup: seq = 28427, error = 0
Apr  4 12:24:29 pcitds04 arla[7348]: Send message: opcode = 1 (wakeup), size = 24
Apr  4 12:24:29 pcitds04 arla[7348]: worker 0: done
Apr  4 12:24:29 pcitds04 arla[7348]: worker 0 waiting
Apr  4 12:24:52 pcitds04 arla[7348]: worker 0: processing
Apr  4 12:24:52 pcitds04 arla[7348]: Rec message: opcode = 22 (pioctl), size = 2104
Apr  4 12:24:52 pcitds04 arla[7348]: sending wakeup: seq = 28428, error = 0
Apr  4 12:24:52 pcitds04 arla[7348]: Send message: opcode = 23 (wakeup_data), size = 2080
Apr  4 12:24:52 pcitds04 arla[7348]: worker 0: done
Apr  4 12:24:52 pcitds04 arla[7348]: worker 0 waiting
Apr  4 12:24:54 pcitds04 arla[7348]: worker 0: processing
Apr  4 12:24:54 pcitds04 arla[7348]: Rec message: opcode = 22 (pioctl), size = 2104
Apr  4 12:24:54 pcitds04 arla[7348]: sending wakeup: seq = 28429, error = 0
Apr  4 12:24:54 pcitds04 arla[7348]: Send message: opcode = 23 (wakeup_data), size = 2080
Apr  4 12:24:54 pcitds04 arla[7348]: worker 0: done
Apr  4 12:24:54 pcitds04 arla[7348]: worker 0 waiting
Apr  4 12:25:20 pcitds04 arla[7348]: poller waiting
Apr  4 12:25:20 pcitds04 arla[7348]: running poller
Apr  4 12:25:20 pcitds04 arla[7348]: poller done
Apr  4 12:25:20 pcitds04 arla[7348]: poller waiting

but fs nnpfsdebug, which I switched on afterwards, showed weird behaviour:

Apr  4 12:26:01 pcitds04 kernel: nnpfs_syscall returns error: 0
Apr  4 12:26:01 pcitds04 kernel: BUG: using smp_processor_id() in preemptible [00000001] code: fs/20020
Apr  4 12:26:01 pcitds04 kernel: caller is sys_afs_int+0x654/0x673 [nnpfs]
Apr  4 12:26:01 pcitds04 kernel:  [debug_smp_processor_id+117/136] debug_smp_processor_id+0x75/0x88

Tomas Olsson | 4 Apr 12:40 2006

Re: is cache-only-prefixes an nnpfs limitation?

Adam Megacz <megacz <at> cs.berkeley.edu> writes:
> I understand that, currently, Arla must retrieve bytes 0..n of a file
> in order to access byte n+1.  Is this a limitation imposed by nnpfs,
> or simply the way arlad currently works?
> 
It's a limitation imposed by nnpfs. You probably want to take a look at the
protocol (arla/nnpfs/include/nnpfs/nnpfs_message.h) and arla/doc/nnpfs.txt
(may need an update). If you're adventurous, take a look at the protocol
version in block_branch in CVS, too. It's not exactly final.

> Specifically, I'm wondering about the possibilities of writing other
> filesystems that run on top of nnpfs (including the Win32 nnpfs), and
> I wanted to know if this limitation would be inherited.
>
Yup, the nnpfs implementation and protocol do limit things. It will get
less limiting, but I don't see us implementing completely free byte range
fetching. Apart from that, it should work. We've mentioned a few other
projects that use xfs/nnpfs here before. I guess arlad is the most complete
reference on how it is supposed to work, but it's not a very clear design.
If you happen to write up a little libnnpfs or so, I'm _very_ interested.
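To make the prefix constraint concrete, here is a toy model in plain Python (not nnpfs code) of a cache that may only hold a contiguous prefix of a file, so reading byte n+1 forces a fetch of every uncached byte before it. PrefixCache and fetch_range are hypothetical names; the real message formats are in nnpfs_message.h.

```python
from typing import Callable


class PrefixCache:
    """Toy model of a cache that may only hold bytes [0, n) of one file."""

    def __init__(self, fetch_range: Callable[[int, int], bytes]) -> None:
        # fetch_range(start, end) stands in for a server RPC returning
        # bytes [start, end) of the file (hypothetical, not an nnpfs call).
        self.fetch_range = fetch_range
        self.data = bytearray()
        self.bytes_fetched = 0

    def read(self, offset: int, length: int) -> bytes:
        end = offset + length
        if end > len(self.data):
            # The prefix can only grow from its current end, so reaching
            # `offset` also pulls in every uncached byte before it.
            chunk = self.fetch_range(len(self.data), end)
            self.bytes_fetched += len(chunk)
            self.data += chunk
        return bytes(self.data[offset:end])


if __name__ == "__main__":
    blob = bytes(range(256)) * 4  # a 1024-byte stand-in file
    cache = PrefixCache(lambda start, end: blob[start:end])
    print(cache.read(1000, 1))    # needs byte 1000 only...
    print(cache.bytes_fetched)    # ...but 1001 bytes were fetched
```

A filesystem layered on nnpfs would inherit this shape: random access near the end of a large file costs a transfer of everything in front of it, until the protocol supports byte ranges.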

AFAIK, w2k nnpfs works very well for demos and benchmarks, but it does have
threading issues. It needs some work to be fit for production use.

What do you have in mind?

/t
Tomas Olsson | 4 Apr 12:50 2006

Re: Arla 0.42-RC2

> nisse <at> lysator.liu.se (Niels Möller) writes:
> > Apr  3 16:53:02 maskros kernel: Failed to find address of sys_call_table
> >
> I've got the same issue on my machine. I'll look into it, but this really
> shouldn't have much of an impact.
> 
Does
http://cvsweb.stacken.kth.se/cvsweb.cgi/arla/nnpfs/linux/nnpfs_syscalls-lossage.c.diff?r1=1.14;r2=1.15
work for you?

/t
Niels Möller | 4 Apr 15:13 2006

Re: Arla 0.42-RC2

Tomas Olsson <tol <at> stacken.kth.se> writes:

> Not good. The attached patch should fix the problem, it's also in CVS.

Applied the patch, and the oops disappeared.

> Ok, so what happened was that arlad for some reason died before mount,
> which triggered the oops. You probably want to start arlad with
> "-n --debug=alm" or so to see what happens, or under gdb.

arlad -n --debug=alm produces the following output:

  2006-04-04 15:04:49 CEST: arla: read_conffile: /usr/arla/etc/arla.conf
  2006-04-04 15:04:49 CEST: arla: Arlad booting sequence:
  2006-04-04 15:04:49 CEST: arla: connected mode: connected
  2006-04-04 15:04:49 CEST: arla: ports_init
  2006-04-04 15:04:49 CEST: arla: uae_init
  2006-04-04 15:04:49 CEST: arla: rx
  hipri was 3 actually 0
  2006-04-04 15:04:49 CEST: arla: conn_init numconns = 100
  2006-04-04 15:04:49 CEST: arla: initconncache
  hipri was 3 actually 1
  2006-04-04 15:04:49 CEST: arla: cellcache
  2006-04-04 15:04:49 CEST: arla: poller
  2006-04-04 15:04:49 CEST: arla: initpoller
  hipri was 3 actually 1
  2006-04-04 15:04:49 CEST: arla: fprio
  2006-04-04 15:04:49 CEST: arla: volcache numvols = 100
  2006-04-04 15:04:49 CEST: arla: using rxkad level crypt
  2006-04-04 15:04:49 CEST: arla: credcache numcreds = 100

Tomas Olsson | 4 Apr 16:12 2006

Re: nnpfs_readdir problem

Jean-Damien Durand <Jean-Damien.Durand <at> cern.ch> writes:
> Apr  4 12:26:01 pcitds04 kernel: nnpfs_syscall returns error: 0
> Apr  4 12:26:01 pcitds04 kernel: BUG: using smp_processor_id() in preemptible [00000001] code: fs/20020
> Apr  4 12:26:01 pcitds04 kernel: caller is sys_afs_int+0x654/0x673 [nnpfs]
>
Oh my. How ironic. You do a pioctl to turn on the debug flags, and at the
end of that routine there's a debug print that now gets executed. Which
calls smp_processor_id(), causing the oops.

You probably want
http://cvsweb.stacken.kth.se/cvsweb.cgi/arla/nnpfs/linux/nnpfs_syscalls.c.diff?r1=1.117;r2=1.118

> Apr  4 12:26:01 pcitds04 kernel: nnpfs_readdir offset: 547 namlen: 0 offset2: 2595
> Apr  4 12:26:01 pcitds04 last message repeated 1341 times
>
I wonder how that happened. Hopefully 
http://cvsweb.stacken.kth.se/cvsweb.cgi/arla/nnpfs/linux/nnpfs_inodeops.c.diff?r1=1.215;r2=1.216
helps to avoid the loop in case it happens again. If you can reproduce it
easily, the patch along with "fs nnpfsdeb vnops" should give some clue
about how to fix the cause.
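The repeated "offset: 547 namlen: 0" line looks like a directory walk that keeps revisiting the same entry without advancing. As an illustration only (this is not what the nnpfs_inodeops.c patch does, and the record layout below is invented for the example), a walker over packed directory records can refuse to continue when an entry would not move the offset forward:

```python
import struct
from typing import Iterator

# Hypothetical record layout, for illustration only:
# <u16 reclen><u8 namlen><name bytes><padding up to reclen>
HEADER = struct.Struct("<HB")


class CorruptDirectory(Exception):
    pass


def walk_entries(buf: bytes) -> Iterator[str]:
    offset = 0
    while offset + HEADER.size <= len(buf):
        reclen, namlen = HEADER.unpack_from(buf, offset)
        # Forward-progress guard: a record must at least cover its own
        # header and name and must fit in the buffer. Otherwise stop,
        # instead of re-reading the same offset forever.
        if reclen < HEADER.size + namlen or offset + reclen > len(buf):
            raise CorruptDirectory(
                f"bad entry at offset {offset}: reclen={reclen} namlen={namlen}")
        if namlen > 0:  # in this made-up layout, namlen == 0 is an unused slot
            start = offset + HEADER.size
            yield buf[start:start + namlen].decode()
        offset += reclen


def make_entry(name: str, pad: int = 0) -> bytes:
    # Build one record in the hypothetical layout above.
    raw = name.encode()
    return HEADER.pack(HEADER.size + len(raw) + pad, len(raw)) + raw + b"\0" * pad


if __name__ == "__main__":
    print(list(walk_entries(make_entry("a") + make_entry("", pad=1) + make_entry("bc"))))
```

Failing loudly on a record that cannot advance turns a silent CPU-burning loop into a single error that points at the bad offset.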

thanks  
        /t
Tomas Olsson | 4 Apr 16:28 2006

Re: Arla 0.42-RC2

nisse <at> lysator.liu.se (Niels Möller) writes:
> > Ok, so what happened was that arlad for some reason died before mount,
> > which triggered the oops. You probably want to start arlad with
> > "-n --debug=alm" or so to see what happens, or under gdb.
> 
> arlad -n --debug=alm produces the following output:
>
Looks good, up until
>   2006-04-04 15:04:49 CEST: arla: Rec message: opcode = 1 (wakeup), size = 24
>   libgcc_s.so.1 must be installed for pthread_cancel to work
> 
> (I configured arla with --with-pthreads). I do have a file
> /lib/libgcc_s.so.1 installed, so this error message seems strange.
> 
That message is probably from the pthreads implementation, so I guess we
have a linking problem. Things to try:

 * run with the same flags in gdb and see where it dies

 * ldd on arlad

 * cd arla-obj/arlad ; rm arlad ; make
   to see how it was built

 * read config.log to see what happened
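The message comes from glibc, which dlopen()s libgcc_s.so.1 for its unwinder the first time pthread_cancel is needed, and prints it and aborts if that load or a symbol lookup fails. A quick way to see what the loader finds at runtime is to repeat that dlopen() from another process. This is a sketch, assuming a Python on the same machine with the same word size as arlad (a 64-bit Python cannot load a 32-bit libgcc_s); check_unwinder is a hypothetical helper, and the symbol list is an assumption that varies between glibc versions. It complements ldd rather than replacing it.

```python
import ctypes
from typing import List

# Unwinder symbols glibc's pthread_cancel support has looked up in
# libgcc_s.so.1; the exact list differs between glibc versions, so treat
# this tuple as an assumption rather than a definitive list.
UNWIND_SYMBOLS = ("_Unwind_Resume", "__gcc_personality_v0",
                  "_Unwind_ForcedUnwind", "_Unwind_GetCFA")


def check_unwinder(soname: str = "libgcc_s.so.1") -> List[str]:
    """Try to dlopen `soname` and resolve UNWIND_SYMBOLS.

    Returns a list of problems; an empty list means everything resolved.
    """
    try:
        lib = ctypes.CDLL(soname)  # dlopen() via the normal loader search path
    except OSError as exc:
        return [f"cannot load {soname}: {exc}"]
    return [f"{soname} lacks {sym}" for sym in UNWIND_SYMBOLS
            if not hasattr(lib, sym)]


if __name__ == "__main__":
    problems = check_unwinder()
    print("\n".join(problems) if problems else "libgcc_s.so.1 looks usable")
```

If this reports no problems but arlad still fails, the difference is likely in how arlad itself was linked or started (library path, chroot, or a mismatched libc), which is what the ldd and config.log checks above should reveal.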

>   ls: /afs: No such device
> 
> Is there any more appropriate error code that could be returned?
> ENODEV made me suspect the nnpfs module, which at the moment seems to