Julian Elischer | 1 Feb 01:51 2008

Re: needs a tester with an SMP 7.0 box

Xin LI wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Julian Elischer wrote:
> [...]
>> it continuously forks children that exit.
> 
>> when it goes bad, then in top, you will see some of the children not
>> exiting and staying present for ages.
>> if after 10 minutes you do not see any children hanging around the
>> problem is not occurring.
> 
> So if it resolved the problem, I am expected to observe only 1 long-lived
> parent running with some threads, and last pid continuously increases,
> but not two test programs?  I ran them as an unprivileged user, is that OK?
> 
> The first run was put into background with a log file, and it seems that
> it exited while I was on my way to the office from the co-location center
> without dangling processes; I am now running it again.

yes.
It APPEARS that the original problem does not occur in 7.0
but only in 6.3
you are linked with libkse right?

It is possible that something else has been changed to cover the
problem window.
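
The test program itself is not shown in this thread, so the following is only a rough sketch of the observation method described above (fork children that exit at once, then look for any that are still hanging around). It is written in Python for brevity, the name spawn_and_check and its parameters are hypothetical, and it uses neither threads nor libkse, so it should not be expected to reproduce the actual problem.

```python
import os
import time


def spawn_and_check(rounds=50, grace=2.0):
    """Fork `rounds` children that exit immediately.

    Returns the sorted PIDs of children that could not be reaped within
    `grace` seconds. Hypothetical helper, not the thread's test program.
    """
    pending = set()
    for _ in range(rounds):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit right away
        pending.add(pid)

    deadline = time.monotonic() + grace
    while pending and time.monotonic() < deadline:
        for pid in list(pending):
            # WNOHANG: returns (0, 0) if the child has not exited yet
            done, _status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                pending.discard(pid)
        time.sleep(0.01)
    return sorted(pending)  # non-empty means some children lingered


if __name__ == "__main__":
    print("lingering children:", spawn_and_check())
```

An empty list corresponds to the "no children hanging around" case; the real check in this thread is done by watching top for about 10 minutes.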

> 

FreeBSD Tinderbox | 1 Feb 01:54 2008

[head tinderbox] failure on powerpc/powerpc

TB --- 2008-01-31 23:46:31 - tinderbox 2.3 running on freebsd-current.sentex.ca
TB --- 2008-01-31 23:46:31 - starting HEAD tinderbox run for powerpc/powerpc
TB --- 2008-01-31 23:46:31 - cleaning the object tree
TB --- 2008-01-31 23:46:58 - cvsupping the source tree
TB --- 2008-01-31 23:46:58 - /usr/bin/csup -r 3 -g -L 1 -h localhost -s /tinderbox/HEAD/powerpc/powerpc/supfile
TB --- 2008-01-31 23:47:05 - building world (CFLAGS=-O -pipe)
TB --- 2008-01-31 23:47:05 - cd /src
TB --- 2008-01-31 23:47:05 - /usr/bin/make -B buildworld
>>> World build started on Thu Jan 31 23:47:08 UTC 2008
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Fri Feb  1 00:48:50 UTC 2008
TB --- 2008-02-01 00:48:50 - generating LINT kernel config
TB --- 2008-02-01 00:48:50 - cd /src/sys/powerpc/conf
TB --- 2008-02-01 00:48:50 - /usr/bin/make -B LINT
TB --- 2008-02-01 00:48:50 - building LINT kernel (COPTFLAGS=)
TB --- 2008-02-01 00:48:50 - cd /src
TB --- 2008-02-01 00:48:50 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Fri Feb  1 00:48:50 UTC 2008
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree

Xin LI | 1 Feb 02:05 2008

Re: needs a tester with an SMP 7.0 box


Julian Elischer wrote:
> Xin LI wrote:
> Julian Elischer wrote:
> [...]
>>>> it continuously forks children that exit.
> 
>>>> when it goes bad, then in top, you will see some of the children not
>>>> exiting and staying present for ages.
>>>> if after 10 minutes you do not see any children hanging around the
>>>> problem is not occurring.
> 
> So if it resolved the problem, I am expected to observe only 1 long-lived
> parent running with some threads, and last pid continuously increases,
> but not two test programs?  I ran them as an unprivileged user, is that OK?
> 
> The first run was put into background with a log file, and it seems that
> it exited while I was on my way to the office from the co-location center
> without dangling processes; I am now running it again.
> 
>> yes.
>> It APPEARS that the original problem does not occur in 7.0
>> but only in 6.3
>> you are linked with libkse right?

Yes, libkse (newly compiled)

>> It is possible that something else has been changed to cover the
>> problem window.


FreeBSD Tinderbox | 1 Feb 02:40 2008

[head tinderbox] failure on sparc64/sparc64

TB --- 2008-02-01 00:36:04 - tinderbox 2.3 running on freebsd-current.sentex.ca
TB --- 2008-02-01 00:36:04 - starting HEAD tinderbox run for sparc64/sparc64
TB --- 2008-02-01 00:36:04 - cleaning the object tree
TB --- 2008-02-01 00:36:34 - cvsupping the source tree
TB --- 2008-02-01 00:36:34 - /usr/bin/csup -r 3 -g -L 1 -h localhost -s /tinderbox/HEAD/sparc64/sparc64/supfile
TB --- 2008-02-01 00:36:40 - building world (CFLAGS=-O -pipe)
TB --- 2008-02-01 00:36:40 - cd /src
TB --- 2008-02-01 00:36:40 - /usr/bin/make -B buildworld
>>> World build started on Fri Feb  1 00:36:42 UTC 2008
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Fri Feb  1 01:34:40 UTC 2008
TB --- 2008-02-01 01:34:40 - generating LINT kernel config
TB --- 2008-02-01 01:34:40 - cd /src/sys/sparc64/conf
TB --- 2008-02-01 01:34:40 - /usr/bin/make -B LINT
TB --- 2008-02-01 01:34:40 - building LINT kernel (COPTFLAGS=)
TB --- 2008-02-01 01:34:40 - cd /src
TB --- 2008-02-01 01:34:40 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Fri Feb  1 01:34:40 UTC 2008
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree

Xin LI | 1 Feb 02:50 2008

Re: needs a tester with an SMP 7.0 box


Julian Elischer wrote:
> Xin LI wrote:
> Julian Elischer wrote:
> [...]
>>>> it continuously forks children that exit.
> 
>>>> when it goes bad, then in top, you will see some of the children not
>>>> exiting and staying present for ages.
>>>> if after 10 minutes you do not see any children hanging around the
>>>> problem is not occurring.
> 
> So if it resolved the problem, I am expected to observe only 1 long-lived
> parent running with some threads, and last pid continuously increases,
> but not two test programs?  I ran them as an unprivileged user, is that OK?
> 
> The first run was put into background with a log file, and it seems that
> it exited while I was on my way to the office from the co-location center
> without dangling processes; I am now running it again.
> 
>> yes.
>> It APPEARS that the original problem does not occur in 7.0
>> but only in 6.3
>> you are linked with libkse right?
> 
>> It is possible that something else has been changed to cover the
>> problem window.

I'm not sure why, but a fresh RELENG_7_0 with your patch, plus DDB
enabled does not trigger the problem :-/

Julian Elischer | 1 Feb 02:55 2008

Re: needs a tester with an SMP 7.0 box


> 
> I'm not sure why, but a fresh RELENG_7_0 with your patch, plus DDB
> enabled does not trigger the problem :-/
> 

  what happens without the patch?
_______________________________________________
freebsd-current <at> freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe <at> freebsd.org"

FreeBSD Tinderbox | 1 Feb 02:55 2008

[head tinderbox] failure on sparc64/sun4v

TB --- 2008-02-01 00:54:21 - tinderbox 2.3 running on freebsd-current.sentex.ca
TB --- 2008-02-01 00:54:21 - starting HEAD tinderbox run for sparc64/sun4v
TB --- 2008-02-01 00:54:21 - cleaning the object tree
TB --- 2008-02-01 00:54:42 - cvsupping the source tree
TB --- 2008-02-01 00:54:42 - /usr/bin/csup -r 3 -g -L 1 -h localhost -s /tinderbox/HEAD/sparc64/sun4v/supfile
TB --- 2008-02-01 00:54:48 - building world (CFLAGS=-O -pipe)
TB --- 2008-02-01 00:54:48 - cd /src
TB --- 2008-02-01 00:54:48 - /usr/bin/make -B buildworld
>>> World build started on Fri Feb  1 00:54:51 UTC 2008
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Fri Feb  1 01:50:39 UTC 2008
TB --- 2008-02-01 01:50:39 - generating LINT kernel config
TB --- 2008-02-01 01:50:39 - cd /src/sys/sun4v/conf
TB --- 2008-02-01 01:50:39 - /usr/bin/make -B LINT
TB --- 2008-02-01 01:50:39 - building LINT kernel (COPTFLAGS=)
TB --- 2008-02-01 01:50:39 - cd /src
TB --- 2008-02-01 01:50:39 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Fri Feb  1 01:50:40 UTC 2008
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree

Xin LI | 1 Feb 03:12 2008

Re: needs a tester with an SMP 7.0 box


Julian Elischer wrote:
> 
>>
>> I'm not sure why, but a fresh RELENG_7_0 with your patch, plus DDB
>> enabled does not trigger the problem :-/
>>
> 
>  what happens without the patch?

No, I mean that it used to crash with the previous kernel, which was built
only a couple of days ago, no more than a week.  Do you want me to test the
case without the patch?

Cheers,
--
Xin LI <delphij <at> delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!

Maxim Sobolev | 1 Feb 02:59 2008

Re: needs a tester with an SMP 7.0 box

On my 2-way SMP 7.0 system (old 2.6 GHz Xeon), the program segfaults
constantly:

pid 57965 (a.out), uid 1001: exited on signal 11 (core dumped)
pid 58080 (a.out), uid 1001: exited on signal 11 (core dumped)
pid 58126 (a.out), uid 1001: exited on signal 11 (core dumped)
pid 58123 (a.out), uid 1001: exited on signal 11 (core dumped)
pid 58158 (a.out), uid 1001: exited on signal 11 (core dumped)
pid 58188 (a.out), uid 1001: exited on signal 11 (core dumped)
pid 58226 (a.out), uid 1001: exited on signal 11 (core dumped)
[etc]

(gdb) bt
#0  0x48086af3 in pthread_sigmask () from /usr/lib/libkse.so.3
#1  0x00000001 in ?? ()
#2  0x48210734 in ?? ()
#3  0x48210700 in ?? ()
#4  0xbf7fcf48 in ?? ()
#5  0x4809cc41 in __error () from /usr/lib/libkse.so.3

Also, I've got the following after having the program run for a while 
(about 30 minutes):

[sobomax <at> noisy /tmp]$ ./a.out
Fatal error 'thread in syncq when it shouldn't be.' at line 1817 in file /usr/src/lib/libkse/thread/thr_mutex.c (errno = 0)
Fatal error 'thread in syncq when it shouldn't be.' at line 1817 in file /usr/src/lib/libkse/thread/thr_mutex.c (errno = 0)
Fatal error 'Recurse on a private mutex.' at line 1002 in file /usr/src/lib/libkse/thread/thr_mutex.c (errno = 22)

Julian Elischer | 1 Feb 04:24 2008

Re: needs a tester with an SMP 7.0 box

Xin LI wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Julian Elischer wrote:
>>> I'm not sure why, but a fresh RELENG_7_0 with your patch, plus DDB
>>> enabled does not trigger the problem :-/
>>>
>>  what happens without the patch?
> 
> No, I mean that it used to crash with the previous kernel, which was built
> only a couple of days ago, no more than a week.  Do you want me to test the
> case without the patch?
> 

We need to determine:
1/ whether there is a problem in 7.0 in the first place. If not, we
are done.

2/ whether the patch fixes it if there is a problem.

> Cheers,
> - --
> Xin LI <delphij <at> delphij.net>	http://www.delphij.net/
> FreeBSD - The Power to Serve!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.4 (FreeBSD)
> 
> iD8DBQFHooAji+vbBBjt66ARAlcdAJ0Q3IRPkjZGvUsKIDbkVamT9Q9v+wCfRsiG
> STfOJ310j+cEuBFMLamfVA4=