Glenn Tarcea | 1 Mar 03:43 2010

Review: Change for handling invalid threads after fork in MacOSX

Hi,

Could someone review this change and let me know if it would potentially be acceptable?

The problem this solves: after a fork(), the threads of the parent no longer exist in the child.
However, perform_thread_post_mortem() will still attempt to access these invalid threads. On MacOSX
this leads to a number of test failures because gc_assert() aborts when pthread_join() returns a
non-zero value (which it does when given an invalid thread).
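
To illustrate the failure mode described above, here is a minimal C sketch of one way to avoid handing a parent-only thread to pthread_join() in a forked child. It is not the contents of thread.c.patch, which is not reproduced in this thread. The names `struct worker`, `worker_start()` and `worker_reap()` are hypothetical, and recording the creating process ID is an assumed technique, not something SBCL is known to do. One relevant fact: POSIX leaves the behavior of pthread_join() undefined when the handle does not refer to a joinable thread, so platforms are free to differ here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping record; this is not SBCL's struct thread. */
struct worker {
    pthread_t handle;     /* meaningful only in the process that created it */
    pid_t creator_pid;    /* process that called pthread_create() */
};

/* Trivial thread body so the sketch is self-contained. */
static void *worker_main(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start a thread and remember which process owns it.
 * Returns 0 on success or the pthread_create() error number. */
int worker_start(struct worker *w)
{
    assert(w != NULL);
    w->creator_pid = getpid();
    return pthread_create(&w->handle, NULL, worker_main, NULL);
}

/* Reap a thread without aborting on failure.
 * Returns 1 if joined, 0 if skipped because this process did not create
 * the thread (for example, a forked child), -1 if pthread_join() failed. */
int worker_reap(struct worker *w)
{
    assert(w != NULL);
    if (w->creator_pid != getpid())
        return 0;   /* stale handle: never pass it to pthread_join() */
    return pthread_join(w->handle, NULL) == 0 ? 1 : -1;
}
```

In a forked child, worker_reap() returns 0 and the stale handle is never dereferenced; the caller decides how to treat a -1 instead of asserting.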

Note: I'm not submitting this yet as an official patch request for the following reasons:

1. I don't understand why the current code works on Linux systems and fails on MacOSX. Since I didn't have a
Linux system handy this weekend I couldn't investigate that aspect.
2. The good news: This change causes tests that used to fail on MacOSX to succeed. The bad news: This is
turning up other bugs, and in particular threads.impure.lisp has a section of code that hangs. I haven't
tracked down why this is happening (but I can easily reproduce it on my system) but since this bug has
probably existed in the MacOSX port for a while I'd prefer to fix it as well before suggesting anything for
potential inclusion.

For this change I just want to make sure it seems reasonable. I haven't submitted any patches to SBCL before
and want to make sure I am following any needed protocols. I can provide more details if needed.

Thanks,

Glenn
Attachment (thread.c.patch): application/octet-stream, 2019 bytes

V. Glenn Tarcea

Faré | 1 Mar 04:05 2010

Re: Review: Change for handling invalid threads after fork in MacOSX

Dear Glenn,

it would be very nice if you could get thread cleanup after fork to
work, but before you try it, you might want to read my post:
Of threads, forks and PCLSRing in high-level languages
http://fare.livejournal.com/148185.html

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
Sir, I don't care which utopia you want to live in. I care for the force
you propose to use to prevent other people from living their own lives.

On 28 February 2010 21:43, Glenn Tarcea <gtarcea <at> umich.edu> wrote:
> Hi,
>
> Could someone review this change and let me know if it would potentially be acceptable?
>
> The problem this solves is when a fork() is done the threads in the parent no longer exist in the child.
> However perform_thread_post_mortem() will still attempt to access these invalid threads. In MacOSX
> this leads to a number of test failures because gc_assert() is doing an abort when pthread_join() returns a
> non-zero value (which it does when given an invalid thread).
>
> Note: I'm not submitting this yet as an official patch request for the following reasons:
>
> 1. I don't understand why the current code works on Linux systems and fails on MacOSX. Since I didn't have a
> Linux system handy this weekend I couldn't investigate that aspect.
> 2. The good news: This change causes tests that used to fail on MacOSX to succeed. The bad news: This is
> turning up other bugs, and in particular threads.impure.lisp has a section of code that hangs. I haven't
> tracked down why this is happening (but I can easily reproduce it on my system) but since this bug has
> probably existed in the MacOSX port for a while I'd prefer to fix it as well before suggesting anything for
> potential inclusion.

Glenn Tarcea | 1 Mar 04:15 2010

Re: Review: Change for handling invalid threads after fork in MacOSX

Hi Faré,

Thanks for the pointer. I recall that there had been some discussion of this in the past.
I'll take a look at what you posted.

Thanks,

Glenn

V. Glenn Tarcea
gtarcea <at> umich.edu

On Feb 28, 2010, at 10:05 PM, Faré wrote:

> Dear Glenn,
> 
> it would be very nice if you could get thread cleanup after fork to
> work, but before you try it, you might want to read my post:
> Of threads, forks and PCLSRing in high-level languages
> http://fare.livejournal.com/148185.html
> 
> [ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
> Sir, I don't care which utopia you want to live in. I care for the force
> you propose to use to prevent other people from living their own lives.
> 
> 
> On 28 February 2010 21:43, Glenn Tarcea <gtarcea <at> umich.edu> wrote:
>> Hi,
>> 
>> Could someone review this change and let me know if it would potentially be acceptable?

Nikodemus Siivola | 1 Mar 15:05 2010

Re: [Sbcl-commits] CVS: sbcl NEWS, 1.1693, 1.1694 base-target-features.lisp-expr, 1.53, 1.54 make-config.sh, 1.95, 1.96 version.lisp-expr, 1.4760, 1.4761

On 1 March 2010 15:09, Alastair Bridgewater
<lisphacker <at> users.sourceforge.net> wrote:
> Update of /cvsroot/sbcl/sbcl
> In directory sfp-cvsdas-3.v30.ch3.sourceforge.com:/tmp/cvs-serv20432
>
> Modified Files:
>        NEWS base-target-features.lisp-expr make-config.sh
>        version.lisp-expr
> Log Message:
> 1.0.36.9: UD2-BREAKPOINTS feature for x86oid systems
>
>  * Add new feature UD2-BREAKPOINTS, enabled by default only on x86oid
> darwin targets.
>
>  * Use said feature instead of DARWIN for breakpoint trap selection.
>
>  * Make breakpoints work when using UD2-BREAKPOINTS (tested on x86 and
> x86-64 linux).
>
>  * This patch brought to you by lp#309067, which remains valid for
> three reasons: First, the test case is still disabled.  Second, this
> only fixes for x86oids, not for PPC.  And third, I didn't actually test
> this on a darwin system.

...and it doesn't quite work on Darwin -- something funny happens.
Below run under GDB:

(gdb) run --core output/sbcl.core --no-userinit
Starting program: /Users/nikodemus/src/sbcl-cvs/src/runtime/sbcl
--core output/sbcl.core --no-userinit
[...]

Nikodemus Siivola | 1 Mar 15:16 2010

Re: [Sbcl-commits] CVS: sbcl NEWS, 1.1693, 1.1694 base-target-features.lisp-expr, 1.53, 1.54 make-config.sh, 1.95, 1.96 version.lisp-expr, 1.4760, 1.4761

On 1 March 2010 16:05, Nikodemus Siivola <nikodemus <at> random-state.net> wrote:

> ...which leaves me a bit baffled. The function start breakpoint was
> handled fine, but somehow we get SIGTRAP from CLC? Uh, what?

Oh! I know! We set the single-stepping mode to run the displaced
instruction -- and single-stepping uses SIGTRAP.

So we need to either handle both SIGTRAP and SIGILL, or do manual
single-stepping. (I vote for the first...)
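
The first option can be sketched in C as one handler installed for both signals. This is only an illustration under stated assumptions, not SBCL's runtime code: `on_trap()`, `install_trap_handlers()` and `check_trap_handlers()` are hypothetical names, and a real handler would inspect the saved context to tell a UD2 breakpoint from a single-step trap, which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Last signal seen by the shared handler (hypothetical bookkeeping). */
static volatile sig_atomic_t last_trap_signal = 0;

/* One handler for both signals; it only records which one arrived. */
static void on_trap(int signo, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    last_trap_signal = signo;
}

/* Install on_trap() for SIGTRAP and SIGILL.
 * Returns 0 on success, -1 if sigaction() fails. */
int install_trap_handlers(void)
{
    static const int signals[] = { SIGTRAP, SIGILL };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_trap;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}

/* Deliver each signal with raise() and check that the handler saw it. */
void check_trap_handlers(void)
{
    int rc = install_trap_handlers();
    assert(rc == 0);
    raise(SIGTRAP);
    assert(last_trap_signal == SIGTRAP);
    raise(SIGILL);
    assert(last_trap_signal == SIGILL);
}
```

Note that raise() only simulates delivery; a real UD2 fault re-executes the faulting instruction unless the handler adjusts the program counter.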

Cheers,

 -- Nikodemus

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev

Erik Winkels | 1 Mar 14:02 2010

Re: "--script" switch caveats?

Hi Nikodemus,

On Fri, Feb 26, 2010 at 03:52:42PM +0200, Nikodemus Siivola wrote:
> 
> If fast startup is important, I would use an executable core. (No idea
> how feasible this is given the setup, though.)

I eventually went with that option since the AI Challenge folks were
using a pretty old SBCL version (1.0.19) which didn't have support for
the "--script" switch.

Glenn Tarcea | 3 Mar 03:16 2010

Another MacOSX Patch

I'm getting further through the tests. This patch resolves a bug with nanosleep on MacOSX. The bug
manifests itself if you call sleep many times: nanosleep eventually reports a remaining time of over
4 billion seconds. Ugh. I finally looked at what Clozure CL does when it sleeps, and they were
apparently seeing the same behavior I was.
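
As an illustration of the kind of guard this implies, here is a minimal C sketch that resumes nanosleep() after an interruption but refuses to trust a remainder larger than the request. It is not the contents of unix.lisp.patch (which changes Lisp code and is not reproduced in this thread); `remainder_is_sane()` and `sleep_fully()` are hypothetical names. A value just over 4 billion is what a small negative number looks like when read as an unsigned 32-bit integer, though nothing in this thread confirms that as the cause.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Return 1 if `rem` is a plausible remainder of the request `req`:
 * well-formed and no larger than what was asked for. */
int remainder_is_sane(const struct timespec *req, const struct timespec *rem)
{
    assert(req != NULL && rem != NULL);
    if (rem->tv_sec < 0 || rem->tv_nsec < 0 || rem->tv_nsec >= 1000000000L)
        return 0;
    if (rem->tv_sec > req->tv_sec)
        return 0;
    if (rem->tv_sec == req->tv_sec && rem->tv_nsec > req->tv_nsec)
        return 0;
    return 1;
}

/* Sleep for the requested time, resuming after signal interruptions.
 * A bogus remainder is treated as "done" rather than slept on.
 * Returns 0 on completion, -1 on a nanosleep() error other than EINTR. */
int sleep_fully(time_t sec, long nsec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem = { .tv_sec = 0, .tv_nsec = 0 };

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        if (!remainder_is_sane(&req, &rem))
            return 0;   /* do not sleep for billions of seconds */
        req = rem;
    }
    return 0;
}
```

Giving up on a bogus remainder can make a sleep return early; that seems preferable to a sleep that effectively never returns, but it is a design choice rather than the only option.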

With this patch and the earlier one (thread.c.patch) applied, I'm getting much further along in the tests.
The threads.impure.lisp test has been the stumbling block so far and remains so. I'll send another patch
out once I get past this next issue.

Just to make things easy I'm going to include both fixes I've applied on my local system.

thread.c.patch - This addresses the issue with pthread_join() being called in a child lisp process on a
  non-existent thread.
unix.lisp.patch - This fixes the nanosleep issue described above on a 64-bit Mac running Snow Leopard
  (Clozure CL indicates Leopard is affected as well).

Faré: I read through your notes on forking. What a mess - I don't have many more thoughts for the moment.
Right now I'm just concentrating on getting SBCL on a Mac to pass all the tests!

Attachment (unix.lisp.patch): application/octet-stream, 1474 bytes
Attachment (thread.c.patch): application/octet-stream, 2019 bytes

Thanks,

Glenn

V. Glenn Tarcea
gtarcea <at> umich.edu

Glenn Tarcea | 4 Mar 04:51 2010

Question on Futex Comment (Another MacOSX related item)

In target-thread.lisp there is the following comment (in get-mutex):

    ;; FIXME: Lutexes do not currently support deadlines, as at least
    ;; on Darwin pthread_foo_timedbar functions are not supported:
    ;; this means that we probably need to use the Carbon multiprocessing
    ;; functions on Darwin.
    ;;
    ;; FIXME: This is definitely not interrupt safe: what happens if
    ;; we get hit (1) during the lutex calls (ok, they may be safe,
    ;; but has that been checked?) (2) after the lutex call, but
    ;; before setting the mutex owner.
    #!+sb-lutex

The bit of code related to this comment is causing a hang in threads.impure.lisp where deadlines are being
used. My question: does anyone remember what the particular issue was? On Snow Leopard,
pthread_cond_timedwait() (the only pthread_foo_timedbar function I could find that
seemed related to the comment) appears to be supported.
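
For reference, a lock with a deadline can be emulated with pthread_cond_timedwait() alone. The following is a minimal sketch under that assumption, not the code in pthread-lutex.c or pthread-futex.c; `struct dl_lock` and the `dl_lock_*` functions are hypothetical names. It uses gettimeofday() to build the absolute deadline, since pthread_cond_timedwait() takes an absolute time rather than a duration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical lock that supports a timeout on acquisition. */
struct dl_lock {
    pthread_mutex_t guard;     /* protects `held` */
    pthread_cond_t released;   /* signalled when the lock is released */
    int held;                  /* 1 while some caller owns the lock */
};

void dl_lock_init(struct dl_lock *l)
{
    int rc;
    assert(l != NULL);
    rc = pthread_mutex_init(&l->guard, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&l->released, NULL);
    assert(rc == 0);
    l->held = 0;
}

/* Try to take the lock, waiting at most timeout_ms milliseconds.
 * Returns 1 if acquired, 0 if the deadline passed (or the wait failed). */
int dl_lock_acquire(struct dl_lock *l, long timeout_ms)
{
    struct timeval now;
    struct timespec deadline;
    int rc = 0;
    int acquired = 0;

    assert(l != NULL && timeout_ms >= 0);

    /* Convert "now + timeout" into the absolute time the API expects. */
    gettimeofday(&now, NULL);
    deadline.tv_sec = now.tv_sec + timeout_ms / 1000;
    deadline.tv_nsec = (long)now.tv_usec * 1000L + (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&l->guard);
    /* Loop because wakeups can be spurious; stop on timeout or error. */
    while (l->held && rc == 0)
        rc = pthread_cond_timedwait(&l->released, &l->guard, &deadline);
    if (!l->held) {
        l->held = 1;
        acquired = 1;
    }
    pthread_mutex_unlock(&l->guard);
    return acquired;
}

void dl_lock_release(struct dl_lock *l)
{
    assert(l != NULL);
    pthread_mutex_lock(&l->guard);
    l->held = 0;
    pthread_cond_signal(&l->released);
    pthread_mutex_unlock(&l->guard);
}
```

With this shape, a waiter that misses its deadline gets a 0 back instead of blocking forever, which is the behavior the hanging deadline tests would need. Whether it addresses the interrupt-safety concern in the second FIXME is a separate question.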

I tried using pthread-futex.c rather than pthread-lutex.c (disabled sb-lutex, enabled
sb-pthread-futex). That seems to work for some of the deadline tests (they were hanging but started
passing with that change), but other tests that were passing now hang when I make the switch. Also,
pthread-futex.c didn't compile at first, so it doesn't seem like it's really being used, which makes me
wonder if there are other known issues with it that caused it to be deprecated.

Any insight greatly appreciated!

Thanks,

Glenn


Tobias C. Rittweiler | 4 Mar 09:37 2010

Re: Question on Futex Comment (Another MacOSX related item)

Glenn Tarcea <gtarcea <at> umich.edu> writes:

> The bit of code related to this comment is causing a hang in
> threads.impure.lisp where deadlines are being used.

Slightly related question:

Why doesn't the WITH-TEST macro expand to a WITH-TIMEOUT, so that instead
of hanging we would just get test failures? (Perhaps it should rather
expand to both WITH-TIMEOUT for a hard limit, and WITH-DEADLINE for a
soft limit.)

  -T.

Tobias C. Rittweiler | 4 Mar 10:53 2010

internal compiler error


(in-package :sb-thread)
(defstruct foo data)
(define-structure-slot-addressor foo-data-address
    :structure foo
    :slot data)

(defvar *foo* (make-foo :data 0))
(defun test ()
  (futex-wait (foo-data-address *foo*)
              (get-lisp-obj-address -1)
              0
              0))

#<SB-C:TN t1> is not valid as the first argument to VOP:
  SB-VM::MOVE-WORD-ARG
Primitive type: T
SC restrictions:
  (SB-VM::SIGNED-REG SB-VM::UNSIGNED-REG)
The primitive type disallows these loadable SCs:
  (SB-VM::SIGNED-REG SB-VM::UNSIGNED-REG)

   [Condition of type SIMPLE-ERROR]

Restarts:
 0: [ABORT] Abort compilation.
 1: [ABORT] Return to SLIME's top level.
 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "worker" RUNNING {C488A99}>)

Backtrace:
[...]

