Re: Changeset 0306c5a64775
Robert Ransom <rransom.8774 <at> gmail.com>
2012-01-07 19:05:39 GMT
On 2012-01-06, Michael Sperber <sperber <at> deinprogramm.de> wrote:
>
> Thanks for looking into this!
>
> Robert Ransom <rransom.8774 <at> gmail.com> writes:
>
>> See attached for a bundle which fixes wait-for-child-process. I'm no
>> longer convinced that changeset 0306c5a64775 was wrong, but I think my
>> bundle uses a cleaner approach, and it works with the current
>> external-event and interrupt systems.
>
> I'm somewhat suspicious of the changes in the bundle, as it's not clear
> to me why deadlock should get erroneously signalled before the changes:
> After all, there is a special provision to *not* signal deadlock in the
> root scheduler - it calls `waiting-for-external-events?', and if that
> returns #t, no deadlock is assumed. (And I still think the right fix is
> to handle wait the same as getaddrinfo.)
>
> You've looked at the code in depth - did you consider this?

Before Roderic's patch, waiting-for-external-events? didn't know that
threads which were waiting on the process-id's placeholder were
waiting for an external event, just as it still doesn't know that
threads which are waiting on a signal queue are waiting for an
external event. My branch provides a way to inform
waiting-for-external-events? that threads blocked on a given
synchronization object will be alerted when an external event of some
sort occurs; this should be used for signal queues, too.
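
To make the proposal concrete, here is a minimal Python model of the root
scheduler's deadlock check. It assumes a registry of synchronization objects
that some external event will eventually wake. `SyncObject`, `Scheduler`,
`mark_external` and `pending_event_uids` are hypothetical names, not Scheme48's
API. Only the idea of extending waiting-for-external-events? comes from the
message.

```python
class SyncObject:
    """Something a thread can block on (placeholder, signal queue, ...)."""
    def __init__(self, name):
        self.name = name
        self.waiters = []  # threads currently blocked on this object


class Scheduler:
    """Hypothetical model of the root scheduler's deadlock check."""
    def __init__(self):
        # External-event UIDs the VM has been told to expect.
        self.pending_event_uids = set()
        # Objects whose waiters will be woken by some external event.
        self.external_objects = set()

    def mark_external(self, obj):
        # The proposed addition: declare that threads blocked on obj
        # are really waiting for an external event.
        self.external_objects.add(obj)

    def waiting_for_external_events(self):
        if self.pending_event_uids:
            return True
        return any(obj.waiters for obj in self.external_objects)

    def deadlocked(self, runnable_threads):
        # Deadlock only if nothing can run and nothing outside will wake us.
        return not runnable_threads and not self.waiting_for_external_events()


if __name__ == "__main__":
    sched = Scheduler()
    pid_placeholder = SyncObject("pid-placeholder")
    sched.mark_external(pid_placeholder)
    pid_placeholder.waiters.append("thread-1")
    print(sched.deadlocked([]))  # the blocked thread is not a deadlock
```

The same `mark_external` call would then cover signal queues, with no
per-event bookkeeping needed.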
Regarding getaddrinfo, I think the Right Thing is to make the Scheme
interface to getaddrinfo fill in a placeholder, rather than using the
external event system directly as it does now. Ideally, the code that
delivers asynchronous results from external operations would be written
in such a way that both
getaddrinfo and wait-for-child-process could use it.
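
A rough sketch of such shared asynchronous-result code follows. It is written
in Python rather than Scheme, with OS threads standing in for the VM's
external-event machinery. A blocking call runs elsewhere and fills a
write-once placeholder that any number of threads can wait on. `Placeholder`
and `start_async` are hypothetical names. Scheme48's real placeholders and
external-event interface differ.

```python
import socket
import subprocess
import sys
import threading


class Placeholder:
    """Write-once cell; readers block until a value (or error) is stored."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None
        self._error = None

    def set(self, value=None, error=None):
        if self._ready.is_set():
            raise RuntimeError("placeholder already set")
        self._value, self._error = value, error
        self._ready.set()

    def get(self, timeout=None):
        if not self._ready.wait(timeout):
            raise TimeoutError("placeholder not filled in time")
        if self._error is not None:
            raise self._error
        return self._value


def start_async(blocking_call, *args):
    """Run blocking_call(*args) on a worker thread and return a placeholder
    that is filled with its result, or its exception, on completion."""
    ph = Placeholder()

    def worker():
        try:
            result = blocking_call(*args)
        except Exception as exc:
            ph.set(error=exc)
        else:
            ph.set(value=result)

    threading.Thread(target=worker, daemon=True).start()
    return ph


if __name__ == "__main__":
    # getaddrinfo-style request: a numeric host, so no DNS lookup is needed.
    addr = start_async(socket.getaddrinfo, "127.0.0.1", 80,
                       socket.AF_INET, socket.SOCK_STREAM)
    # wait-for-child-process-style request through the same mechanism.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    status = start_async(child.wait)
    print(addr.get(timeout=5)[0][4], status.get(timeout=5))
```

Both kinds of request share one completion path, which is the point of the
suggestion.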
>> When I wrote that, I suspected that part of the problem with
>> wait-for-child-process was that it dynamically allocated a new
>> external-event UID for each event it wants to wait for, rather than
>> using one external-event UID for a whole class of events to be waited
>> for. The PreScheme code to handle external events seems to me to be
>> designed for the latter case. [...]
>
>> getaddrinfo also uses a new UID for each event, rather than one UID
>> for the whole class of getaddrinfo-completion events; I still suspect
>> that that is why multiple simultaneous calls to wait-for-child-process
>> broke a later call to getaddrinfo.
>
> It was designed for both cases, but in fact the getaddrinfo code was the
> original motivation for implementing it. It's needed there because
> there may be multiple simultaneous active calls to getaddrinfo from
> different threads, and they all need to be notified separately of
> completion.
>
> So, if that doesn't work correctly, there's a bug.

There probably is a bug. It looks like the same bug in how interrupts
are handled (when multiple interrupts of the same type arrive at about
the same time, some seem to get 'lost' or delayed) that causes my test
case for the POSIX signals package to fail. (That test case has been
disabled since I wrote it because I couldn't figure out why it wasn't
working or how to fix it.)
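
The following Python sketch shows one hypothetical way that events of the same
type could go missing. It is an illustration only, not a diagnosis of the
actual VM code. Suppose the VM keeps only one pending slot per interrupt type.
A second completion that arrives before the handler runs then overwrites the
first. A queue, by contrast, keeps one entry per external-event UID.
`FlagInterrupts` and `QueuedInterrupts` are invented names.

```python
from collections import deque


class FlagInterrupts:
    """One pending slot per interrupt type (hypothetical model)."""
    def __init__(self):
        self.pending = {}  # interrupt type -> most recent event UID

    def raise_interrupt(self, kind, uid):
        # A second arrival of the same type before drain() overwrites the first.
        self.pending[kind] = uid

    def drain(self):
        events, self.pending = list(self.pending.items()), {}
        return events


class QueuedInterrupts:
    """One queue entry per arrival, so every event UID survives."""
    def __init__(self):
        self.queue = deque()

    def raise_interrupt(self, kind, uid):
        self.queue.append((kind, uid))

    def drain(self):
        events = list(self.queue)
        self.queue.clear()
        return events


if __name__ == "__main__":
    flag, queued = FlagInterrupts(), QueuedInterrupts()
    # Two external events (e.g. two child exits) arrive back to back.
    for uid in (7, 8):
        flag.raise_interrupt("external-event", uid)
        queued.raise_interrupt("external-event", uid)
    print(flag.drain())
    print(queued.drain())
```

In the flag model, the thread waiting on UID 7 is never woken. In the queue
model, both waiters are.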
Someone may have to put 'bread crumbs' into the VM and RTS in order to
figure out exactly what is going wrong.

Robert Ransom