RE: Re[2]: Back to "GC Stack problem on Win32"
Ignore the fact that this doesn't update last_stack_min correctly.
I'll fix that part of the logic.
Hans
On Fri, 5 Dec 2008, Boehm, Hans wrote:
> How about something like this at the end of GC_push_stack_for?
>
> This will still repeatedly grow the stack for something like your example. But, as in the last version, I
think the total number of VirtualQuery calls is bounded by the final stack depth, plus one for each GC.
>
> Unlike my last version, this should typically avoid the GC_get_stack_min calls in your example.
>
> I'm not sure I want to introduce very ugly code to handle something like this example a bit better. The
collector already does work in proportion to the root size. And this seems to be another example where a
huge root size causes it to slow down.
>
> (Warning: completely untested:)
>
> /* Set stack_min to the lowest address in the thread stack, */
> /* or to an address in the thread stack no larger than sp, */
> /* taking advantage of the old value to avoid slow traversals */
> /* of large stacks. */
> if (thread -> last_stack_min == ADDR_LIMIT) {
> stack_min = GC_get_stack_min(thread -> stack_base);
> } else {
> if (sp < thread -> stack_base && sp >= thread -> last_stack_min) {
> stack_min = sp;
> } else {
> # ifdef MSWINCE
> stack_min = GC_get_stack_min(thread -> stack_base);
> # else
> if (GC_may_be_in_stack(thread -> last_stack_min)) {
> stack_min = GC_get_stack_min(thread -> last_stack_min);
> } else {
> /* Stack shrunk? Is this possible? */
> stack_min = GC_get_stack_min(thread -> stack_base);
> }
> # endif
> }
> }
> thread -> last_stack_min = stack_min;
>
> if (sp >= stack_min && sp < thread->stack_base) {
> # ifdef DEBUG_THREADS
> GC_printf("Pushing thread from %p to %p for 0x%x from 0x%x\n",
> sp, thread -> stack_base, (int)thread -> id, (int)me);
> # endif
> GC_push_all_stack(sp, thread->stack_base);
> } else {
> /* If not current thread then it is possible for sp to point to */
> /* the guarded (untouched yet) page just below the current */
> /* stack_min of the thread. */
> if (thread -> id == me || sp >= thread->stack_base
> || sp + GC_page_size < stack_min)
> WARN("Thread stack pointer 0x%lx out of range, pushing everything\n",
> (unsigned long)(size_t)sp);
> GC_push_all_stack(stack_min, thread->stack_base);
> }
> } /* thread looks live */
>
> Hans
>
>> -----Original Message-----
>> From: gc-bounces@...
>> [mailto:gc-bounces@...] On Behalf Of Ivan Maidanski
>> Sent: Thursday, November 13, 2008 3:08 PM
>> To: gc@...
>> Subject: Re[2]: [Gc] Back to "GC Stack problem on Win32"
>>
>> Hi!
>>
>> "Boehm, Hans" <hans.boehm@...> wrote:
>>> Ivan -
>>>
>>> Thanks. However, I don't think I understand the problem
>> here correctly. The old code should in nearly all cases only
>> invoke GC_may_be_in_stack(thread -> last_stack_min). This
>> should be cheap, since it only walks a page or so of the stack, right?
>>
>> Right, but only if the stack hasn't grown too much between
>> collections.
>> By default, Win32 apps has StackCommit==4K. So, if the stack
>> grew by 4MB then VirtualQuery() would be called 1000 times
>> during nearest collection. But for GC_push_stack_for() only
>> one VirtualQuery() is really required - just to check sp value.
>>
>>>
>>> It seems like it would usually result in exactly one extra
>> VirtualQuery call, as it walks off the end of the stack.
>> Since GC_may_be_in_stack() makes one call anyway, it seems to
>> me that it should at most double the time spent there.
>>
>> Try this test app:
>>
>> #include "gc.h"
>>
>> void GC_printf();
>>
>> int f(int n) {
>> return n > 0 ? f(n - 1) + 1 : 0;
>> }
>>
>> void test(char c, int n) {
>> n = f(n);
>> GC_printf("\n Test%c: N= %d\n\n", c, n); GC_gcollect(); }
>>
>> void *obj;
>>
>> int main(void) {
>> int n;
>> int max = 9 * 1000 * 1000;
>> obj = GC_MALLOC(16);
>> for (n = 100 * 1000; n <= max; n += n >> 1) {
>> test('A',n);
>> test('B',n);
>> }
>> GC_printf("Done");
>> return 0;
>> } // end
>>
>> Compile it with -fno-optimize-sibling-calls -Xlinker --stack
>> -Xlinker 0x10000000 Set GC_PRINT_STATS=1 to see the
>> world-stopped delays timing.
>>
>> If You use Your code or just comment out one line "if (sp <
>> stack_min || sp >= thread->stack_base)" in
>> GC_push_stack_for() then you see how much time is required to
>> collect just a few bytes.
>>
>>>
>>> I don't like the reference to last_info in the patch, since
>> that relies on side effects of GC_may_be_in_stack that I
>> would like to keep as a private implementation detail of
>> GC_may_be_in_stack and GC_get_stack_min. Is there a reason
>> not to use thread -> last_stack_min instead of last_info.BaseAddress?
>>
>> I don't like it too. But to say the truth, "caching" here
>> works realy only to pass BaseAddress from
>> GC_may_be_in_stack() to GC_get_stack_min() even in case of
>> one thread. Another solution (without "caching") is to make
>> GC_may_be_in_stack() return BaseAddress or NULL (instead of
>> bool) - simple to change (but the func should be renamed, may be).
>>
>>>
>>> Hans
>>>
>>>> ...
>>>> I've already pointed out some bugs.
>>>> Now I've just run the same test (having Your patch
>> applied) as I had
>>>> run for my patch... And it turns out that Yours doesn't solve the
>>>> problem as it was originally stated (to say more precisely, it
>>>> doesn't reduce time for the first collection after stack growth).
>>>>
>>>> Look into the things You had advised me before:
>>>>> 1) Initially call VirtualQuery on the sp. If the stack
>>>> base is in the same region, we know we're OK, and don't need
>>>> GC_get_stack_min. Hopefully this will be true about 100%
>> of the time.
>>>>
>>>> So I did it for Your code now. It works.
>>>>
>>>> The patch is attached.
>>>>
>>>> Bye.
>>
>> Bye.
>>
>> _______________________________________________
>> Gc mailing list
>> Gc@...
>> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
>>
>
> _______________________________________________
> Gc mailing list
> Gc@...
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
>