Racket's worst-case GC latencies
Racket has relatively poor worst-case GC pause times on a specific
1) Is this expected? What is the GC algorithm for Racket's old generation?
2) Can you make it better?
James Fisher has a blog post on a case where GHC's runtime system
imposed unpleasant latencies/pauses on their Haskell program:
The blog post proposes a very simple, synthetic benchmark that exhibit
the issue -- namely, if the old generation of the GC uses a copying
strategy, then those copies can incur long pauses when many large
objects are live in the old general.
I ported this synthetic benchmark to OCaml, and could check that the
OCaml GC suffers from no such issue, as its old generation uses
a mark&sweep strategy that does not copy old memory. The Haskell
benchmark has worst-case latencies around 50ms, which James Fisher
finds excessive. The OCaml benchmark has worst-case latencies around
Max New did a port of the benchmark to Racket, which I later modified;
the results I see on my machine are relatively bad: the worst-case
pause time is between 120ms and 220ms on my tests.
I think that the results are mostly unrelated to the specific edge
case that this benchmark was designed to exercize (copies of large
objects in the old generation): if I change the inserted strings to be
of size 1 instead of 1024, I also observe fairly high latencies --
such as 120ms. So I'm mostly observing high latencies by inserting and
removing things from an immutable hash in a loop.
The benchmark fills an (immutable) associative structure with strings
of length 1024 (the idea is to have relatively high memory usage per
pointer, to see large copy times), keeping at most 200,000 strings in
the working set. In total, it inserts 1,000,000 strings (and thus
removes 800,000, one after each insertion after the first 200,000). We
measure latencies rather than throughput, so the performance details
of the associative map structure do not matter.
My benchmark code in Haskell, OCaml and Racket can be found here:
the Makefile contains my scripts to compile, run and analyze each
To run the Racket benchmark with instrumentation
PLTSTDERR=debug <at> GC racket main.rkt 2> racket-gc-log
To extract the pause times from the resulting log file (in the format
produced by Racket 6.5), I do:
cat racket-gc-log | grep -v total | cut -d' ' -f7 | sort -n
Piping `| uniq --count` after that produces a poor-man histogram of
latencies. I get the following result on my machine:
## Non-incremental vs. incremental GC
We experimented with PLT_INCREMENTAL_GC=1; on my machine, this does
not decrease the worst-case pause time; on Asumu Takikawa's beefier
machine, I think the pause times decreased a bit -- but still well
above 50ms. Because the incremental GC consumes sensibly more memory,
I am unable to test with both PLT_INCREMENTAL_GC and
PLTSTDERR=debug <at> gc enabled -- my system runs out of memory.
If I reduce the benchmark sizes in half (half the working size, half
the number of iteration), I can run the incremental GC with debugging
enabled. On this half instance, I observe the following results:
for the *non-incremental* GC:
for the *incremental* GC
As you can see, the incremental GC helps, as the distribution of
pauses moves to shorter pause times: it does more, shorter,
pauses. However, the worst-case pauses do not improve -- in fact they
are even a bit worse.
You received this message because you are subscribed to the Google Groups "Racket Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-dev+unsubscribe@...
To post to this group, send email to racket-dev@...
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-dev/CAPFanBE6%2BMVxG44nHnw_HCsaNwZ-HSp4L7%2BVT-v-QJ%3Div-EK1g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.