Alexis King | 17 Jun 19:29 2016

Using private github repositories as package sources?

I have been looking for a way to use Racket at work, and we’ve found
a couple of places where it might be useful for documentation or
tooling. As part of this, it would be very nice to keep our source
code private, but it would still be helpful to use the package
manager to handle dependency resolution. We explored creating
a custom catalog to contain our packages, which has worked
reasonably well, but the Racket package system does not appear to
be capable of fetching packages backed by private repositories.

After glancing over the git protocol documentation and interacting
with a private repository via an HTTP client, the technical side of
implementing this doesn’t look too difficult. GitHub uses the “smart”
HTTP protocol as documented here[1], and authorization is done using
HTTP Basic Authentication. Implementing this without libgit2 or the
git CLI might be a little difficult, but distributing libgit2 would
not be hard if that turned out to be a problem.
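
To illustrate, here is a minimal sketch of the first step of that
protocol in Racket: fetching the ref advertisement of a private
repository over HTTPS with Basic Authentication. The user, token, and
repository below are placeholders, and none of this is existing
package-manager code:

    #lang racket/base
    (require net/url net/base64 racket/port)

    ;; Placeholders: substitute a real user, token, and repository.
    (define user "someuser")
    (define token "personal-access-token")

    ;; The "smart" HTTP protocol begins with a GET of the ref
    ;; advertisement for the git-upload-pack service.
    (define refs-url
      (string->url
       (string-append "https://github.com/someuser/private-repo.git"
                      "/info/refs?service=git-upload-pack")))

    ;; HTTP Basic Authentication: base64-encode "user:token".
    (define auth-header
      (string-append
       "Authorization: Basic "
       (bytes->string/utf-8
        (base64-encode (string->bytes/utf-8 (string-append user ":" token))
                       #""))))

    (define in (get-pure-port refs-url (list auth-header)))
    (display (port->string in)) ; a pkt-line-framed list of refs
    (close-input-port in)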

The trickier issue is the social side, as well as the user interface.
How would git credentials be provided to the package manager so
that it could actually access these packages? More importantly, is
it actually okay for the success of package installation to be
dependent on some configuration that lives on a user’s local machine?
If these sorts of packages were uploaded to the main package catalog,
what would be the policy for handling them?

Ultimately, I think the package manager needs to support private
package distribution for me to comfortably adopt Racket in a
corporate setting, so it would be nice to have some solution to
this problem that does not involve out-of-band trickery. I’m just
not sure what that solution might
(Continue reading)

Ryan Culpepper | 7 Jun 05:54 2016

macro profiler

I recently added a macro profiler to the macro-debugger package. The
macro profiler measures the increase in code size due to macro
expansion. (It doesn't measure expansion time.) See the docs for an
explanation of how the profiler counts direct and indirect costs.

The profiler is a raco subcommand; run it on a module with

   raco macro-profiler path/to/module.rkt

The macro profiler shows which macros contribute most to the size
of the expanded program. The basic principles of profiling apply:
improvements have the greatest impact for high-cost macros; on the
other hand, high cost does not necessarily imply waste.

Reducing a macro's generated code size generally involves extracting
parts of the macro template as helper functions. For example, the
following macro definition

   (define-syntax-rule (capture-output body ...)
     (let ([sp (open-output-string)])
       (parameterize ((current-output-port sp))
         body ...
         (get-output-string sp))))

can be rewritten as

   (define (call/capture-output proc)
     (let ([sp (open-output-string)])
       (parameterize ((current-output-port sp))
         (proc)
         (get-output-string sp))))

   (define-syntax-rule (capture-output body ...)
     (call/capture-output (lambda () body ...)))
(Continue reading)

Alexis King | 4 Jun 00:28 2016
Picon
Gravatar

Command-line REPL for a custom #lang?

Currently, it does not seem like there is any option to open a new
command-line REPL for a particular #lang. Based on what already exists,
I’m not sure if this is a bug or if it’s intended (or perhaps
overlooked) behavior.

The -I command-line flag to the racket executable is documented as
setting the <init-lib>, with a note explaining that it sets the
language. This is not quite true, however: it appears to do more than
simply require the module into the top-level namespace, but it does
not set the #lang — instead, it seems to open a REPL as if you had
started a REPL in DrRacket inside the specified module.

This sounds similar, but it’s not: if I have a language called “foo”,
implemented in foo/lang/reader, then running `racket -I foo` requires
the module foo, not foo/lang/reader. This means that the language
used is actually whatever #lang is present in the foo module, which
might not be #lang foo at all — it might be racket/base, or
anything else. Interestingly, this is visible with typed/racket/gui:
using typed/racket and typed/racket/base works with the -I flag because
they are implemented in the typed-racket/minimal language, but using
typed/racket/gui does not enable the #{x : T} reader syntax because it
is implemented in racket/base.

It seems like it would be valuable either to make -I do what I think
its description would lead most people to expect, or to add another
option that behaves like running a REPL in DrRacket in an empty module
with a particular #lang line. A separate flag is probably the better
choice, since I could imagine the current behavior of -I being used
intentionally. Is there some other way to achieve this behavior with
the current set of options that I don’t know about?
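
For what it’s worth, one way to approximate the behavior I want today
is to declare an empty module in the target language and run a REPL in
its namespace. This is only a sketch (typed/racket stands in for any
language module path), and it picks up the language’s bindings without
necessarily installing its interaction-level reader extensions:

    #lang racket/base
    ;; Sketch: declare an empty module in the target language,
    ;; instantiate it, and run a REPL inside its namespace.
    (parameterize ([current-namespace (make-base-namespace)])
      (eval '(module repl-stub typed/racket))
      (dynamic-require ''repl-stub #f)
      (parameterize ([current-namespace (module->namespace ''repl-stub)])
        (read-eval-print-loop)))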
(Continue reading)

Sam Tobin-Hochstadt | 26 May 16:38 2016

Re: Merging the `net` and `compiler` repositories back in

On Wed, May 25, 2016 at 9:12 PM, Matthias Felleisen
<matthias@...> wrote:
>
>> On May 25, 2016, at 4:25 PM, Sam Tobin-Hochstadt <samth@...> wrote:
>>
>> When we split the Racket repository out into many smaller
>> repositories, we were quite aggressive -- just about everything moved
>> out. At this point, it's clear that we were a little too aggressive.
>> In particular, the `net` and `compiler` repositories have been uneasy
>> as separate entities. Their docs are in the wrong places, and they
>> often have to change in coordination with the main repo. So we're
>> going to merge them back in.
>>
>> You can see how this will work in this pull request:
>> https://github.com/racket/racket/pull/1332
>>
>> The resulting repository will have multiple root commits. I don't
>> think this is a problem, but it can perhaps be fixed if needed.
>>
>> Once this is done, previously cloned versions of those repositories
>> won't update. To address this, run the following command now, unless
>> you absolutely need the clone versions:
>>
>>    $ raco pkg update --lookup net compiler
>>
>> I plan to merge this in the next couple days.
>>
>
>
> Could you explain to the uninitiated what the meaning of the result is?
(Continue reading)

Jay McCarthy | 26 May 13:51 2016

Inside Racket Seminar 4. Vincent St-Amour on Typed Racket optimizer

On June 21st at 11am Central time, please join us for the fourth Inside Racket
Seminar where Vincent St-Amour will give us a walk-through of the
implementation of Typed Racket's optimizer.

As before, it will be on Google Hangouts on Air with Vincent walking
through the code and giving an explanation of how it all hooks
together. This is not a tutorial on Racket or on the library, but a
kind of oral history and explanation of the software and how it works.
Our hope is that this will increase the ability of others to build and
maintain similar software as we share this kind of expertise in a way
that doesn't fit our existing distribution mechanisms (research
papers, RacketCon talks, documentation, etc.).

Hangouts on Air link: https://plus.google.com/events/chm87mh4umkdpkomk8k8i2lmi7o

I hope that you are able to attend and send your own questions as we go through.

Here are some things you may want to look at to prepare:

0. The optimizer is all syntax classes, so having a good understanding
of syntax-parse is useful (see the short example after this list).

1. The system is mentioned in passing in "Languages as Libraries":
https://www.cs.utah.edu/plt/publications/pldi11-tscff.pdf

2. A big part of the implemented optimizations depends on an
understanding of Racket's numeric tower, which is discussed well in
"Typing the Numeric Tower":
http://www.ccs.neu.edu/racket/pubs/padl12-stff.pdf
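
For those who haven't used syntax-parse before, here is a tiny
illustrative example of a syntax class (an invented example, not code
from the optimizer):

   #lang racket/base
   (require syntax/parse)

   ;; A syntax class recognizing a let-style binding pair like [x 1].
   (define-syntax-class binding
     (pattern [name:id rhs:expr]))

   ;; Attributes of the class are reached through the pattern
   ;; variable, e.g. b.name below:
   (syntax-parse #'(let ([x 1] [y 2]) (+ x y))
     [(_ (b:binding ...) body:expr)
      (syntax->datum #'(b.name ...))]) ; => '(x y)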

(Continue reading)

Sam Tobin-Hochstadt | 25 May 22:25 2016

Merging the `net` and `compiler` repositories back in

When we split the Racket repository out into many smaller
repositories, we were quite aggressive -- just about everything moved
out. At this point, it's clear that we were a little too aggressive.
In particular, the `net` and `compiler` repositories have been uneasy
as separate entities. Their docs are in the wrong places, and they
often have to change in coordination with the main repo. So we're
going to merge them back in.

You can see how this will work in this pull request:
https://github.com/racket/racket/pull/1332

The resulting repository will have multiple root commits. I don't
think this is a problem, but it can perhaps be fixed if needed.

Once this is done, previously cloned versions of those repositories
won't update. To address this, run the following command now, unless
you absolutely need the clone versions:

    $ raco pkg update --lookup net compiler

I plan to merge this in the next couple days.

Sam


Gabriel Scherer | 16 May 23:31 2016

Re: Racket's worst-case GC latencies

On my machine, working from master moves:
- the non-incremental worst pause from 150ms to 120ms
- the incremental-GC worst pause from 163ms to 120ms

I experimented with explicit GC calls during the ramp-up period again,
and on master the results are interesting. I have to choose a
high-enough frequency (1 over 100 is not enough; 1 over 30 or 1 over
50 give net improvements), and to call the GC not only during the
ramp-up but also after it. Calling the GC between (window-size / 2) and
(window-size * 2) seems to work well -- and the results are very sensitive
to these bounds; for example, stopping at (window-size * 1.5) degrades
the efficiency a lot (but going past 2 doesn't help much).

With this ramp-up strategy, I get the pause time down to 40ms with the
incremental GC, at a modest throughput cost: if I amortize by
iterating 2 million times instead of 1 million times, explicit GC
during ramp-up moves the total runtime from 11s to 13s.
(These explicit calls only help when the incremental GC is used
(obviously?). In fact they actively hurt when the non-incremental GC
is used.)
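
For concreteness, here is roughly what that schedule looks like, as a
sketch over a simplified version of the benchmark loop (window-size
and msg-count are the benchmark's parameters; the real code lives in
the gc-latency-experiment repository):

    #lang racket/base
    ;; Sketch of the ramp-up strategy: force a minor collection every
    ;; 30 insertions between (window-size / 2) and (window-size * 2).
    ;; collect-garbage with a 'minor request needs Racket 6.3 or later.
    (define window-size 200000)
    (define msg-count 1000000)

    (for/fold ([store (hash)]) ([i (in-range msg-count)])
      (when (and (>= i (quotient window-size 2))
                 (< i (* 2 window-size))
                 (zero? (modulo i 30)))
        (collect-garbage 'minor))
      (define store* (hash-set store i (make-bytes 1024)))
      (if (>= i window-size)
          (hash-remove store* (- i window-size))
          store*))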

I added the ramp-up code, disabled by default, to the Racket
benchmark. I'm satisfied with the current results and I think they'll
make for a fine post (I think explaining the ramp-up problem and how
to observe/detect it is more important and useful than the details of
the current hack to mitigate it).

On Mon, May 16, 2016 at 11:34 AM, Matthew Flatt <mflatt@...> wrote:
> At Mon, 16 May 2016 11:21:43 -0400, Gabriel Scherer wrote:
>> If you were willing to make publicly available a patchset with your
(Continue reading)

Gabriel Scherer | 16 May 17:21 2016

Re: Racket's worst-case GC latencies

Thanks for the detailed replies. I updated the Racket code and
Makefile to use gcstats by default instead of debug@GC. In my tests,
this does not change worst-case latencies (I did not expect it to),
but the generated summary is nice and easier to parse for non-experts
than the debug@GC output. The old instrumentation is still available
via run-racket-instrumented and analyze-racket-instrumented. On my
machine I get a reliable 145ms worst-case latency on the updated
benchmark -- with reduced memory load thanks to the bytestring
suggestion.

I experimented with the case where the message (byte)strings have
length 1 instead. In this case, I'm able to get a good compromise,
following Matthew's suggestion that the ramp-up is difficult for the
incremental GC: if I perform explicit collections only during the
ramp-up period (when (and (i . < . window-size) (zero? ...))), I get
short pause times (30ms) with little cost in terms of throughput --
especially if I enlarge the total msg-count, which I think is fine.
However, I was unable to reproduce these ramp-up-only improvements
with strings of length 1024.
(Another thing I found is that the worst-case latency is very
sensitive to the frequency of minor GCs during the ramp-up. I got
nice results with modulo 100, but worse results with modulo 50 or
even modulo 70.)

Matthew: I would like to eventually write a blog post to report on my
findings for OCaml and Racket, and of course it would be nice if
Racket looked as good as reasonably possible¹. If you were willing to
make publicly available a patchset with your GC tuning on top of
Racket's current master branch (or whatever; or apply them to master
directly, of course), I would be glad to report the numbers using
(Continue reading)

Sam Tobin-Hochstadt | 15 May 04:17 2016

Re: Racket's worst-case GC latencies

You might be interested in my gcstats package, which will do some of these statistics for you, and may allow you to run larger heaps with data gathering.

Sam


On Sat, May 14, 2016, 9:09 PM Gabriel Scherer <gabriel.scherer@...> wrote:
> [full quoted message trimmed; see the original post below]
Gabriel Scherer | 15 May 04:08 2016

Racket's worst-case GC latencies

Hi racket-devel,

Short version:
  Racket has relatively poor worst-case GC pause times on a specific
  benchmark:
    https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.rkt

  1) Is this expected? What is the GC algorithm for Racket's old generation?
  2) Can you make it better?

Long version:

## Context

James Fisher has a blog post on a case where GHC's runtime system
imposed unpleasant latencies/pauses on their Haskell program:

  https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/

The blog post proposes a very simple, synthetic benchmark that exhibits
the issue -- namely, if the old generation of the GC uses a copying
strategy, then those copies can incur long pauses when many large
objects are live in the old generation.

I ported this synthetic benchmark to OCaml, and could check that the
OCaml GC suffers from no such issue, as its old generation uses
a mark&sweep strategy that does not copy old memory. The Haskell
benchmark has worst-case latencies around 50ms, which James Fisher
finds excessive. The OCaml benchmark has worst-case latencies around
3ms.

Max New did a port of the benchmark to Racket, which I later modified;
the results I see on my machine are relatively bad: the worst-case
pause time is between 120ms and 220ms on my tests.

I think that the results are mostly unrelated to the specific edge
case that this benchmark was designed to exercise (copies of large
objects in the old generation): if I change the inserted strings to be
of size 1 instead of 1024, I also observe fairly high latencies --
such as 120ms. So I'm mostly observing high latencies by inserting and
removing things from an immutable hash in a loop.

## Reproducing

The benchmark fills an (immutable) associative structure with strings
of length 1024 (the idea is to have relatively high memory usage per
pointer, to see large copy times), keeping at most 200,000 strings in
the working set. In total, it inserts 1,000,000 strings (and thus
removes 800,000, one after each insertion after the first 200,000). We
measure latencies rather than throughput, so the performance details
of the associative map structure do not matter.
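
The core of the Racket version looks roughly like this (a simplified
sketch, not the exact benchmark code; make-bytes stands in for the
1024-byte payloads):

    #lang racket/base
    ;; Insert msg-count 1024-byte strings into an immutable hash,
    ;; keeping only the most recent window-size of them live.
    (define window-size 200000)
    (define msg-count 1000000)

    (for/fold ([store (hash)]) ([i (in-range msg-count)])
      (define store* (hash-set store i (make-bytes 1024)))
      (if (>= i window-size)
          (hash-remove store* (- i window-size))
          store*))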

My benchmark code in Haskell, OCaml and Racket can be found here:
  https://gitlab.com/gasche/gc-latency-experiment.git
  https://gitlab.com/gasche/gc-latency-experiment/tree/master
the Makefile contains my scripts to compile, run and analyze each
language's version.

To run the Racket benchmark with instrumentation:

    PLTSTDERR=debug@GC racket main.rkt 2> racket-gc-log

To extract the pause times from the resulting log file (in the format
produced by Racket 6.5), I do:

   cat racket-gc-log | grep -v total | cut -d' ' -f7 | sort -n

Piping `| uniq --count` after that produces a poor man's histogram of
latencies. I get the following result on my machine:

      1 0ms
      2 1ms
      1 2ms
      1 3ms
      2 4ms
      1 5ms
      1 6ms
      3 8ms
     12 9ms
      1 11ms
      2 12ms
     38 13ms
    126 14ms
     43 15ms
     13 16ms
     19 17ms
      4 18ms
      1 19ms
      1 21ms
      1 48ms
      1 68ms
      1 70ms
      1 133ms
      1 165ms
      1 220ms
      1 227ms
      1 228ms

## Non-incremental vs. incremental GC

We experimented with PLT_INCREMENTAL_GC=1; on my machine, this does
not decrease the worst-case pause time; on Asumu Takikawa's beefier
machine, I think the pause times decreased a bit -- but still well
above 50ms. Because the incremental GC consumes noticeably more memory,
I am unable to test with both PLT_INCREMENTAL_GC and
PLTSTDERR=debug@GC enabled -- my system runs out of memory.

If I cut the benchmark sizes in half (half the working-set size, half
the number of iterations), I can run the incremental GC with debugging
enabled. On this half-size instance, I observe the following results:

for the *non-incremental* GC:

      2 1ms
      1 2ms
      2 3ms
      2 4ms
      1 5ms
      1 6ms
      9 8ms
      2 9ms
      1 10ms
     38 13ms
     43 14ms
     13 15ms
      8 16ms
      5 17ms
      6 18ms
      1 44ms
      1 66ms
      1 75ms
      2 126ms
      1 136ms
      1 142ms

for the *incremental* GC

      2 1ms
      1 2ms
      2 3ms
      3 4ms
      1 5ms
     38 6ms
    155 7ms
    136 8ms
     78 9ms
     56 10ms
     28 11ms
     16 12ms
      2 14ms
      1 15ms
      1 16ms
      2 20ms
      1 32ms
      1 41ms
      1 61ms
      1 101ms
      1 148ms

As you can see, the incremental GC helps, as the distribution of
pauses moves toward shorter pause times: it does more, shorter
pauses. However, the worst-case pauses do not improve -- in fact they
are even a bit worse.


Alexis King | 8 May 01:48 2016

Performance of Racket in R7RS benchmarks

Hello,

I maintain the Racket r7rs package, which, as far as I know, has not
gotten much use. Recently, however, someone put together a set of
R7RS benchmarks and ran it against various Scheme implementations,
including Racket, using my r7rs-lib package. The benchmarks themselves
are here:

https://www.nexoid.at/tmp/scheme-benchmark-r7rs.html

I’ve been working to fix any areas in which Racket’s speed is negatively
impacted by the code I’ve written, and I’ve fixed a few small issues.
However, there are definitely a few areas where the performance problems
live outside of my code, and Sam recommended I bring up at least one of
them here.

Specifically, the following program is fairly slow:

#lang racket/base

(define (catport in out)
  (let ((x (read-char in)))
    (unless (eof-object? x)
      (write-char x out)
      (catport in out))))

(define (go input-file output-file)
  (when (file-exists? output-file)
    (delete-file output-file))
  (call-with-input-file
      input-file
    (lambda (in)
      (call-with-output-file
          output-file
        (lambda (out)
          (catport in out))))))

It is a very simple cat implementation. The benchmarks invoke it on
a file about 4.5MB in size, and it takes about 15 seconds. This is
comparable with some other Scheme implementations, but it’s thoroughly
beaten by Bigloo, Chez, and Larceny, which are all in the 1-3 second
range. Replacing read-char and write-char with read-byte and write-byte
brings the time down to about 11 seconds, but that doesn’t change things
significantly. Is there any simple way this could be improved, or is the
time I’m getting just intrinsic to Racket’s implementation of I/O?
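
For comparison (this sidesteps the per-character cost rather than
answering whether read-char itself can be faster), a hypothetical
block-I/O variant of catport using a reusable buffer:

#lang racket/base

;; Copy in 64KB blocks instead of one character at a time.
;; (copy-port from racket/port does essentially this in one call.)
(define (catport/blocks in out)
  (define buf (make-bytes 65536))
  (let loop ()
    (define n (read-bytes! buf in))
    (unless (eof-object? n)
      (write-bytes buf out 0 n)
      (loop))))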

Alexis


