Jan Pfeifer | 1 Jul 01:21 2012

Re: memory management contention

I just re-ran some tests with Rob's version (binary-tree-freelist.go) and got some interesting results.

(Time measured with time(1); resident size and CPU peak measured with top on Mac OS X.)

Version on alioth.debian.org, relying only on Go's memory manager and GC:
    56s to run, 584 MB resident size, CPU often idle(!?)

Version I did with large memory pools, but still relying on Go's GC:
   12.4s to run, 330 MB, CPU peak ~400%
   Sometimes it stalls: CPU goes to 0 and the run takes longer. I haven't figured out why; I assume it's something related to GC.

Rob's version, bypassing the GC by keeping the nodes in a list and reusing them:
  13.5s to run, 274 MB, CPU peak 100%

Parallelized version of Rob's code, with 4 "Arena" objects (one per goroutine, so no contention):
  15.5s to run, 427 MB, CPU peak 400%

Parallelized version of Rob's code, with 4 "Arena" objects, but limited to only one goroutine at a time:
  14.2s to run, 427 MB, CPU peak 100%

It took me a while to understand that things won't speed up with parallel goroutines: all this program does is access memory, so it is gated by hardware memory speed -- only one address can be accessed at a time.

And the GC does seem to be the cause of the slowness of the version on the benchmark site. Changing Rob's code to allocate only one Node at a time (as opposed to a whole large batch) makes it only a few seconds slower. So it's not the overhead of the memory manager, but the overhead of the GC, that makes the version that is there so slow.
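For reference, the free-list idea in Rob's version boils down to something like the sketch below. This is not the actual benchmark code; the Node fields and method names are my assumptions. The point is that once nodes have been recycled through the list, the hot path stops allocating, so the GC has nothing new to trace.

```go
package main

import "fmt"

// Node is a stand-in for the benchmark's binary-tree node.
type Node struct {
	left, right *Node
	item        int
}

// Arena recycles Nodes through a free list chained via the left pointer.
type Arena struct {
	head *Node
}

// New pops a recycled Node, or allocates a fresh one if the list is empty.
func (a *Arena) New() *Node {
	if a.head == nil {
		return new(Node)
	}
	n := a.head
	a.head = n.left
	*n = Node{} // reset the reused node
	return n
}

// Free pushes a Node (whose subtrees were already freed) back on the list.
func (a *Arena) Free(n *Node) {
	n.left = a.head
	a.head = n
}

func main() {
	var a Arena
	n := a.New()
	a.Free(n)
	m := a.New() // reuses n's memory rather than allocating
	fmt.Println(n == m)
}
```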

cheers, and thanks for the responses!
Jan

On Wed, Jun 27, 2012 at 12:15 PM, Han-Wen Nienhuys <hanwen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
You are trading space for speed. Did you also look at the memory consumption? I think the bottleneck generally is the garbage collector (i.e. discovering what memory is reachable), and by allocating big blocks, you reduce that problem to seeing if the big block is reachable.

IIRC each thread has its own pool, so there should not be a lot of contention for memory allocation.


On Wed, Jun 27, 2012 at 4:03 PM, Jan Pfeifer <pfeifer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
hi there,

I was looking at some language benchmarks (shootout.alioth.debian.org) the other day and got intrigued by why the Go version did so poorly in the binary-trees example:


It was easy to imagine contention on the memory manager when allocating new nodes, so I added a memory pool manager that allocates Nodes in batches (a whole array at a time) and gives them away without any locks -- see code below. It gives a 4-5x speedup on my laptop.

A few questions for those of you who know better:

1) Is there a faster way to implement a memory pool? That is, faster than allocating an array at a time and yielding pointers to the individual elements?

2) Will the GC deal correctly with pointers to elements of the array, and free the array only when all the individual pointers are gone?

3) What is the bottleneck in Go's memory manager: locks (all goroutines requesting memory at the same time), or simply the splitting and organization of the memory?

thanks :)
Jan

 
ps.: Memory pool "batch allocation" code:
const BATCH_SIZE = 2048

// NodeBatchAlloc hands out Nodes from a pre-allocated array,
// starting a fresh array when the current one is exhausted.
type NodeBatchAlloc struct {
	batch *[BATCH_SIZE]Node
	used  int
}

func NewNodeBatchAlloc() (me *NodeBatchAlloc) {
	me = new(NodeBatchAlloc)
	me.batchAlloc()
	return
}

// batchAlloc allocates a fresh batch and resets the cursor.
func (me *NodeBatchAlloc) batchAlloc() {
	me.batch = new([BATCH_SIZE]Node)
	me.used = 0
}

// alloc returns a pointer to the next unused Node in the current batch.
func (me *NodeBatchAlloc) alloc() (node *Node) {
	if me.used >= BATCH_SIZE {
		me.batchAlloc()
	}
	node = &me.batch[me.used] // a pointer to an array indexes like the array
	me.used++
	return
}
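To make the snippet above runnable on its own, here is a self-contained sketch with a hypothetical Node type standing in for the benchmark's, plus a small driver that exercises the batch boundary:

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the benchmark's tree node.
type Node struct {
	left, right *Node
}

const BATCH_SIZE = 2048

type NodeBatchAlloc struct {
	batch *[BATCH_SIZE]Node
	used  int
}

func NewNodeBatchAlloc() *NodeBatchAlloc {
	me := new(NodeBatchAlloc)
	me.batchAlloc()
	return me
}

func (me *NodeBatchAlloc) batchAlloc() {
	me.batch = new([BATCH_SIZE]Node)
	me.used = 0
}

func (me *NodeBatchAlloc) alloc() *Node {
	if me.used >= BATCH_SIZE {
		me.batchAlloc()
	}
	node := &me.batch[me.used]
	me.used++
	return node
}

func main() {
	a := NewNodeBatchAlloc()
	first := a.alloc()
	// Exhaust the first batch; the allocator transparently starts a new one.
	for i := 0; i < BATCH_SIZE; i++ {
		a.alloc()
	}
	later := a.alloc()
	// first and later live in different underlying arrays; each array stays
	// reachable (question 2 above) as long as any pointer into it survives.
	fmt.Println(first != later)
}
```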




--
Han-Wen Nienhuys
Google Engineering Belo Horizonte
hanwen <at> google.com

Jan Pfeifer | 1 Jul 01:25 2012

Re: memory management contention

Btw, for comparison, the C++ version in http://shootout.alioth.debian.org/u64q/program.php?test=binarytrees&lang=gpp&id=6:

   6.9s to run, 113Mb resident size, 100% cpu (one cpu only).

So the numbers seem reasonable. 

thanks.



Dave Cheney | 1 Jul 01:42 2012

Re: memory management contention

When CPU usage drops to zero, is your machine paging?


Jan | 1 Jul 01:55 2012

Re: memory management contention

Ha, I must have missed your email earlier. I didn't notice the "interesting alternatives" at the bottom:


http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees

Thanks for pointing it out. There are a couple of others there that are interesting as well.

On Wednesday, June 27, 2012 5:51:21 PM UTC-7, Isaac Gouy wrote:
And, for the last couple of years, that program has been shown as -- binary-trees Go #2 program


On Wednesday, June 27, 2012 12:32:24 PM UTC-7, Rob 'Commander' Pike wrote:
http://golang.org/test/bench/shootout/binary-tree-freelist.go
Sameer Ajmani | 1 Jul 01:55 2012

Re: emacs go mode fatal error

Hi Brad,

This change, submitted a few weeks back, addressed the most painful paste bug:
http://codereview.appspot.com/6139066/
It should be included in Go 1.0.2.

On Feb 17, 2012 1:47 PM, "brad clawsie" <brad.clawsie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
cut and pasting go code that has been commented out routinely leads to
emacs itself falling over for me. just an fyi in case a maintainer
reads this list.

thanks
brad
Jan | 1 Jul 01:59 2012

Re: memory management contention

Not that I noticed, but that could be it. 


But my laptop has 4 GB of RAM, and I was not doing much else while running this. I ran it a few times, so even if it paged the first time, I wouldn't expect it to page on the later runs.

Not sure how the Mac manages memory; I'm more used to running things on Linux -- I should have run this test there, but I was on an airplane when I ran this.

thx

Henrik Johansson | 1 Jul 02:52 2012

Re: Re: [ANN] GoSublime - SublimeText 2 Golang Completion Plugin

Awesome work!

Den 30 jun 2012 20:57 skrev "DisposaBoy" <disposaboy-lhxDjxhMuWs@public.gmane.org>:
Update

demos at http://www.youtube.com/user/GoSublime/videos (pretty much nothing there yet)

the summary: GoSublime is a Golang plugin collection for the text editor SublimeText 2 providing code completion and other IDE-like features.

A lot has changed since the last announcement

main features:

  • code completion from Gocode
  • context-aware snippets via the code-completion popup, to complement the existing SublimeText Go package
  • Sublime build system (ctrl+b) that lets you run any command your shell will accept, with some focus (and enhancements) on the go command, including support for cycling through errors with F4/Shift+F4
  • lint/syntax check as you type
  • quickly jump to any syntax error reported (and jump back to where you were before, across files)
  • quickly fmt your source, manually or automatically on save, to conform with the Go standards
  • easily create a new Go file and run it without needing to save it first (go play)
  • share your snippets (anything in the loaded file) on play.golang.org
  • list declarations in the current file
  • automatically add/remove package imports
  • quickly jump to your import section (automatically goes to the last import), where you can easily edit the pkg alias and return to where you were before
  • go to the definition of a package function or constant, etc.
  • show the source (and thus documentation) of a variable without needing to change views






David Klaffenbach | 1 Jul 04:14 2012

Re: Re: Go(lang) VS Generics: What are the USE CASES?

http://play.golang.org/p/-krpoaS1Ti

On Saturday, June 30, 2012 3:35:02 PM UTC-5, aam wrote:

> func memo(f func(int) int) func(int) int {

it's a bit contrived and probably a more elegant solution can be
found, but off the top of my head i came up with this:

http://play.golang.org/p/m-gH-U_Zws

add10 and addten can be combined into one, but are separate here for
illustrative purposes.
David Klaffenbach | 1 Jul 04:17 2012

Re: Re: Go(lang) VS Generics: What are the USE CASES?

This is more telling: http://play.golang.org/p/YZlpwMze-2


I don't understand why this doesn't panic at runtime.

andrey mirtchovski | 1 Jul 04:21 2012

Re: Re: Go(lang) VS Generics: What are the USE CASES?

> I don't understand why this doesn't panic at runtime.

because every type implements the empty interface. presumably, in the
two functions I put there, one would want to emit errors on
encountering the incorrect type (that follows from the fact that they
can be combined).

this isn't type-checked generics. the code side-steps the type system
to gain a bit of generality at the expense of safety.
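To illustrate the point being made here, below is a reconstruction of the kind of interface{}-based code under discussion (the playground snippets themselves aren't reproduced in this thread, so the names and shapes are my assumptions). Every value satisfies interface{}, so the compiler never objects; a mismatch only surfaces when a type assertion fails at runtime.

```go
package main

import "fmt"

// memo wraps f with a cache. Because interface{} accepts any key and
// value, type mismatches are invisible to the compiler and can only
// show up as failed type assertions at runtime.
func memo(f func(interface{}) interface{}) func(interface{}) interface{} {
	cache := map[interface{}]interface{}{}
	return func(x interface{}) interface{} {
		if v, ok := cache[x]; ok {
			return v
		}
		v := f(x)
		cache[x] = v
		return v
	}
}

func main() {
	add10 := memo(func(x interface{}) interface{} {
		return x.(int) + 10 // type assertion: panics if x is not an int
	})
	fmt.Println(add10(5)) // 15
	// add10("oops") would compile fine and only panic here, at runtime.
}
```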

