1 Jul 2012 01:21
Re: memory management contention
I just re-ran some tests with Rob's version (binary-tree-freelist.go) and got some interesting results.
(Time measured with time(1), resident size and cpu peak measured using top on a MacOS)
Version in alioth.debian.org, only relying on Go's memory manager and GC:
56s to run, 584Mb on resident size, often CPU was idle(!?)
Version I did with large memory pools, but still relying on Go's GC:
12.4s to run, 330Mb, cpu peak ~400%
Sometimes it locks, cpu goes to 0, and it takes longer, haven't figured out why. I assume something related to GC.
Rob's version, bypassing GC by keeping the nodes in a list and reusing:
13.5s to run, 274Mb, cpu peak 100%
Parallelized version of Rob's code, with 4 "Arena" objects (one per goroutine, so no contention):
15.5s to run, 427Mb, cpu peak 400%
Parallelized version of Rob's code, with 4 "Arena" objects, but actually limited to only one goroutine at a time:
14.2s to run, 427Mb, cpu peak 100%
It took me a while to understand that things won't speed up with parallel goroutines because all this is doing is accessing memory, so it is gated by the hardware memory speeds -- only one address can be accessed at a time.
And the GC does seem to be the cause of slowness on the version on the benchmark. By changing the Rob's code to allocate only one Node at a time (as opposed to a whole large batch), it only makes the system a few seconds slower. Meaning it's not overhead of the memory manager, but overhead of the GC that makes the version that is there so slow.
cheers, and thanks for the responses!
Jan
On Wed, Jun 27, 2012 at 12:15 PM, Han-Wen Nienhuys <hanwen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
You are trading space for speed. Did you also look at the memory consumption? I think the bottleneck generally is the garbage collector (ie. discovering what memory is reachable), and by allocating big blocks, you reduce to that problem to seeing if the big block is reachable.
IIRC each thread has its own pool, so there should not be a lot contention for memory allocation.--On Wed, Jun 27, 2012 at 4:03 PM, Jan Pfeifer <pfeifer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:hi there,I was looking at some language benchmarks (shootout.alioth.debian.org) the other day and got intrigued by why did the Go version did so poorly in binary-trees example:It was easy to imagine contention on the memory manager for allocating new nodes, so I added a memory pool manager that allocated Nodes in batches (whole array at a time), and give them away without any locks -- see code below -- 4 to 5 times speed up in my laptop.A few questions for those of you who know better:1) Is there a faster way to implement a memory pool ? Meaning faster than allocating an array at a time, and yielding pointers to the individual elements ?2) Will the GC deal correctly with pointers to the elements of the array ? And free the array only when all the individual pointers are freed ?3) What is the bottle neck in Go's memory manager: locks (all goroutines are requesting memory at the same time) or simply the splitting and organization of the memory ?thanks :)Janps.: Memory pool "batch allocation" code:const BATCH_SIZE = 2048type NodeBatchAlloc struct {batch *[BATCH_SIZE]Nodeused int}func NewNodeBatchAlloc() (me *NodeBatchAlloc) {me = new(NodeBatchAlloc)me.batchAlloc()return}func (me *NodeBatchAlloc) batchAlloc() {me.batch = new([BATCH_SIZE]Node)me.used = 0}func (me *NodeBatchAlloc) alloc() (node *Node) {if me.used >= BATCH_SIZE {me.batchAlloc()}node = &(*me.batch)[me.used]me.used += 1return}
Han-Wen Nienhuys
Google Engineering Belo Horizonte
hanwen <at> google.com
RSS Feed