Richard Brooksby | 10 Jul 12:28 2014

[Gc] C optimizer hazards in practice says:

> Some C optimizers may lose the last undisguised pointer to a memory object as a consequence of clever
> optimizations. This has almost never been observed in practice.

Does anyone have any stories to tell about when this *has* been observed, or any links to accounts of this happening?

Background: The Memory Pool System <> team is currently
investigating a case of the Microsoft C compiler optimising away the last reference to an object, causing
it to be prematurely recycled and corrupt the heap.  See
<> for details,
including repro, disassembly, analysis, etc.
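
To illustrate the hazard the quoted passage describes (a sketch only, not the MPS case itself): the function below walks an array by indexing backwards from a derived `end` pointer, so an optimizer is free to stop holding `base` once `end` has been computed. If `base` were the only reference to a collected object, and the collector did not recognise `end` as referring to it, the object could be reclaimed mid-loop. The names `keep_alive` and `sum_ints` are hypothetical and introduced only for this example; the volatile store is one portable way to keep a pointer live, not an API of any collector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of any collector's API): storing the   */
/* pointer through a volatile object forces the compiler to keep `p`    */
/* materialised up to this call.                                        */
static const void *volatile keep_alive_sink;

static void keep_alive(const void *p)
{
    keep_alive_sink = p;
}

/* Sums n ints.  The loop indexes backwards from `end`, so without the  */
/* keep_alive() call an optimizer may hold only `end` in a register.    */
long sum_ints(const int *base, size_t n)
{
    const int *end;
    long total = 0;
    ptrdiff_t i;

    assert(base != NULL);
    end = base + n;
    for (i = -(ptrdiff_t)n; i < 0; i++) {
        total += end[i];
    }
    keep_alive(base);   /* `base` stays live until the loop has finished */
    return total;
}
```

Calling `sum_ints(v, 4)` on an array holding 1, 2, 3, 4 returns 10 with or without the `keep_alive` call; the call only constrains how long the compiler must keep `base` available.
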
Lucas Meijer | 30 Jun 10:24 2014

[Gc] adding profiling callbacks


Unity is a game development tool that, on some platforms, uses Boehm to collect garbage generated by our users' .NET code.

Our built-in profiler can profile all the different game subsystems we have, and I've added support to it for also profiling the Boehm collector.

I've added profiling callbacks to our version of Boehm, and would like to ask this list if there's any interest in taking these patches. (The friendly folks at the Mono project did pretty much the same for their fork of Boehm.)

Here are the patches we have now. I'd be more than happy to adapt them to address any concerns you might have.

Bye, Lucas

diff --git a/External/bdwgc/alloc.c b/External/bdwgc/alloc.c
--- a/External/bdwgc/alloc.c
+++ b/External/bdwgc/alloc.c
@@ -347,6 +347,18 @@
 STATIC GC_bool GC_stopped_mark(GC_stop_func stop_func);
 STATIC void GC_finish_collection(void);

+static GC_event_callback_func GC_event_callback = NULL;
+
+void GC_set_event_callback(GC_event_callback_func func)
+{
+  GC_event_callback = func;
+}
+
+GC_event_callback_func GC_get_event_callback()
+{
+  return GC_event_callback;
+}
+
 /*
  * Initiate a garbage collection if appropriate.
  * Choose judiciously
@@ -420,6 +432,10 @@
 #   endif
     if (GC_dont_gc || (*stop_func)()) return FALSE;
+    if (GC_event_callback)
+      GC_event_callback (GC_EVENT_START, NULL);
     if (GC_incremental && GC_collection_in_progress()) {
             "GC_try_to_collect_inner: finishing collection in progress\n");
@@ -476,6 +492,10 @@
 #   endif
+    if (GC_event_callback)
+      GC_event_callback (GC_EVENT_END, NULL);

@@ -607,10 +627,20 @@
 #   endif

+    if (GC_event_callback)
+      GC_event_callback (GC_EVENT_PRE_STOP_WORLD, NULL);
       GC_world_stopped = TRUE;
 #   endif
+    if (GC_event_callback)
+    {
+      GC_event_callback (GC_EVENT_POST_STOP_WORLD, NULL);
+      GC_event_callback (GC_EVENT_MARK_START, NULL);
+    }
         /* Output blank line for convenience here */
               "\n--> Marking for collection #%lu after %lu allocated bytes\n",
@@ -632,10 +662,19 @@
             GC_COND_LOG_PRINTF("Abandoned stopped marking after"
                                " %u iterations\n", i);
             GC_deficit = i;     /* Give the mutator a chance.   */
+            if (GC_event_callback)
+            {
+              GC_event_callback (GC_EVENT_MARK_END, NULL);
+              GC_event_callback (GC_EVENT_PRE_START_WORLD, NULL);
+            }
 #           ifdef THREAD_LOCAL_ALLOC
               GC_world_stopped = FALSE;
 #           endif
+            if (GC_event_callback)
+              GC_event_callback (GC_EVENT_POST_START_WORLD, NULL);
           if (GC_mark_some(GC_approx_sp())) break;
@@ -656,7 +695,18 @@
       GC_world_stopped = FALSE;
 #   endif
+    if (GC_event_callback)
+    {
+      GC_event_callback (GC_EVENT_MARK_END, NULL);
+      GC_event_callback (GC_EVENT_PRE_START_WORLD, NULL);
+    }
+    if (GC_event_callback)
+      GC_event_callback (GC_EVENT_POST_START_WORLD, NULL);
 #   ifndef SMALL_CONFIG
       if (GC_PRINT_STATS_FLAG) {
         unsigned long time_diff;
diff --git a/External/bdwgc/include/gc.h b/External/bdwgc/include/gc.h
--- a/External/bdwgc/include/gc.h
+++ b/External/bdwgc/include/gc.h
@@ -105,6 +105,25 @@
 /* Public R/W variables */
 /* The supplied setter and getter functions are preferred for new code. */

+typedef enum {
+} GCEventType;
+typedef void * (GC_CALLBACK * GC_event_callback_func)(GCEventType eventType, void* data);
+GC_API void GC_CALL GC_set_event_callback(GC_event_callback_func);
+GC_API GC_event_callback_func GC_CALL GC_get_event_callback(void);
 typedef void * (GC_CALLBACK * GC_oom_func)(size_t /* bytes_requested */);
                         /* When there is insufficient memory to satisfy */
diff --git a/External/bdwgc/pthread_stop_world.c b/External/bdwgc/pthread_stop_world.c
--- a/External/bdwgc/pthread_stop_world.c
+++ b/External/bdwgc/pthread_stop_world.c
@@ -505,6 +505,9 @@
                 case 0:
+                    GC_event_callback_func cb = GC_get_event_callback();
+                    if (cb)
+                      cb(GC_EVENT_SUSPENDED_THREAD, p->id);
                     ABORT_ARG1("pthread_kill failed at suspend",
@@ -829,6 +832,9 @@
                 case 0:
+                    GC_event_callback_func cb = GC_get_event_callback();
+                    if (cb)
+                      cb(GC_EVENT_UNSUSPENDED_THREAD, p->id);
                     ABORT_ARG1("pthread_kill failed at resume",
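
For readers wondering how a client would consume these callbacks, here is a minimal, self-contained sketch. It does not link against the collector: the enum, typedef, setter and getter below are local stand-ins that mirror the names in the patch, the enumerator list and its order are assumptions (the enum body is not shown in the post), and `count_event`, `event_counts`, `fake_collection` and `GC_EVENT_COUNT` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the patch's declarations.  The enumerators and their   */
/* order are assumptions; the post does not show the enum body.         */
typedef enum {
    GC_EVENT_START,
    GC_EVENT_MARK_START,
    GC_EVENT_MARK_END,
    GC_EVENT_END,
    GC_EVENT_COUNT              /* hypothetical: number of event kinds  */
} GCEventType;

typedef void *(*GC_event_callback_func)(GCEventType eventType, void *data);

static GC_event_callback_func GC_event_callback = NULL;

void GC_set_event_callback(GC_event_callback_func func)
{
    GC_event_callback = func;
}

GC_event_callback_func GC_get_event_callback(void)
{
    return GC_event_callback;
}

/* Hypothetical profiler state: how many times each event has fired.    */
unsigned event_counts[GC_EVENT_COUNT];

void *count_event(GCEventType eventType, void *data)
{
    (void)data;                 /* unused for collection-level events   */
    assert((unsigned)eventType < GC_EVENT_COUNT);
    event_counts[eventType]++;
    return NULL;
}

/* Stand-in for one collection cycle: fires the events in the order the */
/* patch raises them, guarding each call as the patch does.             */
void fake_collection(void)
{
    static const GCEventType order[] = {
        GC_EVENT_START, GC_EVENT_MARK_START, GC_EVENT_MARK_END, GC_EVENT_END
    };
    size_t i;

    for (i = 0; i < sizeof order / sizeof order[0]; i++) {
        if (GC_event_callback)
            GC_event_callback(order[i], NULL);
    }
}
```

A profiler would register its handler once with `GC_set_event_callback(count_event)` and, in a real handler, record a timestamp per event instead of a counter, so that the interval between a START and its matching END event gives the time spent in that phase.
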
bdwgc mailing list
Stefan Kral | 25 Jun 13:26 2014

[Gc] Supplying hugepages to the GC

Hello everyone.

I have had good results speeding up some big lookup-table accesses by backing them with hugepages (and initializing them upon application startup.)

Is there a straightforward way to configure or instruct the GC to allocate the first few heap chunks using hugepages when available? Possibly by letting it allocate memory with a (hugepage-backed) malloc instead of mmap or sbrk?

Regards, Stefan Kral.
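
For context, the hugepage-backed table setup described above can be sketched on Linux roughly as follows. This is an assumption-laden illustration, not something the collector provides: `alloc_table`, `prefault` and `HUGE_PAGE_SIZE` are hypothetical names, the 2 MiB huge page size and 4 KiB base page size are assumed (typical on x86-64), and the code falls back to ordinary pages with a transparent-hugepage hint when no explicit huge pages are reserved. How such memory could be handed to the collector is exactly the open question and is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS, MAP_HUGETLB, madvise */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, as is typical on x86-64 Linux.         */
#define HUGE_PAGE_SIZE ((size_t)2 << 20)

/* Maps `len` bytes of zeroed anonymous memory, preferring explicit     */
/* huge pages.  Returns NULL on failure.                                */
void *alloc_table(size_t len)
{
    void *p = MAP_FAILED;

    assert(len > 0 && len % HUGE_PAGE_SIZE == 0);
#ifdef MAP_HUGETLB
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p == MAP_FAILED) {
        /* No huge pages available: fall back to ordinary pages.        */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
#ifdef MADV_HUGEPAGE
        /* Hint only; it is harmless if the kernel declines.            */
        (void)madvise(p, len, MADV_HUGEPAGE);
#endif
    }
    return p;
}

/* Writes one byte per assumed 4 KiB page so that the whole region is   */
/* faulted in at startup rather than on first lookup.                   */
void prefault(void *p, size_t len)
{
    volatile unsigned char *bytes = p;
    size_t off;

    for (off = 0; off < len; off += 4096)
        bytes[off] = 0;
}
```

At application startup one would call `alloc_table(4 * HUGE_PAGE_SIZE)` (or whatever multiple the table needs), then `prefault` on the result, and release it with `munmap` when done.
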

Paul Bone | 25 Jun 03:34 2014

[Gc] glibc 2.19 lock elision bug


I wrote earlier about a bug I was trying to find regarding lock elision in
glibc 2.19 that affects Boehm GC.  I believe that this affects all
applications that use Boehm GC, not just Mercury, as mono applications have
also been crashing since I upgraded glibc.  I would like some feedback and
help with the following change:

The git branch containing this change can be found here:

Thank you.

From eb31ad476a0a3b4125202bd7628a9ab3cfb634d6 Mon Sep 17 00:00:00 2001
From: Paul Bone <paul@...>
Date: Wed, 25 Jun 2014 11:17:50 +1000
Subject: [PATCH] Work around Linux NPTL lock elision bug.

glibc 2.19 on Linux x86-64 platforms includes support for lock elision,
by using Intel's TSX support when it is available.  Without modifying an
application this converts suitable critical sections that use mutexes into
transactional memory critical sections.  See
If a problem occurs that means that transactional memory can't be used, such
as a system call or buffer overflow, the pthreads implementation will catch
this error and retry the critical section using a normal mutex.

I noticed that, since upgrading glibc, programs using Boehm GC crash.  One
of these crashes was an assertion that the owner field of a mutex was
invalid.  The assertion was generated by the pthreads implementation.
I believe that there is a bug in glibc: when a mutex cannot be used
safely for transactions, some series of events causes its owner field
to be set incorrectly (or cleared when it shouldn't be).

I've found that I can work around this problem by having Boehm GC use an
error checking mutex, which I believe doesn't use lock elision and in my
testing doesn't crash.

XXX: This work-around mostly works except for linking the feature detection
into the conditional compilation in pthread_support.c as there
isn't an obvious way to make it work for automake and
Could I have some help updating the build system please?

    Define GC_setup_mark_lock()  This procedure creates the lock specifying a
    pthread_mutexattr_t structure.  This is used to disable lock elision on
    Linux with glibc 2.19.
    If we're using Linux then check for the gnu extensions required to
    identify the version of glibc at runtime.

    Call GC_setup_mark_lock() when initialising the collector.
---                      | 13 +++++++++++++
 include/private/pthread_support.h |  2 ++
 misc.c                            |  3 +++
 pthread_support.c                 | 39 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/ b/
index 7667949..6853e97 100644
--- a/
+++ b/
@@ -646,6 +646,19 @@ case "$host" in
  *) AC_MSG_RESULT(no) ;;

+dnl Check for specific glibc functions and definitions that we need for
+dnl the glibc 2.19 workaround.
+case "${host}" in
+  *-linux*)
+    AC_CHECK_HEADER([gnu/libc-version.h], HAVE_LIBC_VERSION_H=yes)
+    AC_CHECK_FUNC([gnu_get_libc_version], HAVE_GNU_GET_LIBC_VERSION=yes)
+    ;;
+esac
 dnl Include defines that have become de facto standard.
 dnl in the startup code.
diff --git a/include/private/pthread_support.h b/include/private/pthread_support.h
index 525a9aa..017f194 100644
--- a/include/private/pthread_support.h
+++ b/include/private/pthread_support.h
@@ -148,6 +148,8 @@ GC_INNER_PTHRSTART GC_thread GC_start_rtn_prepare_thread(
                                         struct GC_stack_base *sb, void *arg);
 GC_INNER_PTHRSTART void GC_thread_exit_proc(void *);

+GC_INNER void GC_setup_mark_lock(void);
 #endif /* GC_PTHREADS && !GC_WIN32_THREADS */

 #endif /* GC_PTHREAD_SUPPORT_H */
diff --git a/misc.c b/misc.c
index df434a1..dccf5f3 100644
--- a/misc.c
+++ b/misc.c
@@ -875,6 +875,9 @@ GC_API void GC_CALL GC_init(void)
         /* else */ InitializeCriticalSection (&GC_allocate_ml);
 #   endif /* GC_WIN32_THREADS */
+#   if (defined(GC_PTHREADS) && !defined(GC_WIN32_THREADS))
+     GC_setup_mark_lock();
+#   endif /* GC_PTHREADS */
 #   if (defined(MSWIN32) || defined(MSWINCE)) && defined(THREADS)
 #   endif
diff --git a/pthread_support.c b/pthread_support.c
index c00b93d..49b33d0 100644
--- a/pthread_support.c
+++ b/pthread_support.c
@@ -95,6 +95,10 @@
   typedef unsigned int sem_t;
 #endif /* GC_DGUX386_THREADS */

+# include <gnu/libc-version.h>
 /* Undefine macros used to redirect pthread primitives. */
 # undef pthread_create
@@ -1973,12 +1977,47 @@ GC_INNER void GC_lock(void)
   /* defined.                                                           */
   static pthread_mutex_t mark_mutex =
         {0, 0, 0, PTHREAD_MUTEX_ERRORCHECK_NP, {0, 0}};
+  static pthread_mutex_t mark_mutex;
   static pthread_mutex_t mark_mutex = PTHREAD_MUTEX_INITIALIZER;

 static pthread_cond_t builder_cv = PTHREAD_COND_INITIALIZER;

+GC_INNER void GC_setup_mark_lock(void)
+{
+    const char *version_str;
+    pthread_mutexattr_t attr;
+
+    if (0 != pthread_mutexattr_init(&attr)) {
+        goto error;
+    }
+
+    version_str = gnu_get_libc_version();
+    if (0 == strcmp("2.19", version_str))
+    {
+        /*
+         * Disable lock elision on this version of glibc.
+         */
+        if (0 != pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK)) {
+            goto error;
+        }
+    }
+
+    if (0 != pthread_mutex_init(&mark_mutex, &attr)) {
+        goto error;
+    }
+    pthread_mutexattr_destroy(&attr);
+    return;
+
+error:
+    perror("Error setting up marker mutex");
+    exit(1);
+}
+
 GC_INNER void GC_acquire_mark_lock(void)
     GC_ASSERT(GC_mark_lock_holder != NUMERIC_THREAD_ID(pthread_self()));


Paul Bone
Daniel R. Grayson | 25 Jun 01:46 2014

[Gc] parallel speedup

In our application that uses libgc (see I observe no
speedup when running tasks in parallel, if the tasks allocate memory using
libgc.  Perhaps I'm doing something wrong.  Are there any commonly observed
situations where no speedup occurs?

A glance at the source code shows that mutex locks lock down the world on
almost every occasion, so it's hard to see why there would ever be any speedup
when using threads.

Stefan Kral | 23 Jun 23:05 2014

[Gc] Freeing thread-local storage

Hello everyone!

I have an application that runs the GC multi-threaded with TLS enabled.

Could you please recommend a scalable way of freeing memory (of some known size, say, a few granules; type "NORMAL") --- putting it back directly into the proper TLS free-lists used internally by the GC?

Stefan Kral.
Stefan Kral | 21 Jun 11:37 2014

[Gc] Is GC_ALWAYS_MULTITHREADED safe with single-threaded case

Hello everyone.

My multi-threaded application remains single-threaded for some time after starting up. The first few collections are likely to occur while single-threaded.

I wonder if it is safe for me to define GC_ALWAYS_MULTITHREADED when building the GC.

Also, I am curious to know if using the #define saves some CPU cycles---compared to calling GC_allow_register_threads after initializing the GC.

Best Regards,

Ivan Maidanski | 20 Jun 23:08 2014

[Gc] Fwd: Re: [libatomic_ops] Sparc v8 FTBS (#9)


I think CFLAGS=-DAO_NO_SPARC_V9 should help you.
Of course, it would be good if someone (you?) contributed support for v8 (as noted in the TODO at the tail of

Fri, 20 Jun 2014 10:09:58 -0700 from cb88 <notifications@...>:

Would it be possible to disable any features that currently depend on Sparc v8+/v9 or implement the missing v8 support code?

I'm building gentoo sparc32 stages and this is an issue I am running into.

>>> Unpacking source...
>>> ...
 * econf: updating gc-7.2alpha4/libatomic_ops/config.sub with /usr/share/gnuconfig/config.sub
./configure --prefix=/usr --build=sparc-unknown-linux-gnu --host=sparc-unknown-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib --disable-dependency-tracking
checking build system type... sparc-unknown-linux-gnu
...
checking for sparc-unknown-linux-gnu-gcc option to accept ISO C89... none needed
checking dependency style of sparc-unknown-linux-gnu-gcc... none
checking whether sparc-unknown-linux-gnu-gcc and cc understand -c and -o together... yes
checking dependency style of sparc-unknown-linux-gnu-gcc... none
checking how to run the C preprocessor... sparc-unknown-linux-gnu-gcc -E
...
config.status: executing default commands
>>> Source configured.
>>> Compiling source in /var/tmp/portage/dev-libs/libatomic_ops-7.2_alpha4/work/gc-7.2alpha4/libatomic_ops ...
make -j32 AR=sparc-unknown-linux-gnu-ar
...
sparc-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -fPIC -O2 -pipe -mcpu=v8 -DNDEBUG -c atomic_ops_malloc.c
atomic_ops_malloc.c: In function ‘msb’:
atomic_ops_malloc.c:223:2: warning: right shift count >= width of type [enabled by default]
{standard input}: Assembler messages:
{standard input}:35: Error: Architecture mismatch on "membar".
{standard input}:35: (Requires v9|v9a|v9b; requested architecture is v8.)
{standard input}:36: Error: Architecture mismatch on "cas".
{standard input}:36: (Requires v9|v9a|v9b; requested architecture is v8.)
...
make[3]: *** [atomic_ops_stack.o] Error 1
make[3]: *** Waiting for unfinished jobs....
...


Ivan Maidanski | 15 Jun 14:10 2014

[Gc] Fwd: Re: GC and Libatomic_ops 7.2f / 7.4.2 releases (tarballs)

Forwarding to ML...

-------- Forwarded message --------
From: Hans Boehm <boehm@...>
To: Ivan Maidanski <ivmai@...>
Date: Sun, 15 Jun 2014, 02:03 +04:00
Subj: Re: GC and Libatomic_ops 7.2f / 7.4.2 releases (tarballs)

This should finally have been uploaded to both the official web site and the semi-secret low bandwidth mirror at  ...  Please don't advertise the latter widely, but it's OK to use it if the other server is down (which hopefully will be very rare).

Thanks again.


On Sat, Jun 7, 2014 at 4:31 PM, Hans Boehm <boehm@...> wrote:
>Sorry that I've been so slow at this. I'm traveling, but will do so asap. Thanks.
>On Jun 3, 2014 3:24 PM, "Ivan Maidanski" <ivmai@...> wrote:
>>Hello Hans,
>>I've prepared new bug-fix releases (7.2f and 7.4.2) for GC- please check them and publish (tarballs attached).
>>Libatomic_ops releases (7.2f and 7.4.2) are already published at (as for all gc 7.2 releases, gc tarball contains copy of libatomic_ops).
>>Thank you

Eli Zaretskii | 12 Jun 19:24 2014

Re: [Gc] "GC_is_visble_test failed" error on MS-Windows

> From: Ivan Maidanski <ivmai@...>
> Cc: bdwgc@...
> Date: Thu, 12 Jun 2014 12:44:16 +0400
> The config you sent corresponds to single-threaded one
> (GC_WIN32_THREAD is not defined)

Yes.  This is the configuration that gives me trouble, the one
configured with --disable-threads.

> - I checked it with MinGW:
> gcc -I include -DHAVE_CONFIG_H tests/test.c extra/gc.c
> So, I don't understand why you say it uses Win32 threads.

Sorry, I must have failed to explain myself.  I didn't say this
configuration used threads, I said that the configuration that omits
the --enable-threads option altogether uses Win32 threads.

While it sounds reasonable to use Win32 threads when neither the
--enable-threads nor the --disable-threads switch was specified, my
question is why the library configured with --disable-threads causes
the "GC_is_visble_test failed" error.  How is that error related to

> If you pass --disable-threads to the configure script, the
> built library does not use any threading API and does not support threads in the client app.

Right, I understand that.

> If your app uses pthreads but GC is not supporting pthreads (in case of MinGW, GC_WIN32_PTHREADS not
> defined) then garbage collection will malfunction.

No, the application didn't use threads, because I configured it not to
do so.  Are you saying that the "GC_is_visble_test failed" error is a
sign that the application tries to use some interface to libgc that
requires threading, and that is why the error happens?

Thanks again for your help and explanations.
Christian Schafmeister | 7 Jun 17:42 2014

[Gc] Does GC_get_heap_size() accurately report the total amount of heap used by the Boehm GC?

I started a new thread because in the other thread we are starting to blame the Boehm GC when it might be my code that is leaking memory.

I’m starting to think my memory leak is not related to Boehm. 
The total virtual memory (30GB) is much larger than the 2GB reported by GC_get_heap_size().

Two questions:
1) Does GC_get_heap_size() accurately reflect the total heap used by the Boehm GC?
2) Can I use a debugging malloc library like dmalloc to debug memory leaks while at the same time running the Boehm GC?

I have written a program (a Common Lisp compiler that uses LLVM as the back end) that runs a C++ static analyzer for 5 hours. Its memory consumption balloons to 30 GB, and it sometimes crashes before it completes.
This morning the static analyzer ran to completion and using the OS X program “Activity Monitor” I can see that the process consumes 30GB of virtual memory.

When the static analyzer completed it queried the Boehm GC to get info on memory usage according to Boehm:
Total memory usage (bytes):          541,259,440   ;; Sum of sizes of reachable objects obtained by walking the Boehm heap using code provided by Peter Wang
Total GC_get_heap_size()           1,943,457,792   ;; about 2GB - that’s reasonable - it’s NOT 30GB
Total GC_get_total_bytes()     1,467,004,314,013   ;; Total bytes allocated about 1,500 GB - clearly garbage collection is working!

In gc.h it says:
/* Return the number of bytes in the heap.  Excludes collector private */
/* data structures.  Includes empty blocks and fragmentation loss.     */
/* Includes some pages that were allocated but never written.          */
GC_API size_t GC_get_heap_size GC_PROTO((void));
