Ajit Kumar Agarwal | 4 Jul 15:45 2015

Live on Exit renaming.


Design and Analysis of Profile-Based Optimization in Compaq's
     Compilation Tools for Alpha; Journal of Instruction-Level
     Parallelism 3 (2000) 1-25

The existing tracer pass in GCC (which performs the tail duplication needed for superblock
formation) is based on the above paper.

Another optimization of interest in the above paper is the following.

Live on Exit Renamer:

This optimization tries to remove a constraint that forces the compiler to create long dependent chains of
operations in unrolled loops.
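
As a rough illustration of the idea (a hand-written sketch, not code from the paper or from GCC), the two functions below both unroll a linear search by four. The names find_chained and find_renamed are hypothetical, and both assume the key is present in the array, as the unbounded search loop does. In the first, i is live on every exit, so each increment depends on the previous one. In the second, the intermediate indices are renamed to temporaries computed directly from i, and i is only updated on the exit that is actually taken.

```c
#include <stddef.h>

/* Unrolled by 4 with i updated in place: every i = i + 1 depends on the
   one before it, because i must hold the right value at each exit. */
size_t find_chained(const int *a, int key)
{
    size_t i = 0;
    for (;;) {
        if (a[i] == key) return i;
        i = i + 1;
        if (a[i] == key) return i;
        i = i + 1;
        if (a[i] == key) return i;
        i = i + 1;
        if (a[i] == key) return i;
        i = i + 1;
    }
}

/* Same loop after renaming: i1, i2, i3 are independent of each other
   (each is computed from i), and a copy back into i is placed on each
   exit instead of on the main path. */
size_t find_renamed(const int *a, int key)
{
    size_t i = 0;
    for (;;) {
        size_t i1 = i + 1, i2 = i + 2, i3 = i + 3;
        if (a[i] == key)  break;
        if (a[i1] == key) { i = i1; break; }
        if (a[i2] == key) { i = i2; break; }
        if (a[i3] == key) { i = i3; break; }
        i = i + 4;
    }
    return i;
}
```

Both versions load the same elements in the same order; only the dependence structure of the index arithmetic differs.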

Consider the following example:

while (a[i] != key)
    i = i + 1;
return i;


Unrolled Loop:

1. while (a[i] != key)
2.     i = i + 1;
3.     if (a[i] == key) goto E
(Continue reading)

Richard Earnshaw | 3 Jul 16:37 2015

Proposed AAPCS update - parameter passing types with modified alignments

Since it may take some time before an official update to the ARM AAPCS
document can be made, I'm publishing a proposed change here for advance
notice.  Alan will follow up with some GCC patches shortly to implement
these changes.

The proposed changes should deal with types that have been either
under-aligned (packed) or over-aligned by language extensions or
language defined alignment modifiers.  They work by assuming that the
values passed to a procedure are *copies* of values and that these
copies can safely have alignments that differ from both the source of
the copy and also from the target use inside the called procedure (in
the latter case a second copy to suitably aligned memory might be

Since the ABI has not previously defined rules for parameter passing of
values with alignment modifiers it is possible that existing
implementations will not be 100% compatible with all these rules.
Modifying the compiler to conform may result in a silent code-generation
change.  (There should be no change for types that are naturally
aligned).  We believe this should be very rare and because the ABI has
not previously sanctioned such types they are unlikely to appear at
shared library boundaries.  It may help if compilers could emit a
warning should they detect that a parameter may cause such change in


Definitions used in this description:

(Continue reading)

Vineet Gupta | 3 Jul 15:10 2015

Possible issue with ARC gcc 4.8


I have the following test case (reduced from Linux kernel sources) and it seems
gcc is optimizing away the first loop iteration.

arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps -mA7

static inline int __test_bit(unsigned int nr, const volatile unsigned long *addr)
{
 unsigned long mask;

 addr += nr >> 5;
#if 0
    nr &= 0x1f;
#endif
 mask = 1UL << nr;
 return ((mask & *addr) != 0);
}

int foo (int a, unsigned long *p)
{
  int i;
  for (i = 63; i>=0; i--)
      if (!(__test_bit(i, p)))
      a += i;
  return a;
}
(Continue reading)

Alan Lawrence | 3 Jul 11:37 2015

Re: making the new if-converter not mangle IR that is already vectorizer-friendly

Abe wrote:

> In other words, the problem about which I was concerned is not going to be triggered by e.g. "if (c)  x = ..."
> which lacks an attached "else  x = ..." in a multithreaded program without enough locking just because 'x' is global/static.
> The only remaining case to consider is if some code being compiled takes the address of something thread-local and then "gives"
> that pointer to another thread.  Even for _that_ extreme case, Sebastian says that the gimplifier will detect this
> "address has been taken" situation and do the right thing such that the new if converter also does the right thing.

Great :). I don't understand much/anything about how gcc deals with 
thread-locals, but everything before that, all sounds good...

> [Alan wrote:]
>> Can you give an example?
> The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and "gcc.dg/vect/vect-mask-load-1.c"
> currently test as: the new if-converter is "converting" something that`s already vectorizer-friendly...
> [snip]
> However, TTBOMK the vectorizer already "understands" that in cases where its input looks like:
>    x = c ? y : z;
> ... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to, they must be non-"volatile" --
> it may vectorize a loop containing code like the preceding, ignoring for this particular instance the C mandate
> that only one of {y, z} be evaluated...
(Continue reading)

Sebastian Huber | 2 Jul 21:57 2015

libgomp: Purpose of gomp_thread_pool::last_team?


Does anyone know what the purpose of gomp_thread_pool::last_team is? This field seems to be used to delay
team destruction in gomp_team_end() when the team has more than one thread and the previous team
state has no team associated (does this identify a master thread?):

  if (__builtin_expect (thr->ts.team != NULL, 0)
      || __builtin_expect (team->nthreads == 1, 0))
    free_team (team);
  else
    {
      struct gomp_thread_pool *pool = thr->thread_pool;
      if (pool->last_team)
	free_team (pool->last_team);
      pool->last_team = team;
    }

Why can you not immediately free the team?



(Continue reading)

Armin Rigo | 2 Jul 17:57 2015

%fs and %gs segments on x86/x86-64

Hi all,

I implemented support for %fs and %gs segment prefixes on the x86 and
x86-64 platforms, in what turns out to be a small patch.

For those not familiar with it, at least on x86-64, %fs and %gs are
two special registers that a user program can ask to have added to the
address in any machine instruction.  This is done with a one-byte
instruction prefix, "%fs:" or "%gs:".  The actual value stored in these two
registers cannot quickly be modified (at least before the Haswell
CPU), but the general idea is that they are rarely modified.
Speed-wise, though, an instruction like "movq %gs:(%rdx), %rax" runs
at the same speed as a "movq (%rdx), %rax" would.  (I failed to
measure any difference, but I guess that the instruction is one more
byte in length, which means that a large quantity of them would tax
the instruction caches a bit more.)

For reference, the pthread library on x86-64 uses %fs to point to
thread-local variables.  There are a number of special modes in gcc to
already produce instructions like "movq %fs:(16), %rax" to load
thread-local variables (declared with __thread).  However, this
support is special-case only.  The %gs register is free to use.  (On
x86, %gs is used by pthread and %fs is free to use.)
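
To make the existing special case concrete, here is a small sketch of a __thread variable used from two threads. The names tls_value, worker and tls_after_worker are hypothetical, chosen only for this illustration. On x86-64 Linux one would expect accesses to tls_value to be compiled to %fs-relative loads and stores, but that detail is target-specific and not something the C code itself depends on.

```c
#include <pthread.h>

/* One copy of this variable exists per thread. */
static __thread int tls_value;

static void *worker(void *arg)
{
    /* Writes only the worker thread's copy of tls_value. */
    tls_value = *(const int *)arg;
    return NULL;
}

/* Sets the caller's copy to `mine`, lets another thread store `theirs`
   into its own copy, and returns the caller's copy afterwards.
   Returns -1 if the thread could not be created. */
int tls_after_worker(int mine, int theirs)
{
    pthread_t t;

    tls_value = mine;
    if (pthread_create(&t, NULL, worker, &theirs) != 0)
        return -1;
    pthread_join(t, NULL);
    return tls_value;
}
```

The caller's value survives the worker's store because the two threads address different copies of the variable. Depending on the C library, building this may require -pthread.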

So what I did is to add the __seg_fs and __seg_gs address spaces.  They
are used like this, for example:

    typedef __seg_gs struct myobject_s {
        int a, b, c;
    } myobject_t;
(Continue reading)

Ajit Kumar Agarwal | 2 Jul 12:02 2015

Consideration of Cost associated with SEME regions.


The cost calculation for a spill candidate in the Integrated Register Allocator (IRA) considers only
SESE regions.
The cost calculation in IRA should also take SEME regions into account when making spilling decisions.

The cost associated with a path that has un-matured exits should be lower, making a spilling decision
more likely on that path. A path that has normal exits should have a higher cost than a path with
un-matured exits, and spilling decisions should be made accordingly, so that spills land in the
lower-frequency path with un-matured exits rather than in the higher-frequency path with normal exits.

I would like to propose the above for consideration of cost associated with SEME regions in IRA.


Thanks & Regards

Ajit Kumar Agarwal | 2 Jul 11:18 2015

Transformation from SEME(Single Entry Multiple Exit) to SESE(Single Entry Single Exit)


A region with a single entry and multiple exits (SEME) disables traditional loop optimizations.
Short-circuit evaluation also turns the CFG into a single-entry, multiple-exit region.
Transforming SEME (Single Entry Multiple Exit) regions into SESE (Single Entry Single Exit)
regions enables many loop optimizations.

An approach such as node splitting, which turns SEME regions into SESE regions, is an important
optimization on the CFG that enables transformations of loops and conditionals.

The loop transformations in LLVM use node splitting to convert SEME regions into SESE regions. The presence
of break and goto statements inside loops makes the CFG unstructured, turning it into an SEME region.
Converting such control flow from unstructured to structured enables many loop transformations.
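
As a source-level sketch of what such a conversion achieves (this is a hand-written, flag-based rewrite, not node splitting and not the output of any GCC or LLVM pass), the two hypothetical functions below compute the same result. find_seme leaves its loop through two exits, the loop condition and the break. find_sese folds both into a single keep_going flag, so the loop has one exit test.

```c
/* Single entry, multiple exits: the loop can end either because
   i reaches n or because of the break. Returns n if key is absent. */
int find_seme(const int *a, int n, int key)
{
    int i;
    for (i = 0; i < n; i++) {
        if (a[i] == key)
            break;
    }
    return i;
}

/* Single entry, single exit: both reasons for leaving the loop are
   folded into keep_going, which is the only exit test. */
int find_sese(const int *a, int n, int key)
{
    int i = 0;
    int keep_going = (n > 0);

    while (keep_going) {
        if (a[i] == key) {
            keep_going = 0;
        } else {
            i++;
            keep_going = (i < n);
        }
    }
    return i;
}
```

The single-exit form pays for its structure with an extra variable and an extra assignment per iteration, which is the usual trade-off of this kind of rewrite.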

I would like to implement a transformation phase on loops, run before any loop optimization pass,
that converts an unstructured CFG into a structured CFG, as LLVM does.

Does GCC already have such transformation passes on loops? Please share your thoughts.

Thanks & Regards

DJ Delorie | 2 Jul 06:14 2015

rl78 vs cse vs memory_address_addr_space

In this bit of code in explow.c:

  /* By passing constant addresses through registers
     we get a chance to cse them.  */
  if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
    x = force_reg (address_mode, x);

On the rl78 it results in code that's a bit too complex for later
passes to optimize fully.  Is there any way to indicate that the
above force_reg() is bad for a particular target?

gccadmin | 2 Jul 00:35 2015

gcc-4.9-20150701 is now available

Snapshot gcc-4.9-20150701 is now available on
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 225283

You'll find:

 gcc-4.9-20150701.tar.bz2             Complete GCC


Diffs from 4.9-20150624 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Abe | 2 Jul 00:15 2015

making the new if-converter not mangle IR that is already vectorizer-friendly

Dear all,

[Please feel free to skip to the second instance of "end of introductions"
  and read the introduction sections later or never.]

<personal introduction>

Hi!  My name is Abe.  Although I`m from New York City, I`ve been living in Texas for about 5 years now,
due to having been "sucked in" to Texas by Texas A&M University and staying in Texas for an excellent job
at the Samsung Austin R&D Center ["SARC"], where the compiler team of which I am a part is working on GCC.

</personal introduction>

<topic introduction>

As some of you already know, at SARC we are working on a new "if converter" to help convert
simple "if"-based blocks of code that appear inside loops into an autovectorizer-friendly form
that closely resembles the C ternary operator ["c ? x : y"].  GCC already has such a converter,
but it is off by default, in part because it is unsafe: if enabled, it can cause certain code
to be transformed in such a way that it malfunctions even though the non-converted code worked
just fine with the same inputs.  The new converter, originally by my teammate Sebastian Pop,
is safer [almost-always safe *]; we are working on getting it into good-enough shape that the
always-safe transformations can be turned on by default whenever the autovectorizer is on.
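
To illustrate the shape being described (a hand-written sketch, not the output of either if-converter), the two hypothetical functions below do the same work. clamp_branchy uses an "if"/"else" inside the loop, while clamp_select expresses the same computation in the ternary form. In both, every x[i] is stored unconditionally and every y[i] is loaded unconditionally, so no speculative load or store is introduced by going from one to the other.

```c
#include <stddef.h>

/* Branchy form: control flow inside the loop body. */
void clamp_branchy(int *x, const int *y, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        if (y[i] > limit)
            x[i] = limit;
        else
            x[i] = y[i];
    }
}

/* If-converted form: the body is a single select, c ? x : y. */
void clamp_select(int *x, const int *y, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (y[i] > limit) ? limit : y[i];
}
```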

* Always safe for stores, sometimes a little risky for loads:
   speculative loads might cause multithreaded programs with
   insufficient locking to fail due to writes by another thread
   being "lost"/"missed", even though the same program works OK
   "by luck" when compiled without if-conversion of loads.
   This risk comes mainly/only from what the relevant literature
(Continue reading)