Dave Pitsbawn | 18 Apr 06:00 2015

Does LLVM optimize rudimentary i16 -> i32 conversions?

In my language there are a lot of i16 definitions, but almost all of the time they are widened to i32 because my add operations only happen on i32.

So, to be faithful to my language definition, I emit lots of SExt/ZExt and Trunc instructions pretty much every time I add or subtract.

As soon as I run InstCombine things look much nicer: all the upcasts and downcasts go away. But my test cases are simple.

Is InstCombine pretty good about finding most/all such cases?
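
For context, here is a small standalone C++ sketch (not LLVM code; the function names are made up for illustration) of why the widen/add/truncate pattern can legally be narrowed at all: the low 16 bits of a 32-bit sum depend only on the low 16 bits of the operands. It says nothing about how many such cases InstCombine actually catches.

```cpp
#include <cassert>
#include <cstdint>

// Models "sext i16 -> i32, add i32, trunc i32 -> i16".
uint16_t addViaI32(int16_t a, int16_t b) {
  int32_t wa = a;                         // sign-extend to 32 bits
  int32_t wb = b;                         // sign-extend to 32 bits
  return static_cast<uint16_t>(wa + wb);  // keep only the low 16 bits
}

// Models a plain wrapping "add i16".
uint16_t addNarrow(int16_t a, int16_t b) {
  return static_cast<uint16_t>(static_cast<uint16_t>(a) +
                               static_cast<uint16_t>(b));
}

// Compares both forms over every value of a and a strided sample of b.
bool agreeOnSample() {
  for (int a = -32768; a <= 32767; ++a) {
    for (int b = -32768; b <= 32767; b += 251) {
      int16_t x = static_cast<int16_t>(a);
      int16_t y = static_cast<int16_t>(b);
      if (addViaI32(x, y) != addNarrow(x, y)) return false;
    }
  }
  return true;
}
```

The same argument holds for zero extension and for subtraction, since truncation discards exactly the bits in which the wide and narrow results could differ.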
_______________________________________________
LLVM Developers mailing list
LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Dave Pitsbawn | 18 Apr 02:37 2015

Why would one use SExt vs CreateIntCast?

I'm seeing many APIs which seemingly do the same thing, but they seem to go through slightly different code paths.

When I think of integer casts I think of sign extension, zero extension or truncation. But there also seems to be an IntCast ... which does the same thing?

Why does the CreateIntCast API exist (same for the FP methods as well)?
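
As far as I understand it, CreateIntCast is a convenience wrapper: the caller passes the destination type and a signedness flag, and the helper picks truncation, sign extension, zero extension or nothing based on the bit widths. The standalone C++ model below sketches that selection only; the enum and function names are hypothetical and are not LLVM's.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cast opcodes discussed above.
enum class CastOp { None, Trunc, SExt, ZExt };

// Chooses the integer cast needed to go from srcBits to dstBits.
CastOp pickIntCast(unsigned srcBits, unsigned dstBits, bool isSigned) {
  if (srcBits == dstBits) return CastOp::None;   // same width: nothing to do
  if (srcBits > dstBits) return CastOp::Trunc;   // narrowing
  return isSigned ? CastOp::SExt : CastOp::ZExt; // widening
}
```

A front end that does not statically know whether it is widening or narrowing can call such a helper once instead of branching at every call site, which would explain why the wrapper exists alongside the explicit SExt/ZExt/Trunc methods.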
zhi chen | 18 Apr 02:21 2015

How can I create an SSE intrinsic sqrt?

I want to create a vector version of sqrt as follows.

Value *Approx::CreateFSqrt(IRBuilder<> &builder, Value *v, const char* Name) {
  Type *tys[] = {v->getType()};
  Module* M = currF->getParent();
  Value* sqrtv = Intrinsic::getDeclaration(M, Intrinsic::x86_sse2_sqrt_pd);
  CallInst *CI = builder.CreateCall(sqrtv, v, Name);

  return CI;
}

Here Value *v is <2 x double>.
However, it triggers the assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"'. Any idea?
zhi chen | 18 Apr 02:19 2015

How can I create an SSE intrinsic sqrt?

I want to create a vector version sqrt 

Ivan Baev | 17 Apr 23:13 2015

RFC: Indirect Call Promotion LLVM Pass

Hi, we've implemented an indirect call promotion llvm pass. The design
notes including examples are shown below. This pass complements the
indirect call profile infrastructure
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html

Your feedback and comments will be highly appreciated.

Thanks,
Ivan

============================================================================
RFC: Indirect Call Promotion LLVM Pass
Betul Buyukkurt and Ivan Baev

1. Introduction
Indirect call promotion (ICP) replaces an indirect call instruction to a
set of target addresses with a sequence of tests guarding direct calls to
selected targets, plus a fall through branch containing the original
indirect call. The ICP optimization is found to be the second most
profitable (after inlining) profile-based optimization in a recent study
[2].

We've implemented an ICP LLVM pass that iterates over all indirect call
sites in the module and selectively (under heuristics) performs the
promotion. Here is one example of the transformation.

--------------------ico.ll--------------------------------------------------
define void @foo(i32 %a) {
entry:
  %a1 = add i32 %a, 1
  ret void
}

define void @bar(i32 %a) {
entry:
  %a2 = add i32 %a, 2
  ret void
}

define void @main(void (i32)* %fun) {
entry:
  call void %fun(i32 10), !prof !1
  ret void
}

!1 = !{!"indirect_call_targets", i64 6000, !"foo", i64 5000, !"bar", i64 100}
----------------------------------------------------------------------------

> opt -ic-opt ico.ll -o ico-post.bc

--------------------ico-post.ll---------------------------------------------
define void @foo(i32 %a) {
entry:
  %a1 = add i32 %a, 1
  ret void
}

define void @bar(i32 %a) {
entry:
  %a2 = add i32 %a, 2
  ret void
}

define void @main(void (i32)* %fun) {
entry:
  %0 = bitcast void (i32)* %fun to i8*
  %1 = bitcast void (i32)* @foo to i8*
  %2 = icmp eq i8* %0, %1
  br i1 %2, label %if.true, label %if.false, !prof !0

if.merge:                                         ; preds = %if.false, %if.true
  ret void

if.true:                                          ; preds = %entry
  call void @foo(i32 10) #0
  br label %if.merge

if.false:                                         ; preds = %entry
  call void %fun(i32 10), !prof !1
  br label %if.merge
}

attributes #0 = { inlinehint }
!0 = !{!"branch_weights", i32 5000, i32 1000}
!1 = !{!"indirect_call_targets", i64 1000, !"bar", i64 100}
----------------------------------------------------------------------------

The ICP pass handles indirect call and indirect invoke LLVM IR
instructions. It depends on the availability of indirect call metadata
provided by the indirect call profile infrastructure briefly described at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
Here is the new indirect call (IC) metadata type used in the example above:

 !1 = !{!"indirect_call_targets", i64 6000, !"foo", i64 5000, !"bar", i64 100}

The 6000 represents the number of times the indirect call was executed
during the profiling runs, of which function foo was the receiver 5000
times, and function bar was the receiver 100 times.

The input for ICP pass:
LLVM IR including IC metadata, (future) function entry count metadata

The output:
Modified LLVM IR with modified indirect call sites. Selected indirect call
targets are promoted and inline hint attributes are added subject to
heuristics.

2. Aigner & Hölzle-based heuristics
------------------------------------
We've implemented two heuristics from this paper [1].

a. Hot call site heuristic: only consider for promotion a call site whose
contribution (profile count) is more than 0.1% of the total count of all
indirect calls executed during the profile runs.

b. Hot target heuristic: promote the most frequent target at a call site
if it gets at least 40% of all receivers at the call site.

Only the hottest target from a call site is possibly promoted, similarly
to the approach taken in the paper.

In addition to Aigner & Hölzle-based heuristics, we add an inline hint to
the promoted direct call/invoke instruction if it is the single receiver
at the call site according to the profile data or the number of times the
target is executed at the call site is at least 4% of the total count of
all indirect calls.  Once the function entry profile counts become
available we will use them to tune the above inline-related heuristic.
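
The thresholds above can be sketched in standalone C++ as follows. This is only an illustration of the three checks as described in this section, not the pass's code; the struct and function names are hypothetical, and totalCount is assumed to be the sum of the counts of all profiled indirect call sites.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one profiled indirect call site.
struct Target {
  std::string name;
  uint64_t count;  // times this function was the receiver
};

struct CallSite {
  uint64_t count;               // times the indirect call executed
  std::vector<Target> targets;  // profiled receivers
};

struct Decision {
  bool promote = false;
  bool inlineHint = false;
  std::string target;        // promoted callee, if any
  uint64_t trueWeight = 0;   // weight of the direct-call branch
  uint64_t falseWeight = 0;  // weight of the fall-through branch
};

Decision decide(const CallSite &cs, uint64_t totalCount) {
  Decision d;
  if (cs.targets.empty() || cs.count == 0 || totalCount == 0) return d;
  // (a) Hot call site: more than 0.1% of all indirect call executions.
  if (cs.count * 1000 <= totalCount) return d;
  // Only the hottest target is a candidate.
  const Target *hot = &cs.targets[0];
  for (const Target &t : cs.targets)
    if (t.count > hot->count) hot = &t;
  // (b) Hot target: at least 40% of the executions at this site.
  if (hot->count * 100 < cs.count * 40) return d;
  d.promote = true;
  d.target = hot->name;
  d.trueWeight = hot->count;
  d.falseWeight = cs.count > hot->count ? cs.count - hot->count : 0;
  // Inline hint: single receiver, or at least 4% of all indirect calls.
  d.inlineHint = cs.targets.size() == 1 || hot->count * 100 >= totalCount * 4;
  return d;
}
```

Fed the call site from the first example (6000 executions, foo 5000, bar 100, and a total of 6000), this selects foo, yields branch weights 5000 and 1000, and sets the inline hint, which agrees with ico-post.ll.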

3. Handling virtual function calls
-----------------------------------

Consider the following C++ example involving virtual functions and calls.
----------------------------------------------------------------------------
class Operation {
  public:
    virtual int test_add(int a, int b)=0;
    virtual int test_sub(int a, int b)=0;
};

class A : public Operation {
  public:
  int test_add(int a, int b) {
    return a + b;
  }
  int test_sub(int a, int b) {
    return a - b;
  }
};

A myA;

int __attribute__ ((noinline)) testmain(int (A::*fptr)(int, int)) {
  return (myA.*fptr)(1, 2);
}
----------------------------------------------------------------------------

The debugging output of the ICP pass for this example is as follows:

----------------------------------------------------------------------------
**** INDIRECT CALL OPTIMIZATION ****
IC target hotness threshold = 40
Total IC execution count = 1.000000e+00

Attempting IC_opt on:   %call = call i32 %5(%class.A* %this.adjusted, i32 1, i32 2), !prof !8
with: !{!"indirect_call_targets", i64 1, !"_ZN1A8test_subEii", i64 1} in function: _Z8testmainM1AFiiiE
CS hotness% = 1.000000e+02
Target hotness% = 100

== Basic Block Before ==
memptr.end:                                       ; preds = %memptr.nonvirtual, %memptr.virtual
  %5 = phi i32 (%class.A*, i32, i32)* [ %memptr.virtualfn, %memptr.virtual ], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
  %call = call i32 %5(%class.A* %this.adjusted, i32 1, i32 2), !prof !8
  ret i32 %call

...
!8 = !{!"indirect_call_targets", i64 1, !"_ZN1A8test_subEii", i64 1}

== Basic Blocks After == // code after the ICP pass
memptr.end:                                       ; preds = %memptr.nonvirtual, %memptr.virtual
  %5 = phi i32 (%class.A*, i32, i32)* [ %memptr.virtualfn, %memptr.virtual ], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
  %6 = bitcast i32 (%class.A*, i32, i32)* %5 to i8*
  %7 = bitcast i32 (%class.A*, i32, i32)* @_ZN1A8test_subEii to i8*
  %8 = icmp eq i8* %6, %7
  br i1 %8, label %if.true, label %if.false, !prof !8

if.true:                                          ; preds = %memptr.end
  %10 = call i32 @_ZN1A8test_subEii(%class.A* %this.adjusted, i32 1, i32 2) #3
  br label %if.merge

if.false:                                         ; preds = %memptr.end
  %call = call i32 %5(%class.A* %this.adjusted, i32 1, i32 2), !prof !9
  br label %if.merge

if.merge:                                         ; preds = %if.false, %if.true
  %9 = phi i32 [ %10, %if.true ], [ %call, %if.false ]
  ret i32 %9

...
attributes #3 = { inlinehint }
!8 = !{!"branch_weights", i32 1, i32 0}
!9 = !{!"indirect_call_targets", i64 0}
----------------------------------------------------------------------------

At the LLVM IR level the ICP pass sees virtual function calls as normal
indirect calls, and proceeds as in the first example. Currently ICP is
oblivious to vtables and virtual function support. On a more complicated
example found in the eon benchmark, one future enhancement is to identify
that an indirect call is virtual and change the comparison (shown at a
higher IR level) from

  if (ptr->foo == A::foo)
to
  if (ptr->_vptr == A::_vtable)

This will sink one load from the original block into the less frequently
executed if.false block. This opportunity was found by Balaram Makam.

4. New enhancement patch
-------------------------
Currently our implementation has the following shortcomings:
a. Our heuristics do not depend on the global information on function
counts. It could be that none of the indirect call sites are contributing
highly to the overall calls. Because our current implementation is
deciding what to inline based on the indirect call site sum only, it could
be inlining functions that are in essence cold when all functions in the
source base are considered. This situation will be improved when the
function entry profile counts become available in llvm IR.

b. The current implementation only transforms the first hot target; the rest
of the targets are never considered even if they are relatively hot.

We are evaluating a new solution which depends on the
presence/availability of function counts in clang. We form a sorted
multiset of all function counts. A given indirect target is considered
for inlining if the target’s count at the call site falls within one of
the ranges that form the top 0-10%, 10-20% or 20-30% of the sorted
multiset.  We’ve added checks which become stricter as the target count
falls farther away from the top most called 10%, 20% or 30% of all
functions respectively.

Targets that are classified as making calls to one of the top most called
30% of the functions receive inline hints.  Inline hints are communicated
from clang down to LLVM in metadata. Then, on the LLVM side the
transformation pass uses the metadata field for the hint to add an inline
hint at the transformed call site.
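
One possible reading of the bucketing step, sketched in standalone C++. The function name and the exact cut-off rule (the count found at the 10%, 20% and 30% positions of the descending order) are assumptions made for illustration; the stricter per-bucket checks mentioned above are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Returns 0, 1 or 2 when `count` falls in the top 0-10%, 10-20% or 20-30%
// of all function counts, and -1 when it falls outside the top 30%.
int hotnessBucket(std::vector<uint64_t> counts, uint64_t count) {
  if (counts.empty()) return -1;
  // Sort descending so the hottest functions come first.
  std::sort(counts.begin(), counts.end(), std::greater<uint64_t>());
  for (int bucket = 0; bucket < 3; ++bucket) {
    // Number of entries inside the top (bucket + 1) * 10%.
    std::size_t n = counts.size() * static_cast<std::size_t>(bucket + 1) / 10;
    if (n == 0) continue;  // too few functions to form this range
    if (count >= counts[n - 1]) return bucket;
  }
  return -1;
}
```

With function counts 1 through 20, a target count of 19 lands in bucket 0, 17 in bucket 1, 15 in bucket 2, and 14 is outside the top 30%.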

-------------------------
[1] G. Aigner and U. Hölzle. Eliminating virtual function calls in C++
programs. ECOOP, 1996.
[2] X. Li, R. Ashok, R. Hundt. Lightweight Feedback-Directed Cross-Module
Optimization. CGO, 2010.
Rafael Espíndola | 17 Apr 20:42 2015

Re: [cfe-dev] A problem with names that can not be demangled.

So it looks like just adding a "." would work. My preference would be to
do it for all Values and update the tests.

Also, the code currently looks like

-------------------------
  unsigned BaseSize = UniqueName.size();
  while (1) {
    // Trim any suffix off and append the next number.
    UniqueName.resize(BaseSize);
    raw_svector_ostream(UniqueName) << ++LastUnique;
-------------------------

Which means that currently if a name passes here multiple times we get

foo -> foo1 -> foo12

With the '.' we could change BaseSize to get

foo -> foo.1 -> foo.2
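
A standalone C++ model of the two behaviours, for illustration only. It is not the ValueSymbolTable code; NameTable, its members and the suffix-stripping rule are assumptions about what "change BaseSize" could mean.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// True if every character of s from position `from` onward is a digit.
static bool allDigits(const std::string &s, std::size_t from) {
  for (std::size_t i = from; i < s.size(); ++i)
    if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
  return true;
}

// Hypothetical stand-in for a symbol table that renames on collision.
struct NameTable {
  std::set<std::string> names;
  unsigned lastUnique = 0;
  bool dotted = false;  // false: current scheme, true: proposed '.' scheme

  std::string insert(const std::string &name) {
    if (names.insert(name).second) return name;  // no collision
    std::string base = name;
    if (dotted) {
      // Drop an existing ".N" suffix so numbers do not pile up.
      std::size_t dot = base.rfind('.');
      if (dot != std::string::npos && dot + 1 < base.size() &&
          allDigits(base, dot + 1))
        base.resize(dot);
    }
    for (;;) {
      // Append the next number until the name is free.
      std::string candidate =
          base + (dotted ? "." : "") + std::to_string(++lastUnique);
      if (names.insert(candidate).second) return candidate;
    }
  }
};
```

Inserting "foo", "foo" and then the renamed result again gives foo, foo1, foo12 with dotted == false and foo, foo.1, foo.2 with dotted == true. Without the separator there is no reliable way to tell a generated suffix from digits that were part of the original name, which is why the stripping is only done in the dotted mode.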

Cheers,
Rafael

On 16 April 2015 at 16:46, Cary Coutant <ccoutant <at> gmail.com> wrote:
> GCC has a generalized mangling syntax for cloned functions. See GCC PR 40831:
>
>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40831
>
> and the discussion thread here:
>
>    https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01375.html
>
> -cary
>
>
>
> On Tue, Apr 14, 2015 at 12:44 PM, David Blaikie <dblaikie <at> gmail.com> wrote:
>> Adding llvm-dev as that might be a more suitable audience for this
>> discussion.
>>
>> (& I know Lang's been playing around with the same problem in the Orc JIT,
>> so adding him too)
>>
>> Is there any basis/reason to believe that the .X suffix is a better, more
>> principled one than straight X? Is that documented somewhere as a thing the
>> demangling tools will ignore?
>>
>> On Tue, Apr 14, 2015 at 12:06 PM, Srivastava, Sunil
>> <sunil_srivastava <at> playstation.sony.com> wrote:
>>>
>>> Hi,
>>>
>>> We are running into a problem created by renaming of static symbols by
>>> llvm-link.  It first showed up using LTO, but we can illustrate this by
>>> using llvm-link as well.
>>>
>>> Say we have two files with the same named static symbol Bye
>>>
>>> --------------- t1.cpp ---------
>>> static void Bye(int* ba1) { ba1[0] /= ba1[2] - 2; }
>>> void main_a( int* inB) { void (*func)(int*) = Bye; func(inB); }
>>> --------------- t2.cpp ---------
>>> static void Bye(int* ba1) { ba1[0] *= ba1[2] + 2; }
>>> void main_b( int* inB) { void (*func)(int*) = Bye; func(inB+1); }
>>>
>>> --------- cmd sequence -------
>>> $ clang++ -c -emit-llvm t1.cpp -o t1.bc
>>> $ clang++ -c -emit-llvm t2.cpp -o t2.bc
>>> $ llvm-link t1.bc t2.bc -o t23.bc
>>> $ clang -c t23.bc
>>> $ nm t23.o
>>>
>>> t1.o and t2.o have the same named function “_ZL3ByePi”. In order to
>>> distinguish them, one gets a ‘1’ appended to it, making it “_ZL3ByePi1”.
>>>
>>> While the code is all correct, the problem is that this modified name
>>> cannot be demangled.
>>>
>>> That is what I am trying to fix.
>>>
>>> In similar situations gcc appends a ‘.’ before appending the
>>> discriminating number, making “_ZL3ByePi.1”
>>>
>>> The following change in lib/IR/ValueSymbolTable.cpp seems to fix this
>>> problem.
>>>
>>> ------------ start diff -------------------
>>> @@ -54,5 +54,5 @@ void ValueSymbolTable::reinsertValue(Value* V) {
>>>      // Trim any suffix off and append the next number.
>>>      UniqueName.resize(BaseSize);
>>> -    raw_svector_ostream(UniqueName) << ++LastUnique;
>>> +    raw_svector_ostream(UniqueName) << "." << ++LastUnique;
>>>
>>>      // Try insert the vmap entry with this suffix.
>>> -------------- end diff ---------------------
>>>
>>> However it causes 60 test failures. These are tests where some names that
>>> are expecting to get a plain numeric suffix now have a ‘.’ before it.
>>> These are all local symbols, so I think the generated code will always be
>>> correct, but the tests as written do not pass. For example, take
>>> test/CodeGen/ARM/global-merge-addrspace.ll
>>>
>>> ; RUN: llc < %s -mtriple=thumb-apple-darwin -O3 | FileCheck %s
>>> ; Test the GlobalMerge pass. Check that the pass does not crash when using
>>> ; multiple address spaces.
>>> ; CHECK: _MergedGlobals:
>>> @g1 = internal addrspace(1) global i32 1
>>> @g2 = internal addrspace(1) global i32 2
>>> ; CHECK: _MergedGlobals1:
>>> @g3 = internal addrspace(2) global i32 3
>>> @g4 = internal addrspace(2) global i32 4
>>>
>>> With my change, the symbol is named MergedGlobals.1, hence it fails this
>>> test.
>>>
>>> I could change these 60 tests to match the new behavior. That will fix
>>> these 60 failures. However, I do have a concern that there may be other
>>> places in llvm that expect the names to be pure identifiers. Adding a ‘.’
>>> may cause them to fail. No such failure has been seen in running the whole
>>> clang test, but the concern is still there.
>>>
>>> I should note that even local symbols are treated similarly, so for
>>> example, a parameter named ‘str’ becomes ‘str.1’ with my change, instead
>>> of ‘str1’ currently (an actual example from a test).
>>>
>>> Alternatively, I could try to limit my change to just mangled names.
>>>
>>> Any suggestion about how this should be fixed?
>>>
>>> There is another similar change about 40 lines below in
>>> ValueSymbolTable::createValueName(). That is not needed to fix this
>>> particular problem, but looks similar, so perhaps should be treated
>>> similarly for consistency. It causes 66 more failures of the same nature
>>> though.
>>>
>>> Thanks
>>>
>>> Sunil Srivastava
>>> Sony Computer Entertainment
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev <at> cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Dave Pitsbawn | 17 Apr 09:43 2015

Is bitcast now needed in LLVM?

It seems a recent change in LLVM has made a bitcast such as bitcast i8* %1 to %Foo* meaningless?

If I'm correct, is there any need for the bitcast anymore?
Xin Tong | 17 Apr 06:57 2015

Is Padding in Aggregates Useful?

Hi

I see that LLVM does not scalar-replace aggregates used as the src or
dst of a memcpy. The reason given is that the memcpy could be moving
around elements that live in structure padding of the LLVM types. I
wonder what could live in the padding and yet actually be used?

  // Okay, we know all the users are promotable.  If the aggregate is a memcpy
  // source and destination, we have to be careful.  In particular, the memcpy
  // could be moving around elements that live in structure padding of the LLVM
  // types, but may actually be used.  In these cases, we refuse to promote the
  // struct.
  if (Info.isMemCpySrc && Info.isMemCpyDst &&
      HasPadding(AI->getAllocatedType(), DL))
    return false;

Thanks
Trent
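
To make the concern in that comment concrete, here is a small standalone C++ sketch (not LLVM code; the struct and helper names are made up). It assumes a typical ABI where an int is aligned more strictly than a char, so that Padded has padding bytes after its first member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A struct that has padding after `tag` on typical ABIs.
struct Padded {
  char tag;
  int value;
};

// Number of padding bytes between `tag` and `value` on this ABI.
std::size_t paddingBytes() { return offsetof(Padded, value) - sizeof(char); }

// Stores `marker` in the byte right after `tag`, copies the whole object
// with memcpy, and returns that same byte as seen in the copy.
unsigned char paddingByteAfterMemcpy(unsigned char marker) {
  Padded src;
  std::memset(&src, 0, sizeof src);
  unsigned char *raw = reinterpret_cast<unsigned char *>(&src);
  raw[sizeof(char)] = marker;  // a padding byte when paddingBytes() > 0
  Padded dst;
  std::memcpy(&dst, &src, sizeof dst);
  return reinterpret_cast<unsigned char *>(&dst)[sizeof(char)];
}
```

memcpy carries the marker across because it copies every byte of the object. A field-by-field copy of tag and value, which is roughly what a scalar-replaced aggregate amounts to, has no obligation to do so. Code that stores data in such bytes through a differently typed view of the same memory would therefore observe a difference.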
zhi chen | 17 Apr 02:38 2015

how to use "new instruction()"

I read the tutorial document, but I didn't understand the it and Ops parameters of the Instruction class well. Can anyone give me an example?

Instruction *newInstr = new Instruction(Type *ty, unsigned it, Use *Ops, unsigned NumOps, Instruction *InsertBefore);

For example, I have an instruction *pInst whose operation type I do not know in advance. If I want to create a new instruction which is similar to pInst, how can I do it?

Any help will be appreciated. Thanks.

Best,
Zhi
Geof Sawaya | 16 Apr 22:43 2015

bytecode stripping from clang -emit-llvm

Hi Devs,

I'm developing a tool that relies on semantic information in bytecode labels (i.e. block names).

I've discovered that clang is stripping these named labels (along with some virtual register names) when I run on a virtual machine.  Well, I'm using VirtualBox, and have tried two different versions of Ubuntu and some different clang builds.

Can someone point me in the right direction to understand why the IR would be emitted differently because clang is running on a VM?

Many thanks -- Geof
Xinyu Wang | 16 Apr 04:06 2015

error building LLVM

Hello, 

I have an error in building LLVM:

In file included from /.../llvm/projects/compiler-rt/lib/profile/GCDAProfiling.c:23:
In file included from /usr/include/errno.h:36:
In file included from /usr/include/bits/errno.h:25:
/usr/include/linux/errno.h:4:10: fatal error: 'asm/errno.h' file not found
#include <asm/errno.h>

Can anyone kindly help me with this?

Best,
Xinyu

