Evan Cheng | 1 Apr 2008 01:32
Picon
Favicon

Re: reg_iterator Caveats


On Mar 31, 2008, at 2:53 PM, David Greene wrote:

> On Monday 31 March 2008 00:57, Chris Lattner wrote:
>> On Mar 30, 2008, at 10:42 PM, David A. Greene wrote:
>>>> SSA form, it is reasonable to say "give me the first def" and  
>>>> expect
>>>> it to be the only def.  For multiply defined values like physregs,
>>>> this is not true, because the reg can have multiple defs.
>>>
>>> Gotcha.  This is exactly what I want.  Thanks for the explanation.
>>>
>>> For non-SSA values, is there some indication of which defs reach  
>>> which
>>> uses?  I don't need this right now but I can imagine using it in the
>>> future.
>>
>> The reg def/kill/dead flags are all that there is.
>
> I just discovered that def_itterator (and presumably, reg_iterator)  
> doesn't
> include implicit defs, for example at function calls for caller-save  
> physical
> registers.  Guh.  I'm not sure if it should or not, but it's certainly
> necessary information in some cases.  Is this expected behavior, or an
> oversight?

MachineRegisterInfo tracks virtual register only.

I also wish it would track physical register defs and uses as well. It  
(Continue reading)

Chris Lattner | 1 Apr 2008 01:55

Re: reg_iterator Caveats

On Mon, 31 Mar 2008, Evan Cheng wrote:
>> I just discovered that def_itterator (and presumably, reg_iterator)
>> doesn't
>> include implicit defs, for example at function calls for caller-save
>> physical
>> registers.  Guh.  I'm not sure if it should or not, but it's certainly
>> necessary information in some cases.  Is this expected behavior, or an
>> oversight?

reg iterators will return everything that is in the function.  If the 
implicit operands haven't been added to the machieninstrs yet, then they 
won't be returned.

> MachineRegisterInfo tracks virtual register only.

It works for vregs and pregs today.

> I also wish it would track physical register defs and uses as well. It
> can be used to simplify a lot of code (in livevariable, etc.). Chris,
> do you think that's feasible?

Really really feasible :)

-Chris

--

-- 
http://nondot.org/sabre/
http://llvm.org/
Bobby Powers | 1 Apr 2008 01:46
Picon
Gravatar

Additional Optimization I'm Missing?

Hello, I'm working on using the LLVM JIT for a little project of mine, amazing work first off!  I have a question about optimization passes.  I initially have this function I've created, in python it looks like this:


OS_end = 50 OS_start = 0 OS_timestep = 1 birth_rate = .3 population = 30.0 for time in range(OS_start, OS_end, OS_timestep): births = birth_rate * population deaths = 0.1 * population population_NEXT = population + births - deaths #generally put print statements here #updating stocks population = population_NEXT
Then I can turn it into LLVM IR:
define void <at> simulate() { entry: %time = alloca double ; <double*> [#uses=4] %population_next = alloca double ; <double*> [#uses=2] %population = alloca double ; <double*> [#uses=5] %net_change = alloca double ; <double*> [#uses=2] %deaths = alloca double ; <double*> [#uses=2] %births = alloca double ; <double*> [#uses=2] %birth_rate = alloca double ; <double*> [#uses=2] %OS_timestep = alloca double ; <double*> [#uses=2] %OS_start = alloca double ; <double*> [#uses=2] %OS_end = alloca double ; <double*> [#uses=2] store double 5.000000e+01, double* %OS_end store double 0.000000e+00, double* %OS_start store double 1.000000e+00, double* %OS_timestep store double 3.000000e-01, double* %birth_rate store double 3.000000e+01, double* %population %OS_start1 = load double* %OS_start ; <double> [#uses=1] store double %OS_start1, double* %time br label %forcond forcond: ; preds = %forinc, %entry %time2 = load double* %time ; <double> [#uses=1] %OS_end3 = load double* %OS_end ; <double> [#uses=1] %forcond4 = fcmp olt double %time2, %OS_end3 ; <i1> [#uses=1] br i1 %forcond4, label %forbody, label %forafter forbody: ; preds = %forcond %birth_rate5 = load double* %birth_rate ; <double> [#uses=1] %population6 = load double* %population ; <double> [#uses=1] %multmp = mul double %birth_rate5, %population6 ; <double> [#uses=1] store double %multmp, double* %births %population7 = load double* %population ; <double> [#uses=1] %multmp8 = mul double 1.000000e-01, %population7 ; <double> [#uses=1] store double %multmp8, double* %deaths %births9 = load double* %births ; <double> [#uses=1] %deaths10 = load double* %deaths ; <double> [#uses=1] %subtmp = sub double %births9, %deaths10 ; <double> [#uses=1] store double %subtmp, double* %net_change %population11 = load double* %population ; <double> [#uses=1] %net_change12 = load double* %net_change ; <double> [#uses=1] %addtmp = add double %population11, %net_change12 ; <double> [#uses=1] store double %addtmp, double* %population_next br label %updatestocks updatestocks: ; preds = %forbody %population_next13 = load double* %population_next ; <double> [#uses=1] store double %population_next13, double* %population br label %forinc forinc: ; preds = %updatestocks %time14 = load double* %time ; <double> [#uses=1] %OS_timestep15 = load double* %OS_timestep ; <double> [#uses=1] %new_time = add double %time14, %OS_timestep15 ; <double> [#uses=1] store double %new_time, double* %time br label %forcond forafter: ; preds = %forcond ret void }


And then I run the optimizations from the Kaleidoscope tutorial (mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg) and get the following:
define void <at> simulate() { entry: br label %forcond forcond: ; preds = %forbody, %entry %population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp, %forbody ] ; <double> [#uses=3] %time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, %forbody ] ; <double> [#uses=2] %forcond4 = fcmp olt double %time.0, 5.000000e+01 ; <i1> [#uses=1] br i1 %forcond4, label %forbody, label %forafter forbody: ; preds = %forcond %multmp = mul double %population.0, 3.000000e-01 ; <double> [#uses=1] %multmp8 = mul double %population.0, 1.000000e-01 ; <double> [#uses=1] %subtmp = sub double %multmp, %multmp8 ; <double> [#uses=1] %addtmp = add double %population.0, %subtmp ; <double> [#uses=1] %new_time = add double %time.0, 1.000000e+00 ; <double> [#uses=1] br label %forcond forafter: ; preds = %forcond ret void }

The thing is, in forbody it is still doing:
subtmp = population + (population * .3) - (population * .1)

ideally I would love to see it reduced to:
subtmp = 1.2 * population


I've played around with adding a bunch of optimizations that sound good from [http://www.llvm.org/docs/Passes.html], but I must admit I'm a bit over my head here. Does anyone have any suggestions? Am I missing passes, do I need to restructure the IR, or am I missing some concept?

thanks! (keep up the amazing work)
yours,
Bobby Powers
_______________________________________________
LLVM Developers mailing list
LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Jon Sargeant | 1 Apr 2008 02:07
Picon

Re: Reference Manual Clarifications

Gordon Henriksen wrote:
> Hi Jon,
> 
> Please you'll want to submit patches as unified diffs and as  
> attachments.
> 
> I notice you're using Thunderbird, so I refer you to this tip:
> 
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-January/011992.html
> 
> Although this note doesn't apply to how you included your original  
> patch (looks like you pasted it in), Thunderbird has default  
> attachment handling settings which cause problems for many of our  
> reviewers.
> 
> Thanks,
> Gordon

Ok, did I get it right this time?
--- langref1.html	2008-03-31 11:44:59.171875000 -0700
+++ langref2.html	2008-03-31 12:15:42.562500000 -0700
 <at>  <at>  -1569,8 +1569,10  <at>  <at> 

   <dd>Floating point constants use standard decimal notation (e.g. 123.421),
   exponential notation (e.g. 1.23421e+2), or a more precise hexadecimal
-  notation (see below).  Floating point constants must have a <a
-  href="#t_floating">floating point</a> type. </dd>
+  notation (see below).  The assembler requires the exact decimal value of
+  a floating-point constant.  For example, the assembler accepts 1.25 but
+  rejects 1.3 because 1.3 is a repeating decimal in binary.  Floating point
+  constants must have a <a href="#t_floating">floating point</a> type. </dd>

   <dt><b>Null pointer constants</b></dt>

 <at>  <at>  -2175,11 +2177,10  <at>  <at> 
 <div class="doc_subsection"> <a name="binaryops">Binary Operations</a> </div>
 <div class="doc_text">
 <p>Binary operators are used to do most of the computation in a
-program.  They require two operands, execute an operation on them, and
+program.  They require two operands of the same type, execute an operation on them, and
 produce a single value.  The operands might represent
 multiple data, as is the case with the <a href="#t_vector">vector</a> data type.
-The result value of a binary operator is not
-necessarily the same type as its operands.</p>
+The result value has the same type as its operands.</p>
 <p>There are several different binary operators:</p>
 </div>
 <!-- _______________________________________________________________________ -->
 <at>  <at>  -2329,7 +2330,7  <at>  <at> 
 of the values in which case the elements must be integers.</p>

 <h5>Semantics:</h5>
-<p>The value produced is the signed integer quotient of the two operands.</p>
+<p>The value produced is the signed integer quotient of the two operands rounded towards zero.</p>
 <p>Note that signed integer division and unsigned integer division are distinct
 operations; for unsigned integer division, use '<tt>udiv</tt>'.</p>
 <p>Division by zero leads to undefined behavior. Overflow also leads to
 <at>  <at>  -2383,8 +2384,7  <at>  <at> 

 <h5>Semantics:</h5>
 <p>This instruction returns the unsigned integer <i>remainder</i> of a division.
-This instruction always performs an unsigned division to get the remainder,
-regardless of whether the arguments are unsigned or not.</p>
+This instruction always performs an unsigned division to get the remainder.</p>
 <p>Note that unsigned integer remainder and signed integer remainder are
 distinct operations; for signed integer remainder, use '<tt>srem</tt>'.</p>
 <p>Taking the remainder of a division by zero leads to undefined behavior.</p>
 <at>  <at>  -2455,7 +2455,8  <at>  <at> 

 versions of floating point values.</p>
 <h5>Semantics:</h5>
-<p>This instruction returns the <i>remainder</i> of a division.</p>
+<p>This instruction returns the <i>remainder</i> of a division.
+The remainder has the same sign as the dividend.</p>
 <h5>Example:</h5>
 <pre>  &lt;result&gt; = frem float 4.0, %var          <i>; yields {float}:result = 4.0 % %var</i>

 <at>  <at>  -2469,9 +2470,8  <at>  <at> 
 <p>Bitwise binary operators are used to do various forms of
 bit-twiddling in a program.  They are generally very efficient
 instructions and can commonly be strength reduced from other
-instructions.  They require two operands, execute an operation on them,
-and produce a single value.  The resulting value of the bitwise binary
-operators is always the same type as its first operand.</p>
+instructions.  They require two operands of the same type, execute an operation on them,
+and produce a single value.  The resulting value is the same type as its operands.</p>
 </div>

 <!-- _______________________________________________________________________ -->
 <at>  <at>  -2496,9 +2496,9  <at>  <at> 

 <h5>Semantics:</h5>

-<p>The value produced is <tt>var1</tt> * 2<sup><tt>var2</tt></sup>.  If
-<tt>var2</tt> is (statically or dynamically) equal to or larger than the number
-of bits in <tt>var1</tt>, the result is undefined.</p>
+<p>The value produced is <tt>var1</tt> * 2<sup><tt>var2</tt></sup> mod 2<sup>n</sup>,
+where n is the width of the result.  If <tt>var2</tt> is (statically or dynamically) negative or
+equal to or larger than the number of bits in <tt>var1</tt>, the result is undefined.</p>

 <h5>Example:</h5><pre>
   &lt;result&gt; = shl i32 4, %var   <i>; yields {i32}: 4 &lt;&lt; %var</i>
 <at>  <at>  -2529,7 +2529,7  <at>  <at> 

 <p>This instruction always performs a logical shift right operation. The most
 significant bits of the result will be filled with zero bits after the
-shift.  If <tt>var2</tt> is (statically or dynamically) equal to or larger than
+shift.  If <tt>var2</tt> is (statically or dynamically) negative or equal to or larger than
 the number of bits in <tt>var1</tt>, the result is undefined.</p>

 <h5>Example:</h5>
 <at>  <at>  -2564,7 +2564,7  <at>  <at> 
 <h5>Semantics:</h5>
 <p>This instruction always performs an arithmetic shift right operation,
 The most significant bits of the result will be filled with the sign bit
-of <tt>var1</tt>.  If <tt>var2</tt> is (statically or dynamically) equal to or
+of <tt>var1</tt>.  If <tt>var2</tt> is (statically or dynamically) negative or equal to or
 larger than the number of bits in <tt>var1</tt>, the result is undefined.
 </p>

_______________________________________________
LLVM Developers mailing list
LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
me22 | 1 Apr 2008 02:08
Picon

Re: Additional Optimization I'm Missing?

On Mon, Mar 31, 2008 at 7:46 PM, Bobby Powers <bobbypowers <at> gmail.com> wrote:
>
> The thing is, in forbody it is still doing:
> subtmp = population + (population * .3) - (population * .1)
>
>
> ideally I would love to see it reduced to:
> subtmp = 1.2 * population
>

I expect that such a transformation is not safe in general thanks to
rounding errors, and as such isn't applied.  AFAIK, compilers usually
don't reorganize floating point operations.

Have you tried it on other compilers?  I expect it'll never happen
without some imprecise math option (like gcc's -ffast-math).
Dale Johannesen | 1 Apr 2008 02:10
Picon
Favicon

Re: Additional Optimization I'm Missing?


On Mar 31, 2008, at 4:46 PM, Bobby Powers wrote:

Hello, I'm working on using the LLVM JIT for a little project of mine, amazing work first off!  I have a question about optimization passes.  I initially have this function I've created, in python it looks like this:

OS_end = 50 OS_start = 0 OS_timestep = 1 birth_rate = .3 population = 30.0 for time in range(OS_start, OS_end, OS_timestep): births = birth_rate * population deaths = 0.1 * population population_NEXT = population + births - deaths

The thing is, in forbody it is still doing:
subtmp = population + (population * .3) - (population * .1)

ideally I would love to see it reduced to:
subtmp = 1.2 * population

This transformation changes the result due to roundoff errors, and would be incorrect.  
In C using %lf the printed values of the 2 expressions diverge on the 84th iteration.

I've played around with adding a bunch of optimizations that sound good from [http://www.llvm.org/docs/Passes.html], but I must admit I'm a bit over my head here. Does anyone have any suggestions? Am I missing passes, do I need to restructure the IR, or am I missing some concept?

thanks! (keep up the amazing work)
yours,
Bobby Powers
_______________________________________________
LLVM Developers mailing list
LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Evan Cheng | 1 Apr 2008 02:35
Picon
Favicon

Re: reg_iterator Caveats


On Mar 31, 2008, at 4:55 PM, Chris Lattner wrote:

> On Mon, 31 Mar 2008, Evan Cheng wrote:
>>> I just discovered that def_itterator (and presumably, reg_iterator)
>>> doesn't
>>> include implicit defs, for example at function calls for caller-save
>>> physical
>>> registers.  Guh.  I'm not sure if it should or not, but it's  
>>> certainly
>>> necessary information in some cases.  Is this expected behavior,  
>>> or an
>>> oversight?
>
> reg iterators will return everything that is in the function.  If the
> implicit operands haven't been added to the machieninstrs yet, then  
> they
> won't be returned.
>
>> MachineRegisterInfo tracks virtual register only.
>
> It works for vregs and pregs today.

Ok! Fooled me with this comment:

/// MachineRegisterInfo - Keep track of information for each virtual  
register,
/// including its register class.

Evan

>
>
>> I also wish it would track physical register defs and uses as well.  
>> It
>> can be used to simplify a lot of code (in livevariable, etc.). Chris,
>> do you think that's feasible?
>
> Really really feasible :)
>
> -Chris
>
> -- 
> http://nondot.org/sabre/
> http://llvm.org/
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Chris Lattner | 1 Apr 2008 02:55
Picon
Favicon

Re: reg_iterator Caveats

On Mar 31, 2008, at 5:35 PM, Evan Cheng wrote:
>>> MachineRegisterInfo tracks virtual register only.
>>
>> It works for vregs and pregs today.
>
> Ok! Fooled me with this comment:
>
> /// MachineRegisterInfo - Keep track of information for each virtual  
> register,
> /// including its register class.

Hrm, my tree [now] says:

/// MachineRegisterInfo - Keep track of information for virtual and  
physical
/// registers, including vreg register classes, use/def chains for  
registers,
/// etc.

maybe you need to update :-P

-Chris
Jon Sargeant | 1 Apr 2008 04:14
Picon

Calling Conventions

I'm trying to understand the LLVM calling conventions.  Coming from a 
Windows C++ background, I'm familiar with three calling conventions: 
cdecl, stdcall, and fastcall.  It looks like cdecl corresponds to ccc in 
LLVM, but I'm not sure about the other two.

Best Regards,
Jon
Talin | 1 Apr 2008 05:54
Picon
Gravatar

Re: Advice on debugging?

Ping? Still looking for advice in figuring out how and why my generated 
code is causing lli to crash...

Talin wrote:
> I've been using lli to do most of my unit tests for the compiler that 
> I'm writing. However, when I get a test that crashes, its difficult to 
> find which instruction it was that caused the crash. I tried running 
> bugpoint, but it didn't seem to work for me, and upon reading the 
> documentation, it seems to be intended for a different purpose than 
> finding bugs in my source program; It seems to be related more to 
> finding errors in the various optimizer passes.
>
> So for example, when I run lli on my program:
> ----------------------------------------------------------
> Assertion failed: (V == V2 && "Didn't find key?"), function RemoveKey, 
> file StringMap.cpp, line 177.
> 0   lli                                 0x0049edcd 
> _ZN40_GLOBAL__N_Signals.cpp_00000000_B58D0A3815PrintStackTraceEv + 45
> 1   lli                                 0x0049efc9 
> _ZN40_GLOBAL__N_Signals.cpp_00000000_B58D0A3813SignalHandlerEi + 323
> 2   libSystem.B.dylib                   0x9534297b _sigtramp + 43
> 3   ???                                 0xffffffff 0x0 + 4294967295
> 4   libSystem.B.dylib                   0x953bb782 raise + 26
> 5   libSystem.B.dylib                   0x953cad3f abort + 73
> 6   libSystem.B.dylib                   0x953bc923 __assert_rtn + 101
> 7   lli                                 0x00491ca7 
> _ZN4llvm13StringMapImpl9RemoveKeyEPNS_18StringMapEntryBaseE + 127
> 8   lli                                 0x00461946 
> _ZN4llvm9StringMapIPNS_5ValueENS_15MallocAllocatorEE6removeEPNS_14StringMapEntryIS2_EE 
> + 24
> 9   lli                                 0x00461094 
> _ZN4llvm16ValueSymbolTable15removeValueNameEPNS_14StringMapEntryIPNS_5ValueEEE 
> + 24
> 10  lli                                 0x0040aabe 
> _ZN4llvm21SymbolTableListTraitsINS_10BasicBlockENS_8FunctionEE18removeNodeFromListEPS1_ 
> + 94
> 11  lli                                 0x003cbd66 
>
_ZN4llvm6iplistINS_10BasicBlockENS_12ilist_traitsIS1_EEE6removeERNS_14ilist_iteratorIS1_EE 
> + 254
> ...etc...
> ----------------------------------------------------------
>
> But when I run bugpoint, I get:
> ----------------------------------------------------------
> Read input file      : 'out/Debug/test/stdlib/test08.bcc'
> *** All input ok
> Found gcc: /usr/bin/gcc
> Initializing execution environment: Running the code generator to test 
> for a crash: <cbe>
> Generating reference output from raw program: 
> <cbe><cbe><gcc><program>Reference output is: bugpoint.reference.out
>
> *** Checking the code generator...
> <cbe><gcc><program>
> *** Debugging miscompilation!
> Checking to see if '' compile correctly: <cbe><gcc><program> yup.
> *** Optimized program matches reference output!  No problem detected...
> bugpoint can't help you with your problem!
> ----------------------------------------------------------
>
> To be honest, I am not sure what this all means.
>
> In any case, what I'd like is a way to find out more about the source 
> of the crash.
>
> I don't suppose anyone is working on a version of lli that supports 
> single-step debugging from the command line? Or do I need to compile 
> to assembly and use gdb? What's the best strategy for solving this 
> type of problem?
>
> -- Talin
>
>

Gmane