Evan Cheng | 1 Mar 2009 04:05

Re: Shrink Wrapping - RFC and initial implementation


On Feb 26, 2009, at 2:02 PM, John Mosby wrote:

> Hello LLVMdev,
>
> I have been working with LLVM for just over a year now, mainly in  
> the area of compilation for HDLs like SystemVerilog and SystemC.
> Most of this work dealt with translation to LLVM IR, representing  
> concurrent languages with LLVM and using LLVM analyses and transforms
> for compiling onto proprietary simulation acceleration hardware. All
> of this work used the C back end exclusively, since I wanted a
> transparent and easily debuggable flow.

Welcome to the community.

>
> To learn more about the code generator, I decided to try  
> implementing shrink wrapping, a reasonably self-contained back end  
> transformation pass.
>
> I now have a preliminary implementation of shrink wrapping, done as
> an option to prologue/epilogue insertion under the switch
> --shrink-wrap.

Nice.

> It is limited to X86 presently since that is the only target I have  
> access to at the moment.


Evan Cheng | 1 Mar 2009 04:07

Re: Shrink Wrapping - RFC and initial implementation


On Feb 26, 2009, at 5:52 PM, John Mosby wrote:

Hi Anton,

Thanks for your questions, that's what I'm looking for.

On Thu, Feb 26, 2009 at 5:33 PM, Anton Korobeynikov <anton <at> korobeynikov.info> wrote:
Hello, John

> My limited implementation uses a workaround that adjusts the
> generation of prologue code and the frame indices used by
> the target eliminateFrameIndex() when necessary. I am looking at
> several approaches, but I would like input from anyone who
> has an opinion.
I haven't looked into the patch deeply enough yet, but I have at least 2 questions:
1. How does all of this interact with dynamic stack realignment?
2. It seems that the DWARF information about callee-saved registers is invalidated by your patch.
This means that you won't have sane stack traces in the debugger. Unwinding won't work either.
Have you tried to compile some C++ code that uses EH?

Integrating shrink wrapping with dynamic stack realignment, debugging info, EH (and more)
requires a more general (or more complete) way of treating callee-saved registers, and I did
not attempt to tackle this in the patch. I meant to show a starting point for this work and get
some questions coming in (working so far :-)).

I think for step 1 PEI should not attempt shrink wrapping when dynamic stack realignment or EH is required. Debug info is a different story. We are in the process of eliminating debug-specific instructions (i.e. relying completely on DebugLoc on machine instructions). Once that's done, it should *just work*.

Evan


I am not far enough along with the two approaches I'm looking at to give more detail. I will
have more worked out in a few days.

I'm still coming up to speed in the code generator areas, so thanks for your patience!
 
Again, thanks for your questions,
John



--
With best regards, Anton Korobeynikov.

Faculty of Mathematics & Mechanics, Saint Petersburg State University.

_______________________________________________
LLVM Developers mailing list
LLVMdev <at> cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nick Lewycky | 1 Mar 2009 05:24

Re: -fPIC warning on every compile on Cygwin

Please try this patch. I tried to copy exactly what libtool would do on 
Cygwin by reading the libtool source.

Nick

Aaron Gray wrote:
> On Fri, Feb 27, 2009 at 4:50 PM, Aaron Gray 
> <aaronngray.lists <at> googlemail.com 
> <mailto:aaronngray.lists <at> googlemail.com>> wrote:
> 
>     On Fri, Feb 27, 2009 at 4:32 PM, Jay Foad <jay.foad <at> gmail.com
>     <mailto:jay.foad <at> gmail.com>> wrote:
> 
>          >> Could you please rig Makefile.rules or something to print out the value
>          >> of $(LLVM_ON_WIN32) ? The only way I can think of this happening is if
>          >> that's erroneously false.
> 
>         This works for me:
> 
>         Index: Makefile.rules
>         ===================================================================
>         --- Makefile.rules      (revision 65633)
>         +++ Makefile.rules      (working copy)
>          @@ -298,6 +298,8 @@
>               # Common symbols not allowed in dylib files
>               CXX.Flags += -fno-common
>               C.Flags   += -fno-common
>         +    else ifeq ($(OS),Cygwin)
>         +      # Nothing. Cygwin defaults to PIC and warns when given -fPIC
>             else
>               # Linux and others; pass -fPIC
>               CXX.Flags += -fPIC
> 
>      
>     Jay, thanks I will try this.
> 
>  
> I had to hand modify the code as it did not seem to want to work as a patch.
>  
> However, it does not deal with the LLVMHello.dll problem. Here's what I 
> am getting :-
>  
> ~~~
> llvm[3]: Linking Debug Loadable Module LLVMHello.dll
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZN79_GLOBAL__N__usr_src_llvm_65633_lib_Transforms_Hello_Hello.cpp_00000000_965F4EBD6Hello213runOnFunctionERN4llvm8FunctionE':
> /usr/src/llvm-65633/lib/Transforms/Hello/Hello.cpp:53: undefined reference to `llvm::EscapeString(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
> /usr/src/llvm-65633/lib/Transforms/Hello/Hello.cpp:54: undefined reference to `llvm::cerr'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZN79_GLOBAL__N__usr_src_llvm_65633_lib_Transforms_Hello_Hello.cpp_00000000_965F4EBD5Hello13runOnFunctionERN4llvm8FunctionE':
> /usr/src/llvm-65633/lib/Transforms/Hello/Hello.cpp:34: undefined reference to `llvm::EscapeString(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
> /usr/src/llvm-65633/lib/Transforms/Hello/Hello.cpp:35: undefined reference to `llvm::cerr'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZSt17__verify_groupingPKcjRKSs':
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2569: undefined reference to `llvm::PassInfo::registerPass()'
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2571: undefined reference to `llvm::Pass::getPassName() const'
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2571: undefined reference to `llvm::Pass::print(std::basic_ostream<char, std::char_traits<char> >&, llvm::Module const*) const'
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2571: undefined reference to `llvm::FunctionPass::assignPassManager(llvm::PMStack&, llvm::PassManagerType)'
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2570: undefined reference to `llvm::Pass::dumpPassStructure(unsigned int)'
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2576: undefined reference to `llvm::FunctionPass::runOnModule(llvm::Module&)'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZN4llvm12RegisterPassIN79_GLOBAL__N__usr_src_llvm_65633_lib_Transforms_Hello_Hello.cpp_00000000_965F4EBD5HelloEEC1EPKcS5_bb':
> /usr/src/llvm-65633/include/llvm/PassSupport.h:172: undefined reference to `llvm::Pass::getPassName() const'
> /usr/src/llvm-65633/include/llvm/PassSupport.h:172: undefined reference to `llvm::Pass::print(std::basic_ostream<char, std::char_traits<char> >&, llvm::Module const*) const'
> /usr/src/llvm-65633/include/llvm/PassSupport.h:172: undefined reference to `llvm::FunctionPass::assignPassManager(llvm::PMStack&, llvm::PassManagerType)'
> /usr/src/llvm-65633/include/llvm/PassSupport.h:175: undefined reference to `llvm::Pass::dumpPassStructure(unsigned int)'
> /usr/src/llvm-65633/include/llvm/PassSupport.h:175: undefined reference to `llvm::FunctionPass::runOnModule(llvm::Module&)'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZSt17__verify_groupingPKcjRKSs':
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2560: undefined reference to `llvm::Statistic::RegisterStatistic()'
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2558: undefined reference to `llvm::Value::getNameStr() const'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZNSt12_Vector_baseISt4pairIPKN4llvm8PassInfoEPNS1_4PassEESaIS7_EEC2ERKS8_':
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/stl_vector.h:(.text$_ZN4llvm12FunctionPassD2Ev[llvm::FunctionPass::~FunctionPass()]+0x7): undefined reference to `vtable for llvm::FunctionPass'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZSt17__verify_groupingPKcjRKSs':
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2558: undefined reference to `llvm::Pass::~Pass()'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZNSt12_Vector_baseISt4pairIPKN4llvm8PassInfoEPNS1_4PassEESaIS7_EEC2ERKS8_':
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/stl_vector.h:(.text$_ZN4llvm4PassC2EPKv[llvm::Pass::Pass(void const*)]+0x7): undefined reference to `vtable for llvm::Pass'
> /usr/build/llvm-65633/lib/Transforms/Hello/Debug/Hello.o: In function `_ZSt17__verify_groupingPKcjRKSs':
> /usr/gcc-4.2.2/lib/gcc/i686-pc-cygwin/4.2.2/../../../../include/c++/4.2.2/bits/locale_facets.tcc:2558: undefined reference to `vtable for llvm::FunctionPass'
> collect2: ld returned 1 exit status
> make[3]: *** [/usr/build/llvm-65633/Debug/lib/LLVMHello.dll] Error 1
> make[3]: Leaving directory `/usr/build/llvm-65633/lib/Transforms/Hello'
> make[2]: *** [Hello/.makeall] Error 2
> make[2]: Leaving directory `/usr/build/llvm-65633/lib/Transforms'
> make[1]: *** [Transforms/.makeall] Error 2
> make[1]: Leaving directory `/usr/build/llvm-65633/lib'
> make: *** [all] Error 1
>  
> Aaron Gray@AMD2500-PC /usr/build/llvm-65633 $
>  
> Aaron
>  

Attachment (cygwin1.patch): text/x-patch, 912 bytes
Mark Shannon | 1 Mar 2009 11:41

Re: Why LLVM should NOT have garbage collection intrinsics

Gordon Henriksen wrote:
> 
> The "runtime interface" is a historical artifact. LLVM does not impose  
> a runtime library on its users. I wouldn't have a problem deleting all  
> mention of it, since LLVM does not impose a contract on the runtime.
> 
Excellent, I found it somewhat unhelpful!

>> The semantics of llvm.gcroot are vague:
>> "At compile-time, the code generator generates information to allow  
>> the
>> runtime to find the pointer at GC safe points."
>>
>> Vague, ill-specified interfaces are worse than none.
> 
> There's nothing ill-defined about the semantics of gcroot except  
> insofar as GC code generation is pluggable.
> 
Sorry, but "At compile-time, the code generator generates information to 
allow the runtime to find the pointer at GC safe points." does not 
really say anything.
No one could possibly implement this "specification".

Sorry about all my negative comments, but I would like to implement a 
generational collector for llvm, but I cannot do so in a portable way.

So, here is a suggestion:

Call the GC 'intrinsics' something else ("extrinsics"?), and provide 
low-level intrinsics so that the GC calls (gcroot, gcread and gcwrite) 
can be converted to GC-free LLVM code in a GC-lowering pass.

IR+GC -> | GC Lowering pass | -> IR

Rather than the current:

IR+GC -> | Backend lowering pass(es) | -> SelectionDAG

Read and write barriers can already be written in llvm-IR.
It is the marking of roots that is the problem.

Given that any new intrinsics/instructions are an additional burden on 
all back-ends, I'm not going to propose particular ones, but it seems 
that they are needed.

By the way, I think that adding a GC pointer type is an unnecessary 
burden on the back-ends; front-ends really should be able to handle 
this.

The current trio of gcroot, gcread and gcwrite is OK, BUT GC 
implementations should be able to translate them to llvm-IR so that the 
optimisers and back-ends can do their jobs without worrying about GC 
details.

As an aside, I think that debug info can be treated in a similar way:

IR+debug -> | Debug lowering pass | ->  IR

After all both debug and GC require similar things, that is, information 
about the location of stack variables (and possibly, register variables)
and the machine location of points in code (for line numbering or 
gc-safe points).

If intrinsics/instructions to do the above can be implemented then I 
will port my generational, copying collector to LLVM *and* maintain it 
for as long as possible.

Mark.
Gordon Henriksen | 1 Mar 2009 16:32

Re: Why LLVM should NOT have garbage collection intrinsics

On 2009-03-01, at 05:41, Mark Shannon wrote:

> Gordon Henriksen wrote:
>
>>> The semantics of llvm.gcroot are vague: "At compile-time, the code  
>>> generator generates information to allow the runtime to find the  
>>> pointer at GC safe points."
>>>
>>> Vague, ill-specified interfaces are worse than none.
>>
>> There's nothing ill-defined about the semantics of gcroot except  
>> insofar as GC code generation is pluggable.
>>
>
> Sorry, but "At compile-time, the code generator generates  
> information to allow the runtime to find the pointer at GC safe  
> points." does not really say anything. No one could possibly  
> implement this "specification".

llvm.gcroot is an interface to a runtime library (or binary format)  
only through the mediation of the GC plugin, so the exact front-to- 
back behavior is undefined, yes. Likewise, the 'add' instruction does  
not specify by what machine instruction the addition will be  
performed, yet it is not vague.

What is communicated to the plugins themselves through the presence of  
a llvm.gcroot call is detailed lower in the document, in the  
Implementing a GC Plugin section.

This is abstract and complex, but not imprecise. Still, if you'd like  
to propose improved wording for GarbageCollection.html that makes this  
clearer for you, I'd be happy to incorporate it.

> I would like to implement a generational collector for llvm, but I  
> cannot do so in a portable way.

You'll certainly need to map roots on the stack and use write barriers.

• shadow-stack is an easy, portable way to bring up root discovery.  
You can switch to static stack maps later (with the requirement that  
your runtime be able to crawl the machine stack, which is out-of-scope  
for LLVM unless Talin makes some progress with his GC building blocks).

• As you observe, your write barrier can be written in LLVM IR without  
the use of the llvm.gcwrite intrinsic if you so desire. Otherwise, you  
can perform the IR-to-IR transform to eliminate llvm.gcwrite using the  
performCustomLowering hook.
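
To make the shadow-stack suggestion above concrete, here is a minimal sketch in Python of the general idea. It is not LLVM's shadow-stack implementation, and every name in it (ShadowFrame, ShadowStack, push_frame, pop_frame, collect_roots) is hypothetical. Each function call links a frame holding its roots onto a chain, so the collector finds roots by walking that chain rather than crawling the machine stack.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ShadowFrame:
    """One entry per active call; holds that call's GC roots (hypothetical)."""
    roots: List[Any] = field(default_factory=list)
    parent: Optional["ShadowFrame"] = None


class ShadowStack:
    """Hypothetical runtime-side shadow stack, for illustration only."""

    def __init__(self) -> None:
        self.top: Optional[ShadowFrame] = None

    def push_frame(self, num_roots: int) -> ShadowFrame:
        # What a function prologue would do: link a new frame onto the chain.
        frame = ShadowFrame(roots=[None] * num_roots, parent=self.top)
        self.top = frame
        return frame

    def pop_frame(self) -> None:
        # What a function epilogue would do: unlink the current frame.
        assert self.top is not None, "pop without matching push"
        self.top = self.top.parent

    def collect_roots(self) -> List[Any]:
        # The collector walks the chain, innermost frame first.
        live: List[Any] = []
        frame = self.top
        while frame is not None:
            live.extend(r for r in frame.roots if r is not None)
            frame = frame.parent
        return live
```

The price of this portability is a push and a pop on every call, which is the motivation for moving to static stack maps later.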

What else is blocking you?

> By the way, I think that adding a GC pointer type is an unnecessary  
> burden on the back-ends; front-ends really should be able to  
> handle this.

I don't see how these points mesh together into a single concern, but  
I can address your three points:

• On your concern for "burdening the backends." You've used this straw  
man before. LLVM backends are not monoliths; they actually share a  
great deal of code. All current GC changes were made in the shared  
codebase, making the cost-to-implement and maintain O(1), not O(N). I  
see no reason this would change in the future.

• On whether the back-end need be involved. As I've already discussed  
WRT stack layout, the back-end must be involved because only it knows  
how stack frames and code are laid out. I'd like to reemphasize that  
back ends can and do introduce or delete both control flow and calls  
in the program--adding safe points which are not (as such) represented  
in LLVM IR. Thus, the front-end cannot know even the set of all safe  
points (much less their locations). From that, it follows that  
liveness and stack maps cannot be computed by the front-end, precisely  
because they need be computed at said unknown safe points. Finally, I  
hope it's abundantly obvious that register maps are impossible for the  
front-end to compute.

• On whether a GC pointer type is necessary in the IR. 'llvm.gcroot'  
as it stands today basically makes GC pointer manipulation code opaque  
to all of LLVM's optimizations, both front- and back-end. (The root  
alloca 'escapes', so mem2reg can't hack on it, and all is lost.) The  
generated code is full of redundant memory operations as a result.  
There's true redundancy, and there's the redundancy required when  
passing a safe point without register maps. Allowing SSA values to be  
GC roots directly (rather than merely pointing to roots) would enable  
improvements in this area.

> The current trio of gcroot, gcread and gcwrite is OK, BUT GC  
> implementations should be able to translate them to llvm-IR so that  
> the optimisers and back-ends can do their jobs without worrying  
> about GC details.

GC plugins can already eliminate GC intrinsics prior to code  
generation, for cases where the GC scheme can be represented as pure  
IR. In the tree, shadow-stack is implemented as a pure IR transform.

http://llvm.org/docs/GarbageCollection.html#custom

But this is not the interesting case, since it has limitations as I  
discussed above (and could've been written without the intrinsics in  
the first place).

— Gordon
Mark Shannon | 1 Mar 2009 20:11

Re: Why LLVM should NOT have garbage collection intrinsics


Gordon Henriksen wrote:

> You'll certainly need to map roots on the stack and use write barriers.
> 
> • shadow-stack is an easy, portable way to bring up root discovery.  
> You can switch to static stack maps later (with the requirement that  
> your runtime be able to crawl the machine stack, which is out-of-scope  
> for LLVM unless Talin makes some progress with his GC building blocks).

This is the crux of my argument:
Without the ability to traverse the stack in a portable way, the only 
way I can write a portable GC is to avoid the llvm intrinsics.
Therefore, they are useless and should be removed.

However, if the ability to traverse the stack is added to llvm then most 
of my objections to the intrinsics disappear.

> • As you observe, your write barrier can be written in LLVM IR without  
> the use of the llvm.gcwrite intrinsic if you so desire. Otherwise, you  
> can perform the IR-to-IR transform to eliminate llvm.gcwrite using the  
> performCustomLowering hook.
> 
> What else is blocking you?
> 

Nothing, except the lack of stack traversal code ;)

Once the portable stack-traversal code is available, I'll port my GC, as 
promised.

Thanks for taking the time to discuss this.
You've just about convinced me that the intrinsics should stay, but I 
still think the interface to the GC subsystem is (currently) a bit of a 
mess.

Mark.
Gordon Henriksen | 1 Mar 2009 22:33

Re: Why LLVM should NOT have garbage collection intrinsics

On 2009-03-01, at 14:11, Mark Shannon wrote:

> Without the ability to traverse the stack in a portable way, the  
> only way I can write a portable GC is to avoid the llvm intrinsics.  
> Therefore, they are useless and should be removed.

Nonsense. Compare: DWARF EH tables vs. unwind support in libgcc  
(not LLVM). It would be more accessible for those developing novel  
runtimes if LLVM incorporated facilities to make building runtimes  
easier, however—which is precisely what this thread was about to begin  
with.

— Gordon

P.S. HP has published a library that should work to crawl the stack  
for return addresses. It is permissively licensed.

http://www.nongnu.org/libunwind/
John Mosby | 1 Mar 2009 23:57

Re: Shrink Wrapping - RFC and initial implementation

First, thanks very much for your comments!

On Sat, Feb 28, 2009 at 8:05 PM, Evan Cheng <evan.cheng <at> apple.com> wrote:

On Feb 26, 2009, at 2:02 PM, John Mosby wrote:
> It is limited to X86 presently since that is the only target I have
> access to at the moment.

What part of this is target dependent? Is this due to emitPrologue /
emitEpilogue being target specific?

It is target dependent (X86) at present because of the way I developed it, just using the X86 target since that is the only one on which I can test the entire (static) flow: test.c -> llvm-gcc -emit-llvm -> (.ll, .bc) -> llc --shrink-wrap -> .s -> gcc test.s -o test.

I worked with other targets also, but I decided to take it as far as I could on the first go with X86.

First pass was without debugging info and with simple stack frames, in the interest of getting as much worked out as possible.
I saw the issue concerning how code gen handles placement of spill and restore code outside of entry/return blocks before I had the first test cases running, but I worked through the details using -march=x86 only.

Re: debugging info: I know about the work to change the way debugging info is handled, so I held off trying to make the shrink wrapping work with the current impl.

 
> The main features are:
>   - Placing callee saved register (CSR) spills and restores in the
> CFG to tightly surround uses
>      so that execution paths that do not use CSRs do not pay the
> spill/restore penalty.
>
>   - Avoiding placement of spills/restores in loops: if a CSR is used
> inside a loop(nest), the spills
>      are placed in the loop preheader, and restores are placed in
> the loop exit nodes (the
>      successors of the loop _exiting_ nodes).
>
>   - Covering paths without CSR uses: e.g. if a restore is placed in
> a join block, a matching spill
>      is added to the end of all immediate predecessor blocks that
> are not reached by a spill.
>      Similarly for saves placed in branch blocks.

Sounds great. It would help everyone if you can show some example code.

I am putting documented examples together from the test cases in the patch.
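
As a rough illustration of the first two placement rules listed above (not the code in the patch), here is a small Python sketch. The Loop record, the block names, and place_csr_code are all hypothetical, and the sketch assumes loops are given as non-nested, outermost loops. A CSR used outside any loop is spilled and restored in the block that uses it; a CSR used inside a loop is spilled in the loop preheader and restored in the loop exit blocks. The third rule (covering paths without CSR uses) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Loop:
    """Hypothetical loop summary: names of blocks, not real LLVM objects."""
    preheader: str
    blocks: FrozenSet[str]
    exits: FrozenSet[str]  # successors of the loop's exiting blocks


def place_csr_code(
    uses: Dict[str, Set[str]], loops: List[Loop]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (spills, restores), each mapping block name -> set of CSRs."""
    spills: Dict[str, Set[str]] = {}
    restores: Dict[str, Set[str]] = {}
    for block, regs in uses.items():
        # Find the (outermost) loop containing this block, if any.
        loop = next((lp for lp in loops if block in lp.blocks), None)
        if loop is None:
            # Use outside loops: wrap the use tightly within its own block.
            spill_blocks, restore_blocks = [block], [block]
        else:
            # Use inside a loop: hoist to the preheader / sink to the exits.
            spill_blocks, restore_blocks = [loop.preheader], sorted(loop.exits)
        for b in spill_blocks:
            spills.setdefault(b, set()).update(regs)
        for b in restore_blocks:
            restores.setdefault(b, set()).update(regs)
    return spills, restores
```

For a function where %esi is used only in block "then" and %edi only in a loop body, this places nothing in the entry block, so paths that avoid both uses pay no spill/restore cost.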

> Since I ran into a non-trivial issue in developing this pass, I
> would like to submit my implementation as a "RFC + code/tests"
> rather than a typical contribution, and get people's opinions on how
> to proceed.
>
> The issue is that the code generator assumes all spills and restores
> of callee saved registers must be placed in the entry and return
> blocks resp.
> Shrink wrapping violates this assumption, with the result that the
> frame offsets computed by PEI for frame-relative operands may be
> incorrect.

I am not sure how this would happen. Why would frame offsets be
affected by where these instructions are placed?

The issue is illustrated by a simple example in which a single CSR is used in one branch of a conditional. When the stack frame is laid out, the spill for this CSR is accounted for in the calculation of stack size, as it should be. The stack slot for the CSR is correctly computed, and everything seems fine when the MO_FrameIndex operands are replaced. The problem is that since the spill instruction for the CSR (e.g. pushl %esi) has been moved out of the entry block, the push does not happen there, and the value of %esp in the entry block is not what it should be to satisfy the offsets produced by eliminateFrameIndex().
A similar situation exists for the BB into which a spill is "moved" (from the entry block): a push happens to spill the CSR on entry to that block, and now %esp is not what it should be for that block. The example below illustrates this issue:

assume:
int F(int a, int b, int c) uses one CSR in conditional branch

prologue, no shrink wrapping:

_F:
pushl %esi                   # spill CSR %esi, %esp -= 4 (in this case)
subl $56, %esp           # create frame, %esp = %esp - 56
movl 64(%esp), %eax  # fetch arg 'a' from entry %esp + 4
movl %eax, 52(%esp)
movl 68(%esp), %eax  # fetch arg 'b'
movl %eax, 48(%esp)
...

prologue with spill shrink-wrapped to basic block bb:

_F:
        # no spill of %esi, moved to bb
subl $56, %esp          # create frame same as before %esp = %esp - 56
movl 64(%esp), %eax  # error: 'a' is not at 64(%esp), it's at 60(%esp)
movl %eax, 52(%esp)
...

The simple, ugly hack of adjusting the value by which %esp is decremented in the prologue when one or more CSR spills have been placed into other blocks takes care of the issue on this simple code (no dynamic stack realignment, no EH) on x86.
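
The arithmetic behind the two prologues, and one reading of the workaround, can be sketched in Python as follows. The function names are hypothetical and the model assumes 32-bit x86 with 4-byte pushes and the return address at 0(%esp) on entry; it reproduces only the numbers in the example, not the patch itself.

```python
WORD = 4  # bytes moved by one pushl on 32-bit x86


def arg_offset_from_esp(arg_index: int, frame_size: int,
                        pushes_in_prologue: int) -> int:
    """Offset of stack argument `arg_index` from %esp after the prologue.

    On entry the return address is at 0(%esp), so argument i lives at
    4 + 4*i above the entry %esp. The prologue then lowers %esp by 4 bytes
    per push plus `frame_size` bytes for the subl.
    """
    entry_offset = WORD + WORD * arg_index
    return entry_offset + WORD * pushes_in_prologue + frame_size


def adjusted_frame_size(frame_size: int, spills_moved_out: int) -> int:
    """Assumed form of the workaround: grow the subl by the missing pushes."""
    return frame_size + WORD * spills_moved_out


# No shrink wrapping: one push plus a 56-byte frame, as in the first listing.
offset_a = arg_offset_from_esp(0, 56, 1)
offset_b = arg_offset_from_esp(1, 56, 1)

# Spill moved out of the entry block: the frame offsets are now off by 4.
offset_a_shrunk = arg_offset_from_esp(0, 56, 0)

# With the adjusted decrement, the original offsets are valid again.
offset_a_fixed = arg_offset_from_esp(0, adjusted_frame_size(56, 1), 0)
```

Here offset_a and offset_b come out as 64 and 68, matching the first listing, offset_a_shrunk is 60, matching the error noted in the second listing, and offset_a_fixed is back to 64.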

The companion hack for (non entry) MBBs into which spills have been introduced is to adjust the stack size around eliminateFrameIndex()'s for replacement of MO_FrameIndex operands.

Obviously, all of this applies only when spills are done with push/pop, which is the case on x86. I used this issue to start looking at generalizing how spills and restores are handled, before looking too closely at other targets, and developed the workaround for the initial implementation.

 
>
> My limited implementation uses a workaround that adjusts the
> generation of prologue code and the frame indices used by
> the target eliminateFrameIndex() when necessary. I am looking at
> several approaches, but I would like input from anyone who
> has an opinion.
>

I think to do this right for every target is a big job. I'd like to
see prologue / epilogue related stuff be moved out of
TargetRegisterInfo. Shrink wrapping will only happen when the targets
buy-in, i.e. providing the right hooks.

Part of what I'm doing now is estimating the work, which requires going through the targets. I am not far enough along to send out a proposal. Moving pro/epi generation out of TRI, perhaps into its own "component" is one architecture I am looking at.
 
When is shrink wrapping happening? Is it possible to do it after CSR
spills and restores are inserted but before FI are lowered into sp /
fp +/- offset?

Shrink wrapping starts after calculateCalleeSavedRegisters(), which creates the list of CSRs used in the function. Shrink wrapping assigns MBB placements for spills and restores based on where they are used. calculateCalleeSavedRegisters() determines stack slots for the CSRs used in the function.
I don't see an interaction between this and shrink wrapping, or have I missed something?

 
> Finally, I realize that shrink wrapping is probably not high
> priority in the larger scheme of LLVM development, so I'm not
> expecting
> a huge response, but any ideas are certainly welcome.

It's actually a fairly useful optimization. It can really help a class
of functions, e.g. functions with early returns.

Quite right, it is certainly worthwhile. I could have left that comment out :-)


>
> The patch and a test tarball are attached. I include basic tests
> that are run with the supplied Makefile.

Some comments:

1. The code needs some refactoring. :-) It's a big chunk of code so
it's hard to digest.
2. There doesn't seem to be a way to turn off shrink wrapping. Please
make sure it's optional. When it's not being done, PEI should not
require dominator, etc.

I already refactored once, but I knew it would not be enough(!); I'll definitely do another pass. I forgot to put the analysis deps under --shrink-wrap; I will fix that and anything else that I might have left out of the option.
 

 From what I can see this is a very good first step. I look forward to
seeing its completion.

Evan

Thanks! Likewise, and it's a pleasure to work on.

John


Chris Lattner | 2 Mar 2009 04:26

Please review the 2.5 release notes

Hi All,

Please review the 2.5 release notes here: http://llvm.org/docs/ReleaseNotes.html

Let me know if you have any additions, improvements, or see any  
oversights.  If you have commit access, please just directly change  
the document.

The release is planned to go out in about 24 hours from now!

Thanks!

-Chris 
Shweta Jain | 2 Mar 2009 08:12

Re: LLVMdev Digest, Vol 53, Issue 72


Hello All,
 
I am currently working on a project which requires me to generate a .bc file for a given .c file and open the .bc file to identify the functions in it and the caller/callee relationships among them. The end goal is to generate a call graph for all the functions present in the original C code. I am quite new to LLVM and would really appreciate some pointers. I am looking at various LLVM passes, but I am not quite sure if that's the way to go.
 
Your help will be greatly appreciated.
 
Thanks
SHWETA
