Jonathan Roelofs | 28 Mar 14:53 2015

Re: Customize Standard C Library Using LLVM (to support llvm backend optimization)

On 3/27/15 8:07 PM, Chao Yan wrote:
> 2015-03-12 15:07 GMT-05:00 Jonathan Roelofs <jonathan <at>>:
>     This isn't a gcc support mailing list... If you have questions
>     pertaining to llvm/clang, we'd be happy to answer them.
>     That being said, that symbol should be part of the compiler's
>     runtime library (or maybe in the libc?). llvm provides that symbol
>     as part of libclangrt/libcompiler_rt.
> The problem is basically caused by llvm. I cross-compile the musl-libc
> to arm binaries using clang. However, the cross-compiled musl-libc needs
> runtime support from libcompiler_rt. The arm version of this static
> runtime library cannot be obtained by cross compiling.

Where did you read that? Because it's plainly not true.

libcompiler_rt is *designed* to be cross-built, and building the ARM 
version of it from x86 Linux/Darwin is well exercised by several members 
of the community, myself included.

> Do I need to build llvm on an arm host machine?

No. LLVM/Clang is a cross compiler. Building host binaries is just a 
special, easier case of that.


Xinliang David Li | 27 Mar 21:36 2015

Re: fix for loop scale limiting in BFI

On Fri, Mar 27, 2015 at 1:12 PM, Diego Novillo <dnovillo <at>> wrote:
On Fri, Mar 27, 2015 at 4:09 PM, Xinliang David Li <xinliangli <at>> wrote:
> How about only removing the scaling limit when PGO is on? I don't see the
> need for this change without PGO.

This is what my patch does, and it's getting into issues.  With the
scaling limit gone, the frequencies propagated overflow 64bits, so
they need to be scaled down to a 64bit space.

What I mean is to only do this when real profile data is used. I don't see that check in the patch.

To be on the safe side, my patch is mapping them down to a 32bit
space, but I am squishing them too much on the lower end. So regions
of the CFG that before had distinct temperatures are now showing up
with frequency == 1.

I need a better smoother for the mapping from the Scale64 floats down
to 64bit (or 32bit) integers.

This seems to show another weakness of the block frequency propagation -- it cannot handle infinite loops. We need to think about how to handle that.



LLVM Developers mailing list
LLVMdev <at>
Ashish Saxena | 27 Mar 20:58 2015

Re: LLVM fails for inline asm with Link Time Optimization

Even when I pass -mllvm --x86-asm-syntax=intel to clang++.exe, the assembly is generated in AT&T syntax. Why would this happen?

On Fri, Mar 27, 2015 at 11:25 PM, Ashish Saxena <ashishcseitbhu <at>> wrote:
It is failing at link time. The examples are the ones I gave above. There is a compiler option, -mllvm --x86-asm-syntax=intel. Is there something similar for the linker that I can provide via -plugin-opt?

On Fri, Mar 27, 2015 at 9:31 PM, Rafael Espíndola <rafael.espindola <at>> wrote:
At which stage is this failing? Can you provide an example and commands?

Looks like the information about the use of intel syntax is being
dropped along the way.

On 27 March 2015 at 11:40, Ashish Saxena <ashishcseitbhu <at>> wrote:
> Ah, I thought there was an issue while parsing inline asm in function
> bodies. Here are some of the instructions where it cribs. Can you make out
> something of it?
> I am going to try out the -no-integrated-as option. Not sure if it will help?
> LLVM ERROR: Error parsing inline asm
> 1><inline asm>:1:17 : error 0: unexpected token in argument list
> 1>        mov ebx, dword ptr 16(%esp)
> 1>                       ^
> 1><inline asm>:2:17 : error 0: unexpected token in argument list
> 1>        mov edi, dword ptr 24(%esp)
> 1>                       ^
> 1><inline asm>:3:17 : error 0: unexpected token in argument list
> 1>        mov esi, dword ptr 28(%esp)
> 1>                       ^
> 1><inline asm>:4:21 : error 0: invalid token in expression
> 1>        movq mm1, [edi+ebx-$8]
> 1>                           ^
> 1><inline asm>:5:12 : error 0: invalid operand for instruction
> 1>        pxor mm0, mm0
> Thanks
> Ashish
> On Fri, Mar 27, 2015 at 8:21 PM, Rafael Espíndola
> <rafael.espindola <at>> wrote:
>> If you are getting a parse error it is very likely a different bug. In
>> that bug the issue is that we don't parse the function bodies to find
>> if some inline asm in them defines (or uses) a given symbol.
>> On 26 March 2015 at 13:30, Ashish Saxena <ashishcseitbhu <at>> wrote:
>> > Thanks for the response, Francois. Do you have any pointers on what the
>> > issue might be, or something I can try out? I saw a similar active bug in the llvm
>> > database
>> >
>> >
>> >
>> > Thanks
>> > Ashish
>> >
>> > On Thu, Mar 26, 2015 at 10:52 PM, Francois Pichet <pichet2000 <at>>
>> > wrote:
>> >>
>> >>
>> >>
>> >> On Wed, Mar 25, 2015 at 4:47 PM, Ashish Saxena
>> >> <ashishcseitbhu <at>>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>     I am trying to enable link time optimization for my projects. A few
>> >>> of them have inline assembly which works perfectly with clang/llvm, but on
>> >>> enabling LTO I get the following error
>> >>>
>> >>> LLVM ERROR: Error parsing inline asm
>> >>>
>> >>> <inline asm>:103:2 : error 0: unknown use of instruction mnemonic
>> >>> without
>> >>> a size suffix
>> >>> <inline asm>:104:16 : error 0: invalid operand for instruction
>> >>> <inline asm>:106:17 : error 0: unexpected token in argument list
>> >>>
>> >>> & so on
>> >>>
>> >>> Is this a known issue ? Any workaround for this ?
>> >>
>> >>
>> >>
>> >> My experience is that LTO doesn't break inline assembly.
>> >
>> >
>> >
Diego Novillo | 27 Mar 19:52 2015

fix for loop scale limiting in BFI

I've been trying to get rid of the loop scale limiting problem during
BFI. Initially, this was saturating frequencies to the max side of the
scale, so a double nested loop would get max frequencies in all the
blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner
loop no hotter than the outer loop, so block placement would not
bother aligning them.

In convertFloatingToInteger() we are scaling the BFI frequencies so
they fit an integer. The function tries to choose a scaling factor and
warns about being careful so RA doesn't get confused. It chooses a
scaling factor of 1 / Min, which almost always turns out to be 1. This
was causing me grief in the double nested loop case because the inner
loop had a freq of about 6e20 while the outer blocks had a frequency
of 2e19. With a scaling factor of 1, we were saturating everything to
the max.

I changed it so it uses a scaling factor that puts the frequencies in
[1, UINT32_MAX], but only if the Max frequency is outside that range.
This is causing two failures in the testsuite, which seem to be caused
by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll
we are hoisting into the wrong loop now, but I'm not sure.

The other failure, in CodeGen/Thumb2/v8_IT_5.ll, is different block
placement. Which is a bit odd. The frequencies given by my changes are
certainly different, but the body of the loop is given a
disproportionately larger frequency than the others (much like in the
original case).  Though, I think what's going on here is that my
changes are causing the smaller frequencies to be saturated down to 1:

float-to-int: min = 0.0000004768367035, max = 2047.994141, factor = 16777232.0

Printing analysis 'Block Frequency Analysis' for function 't':
  block-frequency-info: t
   - entry: float = 1.0, int = 16777232
   - if.then: float = 0.0000009536743164, int = 16
   - if.else: float = 0.9999990463, int = 16777216
   - if.then15: float = 0.0000009536734069, int = 16
   - if.else18: float = 0.9999980927, int = 16777200
   - if.then102: float = 0.0000009536734069, int = 16
   - cond.true10.i: float = 0.0000004768367035, int = 8
   - t.exit: float = 0.0000009536734069, int = 16
   - if.then115: float = 0.4999985695, int = 8388592
   - if.else145: float = 0.2499992847, int = 4194296
   - if.else163: float = 0.2499992847, int = 4194296
   - while.body172: float = 2047.994141, int = 34359672832
   - if.else173: float = 0.4999985695, int = 8388592

My patch:
float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0,
factor = 0.0000000004656626195

block-frequency-info: t
 - entry: float = 1.0, int = 1
 - if.then: float = 0.0000009536743164, int = 1
 - if.else: float = 0.9999990463, int = 1
 - if.then15: float = 0.0000009536734069, int = 1
 - if.else18: float = 0.9999980927, int = 1
 - if.then102: float = 0.0000009536734069, int = 1
 - cond.true10.i: float = 0.0000004768367035, int = 1
 - t.exit: float = 0.0000009536734069, int = 1
 - if.then115: float = 0.4999985695, int = 1
 - if.else145: float = 0.2499992847, int = 1
 - if.else163: float = 0.2499992847, int = 1
 - while.body172: float = 9223345648592486401.0, int = 4294967295
 - if.else173: float = 0.4999985695, int = 1

The scaling factor is so minuscule that I end up squashing every "low"
frequency to 1. I think I need to smooth this better. In the meantime,
I wanted to pick your brain. Maybe I'm completely off-base in my
approach.

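A sketch of one possible smoother (illustrative only; the name and the log-scale choice are my own, this is not what the patch currently does): map the Scale64 floats into [1, UINT32_MAX] logarithmically, so that blocks with distinct temperatures keep distinct integers instead of all collapsing to 1 at the low end.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Map positive frequencies into [1, UINT32_MAX] on a log scale, so that
// blocks whose frequencies differ by many orders of magnitude keep
// distinct integer values instead of collapsing to 1 at the low end.
std::vector<uint64_t> scaleToU32Log(const std::vector<double> &Freqs) {
  double LogMin = std::log2(*std::min_element(Freqs.begin(), Freqs.end()));
  double LogMax = std::log2(*std::max_element(Freqs.begin(), Freqs.end()));
  double Range = std::max(LogMax - LogMin, 1.0); // avoid dividing by ~0
  std::vector<uint64_t> Out;
  for (double F : Freqs) {
    double Norm = (std::log2(F) - LogMin) / Range; // in [0, 1]
    Out.push_back(1 + static_cast<uint64_t>(Norm * (UINT32_MAX - 1)));
  }
  return Out;
}
```

With the min/max from the dump above (4.7e-7 and 9.2e18), the smallest block still maps to 1 and the hottest to UINT32_MAX, but intermediate frequencies land at distinct values in between rather than being squashed.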
Thanks.  Diego.
David Blaikie | 27 Mar 18:31 2015

Non-clang warning cleanliness in the LLVM project

So a while back we took a moderately aggressive stance on disabling GCC warnings with false positives, specifically those related to uninitialized variables. In cases where GCC suggested initializing a variable yet the algorithm was safely initializing the variable, adding the GCC-suggested initialization could thwart tools like MSan/ASan. So it was suggested that we should not abide by GCC's warning and rely on Clang's more carefully limited warning without the problematic false positives.

Recently Andy Kaylor's been working on getting the MSVC build warning clean by a combination of disabling warnings and fixing a few cases of relatively low frequency.

I've generally been encouraging people to aggressively disable non-clang warnings whenever there's a false positive (anything other than a bug, or at least anything that doesn't strictly improve readability) and one such instance of this is in the review for r233088 which amounts to something like:

template<typename T>
int func(T t) {
  return t.func() >> 8;
}
(it's more complicated than that, but this is the crux of it) - for some instantiations of this template, T::func returns an 8-bit type, and MSVC warns that the shift will produce a zero value. The code is correct (the author intended this behavior, because some instantiations produce a wider-than-8-bit type and the higher bits are desired, if present).
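For concreteness, here's a self-contained illustration (a hypothetical highBits stand-in, not the actual r233088 code) of why the warning fires for 8-bit instantiations while the pattern is meaningful for wider ones:

```cpp
#include <cstdint>

// Hypothetical stand-in for the template under discussion. For an 8-bit T
// the value fits entirely in the low byte, so the shift always yields 0
// (hence MSVC's warning); for a wider T it extracts the bits above the low
// byte, which is the intended behavior. The operand is promoted to int
// before the shift, so the shift itself is well-defined in both cases.
template <typename T> int highBits(T t) {
  return t >> 8;
}
```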

I suggested disabling this warning, but that would mean we miss out on the true positives as well (Clang doesn't seem to have any warning like this), though we haven't seen these very often.

How do people feel about working around this warning so we can keep it enabled?
How do people feel about disabling this warning?
How do people feel about disabling other non-Clang warnings which have false positives?

(in the case of Clang warnings with false positives we should take the same approach, except we also have the option to improve the quality of the warning by fixing Clang, which is what we generally try to do)

- David
Mingxing Zhang | 27 Mar 16:55 2015

Re: [GSoC] Applying for GSoC 2015

Hello John,

In fact I'm rushing toward a paper submission these days.
Thus I'm very sorry that I do not have enough time to re-write a detailed timeline.
However, I've revised the "Preliminary Results" section and added some more information about the current prototype
(updated both on google-melange and the url ).
I hope it will address your main concerns.

Thank you very much!

On 27 March 2015 at 03:03, John Criswell <jtcriswel <at>> wrote:
Dear Mingxing,

Sorry for the late reply.  I've been gallivanting about Europe giving talks and attending a conference. :)

Attached is feedback on your performance diagnosis proposal.  I think the proposal has two flaws which, if possible, you should rectify before the deadline tomorrow:

1) It is not clear what the current prototype can do and how well it works.  Is it implemented only in PIN, or also in LLVM?  If in LLVM, how far along is that prototype, and how well does it perform?  What is missing from it that you want to implement for GSoC?

2) Your reasoning for why dynamic slicing is going to work is flawed.  You assume that Giri is slow because it instruments a lot of instructions.  That is incorrect.  Giri is slow because it generates so much data during program execution that this data cannot be kept in memory and must therefore be flushed to persistent storage (which was magnetic disk at the time we wrote it).  If instrumentation just needs to update data structures, you get 2x-4x slowdown and life is slow but livable.  If you're streaming tons of data to disk about every load, store, and branch, that's another thing entirely.

I realize that the time to the deadline is short, but if possible, please improve your proposal before the deadline.  Despite the above issues, I find your proposal interesting and would like to see it have the best chance possible of being accepted for GSoC.


John Criswell

On 3/16/15 11:55 AM, Mingxing Zhang wrote:
Thank you very much for all your advice!
I'll revise the proposal according to them.

To George,

As mentioned in the earlier emails of this thread, I intend to prepare two proposals: one for the AA project listed in the idea list and one for the bloat detection project I proposed myself; at most one of them will be accepted by GSoC.
Personally, I do prefer the second project since I'm more familiar with that field and the technique (static instrumentation) it uses.

According to the GSoC timeline, the accepted proposals will only be announced on 27 April. Isn't that too late, given that until then I won't know which project will be selected, or whether either will be accepted at all?
Or will the mentors know the decision earlier (e.g., on 15 April, after the slots are allocated to organizations)?

To John,
The proposal for bloat detection is also available now at (not completely finished yet).
Some preliminary evaluation results on overhead and detection ability, based on a simple prototype, are given.
(Actually I came up with this idea during my visit to Columbia U, and the prototype was also implemented in those days, but the project was paused until recently due to my internship at Google and some other work.)

P.S. The tex template is downloaded at

Once again, thank you for your time!

On 16 March 2015 at 02:58, George Burgess IV <george.burgess.iv <at>> wrote:
CFLAA already has some basic interprocedural analysis built in (see: tryInterproceduralAnalysis; it basically boils down to CFLAA grabbing the StratifiedSets for each function call and unifying sets based off of that instead of unifying everything ever). The only real changes I had in mind for it were:

- Adding context sensitivity (which kind of requires adding context sensitivity to CFLAA first)
- Making it less terrible for {mutually,indirectly,} recursive functions (While we're building the sets for some function A(), the sets for A() aren't accessible, so any call to A() from a function called by A() looks opaque).

If you want to take a stab at making IPA better/adding context sensitivity, you're more than welcome to do so. I'm happy to work on other things in the meantime :)


On Sun, Mar 15, 2015 at 8:50 AM, Mingxing Zhang <james0zan <at>> wrote:
Hello Daniel,

Thank you for your comments, and sorry for my mistakes; I'll revise them.
And I'll be sure to read the paper you mentioned and survey the recent research before deciding on the implementation technique.

To George:
May I ask about the exact plan for your attempt at making cfl-aa interprocedural?
I do think this is the most valuable part of my proposal, but it makes no sense to do the work twice.

Maybe I can work on porting the flow-sensitive method proposed by Prof. Ben Hardekopf at CGO11.
His homepage states that the published source code "is written for a pre-release version of LLVM 2.5 and does not work in current versions of LLVM"


On 15 March 2015 at 08:31, Daniel Berlin <dberlin <at>> wrote:
A few notes:
1. "But these standard LLVM AA passes either take a large amount of time (Anderson Analysis at cubic time and large memory requirements)"

Neither of these is correct. Andersen's is not cubic in practice, or large memory in practice, when implemented properly.  GCC uses it by default as the points-to implementation, and it's never even close to the top of the profile.

It takes about 1 second to do a million lines of code.
And one can do better (gcc's impl is no longer state of the art).

2. The approach to field sensitivity you mention is not going to work very well, given how much casting occurs (everything is cast). I would suggest using the approach in

3. George, cc'd, is planning on implementing both context sensitive and context-insensitive interprocedural analysis in cfl-aa the next month or two. 

4. Using a BDD cloning approach for CFL-AA doesn't make much sense, the whole point of CFL is not having to generate explicit points-to sets if you don't want to. Plus, followup papers and researchers have *never* been able to reproduce the scalability of Whaley's work.

Not to mention it's done on Java. Java is a language where doing things like field-sensitivity always increase precision, which is not true for C.

If you really want to attempt this, I would suggest using one of the demand driven context-sensitive approaches that will be easy to fit in CFL.

On Sat, Mar 14, 2015 at 5:57 AM Mingxing Zhang <james0zan <at>> wrote:
Hello John,

I've finished the first version of my proposal on enhancing alias analysis.
The proposal can be downloaded at
I hope I've successfully justified the necessity and benefits of this project.
If possible, please find some time to review it and give me some more feedback.

Thank you very much!

P.S. I'm working on the other proposal, a couple of days is needed.

On 8 March 2015 at 21:42, Mingxing Zhang <james0zan <at>> wrote:
Got it.
I'll try to find the applications for field-sensitivity (and interprocedural, etc) AA analysis.
And I'll do some preliminary evaluation on the tracing/slicing part for the bloat detection.


On 8 March 2015 at 21:34, John Criswell <jtcriswel <at>> wrote:
On 3/8/15 8:56 AM, Mingxing Zhang wrote:
Hello John,

According to the FAQ, I can submit two proposals although at most one of them can be accepted.
Thus I will prepare a proposal for each of the two projects.

Correct.  Only one proposal will be accepted.

And, after reading the code of cfl-aa and several related papers, I've listed four milestones for the AA project:

1) In order to use the fast algorithm described in PLDI'13 [1], cfl-aa makes a simplification of the CFL defined in POPL'08 [2], which leads to a reduction in precision (I've confirmed this observation with the author).
Thus a quantitative measurement of how large the precision loss is is needed.

2) In cfl-aa, different fields of the same struct, and a whole array, are represented by a single node.
This is the reason for problems 2 and 4 listed in
We should split these large nodes.

I think the real question is whether the loss of precision matters, and if so, to which uses of alias analysis.  SAFECode, for example, wants field information to determine type safety (so that it can optimize away type-safe loads and stores), so field sensitivity matters.  Perhaps field sensitivity doesn't matter for other applications (e.g., optimization).  There's no point in improving precision if it doesn't help the analyses that people care about most.

As part of your project, I think you should state the uses of alias analysis/points-to analysis that you're aiming to improve and understand whether your proposed improvements will help that use.  I would also recommend picking a use that matters to a significant portion of the LLVM community.

3) Handling special global variables, such as errno.

4) It seems that the current version of cfl-aa is an intraprocedural analysis.
If time allows, I think we may extend it to an interprocedural analysis.
The algorithm described in [3] can be applied to scaling it.

As for the bloat-detection project, the final result should be a tool that is verified against known bugs and a set of newly detected bugs.

For the bloat detection tool, I would like to be convinced that dynamic tracing will be, or can be, sufficiently efficient to be practical.  I hate to ask, but I think you need to run an experiment with Giri to show that dynamic slicing is going to be practical for the executions that you expect to analyze.  Either that, or you need to explain how you can use something more efficient than dynamic slicing (note that dynamic slicing and dynamic tracing are not the same, so be sure you're correctly stating which one you need).

Do you have any suggestions on these objectives?

In your proposal, be sure to include a set of milestones and how long you think you will need to achieve those milestones.  I may have said that before, but it's worth repeating.


John Criswell


[1] Fast Algorithms for Dyck-CFL-Reachability with Applications to Alias Analysis. PLDI'13
[2] Demand-Driven Alias Analysis for C. POPL'08
[3] Demand-Driven Context-Sensitive Alias Analysis for Java. ISSTA'11

On 5 March 2015 at 09:58, Mingxing Zhang <james0zan <at>> wrote:
Wow, that is cool!
I'll check about it.

Thank you!

On 4 March 2015 at 21:57, John Criswell <jtcriswel <at>> wrote:
On 3/4/15 2:18 AM, Mingxing Zhang wrote:
Hello John,

Thank you for your advice, and congratulations~

I'll read the code of cfl-aa and Giri first and decide which project to pursue.
I'll report the choice to this thread once I've made the determination (hopefully within this week).

You should check for yourself, but I don't think anything prevents you from submitting two proposals.  If you have time to write two strong proposals, I see no problem with that.

Just make sure that any proposal you write is strong: it provides a concrete explanation of what you want to do, some justification for why it would benefit the community (short or long term), and why you're the person qualified to do it.  Proposals should also include a set of milestones and expected dates for completing those milestones.


John Criswell


On 3 March 2015 at 23:12, John Criswell <jtcriswel <at>> wrote:
Dear Mingxing,

I think both projects are interesting and useful.

Points-to analysis is something that is needed by research users of LLVM, but to the best of my knowledge, no solid implementation currently exists (although the cfl-aa work being done at Google may provide us with something; you should check into it before writing a proposal).  My interest is in a points-to analysis that is robust and is useful to both research and industry users of LLVM.  A points-to analysis proposal must indicate how it will help both of these subsets of the LLVM community, and it must argue why current efforts do not meet the requirements of both subsets of the community.

The runtime bloat tool also looks interesting, and your approach (at least to me) is interesting.  One question in my mind, though, is whether dynamic slicing is going to work well.  Swarup Sahoo and I built a dynamic slicer for LLVM named Giri, and we found the tracing required for dynamic slicing to be slow.  For our purposes, the overhead was okay as we only needed to record execution until a crash (which happened quickly).  In your bloat tool, the program will probably run for awhile, creating a long trace record.  You should take a look at the Giri code, use it to trace some programs, and see if the overheads are going to be tolerable.  If they are not, then your first task would be to optimize Giri for your bloat tool.

You should also be more specific about which LLVM instructions will be traced.  For example, I wouldn't record the outputs of every LLVM instruction; I might only record the outputs of loads and stores or the end of a def-use chain.
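As a toy illustration of what I mean by selective tracing (this is a sketch, not Giri's actual code; all names are made up): record only the values produced by loads and stores, rather than the output of every instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy trace recorder: log only the values produced by loads and stores,
// instead of the output of every instruction, so the trace that dynamic
// slicing must stream to storage stays small.
enum class Op { Load, Store, Add, Branch };

struct TraceEntry {
  Op Kind;
  uint64_t Value;
};

class SelectiveTracer {
  std::vector<TraceEntry> Trace;

public:
  void record(Op Kind, uint64_t Value) {
    // Skip everything that is not a load or a store.
    if (Kind == Op::Load || Kind == Op::Store)
      Trace.push_back({Kind, Value});
  }
  std::size_t size() const { return Trace.size(); }
};
```

The filtering happens at record time, so the cost of the skipped instructions is only a branch, and the trace volume scales with memory operations rather than with all executed instructions.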

I'd be interested in mentoring either project.

BTW, it looks like your FSE paper won an award.  Congrats.


John Criswell

On 3/3/15 2:30 AM, Mingxing Zhang wrote:
Hi all,

As a Ph.D. student majoring in Software Reliability, I have used LLVM in many of my projects, such as the Anticipating Invariant ( and some other ongoing ones.
Thus, it would be a great pleasure for me if I could take this opportunity to contribute to this awesome project.

After reading the idea list (, I was most interested in the idea of improving the "Pointer and Alias Analysis" passes.
Could you please give me some more tips or advice on how to get started working on the application?

Simultaneously, I also have another idea about using LLVM to detect runtime bloat, just like the ThreadSanitizer tool for data races.
If there is anyone here who would like to mentor this project, could you please find some time to review the more detailed proposal on gist and give me some feedback?

  I do prefer the bloat detection tool, but I'm not sure whether it is suitable for GSoC.
  Thus I will apply for the Alias Analysis one if it is not.


Mingxing Zhang

Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China


-- John Criswell Assistant Professor Department of Computer Science, University of Rochester







Josh Klontz | 27 Mar 16:29 2015

Missed constant replacement opportunity with llvm.assume?

As of ToT it seems that even the simplest cases of assumes involving equality between an instruction and a constant don't cause the instruction to be replaced (RAUW'd) with the constant.

In the attached example, %channels is assumed to be 3, leading to a missed optimization opportunity with %src_c.

Am I overlooking something that would cause this optimization to be invalid?


Attachment (convert_grayscale_u8SCXY_u8SCXY.ll): application/octet-stream, 4194 bytes
Martin J. O'Riordan | 27 Mar 16:00 2015

Contributing a new target to LLVM

Hi LLVM and Clang Devs,


At the moment my company (Movidius) is considering contributing the changes we have made to LLVM and Clang in order to support our proprietary processor, and I would like to seek advice on how best to approach doing this. I am pretty sure that there are coding guidelines and conventions that we should be following but have not followed over the course of the last few years, and we will have to go through the process of preparing the sources so that they are suitable for pushing back to the LLVM and Clang community. I expect that, as a small team of 2 developers, this is likely to take us several months.


Are there any existing documents that I should read to help us prepare our code so that it might be acceptable to the LLVM (and Clang) communities? What guidelines are there for contributors submitting new targets to LLVM, how should they be maintained in the future, and how can we ensure that other targets are not negatively impacted by the addition?


Thanks in advance,




Martin J. O’Riordan                Email:  Martin.ORiordan <at>

Compiler Development               Web:

Movidius Ltd.                      Skype:  moviMartinO

1st Floor,  O’Connell Bridge House,  d’Olier Street,  Dublin 2, Ireland



Proposal for GSOC : KCoFI

Can anyone please review my proposal? I need suggestions on the timeline and on the PNaCl improvements.

Project Goals:
The primary objective of this project is to implement a stronger SFI mechanism in the existing KCoFI kernel. The following are the broad improvements I aim to implement in KCoFI:
      1. Implement a stronger call graph using the libLTO tool.
      2. Replace KCoFI's SFI instrumentation with that found in Portable Native Client (PNaCl).
CFI is a compiler-based security mechanism that protects against malicious programs that hijack a program's control flow [1]. KCoFI [2] is a security mechanism that extends the CFI technique to the operating system kernel. Thus KCoFI protects commodity operating systems from classical control-flow hijack attacks, return-to-user attacks, and code segment modification attacks.
KCoFI uses traditional label-based protection for programmed indirect jumps [1] but adds a thin run-time layer linked into the OS that protects some key OS data structures like thread stacks and monitors all low-level state manipulations performed by the OS.
KCoFI is LLVM-based. In this project I aim to undertake improvements in the KCoFI mechanism to make it stronger against ever-growing future attacks.
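As an illustration, the label-based protection mentioned above can be sketched as follows (a simplified model with made-up names; KCoFI's real instrumentation inlines a compare against a constant label embedded at the target, rather than consulting a table):

```cpp
#include <cstdint>
#include <unordered_map>

using Label = uint32_t;

// Toy model of label-based CFI: every valid indirect-branch target is
// tagged with a label, and before transferring control the runtime checks
// that the destination carries the label this call site expects. Real
// instrumentation inlines a compare against a constant embedded at the
// target rather than doing a lookup.
static std::unordered_map<uintptr_t, Label> TargetLabels;

void registerTarget(uintptr_t Addr, Label L) { TargetLabels[Addr] = L; }

// Allow the transfer only if the destination is a registered target whose
// label matches what the call site expects.
bool cfiCheck(uintptr_t Dest, Label Expected) {
  auto It = TargetLabels.find(Dest);
  return It != TargetLabels.end() && It->second == Expected;
}
```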
Software Fault Isolation (SFI) is the act of isolating untrusted, possibly faulty extensions.
This project is organized in two parts: the first part aims to implement a stronger call graph, and the second part integrates the SFI instrumentation found in PNaCl (Portable Native Client) into KCoFI, replacing KCoFI's older SFI instrumentation.
Portable Native Client extends the sandboxing technology used by Native Client with architecture independence, letting developers compile their code once to run on any website and on any architecture with ahead-of-time (AOT) translation.
The following are the things to do in the project:
1. Implementing a stronger call graph: in this part of the project, the FreeBSD kernel will be compiled using the libLTO tool. This will involve writing patches that build to IR, using llvm-link to run LTO, and then linking the resulting binary. This will involve delving further into the llvm bundle. It will require modifying the CFI MachineFunctionPass to support multiple labels.
Since KCoFI currently uses a really weak call graph (all functions and call sites are in a single equivalence class), the first task after compiling the FreeBSD kernel with libLTO is to improve the CFI instrumentation to enforce a more accurate call graph. This will be done by using libLTO to compute a whole-kernel call graph and then using different CFI labels for different nodes in the call graph.

A second improvement would be to remove unnecessary CFI labels: any function that is only ever called directly does not need a label. Again, making this optimization work requires a whole-kernel analysis via libLTO.
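This pruning step can be modeled the same way (again a hypothetical sketch; the real sets would come from the libLTO whole-kernel analysis): a function keeps its label only if its address is taken or the analysis reports it as a possible indirect-call target:

```python
# Sketch: a function needs a CFI label only if it can be the target of
# an indirect call; functions only ever called directly need none.
def functions_needing_labels(all_functions, address_taken, indirect_targets):
    """Keep a label for any function whose address escapes or that the
    call-graph analysis reports as a possible indirect-call target."""
    return {f for f in all_functions
            if f in address_taken or f in indirect_targets}

kernel_funcs  = {"schedule", "printk", "ext2_read", "helper_fn"}
addr_taken    = {"ext2_read"}   # stored in a function-pointer table
indirect_tgts = {"schedule"}    # reached via an indirect call
labeled = functions_needing_labels(kernel_funcs, addr_taken, indirect_tgts)
# printk and helper_fn need no label: they are only called directly.
```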
A further improvement is to implement forward-edge control-flow integrity [5].
2. Replace KCoFI's SFI instrumentation with the instrumentation found in Portable Native Client (PNaCl):
The PNaCl implementation should be much better than KCoFI's. PNaCl and NaCl are both open source. The SFI approach NaCl takes expects a single sandbox per process, which does not seem very suitable for kernel use. It can, however, be made to support multiple sandboxes in the same address space, which is the work I will undertake as part of this project.
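For intuition, NaCl-style SFI confines memory accesses by masking each address into an aligned sandbox region. Supporting several sandboxes in one address space can be modeled by masking the offset and OR-ing in a per-sandbox base. This is a simplified Python sketch with invented constants; the real instrumentation rewrites loads and stores at compile time:

```python
# Sketch of NaCl-style address masking, extended to several sandboxes
# in one address space. Each sandbox is an aligned, power-of-two-sized
# region; masking forces any address into the current sandbox.
SANDBOX_SIZE = 1 << 24          # 16 MiB regions (illustrative)
OFFSET_MASK  = SANDBOX_SIZE - 1

def sandbox_address(addr, sandbox_base):
    """Clamp an arbitrary address into the sandbox starting at base."""
    assert sandbox_base % SANDBOX_SIZE == 0, "base must be region-aligned"
    return sandbox_base | (addr & OFFSET_MASK)

base_a = 1 * SANDBOX_SIZE   # sandbox for extension A
base_b = 2 * SANDBOX_SIZE   # sandbox for extension B
# An out-of-bounds pointer is forced back into its own sandbox:
clamped = sandbox_address(0xDEADBEEF, base_a)
assert base_a <= clamped < base_a + SANDBOX_SIZE
```

The per-sandbox base is what distinguishes the multi-sandbox variant from stock NaCl, where a single fixed region is assumed.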
This will require modifying the CFI MachineFunctionPass and using either the LLVM CallGraphAnalysis pass or the DSA CallGraph pass. It will also require modifying the low-level KCoFI run-time library (i.e., the implementation of the SVA-OS instructions, since some of them need to do CFI checks).

Timeline and Roadmap:
Since this is a large project and I will be building on the existing KCoFI code, I will follow an iterative-enhancement model of software development.
Week 1: Discussion with my mentor on documentation style and the code.
Week 2 to Week 3: Writing the patches that build FreeBSD to LLVM IR and use llvm-link to run LTO.
Week 4 to Week 6: Compiling the kernel with the libLTO tool and writing the analysis that builds the stronger call graph.
Week 7: Testing the call graph with proper benchmarking.
Week 8 to Week 9: Porting the PNaCl/NaCl SFI techniques and implementing them in the kernel.
Week 10: Extending the NaCl approach to support multiple sandboxes in the same address space for multiple processes in an OS kernel.
Week 11: Testing the new sandboxing techniques together with the stronger call-graph implementation, with proper benchmarking of compile time.
Week 12: Evaluating the performance of the improvements.
Criteria of Success:
1. A new, stronger call-graph implementation, evaluated with proper benchmarking.
2. Implementation of the SFI instrumentation of PNaCl.
Thus, by the end of the summer I will have improved the call graph, replaced the SFI instrumentation, and evaluated the performance.
Brief Bio:
I am a third-year undergraduate in Computer Science and Engineering. My interests lie in Computer Architecture and Operating Systems, and I like working on the low-level, machine-oriented aspects of computer science. My programming experience spans fields such as Database Management Systems, Operating Systems, Networking, Artificial Intelligence, and Machine Learning. I see myself as hardworking and sincere, and at the same time passionate about building new software. I also have experience programming the 8085 and 8086 microprocessors.
I am proficient in C and C++.
[1] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow integrity principles, implementations, and applications,” ACM Trans. Inf. Syst. Secur., vol. 13, pp. 4:1–4:40, November 2009.
[2] J. Criswell, N. Dautenhahn, and V. Adve, “KCoFI: Complete Control-Flow Integrity for Commodity Operating System Kernels,” in Proc. IEEE Symposium on Security and Privacy, 2014.
[3] M. Zhang and R. Sekar, “Control flow integrity for COTS binaries,” in Proceedings of the 22nd USENIX Conference on Security, ser. SEC’13. Berkeley, CA, USA: USENIX Association, 2013, pp. 337–352.
[4] J. Criswell, A. Lenharth, D. Dhurjati, and V. Adve, “Secure Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems,” in Proc. ACM SIGOPS Symp. on Op. Sys. Principles (SOSP), 2007.
[5] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlingsson, L. Lozano, and G. Pike, “Enforcing forward-edge control-flow integrity in GCC & LLVM,” in Proceedings of the 23rd USENIX Security Symposium, pp. 941–955, San Diego, CA, August 2014.
Aditya Verma
Junior Undergraduate
IDD Computer Sc & Engg
IIT(BHU), Varanasi(UP)
LLVM Developers mailing list
LLVMdev <at>
Kenneth Adam Miller | 27 Mar 15:15 2015

SFI and Artificial Diversity

I have read a lot of white papers, but is there any open-source implementation of SFI or artificial diversity? I have googled around, but I cannot find anything that I could openly download. If there is not one already, I would also like to propose creating such a project.
yao | 27 Mar 09:24 2015

Use the IR information to Modify the AST?


I am interested in using the IR information to modify the AST of the source code, but I am not sure whether LLVM supports this or not.

