Jack Howarth via llvm-dev | 24 May 17:07 2016

Undefined symbols in llvm-objdump linkage on x86_64-apple-darwin15

Is anyone else seeing a bootstrap failure on x86_64-apple-darwin15 in
current trunk?

[ 95%] Linking CXX executable ../../bin/llvm-objdump
Undefined symbols for architecture x86_64:
  "_xar_serialize", referenced from:
      DumpBitcodeSection(llvm::object::MachOObjectFile*, char const*,
unsigned int, bool, bool, bool, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >) in
MachODump.cpp.o
  "_xar_file_first", referenced from:
      DumpBitcodeSection(llvm::object::MachOObjectFile*, char const*,
unsigned int, bool, bool, bool, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >) in
MachODump.cpp.o
  "_xar_iter_new", referenced from:
      DumpBitcodeSection(llvm::object::MachOObjectFile*, char const*,
unsigned int, bool, bool, bool, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >) in
MachODump.cpp.o
  "_xar_prop_first", referenced from:
      DumpBitcodeSection(llvm::object::MachOObjectFile*, char const*,
unsigned int, bool, bool, bool, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >) in
MachODump.cpp.o
  "_xar_extract_tobuffersz", referenced from:
      DumpBitcodeSection(llvm::object::MachOObjectFile*, char const*,
unsigned int, bool, bool, bool, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >) in
MachODump.cpp.o

Liveness of AL, AH and AX in x86 backend

I'm trying to see how the x86 backend deals with the relationship 
between AL, AH and AX, but I can't get it to generate any code that 
would expose an interesting scenario.

For example, I wrote this piece:

typedef struct {
   char x, y;
} struct_t;

struct_t z;

struct_t foo(char *p) {
   struct_t s;
   s.x = *p++;
   s.y = *p;
   z = s;
   s.x++;
   return s;
}

But the output at -O2 is

foo:                                    # @foo
         .cfi_startproc
# BB#0:                                 # %entry
         movb    (%rdi), %al
         movzbl  1(%rdi), %ecx
         movb    %al, z(%rip)
         movb    %cl, z+1(%rip)

RFC: FileCheck Enhancements

Hi everyone,

 

There was an idea to add new directives to FileCheck:

1.       A directive to use patterns as named templates, with or without parameters.

2.       CHECK-INCLUDE - A directive to include another file with checks.

3.       Expression repeat for CHECK - If a statement should be checked several times, the repeat modifiers {n}, {n,m}, {,n}, {n,}, *, + can be used.

4.       Repeat in regexes - Repetition with a specific count should become available by using {n}, {n,m}, {,n}, {n,}.

5.       CHECK-LABEL-DAG - Labels in non-sequential order.

6.       Check statements for words only - // CHECK-WORD, // CHECK-WORD-NEXT, // CHECK-WORD-SAME, // CHECK-WORD-DAG, // CHECK-WORD-NOT.

7.       Wildcard for prefixes - If some statements should be checked regardless of prefix, //{{*}}, //{{*}}-NEXT, //{{*}}-SAME, etc. should be used.

8.       Prefix with regular expressions - If a statement should be checked when the prefix matches some regular expression, {{regex}}:, {{regex}}-NEXT, etc. should be used.
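
As a rough illustration only (not taken from the prototype), the repeat modifiers in item 3 could be read as a lower and an upper bound on how many consecutive lines must match a pattern. The function names parse_repeat and check_repeated are hypothetical, and the "consecutive lines from the start" semantics is an assumption; the linked document may define the behaviour differently.

```python
import re


def parse_repeat(modifier):
    """Turn a repeat modifier into (min_count, max_count).

    max_count is None when the modifier has no upper bound.
    Hypothetical helper: the real prototype may parse these differently.
    """
    if modifier == "*":
        return 0, None
    if modifier == "+":
        return 1, None
    m = re.fullmatch(r"\{(\d*)(,?)(\d*)\}", modifier)
    if not m or (not m.group(1) and not m.group(3)):
        raise ValueError("unsupported repeat modifier: %r" % modifier)
    low, comma, high = m.groups()
    if not comma:
        # {n}: exactly n repetitions
        return int(low), int(low)
    # {n,m}, {,n} or {n,}: a missing bound means 0 or "unbounded"
    return (int(low) if low else 0), (int(high) if high else None)


def check_repeated(pattern, modifier, lines):
    """Assumed semantics: count the leading consecutive lines that match
    the pattern and test that count against the modifier's bounds."""
    low, high = parse_repeat(modifier)
    count = 0
    for line in lines:
        if not re.search(pattern, line):
            break
        count += 1
    return count >= low and (high is None or count <= high)
```

For example, under these assumptions check_repeated(r"movb", "{2}", ["movb a", "movb b", "ret"]) accepts the input, because exactly two leading lines match.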

 

More information is in the document at https://docs.google.com/document/d/1wAKNzU7-S2EeK1-aADwgP8dEiKfByKNazonybCQW3zs/edit?usp=sharing.

 

We now have a prototype with these features. It has been tested on LLVM 3.8.

A directive that was previously unsupported was found in an old test. Bug report: https://llvm.org/bugs/show_bug.cgi?id=27852.

 

We measured about a 6% slowdown with the new features when testing on 3.8.

 

I see that FileCheck in LLVM 3.9 also has some changes with new features. We can publish the patch for 3.8, and it can then be adapted for LLVM 3.9. Is anyone interested in this? And would it be better to publish the patch against 3.8 or against 3.9?

 

Thanks,

Elena.

_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Romaric Jodin via llvm-dev | 24 May 11:39 2016

BitcodeReader non explicit error

Hi,

I'm working on OpenCL and I'm using clang as the compiler (based on clang 3.7.0).
I have an issue: I'm generating a bitcode file (which I can print before generation), but when I
try to read it back with clang, I get this error:

"error: Invalid record"

How can I find out where it comes from?

Thank you,
Romaric

Here is what is printed before the bitcode is generated:

##################################################################################################################################
##################################################################################################################################

; ModuleID = '/nfs/home/.cache/pocl/kcache/temp_sTRQB2.cl'                                                                                                                                                                              
target datalayout = "e-p:32:32-i64:64-n8:16:32:64-S64"                                                                                                                                                                                         
target triple = "k1b---k1bdp"                                                                                                                                                                                                                  

%opencl.event_t = type opaque

@vector_add.async_buffer = internal addrspace(2) global [4 x double] undef, align 8

; Function Attrs: nounwind                                                                                                                                                                                                                     
define void @vector_add(double addrspace(3)* nocapture readonly %a, double addrspace(3)* nocapture readonly %b, double addrspace(1)* %c, i32 %n) #0 {
  %1 = tail call i32 @_Z13get_global_idj(i32 0) #2
  %2 = mul i32 %1, %n                                                                                                                                                                                                                          
  %3 = add i32 %2, %n                                                                                                                                                                                                                          
  %4 = icmp ult i32 %2, %3                                                                                                                                                                                                                     
  br i1 %4, label %.lr.ph3, label %._crit_edge                                                                                                                                                                                                 

.loopexit:                                        ; preds = %12, %.lr.ph3                                                                                                                                                                      
  %5 = icmp ult i32 %7, %3                                                                                                                                                                                                                     
  br i1 %5, label %.lr.ph3, label %._crit_edge                                                                                                                                                                                                 

.lr.ph3:                                          ; preds = %0, %.loopexit                                                                                                                                                                     
  %i.02 = phi i32 [ %7, %.loopexit ], [ %2, %0 ]                                                                                                                                                                                               
  %6 = tail call %opencl.event_t* @_Z21async_work_group_copyPU7CLlocaldPKU8CLglobaldj9ocl_event(double addrspace(2)* getelementptr inbounds ([4 x double], [4 x double] addrspace(2)* @vector_add.async_buffer, i32 0, i32 0), double addrspace(1)* %c, i32 4, %opencl.event_t* undef) #2
  %7 = add i32 %i.02, 4                                                                                                                                                                                                                        
  %8 = icmp ult i32 %i.02, -4                                                                                                                                                                                                                  
  br i1 %8, label %.lr.ph, label %.loopexit                                                                                                                                                                                                    

.lr.ph:                                           ; preds = %.lr.ph3                                                                                                                                                                           
  %9 = getelementptr inbounds double, double addrspace(3)* %a, i32 %i.02                                                                                                                                                                       
  %10 = getelementptr inbounds double, double addrspace(3)* %b, i32 %i.02                                                                                                                                                                      
  %11 = getelementptr inbounds double, double addrspace(1)* %c, i32 %i.02                                                                                                                                                                      
  br label %12                                                                                                                                                                                                                                 

; <label>:12                                      ; preds = %.lr.ph, %12                                                                                                                                                                       
  %j.01 = phi i32 [ %i.02, %.lr.ph ], [ %16, %12 ]                                                                                                                                                                                             
  %13 = load double, double addrspace(3)* %9, align 8, !tbaa !10                                                                                                                                                                               
  %14 = load double, double addrspace(3)* %10, align 8, !tbaa !10                                                                                                                                                                              
  %15 = fadd double %13, %14                                                                                                                                                                                                                   
  store double %15, double addrspace(1)* %11, align 8, !tbaa !10                                                                                                                                                                               
  %16 = add i32 %j.01, 1                                                                                                                                                                                                                       
  %17 = icmp ult i32 %16, %7                                                                                                                                                                                                                   
  br i1 %17, label %12, label %.loopexit                                                                                                                                                                                                       

._crit_edge:                                      ; preds = %.loopexit, %0                                                                                                                                                                     
  ret void                                                                                                                                                                                                                                     
}                                                                                                                                                                                                                                              

declare i32 @_Z13get_global_idj(i32) #1

declare %opencl.event_t* @_Z21async_work_group_copyPU7CLlocaldPKU8CLglobaldj9ocl_event(double addrspace(2)*, double addrspace(1)*, i32, %opencl.event_t*) #1

; Function Attrs: nounwind
define void @vector_sub(double addrspace(3)* nocapture readonly %a, double addrspace(3)* nocapture readonly %b, double addrspace(1)* nocapture %c, i32 %n) #0 {
  %1 = tail call i32 @_Z13get_global_idj(i32 0) #2
  %2 = mul i32 %1, %n                                                                                                                                                                                                                          
  %3 = add i32 %2, %n                                                                                                                                                                                                                          
  %4 = icmp ult i32 %2, %3                                                                                                                                                                                                                     
  br i1 %4, label %.lr.ph, label %._crit_edge                                                                                                                                                                                                  

.lr.ph:                                           ; preds = %0                                                                                                                                                                                 
  %5 = add i32 %1, 1                                                                                                                                                                                                                           
  %6 = mul i32 %5, %n                                                                                                                                                                                                                          
  br label %7

; <label>:7                                       ; preds = %7, %.lr.ph                                                                                                                                                                        
  %i.01 = phi i32 [ %2, %.lr.ph ], [ %14, %7 ]                                                                                                                                                                                                 
  %8 = getelementptr inbounds double, double addrspace(3)* %a, i32 %i.01                                                                                                                                                                       
  %9 = load double, double addrspace(3)* %8, align 8, !tbaa !10                                                                                                                                                                                
  %10 = getelementptr inbounds double, double addrspace(3)* %b, i32 %i.01                                                                                                                                                                      
  %11 = load double, double addrspace(3)* %10, align 8, !tbaa !10                                                                                                                                                                              
  %12 = fsub double %9, %11                                                                                                                                                                                                                    
  %13 = getelementptr inbounds double, double addrspace(1)* %c, i32 %i.01                                                                                                                                                                      
  store double %12, double addrspace(1)* %13, align 8, !tbaa !10                                                                                                                                                                               
  %14 = add nuw i32 %i.01, 1                                                                                                                                                                                                                   
  %exitcond = icmp eq i32 %14, %6                                                                                                                                                                                                              
  br i1 %exitcond, label %._crit_edge, label %7                                                                                                                                                                                                

._crit_edge:                                      ; preds = %7, %0                                                                                                                                                                             
  ret void                                                                                                                                                                                                                                     
}                                                                                                                                                                                                                                              

; Function Attrs: nounwind                                                                                                                                                                                                                     
define void @vector_mult(double addrspace(3)* nocapture readonly %a, double addrspace(3)* nocapture readonly %b, double addrspace(1)* nocapture %c, i32 %n) #0 {
  %1 = tail call i32 @_Z13get_global_idj(i32 0) #2
  %2 = mul i32 %1, %n                                                                                                                                                                                                                          
  %3 = add i32 %2, %n                                                                                                                                                                                                                          
  %4 = icmp ult i32 %2, %3                                                                                                                                                                                                                     
  br i1 %4, label %.lr.ph, label %._crit_edge                                                                                                                                                                                                  

.lr.ph:                                           ; preds = %0                                                                                                                                                                                 
  %5 = add i32 %1, 1                                                                                                                                                                                                                           
  %6 = mul i32 %5, %n                                                                                                                                                                                                                          
  br label %7                                                                                                                                                                                                                                  

; <label>:7                                       ; preds = %7, %.lr.ph                                                                                                                                                                        
  %i.01 = phi i32 [ %2, %.lr.ph ], [ %14, %7 ]                                                                                                                                                                                                 
  %8 = getelementptr inbounds double, double addrspace(3)* %a, i32 %i.01                                                                                                                                                                       
  %9 = load double, double addrspace(3)* %8, align 8, !tbaa !10                                                                                                                                                                                
  %10 = getelementptr inbounds double, double addrspace(3)* %b, i32 %i.01                                                                                                                                                                      
  %11 = load double, double addrspace(3)* %10, align 8, !tbaa !10                                                                                                                                                                              
  %12 = fmul double %9, %11                                                                                                                                                                                                                    
  %13 = getelementptr inbounds double, double addrspace(1)* %c, i32 %i.01                                                                                                                                                                      
  store double %12, double addrspace(1)* %13, align 8, !tbaa !10                                                                                                                                                                               
  %14 = add nuw i32 %i.01, 1                                                                                                                                                                                                                   
  %exitcond = icmp eq i32 %14, %6                                                                                                                                                                                                              
  br i1 %exitcond, label %._crit_edge, label %7                                                                                                                                                                                                

._crit_edge:                                      ; preds = %7, %0                                                                                                                                                                             
  ret void                                                                                                                                                                                                                                     
}

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nobuiltin nounwind }                                                                                                                                                                                                         

!opencl.kernels = !{!0, !7, !8}                                                                                                                                                                                                                
!llvm.ident = !{!9}                                                                                                                                                                                                                            

!0 = !{void (double addrspace(3)*, double addrspace(3)*, double addrspace(1)*, i32)* @vector_add, !1, !2, !3, !4, !5, !6}
!1 = !{!"kernel_arg_addr_space", i32 3, i32 3, i32 1, i32 0}                                                                                                                                                                                   
!2 = !{!"kernel_arg_access_qual", !"none", !"none", !"none", !"none"}                                                                                                                                                                          
!3 = !{!"kernel_arg_type", !"double*", !"double*", !"double*", !"uint"}                                                                                                                                                                        
!4 = !{!"kernel_arg_base_type", !"double*", !"double*", !"double*", !"uint"}                                                                                                                                                                   
!5 = !{!"kernel_arg_type_qual", !"const", !"const", !"", !"const"}                                                                                                                                                                             
!6 = !{!"kernel_arg_name", !"a", !"b", !"c", !"n"}                                                                                                                                                                                             
!7 = !{void (double addrspace(3)*, double addrspace(3)*, double addrspace(1)*, i32)* @vector_sub, !1, !2, !3, !4, !5, !6}
!8 = !{void (double addrspace(3)*, double addrspace(3)*, double addrspace(1)*, i32)* @vector_mult, !1, !2, !3, !4, !5, !6}
!9 = !{!"Kalray clang version 3.7.0  (based on LLVM 3.7.0)"}                                                                                                                                                                                   
!10 = !{!11, !11, i64 0}                                                                                                                                                                                                                       
!11 = !{!"double", !12, i64 0}                                                                                                                                                                                                                 
!12 = !{!"omnipotent char", !13, i64 0}                                                                                                                                                                                                        
!13 = !{!"Simple C/C++ TBAA"}

##################################################################################################################################
##################################################################################################################################
Sean Silva via llvm-dev | 24 May 05:23 2016

The state of IRPGO (3 remaining work items)

Jake and I have been integrating IRPGO on PS4, and we've identified 3 remaining work items.


- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would be beneficial to make IRPGO the default PGO on all platforms (coverage would continue to use FE instr as it does currently, of course). In previous conversations (e.g. http://reviews.llvm.org/D15829) it has come up that Apple have requirements that would prevent them from moving to IRPGO as the default PGO, at least without a deprecation period of one or two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all platforms except Apple platforms? I really don't like fragmenting things like this (e.g. if a third party tests "clang's" PGO they will get something different depending on the platform), but I don't see another way given Apple's constraints.


- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the overhead of PGO for the PS4 games we tested (as expected). However, in our measurements (and we are glad to provide more info) the main benefit was inlining (also as expected). A simple pass of inlining at threshold 100 appeared to give all the benefits. Even inlining at threshold 0 gave almost all the benefits. For example, the passes initially proposed in http://reviews.llvm.org/D15828 did not improve over just inlining with threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean up, but this doesn't affect the instrumentation overhead in our measurements)

Bottom line: for our use cases, inlining does all the work, but we're not opposed to having more passes, which might be beneficial for non-game workloads (which is most code).


- Warnings

We identified 3 classes of issues which manifest as spammy warnings when applying profile data with IRPGO (these affect FEPGO also I believe, but we looked in depth at IRPGO):

1. The most concerning one is that getPGOFuncName mangles the filename into the counter name. This causes us to get instrprof_error::unknown_function when the pgo-use build is done in a different build directory from the training build (which is a reasonable thing to support). In this situation, PGO data is useless for all `static` functions (and, as a byproduct, this results in a huge volume of warnings).

2. In different TUs, pre-instr inlining might make different inlining decisions (for example, different functions may be available for inlining), causing hash mismatch errors (instrprof_error::hash_mismatch). In building a large game, we only saw 8 instances of this, so it is not as severe as 1, but it would be good to fix.

3. A .cpp file may be compiled and put into an archive, but then not selected by the linker and will therefore not result in a counter in the profraw. When compiling this file with pgo-use, instrprof_error::unknown_function will result and a warning will be emitted.
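
To make the three cases concrete, here is a minimal sketch of how a profile lookup can end in either warning. It is a toy model, not the InstrProf reader: `LookupResult`, `ProfileRecord`, `Profile` and `lookupCounts` are hypothetical names, and the "structural hash" is simply assumed to be some 64-bit value computed from the function at instrumentation time.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two instrprof_error values named above.
enum class LookupResult { Success, UnknownFunction, HashMismatch };

// One entry of a toy profile: the hash recorded during the training build
// and the counter values collected while running it.
struct ProfileRecord {
  std::uint64_t structuralHash;
  std::vector<std::uint64_t> counts;
};

// Toy profile, keyed by counter name.
using Profile = std::map<std::string, ProfileRecord>;

// Look up a function from the pgo-use build in the training profile.
LookupResult lookupCounts(const Profile &profile, const std::string &name,
                          std::uint64_t structuralHash) {
  auto it = profile.find(name);
  if (it == profile.end())
    return LookupResult::UnknownFunction; // cases 1 and 3: no such counter name
  if (it->second.structuralHash != structuralHash)
    return LookupResult::HashMismatch;    // case 2: same name, different shape
  return LookupResult::Success;
}
```

In this model, cases 1 and 3 are indistinguishable at lookup time (the name is simply absent), which is why a warning keyed only on the missing name cannot tell them apart.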

Case 1 can be fixed using a function hash or other unique identifier instead of a file path. David, in D20195 you mentioned that Rong was working on a patch that would fix 2; we are looking forward to that.
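
A rough illustration of the case 1 fix, under stated assumptions: `pathBasedKey` is a simplified model of prefixing a `static` function's counter name with its source path (it is not the real getPGOFuncName), and `stableKey` with its `uniqueId` argument is a hypothetical replacement that uses some identifier independent of the build directory.

```cpp
#include <string>

// Simplified model (assumption): the counter name of a `static` function is
// the source file path, a colon, and the function name.
std::string pathBasedKey(const std::string &sourcePath,
                         const std::string &funcName) {
  return sourcePath + ":" + funcName;
}

// Hypothetical alternative: prefix the name with an identifier that does not
// change when the build directory changes (e.g. a function hash).
std::string stableKey(const std::string &uniqueId,
                      const std::string &funcName) {
  return uniqueId + ":" + funcName;
}
```

With the path-based key, "/build/train/foo.cpp" and "/build/use/foo.cpp" produce different names for the same `static` function, so the pgo-use build never finds the counters. With a stable identifier, both builds agree on the name.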

For 3, I unfortunately do not know of any solution. I don't think there is a way for us to make this warning reliable in the face of this circumstance. So my conclusion is that, unfortunately, the instrprof_error::unknown_function warning at least must default to off.

-- Sean Silva

difference between built-in and inline assembly

I noticed that intrinsics are implemented using a builtin construct. Can someone explain the difference between a builtin and inline assembly? Is one preferred over the other?

--
Rail Shafigulin
Software Engineer
Esencia Technologies
Hal Finkel via llvm-dev | 23 May 21:43 2016

Re: sum elements in the vector

Hi Chandler,

Regardless of the canonical form we choose, we need code to match non-canonical associated shuffle
sequences and convert them into the canonical form. We also need code to match the pattern where we
extractelement on all elements and sum them into this canonical form. This code needs to exist somewhere,
so we need to decide whether it exists in the frontend or the backend.
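
As a rough illustration of the two shapes being discussed, the sketch below models a vector as a `std::array`. The names `sumByExtract` and `sumByShuffle` are hypothetical, and the code only mimics the data flow of the IR patterns; it is not matcher code.

```cpp
#include <array>
#include <cstddef>

// Shape A: extractelement on every lane, followed by a chain of scalar adds.
template <std::size_t N>
int sumByExtract(const std::array<int, N> &v) {
  int sum = 0;
  for (std::size_t i = 0; i < N; ++i)
    sum += v[i];
  return sum;
}

// Shape B: log2(N) rounds of "shuffle the upper half down, then vector add".
// N is assumed to be a power of two; the total ends up in lane 0.
template <std::size_t N>
int sumByShuffle(std::array<int, N> v) {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t half = N / 2; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      v[i] += v[i + half]; // lane i of (v + shuffled upper half)
  return v[0];
}
```

For integer lanes the two shapes compute the same value, which is what lets a matcher canonicalize one into the other. For floating-point lanes the additions are reassociated, so the rewrite is only valid when reassociation is permitted.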

Having an intrinsic is obviously smaller, in terms of IR memory overhead, than these instructions.
However, I'm not sure how many passes we'll need to teach about the new intrinsic. Obviously there are many
passes that understand integer addition, but likely many fewer would really learn anything useful by
looking through the reduction. We would need to add code to InstCombine in order to pull apart the
reduction intrinsic when we learn that the vector has only one contributing element.

In short, I don't have a strong opinion on this, because we need the matching code somewhere regardless.
Using the intrinsic means not matching multiple times, but it means adding extra code to handle the
intrinsic. Regarding issues such as idiom recognition, use by the SLP vectorizer, etc. these seem
independent of whether the canonical form is an intrinsic or a composite, and I don't think it makes the
vectorizer cost model easier one way or the other.

In any case, we now have relevant pattern-matching code in SDAGBuilder (although it is currently somewhat
specific to reductions after loops), so we already have a better infrastructure to help backends with the
shuffle-matching problem. There is a corresponding 'VectorReduction' SDNode flag.

 -Hal

----- Original Message -----
> From: "Chandler Carruth" <chandlerc <at> gmail.com>
> To: "Asghar-ahmad Shahid" <Asghar-ahmad.Shahid <at> amd.com>, "Rail Shafigulin"
> <rail <at> esenciatech.com>, "llvm-dev"
> <llvm-dev <at> lists.llvm.org>, "Hal Finkel" <hfinkel <at> anl.gov>
> Sent: Sunday, May 15, 2016 8:15:37 PM
> Subject: Re: [llvm-dev] sum elements in the vector
> 
> 
> I'm starting to think we should directly implement horizontal
> operations on vector types.
> 
> 
> 
> My suspicion is that coming up with a nice model for this would help
> us a lot with things like:
> - Idiom recognition of reduction patterns that use horizontal
> arithmetic
> - Ability to use horizontal operations in SLPVectorizer
> - Significantly easier cost modeling of vectorizing loops with
> reductions in LoopVectorize
> - Other things I've not thought of?
> 
> Curious what others think?
> 
> 
> -Chandler
> 
> 
> On Wed, May 11, 2016 at 10:07 PM Shahid, Asghar-ahmad via llvm-dev <
> llvm-dev <at> lists.llvm.org > wrote:
> 
> 
> 
> 
> 
> 
> > why in order to add this particular instruction (sum elements in a
> > vector) I need to add an intrinsic?
> 
> Adding an intrinsic is not the only way; it is one of the ways, and the
> user will NOT be required to invoke it specifically.
> 
> Currently LLVM does not have any instruction to directly represent
> “sum of elements in a vector” and generate your particular
> instruction. However, you can do it without an intrinsic by pattern
> matching the LLVM IR representing “sum of elements in a vector” to your
> particular instruction in DAGCombiner.
> 
> 
> 
> Regards,
> 
> Shahid
> 
> 
> 
> 
> 
> 
> 
> 
> From: Rail Shafigulin [mailto: rail <at> esenciatech.com ]
> Sent: Monday, May 09, 2016 11:59 PM
> To: Shahid, Asghar-ahmad; llvm-dev
> Cc: Das, Dibyendu
> 
> 
> 
> 
> 
> 
> 
> Subject: Re: [llvm-dev] sum elements in the vector
> 
> 
> 
> 
> 
> 
> 
> I'm a little confused. Here is why.
> 
> 
> 
> 
> 
> I was able to add a vector add instruction to my target without using
> any intrinsics and without adding any new instructions to LLVM. So
> here is my question: how come I managed to add a new vector
> instruction without adding an intrinsic, and why do I need to add an
> intrinsic in order to add this particular instruction (sum elements
> in a vector)?
> 
> Another question that I have is whether the compiler will be able to
> target this new instruction (sum elements in a vector) if it is
> implemented as an intrinsic, or whether the user will have to
> specifically invoke an intrinsic.
> 
> 
> 
> 
> 
> Pardon if questions seem dumb, I'm still learning things.
> 
> 
> 
> 
> 
> Any help is appreciated.
> 
> 
> 
> 
> 
> On Fri, May 6, 2016 at 1:51 PM, Rail Shafigulin <
> rail <at> esenciatech.com > wrote:
> 
> 
> Thanks for the reply. These steps will add an instruction as an
> intrinsic. Is it possible to add an actual new instruction so that a
> compiler could target it during an optimization? How hard is it to
> do? Is that a realistic objective?
> 
> 
> 
> 
> 
> Rail
> 
> 
> 
> 
> 
> 
> 
> On Mon, Apr 4, 2016 at 9:02 PM, Shahid, Asghar-ahmad <
> Asghar-ahmad.Shahid <at> amd.com > wrote:
> 
> 
> 
> Hi Rail,
> 
> 
> 
> We had done this for generation of the X86 PSAD (sum of absolute
> differences) instruction through an LLVM intrinsic. Doing this
> requires the following:
> 
> 1. Define an intrinsic, xyz(), for the required instruction and the
> corresponding SDNode
> 2. Generate the “call xyz()” IR based on the matched pattern
> 3. Map the “call xyz()” IR to the corresponding SDNode in
> SelectionDAGBuilder.cpp
> 4. Provide a default expansion of the xyz() intrinsic
> 5. Legalize type and/or operation
> 6. Provide lowering of the intrinsic/SDNode to generate your target
> instruction
> 
> 
> 
> You can visit http://llvm.org/docs/ExtendingLLVM.html for details.
> 
> 
> 
> Regards,
> 
> Shahid
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> From: llvm-dev [mailto: llvm-dev-bounces <at> lists.llvm.org ] On Behalf
> Of Rail Shafigulin via llvm-dev
> Sent: Monday, April 04, 2016 11:00 PM
> To: Das, Dibyendu
> Cc: llvm-dev <at> lists.llvm.org
> Subject: Re: [llvm-dev] sum elements in the vector
> 
> 
> 
> 
> 
> 
> Thanks for the pointers. I looked at the hadd instructions. They seem
> to do something very similar to what I need. Unfortunately, as I said
> before, my LLVM experience is limited. My understanding is that when I
> create a new type of SDNode I need to specify a pattern for it, so that
> when LLVM analyzes the code and sees that pattern, it creates this
> particular node. I'm really struggling to understand how this is done.
> So here are the problems that I'm having.
> 
> 
> 
> 
> 
> 1. How do I identify that pattern that should be used?
> 
> 
> 2. How do I specify a given pattern?
> 
> 
> 
> 
> 
> Do you (or someone else) mind helping me out?
> 
> 
> 
> 
> 
> Any help is appreciated.
> 
> 
> 
> 
> 
> On Mon, Apr 4, 2016 at 9:59 AM, Das, Dibyendu < Dibyendu.Das <at> amd.com
> > wrote:
> 
> 
> 
> This is roughly along the lines of x86 hadd* instructions though the
> semantics of hadd* may not exactly match what you are looking for.
> This is probably more in line with x86/ARM SAD-like instructions but
> I don’t think llvm generates SAD without intrinsics.
> 
> 
> 
> From: llvm-dev [mailto: llvm-dev-bounces <at> lists.llvm.org ] On Behalf
> Of Rail Shafigulin via llvm-dev
> Sent: Monday, April 04, 2016 9:34 AM
> To: llvm-dev < llvm-dev <at> lists.llvm.org >
> Subject: [llvm-dev] sum elements in the vector
> 
> 
> 
> 
> My target has an instruction that adds up all elements in the vector
> and stores the result in a register. I'm trying to implement it in
> my compiler but I'm not sure even where to start.
> 
> 
> 
> 
> 
> 
> 
> I did look at other targets, but they don't seem to have anything
> like it ( I could be wrong. My experience with LLVM is limited, so
> if I missed it, I'd appreciate if someone could point it out ).
> 
> 
> 
> 
> 
> My understanding is that if SDNode for such an instruction doesn't
> exist I have to define one. Unfortunately, I don't know how to do
> it. I don't even know where to start looking. Would someone care to
> point me in the right direction?
> 
> 
> 
> 
> 
> Any help is appreciated.
> 
> 
> 
> 
> 
> --
> 
> 
> 
> 
> 
> 
> Rail Shafigulin
> 
> Software Engineer
> Esencia Technologies

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Code owner for MSP430 target?

Who is the code owner for the MSP430 target? I know that a lot of work on this target is done by Anton Korobeynikov
(aka asl), but he is not listed as a code owner in CODE_OWNERS.TXT. I would like to get my D20162 reviewed, but
I don't know whom to add as a reviewer.
Alex Bradbury via llvm-dev | 23 May 12:55 2016

LLVM Weekly - #125, May 23rd 2016

LLVM Weekly - #125, May 23rd 2016
=================================

If you prefer, you can read a HTML version of this email at
<http://llvmweekly.org/issue/125>.

Welcome to the one hundred and twenty-fifth issue of LLVM Weekly, a weekly
newsletter (published every Monday) covering developments in LLVM, Clang, and
related projects. LLVM Weekly is brought to you by [Alex
Bradbury](http://asbradbury.org). Subscribe to future issues at
<http://llvmweekly.org> and pass it on to anyone else you think may be
interested. Please send any tips or feedback to <asb <at> asbradbury.org>, or
@llvmweekly or @asbradbury on Twitter.

## News and articles from around the web

Steve Ire has written a blog post about [using Clang through the cindex API to
automatically generate Python
bindings](https://steveire.wordpress.com/2016/05/18/generating-python-bindings-with-clang/).
He also makes use of
[SIP](https://www.riverbankcomputing.com/software/sip/intro).

Krister Walfridsson has written a wonderfully clear post on [C's type-based
aliasing
rules](http://kristerw.blogspot.co.uk/2016/05/type-based-aliasing-in-c.html).

This week I discovered the [Swift Weekly Brief
newsletter](http://swiftweekly.github.io/). Its author, Jesse Squires does a
wonderful job of summarising mailing list traffic, recent commits, and
discussions on swift-evolution proposals. If you have an interest in Swift
development or language design in general I highly recommend it.

Are you interested in [writing for the LLVM
blog](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099807.html)? Or
volunteering to help recruit content authors? If so, get in touch with Tanya.

The next Cambridge LLVM Social will be held [at 7.30pm on May 25th at the
Cambridge
Blue](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099870.html).

## On the mailing lists

* Elena Demikhovsky is interested in [extending scalar evolution (SCEV)
analysis to include floating point
support](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099724.html). This
kicked off a pretty interesting discussion. Sanjoy Das highlighted what he
sees as the [most important issues to
discuss](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099743.html). A
number of follow-ups discussed whether enough code uses floating point values
as an induction variable to be worth optimising. There was also the question
of [should vectorisation be pursued at any
cost?](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099819.html). Even if
a loop can be made vectorisable through loop-versioning with run-time checks,
is it worth the code size? Is the [cost of maintaining the compiler
code](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099814.html)
worthwhile? Hideki Saito posted a useful [summary of the discussion so
far](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099928.html).

* Chandler Carruth is looking for feedback on the idea of [supporting
horizontal operations on vector types such as sum directly in LLVM
IR](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099715.html). Everyone
who has responded so far is in favour.

* Jia Chen, GSoC student with LLVM, has noted the CFL-AA pass seems to be
mostly working now and would [appreciate reports from people trying it out on
their
codebases](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099742.html). So
far, Geoff Berry
[reports](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099900.html) no
correctness issues but seemingly very limited changes in the generated code
for SPEC and the LLVM test-suite.

* Adam Nemet is seeking feedback on the idea of [adding optimisation remarks
to indicate where non-temporal stores may be
profitable](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099873.html).

* Quentin Colombet has [summarised recent discussion on policies to help
release
management](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099752.html) and
detailed the automatic hooks he hopes to explore next for updating bugs when
referenced in a commit message. The following discussion looked at how these
hooks may be implemented and what level of rigidity would be most beneficial
to the community.

* Dean Michael Berris is looking for a way of [defining a default
implementation for a
pseudo-instruction](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099759.html).
No answers yet, but hopefully that will change soon!

* Galina Kistanova is doing some cleanup work on zorg (the buildbot-based
testing infrastructure of the LLVM project) and is interested whether [anyone
uses these seemingly stale
modules](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099739.html).

## LLVM commits

* llc will now report all errors in the input file rather than just exiting
after the first. [r269655](http://reviews.llvm.org/rL269655).

* The SPARC backend gained support for soft floating point.
[r269892](http://reviews.llvm.org/rL269892).

* `Reloc::Default` no longer exists. Instead, `Optional<Reloc>` is used.
[r269988](http://reviews.llvm.org/rL269988).

* An initial implementation of a "guard widening" pass has been committed.
This will combine multiple guards to reduce the number of checks at runtime.
[r269997](http://reviews.llvm.org/rL269997).

## Clang commits

* clang-include-fixer gained a basic Vim integration.
[r269927](http://reviews.llvm.org/rL269927).

* The intrinsics headers now have feature guards enabled in Microsoft mode to
combat the compile-time regression discussed last week due to their increased
size. [r269675](http://reviews.llvm.org/rL269675).

* avxintrin.h gained many new Doxygen comments.
[r269718](http://reviews.llvm.org/rL269718).

## Other project commits

* lld now lets you specify a subset of passes to run in LTO.
[r269605](http://reviews.llvm.org/rL269605).

* LLDB has replaced uses of its own Mutex class with `std::mutex`.
[r269877](http://reviews.llvm.org/rL269877),
[r270024](http://reviews.llvm.org/rL270024).
Irini via llvm-dev | 23 May 12:51 2016

Andersens analysis ?

Hi all,

I was trying to find an equivalent of Andersen's analysis in LLVM.
I found it only in LLVM 2.6, in the 'Analysis/IPA' folder.
Was it removed or renamed in later versions? I'm mostly interested in version 3.4 or later.

Thank you in advance,
-- Irini
Johan Engelen via llvm-dev | 23 May 12:15 2016

Update frequency of the LLVM APT repositories

Hi all,
  The LDC Travis CI builds depend on the llvm.org APT repositories and have started to fail regularly while trying to obtain the packages from llvm.org/apt. The packages are sometimes unavailable. For example at the moment, llvm-3.9_3.9~svn270380-1~exp1_i386.deb is available but not llvm-3.9_3.9~svn270380-1~exp1_amd64.deb, and so the build is broken.

Could this be related to the (afaik) recent change in update frequency? I thought that the packages used to be updated once a week, but now they are updated once a day (or even more frequently). For us, this high update frequency is not needed, and updating once a week would be much nicer (we try to keep a manually-built Windows LLVM package in sync with the APT revision).

Another cause of the problem may be that the APT packages seem to have changed from amd64 to i386.

Thanks a lot for helping to resolve the Travis failures,
  Johan
