Re: FFT on NEON
Hello Philip,
Thank you for getting back to me on this subject.
I modified the library developed by Gregory Heckler; the source code is here:
http://github.com/gps-sdr/gps-sdr/tree/6153c01317f34a26b2fb41926505b9d97f764e90/objects
To give you an example, the DIT butterfly looks like this:
#define BUTTERFLY_FWD(_A, _B, _W) \
__asm__ ("LDR r0, [%0] \n\t" \
"LDR r2, [%1] \n\t" \
"MOV r3, #0 \n\t" \
"SHADD16 r0, r0, r3 \n\t" \
"SHADD16 r2, r2, r3 \n\t" \
"LDR r3, [%2] \n\t" \
"SMUADX r5, r2, r3 \n\t" \
"SMUSD r4, r2, r3 \n\t" \
"ADD r5, r5, #8192 \n\t" \
"ADD r4, r4, #8192 \n\t" \
"ASR r4, r4, #14 \n\t" \
"PKHBT r3, r4, r5, LSL #2 \n\t" \
"QSUB16 r2, r0, r3 \n\t" \
"QADD16 r0, r0, r3 \n\t" \
"STR r0, [%0] \n\t" \
"STR r2, [%1] \n\t" \
::"r" (_A), "r" (_B), "r" (_W) \
:"r0", "r2", "r3", "r4", "r5", "memory")
The macro uses only plain ARM assembly (NEON is hard to apply to this
basic radix-2 implementation).
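
For anyone who wants to check the arithmetic without an ARM board, here is a rough portable C model of what the macro appears to compute. It is only a sketch and not part of the library: the cpx16 type and the helper names are made up for illustration, and it assumes that the real part sits in the low halfword, the imaginary part in the high halfword, and that the twiddle factors are in Q14 (as the shift by 14 suggests).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: one packed complex sample.
   Assumes real part = low halfword, imaginary part = high halfword. */
typedef struct { int16_t re, im; } cpx16;

/* Arithmetic shift right, written so it is well defined for negatives. */
static int32_t asr32(int32_t x, int n)
{
    return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

/* Keep only the low 16 bits, as PKHBT does when packing a halfword. */
static int16_t wrap16(int32_t x)
{
    uint16_t u = (uint16_t)x;
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Signed saturation to 16 bits, as QADD16/QSUB16 do per halfword. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Reference model of BUTTERFLY_FWD (assumed layout and Q14 twiddles). */
void butterfly_fwd_ref(cpx16 *a, cpx16 *b, const cpx16 *w)
{
    assert(a && b && w);

    /* SHADD16 with a zero operand: halve both inputs to limit growth. */
    int32_t ar = asr32(a->re, 1), ai = asr32(a->im, 1);
    int32_t br = asr32(b->re, 1), bi = asr32(b->im, 1);

    /* Complex product B*W: SMUSD gives the real part, SMUADX the imaginary. */
    int32_t tr = br * w->re - bi * w->im;
    int32_t ti = br * w->im + bi * w->re;

    /* Round (+8192 is 0.5 in Q14) and drop the 14 fraction bits. */
    int16_t xr = wrap16(asr32(tr + 8192, 14));
    int16_t xi = wrap16(asr32(ti + 8192, 14));

    /* Saturating subtract and add: B' = A - B*W, A' = A + B*W. */
    b->re = sat16(ar - xr);  b->im = sat16(ai - xi);
    a->re = sat16(ar + xr);  a->im = sat16(ai + xi);
}
```

Under these assumptions, with W = (16384, 0), i.e. 1.0 in Q14, the inputs A = (1000, -2000) and B = (4000, 6000) come out as A' = (2500, 2000) and B' = (-1500, -4000). Comparing such values against the assembly output on the target would confirm or refute the assumed layout.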
For user space, I am using the Angstrom image v0.92:
http://www.gumstix.net/overo-gm-images/v0.92/
on my Overo Water. I use the free CodeSourcery 2009q1 toolchain, although
Koen suggested today that I try something else.
Regards,
Michele
> Michele Bavaro wrote:
>> Hello everyone,
>>
>> I'm porting my software GPS receiver to the OMAP, so I need fast
>> signal processing libraries, and in particular FFTs.
>>
>> I have somehow adapted an open source library to do radix-2 butterflies
>> using ARM assembly. It works, but my 256-point, 16-bit fixed-point FFT
>> still takes about 60us. That's roughly 13 times slower than the 4.7us
>> advertised with NEON!
>
> What open source FFT library? You could try posting the code and seeing
> if anyone has any suggestions. (Post the code to the Beagle list also;
> there are some good NEON people there.)
>
>> Frustrated, I downloaded the OpenMAX libraries and compiled them with
>> the evaluation version of RVCT, but I can't manage to link the object
>> file with code compiled with the CodeSourcery GNU toolchain.
>
> What user space are you using? Angstrom or something else? You'll need
> to use a toolchain that matches your user space.
>
> Philip
>
>>
>> I tried to translate the assembly, but unfortunately it's a very
>> challenging task for me.
>>
>> Can someone point me in the right direction on this subject?
>> Should I keep working on my 16-bit fixed-point FFT? Should I buy the ARM
>> toolchain and port all the software? Should I just give up and try using
>> the DSP maybe?
>>
>> Thank you in advance for any reply, and good luck with the OpenSDR,
>> which I'm watching very closely.
>>
>> Cheers,
>> Michele