Skip to content

Mixed-Radix Fast Fourier Transform is a lightweight, pure C Fast Fourier Transform (FFT) library designed for efficient signal processing across diverse input sizes. It implements a mixed-radix algorithm for lengths factorable by small primes (e.g., 2, 3, 4, 5, 7, 8) and Bluestein’s algorithm for arbitrary lengths.

License

Notifications You must be signed in to change notification settings

Tugbars/Mixed-Radix-Fast-Fourier-Transform

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HighSpeedFFT

A high-performance mixed-radix FFT library in C with full support for complex and real transforms, AVX2/FMA vectorization, and Bluestein’s algorithm for arbitrary lengths. Designed as a drop-in alternative to FFTW with minimal dependencies.


Features

  • Mixed-Radix DIT for lengths factorable into small primes (2, 3, 4, 5, 7, 8, 11, 13, …)

    • Radix-3, 5, 11 now with fully tested AVX2 + SSE2 + scalar codepaths

      • Vector core processes 4 points per iteration
      • SSE2 tails handle 1–2 leftover samples
      • Scalar tail covers 0..3 leftovers or non-SIMD builds
      • Explicit prefetching of outputs and twiddles for cache-friendliness
      • FMADD/FMSUB macros used to leverage FMA when available
      • Error handling for insufficient scratch or missing precomputed twiddles
  • Bluestein’s Algorithm for arbitrary (prime) lengths via chirp z-transform

  • AVX2 + FMA SIMD acceleration

    • Common butterfly math factored into helpers (e.g. cmul_soa_avx, rot90_soa_avx)
    • Works with unaligned data (_mm256_loadu_pd, _mm256_storeu_pd)
    • Prefetching added to hide latency in deeper radices
  • SSE2 acceleration

    • Two-lane fallback for tails in radices 3, 5, 11
    • Safe unaligned loads/stores (_mm_loadu_pd / _mm_storeu_pd)
  • Robust scratch-buffer model

    • No per-call malloc — all recursion stages and local twiddles use a preallocated scratch buffer
    • Layout partitioned: sub-FFT outputs + (optional) stage twiddles
    • Prints an error and aborts stage if alignment or sizing is invalid
  • Precomputed or on-the-fly twiddles

    • Radices 2, 3, 4, 5, 7, 8, 11, 13 supported
    • If no precompute is provided, stage-local twiddles are generated in the scratch tail
  • Real ↔ Complex FFTs via half-complex storage

  • Minimal footprint (~25 KiB code, no giant tables)

  • CMake build system with optional GoogleTest harness

  • Comprehensive test suite

    • Round-trip forward/backward FFTs
    • Radix-3 and 5 vector/scalar tails
    • Radix-11 special-case coverage (k=0,1 SSE2 lanes)
    • Error conditions (invalid N, missing scratch, bad alignment)
    • MSE checks against reference FFTW within 1e-10

Comparison to FFTW

Criterion HighSpeedFFT FFTW
Algorithm support Mixed-radix + Bluestein Cooley–Tuk + Rader + Bluestein
Vectorization Hand-tuned AVX2/FMA + SSE2 Auto-SIMD (SSE/AVX)
Dependencies libm only pthread, libsimd, …
Memory layout contiguous real/imag buffers plan-specific allocators
Performance (bench) ≈FFTW on radix-friendly N, within 5–10% highly tuned wide radix
Footprint ≈25 KiB code ≈300 KiB code + data

Benchmarks on Intel Xeon Gold 6226R (single thread):

  • Mixed-radix (2^12–2^20): within 5–10% of FFTW
  • Bluestein on large primes: ~30% slower (extra convolution)

Usage

#include "highspeedFFT.h"

// Complex FFT
fft_data *in  = malloc(N*sizeof*in),
         *out = malloc(N*sizeof*out);
fft_object fwd = fft_init(N, +1);
fft_exec(fwd, in, out);
fft_object inv = fft_init(N, -1);
fft_exec(inv, out, in);  // normalize by 1/N yourself

// Real FFT
fft_real_object r2c = fft_real_init(N, +1);
fft_data *hc = malloc((N/2+1)*sizeof*hc);
fft_r2c_exec(r2c, real_in, hc);
fft_real_object c2r = fft_real_init(N, -1);
fft_c2r_exec(c2r, hc, real_out);  // normalize by 1/N

Build

mkdir build && cd build
cmake ..                    # optionally: -DENABLE_GTEST=ON
make                        # builds static lib + test harness
./test_mixedRadixFFT        # standalone tests
# or, with GoogleTest:
cmake -DENABLE_GTEST=ON ..
make test

CMake Options

  • ENABLE_GTEST (OFF/ON): GoogleTest vs. standalone harness

  • SIMD flags auto-detected:

    • GCC/Clang: -O3 -mavx2 -mfma -msse2
    • MSVC: /O2 /arch:AVX2 /fp:fast

About

Mixed-Radix Fast Fourier Transform is a lightweight, pure C Fast Fourier Transform (FFT) library designed for efficient signal processing across diverse input sizes. It implements a mixed-radix algorithm for lengths factorable by small primes (e.g., 2, 3, 4, 5, 7, 8) and Bluestein’s algorithm for arbitrary lengths.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published