-
Notifications
You must be signed in to change notification settings - Fork 73
Transforms
Currently, the following basic digital transforms are implemented:
- Fast Fourier Transform (FFT)
- Short-Time Fourier Transform (STFT)
- Discrete Cosine Transform (type I-IV)
- Cepstral Transform
- Hilbert Transform
- Goertzel algorithm for evaluation of one particular FFT term
Note, the implementation of algorithms is far from optimized. Nevertheless, even on humble modern hardware the overall speed of transforms allows performing a lot of useful things without significant or noticeable delays. And who says that it's the last version of NWaves? 😃 There's always a place for optimizations, and corresponding PRs are very welcome.
The FFT code is conventional decimation radix-2 FFT algorithm (O(Nlog(N))). Hence, direct/inverse FFT requires the length of input data to be a power of 2, otherwise exception will be thrown.
Implementation of DCT I-IV transformers is straightforward (O(N^2)), since so far it's used only in MFCC-like algorithms where small array sizes are involved.
Other transforms are based on FFT, so their performance is conditioned by the performance of FFT.
For each transform there's a corresponding transformer object. Each transformer object has Direct()
and Inverse()
methods.
The only parameter in Fft
constructor is the size of FFT (that should be power of 2, as mentioned above). Both direct and inverse transforms are performed in-place on arrays of real and imaginary parts of samples. If the length of data differs from specified size of Fft
object, exception will be thrown.
var fft = new Fft(1024);
float[] real = signal.First(1024).Samples;
float[] imag = new float [1024];
// in-place FFT
fft.Direct(real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT
fft.Inverse(real, imag);
In some cases it's better to work directly with real
and imag
arrays (see Phase Vocoder code). But often what we need is to compute some real-valued post-processed results of complex FFT, like magnitude spectrum or power spectrum:
var magnitudeSpectrum =
fft.MagnitudeSpectrum(signal[1000, 2024]);
var powerSpectrum =
fft.PowerSpectrum(signal.First(1024), normalize: false);
var logPowerSpectrum =
fft.PowerSpectrum(signal.Last(1024))
.Samples
.Select(s => Scale.ToDecibel(s))
.ToArray();
The second parameter defines whether the spectrum should be normalized by the size of FFT or not. By default power spectrum is normalized, and magnitude spectrum is not normalized.
Also, methods MagnitudeSpectrum
and PowerSpectrum
will automatically zero-pad or truncate samples to fit FFT size (so unlike Direct()
and Inverse()
methods there will be no exceptions in this case).
The magnitude/power spectrum has length (FFT size) / 2 + 1, i.e. it includes 0th coefficient (DC component) and does not contain the symmetric part of the spectrum.
Finally, methods MagnitudeSpectrum
and PowerSpectrum
have overloaded versions for float[]
arrays designed for efficient sequential spectral analysis of a signal (discussed below).
Fft.Shift
is quite conventional routine in DSP. It shifts the zero-frequency component of spectrum to the center of the spectrum. It can work with any even-sized array:
float[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Fft.Shift(array);
// array = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 }
var stft = new Stft(1024, 512, WindowTypes.Hamming);
var timefreq = stft.Direct(signal);
var reconstructed = stft.Inverse(timefreq);
var spectrogram = stft.Spectrogram(signal);
The first parameter in the constructor is the window size (it may be less than FFT size and doesn't have to be a power of 2). The second parameter is the hop size. The third parameter is the type of a window function. The fourth parameter is the size of FFT that is calculated by default as the nearest power of 2 to window size.
There are 13 windowing functions in NWaves:
- Rectangular
- Triangular
- Hamming
- Blackman
- Hann
- Gaussian
- Kaiser
- Kaiser-Bessel derived
- Bartlett-Hann
- Lanczos
- Power of sine
- Flat-top
- Liftering
var dct1 = new Dct1(20, 12); // type I
dct1.Direct(input, output);
dct1.Inverse(output, input);
var dct2 = new Dct2(20, 12); // type II
dct2.Direct(input, output);
dct2.Inverse(output, input);
// normalized DCT-II (used in PNCC algorithm):
dct2.DirectN(input, output)
var dct3 = new Dct3(20, 12); // type III
dct3.Direct(input, output);
dct3.Inverse(output, input);
var dct4 = new Dct4(20, 12); // type IV
dct4.Direct(input, output);
dct4.Inverse(output, input);
var ct = new CepstralTransform(20, fftSize: 512);
var cepstrum = ct.Direct(signal);
Cepstrum is calculated according to formula:
Power cepstrum is calculated according to formula:
var ht = new HilbertTransform(1024);
var result = ht.Direct(doubleSamples);
HilbertTransform
class also provides method for computing complex analytic signal. Thus, previous line is equivalent to:
var result = ht.AnalyticSignal(doubleSamples).Imag;
By default HilbertTransformer works with double precision. This code is for floats:
var ht = new HilbertTransform(1024, doublePrecision: false);
var result = ht.AnalyticSignal(floatSamples).Item2;
Example of computing only one term (#2) of 8-size FFT using Goertzel algorithm:
float[] array = { 1, 2, 3, 4, 5, 6, 7, 8 };
Complex secondFftTerm = new Goertzel(8).Direct(array, 2);
In previous cases the result of each transform was a newly created object of the DiscreteSignal
class. If the sequence of blocks must be processed then it's better to work with reusable arrays in memory (all output and intermediate results will be stored in reusable arrays, i.e. there will be no unnecessary memory allocations):
var spectrum = new float[1024];
var cepstrum = new float[20];
fft.PowerSpectrum(signal[1000, 2024].Samples, spectrum);
// do something with spectrum
fft.PowerSpectrum(signal[2024, 3048].Samples, spectrum);
// do something with spectrum
fft.PowerSpectrum(signal[3048, 4072].Samples, spectrum);
// do something with spectrum
ct.Direct(signal[5000, 5512].Samples, cepstrum)
// do something with cepstrum
//...
Hence, the following code is the best for spectral analysis and widely used in many parts of NWaves library:
var fft = new Fft(512);
var window = Window.Blackman(512);
var spectrum = new float[512];
// loop:
for (var pos = 0; pos + fftSize < signal.Length; pos += hopSize)
{
var block = signal[pos, pos + fftSize].Samples;
block.ApplyWindow(window);
fft.MagnitudeSpectrum(block, spectrum);
// post-process/visualize spectrum
}
In filter design and analysis we're dealing with doubles (64 bit). Therefore, 64-bit version of FFT was added to library. It can be used just like its 32-bit counterpart, however, only Direct()
and Inverse
methods are available:
// for extension method ToDoubles()
using NWaves.Utils;
var fft = new Fft64(512);
double[] real = signal.First(512).Samples.ToDoubles();
double[] imag = new double [512];
// in-place FFT
fft.Direct(real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT
fft.Inverse(real, imag);