-
Notifications
You must be signed in to change notification settings - Fork 73
Transforms
Currently, the following basic digital transforms are implemented:
- Complex Fast Fourier Transform (FFT)
- Real Fast Fourier Transform (RealFFT)
- Short-Time Fourier Transform (STFT)
- Fast Wavelet Transform (FWT)
- Discrete Cosine Transform (type I-IV), Modified DCT
- Cepstral Transform
- Hartley Transform
- Mellin Transform
- Hilbert Transform
- Goertzel algorithm for evaluation of one particular FFT term
Also read this:
For each transform there's a corresponding transformer object. Each transformer object has Direct()
and Inverse()
methods.
The most fundamental transform in NWaves (as in DSP, in general) is the FFT. All transformers (except FWT and DCT I-IV) are based on FFT, so their performance is conditioned by the performance of FFT.
The FFT code is conventional decimation radix-2 FFT algorithm (O(Nlog(N))). Hence, the size
parameter in FFT transformers must be a power of 2, otherwise exception will be thrown. The size of input and output arrays must be equal to the size of FFT (or exceed it - in this case rest of the samples will be simply ignored). For the sake of performance, NWaves does not do any checks/pre-conditions for the incorrect case when FFT size exceeds array size, and leave it to user (at this relatively low level of processing, similar to good old C++). In this case the .NET index-out-of-range exception will be thrown during processing.
The FWT is coded according to classic Mallat decomposition / reconstruction algorithm.
Implementation of DCT I-IV and MDCT transformers is straightforward (O(N^2)), since so far it's used only in MFCC-like algorithms where small array sizes are involved. For large sizes (starting from 128, for example) there are more efficient FFT-based versions FastDct2, FastDct3, FastDct4, FastMdct
.
There are two classes providing FFT functionality:
-
Fft
- FFT for complex input -
RealFft
- FFT for real input
They give same results for real inputs. Most likely, you'll be working with real inputs, so in most practical cases RealFft
should be preferred, since it performs faster. As of ver.0.9.2 all feature extractors, convolvers and filters use RealFft
objects under the hood for computations.
The only parameter in Fft
and RealFft
constructors is the size of FFT (that should be power of 2, as mentioned above). Both direct and inverse transforms are performed on arrays of real and imaginary parts of samples.
var fft = new Fft(1024);
float[] real = signal.First(1024).Samples;
float[] imag = new float [1024];
// FFT with separate blocks for output data
float[] realOut = new float[1024];
float[] imagOut = new float[1024];
fft.Direct(real, imag, realOut, imagOut);
// in-place FFT
fft.Direct(real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT
fft.Inverse(real, imag);
RealFft
's analogous methods have slightly different prototypes. The following illustration explains the difference:
Fft direct : complex -> complex
RealFft direct : real -> complex
Fft inverse : complex -> complex
RealFft inverse : complex -> real
If you need to do the FFT in-place, just specify the same array reference both for input and output. So the code given above would change to:
var fft = new RealFft(1024);
// in-place FFT: real -> (real, imag)
fft.Direct(real, real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT: (real, imag) -> real
fft.Inverse(real, imag, real);
// if original array must be preserved, another block can be used for output:
fft.Direct(input, outRe, outIm);
fft.Direct(outRe, outIm, output);
All other functions in Fft
and RealFft
classes are completely identical.
There are also overloaded methods Direct / Inverse
working with Span<float>
In some cases it's OK to work directly with real
and imag
arrays (see the code of Phase Vocoder and its variations). But often what we need is to compute some real-valued post-processed results of complex FFT, like magnitude spectrum or power spectrum:
var magnitudeSpectrum =
fft.MagnitudeSpectrum(signal[1000, 2024]);
var powerSpectrum =
fft.PowerSpectrum(signal.First(1024), normalize: false);
var logPowerSpectrum =
fft.PowerSpectrum(signal.Last(1024))
.Samples
.Select(s => Scale.ToDecibel(s))
.ToArray();
The second parameter defines whether the spectrum should be normalized by the size of FFT or not. By default power spectrum is normalized, and magnitude spectrum is not normalized.
Also, methods MagnitudeSpectrum
and PowerSpectrum
will automatically zero-pad or truncate samples to fit FFT size (so unlike Direct()
and Inverse()
methods there will be no exceptions in this case).
The magnitude/power spectrum has length (FFT size) / 2 + 1, i.e. it includes 0th coefficient (DC component) and does not contain the symmetric part of the spectrum.
Finally, methods MagnitudeSpectrum
and PowerSpectrum
have overloaded versions for float[]
arrays designed for efficient sequential spectral analysis of a signal (discussed below). In these versions the size of output array must be not less than FFT size (otherwise .NET index-out-of range exception will be thrown during processing).
Fft.Shift
is quite conventional routine in DSP. It shifts the zero-frequency component of spectrum to the center of the spectrum. It can work with any even-sized array:
float[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Fft.Shift(array);
// array = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 }
var stft = new Stft(1024, 250);
var timefreq = stft.Direct(signal);
var (realSpectrum1, imagSpectrum1) = timefreq[0];
var (realSpectrum2, imagSpectrum2) = timefreq[1];
//...
The first parameter in the constructor is the window size. The second parameter is the hop size that can be any arbitrary positive value. The third parameter is the type of a window function. By default Hann window is used due to its "good" mathematical properties, so don't change it unless you do some exotic processing. The fourth parameter is the size of FFT. If you don't specify this parameter or if you specify the value less than the window size, then FFT size will be calculated as the nearest power of 2 to window size.
The result of direct STFT has type List<(float[], float[])>
, i.e. collection of spectra. Each spectrum is represented with two arrays containing real and imaginary parts of spectral components. This collection contains all necessary data for inverse STFT. Often this information is redundant, and we need to evaluate only the so called spectrogram of a signal which is essentially a sequence of power spectra. In this case we'd use the Spectrogram
method that returns simply the List<float[]>
:
var stft = new Stft(512, 128);
var spectrogram = stft.Spectrogram(signal);
float[] powerSpectrum1 = spectrogram[0]; // of size (fftSize / 2 + 1)
float[] powerSpectrum2 = spectrogram[1]; // of size (fftSize / 2 + 1)
float[] powerSpectrum3 = spectrogram[2]; // of size (fftSize / 2 + 1)
// ...
If the normalize
parameter in Spectrogram()
method is set to true
, then each power spectrum will be normalized by FFT size.
Stft.Inverse()
method essentially does the reconstruction of the signal from STFT spectra obtained by Stft.Direct()
:
float[] reconstructedPerfect = stft.Inverse(timefreq);
float[] reconstructed = stft.Inverse(timefreq, false);
//...
The second parameter is the "perfect reconstruction flag". If it's set to false, there will be insignificant discrepancies between the initial and reconstructed signals (at the beginning and at the end of the signal). However, perfect reconstruction requires more memory.
Method AveragePeriodogram()
returns one averaged power spectrum in STFT sequence. It differs from the Spectrogram()
method, because it's more memory-efficient (it doesn't store all spectra in memory).
Sometimes it's convenient to work with STFT results in Magnitude-Phase form:
var stft = new Stft(1024, 250);
var magPhase = stft.MagnitudePhaseSpectrogram(signal);
// magnitudes and phases in the first frame:
float[] magnitudes = magPhase[0].Magnitudes;
float[] phases = magPhase[0].Phases;
// magPhase is of type MagnitudePhaseList:
// public struct MagnitudePhaseList
// {
// public List<float[]> Magnitudes { get; set; }
// public List<float[]> Phases { get; set; }
// }
Samples can be reconstructed from Magnitude-Phase:
float[] reconstructedPerfect = stft.ReconstructMagnitudePhase(magPhase);
float[] reconstructed = stft.ReconstructMagnitudePhase(magPhase, false);
13 well-known windowing functions are implemented in NWaves:
- Rectangular
- Triangular
- Hamming
- Blackman
- Hann
- Gaussian
- Kaiser
- Kaiser-Bessel derived
- Bartlett-Hann
- Lanczos
- Power of sine
- Flat-top
- Liftering
Window samples can be generated one of two ways:
// 1)
var blackmanWindow = Window.Blackman(512);
// 2)
var hannWindow = Window.OfType(WindowType.Hann, 512);
WindowType
is enum for window functions. Many algorithms in NWaves can accept parameters of this type.
All windows require length
parameter, and some of them have additional parameters:
var kaiserWindow1 = Window.Kaiser(512); // by default, alpha = 12
var kaiserWindow2 = Window.Kaiser(512, alpha: 9);
var kaiserWindow3 = Window.OfType(WindowType.Kaiser, 512, 15);
var kbdWindow1 = Window.Kbd(512); // by default, alpha = 4
var kbdWindow2 = Window.Kbd(512, alpha: 9);
var kbdWindow3 = Window.OfType(WindowType.Kbd, 512, 15);
var sineWindow1 = Window.PowerOfSine(512); // by default, alpha = 1.5
var sineWindow2 = Window.PowerOfSine(512, alpha: 0.5);
var sineWindow3 = Window.OfType(WindowType.PowerOfSine, 512, 2);
var lifterWindow1 = Window.Liftering(512); // by default, l = 22
var lifterWindow2 = Window.Liftering(512, l: 16);
var lifterWindow3 = Window.OfType(WindowType.Liftering, 512, 28);
There are several extension methods for applying windows:
var signal = new DiscreteSignal(8000, 128, 1f);
var samples = new float[128];
var window = Window.Blackman(128);
signal.ApplyWindow(window); // 1) window samples are pre-computed
samples.ApplyWindow(samples); // 2) window samples are pre-computed
signal.ApplyWindow(WindowType.Blackman); // 3) window samples will be created each time
samples.ApplyWindow(WindowType.Blackman); // 4) window samples will be created each time
Doing FWT is easy as this:
var input = new float[192]; // fill it with values
var output = new float[192];
var fwt = new Fwt(192, new Wavelet("db5"));
fwt.Direct(input, output);
fwt.Inverse(output, input);
FWT is compliant with pyWavelets/MATLAB given that signal extension mode is set there to 'periodization'/'ppd'.
FWT constructor accepts two parameters: FWT size and the Wavelet
object. The size should not be necessarily a power of 2. However, currently, it must guarantee that at the given level of wavelet decomposition, the block of samples will have even size. Otherwise, no exception will be thrown but the results will be incorrect. The power-of-2 size guarantees that arrays will always have even size.
var fwt = new Fwt(108, new Wavelet("db5"));
var maxLevel = fwt.MaxLevel(108); // maxLevel = 3
// maxLevel is auto-computed and used for Fwt by default
// 108 samples -> 54 samples + 54 samples
fwt.Direct(input, output, 1); // only first level, OK
fwt.Inverse(output, input, 1); // only first level, OK
// ============================ or ===================================
// level1: 108 samples (even) -> 54 samples + 54 samples
// level2: 54 samples (even) -> 27 samples + 27 samples
fwt.Direct(input, output, 2); // only two levels, OK
fwt.Inverse(output, input, 2); // only two levels, OK
// ============================ or ===================================
// level1: 108 samples (even) -> 54 samples + 54 samples
// level2: 54 samples (even) -> 27 samples + 27 samples
// level3: 27 samples (odd) -> 13 samples + 14 samples
fwt.Direct(input, output, 3); // at the third level results are incorrect
fwt.Inverse(output, input, 3); // at the third level results are incorrect
Wavelets are represented with class Wavelet
:
var w = new Wavelet(WaveletFamily.Symlet, 7);
// or
var w = new Wavelet("sym7");
var name = w.Name; // "sym7"
var length = w.Length; // 14
var loD = w.LoD; // { 0,002681815, -0,001047385, -0,0126363, ... }
var hiD = w.HiD; // { ... }
var loR = w.LoR; // { ... }
var hiR = w.HiR; // { ... }
Supported wavelets are:
- Haar ("haar")
- Daubechies ("db1", "db2", "db3", ..., "db20")
- Symlet ("sym2", "sym3", "sym4", ..., "sym20")
- Coiflet ("coif1", "coif2", "coif3", "coif4", "coif5")
Or just create custom wavelet from coefficients:
float[] loD;
float[] hiD;
float[] loR;
float[] hiR;
// ... fill arrays ...
var w = new Wavelet(loD, hiD, loR, hiR);
var name = w.Name; // "custom"
var lD = w.LoD; // loD array
var hD = w.HiD; // hiD array
var lR = w.LoR; // loR array
var hR = w.HiR; // hiR array
There are several different well-known DCTs. In NWaves DCTs of type I-IV and MDCT (modified DCT) are implemented. They are fully compliant with scipy/MATLAB equivalents. DCT classes implement four methods of IDct
interface:
- Direct
- Inverse
- DirectNorm (normalized DCT)
- InverseNorm (normalized IDCT)
var input = new float[20]; // fill it with values
var output = new float[12]; // take only 12 compressed values
var dct1 = new Dct1(20); // type I
dct1.Direct(input, output);
dct1.DirectNorm(input, output);
dct1.Inverse(output, input);
var dct2 = new Dct2(20); // type II
dct2.Direct(input, output);
dct2.DirectNorm(input, output);
dct2.Inverse(output, input);
var dct3 = new Dct3(20); // type III
dct3.Direct(input, output);
dct3.DirectNorm(input, output);
dct3.Inverse(output, input);
var dct4 = new Dct4(20); // type IV
dct4.Direct(input, output);
dct4.DirectNorm(input, output);
dct4.Inverse(output, input);
var mdct = new Mdct(20); // MDCT
mdct.Direct(input, output);
mdct.DirectNorm(input, output);
mdct.Inverse(output, input);
For large arrays DCT can be computed more efficiently via FFT using classes prefixed with Fast
:
var input = new float[256]; // fill it with values
var output = new float[32]; // take 32 compressed values
var dct2 = new FastDct2(256); // type II
dct2.Direct(input, output);
dct2.DirectNorm(input, output);
dct2.Inverse(output, input);
var dct3 = new FastDct3(256); // type III
dct3.Direct(input, output);
dct3.DirectNorm(input, output);
dct3.Inverse(output, input);
var dct4 = new FastDct4(256); // type IV
dct4.Direct(input, output);
dct4.DirectNorm(input, output);
dct4.Inverse(output, input);
var mdct = new FastMdct(256); // MDCT
mdct.Direct(input, output);
mdct.DirectNorm(input, output);
mdct.Inverse(output, input);
var dht = new HartleyTransform(512);
dht.Direct(re); // in-place
dht.Inverse(re); // in-place
Mellin transformer has some specific features:
- sizes of both input and output array must be specified (and (1) they may differ; (2) input size can be arbitrary, output size must be power of two)
- currently, only
Direct
method is implemented
The following code demonstrates that mellin spectra of a signal and its time-stretched version are (almost) identical:
var stretch = 1.5;
int length = 256;
int stretchedLength = (int)(stretch * length);
// build sinusoid
var sin1 = new SineBuilder()
.SetParameter("freq", 5/*Hz*/)
.SampledAt(length)
.OfLength(length)
.Build();
// build the same sinusoid but time-stretched
var sin2 = new SineBuilder()
.SetParameter("freq", 5/*Hz*/)
.SampledAt(stretchedLength)
.OfLength(stretchedLength)
.Build();
var outputRe = new float[128];
var outputIm = new float[128];
// compute Mellin power spectrum of sinusoid
var mt1 = new MellinTransform(length, 128);
mt1.Direct(sin1.Samples, outputRe, outputIm);
var mellinSpectrum1 = outputRe.Take(64)
.Zip(outputIm, (re, im) => re * re + im * im)
.ToArray();
// compute Mellin power spectrum of time-stretched sinusoid
var mt2 = new MellinTransform(stretchedLength, 128);
mt2.Direct(sin2.Samples, outputRe, outputIm);
var mellinSpectrum2 = outputRe.Take(64)
.Zip(outputIm, (re, im) => re * re + im * im)
.ToArray();
// compute Mean Absolute Error between two spectra:
var err = mellinSpectrum1.Zip(mellinSpectrum2, (s1, s2) => Math.Abs(s1 - s2)).Sum();
var ct = new CepstralTransform(20, fftSize: 512, logBase: 10);
var delay = ct.Direct(signal, cceps); // direct complex cepstrum: signal -> cceps
ct.Inverse(cceps, signal, delay); // inverse complex cepstrum: cceps -> signal
ct.RealCepstrum(signal, rceps); // real cepstrum: signal -> rceps
ct.PowerCepstrum(signal, pceps); // power cepstrum: signal -> pceps
ct.PhaseCepstrum(signal, hceps); // phase cepstrum: signal -> hceps
The first parameter in the constructor of CepstralTransform
is the number of cepstral coefficients. The second (optional) parameter is the size of FFT that will be used to compute cepstrum. The third (optional) parameter is the base of logarithm (by default it's Math.E
, i.e. natural logarithm is used for computations).
The results are compliant with output of MATLAB functions cceps/icceps
and rceps
.
Complex cepstrum is calculated according to formula:
Phase unwrapping is quite tricky and differs from standard MathUtils.Unwrap
. Also, the delay
parameter arises from this. Read more about it here.
Real cepstrum is computed similarly, with the term phase_unwrapped
discarded.
Power cepstrum is calculated from real cepstrum as: power = 4 * real ^ 2
.
Phase cepstrum is calculated from complex cepstrum as: phase = (complex_cepstrum - reversed_complex_cepstrum) ^ 2
.
var ht = new HilbertTransform(1024);
var result = ht.Direct(doubleSamples);
HilbertTransform
class also provides method for computing complex analytic signal. Thus, previous line is equivalent to:
var result = ht.AnalyticSignal(doubleSamples).Imag;
By default HilbertTransformer works with double precision. This code is for floats:
var ht = new HilbertTransform(1024, doublePrecision: false);
var result = ht.AnalyticSignal(floatSamples).Item2;
Example of computing only one term (#2) of 8-size FFT using Goertzel algorithm:
float[] array = { 1, 2, 3, 4, 5, 6, 7, 8 };
Complex secondFftTerm = new Goertzel(8).Direct(array, 2);
In previous cases the result of each transform was a newly created object of the DiscreteSignal
class. If the sequence of blocks must be processed then it's better to work with reusable arrays in memory (all output and intermediate results will be stored in reusable arrays, i.e. there will be no unnecessary memory allocations):
var fft = new RealFft(1024);
var spectrum = new float[1024];
var cepstrum = new float[20];
fft.PowerSpectrum(signal[1000, 2024].Samples, spectrum);
// do something with spectrum
fft.PowerSpectrum(signal[2024, 3048].Samples, spectrum);
// do something with spectrum
fft.PowerSpectrum(signal[3048, 4072].Samples, spectrum);
// do something with spectrum
ct.Direct(signal[5000, 5512].Samples, cepstrum)
// do something with cepstrum
//...
Hence, the following code is the best for spectral analysis and widely used in many parts of NWaves library:
// for extension method FastCopyTo()
using NWaves.Utils;
var fftSize = 512;
var fft = new RealFft(fftSize);
var window = Window.Blackman(fftSize);
var block = new float[fftSize];
var spectrum = new float[fftSize / 2 + 1];
// loop:
for (var pos = 0; pos + fftSize < signal.Length; pos += hopSize)
{
// copy 512 samples to 'block' starting from index 'pos'
signal.Samples.FastCopyTo(block, 512, pos);
block.ApplyWindow(window);
fft.MagnitudeSpectrum(block, spectrum);
// post-process/visualize spectrum
}
In filter design and analysis we're dealing with doubles (64 bit). Therefore, 64-bit version of complex and real FFT was added to library. It can be used just like its 32-bit counterpart, however, only Direct()
and Inverse
methods are available:
// for extension method ToDoubles()
using NWaves.Utils;
var fft = new Fft64(512);
var rfft = new RealFft64(512);
double[] real = signal.First(512).Samples.ToDoubles();
double[] imag = new double [512];
// in-place FFT
fft.Direct(real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT
fft.Inverse(real, imag);
// ...
In .NET 5 version of NWaves you'll find one more implementation of FFT direct/inverse methods working with samples in the form of Span<float>
:
var fftSize = 1024;
var fft = new RealFft(fftSize);
Span<float> re = stackalloc float[fftSize];
Span<float> im = stackalloc float[fftSize];
// ...or any other way of creating a Span<float>
var startPos = 1000;
var input = samples.AsSpan(startPos, fftSize);
// ...or any other way of creating a Span<float>
fft.Direct(input, re, im);
Yet, there are reasons why the span approach is not leveraged in NWaves codebase. The most important are:
- most of the time the data is loaded and stored in managed memory
- data frames usually overlap, so there's a need to make a copy of data anyway
-
Span
-version is slightly slower thanfloat[]
-version, according to benchmarks