Skip to content

Transforms

Tim Sharii edited this page Sep 1, 2021 · 9 revisions

Currently, the following basic digital transforms are implemented:

Also read this:


For each transform there's a corresponding transformer object. Each transformer object has Direct() and Inverse() methods.

The most fundamental transform in NWaves (as in DSP, in general) is the FFT. All transformers (except FWT and DCT I-IV) are based on FFT, so their performance is conditioned by the performance of FFT.

The FFT code is conventional decimation radix-2 FFT algorithm (O(Nlog(N))). Hence, the size parameter in FFT transformers must be a power of 2, otherwise exception will be thrown. The size of input and output arrays must be equal to the size of FFT (or exceed it - in this case rest of the samples will be simply ignored). For the sake of performance, NWaves does not do any checks/pre-conditions for the incorrect case when FFT size exceeds array size, and leave it to user (at this relatively low level of processing, similar to good old C++). In this case the .NET index-out-of-range exception will be thrown during processing.

The FWT is coded according to classic Mallat decomposition / reconstruction algorithm.

Implementation of DCT I-IV and MDCT transformers is straightforward (O(N^2)), since so far it's used only in MFCC-like algorithms where small array sizes are involved. For large sizes (starting from 128, for example) there are more efficient FFT-based versions FastDct2, FastDct3, FastDct4, FastMdct.

FFT transformers

There are two classes providing FFT functionality:

  • Fft - FFT for complex input
  • RealFft - FFT for real input

They give same results for real inputs. Most likely, you'll be working with real inputs, so in most practical cases RealFft should be preferred, since it performs faster. As of ver.0.9.2 all feature extractors, convolvers and filters use RealFft objects under the hood for computations.

The only parameter in Fft and RealFft constructors is the size of FFT (that should be power of 2, as mentioned above). Both direct and inverse transforms are performed on arrays of real and imaginary parts of samples.

var fft = new Fft(1024);

float[] real = signal.First(1024).Samples;
float[] imag = new float [1024];


// FFT with separate blocks for output data

float[] realOut = new float[1024];
float[] imagOut = new float[1024];

fft.Direct(real, imag, realOut, imagOut);


// in-place FFT
fft.Direct(real, imag);

// ...do something with real and imaginary parts of the spectrum...

// in-place IFFT
fft.Inverse(real, imag);

RealFft's analogous methods have slightly different prototypes. The following illustration explains the difference:

    Fft direct  : complex -> complex
RealFft direct  : real    -> complex
    Fft inverse : complex -> complex
RealFft inverse : complex -> real

If you need to do the FFT in-place, just specify the same array reference both for input and output. So the code given above would change to:

var fft = new RealFft(1024);

// in-place FFT: real  ->   (real, imag)
fft.Direct(real, real, imag);

// ...do something with real and imaginary parts of the spectrum...

// in-place IFFT: (real, imag)  ->   real
fft.Inverse(real, imag, real);


// if original array must be preserved, another block can be used for output:

fft.Direct(input, outRe, outIm);
fft.Direct(outRe, outIm, output);

All other functions in Fft and RealFft classes are completely identical.

There are also overloaded methods Direct / Inverse working with Span<float>

In some cases it's OK to work directly with real and imag arrays (see the code of Phase Vocoder and its variations). But often what we need is to compute some real-valued post-processed results of complex FFT, like magnitude spectrum or power spectrum:

var magnitudeSpectrum = 
    fft.MagnitudeSpectrum(signal[1000, 2024]);

var powerSpectrum = 
    fft.PowerSpectrum(signal.First(1024), normalize: false);

var logPowerSpectrum = 
    fft.PowerSpectrum(signal.Last(1024))
       .Samples
       .Select(s => Scale.ToDecibel(s))
       .ToArray();

The second parameter defines whether the spectrum should be normalized by the size of FFT or not. By default power spectrum is normalized, and magnitude spectrum is not normalized.

Also, methods MagnitudeSpectrum and PowerSpectrum will automatically zero-pad or truncate samples to fit FFT size (so unlike Direct() and Inverse() methods there will be no exceptions in this case).

The magnitude/power spectrum has length (FFT size) / 2 + 1, i.e. it includes 0th coefficient (DC component) and does not contain the symmetric part of the spectrum.

Finally, methods MagnitudeSpectrum and PowerSpectrum have overloaded versions for float[] arrays designed for efficient sequential spectral analysis of a signal (discussed below). In these versions the size of output array must be not less than FFT size (otherwise .NET index-out-of range exception will be thrown during processing).

Fft.Shift is quite conventional routine in DSP. It shifts the zero-frequency component of spectrum to the center of the spectrum. It can work with any even-sized array:

float[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

Fft.Shift(array);

// array = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 }

Short-Time Fourier Transform:

var stft = new Stft(1024, 250);
var timefreq = stft.Direct(signal);
var reconstructed = stft.Inverse(timefreq);

The first parameter in the constructor is the window size. The second parameter is the hop size that can be any arbitrary positive value. The third parameter is the type of a window function. By default Hann window is used due to its "good" mathematical properties, so don't change it unless you do some exotic processing. The fourth parameter is the size of FFT. If you don't specify this parameter or if you specify the value less than the window size, then FFT size will be calculated as the nearest power of 2 to window size.

The result of direct STFT has type List<(float[], float[])>, i.e. collection of spectra. Each spectrum is represented with two arrays containing real and imaginary parts of spectral components. This collection contains all necessary data for inverse STFT. Often this information is redundant, and we need to evaluate only the so called spectrogram of a signal which is essentially a sequence of power spectra. In this case we'd use the Spectrogram method that returns simply the List<float[]>:

var stft = new Stft(512, 256);
var spectrogram = stft.Spectrogram(signal);

Sometimes it's convenient to work with STFT results in Magnitude-Phase form:

var stft = new Stft(1024, 250);
var magPhase = stft.MagnitudePhaseSpectrogram(signal);

// magnitudes and phases in the first frame:

float[] magnitudes = magPhase[0].Magnitudes;
float[] phases = magPhase[0].Phases;

// magPhase is of type MagnitudePhaseList:

//    public struct MagnitudePhaseList
//    {
//        public List<float[]> Magnitudes { get; set; }
//        public List<float[]> Phases { get; set; }
//    }

Samples can be reconstructed from Magnitude-Phase:

float[] reconstructed = stft.ReconstructMagnitudePhase(magPhase);

Window functions

13 well-known windowing functions are implemented in NWaves:

  • Rectangular
  • Triangular
  • Hamming
  • Blackman
  • Hann
  • Gaussian
  • Kaiser
  • Kaiser-Bessel derived
  • Bartlett-Hann
  • Lanczos
  • Power of sine
  • Flat-top
  • Liftering

Window samples can be generated one of two ways:

// 1)
var blackmanWindow = Window.Blackman(512);

// 2)
var hannWindow = Window.OfType(WindowType.Hann, 512);

WindowType is enum for window functions. Many algorithms in NWaves can accept parameters of this type.

All windows require length parameter, and some of them have additional parameters:

var kaiserWindow1 = Window.Kaiser(512);           // by default, alpha = 12
var kaiserWindow2 = Window.Kaiser(512, alpha: 9);
var kaiserWindow3 = Window.OfType(WindowType.Kaiser, 512, 15);

var kbdWindow1 = Window.Kbd(512);                 // by default, alpha = 4
var kbdWindow2 = Window.Kbd(512, alpha: 9);
var kbdWindow3 = Window.OfType(WindowType.Kbd, 512, 15);

var sineWindow1 = Window.PowerOfSine(512);        // by default, alpha = 1.5
var sineWindow2 = Window.PowerOfSine(512, alpha: 0.5);
var sineWindow3 = Window.OfType(WindowType.PowerOfSine, 512, 2);

var lifterWindow1 = Window.Liftering(512);        // by default, l = 22
var lifterWindow2 = Window.Liftering(512, l: 16);
var lifterWindow3 = Window.OfType(WindowType.Liftering, 512, 28);

There are several extension methods for applying windows:

var signal = new DiscreteSignal(8000, 128, 1f);
var samples = new float[128];

var window = Window.Blackman(128);

signal.ApplyWindow(window);    // 1) window samples are pre-computed
samples.ApplyWindow(samples);  // 2) window samples are pre-computed

signal.ApplyWindow(WindowType.Blackman);   // 3) window samples will be created each time
samples.ApplyWindow(WindowType.Blackman);  // 4) window samples will be created each time

Fast Wavelet Transform

Doing FWT is easy as this:

var input = new float[192];   // fill it with values
var output = new float[192];

var fwt = new Fwt(192, new Wavelet("db5"));

fwt.Direct(input, output);
fwt.Inverse(output, input);

FWT is compliant with pyWavelets/MATLAB given that signal extension mode is set there to 'periodization'/'ppd'.

FWT constructor accepts two parameters: FWT size and the Wavelet object. The size should not be necessarily a power of 2. However, currently, it must guarantee that at the given level of wavelet decomposition, the block of samples will have even size. Otherwise, no exception will be thrown but the results will be incorrect. The power-of-2 size guarantees that arrays will always have even size.

var fwt = new Fwt(108, new Wavelet("db5"));

var maxLevel = fwt.MaxLevel(108);  // maxLevel = 3
                                   // maxLevel is auto-computed and used for Fwt by default

// 108 samples ->   54 samples + 54 samples

fwt.Direct(input, output, 1);     // only first level, OK
fwt.Inverse(output, input, 1);    // only first level, OK

// ============================ or ===================================

// level1:  108 samples (even) ->   54 samples + 54 samples
// level2:  54 samples  (even) ->   27 samples + 27 samples

fwt.Direct(input, output, 2);     // only two levels, OK
fwt.Inverse(output, input, 2);    // only two levels, OK

// ============================ or ===================================

// level1:  108 samples (even) ->   54 samples + 54 samples
// level2:  54 samples  (even) ->   27 samples + 27 samples
// level3:  27 samples  (odd)  ->   13 samples + 14 samples

fwt.Direct(input, output, 3);   // at the third level results are incorrect
fwt.Inverse(output, input, 3);  // at the third level results are incorrect

Wavelets are represented with class Wavelet:

var w = new Wavelet(WaveletFamily.Symlet, 7);
// or
var w = new Wavelet("sym7");

var name = w.Name;     // "sym7"
var length = w.Length; // 14
var loD = w.LoD;       // { 0,002681815, -0,001047385, -0,0126363, ... }
var hiD = w.HiD;       // { ... }
var loR = w.LoR;       // { ... }
var hiR = w.HiR;       // { ... }

Supported wavelets are:

  • Haar ("haar")
  • Daubechies ("db1", "db2", "db3", ..., "db20")
  • Symlet ("sym2", "sym3", "sym4", ..., "sym20")
  • Coiflet ("coif1", "coif2", "coif3", "coif4", "coif5")

DCT transformers

There are several different well-known DCTs. In NWaves DCTs of type I-IV are implemented. They are fully compliant with scipy/MATLAB equivalents. DCT classes implement three methods of IDct interface:

  • Direct
  • Inverse
  • DirectNorm (normalized DCT)
var input = new float[20];  // fill it with values
var output = new float[12]; // take only 12 compressed values

var dct1 = new Dct1(20);    // type I
dct1.Direct(input, output);
dct1.DirectNorm(input, output);
dct1.Inverse(output, input);

var dct2 = new Dct2(20);   // type II
dct2.Direct(input, output);
dct2.DirectNorm(input, output);
dct2.Inverse(output, input);

var dct3 = new Dct3(20);   // type III
dct3.Direct(input, output);
dct3.DirectNorm(input, output);
dct3.Inverse(output, input);

var dct4 = new Dct4(20);   // type IV
dct4.Direct(input, output);
dct4.DirectNorm(input, output);
dct4.Inverse(output, input);

For large arrays DCT can be computed more efficiently via FFT.

Hartley transformer

var dht = new HartleyTransform(512);

dht.Direct(re);    // in-place
dht.Inverse(re);   // in-place

Mellin transformer

Mellin transformer has some specific features:

  • sizes of both input and output array must be specified (and (1) they may differ; (2) input size can be arbitrary, output size must be power of two)
  • currently, only Direct method is implemented

The following code demonstrates that mellin spectra of a signal and its time-stretched version are (almost) identical:

var stretch = 1.5;

int length = 256;
int stretchedLength = (int)(stretch * length);

// build sinusoid
var sin1 = new SineBuilder()
                   .SetParameter("freq", 5/*Hz*/)
                   .SampledAt(length)
                   .OfLength(length)
                   .Build();

// build the same sinusoid but time-stretched
var sin2 = new SineBuilder()
                   .SetParameter("freq", 5/*Hz*/)
                   .SampledAt(stretchedLength)
                   .OfLength(stretchedLength)
                   .Build();

var outputRe = new float[128];
var outputIm = new float[128];

// compute Mellin power spectrum of sinusoid
var mt1 = new MellinTransform(length, 128);
mt1.Direct(sin1.Samples, outputRe, outputIm);

var mellinSpectrum1 = outputRe.Take(64)
                              .Zip(outputIm, (re, im) => re * re + im * im)
                              .ToArray();

// compute Mellin power spectrum of time-stretched sinusoid
var mt2 = new MellinTransform(stretchedLength, 128);
mt2.Direct(sin2.Samples, outputRe, outputIm);

var mellinSpectrum2 = outputRe.Take(64)
                              .Zip(outputIm, (re, im) => re * re + im * im)
                              .ToArray();

// compute Mean Absolute Error between two spectra:
var err = mellinSpectrum1.Zip(mellinSpectrum2, (s1, s2) => Math.Abs(s1 - s2)).Sum();

Cepstral transformer:

var ct = new CepstralTransform(20, fftSize: 512, logBase: 10);

var delay = ct.Direct(signal, cceps);  // direct complex cepstrum:  signal -> cceps
ct.Inverse(cceps, signal, delay);      // inverse complex cepstrum: cceps -> signal

ct.RealCepstrum(signal, rceps);        // real cepstrum:  signal -> rceps
ct.PowerCepstrum(signal, pceps);       // power cepstrum: signal -> pceps
ct.PhaseCepstrum(signal, hceps);       // phase cepstrum: signal -> hceps

The first parameter in the constructor of CepstralTransform is the number of cepstral coefficients. The second (optional) parameter is the size of FFT that will be used to compute cepstrum. The third (optional) parameter is the base of logarithm (by default it's Math.E, i.e. natural logarithm is used for computations).

The results are compliant with output of MATLAB functions cceps/icceps and rceps.

Complex cepstrum is calculated according to formula:

img

Phase unwrapping is quite tricky and differs from standard MathUtils.Unwrap. Also, the delay parameter arises from this. Read more about it here.

Real cepstrum is computed similarly, with the term phase_unwrapped discarded.

Power cepstrum is calculated from real cepstrum as: power = 4 * real ^ 2.

Phase cepstrum is calculated from complex cepstrum as: phase = (complex_cepstrum - reversed_complex_cepstrum) ^ 2.

Hilbert transformer:

var ht = new HilbertTransform(1024);
var result = ht.Direct(doubleSamples);

HilbertTransform class also provides method for computing complex analytic signal. Thus, previous line is equivalent to:

var result = ht.AnalyticSignal(doubleSamples).Imag;

By default HilbertTransformer works with double precision. This code is for floats:

var ht = new HilbertTransform(1024, doublePrecision: false);
var result = ht.AnalyticSignal(floatSamples).Item2;

Goertzel algorithm

Example of computing only one term (#2) of 8-size FFT using Goertzel algorithm:

float[] array = { 1, 2, 3, 4, 5, 6, 7, 8 };

Complex secondFftTerm = new Goertzel(8).Direct(array, 2);

Efficient spectral processing

In previous cases the result of each transform was a newly created object of the DiscreteSignal class. If the sequence of blocks must be processed then it's better to work with reusable arrays in memory (all output and intermediate results will be stored in reusable arrays, i.e. there will be no unnecessary memory allocations):

var fft = new RealFft(1024);

var spectrum = new float[1024];
var cepstrum = new float[20];

fft.PowerSpectrum(signal[1000, 2024].Samples, spectrum);
// do something with spectrum

fft.PowerSpectrum(signal[2024, 3048].Samples, spectrum);
// do something with spectrum

fft.PowerSpectrum(signal[3048, 4072].Samples, spectrum);
// do something with spectrum

ct.Direct(signal[5000, 5512].Samples, cepstrum)
// do something with cepstrum

//...

Hence, the following code is the best for spectral analysis and widely used in many parts of NWaves library:

// for extension method FastCopyTo() 
using NWaves.Utils;

var fftSize = 512;

var fft = new RealFft(fftSize);
var window = Window.Blackman(fftSize);
var block = new float[fftSize];
var spectrum = new float[fftSize / 2 + 1];

// loop:

for (var pos = 0; pos + fftSize < signal.Length; pos += hopSize)
{
    // copy 512 samples to 'block' starting from index 'pos'
    signal.Samples.FastCopyTo(block, 512, pos);

    block.ApplyWindow(window);

    fft.MagnitudeSpectrum(block, spectrum);

    // post-process/visualize spectrum
}

64-bit FFT

In filter design and analysis we're dealing with doubles (64 bit). Therefore, 64-bit version of complex and real FFT was added to library. It can be used just like its 32-bit counterpart, however, only Direct() and Inverse methods are available:

// for extension method ToDoubles() 
using NWaves.Utils;

var fft = new Fft64(512);
var rfft = new RealFft64(512);

double[] real = signal.First(512).Samples.ToDoubles();
double[] imag = new double [512];

// in-place FFT
fft.Direct(real, imag);

// ...do something with real and imaginary parts of the spectrum...

// in-place IFFT
fft.Inverse(real, imag);

// ...

FFT with Span

In .NET 5 version of NWaves you'll find one more implementation of FFT direct/inverse methods working with samples in the form of Span<float>:

var fftSize = 1024;
var fft = new RealFft(fftSize);

Span<float> re = stackalloc float[fftSize];
Span<float> im = stackalloc float[fftSize];
// ...or any other way of creating a Span<float>

var startPos = 1000;
var input = samples.AsSpan(startPos, fftSize);
// ...or any other way of creating a Span<float>

fft.Direct(input, re, im);

Yet, there are reasons why the span approach is not leveraged in NWaves codebase. The most important are:

  • most of the time the data is loaded and stored in managed memory
  • data frames usually overlap, so there's a need to make a copy of data anyway
  • Span-version is slightly slower than float[]-version, according to benchmarks
Clone this wiki locally