Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some improvements #1

Open
happyTonakai opened this issue Jun 18, 2024 · 8 comments
Open

Some improvements #1

happyTonakai opened this issue Jun 18, 2024 · 8 comments

Comments

@happyTonakai
Copy link

It seems that you didn't use the istft_struct->window_sum for normalization, I don't know the purpose of it.

And since you are using fftwf_plan_dft_c2r_1d, it is unnecessary to handle the negative frequencies. It only takes up to nfft/2+1 frequency bins as input and returns the real-valued reconstructed signal.

@SuperKogito
Copy link
Owner

The first improvement, I need to test as for the second I agree with you.
Overall these sound adequate. I will try to integrate these and test their effect as soon as I have some time to work on this lib.
Feel free to make PR with these if you want :)

@happyTonakai
Copy link
Author

Yeah I mean you calculate window sum which should be used for normalization but in the end you just normalize the reconstructed signal by the maximum value and the window sum is not used in the code.

@SuperKogito
Copy link
Owner

The reason for some of these, is that I tried a lot of things while using this code in another project (music synthesis) and probably when I divided by the window sum the output was not correct, so I dropped that. Hence why I need to re-test this to confirm if it is actually an improvement :)

@SuperKogito
Copy link
Owner

have you tested "dividing by the window sum"?

@happyTonakai
Copy link
Author

I managed to perform the perfect reconstruction by weighted OLA, the window_sum can be initialized by

 for (int i = 0; i < S->num_frames; ++i)
    {
        for (int j = 0; j < n_fft; ++j)
        {
            S->window_sum[i * S->hop + j] += (S->wnd[j] * S->wnd[j] * n_fft);  // n_fft should be considered due to fftw
        }
    }

At the istft computation stage, apply overlap-add and then normalize it

   for (i = 0; i < S->num_frames; ++i)
    {
        // Retrieve the complex spectrum for the current frame
        for (j = 0; j < n_fft / 2 + 1; ++j)
        {
            S->istft_in[j][0] = stft_values[i * S->num_bins + j][0];
            S->istft_in[j][1] = stft_values[i * S->num_bins + j][1];
        }

        fftwf_execute(S->plan);

        for (j = 0; j < n_fft; ++j)
        {
            idx = i * S->hop + j;
            signal[idx] += S->istft_out[j] * S->wnd[j]; // Apply window and add to signal
        }
    }
    for (i = 0; i < S->signal_length; ++i)
        signal[i] /= S->window_sum[i]; // Normalize by the window sum

@happyTonakai
Copy link
Author

happyTonakai commented Jun 19, 2024

I referred to this answer and the official implementation of matlab.

% Set method
if strcmpi(opts.Method,'ola') 
    a = 0;
else % Else WOLA
    a = 1;
end

% Initialize normalization value
normVal = zeros(xlen,numCh);
winNominator = repmat(win.^a,1,1,numCh);
winDeNominator = repmat(win.^(a+1),1,numCh);

% Overlap-add
for ii = 1:nseg
    x(((ii-1)*hop+1):((ii-1)*hop+nwin),1,:) = x(((ii-1)*hop+1):((ii-1)*hop+nwin),1,:) ...
        + xifft(:,ii,:).*winNominator;
    normVal(((ii-1)*hop+1):((ii-1)*hop+nwin),:) = normVal(((ii-1)*hop+1):((ii-1)*hop+nwin),:)+winDeNominator;
end

% Normalize
normVal(normVal<(nseg*eps)) = 1; % Don't normalize by small values
X = squeeze(x)./normVal;

You should mind the zero point if you use Hann window, in my test rectangular window and Hamming window are fine.

@happyTonakai
Copy link
Author

It seems that your original implementation performs the OLA method, maybe you should try to divide the result by nfft?

@SuperKogito
Copy link
Owner

It is a bit tricky to do this as I couldn't find any reliable resource explaining how to do this in details. I have probably tried to divide by nfft, but possibly in the wrong place. However, I am sure I didn't try this (S->wnd[j] * S->wnd[j] * n_fft), so this is worth trying.
Thank you so much for providing these code snippets/ examples, they surely help a lot.

I will probably test these in the weekend. Feel free to submit a PR if you are interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants