dsp: p_fir: Fix FIR filter implementation #152

lfochamon · 2015-06-16T23:26:43Z

This fixes issue #151. It keeps the idea of evaluating 8 samples simultaneously to leverage vectorization (almost 4 times faster on my PC with -O3). I also removed the reference to dbuf from the function description, as it no longer exists (it was removed from the API in commit #1fa28ced55ebddb522cd383ce41bf0a792aff1). A few details worth noting:

- I used malloc to dynamically allocate the delay line, as p_malloc is not yet implemented. It should, however, be a simple drop-in replacement as soon as that function is available.
- The function behaves exactly like MATLAB's filter, i.e., the delay line is initialized with 0s (the first nh - 1 output samples are not computed from a "filled buffer").
- As in MATLAB's filter, the filter coefficients are assumed to be in increasing delay order, i.e., this function implements the filter H(z) = h[0] + h[1] z^{-1} + h[2] z^{-2} + ... + h[nh-1] z^{-(nh-1)} (sorry, no math parser in GitHub). Supporting the reverse order would be easy, but this is by far the most common convention in DSP.
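To make the two conventions above concrete (zero initial state and coefficients in increasing delay order), here is a minimal sample-by-sample sketch. It is not the code from this PR: the name `fir_reference` and its argument order are assumptions made for illustration, and it reads the input directly instead of keeping a delay line.

```c
#include <assert.h>

/* Hypothetical reference FIR, not the PAL implementation.
 * Computes y[n] = sum_{k=0}^{nh-1} h[k] * x[n-k], with x[m] = 0 for m < 0,
 * which matches a zero-initialized delay line. */
void fir_reference(const float *x, const float *h, float *y, int nx, int nh)
{
    assert(nx >= 0 && nh > 0);
    for (int n = 0; n < nx; n++) {
        float acc = 0.0f;
        /* Taps with n - k < 0 would read the zero initial state, so skip them. */
        int kmax = (n < nh - 1) ? n : nh - 1;
        for (int k = 0; k <= kmax; k++)
            acc += h[k] * x[n - k];   /* h[k] multiplies the sample delayed by k */
        y[n] = acc;
    }
}
```

With `h = {1, 2, 3}` and `x = {1, 2, 3, 4, 5}`, the first two outputs (1 and 4) come from the partially filled window, and the third (10) is the first one that uses all three taps.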

I have a test file for these functions, but I'm unsure how to post it (the check_p_* files are in .gitignore). I'm somewhat new to git, so I didn't include it in this PR. In any case, I can easily push it.

The new implementation follows the same idea as the previous one: it processes blocks of 8 samples simultaneously to leverage vectorization. On an i7-3770 PC, this function is almost 4 times faster than evaluating the output sample-by-sample (compiled with -O3). Also removed the reference to dbuf from the function description, as it no longer exists (it was removed from the API in commit #1fa28ced55ebddb522cd383ce41bf0a792aff177).

Signed-off-by: Luiz Chamon <[email protected]>
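The block-of-8 idea can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: `fir_block8` is a hypothetical name, and instead of the PR's delay line it copies the input into a malloc'ed buffer prefixed with nh - 1 zeros. The point is only the loop shape: for each tap, the innermost loop updates 8 independent accumulators from contiguous samples, which is the kind of loop a compiler can vectorize at -O3.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 8   /* number of output samples computed together */

/* Hypothetical block FIR sketch. Returns 0 on success, -1 if malloc fails. */
int fir_block8(const float *x, const float *h, float *y, int nx, int nh)
{
    assert(nx >= 0 && nh > 0);
    if (nx == 0)
        return 0;

    int pad = nh - 1;   /* zero samples standing in for the initial state */
    float *xp = malloc((size_t)(pad + nx) * sizeof *xp);
    if (xp == NULL)
        return -1;
    for (int i = 0; i < pad; i++)
        xp[i] = 0.0f;
    memcpy(xp + pad, x, (size_t)nx * sizeof *xp);   /* xp[pad + n] == x[n] */

    int n = 0;
    for (; n + BLOCK <= nx; n += BLOCK) {
        float acc[BLOCK] = {0};
        for (int k = 0; k < nh; k++) {
            const float *w = xp + pad + n - k;   /* w[j] == x[n + j - k] */
            for (int j = 0; j < BLOCK; j++)      /* 8 independent accumulations */
                acc[j] += h[k] * w[j];
        }
        memcpy(y + n, acc, sizeof acc);
    }
    /* Leftover samples (fewer than BLOCK) are computed one at a time. */
    for (; n < nx; n++) {
        float acc = 0.0f;
        for (int k = 0; k < nh; k++)
            acc += h[k] * xp[pad + n - k];
        y[n] = acc;
    }

    free(xp);
    return 0;
}
```

The padded copy costs nx + nh - 1 floats, so it trades memory for simplicity. A delay line that persists across the block loop avoids copying the whole input.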

lfochamon · 2015-06-19T19:07:06Z

Just to make the PR a little better documented, here is a small benchmark of this function compared with a straightforward circular buffer implementation. Since the proposed implementation (p_fir) uses an array of length 2*nh+1, compared to nh for the trivial implementation (p_fir_1), I thought it would be a good idea to justify that by showing the speed increase. The tests were run on my PC (i7-3770, Windows) and on a BeagleBone Black (AM3358, Linux).

| Function | Time (`nh = 16`, `nx = 30`) | Code size (bytes) | Delay line length |
| --- | --- | --- | --- |
| `p_fir` | 0.250 us (i7) / 11.573 us (BBB) | 52.452 (i7) / 9305 (BBB) | `2*nh + 1` |
| `p_fir_1` | 0.535 us (i7) / 14.228 us (BBB) | 51.428 (i7) / 7263 (BBB) | `nh` |
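For reference, a straightforward circular buffer FIR with a delay line of length nh could look roughly like the sketch below. The source of p_fir_1 is not shown in this thread, so this is an assumption about its general shape, and `fir_circular` is a hypothetical name. The index wrap inside the tap loop is what makes this form harder to vectorize than a contiguous window.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical circular-buffer FIR sketch with an nh-sample delay line.
 * Returns 0 on success, -1 if malloc fails. */
int fir_circular(const float *x, const float *h, float *y, int nx, int nh)
{
    assert(nx >= 0 && nh > 0);
    float *dl = malloc((size_t)nh * sizeof *dl);
    if (dl == NULL)
        return -1;
    for (int i = 0; i < nh; i++)
        dl[i] = 0.0f;                 /* zero initial state */

    int head = 0;                     /* slot that receives the newest sample */
    for (int n = 0; n < nx; n++) {
        dl[head] = x[n];              /* overwrites the oldest sample, x[n - nh] */
        float acc = 0.0f;
        int idx = head;
        for (int k = 0; k < nh; k++) {
            acc += h[k] * dl[idx];    /* dl[idx] holds x[n - k], or 0 if n < k */
            idx = (idx == 0) ? nh - 1 : idx - 1;   /* step back, wrapping around */
        }
        y[n] = acc;
        head = (head + 1 == nh) ? 0 : head + 1;
    }

    free(dl);
    return 0;
}
```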

aolofsson · 2015-06-19T19:14:08Z

Awesome! The ~25x factor between Intel and BBB/Zynq-based boards seems to be repeatable (not the first time I have seen this). Still not sure I can explain the full factor. Is the vectorization on ARM broken?

dsp: p_fir: Fix FIR filter implementation

lfochamon · 2015-06-19T20:47:36Z

This is really way beyond my expertise (I'm a DSP guy), but could it have something to do with all of Intel's hyperthreading and multicore stuff that goes on in there? Although, to be honest, I couldn't see any difference in the CPU monitors while running the tests.

aolofsson added a commit that referenced this pull request Jun 19, 2015

Merge pull request #152 from lchamon/151-p_fir

087bda2

dsp: p_fir: Fix FIR filter implementation

aolofsson merged commit 087bda2 into parallella:master Jun 19, 2015

lfochamon deleted the 151-p_fir branch June 19, 2015 20:47

This was referenced Jul 1, 2015

dsp: p_firsym: Implement p_firsym #176

Merged

dsp: p_fir, p_firsym: Implement tests #182

Merged
