Beat Tracking

Audio Source Selection

By default, BeatViewer uses the default audio input. You can specify an audio device using the -a <device-id> parameter. You can get a list of audio devices by using the -l flag:

$ python -m beatviewer -l
0     2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mappeur de sons Microsoft - Input
1<    2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mixage stéréo (Realtek(R) Audio
2     2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Ligne (USB AUDIO  CODEC)
10    2 in, 0 out    0.12 ms - 0.24 ms    44.1 kHz    Windows DirectSound    Pilote de capture audio principal
11    2 in, 0 out    0.12 ms - 0.24 ms    44.1 kHz    Windows DirectSound    Mixage stéréo (Realtek(R) Audio)
12    2 in, 0 out    0.12 ms - 0.24 ms    44.1 kHz    Windows DirectSound    Ligne (USB AUDIO  CODEC)
20    2 in, 8 out    0.01 ms - 0.05 ms    44.1 kHz    ASIO                   ASIO4ALL v2
27    2 in, 0 out    0.00 ms - 0.01 ms    44.1 kHz    Windows WASAPI         Ligne (USB AUDIO  CODEC)
28    2 in, 0 out    0.00 ms - 0.01 ms    44.1 kHz    Windows WASAPI         Mixage stéréo (Realtek(R) Audio)

In the above example, device 1 (marked with <) is the default audio input. To use the line input instead (device 2 in this example), use the following command:

python -m beatviewer -a 2
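If you want to pick the device ID from a script, the listing can be parsed. The sketch below is only an illustration: it assumes the -l output keeps the layout shown above (device ID first, with an optional < marking the default input), which the tool does not guarantee. The helper name parse_device_ids is hypothetical and not part of BeatViewer.

```python
import re

# Sample lines copied from the listing above.
LISTING = """\
0     2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mappeur de sons Microsoft - Input
1<    2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mixage stéréo (Realtek(R) Audio
2     2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Ligne (USB AUDIO  CODEC)
"""

# Device ID at the start of the line, optionally followed by "<" (default input).
LINE_RE = re.compile(r"^(\d+)(<?)\s+")


def parse_device_ids(listing):
    """Return (all_ids, default_id); default_id is None if no line is marked."""
    ids, default_id = [], None
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        device_id = int(match.group(1))
        ids.append(device_id)
        if match.group(2) == "<":
            default_id = device_id
    return ids, default_id


ids, default_id = parse_device_ids(LISTING)
print(ids, default_id)
```

The returned ID can then be passed to the -a parameter.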

Tips

Windows Audio Configuration

Recent versions of Windows make the sound settings dialog (which, for instance, lets you enable the Stereo Mix input) harder to reach. You can open it directly by creating a shortcut to mmsys.cpl, or by running mmsys.cpl from a terminal.

Windows Core Audio APIs

Windows offers several audio APIs. A single audio source may appear several times in the device list, once for each API. This lets you trade off latency against compatibility:

API	Behavior
MME (Multimedia Extensions, also known as WinMM)	Oldest API, highest latency, best compatibility
DirectSound	DirectX-related interface
WASAPI (Windows Audio Session API)	Most recent API, lowest latency

Offline Analysis

You can also execute the module offline, by passing the path to an audio file with the -f argument:

python -m beatviewer -f track.wav

For now, only WAVE files are supported. The algorithm is the same as for online tracking: an audio stream is simply emulated from the file. By default, tracking is not realtime and the tracker runs as fast as it can. Use the -t flag to make it realtime, which allows for offline visualizations. Offline track analysis can then be performed on the generated output.
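To give an idea of what emulating an audio stream from a file can look like, here is a minimal sketch using the standard wave module. It is not BeatViewer's actual code: the function name iter_hops is hypothetical, the block size of 128 frames is borrowed from the default audio_hop_size, and the realtime option is only an assumed reading of what the -t flag does (pacing the loop like a live input).

```python
import os
import tempfile
import time
import wave


def iter_hops(path, hop_size=128, realtime=False):
    """Yield (sample_rate, raw_bytes) blocks of up to `hop_size` frames."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        while True:
            block = wav.readframes(hop_size)
            if not block:
                break  # end of file
            yield sample_rate, block
            if realtime:
                # Wait for the duration of one block, as a live input would.
                time.sleep(hop_size / sample_rate)


# Demo on a synthetic file: 1000 frames of 16-bit mono silence at 44.1 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 1000)

blocks = list(iter_hops(demo_path))
print(len(blocks), "blocks, last one has", len(blocks[-1][1]), "bytes")
```

Without the realtime pacing, the loop consumes the file as fast as the disk allows, which matches the default behavior described above.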

Recording

You may record the audio from the selected source by passing a path to an output WAV file to the -r argument:

python -m beatviewer -r ~/Desktop/recording.wav

Output

You may specify an output file with the -o [PATH] argument. It will generate a TSV file listing detected events, with the following columns:

Event type: either BEAT (detected beat), ONSET (detected onset) or BPM (change of BPM estimation)
Event frame: OSS frame index at which the event occurred
Event time: time (in seconds) at which the event occurred; it is the event frame index divided by the OSS sampling rate (which is the audio sampling rate divided by the audio hop size, see the parameter table below)
Event value: for BPM events, the associated new BPM value
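The generated file can be loaded with the standard csv module. The sketch below rests on several assumptions that are not confirmed here: columns appear in the order listed above, the file has no header row, and the value column is empty for non-BPM events. The sample rows are made up for illustration, and read_events is a hypothetical helper name.

```python
import csv
import io

SAMPLE_RATE = 44100    # audio sampling rate Fs (Hz)
AUDIO_HOP_SIZE = 128   # default audio_hop_size
OSS_RATE = SAMPLE_RATE / AUDIO_HOP_SIZE

# Made-up rows, for illustration only: type, frame, time, value.
SAMPLE_TSV = (
    "BPM\t345\t1.001\t120.0\n"
    "ONSET\t689\t2.000\t\n"
    "BEAT\t700\t2.032\t\n"
)


def read_events(stream):
    """Parse event rows into dicts; `value` is None when the column is empty."""
    events = []
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        value = float(row[3]) if len(row) > 3 and row[3] else None
        events.append(
            {"type": row[0], "frame": int(row[1]), "time": float(row[2]), "value": value}
        )
    return events


events = read_events(io.StringIO(SAMPLE_TSV))
beat_times = [e["time"] for e in events if e["type"] == "BEAT"]

# The time column should equal frame / OSS rate, up to rounding.
for e in events:
    assert abs(e["frame"] / OSS_RATE - e["time"]) < 1e-3

print(beat_times)
```

For a real file, replace the io.StringIO object with open(path, newline="").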

Graph Analysis

You may visualize processed signals by using the -g flag:

python -m beatviewer -g

This shows the Onset Strength Signal (OSS) with its mean and the detection threshold, the tempo, the Cumulative Beat Strength Signal (CBSS) and the detected period (Δt), the Beat Prediction Signal (BPS) and the beat trigger index (εt). The left part of the graph shows the past; the right part shows the (predicted) future. Note that the past and future plots are each scaled to the window height independently.

You may change the graph framerate with the -gf [int] argument (default is 30 fps).

Tuning Parameters

The beat tracking algorithm depends on many parameters. To use a specific configuration, use the -c <path-to-config-file> parameter. The config.txt file can serve as a starting point.

Parameter	Default	Description
`audio_window_size`	1024	Audio window size for computing FFT.
`audio_hop_size`	128	Number of new samples in the window at each iteration. It will set the sampling rate for the onset strength signal. Given the audio sampling rate Fs, and the hop size H, the OSS sampling rate will be FsO = Fs / H. For Fs = 44100 and H = 128, we have FsO = 344.53 Hz.
`compression_gamma`	1	The spectral flux is compressed to reduce the dynamic range of the signal and to match human hearing, which responds roughly logarithmically to amplitude. Set to 0 to disable compression. Greater values (e.g. 1000) attenuate strong values and give weaker values more impact.
`noise_cancellation_level`	-74	After compression, frequency bins with levels below this threshold are set to zero. The value is specified in dB.
`hamming_window_size`	15	Width of the windowing function applied to the spectral flux to smooth it. This acts as a low-pass filter: the greater the width, the lower the cutoff frequency. At 15, it is about 7 Hz.
`oss_buffer_size`	1024	Number of OSS samples used to compute the OSS mean and the OSS variance.
`onset_threshold`	0.1	If the OSS rises more than this number of standard deviations above the mean, an onset is detected.
`onset_threshold_min`	5.0	If the variance is too small, this absolute threshold is used.
`oss_window_size`	2048	Number of OSS samples used to estimate the tempo.
`oss_hop_size`	128	Number of new samples in the window at each iteration. If FsO is the OSS sampling rate and H is the hop size, a new tempo is estimated with rate FsO / H. With FsO = 344.53 Hz, this yields 2.7 Hz.
`frequency_domain_compression`	0.5	The OSS is autocorrelated to find tempo lag candidates. This is computed by performing an FFT and an IFFT on the OSS, with a power compression applied in the frequency domain. Smaller values increase the lag resolution but make the result more sensitive to noise.
`min_bpm_detection`	50	Minimum BPM detected.
`max_bpm_detection`	210	Maximum BPM detected.
`tempo_candidates`	10	Number of tempo candidates considered when estimating tempo.
`tempo_accumulator_decay`	0.9	Detected tempi are added to an accumulated sum. This sum decays over time to allow for tempo variation detection. The greater the value (0.99, 0.999), the more stable the estimator is, but the longer it takes for new tempi to be detected.
`tempo_accumulator_gaussian_width`	10	The tempo accumulated sum is made of Gaussian curves centered on each detected tempo. The Gaussian width absorbs slight tempo variations.
`min_bpm_rescaled`	90	If the result BPM is lower than this, it gets doubled.
`max_bpm_rescaled`	180	If the result BPM is greater than this, it gets halved.
`cbss_buffer_size`	512	Number of CBSS samples used to determine the previous beat location.
`cbss_eta`	300	The log-gaussian width around previous beat locations.
`cbss_alpha`	0.9	Trade-off between the OSS and a pure periodic signal. It takes values between 0 and 1. At 0, only the OSS is considered. At 1, only the periodic signal is considered.
`bps_epsilon_o`	0	Offline latency correction factor, in number of OSS samples. See Section 6.1. of Musical Robot Swarms and Equilibria (Krzyżaniak, 2020) for details.
`bps_epsilon_r`	0	Realtime latency correction factor, in number of OSS samples. See Section 6.2. of Musical Robot Swarms and Equilibria (Krzyżaniak, 2020) for details.
`bps_epsilon_t`	20	Beat trigger index. Greater values mean detecting beats earlier.
`bps_gaussian_width`	10	Width of the gaussian representing the next beat locations.
`bps_buffer_size`	512	Number of future samples for which beat locations are predicted. As this is a cumulative process, a bigger buffer results in more stable behavior.
`bps_cooldown_ratio`	0.4	Ratio of samples ignored right after a beat is detected, relative to the tempo lag (i.e. the number of samples between two beats).
`key_trigger_beats_earlier`	page up	Increase the value of bps_epsilon_t.
`key_trigger_beats_later`	page down	Decrease the value of bps_epsilon_t.
`key_set_mode_regular`	f9	Change tracking mode to default.
`key_set_mode_tempo_locked`	f10	Change tracking mode to tempo locked: the current BPM is held and the CBSS depends only on the pulse train generated from it.
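As an illustration of the min_bpm_rescaled and max_bpm_rescaled rows above, the rescaling step can be sketched as follows. This is a minimal reading of the table, not the tracker's actual code; the function name rescale_bpm is hypothetical.

```python
def rescale_bpm(bpm, min_bpm_rescaled=90.0, max_bpm_rescaled=180.0):
    """Double a tempo below the lower bound, halve a tempo above the upper bound."""
    if bpm < min_bpm_rescaled:
        return bpm * 2
    if bpm > max_bpm_rescaled:
        return bpm / 2
    return bpm


for bpm in (70, 120, 200):
    print(bpm, "->", rescale_bpm(bpm))
```

With the default detection range of 50 to 210 BPM, a single doubling or halving is enough to bring any detected tempo into the 90 to 180 BPM range.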

If the -k flag is set, then PageUp and PageDown keys can be used to increase or decrease bps_epsilon_t while the tracker is running, for manually synchronizing the tracker live.
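Since bps_epsilon_t and several other parameters are expressed in OSS samples, it can help to convert them to seconds. The sketch below only applies the relation FsO = Fs / H from the table; the helper names are hypothetical, and the 44.1 kHz sampling rate is taken from the device listing example.

```python
SAMPLE_RATE = 44100  # audio sampling rate Fs (Hz)


def oss_rate(sample_rate, audio_hop_size=128):
    """OSS sampling rate FsO = Fs / H, in Hz."""
    return sample_rate / audio_hop_size


def oss_samples_to_seconds(n_samples, oss_rate_hz):
    """Duration covered by a number of OSS samples."""
    return n_samples / oss_rate_hz


def bpm_to_lag(bpm, oss_rate_hz):
    """Beat period for a given tempo, in OSS samples."""
    return 60.0 * oss_rate_hz / bpm


fs_o = oss_rate(SAMPLE_RATE)
print(round(fs_o, 2))                                     # OSS rate in Hz
print(round(fs_o / 128, 2))                               # tempo estimates per second (oss_hop_size = 128)
print(round(1000 * oss_samples_to_seconds(20, fs_o), 1))  # bps_epsilon_t = 20, in ms
print(round(oss_samples_to_seconds(2048, fs_o), 2))       # oss_window_size = 2048, in s
print(round(bpm_to_lag(120, fs_o), 1))                    # samples between beats at 120 BPM
```

With the defaults, a bps_epsilon_t of 20 corresponds to about 58 ms, and the tempo estimation window spans a little under 6 seconds of audio.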

Tracking Mode

If the -k flag is set, then F9 and F10 keys can be used to switch between two tracking modes:

Key	Mode	Behavior
F9	Regular	Regular tracking mode
F10	Tempo locked	The current tempo value is kept and further estimations are discarded until mode is switched back to regular
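The difference between the two modes can be pictured with a small state holder. This is a conceptual sketch only, assuming nothing about how the tracker is implemented internally; the class name TempoHolder and its methods are hypothetical.

```python
class TempoHolder:
    """Keep or update the current tempo depending on the tracking mode."""

    def __init__(self, bpm):
        self.bpm = bpm
        self.locked = False  # regular mode by default

    def set_mode(self, locked):
        self.locked = locked

    def update(self, estimated_bpm):
        # In tempo-locked mode, new estimations are discarded.
        if not self.locked:
            self.bpm = estimated_bpm
        return self.bpm


holder = TempoHolder(120.0)
holder.update(124.0)    # regular mode: the estimate is accepted
holder.set_mode(True)   # the mode F10 selects
holder.update(90.0)     # discarded while locked
print(holder.bpm)
holder.set_mode(False)  # the mode F9 selects
holder.update(126.0)    # accepted again
print(holder.bpm)
```

Locking is useful once the tempo estimate is correct and you want to prevent a breakdown or a noisy passage from disturbing it.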
