Beat Tracking
By default, BeatViewer uses the default audio input. You can specify an audio device using the -a <device-id> parameter. You can get a list of audio devices by using the -l flag:
$ python -m beatviewer -l
0 2 in, 0 out 0.09 ms - 0.18 ms 44.1 kHz MME Mappeur de sons Microsoft - Input
1< 2 in, 0 out 0.09 ms - 0.18 ms 44.1 kHz MME Mixage stéréo (Realtek(R) Audio
2 2 in, 0 out 0.09 ms - 0.18 ms 44.1 kHz MME Ligne (USB AUDIO CODEC)
10 2 in, 0 out 0.12 ms - 0.24 ms 44.1 kHz Windows DirectSound Pilote de capture audio principal
11 2 in, 0 out 0.12 ms - 0.24 ms 44.1 kHz Windows DirectSound Mixage stéréo (Realtek(R) Audio)
12 2 in, 0 out 0.12 ms - 0.24 ms 44.1 kHz Windows DirectSound Ligne (USB AUDIO CODEC)
20 2 in, 8 out 0.01 ms - 0.05 ms 44.1 kHz ASIO ASIO4ALL v2
27 2 in, 0 out 0.00 ms - 0.01 ms 44.1 kHz Windows WASAPI Ligne (USB AUDIO CODEC)
28 2 in, 0 out 0.00 ms - 0.01 ms 44.1 kHz Windows WASAPI Mixage stéréo (Realtek(R) Audio)
In the above example, device 1 is the default audio input. To use the built-in line input (2 in this example), use the following command:
python -m beatviewer -a 2
Newer versions of Windows make the sound settings dialog (which, for instance, lets you activate the Stereo Mix input) harder to reach. You can open it directly by creating a shortcut to mmsys.cpl (or by running it in a terminal).
Windows offers several audio APIs. A single audio source may appear several times in the device list, once for each API. Choosing between them lets you trade latency against compatibility:
API | Behavior |
---|---|
MME (Multimedia Extensions, also known as WinMM) | Oldest API, highest latency, best compatibility |
DirectSound | DirectX-related interface |
WASAPI (Windows Audio Session API) | Most recent API, lowest latency |
You can also execute the module offline, by passing the path to an audio file with the -f argument:
python -m beatviewer -f track.wav
For now, only WAVE files are supported. The algorithm is the same as for online tracking; an audio stream is simply emulated from the file. By default, tracking is not realtime: the tracker goes as fast as it can. You can use the -t flag to make it realtime, allowing for offline visualizations. Offline track analysis can then be performed by analyzing the generated output.
You may record the audio from the selected source by passing a path to an output WAV file to the -r argument:
python -m beatviewer -r ~/Desktop/recording.wav
You may specify an output file with the -o [PATH] argument. It will generate a TSV file listing detected events, with the following columns:
- Event type: either BEAT (detected beat), ONSET (detected onset) or BPM (change of BPM estimation)
- Event frame: OSS frame index at which the event occurred
- Event time: time (in seconds) when the event occurred; it is the event frame index divided by the OSS sampling rate (which is the audio sampling rate divided by the audio hop size, see the parameter table below)
- Event value: for BPM events, the associated new BPM value
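As an illustration of the offline analysis mentioned earlier, here is a minimal sketch that reads such a file and computes inter-beat intervals. It assumes the columns appear in the order listed above, that the file has no header row, and that the value column is empty for non-BPM events; none of this is confirmed by the project. The sample rows and the `read_events` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample of a file written with -o; the values are made up.
SAMPLE_TSV = (
    "ONSET\t100\t0.290\t\n"
    "BPM\t256\t0.743\t120.0\n"
    "BEAT\t344\t0.998\t\n"
    "BEAT\t516\t1.498\t\n"
)


def read_events(stream):
    """Parse rows of (type, frame, time, value) from a TSV stream.

    Assumes no header row and the column order described above.
    """
    events = []
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        kind, frame, time_s = row[0], int(row[1]), float(row[2])
        # The value column is only expected to be filled for BPM events.
        value = float(row[3]) if len(row) > 3 and row[3] else None
        events.append((kind, frame, time_s, value))
    return events


events = read_events(io.StringIO(SAMPLE_TSV))
beat_times = [t for kind, _, t, _ in events if kind == "BEAT"]
# Time elapsed between each pair of consecutive beats, in seconds.
intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
print(beat_times, intervals)
```

To run this on a real output file, replace the `io.StringIO(SAMPLE_TSV)` stream with `open(path, newline="")`.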
You may visualize processed signals by using the -g flag:
python -m beatviewer -g
This shows the Onset Strength Signal (OSS) with its mean and the detection threshold, the tempo, the Cumulative Beat Strength Signal (CBSS) with the detected period (Δt), and the Beat Prediction Signal (BPS) with the beat trigger index (εt). The left part of the graph is the past; the right part is the (predicted) future. Note that the past and future plots are each scaled to the window height independently.
You may change the graph framerate with the -gf [int] argument (default is 30 fps).
The beat tracking algorithm depends on many parameters. To use a specific configuration, use the -c <path-to-config-file> parameter. Take inspiration from the config.txt file.
Parameter | Default | Description |
---|---|---|
audio_window_size | 1024 | Audio window size for computing the FFT. |
audio_hop_size | 128 | Number of new samples in the window at each iteration. It sets the sampling rate of the onset strength signal (OSS). Given the audio sampling rate Fs and the hop size H, the OSS sampling rate is FsO = Fs / H. For Fs = 44100 and H = 128, FsO = 344.53 Hz. |
compression_gamma | 1 | The spectral flux is compressed to reduce the dynamic range of the signal and to adapt it to human hearing, which is logarithmically sensitive to amplitude. Set to 0 to disable compression. Greater values (e.g. 1000) attenuate strong values and give weaker values more impact. |
noise_cancellation_level | -74 | After compression, frequency bins with levels below this threshold are set to zero. The value is specified in dB. |
hamming_window_size | 15 | The width of the windowing function applied to the spectral flux to make it smoother. This acts as a low-pass filter. The greater the width, the lower the cutoff frequency. At 15, it is about 7 Hz. |
oss_buffer_size | 1024 | Number of OSS samples used to compute the OSS mean and the OSS variance. |
onset_threshold | 0.1 | If the OSS exceeds the mean by more than this number of standard deviations, an onset is detected. |
onset_threshold_min | 5.0 | If the variance is too small, this absolute threshold is used. |
oss_window_size | 2048 | Number of OSS samples used to estimate the tempo. |
oss_hop_size | 128 | Number of new samples in the window at each iteration. If FsO is the OSS sampling rate and H is the hop size, a new tempo is estimated at rate FsO / H. With FsO = 344.53 Hz, this yields about 2.7 Hz. |
frequency_domain_compression | 0.5 | The OSS is autocorrelated to find tempo lag candidates. This is computed by performing an FFT and an IFFT on the OSS. A power compression is applied in the frequency domain. Smaller values increase the lag resolution but make the result more sensitive to noise. |
min_bpm_detection | 50 | Minimum BPM detected. |
max_bpm_detection | 210 | Maximum BPM detected. |
tempo_candidates | 10 | Number of tempo candidates considered when estimating tempo. |
tempo_accumulator_decay | 0.9 | Detected tempi are added to an accumulated sum. This sum decays over time to allow for tempo variation detection. The greater the value (e.g. 0.99, 0.999), the more stable the estimator is, but the longer it takes for new tempi to be detected. |
tempo_accumulator_gaussian_width | 10 | The tempo accumulated sum is made of Gaussian curves centered on each detected tempo. The Gaussian width makes the accumulator tolerant of slight variations. |
min_bpm_rescaled | 90 | If the resulting BPM is lower than this, it gets doubled. |
max_bpm_rescaled | 180 | If the resulting BPM is greater than this, it gets halved. |
cbss_buffer_size | 512 | Number of CBSS samples used to determine the previous beat location. |
cbss_eta | 300 | The log-Gaussian width around previous beat locations. |
cbss_alpha | 0.9 | Trade-off between the OSS and a pure periodic signal. It takes values between 0 and 1. At 0, only the OSS is considered. At 1, only the periodic signal is considered. |
bps_epsilon_o | 0 | Offline latency correction factor, in number of OSS samples. See Section 6.1. of Musical Robot Swarms and Equilibria (Krzyżaniak, 2020) for details. |
bps_epsilon_r | 0 | Realtime latency correction factor, in number of OSS samples. See Section 6.2. of Musical Robot Swarms and Equilibria (Krzyżaniak, 2020) for details. |
bps_epsilon_t | 20 | Beat trigger index. Greater values mean detecting beats earlier. |
bps_gaussian_width | 10 | Width of the Gaussian representing the next beat locations. |
bps_buffer_size | 512 | Number of samples for which beat locations are predicted, in the future. As this is a cumulative process, a bigger buffer results in more stable behavior. |
bps_cooldown_ratio | 0.4 | Ratio of samples ignored right after a beat is detected, relative to the tempo lag (i.e. the number of samples between two beats). |
key_trigger_beats_earlier | page up | Increase the value of bps_epsilon_t. |
key_trigger_beats_later | page down | Decrease the value of bps_epsilon_t. |
key_set_mode_regular | f9 | Change tracking mode to default. |
key_set_mode_tempo_locked | f10 | Change tracking mode to tempo locked, where the current BPM is locked and the CBSS depends only on the pulse train generated from it. |
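Several parameters in the table are expressed in OSS samples, so it can help to convert them to seconds. The following sketch reproduces the rate arithmetic from the audio_hop_size and oss_hop_size rows using the default values, assuming a 44.1 kHz input. The helper `oss_samples_to_seconds` is a hypothetical name used for illustration and is not part of BeatViewer.

```python
# Defaults taken from the parameter table; fs assumes a 44.1 kHz input.
fs = 44100
audio_hop_size = 128
oss_hop_size = 128
oss_window_size = 2048
oss_buffer_size = 1024

# OSS sampling rate: one OSS sample per audio hop.
oss_rate = fs / audio_hop_size
# Tempo estimation rate: one estimate per OSS hop.
tempo_rate = oss_rate / oss_hop_size


def oss_samples_to_seconds(n):
    """Convert a duration given in OSS samples to seconds."""
    return n / oss_rate


print(f"OSS rate: {oss_rate:.2f} Hz")
print(f"Tempo update rate: {tempo_rate:.2f} Hz")
print(f"Tempo window: {oss_samples_to_seconds(oss_window_size):.2f} s")
print(f"OSS statistics buffer: {oss_samples_to_seconds(oss_buffer_size):.2f} s")
```

With these defaults, the tempo is estimated from roughly the last six seconds of audio, and the OSS mean and variance from roughly the last three.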
If the -k flag is set, then the PageUp and PageDown keys can be used to increase or decrease bps_epsilon_t while the tracker is running, for manually synchronizing the tracker live.
If the -k flag is set, then the F9 and F10 keys can be used to switch between two tracking modes:
Key | Mode | Behavior |
---|---|---|
F9 | Regular | Regular tracking mode |
F10 | Tempo locked | The current tempo value is kept and further estimations are discarded until mode is switched back to regular |
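Returning to the parameter table, the effect of min_bpm_rescaled and max_bpm_rescaled on a tempo estimate can be pictured with the short sketch below. It is an illustration of the rule as described in the table, not BeatViewer's actual code: the function name `rescale_bpm` is hypothetical, and it assumes a single doubling or halving is applied.

```python
def rescale_bpm(bpm, min_bpm_rescaled=90.0, max_bpm_rescaled=180.0):
    """Fold a tempo estimate toward the [min, max] range.

    Assumption: the correction is applied once, as the table's wording
    suggests; the real tracker may behave differently.
    """
    if bpm < min_bpm_rescaled:
        return bpm * 2  # too slow: assume half-time was detected
    if bpm > max_bpm_rescaled:
        return bpm / 2  # too fast: assume double-time was detected
    return bpm


for estimate in (70.0, 120.0, 200.0):
    print(estimate, "->", rescale_bpm(estimate))
```

Under this reading, a detection at 70 BPM is reported as 140 BPM, and one at 200 BPM as 100 BPM, while values already inside the range are left unchanged.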