A clustering approach to sound source tracking in Ambisonic audio. This module contains code for:
- Spherical harmonic beamforming using Plane Wave Decomposition and Cross-pattern coherence beams.
- Rotation of non axis-symmetric spherical functions using Wigner-D matrices.
- Fibonacci, regular and geodesic spherical sampling schemes.
- Clustering (DBSCAN) and regression (SVR) for estimating coherent sound sources from power maps.
- `find_sources` wrapper function for automating the estimation of source trajectories from an Ambisonic audio file.
- Functions to plot outputs.
- Implementations of Frame Recall and DOA Error performance metrics from DCASE 2019.
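
As an illustration of the Fibonacci sampling scheme listed above, here is a minimal, standard-library-only sketch of a Fibonacci lattice on the unit sphere. The function name `fibonacci_sphere` and the half-step offset in `z` are assumptions made for this example. They are not taken from this module, whose own implementation may differ.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (x, y, z).

    Hypothetical helper for illustration only; not this module's API.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    points = []
    for i in range(n):
        # Heights are spaced uniformly in z, offset by half a step so
        # that no sample falls exactly on either pole.
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)  # radius of the circle at height z
        phi = i * golden_angle      # rotate by the golden angle each step
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


grid = fibonacci_sphere(500)
```

Uniform spacing in `z` gives each sample an equal share of the sphere's area, which is why this kind of grid is a common choice for evaluating a power map over all directions.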
`find_sources(input, *args, **kwargs)`

`input` should be a path to an Ambisonic audio file.
`**kwargs` passed to `sph_peaks_t`:
- `max_n_peaks=20` - the maximum number of peaks that will be saved per frame.
- `audio_length_seconds=None` - optional; if given, output frame numbers are replaced with time in seconds.
`*args` passed to `obj_trajectories`:
- `eps=0.1` - DBSCAN Eps parameter.
- `min_samples=10` - DBSCAN MinPts parameter.
- `relative_peak_threshold=0.5` - dipy rel_pk parameter.
- `min_separation_angle=5` - dipy min_sep parameter.
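
To make the roles of `eps` and `min_samples` concrete, the sketch below is a minimal pure-Python re-implementation of DBSCAN, written only for illustration. It is not this module's code, which presumably relies on a library implementation. The sketch assumes that peaks are unit direction vectors and that `eps` is a great-circle angle in radians. The actual feature space and units used by `obj_trajectories` are not stated here. The names `central_angle` and `dbscan` are hypothetical.

```python
import math


def central_angle(a, b):
    """Great-circle angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


def dbscan(points, eps=0.1, min_samples=10, dist=central_angle):
    """Label each point with a cluster id, or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # A point counts as its own neighbour.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_samples:  # core point: keep expanding
                queue.extend(nb)
    return labels


def direction(azimuth):
    """Unit vector on the horizontal plane at the given azimuth."""
    return (math.cos(azimuth), math.sin(azimuth), 0.0)


# Two tight groups of 12 peaks each, plus one isolated peak at the pole.
peaks = [direction(0.005 * k) for k in range(12)]
peaks += [direction(math.pi / 2 + 0.005 * k) for k in range(12)]
peaks.append((0.0, 0.0, 1.0))

labels = dbscan(peaks, eps=0.1, min_samples=10)
```

Under these assumptions, a larger `eps` merges nearby sources into a single cluster, while a larger `min_samples` discards short-lived or sparse peaks as noise.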