This repository hosts all scripts developed for this project. The project comprises three chapters, and each directory contains the analysis carried out in the corresponding chapter.
The first chapter assesses the suitability of self-supervised models (SSMs) for speaker identification. To this end, we address the following research questions:
- Are self-supervised models good candidates to study speaker identity coding?
- What aspects of speech do self-supervised models encode?
- What are the models’ invariances and equivariances when recognizing a speaker?
In the second chapter, we study the models’ encoding spaces as analogous to the perceptual space of humans. We address the following research questions:
- Is there a correlation between linear distances computed in the embedding space and the perceptual space of humans?
- Do learnable decision models explain human behavior better than linear distance metrics?
- What are the commonalities and differences between the representational spaces of the models and humans?
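
As a rough illustration of the first question above, the sketch below computes pairwise distances between stimulus embeddings and rank-correlates them with human dissimilarity judgments. It is not taken from the project scripts: the function names, the choice of cosine distance and Spearman correlation, and the toy numbers are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Distances for every unordered pair, in itertools.combinations order."""
    n = len(embeddings)
    return [cosine_distance(embeddings[i], embeddings[j])
            for i, j in combinations(range(n), 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            result[k] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up toy data (NOT project results): four stimuli embedded in 2-D,
# and hypothetical human dissimilarity ratings for the six stimulus pairs,
# listed in the same pair order as pairwise_distances().
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_human_dissimilarity = [1, 7, 6, 6, 5, 1]

model_distances = pairwise_distances(toy_embeddings)
rho = spearman(model_distances, toy_human_dissimilarity)
print(f"Spearman correlation (toy data): {rho:.3f}")
```

A rank correlation is used here because it only assumes a monotonic relationship between model distances and human judgments; other distance metrics or correlation measures could be substituted in the same structure.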
Taking a step further, the third chapter investigates the correspondence between the models’ encoding spaces and the human neural representational space. The main question we ask in this chapter is:
- Where is the information content of speech models best represented in the brain?
Further details about the scripts are provided in the directory of each chapter.