% interactapasample.tex
% v1.05 - August 2017
\documentclass[]{interact}
\usepackage{epstopdf}% To incorporate .eps illustrations using PDFLaTeX, etc.
\usepackage{listings}
\usepackage{caption}
\usepackage{subcaption}
% \usepackage[caption=false]{subfig}% Support for small, `sub' figures and tables
%\usepackage[nolists,tablesfirst]{endfloat}% To `separate' figures and tables from text if required
%\usepackage[doublespacing]{setspace}% To produce a `double spaced' document if required
%\setlength\parindent{24pt}% To increase paragraph indentation when line spacing is doubled
\usepackage{multirow}
\usepackage{tabularx}
\usepackage{hyperref}
% \usepackage[longnamesfirst,sort]{natbib}% Citation support using natbib.sty
% \bibpunct[, ]{(}{)}{;}{a}{,}{,}% Citation support using natbib.sty
% \renewcommand\bibfont{\fontsize{10}{12}\selectfont}% To set the list of references in 10 point font using natbib.sty
% \usepackage[natbibapa,nodoi]{apacite}% Citation support using apacite.sty. Commands using natbib.sty MUST be deactivated first!
% \setlength\bibhang{12pt}% To set the indentation in the list of references using apacite.sty. Commands using natbib.sty MUST be deactivated first!
% \renewcommand\bibliographytypesize{\fontsize{10}{12}\selectfont}% To set the list of references in 10 point font using apacite.sty. Commands using natbib.sty MUST be deactivated first!
\usepackage{apacite}
\theoremstyle{plain}% Theorem-like structures provided by amsthm.sty
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\theoremstyle{remark}
\newtheorem{remark}{Remark}
\newtheorem{notation}{Notation}
\begin{document}
\articletype{ARTICLE TEMPLATE}% Specify the article type or omit as appropriate
\title{Sketching Sounds Journal Paper Draft}
\author{
\name{A.~N. Author\textsuperscript{a}\thanks{CONTACT A.~N. Author. Email: [email protected]} and John Smith\textsuperscript{b}}
\affil{\textsuperscript{a}Taylor \& Francis, 4 Park Square, Milton Park, Abingdon, UK; \textsuperscript{b}Institut f\"{u}r Informatik, Albert-Ludwigs-Universit\"{a}t, Freiburg, Germany}
}
\maketitle
\begin{abstract}
\end{abstract}
\begin{keywords}
Sections; lists; figures; tables; mathematics; fonts; references; appendices
\end{keywords}
\section{Introduction}\label{sec:introduction}
Humans make sense of sound in various ways, often borrowing from associations with other sensory domains. While arguably best known for connections between sound and colour, these cross-modal associations can take many forms. This research focuses on connections between sound and shapes in graphical sketches. Historically, sound-shape associations have been investigated through matching studies in which participants are presented with a sound stimulus and multiple visual stimuli. More recent research lets participants respond freely through actions such as gesture, movement or drawing. Results suggest that, despite individual variance, people share similar associations to some extent. While existing research mainly examines associations between shapes and phonetic sounds, pitch or music, this research focuses on the character of a sound, its timbre. This is investigated through two studies that ask participants to sketch their associations with sounds using a digital interface. The first study (n$=$28) uses an exploratory design that imposes minimal restrictions to gain an overview of the various representational approaches participants deploy for different types of sound. The second study (n$=$88) focuses on simple, abstract representations of sounds produced by a frequency modulation (FM) synthesizer. Alongside the sound-sketches, qualitative feedback was collected from participants through interviews and surveys to find out how they approached the tasks. The analysis categorised the various representational approaches and tested for correlations between sounds and shapes through statistical analyses of quantitative audio and visual features. The results lay the groundwork for the wider context of this research, which aims to find out whether a graphical sketch input can help improve interaction with digital music production software, specifically for the control of digital synthesizers.
Contemporary music production relies heavily on digital technologies, and musical timbre occupies a prominent role in many modern music styles that distinguish themselves through their `specific' sound rather than harmony, melody or musical structure. Finding or crafting a desired sound thus becomes an integral part of the production process, which often involves wading through large sample libraries or tweaking software parameters. These parameters often relate to the underlying digital signal processing (DSP) rather than to the human perception of sound, which can make it difficult to realise sound ideas or explore sonic spaces in an intuitive way. With the development of a sketch-controlled synthesizer in mind, the sound-sketches collected in these studies aid the future implementation of a mapping architecture between a sketch input and an FM synthesizer. The process of designing the second study included the development of the digital sketching interface from a simple, generic sketchpad into a specialised sketching interface that could be used for controlling a sketch-based synthesizer.
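As an illustration of why such parameters can be unintuitive, consider the simplest two-operator FM voice (a textbook sketch; the synthesizer used in the second study may use a more elaborate configuration):
\[
y(t) = A \sin\bigl(2\pi f_c t + I \sin(2\pi f_m t)\bigr),
\]
where $A$ is the amplitude, $f_c$ the carrier frequency, $f_m$ the modulator frequency and $I$ the modulation index. Small changes to the ratio $f_c/f_m$ or to $I$ can reshape the spectrum drastically and non-linearly, so there is no simple correspondence between these DSP parameters and perceived timbral qualities such as brightness or roughness.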
This paper first introduces relevant research into cross-modal associations, with a focus on sound-shape associations, musical timbre and sketch recognition in Section~\ref{sec:background}. The methodology is described in Section~\ref{sec:methodology}, followed by the setup of both studies in Section~\ref{sec:study_setup}. The results are presented in multiple sections: an overview of extracted audio and visual features in Section~\ref{sec:extracted_features}, analysis of participant feedback in Section~\ref{sec:interview_analysis}, sound-sketch categorisation in Section~\ref{sec:sketch_categorisation} and statistical feature analysis in Section~\ref{sec:feature_analysis}. Discussion and conclusion can be found in Sections~\ref{sec:discussion} and~\ref{sec:conclusion}.
\section{Background}\label{sec:background}
This section first gives an overview of cross-modal research with a focus on the sound-shape associations that form the basis for this research. It then introduces research into timbre and sketch recognition that is relevant to the analysis of the user studies.
\subsection{Cross-modal associations}\label{subsec:sound-shape}
Cross-modal associations describe how a stimulus in one modality can induce a response in another modality. While most often reported between sound and colour, they can occur across various modalities, for example between colours and odours~\cite{}, sounds and tastes~\cite{} or sounds and shapes~\cite{}. Cross-modal associations are sometimes wrongly referred to as synaesthesia. Synaesthesia is a rare condition occurring in less than 5{\%} of the population~\cite{}, and synaesthetes experience cross-modal connections involuntarily and consistently; the same stimulus always induces the same response. Cross-modal associations, on the other hand, are experienced in some form by most people, but the connections tend to be far less consistent and might only occur situationally. Cross-modal associations can be found in everyday descriptions of sound, which reference concepts like brightness, warmth, sharpness or colour, among others. This requires some level of shared understanding, which is supported by research that found consistent cross-modal associations between people despite the influence of personal factors~\cite{}. One of the earliest examples of cross-modal research is provided by Wolfgang K{\"o}hler, a member of the \textit{Gestalt psychology} movement in the 1920s, who found that people associate the made-up words \textit{takete} or \textit{kiki} with sharp, jagged shapes and \textit{maluma} or \textit{bouba} with soft, round shapes~\cite{kohler1929gestalt}.
\begin{figure}[h]
\centering
\includegraphics[width=0.5\columnwidth]{Images/maluma-takete.jpg}
\caption{Visual stimuli used in Wolfgang K{\"o}hler's experiments. The left shape is overwhelmingly associated with the made-up word `maluma' and the right one with `takete'.}
\end{figure}
The effect was confirmed in multiple studies~\cite{ramachandranSynaesthesiaWindowPerception} and generalised to all phonemes~\cite{nielsenParsingRoleConsonants2013}. It was observed across cultures~\cite{davisFitnessNamesDrawings1961,taylorPhoneticSymbolismFour1962,bremnerBoubaKikiNamibia2013}, age groups including toddlers~\cite{maurerShapeBoubasSound2006}, to some extent, with the visually impaired~\cite{bottiniSoundSymbolismSighted2019} and between movement and phonemes~\cite{shinoharaTaketeMalumaAction2016}. Similar associations were not only found for phonetic sounds but also for musical instruments~\cite{adeliAudiovisualCorrespondenceMusical2014} and abstract sonic textures~\cite{grillVisualizationPerceptualQualities2012}. In these studies, associations are found by explicitly asking participants to match stimuli from different modalities for example words to shapes.
Another approach to studying cross-modal associations is to measure how a stimulus in one domain can influence the perception of a stimulus in a different domain. A famous example is the McGurk effect~\cite{}, which shows how a video of a speaker voicing different vowels can influence which vowels participants hear in an audio recording. Other examples include the impact of a mug's colour on the taste of coffee~\cite{van2014does} and the effect of musical groove on sexual attractiveness~\cite{}. Actions can also be influenced by sound; Thoret et al. found that participants drew circles with a more elliptical skew when listening to sounds that evoked elliptical kinematics~\cite{thoretSeeingCirclesDrawing2016}. Salgado-Montejo et al. showed the influence of pitch on the location of free hand movements. They also found that movements were more jagged for higher pitches and rounder for lower pitches~\cite{spence_paper}. Both examples show that consistencies can be found between people not just in matching or rating tasks but also when they are given free agency over their response. However, the participants' actions were not visualised, raising the question of whether seeing one's action drawn out as a sketch influences the response.
In visual art, connections between the auditory and visual domains are a recurring theme. Notably, the Russian artist Wassily Kandinsky developed multiple hypotheses on associations between shapes, colours and music, leading to a number of cross-modal artworks inspired by pieces by the composer Arnold Sch{\"o}nberg~\cite{}. Consistent associations were found between complex stimuli of visual artworks and musical compositions~\cite{painting&spanish_music_paper}, and between piano music excerpts and simple visual structures~\cite{clemente2020set}. K{\"u}ssner investigated how participants represent pure tones varying in pitch, loudness and tempo, producing findings about space and pitch similar to those of Salgado-Montejo et al.~\cite{}.
While timbre is acknowledged to play a role in cross-modal associations, the aforementioned research focuses on the musical context of the sound stimuli. Grill et al. showed that visualisations of cross-modal associations can help participants retrieve sound samples outside of a specific musical context~\cite{grillVisualizationPerceptualQualities2012}. Knees and Andersen developed this idea further by proposing sound retrieval from a graphical sketch input and built a non-functioning prototype of such a system~\cite{kneesSearchingAudioSketching2016}. Compared to K{\"u}ssner's research mentioned above, timbre plays a more dominant role here, and participants tend to visualise it through shapes, contours or textures, which the researchers call \textit{symbolic}. Recent research further investigated how participants represent different timbres in a more controlled way~\cite{engelnCoHEARenceAudibleShapes2020} and showed how a functioning sketch-based sound retrieval pipeline could be implemented with the help of deep learning~\cite{engeln2021similarity}. The results suggest predominantly symbolic representations, but also images that could be classified as more complex artwork, as discussed above. Work by the authors produced similar results~\cite{icmc2021} and showed that participants can deduce information about sound from sketches~\cite{icmc2022}. The research presented in this paper takes a closer look at how to classify different representational approaches and collects a comprehensive sound-sketch dataset.
% - However, an important distinction has to be made to sound-shape associations that relates to the characteristics of sound (musical timbre) and not musical structure like melody and harmony
% - Sound-shape associations can inform visualisations that communicate sound characteristics without audio playback for example to improve the organisation of one's personal music library~\cite{kolhoffMusicIconsProcedural2006}, sample selection for live DJ performances~\cite{chen2010thumbnaildj}, exploration of different natural sounds~\cite{wan2019towards,ishibashi2020investigating} and retrieval of abstract sounds
\end{document}