-
Notifications
You must be signed in to change notification settings - Fork 0
/
doc-bass.txt
190 lines (152 loc) · 8.81 KB
/
doc-bass.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
**********************************************************************
*********** BASS-GENOM3: THE BINAURAL AUDIO STREAM SERVER ************
**********************************************************************
Copyright (c) 2014, LAAS/CNRS
All rights reserved.
CONTENTS
1. OVERVIEW
2. BASS BASICS
3. SERVICES
4. OUTPUT PORT
1. OVERVIEW ----------------------------------------------------------
The binaural audio stream server, named 'bass', is a GenoM3 module in
charge of acquiring binaural audio data from a hardware sound
interface, and making it available to other components, branded as
clients of bass.
The module offers services to start and stop the acquisition of audio
data, and streams the captured data on an output port. A sliding
window of the most recent data is kept on the port, the size of which
can be set at runtime (for instance, the port keeps the last 2 seconds
of acquired signals).
bass relies on ALSA (Advanced Linux Sound Architecture) to communicate
with the hardware interface.
2. BASS BASICS -------------------------------------------------------
This section defines the notions and the vocabulary that bass uses.
Some obvious notions:
* Interface and device: They are synonyms for the hardware board in
charge of converting analog sound signals into digital streams.
* Acquisition and capture: They are synonyms for retrieving audio data
from microphones through an audio interface.
* Binaural audio and channels: Binaural audio consists of two channels
(like stereo audio), corresponding to left and right audio streams.
Less obvious notions:
* Samples and frames: A sample is a digital value encoding the signal
on one channel at one point in time. A frame is a collection of
samples, one from each channel, at one point in time (thus for
binaural audio, a frame is two samples).
* Chunks: In capturing state, the sound device regularly delivers
blocks of new data to the module. These blocks are called chunks.
The size of these chunks (commonly given in amount of frames) can be
chosen when starting an acquisition, and is then fixed for this
acquisition.
Note that these definitions can differ from other applications where
the word 'frame' may refer to data blocks of a few milliseconds. Here,
these blocks are rather called chunks, a frame being a single point in
time.
3. SERVICES ----------------------------------------------------------
Services offered by the bass module are defined in the description
file 'bass-genom3/bass.gen', along with their documentation. This
section recaps them and adds a few specifications.
* The 'ListDevices' service can be called to print the available sound
devices for acquisition on standard output.
* The 'Acquire' service starts the acquisition of audio data and
updates the output port with captured data (see details about the
port in next section). This service expects 4 input parameters:
(data_type name = default_value : "documentation")
string device = "hw:1,0" : "ALSA name of the sound device",
unsigned long sampleRate = 44100 : "Sample rate in Hz",
unsigned long nFramesPerChunk = 2205 : "Chunk size in frames",
unsigned long nChunksOnPort = 20 : "Port size in chunks"
The name of a sound device (1st parameter) can be found with the
ListDevices service mentioned above. The chunk size in amount of
frames (3rd parameter) is an important parameter, as smaller chunks
will lead to shorter latency but also higher computational
demand. Last, the 4th parameter sets the amount of chunks kept on
the port. With the default values given above, 20 chunks of 2205
frames is a total of 44100 frames kept on the port, i.e. 1 second of
audio data at the default sample rate.
The Acquire service can return an exception if the configuration of
the interface fails (e.g. the requested sample rate is not
supported), if a problem occurs during the acquisition (e.g. the
interface gets unplugged), etc. If an exception occur, the user can
get more information by reading the message printed on standard
error stream.
* The 'Stop' service stops the acquisition of audio data. (Note that
the Acquire service also interrupts itself, so a new acquisition
with different parameters can directly be started from a running one
without having to call Stop).
4. OUTPUT PORT -------------------------------------------------------
Data captured by the Acquire service are streamed on an output port
named 'Audio' (defined in 'bass-genom3/bassInterface.gen'). They are
gathered in two arrays, one for each channel, updated as FIFO stacks:
every time a new chunk is retrieved from the hardware interface, the
content of the arrays is shifted, deleting the oldest chunk of data
and making room to the newest one, as detailed below.
At the beginning of the acquisition, the arrays are filled with zeros
and the first captured chunks are progressively added, following the
FIFO design. For instance, the state of one array before and after
adding the 4th chunk is illustrated here (assuming that the port is
longer than 4 chunks):
+-------+-------+-------+-------+-------+-------+-------+-------+
| zeros | 1 | 2 | 3 |
| | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
/ / /
/ / /
/ / /
+-------+-------+-------+-------+-------+-------+-------+-------+
| zeros | 1 | 2 | 3 | 4 |
| | | | | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
The length of the port is a round number of chunks, set with parameter
nChunksOnPort of the Acquire service (noted COP below). The size of
one chunk is also set when calling 'Acquire', with parameter
nFramesPerChunk (noted FPC below). Thus, the left and right arrays
contain FPC*COP samples each. Once the port is entirely filled with
data (all beginning zeros have been erased), the oldest chunk is
deleted as a new chunk arrives:
+-------+-------+-------+-------+-------+-------+-------+-------+
| 1 | 2 | 3 | ... | COP-1 | COP |
| (old) | | | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
/ / /
/ / /
/ / /
+-------+-------+-------+-------+-------+-------+-------+-------+
| 2 | 3 | 4 | ... | COP | COP+1 |
| (old) | | | | | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
/ / /
/ / /
/ / /
+-------+-------+-------+-------+-------+-------+-------+-------+
| 3 | 4 | 5 | ... | COP+1 | COP+2 |
| | | | | | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
In order to let the clients keep track of the data and detect any
loss, the port also publishes an index that indicates the number of
frames that have been streamed since the beginning of the
acquisition. In other words, it is the index of the last streamed
frame, noted lastFrameIndex. To summarize, the data structure of the
Audio output port (defined in 'bass-genom3/bassStruct.idl') is the
following:
struct portStruct {
unsigned long sampleRate; //sample rate in Hertz
unsigned long nChunksOnPort; //COP: amount of Chunks On the Port
unsigned long nFramesPerChunk; //FPC: amount of Frames Per Chunk
unsigned long long lastFrameIndex; //the index defined above
sequence<long> left; //audio data from left channel
sequence<long> right; //audio data from right channel
};
* The sampleRate, nChunksOnPort and nFramesPerChunk members are set as
input parameters of the Acquire service.
* The left and right members are dynamic arrays ('sequence<long>'
type) of FPC*COP samples. Samples are signed integers coded on 32
bits ('long' type).
* The index for tracking data is stored in the lastFrameIndex
member. As this index is incremented by FPC frames everytime a new
chunk is published on the port, it is important to check that it
will not overflow. The index is therefore coded as an unsigned
integer on 64 bits ('unsigned long long' type. With a sample rate of
192kHz for instance, the order of magnitude of the index overflow is
a million years).