Dataset directory structure #215
Comments
I like the proposed new structure; I think it solves most of our issues with the folder names. I just have a few questions:
For me, all adjustment spectrograms can be put into the same folder, and they can also be deleted once the whole spectrogram generation is launched.
Easier question first:
The names are placeholders atm, but the idea is to make deserializable files: the It may not be a good idea for the
This simple, quick question led to a complicated, long discussion, which in turn led to another draft structure, involving even more drastic changes to OSEkit 👺
Basically, here are the changes we discussed:
- Time period is moved above audio parameters in the structure: this might help keep track of which time regions of the dataset have already been analyzed.
- Store LTAS in the time period root: this would store LTAS as an analysis, but specifying LTAS instead of the audio duration (which would implicitly be something like
For example, I want to generate an LTAS over the whole example dataset time period, with a sr of 258 Hz, a time resolution of 30 minutes (that is, 258 * 1800 = 464400-samples-wide temporal windows), and nfft=256. Moreover, I want to generate an LTAS with the same parameters, only on the period covered by Analyses A1 & A2. This would lead to the following structure:
Replace
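The window arithmetic in the LTAS example above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical (not part of OSEkit), and it assumes that each LTAS time bin is built from non-overlapping nfft-sample frames, which the discussion does not specify.

```python
def ltas_window_samples(sample_rate_hz: int, time_resolution_s: int) -> int:
    """Number of audio samples covered by one LTAS time bin."""
    return sample_rate_hz * time_resolution_s


def frames_per_window(window_samples: int, nfft: int) -> int:
    """Number of whole, non-overlapping nfft-sample frames in one window.

    Assumption: no overlap between frames; leftover samples are ignored.
    """
    return window_samples // nfft


# Values taken from the example: sr = 258 Hz, 30-minute resolution, nfft = 256.
window = ltas_window_samples(sample_rate_hz=258, time_resolution_s=30 * 60)
frames = frames_per_window(window, nfft=256)
print(f"{window} samples per window, {frames} full frames of 256 samples")
```

With overlapping frames the frame count would be higher, so the second function should be read as a lower bound rather than as what OSEkit computes.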
As discussed with @mathieudpnt and @PaulCarvaillo, we might postpone the features that risk breaking backward compatibility until later, in a brighter future when OSEkit is reformatted and easier to maintain! ☀️
The current dataset directory structure suffers from some flaws. For example, running an analysis that differs from a previous one only in its time period requires overwriting the previous analysis.
In this issue, I try to expose these flaws by using an example dataset on which I run 4 analyses that differ in their audio parameters (time duration, sample rate) and/or in their fft parameters (in which case no reshaping of the audio files is needed).
I'll first describe the analyses and the original dataset, then show the code snippets matching each analysis, and then the directory structure that results from these analyses.
Finally, I've added 2 draft directory structures:
What do you, as OSEkit users, think of these draft structures?
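To make the overwriting flaw concrete, here is a minimal sketch. Both naming functions and all parameter values are made up for illustration; they are not OSEkit's actual naming code. The only point is that a folder name built from audio parameters alone cannot tell apart two analyses that differ only in time period.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%dT%H%M%S"


def folder_without_period(duration_s: int, sample_rate_hz: int) -> str:
    """Hypothetical naming based on audio parameters only."""
    return f"{duration_s}_{sample_rate_hz}"


def folder_with_period(
    duration_s: int, sample_rate_hz: int, t_start: datetime, t_stop: datetime
) -> str:
    """Hypothetical naming that also encodes the analyzed time period."""
    period = f"{t_start.strftime(TIME_FORMAT)}_{t_stop.strftime(TIME_FORMAT)}"
    return f"{duration_s}_{sample_rate_hz}/{period}"


# Two made-up analyses with identical audio parameters but different periods.
first = dict(duration_s=10, sample_rate_hz=48000,
             t_start=datetime(2023, 1, 1, 0), t_stop=datetime(2023, 1, 1, 12))
second = dict(duration_s=10, sample_rate_hz=48000,
              t_start=datetime(2023, 1, 2, 0), t_stop=datetime(2023, 1, 2, 12))

collides = (
    folder_without_period(first["duration_s"], first["sample_rate_hz"])
    == folder_without_period(second["duration_s"], second["sample_rate_hz"])
)
distinct = folder_with_period(**first) != folder_with_period(**second)
print(f"same folder without period: {collides}, distinct with period: {distinct}")
```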
Example
An original dataset, from which 4 analyses are run:
Different start/end times than original
Original Dataset :
Analyses :
Analysis A1
Analysis A2
Analysis B
Analysis C
Current directory structure
Problems:
- `t_start` and `t_stop`
- `processed`: could be replaced by `output`.
- `spectrogram` and `matrix` could fall into a `spectrum` upper level
Draft modifications of existing structure
Remarks
There still are some flaws in this structure:
- The `metadata.csv` name is used several times for different uses
- `file_metadata.csv` and `timestamp.csv` contain redundant information, keep only `file_metadata.csv`?
- Replace `xxx_metadata.csv` files with `xxx.json` files that could be used for serializing python classes? (e.g., an `analysis_dataset.json` file in each analysis folder that can be parsed to a `Dataset` object in OSEkit).
Draft new structure
- `dataset\audiolength_samplerate\tstart_tend\`: corresponds to one call to the reshaper module.
- `data` and `output` folders.
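As a rough sketch of how the draft layout and the JSON idea could fit together: the `AnalysisSpec` class below is a hypothetical stand-in, not OSEkit's `Dataset`, and the field names, timestamp format, and parameter values are assumptions. It builds an `audiolength_samplerate/tstart_tend` folder with `data` and `output` subfolders, then writes and re-reads an `analysis_dataset.json` file.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class AnalysisSpec:
    """Hypothetical description of one analysis (one reshaper call)."""

    audio_length_s: int
    sample_rate_hz: int
    t_start: str  # compact timestamp, safe to use in folder names
    t_end: str

    def folder(self, dataset_root: Path) -> Path:
        """Return dataset/audiolength_samplerate/tstart_tend."""
        audio = f"{self.audio_length_s}_{self.sample_rate_hz}"
        period = f"{self.t_start}_{self.t_end}"
        return dataset_root / audio / period

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "AnalysisSpec":
        return cls(**json.loads(text))


with tempfile.TemporaryDirectory() as tmp:
    spec = AnalysisSpec(10, 48000, "20230101T000000", "20230101T120000")
    folder = spec.folder(Path(tmp) / "dataset")
    relative = folder.relative_to(tmp)

    # Each analysis folder holds its own data and output subfolders.
    for sub in ("data", "output"):
        (folder / sub).mkdir(parents=True)

    # Serialize the analysis next to its data, then parse it back.
    json_path = folder / "analysis_dataset.json"
    json_path.write_text(spec.to_json())
    restored = AnalysisSpec.from_json(json_path.read_text())

print(relative.as_posix(), restored == spec)
```

Storing the parameters in a JSON file inside each analysis folder means the folder names only need to be unique, not parseable, since the full description can be recovered from the file.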