Merge pull request #9 from Music-and-Culture-Technology-Lab/vocal-mix

Add note-level vocal transcription, with integration of vocal-contour module.
Music-and-Culture-Technology-Lab · Dec 12, 2020 · 34916bb
2 parents b931b4a + 32e1818

Showing 45 changed files with 2,288 additions and 245 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,26 @@
 # Changelog
 
+## 0.2.0 - 2020-
+
+### Vocal transcription is available now!
+After a long period of development and experimentation, we have finally completed the vocal
+transcription module and integrated it into omnizart.
+
+### Features
+- Release `vocal` and `vocal-contour` submodules.
+
+### Enhancement
+- Improve chord transcription results by filtering out chord predictions with short duration.
+- Unify how the output path of transcription results is resolved.
+
+### Documentation
+- Re-organize the quick start and tutorial pages to give a cleaner, more fluent reading experience.
+- Move the development section originally in README.md to CONTRIBUTING.md.
+
+### Bug Fix
+- Fix a bug where the wrong parameter was passed to vamp during chroma feature extraction.
+
+---
 
 ## 0.1.1 - 2020-12-01
 ### Features

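The changelog entry about filtering out short chord predictions does not say how the filtering works. The snippet below is only a minimal sketch of one plausible approach, not omnizart's actual code: the function name `filter_short_chords`, the `(start, end, label)` tuple layout, and the `min_dur` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start_sec, end_sec, chord_label).
Segment = Tuple[float, float, str]


def filter_short_chords(segments: List[Segment], min_dur: float = 0.3) -> List[Segment]:
    """Drop chord segments shorter than ``min_dur`` seconds.

    Neighbouring segments that end up adjacent with the same label are merged,
    so a brief spurious prediction does not split one long chord in two.
    """
    result: List[Segment] = []
    for start, end, label in segments:
        if end - start < min_dur:
            continue  # prediction is too short, skip it
        if result and result[-1][2] == label:
            # Same chord as the previously kept segment: extend it.
            prev_start = result[-1][0]
            result[-1] = (prev_start, end, label)
        else:
            result.append((start, end, label))
    return result


predictions = [
    (0.0, 2.0, "C:maj"),
    (2.0, 2.05, "G:maj"),  # 50 ms blip, likely noise
    (2.05, 4.0, "C:maj"),
    (4.0, 6.0, "F:maj"),
]
filtered = filter_short_chords(predictions)
```

In this example the 50 ms `G:maj` segment is removed and the two `C:maj` segments around it are joined into a single segment.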
diff --git a/README.md b/README.md
@@ -43,8 +43,8 @@ Comprehensive usage and API references can be found in the [official documentati
 |------------------|--------------------|--------------------|----------|-----------------------------------|
 | music            | :heavy_check_mark: | :heavy_check_mark: |          | Transcribes notes of instruments. |
 | drum             | :heavy_check_mark: | :interrobang:      |          | Transcribes drum tracks.          |
-| vocal            |                    |                    |          | Transcribes pitch of vocal.       |
-| vocal-contour    |                    |                    |          | Transcribes contour of vocal.     |
+| vocal            | :heavy_check_mark: | :heavy_check_mark: |          | Transcribes pitch of vocal.       |
+| vocal-contour    | :heavy_check_mark: | :heavy_check_mark: |          | Transcribes contour of vocal.     |
 | chord            | :heavy_check_mark: | :heavy_check_mark: |          | Transcribes chord progression.    |
 | beat             |                    |                    |          | Transcribes beat position.        |
 

diff --git a/docs/source/base.rst b/docs/source/base.rst
@@ -2,5 +2,22 @@ Base Classes
 ============
 
 .. automodule:: omnizart.base
+
+
+Transcription
+-------------
+.. autoclass:: omnizart.base.BaseTranscription
+    :members:
+
+
+Label
+-----
+.. autoclass:: omnizart.base.Label
     :members:
-    :undoc-members:
+
+
+Dataset Loader
+--------------
+.. autoclass:: omnizart.base.BaseDatasetLoader
+    :members:
+
diff --git a/docs/source/chord/api.rst b/docs/source/chord/api.rst
@@ -7,7 +7,7 @@ Chord Transcription
 
 App
 ###
-.. automodule:: omnizart.chord.app
+.. autoclass:: omnizart.chord.app.ChordTranscription
     :members:
     :show-inheritance:
 
@@ -19,6 +19,13 @@ Feature
     :undoc-members:
 
 
+Dataset
+#######
+.. autoclass:: omnizart.chord.app.McGillDatasetLoader
+    :members:
+    :show-inheritance:
+
+
 Inference
 #########
 .. automodule:: omnizart.chord.inference

diff --git a/docs/source/drum/api.rst b/docs/source/drum/api.rst
@@ -7,7 +7,14 @@ Drum Transcription
 
 App
 ###
-.. automodule:: omnizart.drum.app
+.. autoclass:: omnizart.drum.app.DrumTranscription
+    :members:
+    :show-inheritance:
+
+
+Dataset
+#######
+.. autoclass:: omnizart.drum.app.PopDatasetLoader
     :members:
     :show-inheritance:
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -57,6 +57,31 @@ The result of drum transcription
    </audio>
 
 
+The result of vocal transcription.
+
+.. raw:: html
+
+   <audio controls="controls">
+      <source src="_audio/high_vocal_synth.mp3" type="audio/mpeg">
+      Your browser does not support the <code>audio</code> element.
+   </audio>
+
+
+The result of vocal pitch contour transcription.
+
+.. raw:: html
+
+   <audio controls="controls">
+      <source src="_audio/high_vocal_contour.mp3" type="audio/mpeg">
+      Your browser does not support the <code>audio</code> element.
+   </audio>
+
+
+Source files can be downloaded `here <https://drive.google.com/file/d/15VqHearznV9L83cyl61ccACsXXJ4vBHo/view?usp=sharing>`_.
+You can use *Audacity* to open them.
+
+All works are developed under `MCTLab <https://sites.google.com/view/mctl/home>`_.
+
 
 .. toctree::
    :maxdepth: 2
@@ -67,6 +92,7 @@ The result of drum transcription
    music/cli.rst
    drum/cli.rst
    chord/cli.rst
+   vocal/cli.rst
    vocal-contour/cli.rst
 
 
@@ -77,6 +103,7 @@ The result of drum transcription
    music/api.rst
    drum/api.rst
    chord/api.rst
+   vocal/api.rst
    vocal-contour/api.rst
    feature.rst
    models.rst

diff --git a/docs/source/models.rst b/docs/source/models.rst
@@ -42,6 +42,15 @@ Chord Transformer
     :show-inheritance:
 
 
+Pyramid Net
+###########
+
+.. automodule:: omnizart.models.pyramid_net
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
 Utils
 #####
 

diff --git a/docs/source/music/api.rst b/docs/source/music/api.rst
@@ -7,7 +7,14 @@ Music Transcription
 
 App
 ###
-.. automodule:: omnizart.music.app
+.. autoclass:: omnizart.music.app.MusicTranscription
+    :members:
+    :show-inheritance:
+
+
+Dataset
+#######
+.. autoclass:: omnizart.music.app.MusicDatasetLoader
     :members:
     :show-inheritance:
 

diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
@@ -47,8 +47,8 @@ The supported applications are as follows:
 * ``music`` - Transcribes polyphonic music, and outputs notes of pitched instruments in MIDI.
 * ``drum`` - Transcribes polyphonic music, and outputs events of percussive instruments in MIDI.
 * ``chord`` - Transcribes polyphonic music, and outputs chord progression in MIDI and CSV.
+* ``vocal`` - Transcribes polyphonic music, and outputs note-level vocal melody.
 * ``vocal-contour`` - Transcribes polyphonic music, and outputs frame-level vocal melody (F0) in text.
-* ``vocal`` *(preparing)* - Transcribes polyphonic music, and outputs note-level vocal melody.
 * ``beat`` *(preparing)* - MIDI-domain beat tracking.
 
 Except ``beat`` which takes as input a MIDI file, all the applications receive audio files in WAV.
@@ -73,27 +73,29 @@ The processed features will be stored in *<path/to/dataset>/train_feature* and *
 
 The supported datasets for feature processing are application-dependent, summarized as follows:
 
-+-----------+-------+------+-------+------+---------------+
-| Module    | music | drum | chord | beat | vocal-contour |
-+===========+=======+======+=======+======+===============+
-| Maestro   |   O   |      |       |      |               |
-+-----------+-------+------+-------+------+---------------+
-| Maps      |   O   |      |       |      |               |
-+-----------+-------+------+-------+------+---------------+
-| MusicNet  |   O   |      |       |      |               |
-+-----------+-------+------+-------+------+---------------+
-| Pop       |   O   |  O   |       |      |               |
-+-----------+-------+------+-------+------+---------------+
-| Ext-Su    |   O   |      |       |      |               |
-+-----------+-------+------+-------+------+---------------+
-| BillBoard |       |      |   O   |      |               |
-+-----------+-------+------+-------+------+---------------+
-| BPS-FH    |       |      |       |      |               |
-+-----------+-------+------+-------+------+---------------+
-| MIR-1K    |       |      |       |      |       O       |
-+-----------+-------+------+-------+------+---------------+
-| MedleyDB  |       |      |       |      |       O       |
-+-----------+-------+------+-------+------+---------------+
++-------------+-------+------+-------+------+-------+---------------+
+| Module      | music | drum | chord | beat | vocal | vocal-contour |
++=============+=======+======+=======+======+=======+===============+
+| Maestro     |   O   |      |       |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| Maps        |   O   |      |       |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| MusicNet    |   O   |      |       |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| Pop         |   O   |  O   |       |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| Ext-Su      |   O   |      |       |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| BillBoard   |       |      |   O   |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| BPS-FH      |       |      |       |      |       |               |
++-------------+-------+------+-------+------+-------+---------------+
+| MIR-1K      |       |      |       |      |   O   |       O       |
++-------------+-------+------+-------+------+-------+---------------+
+| MedleyDB    |       |      |       |      |       |       O       |
++-------------+-------+------+-------+------+-------+---------------+
+| Tonas       |       |      |       |      |   O   |               |
++-------------+-------+------+-------+------+-------+---------------+
 
 Before running the commands below, make sure to download the corresponding datasets first.
 This can be easily done in :ref:`Download Datasets`.

diff --git a/docs/source/vocal-contour/api.rst b/docs/source/vocal-contour/api.rst
@@ -31,7 +31,7 @@ It will be loaded by the class :class:`omnizart.setting_loaders.VocalContourSett
 The name of the attributes will be converted to snake-case (e.g. HopSize -> hop_size). 
 There is also a path transformation when applying the settings into the ``VocalContourSettings`` instance. 
 For example, the attribute ``BatchSize`` defined in the yaml path *General/Training/Settings/BatchSize* is transformed 
-to *MusicSettings.training.batch_size*. 
+to *VocalContourSettings.training.batch_size*. 
 The level of */Settings* is removed among all fields.
 
 .. literalinclude:: ../../../omnizart/defaults/vocal_contour.yaml

diff --git a/docs/source/vocal/api.rst b/docs/source/vocal/api.rst
@@ -0,0 +1,54 @@
+Vocal Transcription
+===================
+
+
+.. automodule:: omnizart.vocal
+
+
+App
+###
+.. autoclass:: omnizart.vocal.app.VocalTranscription
+    :members:
+    :show-inheritance:
+
+
+Dataset
+#######
+.. autoclass:: omnizart.vocal.app.VocalDatasetLoader
+    :members:
+    :show-inheritance:
+
+
+Inference
+#########
+.. automodule:: omnizart.vocal.inference
+    :members:
+
+
+Labels
+######
+.. automodule:: omnizart.vocal.labels
+    :members:
+    :undoc-members:
+
+
+Prediction
+##########
+.. automodule:: omnizart.vocal.prediction
+    :members:
+    :undoc-members:
+
+
+Settings
+########
+Below are the default settings for building the vocal model. They are loaded
+by the class :class:`omnizart.setting_loaders.VocalSettings`. Attribute names
+are converted to snake-case (e.g. HopSize -> hop_size). There is also a path
+transformation when the settings are applied to the ``VocalSettings``
+instance. For example, the attribute ``BatchSize`` defined at the yaml path
+*General/Training/Settings/BatchSize* becomes
+*VocalSettings.training.batch_size*.
+The */Settings* level is removed from all fields.
+
+.. literalinclude:: ../../../omnizart/defaults/vocal.yaml
+    :language: yaml
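The name and path transformation described in the Settings section can be illustrated with a short sketch. It is not the code in `omnizart.setting_loaders`: the helpers `to_snake_case` and `yaml_path_to_attr` are hypothetical, and dropping the leading `General` level is an assumption inferred from the single example given in the documentation.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase yaml key such as 'HopSize' to 'hop_size'."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def yaml_path_to_attr(path: str, root: str = "General") -> str:
    """Map a yaml path to a dotted attribute path.

    Every 'Settings' level is removed, and the leading ``root`` level is
    dropped (an assumption based on the documented example).
    """
    parts = [p for p in path.split("/") if p and p != "Settings"]
    if parts and parts[0] == root:
        parts = parts[1:]
    return ".".join(to_snake_case(p) for p in parts)


attr = yaml_path_to_attr("General/Training/Settings/BatchSize")
print(attr)  # training.batch_size
```

Under these assumptions the result matches the documented attribute path, *VocalSettings.training.batch_size*, once the settings class name is prepended.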
diff --git a/docs/source/vocal/cli.rst b/docs/source/vocal/cli.rst
@@ -0,0 +1,25 @@
+omnizart vocal
+==============
+
+Lists the available options of each sub-command in detail.
+
+
+transcribe
+##########
+
+.. click:: omnizart.cli.vocal.transcribe:transcribe
+    :prog: omnizart vocal transcribe
+
+
+generate-feature
+################
+
+.. click:: omnizart.cli.vocal.generate_feature:generate_feature
+    :prog: omnizart vocal generate-feature
+
+
+train-model
+###########
+
+.. click:: omnizart.cli.vocal.train_model:train_model
+    :prog: omnizart vocal train-model
diff --git a/omnizart/__init__.py b/omnizart/__init__.py
@@ -7,4 +7,4 @@
 os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
 os.environ['VAMP_PATH'] = os.path.join(MODULE_PATH, "resource", "vamp")
 
-__version__ = "0.1.1"
+__version__ = "0.2.0"