Merge pull request #10 from Music-and-Culture-Technology-Lab/vocal_frame
Updated docs
BreezeWhite authored Dec 12, 2020
2 parents 34916bb + ac001d8 commit 2e969b3
Showing 4 changed files with 53 additions and 66 deletions.
14 changes: 7 additions & 7 deletions CHANGELOG.md

## 0.2.0 - 2020-

### Vocal melody transcription at both frame and note level is live!
We have released the vocal melody transcription modules after a substantial amount of effort.
Now you can transcribe your favorite singing voice.

### Features
- Release `vocal` and `vocal-contour` submodules.

### Enhancement
- Improve chord transcription results by filtering out chord predictions with short duration.
- Resolve the path for transcription output in a consistent way.

### Documentation
- Re-organize the Quick Start and Tutorial pages to improve accessibility.
- Move the development section from README.md to CONTRIBUTING.md.

### Bug Fix
- Fix a bug where the wrong parameter was passed to vamp for chroma feature extraction.

---

66 changes: 25 additions & 41 deletions README.md
# OMNIZART

[![build](https://github.com/Music-and-Culture-Technology-Lab/omnizart/workflows/general-check/badge.svg)](https://github.com/Music-and-Culture-Technology-Lab/omnizart/actions?query=workflow%3Ageneral-check)
[![docs](https://github.com/Music-and-Culture-Technology-Lab/omnizart/workflows/docs/badge.svg?branch=build_doc)](https://music-and-culture-technology-lab.github.io/omnizart-doc/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/omnizart)](https://pypistats.org/packages/omnizart)
[![Docker Pulls](https://img.shields.io/docker/pulls/mctlab/omnizart)](https://hub.docker.com/r/mctlab/omnizart)

Omnizart is a Python library that aims to democratize automatic music transcription.
Given polyphonic music, it is able to transcribe pitched instruments, vocal melody, chords, drum events, and beats.
It is powered by research outcomes from the [Music and Culture Technology (MCT) Lab](https://sites.google.com/view/mctl/home).

### Transcribe your favorite songs now in Colab! [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://bit.ly/omnizart-colab)

# Quick start

Visit the [complete documentation](https://music-and-culture-technology-lab.github.io/omnizart-doc/) for detailed guidance.

## Pip
``` bash
# Install omnizart
pip install omnizart

# Download the checkpoints
omnizart download-checkpoints

# Transcribe your songs
omnizart drum transcribe <path/to/audio.wav>
omnizart chord transcribe <path/to/audio.wav>
omnizart music transcribe <path/to/audio.wav>
```

## Docker
```
docker pull mctlab/omnizart:latest
docker run -it mctlab/omnizart:latest bash
```
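
To transcribe a local file without dropping into an interactive shell, one option is to mount the working directory into the container. This is a minimal sketch, assuming the `omnizart` CLI is on the image's default path and `audio.wav` sits in your current directory:

``` bash
# Mount the current directory at /data inside the container,
# then run a drum transcription on the mounted file.
docker run -it -v "$(pwd)":/data mctlab/omnizart:latest \
    omnizart drum transcribe /data/audio.wav
```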

# Supported applications
| Application      | Transcription      | Training           | Evaluation | Description                                       |
|------------------|--------------------|--------------------|------------|---------------------------------------------------|
| music            | :heavy_check_mark: | :heavy_check_mark: |            | Transcribe musical notes of pitched instruments.  |
| drum             | :heavy_check_mark: | :interrobang:      |            | Transcribe events of percussive instruments.      |
| vocal            | :heavy_check_mark: | :heavy_check_mark: |            | Transcribe note-level vocal melody.               |
| vocal-contour    | :heavy_check_mark: | :heavy_check_mark: |            | Transcribe frame-level vocal melody (F0).         |
| chord            | :heavy_check_mark: | :heavy_check_mark: |            | Transcribe chord progressions.                    |
| beat             |                    |                    |            | Transcribe beat position.                         |
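
The two newly released vocal applications follow the same command pattern as the quick-start examples above. A minimal sketch (the audio path is a placeholder):

``` bash
# Note-level vocal melody, exported to MIDI
omnizart vocal transcribe path/to/audio.wav

# Frame-level vocal melody (F0), exported to text
omnizart vocal-contour transcribe path/to/audio.wav
```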

**NOTES**
The current implementation of the drum model has unknown bugs that prevent the loss from converging when training from scratch.
Fortunately, you can still enjoy drum transcription with the provided checkpoints.
23 changes: 13 additions & 10 deletions docs/source/index.rst
Omnizart provides the main functionalities that construct the life-cycle of deep learning research,
covering from *dataset downloading*, *feature pre-processing*, *model training*, to *transcription* and *sonification*.
Pre-trained checkpoints are also provided for the immediate usage of transcription.
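
As a rough sketch of that life-cycle in CLI terms, using the ``music`` application (the dataset path, audio path, and model name below are placeholders; per-application options are documented in each CLI page):

.. code-block:: bash

   # Pre-process a downloaded dataset into training/testing features.
   omnizart music generate-feature -d path/to/dataset

   # Train a new model on the extracted features.
   omnizart music train-model -d path/to/dataset/train_feature --model-name My-Model

   # Or download the pre-trained checkpoints and transcribe directly.
   omnizart download-checkpoints
   omnizart music transcribe path/to/audio.wav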

Demo
####

Colab
*****

Play with the `Colab notebook <https://bit.ly/omnizart-colab>`_ to transcribe your favorite song almost immediately!

Sound samples
*************

Original song

.. raw:: html

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/hjJhweRlE-A" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Chord transcription

.. (audio sample of the chord transcription result; embedded player omitted)


Drum transcription

.. (audio sample of the drum transcription result; embedded player omitted)


Note-level vocal transcription

.. (audio sample of the note-level vocal transcription result; embedded player omitted)


Frame-level vocal transcription

.. (audio sample of the frame-level vocal transcription result; embedded player omitted)


Source files can be downloaded `here <https://drive.google.com/file/d/15VqHearznV9L83cyl61ccACsXXJ4vBHo/view?usp=sharing>`_.
You can use *Audacity* to open the files.


.. toctree::
16 changes: 8 additions & 8 deletions docs/source/tutorial.rst
Detailed descriptions for the usage of each sub-command can be found in the dedicated pages:
* :doc:`drum/cli`
* :doc:`chord/cli`
* :doc:`vocal-contour/cli`
* :doc:`vocal/cli`
* beat *(preparing)*

All the applications share the same set of actions: **transcribe**, **generate-feature**, and **train-model**.

Transcribe
**********

As the name suggests, this action transcribes a given input.
The supported applications are as follows:

* ``music`` - Transcribe musical notes of pitched instruments in MIDI.
* ``drum`` - Transcribe events of percussive instruments in MIDI.
* ``chord`` - Transcribe chord progressions in MIDI and CSV.
* ``vocal`` - Transcribe note-level vocal melody in MIDI.
* ``vocal-contour`` - Transcribe frame-level vocal melody (F0) in text.
* ``beat`` *(preparing)* - Transcribe beat position.

Note that all the applications take polyphonic music in WAV as input, except ``beat``, which takes MIDI.

Example usage:
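
The sketch below assumes a WAV file at a placeholder path:

.. code-block:: bash

   # Transcribe the notes of pitched instruments.
   omnizart music transcribe path/to/audio.wav

   # Transcribe the drum events of the same recording.
   omnizart drum transcribe path/to/audio.wav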

