Toolbox to evaluate Term Discovery systems.
Implements some of the metrics described in this paper.
This toolbox phonetically transcribes each discovered interval, then applies NLP evaluation metrics to judge the quality of the discovery. The metrics are:
- NED: mean of the normalized edit distance between all the discovered pairs (see the sketch after this list)
- coverage: percentage of the corpus covered by the discovered intervals
- token/type: measures how well the system finds gold tokens and gold types
- boundary: measures how well the system finds gold boundaries
- grouping: judges the purity of the clusters formed by the system
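As an illustration of the NED measure, here is a minimal sketch (not part of the WDE package; the helper names are hypothetical) that computes the mean normalized edit distance over all pairs of phone transcriptions within one discovered class:

from itertools import combinations

# Illustration only: hypothetical helpers, not part of the WDE package.
def edit_distance(a, b):
    # Levenshtein distance between two phone sequences
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        curr = [i]
        for j, pb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (pa != pb)))   # substitution
        prev = curr
    return prev[-1]

def ned(transcriptions):
    # Mean normalized edit distance over all pairs in one discovered class
    pairs = list(combinations(transcriptions, 2))
    return sum(edit_distance(a, b) / max(len(a), len(b))
               for a, b in pairs) / len(pairs)

print(ned([["h", "e", "l", "o"], ["h", "a", "l", "o"], ["h", "e", "l", "o"]]))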
Install the required packages using pip:
pip install -r requirements.txt
Then install the package itself:
python setup.py build && python setup.py install
The discovered intervals should be in the following format:
Class 1:
wav1 on1 off1
wav2 on2 off2
Class 2:
wav1 on3 off3
wav3 on4 off4
The file must end with an empty line (this is important).
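For reference, a minimal sketch of a parser for this format (for illustration only, not part of the WDE package) could look like this:

def read_class_file(path):
    # Maps each class header (e.g. "Class 1") to its list of (wav, onset, offset)
    classes, current = {}, None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("Class"):
                current = line.rstrip(":")
                classes[current] = []
            elif line:
                wav, on, off = line.split()
                classes[current].append((wav, float(on), float(off)))
    return classes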
You can compute the measures using the eval.py script:
python eval.py discovered_class corpus output/
where discovered_class is the path to your discovered class file, corpus is the corpus you want to evaluate on (currently supported: 'english', 'french', 'mandarin' and 'buckeye', where the first three are the corpora of the ZeroSpeech 2017 challenge), and output/ is the directory where the results will be written.
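For example, to evaluate a class file on the Mandarin corpus and write the results to results/ (the class file name here is just a placeholder):

python eval.py my_discovered_classes.txt mandarin results/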
You can also use the Python API:
import pkg_resources
from WDE.readers.gold_reader import *
from WDE.readers.disc_reader import *

# Load the gold word and phone alignments shipped with the package
wrd_path = pkg_resources.resource_filename(
    pkg_resources.Requirement.parse('WDE'),
    'WDE/share/mandarin.wrd')
phn_path = pkg_resources.resource_filename(
    pkg_resources.Requirement.parse('WDE'),
    'WDE/share/mandarin.phn')

gold = Gold(wrd_path=wrd_path,
            phn_path=phn_path)

# Read the discovered class file
disc_clsfile = "/path/to/discovered/file"
disc = Disc(disc_clsfile, gold)

# Compute the grouping measure
from WDE.measures.grouping import *

grouping = Grouping(disc)
grouping.compute_grouping()

print(grouping.precision)
print(grouping.recall)
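Grouping exposes its precision and recall; if you want to summarize them as a single F-score, a small helper (not part of the WDE package) will do:

# Harmonic mean of precision and recall (F1); not part of WDE,
# just a convenience for summarizing the two numbers above.
def f_score(precision, recall):
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f_score(grouping.precision, grouping.recall))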