Merge pull request #8 from BU-Spark/analysis-checkin1
[CORRECT ONE | IGNORE THE OTHER REQUEST] Analysis checkin
Showing 62 changed files with 736,492 additions and 11,939 deletions.
@@ -0,0 +1,22 @@
# Content
This subdirectory consists mainly of Python analysis notebooks.

# Metrics
We are using:
* WER (word error rate) - Calculated using the **Levenshtein distance algorithm**
* ROUGE score - Computed with a combination of the `Jieba` and `rouge_chinese` libraries

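As a rough illustration of the WER metric listed above, here is a minimal sketch that computes the Levenshtein distance between two token sequences and divides it by the reference length. The function names `levenshtein` and `wer` are hypothetical, and the code assumes both transcripts are already tokenized into lists; the notebooks may tokenize and normalize differently.

```python
def levenshtein(ref, hyp):
    """Edit distance between two token sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 tokens of ref
    # and the first j tokens of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length.

    Assumption: both arguments are lists of tokens (hypothetical
    input format, not taken from the notebooks).
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `wer(["a", "b", "c", "d"], ["a", "x", "c"])` counts one substitution and one deletion against a four-token reference. Note that WER can exceed 1.0 when the hypothesis contains many insertions.
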
# Attempts
We tried to find a relationship between **stuttering count** and performance in the following ways:
* plotting **stuttering count** against the performance metrics => no pattern found
* plotting **audio length** against the performance metrics => no pattern found

# Notes
* We considered an indirect performance analysis, assuming that a shorter **audio length** implies a lower **stuttering count** => no evidence found for this assumption

# Issues
* Some `pandas`-parsed dataframes contain ground truth that does not match the original ground-truth transcription data file provided. We are checking whether the parsing went wrong; otherwise, we will remove those transcripts from the analysis.

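One way to locate the mismatched transcripts described above is to compare the parsed ground truth against the original file entry by entry. The sketch below is only an assumption about how that check could look: it uses plain dictionaries keyed by a hypothetical clip ID rather than the actual dataframes, and the name `find_mismatches` is invented for illustration.

```python
def find_mismatches(parsed, original):
    """Return the IDs whose parsed transcript differs from the original.

    Assumption: `parsed` and `original` map a clip ID to its ground-truth
    transcript string. An ID missing from `parsed` counts as a mismatch.
    """
    mismatched = []
    for clip_id, truth in original.items():
        candidate = parsed.get(clip_id)
        # Ignore leading/trailing whitespace so that only real content
        # differences are flagged.
        if candidate is None or candidate.strip() != truth.strip():
            mismatched.append(clip_id)
    return mismatched
```

The returned IDs can then be inspected to decide whether the parsing can be fixed or whether those transcripts should be dropped.
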
# Upcoming Agenda
* Normalized analysis => merging **stuttering count** and **audio length** into a new parameter, **frequency**
* Process the incorrectly parsed transcriptions
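
The exact definition of **frequency** is not fixed in this README. A minimal sketch, assuming it means stuttering events per second of audio, could look like the following; the function name `stutter_frequency` and the choice of seconds as the unit are assumptions.

```python
def stutter_frequency(stutter_count, audio_length_s):
    """Stuttering events per second of audio.

    Assumption: frequency = stuttering count / audio length, with the
    audio length given in seconds.
    """
    if audio_length_s <= 0:
        raise ValueError("audio_length_s must be positive")
    return stutter_count / audio_length_s
```

Normalizing this way lets clips of different lengths be compared on the same scale before plotting against WER or ROUGE.
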