Merge pull request #8 from BU-Spark/analysis-checkin1
[CORRECT ONE | IGNORE THE OTHER REQUEST] Analysis checkin
wylliamunlimited authored Dec 3, 2024
2 parents 9b699d6 + e97a74c commit ebc36c3
Showing 62 changed files with 736,492 additions and 11,939 deletions.
22 changes: 22 additions & 0 deletions fall2024/PoC/README.md
@@ -0,0 +1,22 @@
# Content
This subdirectory consists mainly of Python analysis notebooks.

# Metrics
We are using:
* WER (word error rate) - calculated using the **Levenshtein distance algorithm**
* ROUGE score - computed with a combination of the `Jieba` and `rouge_chinese` libraries
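
As an illustration of the WER metric, here is a minimal sketch of a Levenshtein-based word error rate. It is not the notebooks' actual implementation: the function names `levenshtein` and `wer` are hypothetical, and the inputs are assumed to be already tokenized (for Chinese transcripts, tokenization would presumably happen beforehand, for example with `Jieba`).

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences, counting substitutions,
    insertions, and deletions as cost 1 each."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(ref_tokens, hyp_tokens):
    """Word error rate: edit distance divided by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return levenshtein(ref_tokens, hyp_tokens) / len(ref_tokens)
```

For example, `wer(["a", "b", "c", "d"], ["a", "x", "c"])` counts one substitution and one deletion against four reference tokens. Note that WER can exceed 1 when the hypothesis contains many insertions.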

# Attempts
We have attempted to find a relationship between **stuttering count** and performance in the following ways:
* plotting **stuttering count** against performance metrics => no pattern found
* plotting **audio length** against performance metrics => no pattern found
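
A visual check like the ones above can be complemented with a numeric one. The sketch below computes a Pearson correlation coefficient in plain Python. This is an assumption about how such a check might look, not something the notebooks are known to do, and the function name `pearson` is hypothetical. Pearson only captures linear relationships, so a value near zero does not rule out other patterns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

It could be called with, for instance, a list of stuttering counts and a list of per-transcript WER values in the same order.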

# Notes
* We are considering an indirect performance analysis, which assumes that shorter **audio length** implies a lower **stuttering count** => no evidence found for this assumption

# Issues
* Some `pandas`-parsed dataframes contain ground truth that does not match the original ground truth transcription data file provided. We are checking whether the parsing went wrong; otherwise, we will remove those transcripts from the analysis.
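
One way to locate the affected transcripts is sketched below. To stay self-contained it works on plain dictionaries rather than `pandas` dataframes. The function name `find_mismatches` and the mapping from transcript id to ground truth text are assumptions made for illustration, not the project's actual data layout.

```python
def find_mismatches(parsed, original):
    """Return the sorted ids whose parsed ground truth differs from the
    original ground truth, ignoring leading/trailing whitespace.

    Both arguments map a transcript id to its ground truth text.
    Only ids present in both mappings are compared.
    """
    shared_ids = parsed.keys() & original.keys()
    return sorted(
        tid for tid in shared_ids
        if parsed[tid].strip() != original[tid].strip()
    )
```

The returned ids can then be inspected by hand to decide whether the parsing can be fixed or whether the transcripts should be dropped.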

# Upcoming Agenda
* Normalized analysis => merging **stuttering count** and **audio length** into a new parameter, **frequency**
* Process the incorrectly parsed transcriptions
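
A minimal sketch of the planned normalization, assuming **frequency** means stuttering events per unit of audio length. The exact definition and the unit (seconds here) are not stated above, so both are assumptions, as is the function name `stutter_frequency`.

```python
def stutter_frequency(stuttering_count, audio_length_s):
    """Stuttering events per second of audio (assumed definition)."""
    if audio_length_s <= 0:
        raise ValueError("audio length must be positive")
    return stuttering_count / audio_length_s
```

With this definition, a 30-second clip with 6 stuttering events and a 60-second clip with 12 events get the same frequency, which makes clips of different lengths comparable.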