Commit

Merge pull request #13 from opensource-spraakherkenning-nl/add-variations

Add variation + add back dash (-) to punctuation exceptions
greenw0lf authored Nov 13, 2023
2 parents 86ec141 + 655cf48 commit fc17e2c
Showing 2 changed files with 2 additions and 1 deletion.
2 changes: 1 addition & 1 deletion ASR_NL_benchmark/normalize.py
@@ -44,7 +44,7 @@ def replace_numbers_and_symbols(text):
     >>> replace_numbers_and_symbols('12,3%')
     'twaalf komma drie procent'
     """
-    removed_punct = string.punctuation.replace("'", '')
+    removed_punct = string.punctuation.replace("'", '').replace('-', '')
     text_without_symbols = replace_symbols(text)
     clean_text = replace_numbers(text_without_symbols)
     clean_text = clean_text.translate(str.maketrans('', '', removed_punct))
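
The effect of this one-line change can be illustrated in isolation. The following is a minimal sketch, not the repository's code: `strip_punct` and the sample sentence are hypothetical, and the `replace_symbols` / `replace_numbers` steps are left out because their bodies are not shown in this diff. It only compares the two `removed_punct` definitions from the hunk above.

```python
import string

# Old definition: every punctuation character except the apostrophe is removed.
removed_before = string.punctuation.replace("'", '')
# New definition: the apostrophe and the dash are both kept.
removed_after = string.punctuation.replace("'", '').replace('-', '')


def strip_punct(text, removed):
    """Delete every character listed in `removed` from `text` (hypothetical helper)."""
    return text.translate(str.maketrans('', '', removed))


# Hypothetical sample containing a hyphenated word, a comma and an apostrophe.
sample = "bnr-nieuwsradio, zo'n zender!"

before = strip_punct(sample, removed_before)
after = strip_punct(sample, removed_after)

print(before)  # bnrnieuwsradio zo'n zender
print(after)   # bnr-nieuwsradio zo'n zender
```

With the dash kept, hyphenated tokens such as `bnr-nieuwsradio` survive normalization, so hyphen-based entries in `variations.glm` can still apply to them.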
1 change: 1 addition & 0 deletions ASR_NL_benchmark/variations.glm
@@ -59,6 +59,7 @@ tewerk => te werk / [ ] __ [ ]
 [concept-] => [{ concept- / concept }] / [ ] __ [ ]
 [NAVO-] => [{ NAVO- / NAVO }] / [ ] __ [ ]
 [uh] => [{ uh / %HESITATION }] / [ ] __ [ ]
+[bnr-nieuwsradio] => [{ bnr-nieuwsradio / bnr nieuwsradio }]
 ;;
 ;; BN-VL
 [Darfour] => [{ Darfour / Darfur }] / [ ] __ [ ]