Merge pull request #12 from opensource-spraakherkenning-nl/add-variations

Add some variations + remove dash (-) again
greenw0lf authored Nov 6, 2023
2 parents 0016129 + 221eeeb commit 86ec141
Showing 2 changed files with 3 additions and 1 deletion.
ASR_NL_benchmark/normalize.py: 2 changes (1 addition, 1 deletion)
@@ -44,7 +44,7 @@ def replace_numbers_and_symbols(text):
     >>> replace_numbers_and_symbols('12,3%')
     'twaalf komma drie procent'
     """
-    removed_punct = string.punctuation.replace("'", '').replace("-", '')
+    removed_punct = string.punctuation.replace("'", '')
     text_without_symbols = replace_symbols(text)
     clean_text = replace_numbers(text_without_symbols)
     clean_text = clean_text.translate(str.maketrans('', '', removed_punct))
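The hunk above only changes which characters the `translate` call deletes: before the commit both `'` and `-` were exempt, afterwards only `'` is, so hyphens are stripped again. A minimal sketch isolating that single step follows. It is not the repository's code: `strip_punct`, the two constant names and the sample sentence are hypothetical, and `replace_symbols`/`replace_numbers` are deliberately left out.

```python
import string

# Hypothetical names. Before this commit, apostrophes and hyphens were kept.
OLD_REMOVED_PUNCT = string.punctuation.replace("'", '').replace("-", '')
# After this commit, only apostrophes are kept, so hyphens are deleted again.
NEW_REMOVED_PUNCT = string.punctuation.replace("'", '')


def strip_punct(text: str, removed_punct: str) -> str:
    """Delete every character in removed_punct (mirrors only the translate() step)."""
    return text.translate(str.maketrans('', '', removed_punct))


if __name__ == "__main__":
    sample = "jeugd- en ouderenzorg, da's 't!"
    print(strip_punct(sample, OLD_REMOVED_PUNCT))  # hyphen survives
    print(strip_punct(sample, NEW_REMOVED_PUNCT))  # hyphen is removed
```

Apostrophes remain exempt in both versions, which keeps contracted Dutch forms such as `da's` and `'t` intact for the mappings in `variations.glm`.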
ASR_NL_benchmark/variations.glm: 2 changes (2 additions, 0 deletions)
@@ -13,6 +13,8 @@ z'n => zijn / [ ] __ [ ]
 'k => ik / [ ] __ [ ]
 'r => er / [ ] __ [ ]
 'ns => eens / [ ] __ [ ]
+ie => hij / [ ] __ [ ]
+da's => dat is / [ ] __ [ ]
 d'ruit => eruit / [ ] __ [ ]
 restaurant- => restaurant / [ ] __ [ ]
 jeugd- => jeugd / [ ] __ [ ]
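The new entries follow the file's existing `lhs => rhs / [ ] __ [ ]` pattern, where the `[ ] __ [ ]` context presumably restricts a rule to a standalone word. The sketch below is a hypothetical illustration of that reading, not the SCTK/sclite GLM implementation or this repository's code: `parse_rules` and `apply_rules` are invented names, and "standalone word" is assumed to mean a whole space-delimited token.

```python
import re

# Rules copied from the hunk; the surrounding file is not reproduced.
GLM_RULES = """\
'k => ik / [ ] __ [ ]
'r => er / [ ] __ [ ]
'ns => eens / [ ] __ [ ]
ie => hij / [ ] __ [ ]
da's => dat is / [ ] __ [ ]
d'ruit => eruit / [ ] __ [ ]
restaurant- => restaurant / [ ] __ [ ]
jeugd- => jeugd / [ ] __ [ ]
"""

# Matches "lhs => rhs / [ ] __ [ ]"; the right-hand side may contain spaces.
RULE_RE = re.compile(r"^(?P<lhs>.+?)\s*=>\s*(?P<rhs>.+?)\s*/\s*\[ \] __ \[ \]\s*$")


def parse_rules(text: str) -> dict:
    """Collect lhs -> rhs pairs; lines that do not match the pattern are skipped."""
    rules = {}
    for line in text.splitlines():
        match = RULE_RE.match(line.strip())
        if match:
            rules[match.group("lhs")] = match.group("rhs")
    return rules


def apply_rules(transcript: str, rules: dict) -> str:
    """Replace whole tokens only, so 'ie' maps to 'hij' but 'iemand' is untouched."""
    return " ".join(rules.get(token, token) for token in transcript.split())


if __name__ == "__main__":
    rules = parse_rules(GLM_RULES)
    print(apply_rules("da's 'n huis waar ie woont", rules))
```

Entries ending in `-`, such as `jeugd-`, can only fire on text that still contains the hyphen. This commit does not show whether the GLM mappings are applied before or after `normalize.py` strips hyphens.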
