Hey guys, I finally got some spare time to look into this. Thanks a lot for putting this together!
I'm looking at the symbol tables for words and characters. I noticed that 0 is reserved for epsilon in words.txt but is assigned to a regular symbol in characters.txt. As a result, in the resulting SG.fst graph, epsilon and that symbol stay separate on the output side, while on the input side they are mixed into a single symbol. This is because OpenFST treats 0 as epsilon in all algorithms by default.
Shall we reserve 0 for epsilon as long as OpenFST is involved? This would require changes to both Athena and Athena-decoder. Correct me if I'm wrong, though. @tjadamlee @godjealous
Thanks for your interest in the athena-decoder project.
Actually, we always reserve 0 for epsilon on both the input side and the output side of the WFST. As you mentioned, symbol 0 is reserved in words.txt. Symbol 0 is also reserved in characters_disambig.txt.
The input symbol table for the SG.fst graph is "characters_disambig.txt", not "characters.txt". The output symbol table for the SG.fst graph is "words.txt".
Compared with "characters.txt", "characters_disambig.txt" contains some extra entries, including the epsilon symbol and some disambiguation symbols.
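If it helps, here is a minimal sketch of how one could check whether a symbol table reserves 0 for epsilon before building the graph. It assumes the usual OpenFST text format of one `symbol id` pair per line and that epsilon is spelled `<eps>`. The table contents and the helper names `load_symbol_table` and `epsilon_is_reserved` are made up for illustration and are not taken from Athena or athena-decoder.

```python
def load_symbol_table(lines):
    """Parse OpenFST text-format symbol table lines ("symbol id") into {id: symbol}."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last whitespace run so the id is always the final field.
        symbol, idx = line.rsplit(maxsplit=1)
        table[int(idx)] = symbol
    return table


def epsilon_is_reserved(table, eps="<eps>"):
    """Return True if id 0 maps to the epsilon symbol (assumed spelling: <eps>)."""
    return table.get(0) == eps


# Hypothetical contents, for illustration only (not the real Athena files).
characters = ["a 0", "b 1", "c 2"]
characters_disambig = ["<eps> 0", "a 1", "b 2", "c 3", "#0 4"]

for name, lines in [("characters.txt", characters),
                    ("characters_disambig.txt", characters_disambig)]:
    ok = epsilon_is_reserved(load_symbol_table(lines))
    print(f"{name}: id 0 reserved for epsilon = {ok}")
```

With real files, the same functions can be called on `open(path, encoding="utf-8")`, since a file object iterates over its lines.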