-
Notifications
You must be signed in to change notification settings - Fork 12
/
README
52 lines (35 loc) · 2.34 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
Utilities for OpenFst
---------------------
Benoit Favre <[email protected]>
released under Apache License Version 2.0
This is a set of useful programs for manipulating Finite State Transducer with the OpenFst library.
* fstcompile-nolex [-t]: compile an acceptor [transducer], generate symbol lexicons on the fly and save them with the fst. Useful for quick hacks on a single fst.
* fstcompose-maplex <fst1> <fst2>: compose after mapping symbol tables (for use with fstcompile-nolex which generates different symbol tables) fst1 or fst2 can be "" (empty string) which means stdin.
* fstminimize-transducer: encode input/output, rmepsilon, determinize, minimize and decode in one pass.
* fstdeterminize-tc-lex: keep the best output for each input sequence in a transducer using determinization in the (Tropical, Categorial)-Lexicographic semiring. See "Efficient Determinization of Tagged Word Lattices using Categorial and Lexicographic Semirings", by Izhak Shafran et al, ASRU 2011.
* fstsuperfinal-noepsilon: add a superfinal state without adding epsilon arcs
* fstoracle <fst1> <fst2>: compute the shortest distance alignment between two transducers
* fstcompose-specials <fst1> <fst2>: compose two transducers using special <phi>, <rho> and <sigma> transitions. <sigma> can replace any input symbol; <rho> is like sigma but only if no other path can be followed; <phi> is an epsilon transition which can be followed if no other transition matches an input symbol. Note that lexicons from the two fsts are mapped.
fstprint sentence.fst:
0 1 the
1 2 boy
2 3 is
3 4 here
4
fstprint model.fst:
0 0 <rho> <eps>
0 1 boy XXX
1 2 is YYY
2 2 <rho> <eps>
2
./fstcompose-specials model.fst < sentence.fst | fstprint
0 1 the <eps>
1 2 boy XXX
2 3 is YYY
3 4 here <eps>
4
* fstposteriors: compute arc-level posterior probabilities from an automaton where weights are -log probs in the tropical semiring.
* fstprint-nbest-strings <n>: compute nbest and print string of input/output symbols (or input if input == output)
--- less useful / non-working stuff ---
* add-tags <dict>: generate a tagging transducer from a word acceptor according to a tab-separated, one-word-per-line tag dictionary
* ngram-expand: expand transducer so that each arc has a fixed context of up to n-1 arcs (so called ngramize) -- non functional at this time