Phrase extraction

This project was created during the Statistical Machine Translation course in Computer Science at Adam Mickiewicz University. It parses the GIZA++ output format, runs the grow-diag-final-and algorithm, and finally extracts the possible phrases. It was tested on the Portuguese-Polish language pair.
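For orientation, the grow-diag-final-and step can be sketched roughly as follows. This is a minimal illustration of the textbook heuristic, not the code in alignment.py: the function name `grow_diag_final_and` and the representation of each directional alignment as a set of `(e, f)` position pairs are assumptions made for the example, and both sets are assumed to use the same orientation.

```python
# Neighbouring offsets: horizontal, vertical, then diagonal.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final_and(e2f, f2e):
    """Symmetrize two directional alignments given as sets of (e, f) pairs."""
    union = e2f | f2e
    alignment = e2f & f2e            # start from the high-precision intersection
    aligned_e = {e for e, _ in alignment}
    aligned_f = {f for _, f in alignment}

    def add(e, f):
        alignment.add((e, f))
        aligned_e.add(e)
        aligned_f.add(f)

    # grow-diag: repeatedly add union points that neighbour an existing point
    # and cover at least one word that is still unaligned.
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):   # sorted() gives a snapshot to iterate
            for de, df in NEIGHBORS:
                ne, nf = e + de, f + df
                if (ne, nf) in union and (ne not in aligned_e
                                          or nf not in aligned_f):
                    add(ne, nf)
                    added = True

    # final-and: add remaining directional points only when BOTH words
    # are still unaligned.
    for directed in (e2f, f2e):
        for e, f in sorted(directed):
            if e not in aligned_e and f not in aligned_f:
                add(e, f)
    return alignment


# Made-up toy alignments, not taken from the project's data.
e2f = {(1, 1), (2, 2), (3, 2), (5, 1)}
f2e = {(1, 1), (2, 2), (2, 3), (5, 5)}
symmetrized = grow_diag_final_and(e2f, f2e)
print(sorted(symmetrized))
# [(1, 1), (2, 2), (2, 3), (3, 2), (5, 5)]
```

In the toy run, `(5, 1)` is rejected because position 1 on the second side is already aligned, while `(5, 5)` is accepted because both of its words are still free. That is the difference between the "final-and" variant and plain "final".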

To run:

python main.py [fe_file] [ef_file]

Both files must be in GIZA++ format.
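As a rough illustration of that format, the sketch below reads GIZA++ alignment output, assuming the usual three-line layout per sentence pair: a `#` header line, one sentence, and the other sentence annotated with `({ ... })` sets of 1-based positions. The name `parse_a3` and the returned tuple layout are hypothetical and are not taken from this repository, and the sample sentence is made up.

```python
import re

# One aligned token looks like: word ({ 1 2 })
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_a3(lines):
    """Yield (target_words, source_words, links) for each sentence pair.

    links is a set of 1-based (source_pos, target_pos) pairs.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 2, 3):
        header, target_line, aligned_line = lines[i:i + 3]
        if not header.startswith("#"):
            raise ValueError("expected a '# Sentence pair' header: " + header)
        target = target_line.split()
        source, links = [], set()
        for pos, (word, idxs) in enumerate(TOKEN_RE.findall(aligned_line)):
            if pos == 0:
                continue  # the first token is the NULL word, skipped here
            source.append(word)
            for t in idxs.split():
                links.add((pos, int(t)))
        yield target, source, links


# Made-up sample in the assumed layout.
sample = """\
# Sentence pair (1) source length 4 target length 3 alignment score : 0.001
on lubi koty
NULL ({ }) ele ({ 1 }) gosta ({ 2 }) de ({ }) gatos ({ 3 })
"""
target, source, links = next(parse_a3(sample.splitlines()))
print(source, target, sorted(links))
```

Target words attached to NULL are dropped in this sketch, which treats them as unaligned.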

For example:

python main.py data/pt-pl data/pl-pt
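Once the two directions have been merged into one alignment, the final step named above, phrase extraction, amounts to collecting every phrase pair that is consistent with that alignment. The sketch below shows the standard consistency check under stated assumptions: `extract_phrases` and `max_len` are hypothetical names, positions are 0-based (subtract 1 from GIZA++ positions), and the sentence is a made-up example. It is not the code in phrase_extraction.py.

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Return the set of (source phrase, target phrase) pairs consistent with
    alignment, a set of 0-based (src_idx, tgt_idx) links."""
    pairs = set()
    tgt_aligned = {t for _, t in alignment}
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to the current source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word inside the span may be linked
            # to a source word outside the span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            # Also emit variants extended over unaligned boundary words.
            ts = t_start
            while ts >= 0 and (ts == t_start or ts not in tgt_aligned):
                te = t_end
                while te < len(tgt) and (te == t_end or te not in tgt_aligned):
                    if te - ts < max_len:
                        pairs.add((" ".join(src[s_start:s_end + 1]),
                                   " ".join(tgt[ts:te + 1])))
                    te += 1
                ts -= 1
    return pairs


# Made-up example: "de" has no link on the Polish side.
src = ["ele", "gosta", "de", "gatos"]
tgt = ["on", "lubi", "koty"]
alignment = {(0, 0), (1, 1), (3, 2)}
phrases = extract_phrases(src, tgt, alignment)
for pair in sorted(phrases):
    print(pair)
```

Because "de" is unaligned, it never forms a phrase on its own, but it can be absorbed into neighbouring phrases such as "gosta de" or "de gatos".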

maciejbiesek/smt-phrase-extraction

Folders and files

Latest commit

History

Repository files navigation

Phrase extraction

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages