GitHub - mhmotallebi/NLP-Project

This project annotates one side of a parallel corpus, given the annotated text of the other side and the output of the GIZA++ word aligner. For example, given an English-Persian parallel corpus, alignments created with GIZA++, and annotations of the English text, the program annotates the Persian side.
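
The idea of projecting annotations through word alignments can be sketched as follows. This is a minimal illustration, not the code in this repository: the function name `project_tags`, the in-memory alignment representation (a list of 0-based target positions per source token), and the use of `"O"` as the "no annotation" tag are all assumptions made for the example.

```python
def project_tags(source_tags, alignment, target_length, default="O"):
    """Copy each source token's tag onto the target tokens it is aligned to.

    source_tags   -- one tag per source (e.g. English) token
    alignment     -- alignment[i] lists the 0-based target positions
                     aligned to source token i (hypothetical representation)
    target_length -- number of tokens in the target (e.g. Persian) sentence
    default       -- tag meaning "no annotation" (assumed to be "O")
    """
    target_tags = [default] * target_length
    for src_index, tgt_positions in enumerate(alignment):
        tag = source_tags[src_index]
        if tag == default:
            continue  # nothing to project for unannotated tokens
        for pos in tgt_positions:
            target_tags[pos] = tag
    return target_tags


# Four source tokens; the last one is aligned to two target tokens.
english_tags = ["B-PER", "O", "O", "B-LOC"]
alignment = [[0], [3], [], [1, 2]]
print(project_tags(english_tags, alignment, target_length=4))
# ['B-PER', 'B-LOC', 'B-LOC', 'O']
```

Target tokens with no aligned, annotated source token keep the default tag, which is one simple way to handle unaligned words.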

More precisely, the following inputs are needed: the final GIZA++ alignment file (containing the sentences and, for each word in each sentence, the ids of its corresponding Persian translation), the Persian vocabulary (containing each word and its id), the tokenized Persian and English texts, and the annotated English text (preferably tokenized in advance). The format of the annotated English text may differ from project to project, so the code may need to be modified accordingly.
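
As a rough illustration of reading the alignment input, the sketch below parses one alignment line. It assumes the line looks like the alignment lines in GIZA++'s `A3.final` output, where each token is followed by `({ ... })` holding 1-based positions in the other sentence and the first entry is a `NULL` placeholder. The function name `parse_alignment_line` and the sample line are hypothetical, and the scripts in this repository may read the file differently.

```python
import re

# One "token ({ 1 2 })" group; the position list may be empty.
ALIGNMENT_GROUP = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_alignment_line(line):
    """Return (token, [0-based positions]) pairs for one alignment line.

    Assumes the "token ({ positions })" layout described above; the
    leading NULL entry (words aligned to nothing) is skipped.
    """
    pairs = []
    for index, (token, positions) in enumerate(ALIGNMENT_GROUP.findall(line)):
        if index == 0 and token == "NULL":
            continue
        # Convert the 1-based positions in the file to 0-based indices.
        pairs.append((token, [int(p) - 1 for p in positions.split()]))
    return pairs


# Made-up sample line, for illustration only.
line = "NULL ({ }) John ({ 1 }) lives ({ 4 }) in ({ }) Tehran ({ 2 3 })"
print(parse_alignment_line(line))
# [('John', [0]), ('lives', [3]), ('in', []), ('Tehran', [1, 2])]
```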

Repository files (12 commits):
README.md
cleansing.py
create_target_trainset.py
detect-NEs-noSERVER.py
detect-NEs.py
find_match_sentences.py
get_tags_distribution.py
improve_recall.py
out.1.10k
