FlorentRamb/probabilistic_french_parser

Probabilistic French Parser

Practical assignment (TD) 2 of the MVA course "Algorithms for Speech and NLP".

Description

The task is the following: given a tokenized sentence as input, parse it with the CYK algorithm and output its most likely parse in bracketed format.
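
To make the decoding step concrete, here is a minimal sketch of probabilistic CYK (Viterbi) parsing. It is not the repository's code: it assumes the grammar has already been converted to Chomsky normal form, and the function name `cyk_parse`, the toy rules, the intermediate symbol `X` and all probabilities are hypothetical.

```python
import math


def cyk_parse(tokens, lexicon, binary_rules, start="SENT"):
    """Return the most likely bracketed parse of `tokens`, or None.

    lexicon:      {word: {tag: prob}}                    (rules tag -> word)
    binary_rules: {(left, right): {parent: prob}}        (rules parent -> left right)
    """
    n = len(tokens)
    # best[i][j] maps a symbol to (log_prob, backpointer) for the span tokens[i:j].
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Width-1 spans: part-of-speech tags from the lexicon.
    for i, word in enumerate(tokens):
        for tag, p in lexicon.get(word, {}).items():
            best[i][i + 1][tag] = (math.log(p), word)

    # Wider spans: try every split point k and every applicable binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = best[i][j]
            for k in range(i + 1, j):
                for left, (lp_left, _) in best[i][k].items():
                    for right, (lp_right, _) in best[k][j].items():
                        for parent, p in binary_rules.get((left, right), {}).items():
                            score = math.log(p) + lp_left + lp_right
                            if parent not in cell or score > cell[parent][0]:
                                cell[parent] = (score, (k, left, right))

    if start not in best[0][n]:
        return None

    def build(i, j, symbol):
        # Follow backpointers to rebuild the bracketed string.
        _, back = best[i][j][symbol]
        if isinstance(back, str):
            return f"({symbol} {back})"
        k, left, right = back
        return f"({symbol} {build(i, k, left)} {build(k, j, right)})"

    return build(0, n, start)


# Toy grammar (hypothetical): X is an artificial symbol introduced by binarization.
lexicon = {"Pourquoi": {"ADVWH": 1.0}, "ce": {"DET": 1.0},
           "thème": {"NC": 1.0}, "?": {"PONCT": 1.0}}
binary_rules = {("DET", "NC"): {"NP": 1.0},
                ("NP", "PONCT"): {"X": 1.0},
                ("ADVWH", "X"): {"SENT": 1.0}}

print(cyk_parse(["Pourquoi", "ce", "thème", "?"], lexicon, binary_rules))
# (SENT (ADVWH Pourquoi) (X (NP (DET ce) (NC thème)) (PONCT ?)))
```

The sketch returns the binarized tree; recovering the flat structure seen in the corpus would need an extra step that removes the artificial symbols.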

The system first learns a PCFG (probabilistic context-free grammar) from the sequoia-corpus dataset, i.e. the grammar rules and their associated probabilities.
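
One common way to learn such a grammar is relative-frequency estimation: count every rule observed in the treebank and divide by the count of its left-hand side. The sketch below illustrates that idea under the assumption that each training tree is a single bracketed line like the one in the Example section; the helper names `read_tree`, `count_rules` and `estimate_pcfg` are hypothetical, and the repository may proceed differently.

```python
from collections import Counter, defaultdict


def read_tree(text):
    """Parse a bracketed string into nested (label, children) tuples.

    Words are kept as plain strings; the outer wrapper "( ... )" gets an empty label.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def count_rules(tree, counts):
    """Add one count for every rule label -> children found in `tree`."""
    label, children = tree
    if label:  # skip the unlabeled outer wrapper
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        counts[label][rhs] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(bracketed_lines):
    """Return {lhs: {rhs: probability}} with P(rhs | lhs) = count(lhs -> rhs) / count(lhs)."""
    counts = defaultdict(Counter)
    for line in bracketed_lines:
        count_rules(read_tree(line), counts)
    return {lhs: {rhs: n / sum(rhss.values()) for rhs, n in rhss.items()}
            for lhs, rhss in counts.items()}


pcfg = estimate_pcfg(["( (SENT (ADVWH Pourquoi) (NP (DET ce) (NC thème)) (PONCT ?)))"])
print(pcfg["NP"])
# {('DET', 'NC'): 1.0}
```

Lexical rules (tag to word) and syntactic rules share one table here for brevity; a CYK parser would additionally need the syntactic rules binarized.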

The system also handles out-of-vocabulary (OOV) words, using the Levenshtein distance and the polyglot embeddings to find close candidates in the training set.
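
The spelling-based half of this lookup can be sketched as follows. This is an illustration only: it covers the Levenshtein distance and not the embedding similarity, and the helper `closest_words` and its `max_distance` threshold are hypothetical choices rather than the repository's settings.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


def closest_words(word, vocabulary, max_distance=2):
    """Return vocabulary words within `max_distance` edits of `word`, closest first."""
    scored = sorted((levenshtein(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance]


print(closest_words("theme", ["Pourquoi", "ce", "thème", "?"]))
# ['thème']
```

A candidate found this way can then stand in for the unknown word when looking up its possible tags in the lexicon.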

Example

Input: Pourquoi ce thème ?

Output: ( (SENT (ADVWH Pourquoi) (NP (DET ce) (NC thème)) (PONCT ?)))
