-
Notifications
You must be signed in to change notification settings - Fork 0
Specs
David Leoni edited this page Jul 3, 2013
·
6 revisions
What follows is a a very early draft.
A short text in our case might have the following characteristics (from http://autayeu.com/research.html):
- brevity
- lack of context
- limited syntax with mainly nouns and adjectives organized into noun phrases. words are not separated by spaces, but rather by dashes or case.
- Abbreviations
Here, vocabulary has the meaning of lexical ontology, computational lexicon. Examples: WordNet, UKC
NLP process for identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings.
- All produced software MUST be open source
- NLPrise MUST support multiple languages
- Supported languages: * Italian * English * German?
- Labelled corporas for training SHOULD be free
- It MUST be possible to plug-in several vocabularies
- NLPrise SHOULD be fine tuned for coarse-grained dictionaries
- NLPrise User sends NLPrise many short texts and a time limit. NLPrise Performs WSD on the texts and returns labelled texts with a best-effort quality.
- todo define context (ie I’m sending all the column names of one table...)
- NLPrise User select a vocabulary and a language to use.
- Even a mix of vocabularies? mmm