Natural language library for tagger, parser and question generation for Portuguese (PT-BR)

Kunze/kunze-nlp

kunze-automatic-question-generator

This library is written in TypeScript and runs on Node.js. It is meant to generate questions automatically from Portuguese (pt-BR) text. It is trained on a corpus (corpora/macmorpho-v3/train) and uses a probabilistic bigram model to assign part-of-speech tags, even to unknown words. Currently I am using a Brazilian Portuguese corpus named Macmorpho-v3 (http://nilc.icmc.usp.br/macmorpho).
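
To make the bigram idea concrete, here is a minimal sketch of a bigram hidden Markov model tagger decoded with the Viterbi algorithm, the approach suggested by the DefaultViterbiTaggerFactory name in the example below. It is not the library's code: the train/viterbi functions, the add-one smoothing, the assumed vocabulary size of 1000 and the three-sentence toy corpus are all assumptions made for illustration. Training lines use the word_TAG token style. An unknown word gets roughly the same smoothed emission score under every tag, so the surrounding tag transitions decide how it is tagged.

```typescript
// Hypothetical sketch (not the library's code): a bigram HMM part-of-speech
// tagger decoded with the Viterbi algorithm, trained on "word_TAG" lines.
type Table = Record<string, Record<string, number>>;

const START = "<S>"; // pseudo-tag that precedes every sentence
const ASSUMED_VOCABULARY = 1000; // assumption, used only for smoothing

function bump(table: Table, outer: string, inner: string): void {
  if (!table[outer]) table[outer] = {};
  table[outer][inner] = (table[outer][inner] || 0) + 1;
}

function rowTotal(row: Record<string, number> | undefined): number {
  let sum = 0;
  if (row) {
    for (const key of Object.keys(row)) sum += row[key];
  }
  return sum;
}

// Count tag->tag transitions and tag->word emissions.
function train(lines: string[]) {
  const transitions: Table = {};
  const emissions: Table = {};
  for (const line of lines) {
    let previous = START;
    for (const item of line.trim().split(/\s+/)) {
      const cut = item.lastIndexOf("_");
      const word = item.slice(0, cut).toLowerCase();
      const tag = item.slice(cut + 1);
      bump(transitions, previous, tag);
      bump(emissions, tag, word);
      previous = tag;
    }
  }
  return { transitions, emissions, tags: Object.keys(emissions) };
}

type Model = ReturnType<typeof train>;

// Add-one smoothed log-probabilities.
function logTransition(m: Model, from: string, to: string): number {
  const row = m.transitions[from];
  const count = (row && row[to]) || 0;
  return Math.log((count + 1) / (rowTotal(row) + m.tags.length));
}

function logEmission(m: Model, tag: string, word: string): number {
  const row = m.emissions[tag];
  const count = (row && row[word]) || 0;
  return Math.log((count + 1) / (rowTotal(row) + ASSUMED_VOCABULARY));
}

function viterbi(m: Model, words: string[]): string[] {
  if (words.length === 0) return [];
  const best: Record<string, number>[] = []; // best[i][tag]: best log-score ending in tag at word i
  const back: Record<string, string>[] = []; // back[i][tag]: previous tag on that best path
  words.forEach((raw, i) => {
    const word = raw.toLowerCase();
    best.push({});
    back.push({});
    for (const tag of m.tags) {
      const emit = logEmission(m, tag, word);
      if (i === 0) {
        best[0][tag] = logTransition(m, START, tag) + emit;
        back[0][tag] = START;
        continue;
      }
      let bestScore = -Infinity;
      let bestPrev = m.tags[0];
      for (const prev of m.tags) {
        const score = best[i - 1][prev] + logTransition(m, prev, tag);
        if (score > bestScore) {
          bestScore = score;
          bestPrev = prev;
        }
      }
      best[i][tag] = bestScore + emit;
      back[i][tag] = bestPrev;
    }
  });
  // Follow back-pointers from the highest-scoring final tag.
  const last = words.length - 1;
  let current = m.tags[0];
  for (const t of m.tags) {
    if (best[last][t] > best[last][current]) current = t;
  }
  const path = [current];
  for (let i = last; i > 0; i--) {
    current = back[i][current];
    path.unshift(current);
  }
  return path;
}

// Invented three-sentence corpus in the word_TAG style.
const model = train([
  "Maria_NPROP gosta_V de_PREP música_N ._PU",
  "João_NPROP gosta_V de_PREP programar_V ._PU",
  "Ana_NPROP estuda_V de_PREP noite_N ._PU",
]);
console.log(viterbi(model, ["Pedro", "gosta", "de", "noite", "."]).join(" ")); // NPROP V PREP N PU
```

"Pedro" never occurs in the toy corpus, yet it is tagged NPROP because the START to NPROP and NPROP to V transitions dominate. Working in log space avoids the underflow that multiplying many small probabilities would cause on longer sentences.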

Temporary tests: http://www.murilokunze.com.br

Getting part-of-speech tags and questions from text

import DefaultViterbiTaggerFactory = require("./PartOfSpeechTagger/Factory/DefaultViterbiTaggerFactory");
import DefaultQuestionGeneratorFactory = require("./QuestionGenerator/Factory/DefaultQuestionGeneratorFactory");
import CorporaCYKParserFactory = require("./Parser/Factory/CorporaCYKParserFactory");
import Text = require("./Text");
import TaggedToken = require("./TaggedToken");
import CYKTable = require("./Parser/CYKTable");

const questionGenerator = DefaultQuestionGeneratorFactory.create();

// The parser and the tagger model are both built asynchronously.
CorporaCYKParserFactory.create().then((parser) => {
    return DefaultViterbiTaggerFactory.create().generateModel().then((tagger) => {
        console.time("tagger");

        // The input may contain several phrases; Text groups the tagged tokens into phrases.
        const phrases = "Murilo Kunze gosta de programar sozinho de noite.";
        const tokens = tagger.tag(phrases);
        const text = new Text(tokens);

        for (const phrase of text.getPhrases()) {
            console.log("-".repeat(50));
            console.log(`Text: ${phrase.toString()} \n`);
            console.log("Questions:");

            // Parse the phrase's tagged tokens, then generate questions from the parse table.
            const cykTable: CYKTable = parser.parse(phrase.getTokens());

            for (const question of questionGenerator.generate(cykTable)) {
                console.log(question);
            }

            // Print each token with its tag, whether it is a known word and its probability.
            for (const token of phrase.getTokens()) {
                console.log("-".repeat(40));

                console.log(`word:         ${token.getWord()}`);
                console.log(`tag:          ${token.getTag()}`);
                console.log(`known word:   ${token.getKnown()}`);
                console.log(`probability:  ${token.getProbability()}`);
            }
        }
        console.timeEnd("tagger");
    });
}).catch((error) => console.error(error));
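
The example hands each phrase's tagged tokens to a CYK parser (CorporaCYKParserFactory, CYKTable). As an illustration of the CYK algorithm itself, here is a minimal recognizer that runs over part-of-speech tags instead of words. The grammar, the Rule type and the cyk/recognizes functions are hypothetical and written only for this sketch; the factory name suggests the library derives its grammar from the corpus, which this sketch does not attempt.

```typescript
// Hypothetical sketch of CYK recognition over part-of-speech tags.
// The grammar below is invented for illustration and is in Chomsky normal form.
type Rule = { lhs: string; left: string; right: string };

// Tag -> nonterminals that can produce it directly.
const lexical: Record<string, string[]> = {
  NPROP: ["NP"],
  N: ["NP"],
  V: ["VERB"],
  PREP: ["P"],
};

// Binary rules: lhs -> left right.
const binary: Rule[] = [
  { lhs: "S", left: "NP", right: "VP" },
  { lhs: "VP", left: "VERB", right: "PP" },
  { lhs: "PP", left: "P", right: "VERB" }, // preposition + infinitive
  { lhs: "PP", left: "P", right: "NP" },
];

function addUnique(cell: string[], symbol: string): void {
  if (cell.indexOf(symbol) === -1) cell.push(symbol);
}

// table[i][j] holds every nonterminal that can derive tags[i..j] (inclusive).
function cyk(tags: string[]): string[][][] {
  const n = tags.length;
  const table: string[][][] = [];
  for (let i = 0; i < n; i++) {
    table.push([]);
    for (let j = 0; j < n; j++) table[i].push([]);
    for (const symbol of lexical[tags[i]] || []) addUnique(table[i][i], symbol);
  }
  // Fill longer spans from every split point of shorter, already-filled spans.
  for (let span = 2; span <= n; span++) {
    for (let i = 0; i + span - 1 < n; i++) {
      const j = i + span - 1;
      for (let split = i; split < j; split++) {
        for (const rule of binary) {
          if (table[i][split].indexOf(rule.left) !== -1 &&
              table[split + 1][j].indexOf(rule.right) !== -1) {
            addUnique(table[i][j], rule.lhs);
          }
        }
      }
    }
  }
  return table;
}

function recognizes(tags: string[]): boolean {
  return tags.length > 0 && cyk(tags)[0][tags.length - 1].indexOf("S") !== -1;
}

// Tags of the clause "Murilo Kunze gosta de programar".
console.log(recognizes(["NPROP", "V", "PREP", "V"])); // true
console.log(recognizes(["V", "NPROP"])); // false
```

Each cell depends only on shorter spans, so filling the table by increasing span length gives the usual O(n³·|rules|) running time.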

Running on Node.js

Run npm start.

The output should look like this:

Text: Murilo Kunze gosta de programar sozinho de noite.

Questions:
Quem gosta de programar?
Qual o nome da pessoa que gosta de programar?
Murilo Kunze gosta de programar?
----------------------------------------
word:         Murilo Kunze
tag:          NPROP
known word:   true
probability:  0.07448322988440294
----------------------------------------
word:         gosta
tag:          V
known word:   true
probability:  0.0971433588948077
----------------------------------------
word:         de
tag:          PREP
known word:   true
probability:  0.10937787990826241
----------------------------------------
word:         programar
tag:          V
known word:   true
probability:  0.08880982082104785
----------------------------------------
word:         sozinho
tag:          ADJ
known word:   true
probability:  0.03734058192619177
----------------------------------------
word:         de
tag:          PREP
known word:   true
probability:  0.11378864211986091
----------------------------------------
word:         noite
tag:          N
known word:   true
probability:  0.3886058596720611
----------------------------------------
word:         .
tag:          END
known word:   true
probability:  0.10145041539848605
tagger: 40.990ms
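
The three questions above follow simple templates built around the subject: "Quem ...?" ("Who ...?"), "Qual o nome da pessoa que ...?" ("What is the name of the person who ...?") and a yes/no question. The sketch below reproduces those templates for a clause whose first token is a proper noun followed by a verb. It is a hypothetical simplification: generateQuestions and the Tagged interface are invented here, and the sketch works on a flat token list, whereas the library's generator consumes a CYKTable. It is fed only the clause "Murilo Kunze gosta de programar"; given the whole sentence it would also keep "sozinho de noite", which the library's output leaves out.

```typescript
// Hypothetical sketch: template-based questions from a tagged clause.
// Works on flat (word, tag) pairs, unlike the library, which uses a CYK table.
interface Tagged {
  word: string;
  tag: string;
}

function generateQuestions(tokens: Tagged[]): string[] {
  // Only handles one pattern: a proper-noun subject followed by a verb.
  if (tokens.length < 2 || tokens[0].tag !== "NPROP" || tokens[1].tag !== "V") {
    return [];
  }
  const subject = tokens[0].word;
  const predicate = tokens
    .slice(1)
    .filter((t) => t.tag !== "END") // drop sentence-final punctuation
    .map((t) => t.word)
    .join(" ");
  return [
    `Quem ${predicate}?`, // "Who ...?"
    `Qual o nome da pessoa que ${predicate}?`, // "What is the name of the person who ...?"
    `${subject} ${predicate}?`, // yes/no question
  ];
}

const clause: Tagged[] = [
  { word: "Murilo Kunze", tag: "NPROP" },
  { word: "gosta", tag: "V" },
  { word: "de", tag: "PREP" },
  { word: "programar", tag: "V" },
];
for (const q of generateQuestions(clause)) console.log(q);
// Quem gosta de programar?
// Qual o nome da pessoa que gosta de programar?
// Murilo Kunze gosta de programar?
```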

Tagset

Part of speech                              Tag
Adjective                                   ADJ
Adverb                                      ADV
Subordinating connective adverb             ADV-KS
Subordinating relative adverb               ADV-KS-REL
Article (definite or indefinite)            ART
Coordinating conjunction                    KC
Subordinating conjunction                   KS
Interjection                                IN
Noun                                        N
Proper noun                                 NPROP
Numeral                                     NUM
Participle                                  PCP
Denotative word                             PDEN
Preposition                                 PREP
Adjective pronoun                           PROADJ
Subordinating connective pronoun            PRO-KS
Personal pronoun                            PROPESS
Subordinating connective relative pronoun   PRO-KS-REL
Noun pronoun                                PROSUB
Verb                                        V
Auxiliary verb                              VAUX
Currency symbol                             CUR
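
The tagset is a plain lookup from tag to grammatical class, so tagger output can be labeled with a small map. The describeTag helper below is hypothetical and not part of the library. Note that END, which the sample output assigns to the final period, does not appear in the table, so the helper falls back to a generic label for it.

```typescript
// Hypothetical helper (not part of the library): English descriptions for
// the tags in the table above, with a fallback for tags the table omits.
const TAG_DESCRIPTIONS: Record<string, string> = {
  ADJ: "adjective",
  ADV: "adverb",
  "ADV-KS": "subordinating connective adverb",
  "ADV-KS-REL": "subordinating relative adverb",
  ART: "article",
  KC: "coordinating conjunction",
  KS: "subordinating conjunction",
  IN: "interjection",
  N: "noun",
  NPROP: "proper noun",
  NUM: "numeral",
  PCP: "participle",
  PDEN: "denotative word",
  PREP: "preposition",
  PROADJ: "adjective pronoun",
  "PRO-KS": "subordinating connective pronoun",
  PROPESS: "personal pronoun",
  "PRO-KS-REL": "subordinating connective relative pronoun",
  PROSUB: "noun pronoun",
  V: "verb",
  VAUX: "auxiliary verb",
  CUR: "currency symbol",
};

function describeTag(tag: string): string {
  return Object.prototype.hasOwnProperty.call(TAG_DESCRIPTIONS, tag)
    ? TAG_DESCRIPTIONS[tag]
    : `unlisted tag (${tag})`;
}

console.log(describeTag("NPROP")); // proper noun
console.log(describeTag("END")); // unlisted tag (END)
```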
