Understanding of natural language + knowledge about the world

Challenging task of reading comprehension

ImageNet for object recognition (Deng et al.), Penn Treebank for syntactic parsing (Marcus et al.)

Shortcomings:

The answer to each question is a segment of text (a span) taken from the passage.
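To make "answer = span" concrete: a span can be stored as its text plus a character offset into the passage, which is how the SQuAD v1.1 JSON stores it (fields "context", "question", "answers", "text", "answer_start"). The passage and question below are made up for illustration, and `answer_span` is a hypothetical helper, not part of any released tooling.

```python
# Made-up SQuAD-style example: the answer is stored as text + start offset.
example = {
    "context": "The Normans were descended from Norse raiders and pirates.",
    "question": "Who were the Normans descended from?",
    "answers": [{"text": "Norse raiders and pirates", "answer_start": 32}],
}


def answer_span(ex):
    """Return (start, end) character offsets of the first gold answer."""
    ans = ex["answers"][0]
    start = ans["answer_start"]
    end = start + len(ans["text"])
    # Because the answer must be a span, slicing the passage recovers it exactly.
    assert ex["context"][start:end] == ans["text"]
    return start, end
```

Restricting answers to spans is what makes candidate generation and exact-match evaluation straightforward.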

Distances in dependency trees are used to quantify diversity (of question and answer types).

Implemented a logistic regression model.
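A minimal sketch of such a model, assuming it is multinomial logistic regression (a softmax over all candidate answers of one question) trained by plain stochastic gradient ascent on the log-likelihood. The sparse feature dicts, learning rate, epoch count, and the absence of regularization are all illustrative choices, not the paper's settings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def score(weights, feats):
    """Linear score w . f for a sparse feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train(questions, epochs=50, lr=0.1):
    """questions: list of (candidates, gold_index); each candidate is a feature dict."""
    weights = {}
    for _ in range(epochs):
        for cands, gold in questions:
            probs = softmax([score(weights, f) for f in cands])
            # Log-likelihood gradient: gold features minus expected features.
            for i, f in enumerate(cands):
                coef = (1.0 if i == gold else 0.0) - probs[i]
                for k, v in f.items():
                    weights[k] = weights.get(k, 0.0) + lr * coef * v
    return weights


def predict(weights, cands):
    """Index of the highest-scoring candidate answer."""
    scores = [score(weights, f) for f in cands]
    return max(range(len(cands)), key=scores.__getitem__)
```

Scoring candidates jointly (rather than classifying each one independently) guarantees exactly one predicted answer per question.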

Hirschman et al. (1999): curated a dataset of 600 reading comprehension questions for grades 3 to 6.

Syntactic divergence
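A sketch of one way to compute this, assuming syntactic divergence is the edit distance between two unlexicalized dependency paths: the path from an anchor word to the wh-word in the question, and the path from the same anchor to the answer in the sentence. The example paths are hypothetical label sequences, not produced by a parser here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical unlexicalized paths (dependency labels only).
question_path = ["nsubj", "dobj"]          # anchor -> wh-word in the question
sentence_path = ["nsubj", "prep", "pobj"]  # anchor -> answer in the sentence
divergence = edit_distance(question_path, sentence_path)
```

Using labels rather than words means the measure reflects syntactic restructuring, not vocabulary mismatch.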

Candidate answers were generated by Stanford CoreNLP
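A sketch of candidate generation, assuming candidates are the constituents of a constituency parse given as a Penn-Treebank-style bracketed string (the format CoreNLP's parser can print). The parser below is a small hand-written reader, not CoreNLP, and skipping single-word POS-tag nodes is an assumption.

```python
import re


def parse_tree(s):
    """Parse a bracketed string into nested (label, children); leaves are token strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def constituents(tree):
    """Return (label, start, end) token spans for every non-preterminal node."""
    spans = []

    def walk(t, start):
        label, children = t
        i = start
        for c in children:
            i = i + 1 if isinstance(c, str) else walk(c, i)
        # Skip preterminals (POS-tag nodes covering a single word).
        if not (len(children) == 1 and isinstance(children[0], str)):
            spans.append((label, start, i))
        return i

    walk(tree, 0)
    return spans
```

Each span's label is also what a "constituent label" feature would read off.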

Sliding window approach + distance-based extension by Richardson et al. (2013)
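A sketch of this baseline, assuming the formulation from Richardson et al. (2013) as commonly described: words are weighted by inverse count log(1 + 1/count) in the passage, the sliding-window score is the best weighted overlap of a window (of size |question words ∪ answer words|) with those words, and the distance term is the normalized minimum distance between question words and answer words, with 1 as the penalty when either is absent. Stopword handling and other details are omitted here, so treat this as illustrative only.

```python
import math
from collections import Counter


def inverse_count(passage):
    """Weight each passage word by log(1 + 1/count)."""
    return {w: math.log(1 + 1 / c) for w, c in Counter(passage).items()}


def sliding_window_score(passage, question, answer):
    target = set(question) | set(answer)
    ic = inverse_count(passage)
    size = len(target)
    best = 0.0
    for j in range(max(1, len(passage) - size + 1)):
        window = passage[j:j + size]
        best = max(best, sum(ic[w] for w in window if w in target))
    return best


def distance_penalty(passage, question, answer):
    q_set = set(question)
    a_set = set(answer) - q_set  # answer words not already in the question
    q_pos = [i for i, w in enumerate(passage) if w in q_set]
    a_pos = [i for i, w in enumerate(passage) if w in a_set]
    if not q_pos or not a_pos:
        return 1.0
    d = min(abs(i - k) for i in q_pos for k in a_pos)
    return d / max(1, len(passage) - 1)


def baseline_score(passage, question, answer):
    """Higher is better: window overlap minus normalized distance."""
    return sliding_window_score(passage, question, answer) - distance_penalty(passage, question, answer)
```

Each candidate span is scored this way and the highest-scoring one is returned.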

Logistic Regression

Features (bold are most important):

matching word frequencies (sum of the tf-idf)
matching bigram frequencies (generalization of the tf-idf described in Shirakawa et al. (2015))
root match (dependency parse tree roots)
lengths
span word frequencies
constituent label
span POS tags
lexicalized
dependency tree paths
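A sketch of the first two features, assuming IDF is computed by treating each sentence of the passage as a document and TF is the count within the candidate's sentence. The bigram version here is plain bigram TF-IDF, not the generalization from Shirakawa et al. (2015); all helper names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def idf_table(sentences, n):
    """IDF of each n-gram, treating every sentence of the passage as a document."""
    df = Counter()
    for s in sentences:
        df.update(set(ngrams(s, n)))
    return {g: math.log(len(sentences) / d) for g, d in df.items()}


def matching_tfidf(question, sentence, sentences, n=1):
    """Sum of TF-IDF weights of n-grams shared by the question and the sentence.

    `sentence` must be one of `sentences`, so every shared n-gram has an IDF entry.
    """
    idf = idf_table(sentences, n)
    tf = Counter(ngrams(sentence, n))
    shared = set(ngrams(question, n)) & set(tf)
    return sum(tf[g] * idf[g] for g in shared)
```

Words that occur in every sentence (such as "the" below) get IDF 0, so they contribute nothing to the match.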
