QAing on a larger corpus? #11

bigrig2212 · 2020-05-09T04:58:07Z

Hi. Awesome project. So fun.
I'm wondering what technique to use for asking a question against a larger corpus: hundreds of documents rather than a short text snippet as in the sample code. It seems to take longer and get less accurate the more text I supply, so is there another technique for working with a larger corpus? For example, filter first using TF-IDF and then run this QA only on the returned documents?

Thx.
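
To make the TF-IDF pre-filter idea concrete, here is a minimal sketch in plain Python. It is not part of this project: `tokenize` and `tfidf_top_k` are hypothetical names, the tokenizer is a naive regex, and the IDF smoothing is just one common choice. It only shows the shape of "rank documents against the question, keep the top few, run QA on those".

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_top_k(question, documents, k=3):
    """Return the indices of the k documents most similar to the question."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        # Terms never seen in the corpus are ignored.
        tf = Counter(t for t in tokens if t in idf)
        return {t: count * idf[t] for t, count in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(tokenize(question))
    scores = [cosine(q_vec, vectorize(tokens)) for tokens in doc_tokens]
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

The returned indices would then select which documents get passed to the QA model, so the model only sees a handful of candidates instead of the whole corpus.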

martinnormark · 2020-05-26T21:41:28Z

You could divide your corpus into sections of "a reasonable size" (whatever that size is), then run QnA on all sections, perhaps in parallel, then sort all the answers by the score returned by the model and grab the 10 answers with the highest score.

You will potentially end up running QnA on lots of irrelevant text.
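
A rough sketch of that split, answer, and rank approach, in Python for illustration. Everything named here is an assumption: `answer_fn` is a hypothetical stand-in for whatever call runs the QnA model, assumed to take `(question, section)` and return a list of `(answer_text, score)` tuples, and 200 words is an arbitrary guess at "a reasonable size".

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sections(text, max_words=200):
    # Fixed-size word windows; max_words is an arbitrary assumption.
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def best_answers(question, corpus, answer_fn, top_n=10, max_words=200):
    """Run answer_fn on every section and keep the top_n answers by score.

    answer_fn is hypothetical: it is assumed to return a list of
    (answer_text, score) tuples for one (question, section) pair.
    """
    sections = split_into_sections(corpus, max_words)

    # Sections are independent, so they can be processed in parallel.
    with ThreadPoolExecutor() as pool:
        per_section = list(pool.map(lambda s: answer_fn(question, s), sections))

    # Flatten, then sort by score, highest first.
    all_answers = [a for answers in per_section for a in answers]
    return sorted(all_answers, key=lambda a: a[1], reverse=True)[:top_n]
```

One caveat with this sketch: an answer that straddles a section boundary gets cut in two, so overlapping the windows by a few sentences may be worth trying.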

Is there a way your corpus is structured so that you can filter it down?

You could also label sections of your corpus with a set of categories, then build a corpus of questions labelled with the same categories. At query time, run classification on the question to get its category, filter the corpus by that category, and then run QnA on what remains.

That would require a labelled dataset for both corpus sections and a good corpus of questions.
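
The category-based filter could be sketched like this, again in Python and only as an illustration. `classify_fn` is a hypothetical stand-in for a question classifier trained on the labelled questions, and `labelled_sections` is assumed to be a list of `(section_text, category)` pairs.

```python
def filter_by_category(question, labelled_sections, classify_fn):
    """Keep only the sections whose label matches the question's category.

    classify_fn is hypothetical: it is assumed to map a question string
    to one of the category labels used on the corpus sections.
    """
    category = classify_fn(question)
    return [text for text, label in labelled_sections if label == category]


def keyword_classifier(question):
    # Toy stand-in for a trained classifier (assumption, illustration only).
    return "billing" if "invoice" in question.lower() else "general"
```

QnA would then run only on the list that `filter_by_category` returns, which could be combined with score-based ranking of the answers.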
