Deep Text Matching model for Quora Question Pairs. For the feature-based part, please see kaggle-quora-question-pairs.
You can download the dataset from Kaggle.
- id - the id of a training set question pair
- qid1, qid2 - unique ids of each question (only available in train.csv)
- question1, question2 - the full text of each question
- is_duplicate - the target variable, set to 1 if question1 and question2 have essentially the same meaning, and 0 otherwise.
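To see these fields in place, here is a minimal sketch (assuming pandas is installed and the Kaggle file is saved as data/train.csv):

```python
# Minimal sketch of inspecting the dataset fields listed above.
# Assumptions: pandas is available and the Kaggle file sits at data/train.csv.
import pandas as pd

train = pd.read_csv("data/train.csv")
print(train.columns.tolist())   # id, qid1, qid2, question1, question2, is_duplicate
print(train[["question1", "question2", "is_duplicate"]].head())
```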
In order to run TextNet models, we need to prepare the following files:
(e.g., word_dict.txt)
We map each word to a unique number, called wid, and save this mapping in the word dictionary file.
For example,
word wid
machine 1232
learning 1156
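One possible way to produce such a file from the Kaggle data is sketched below; the tokenizer (lowercasing plus whitespace splitting) and the file paths are assumptions, not necessarily the exact preprocessing used by this repo:

```python
# Sketch: build word_dict.txt (word -> wid) from the Quora questions.
# Assumptions: data/train.csv path and simple lowercase/whitespace tokenization.
import csv
from itertools import count

wid_of = {}
next_wid = count(1)

with open("data/train.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        for col in ("question1", "question2"):
            for word in (row[col] or "").lower().split():
                if word not in wid_of:
                    wid_of[word] = next(next_wid)

with open("word_dict.txt", "w", encoding="utf-8") as out:
    for word, wid in wid_of.items():
        out.write(f"{word} {wid}\n")
```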
(e.g., qid_query.txt and docid_doc.txt)
We use a string identifier (qid/docid) to represent a sentence, such as a query or a document. Each line starts with this identifier, followed by the length of the sentence, and then the wids of the sentence.
For example,
docid sentence_length sentence_wid_sequence
GX000-00-0000000 42 2744 1043 377 2744 1043 377 187 117961 ...
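A hedged sketch of generating these corpus files for Quora, treating question1 as the query side and question2 as the document side (the identifier choice qid1/qid2 and the file paths are assumptions):

```python
# Sketch: write corpus files in "id sentence_length wid_sequence" format.
# question1 -> qid_query.txt, question2 -> docid_doc.txt (assumed mapping).
import csv

wid_of = {}
with open("word_dict.txt", encoding="utf-8") as f:
    for line in f:
        word, wid = line.split()
        wid_of[word] = int(wid)

def to_wids(text):
    return [wid_of[w] for w in (text or "").lower().split() if w in wid_of]

queries, docs = {}, {}
with open("data/train.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        queries[row["qid1"]] = to_wids(row["question1"])
        docs[row["qid2"]] = to_wids(row["question2"])

def write_corpus(path, entries):
    with open(path, "w", encoding="utf-8") as out:
        for sid, wids in entries.items():
            out.write(f"{sid} {len(wids)} {' '.join(map(str, wids))}\n")

write_corpus("qid_query.txt", queries)
write_corpus("docid_doc.txt", docs)
```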
(e.g., relation.train.fold1.txt, relation.test.fold1.txt, ...)
The relation files store the relation between two sentences, such as the relevance relation between a query and a document.
For example,
relevance qid docid
1 3571 GX245-00-1220850
0 3571 GX004-51-0504917
0 3571 GX006-36-4612449
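A minimal sketch of producing a relation file from the Quora labels (only a single fold is shown; the repo's actual fold splitting is not reproduced here):

```python
# Sketch: write a relation file "label qid docid" from the Quora labels.
# Paths and the single-fold output name are assumptions.
import csv

with open("data/train.csv", newline="", encoding="utf-8") as f, \
     open("relation.train.fold1.txt", "w", encoding="utf-8") as out:
    for row in csv.DictReader(f):
        out.write(f"{row['is_duplicate']} {row['qid1']} {row['qid2']}\n")
```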
(e.g., embed_wiki-pdc_d50_norm)
We store the word embeddings in the embedding file.
For example,
wid embedding
13275 -0.050766 0.081548 -0.031107 0.131772 0.172194 ... 0.165506 0.002235
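One way to produce such a file is to project a pretrained word-vector text file onto the wids from word_dict.txt; the GloVe-style source file below is only an illustrative assumption:

```python
# Sketch: convert a pretrained word-vector text file (word followed by floats)
# into the wid-indexed embedding file. glove.6B.50d.txt is an assumed example;
# any 50-dimensional embedding in this format would work.
wid_of = {}
with open("word_dict.txt", encoding="utf-8") as f:
    for line in f:
        word, wid = line.split()
        wid_of[word] = wid

with open("glove.6B.50d.txt", encoding="utf-8") as src, \
     open("embed_wiki-pdc_d50_norm", "w", encoding="utf-8") as out:
    for line in src:
        word, *vec = line.rstrip().split()
        if word in wid_of:
            out.write(f"{wid_of[word]} {' '.join(vec)}\n")
```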
The example config file is config/quora_blend.config.
| Config Field | File Type |
|---|---|
| data1_file | Corpus File |
| data2_file | Corpus File |
| rel_file | Relation File |
| embedding_file | Embedding File |
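Before training, it can help to check that every file these config fields point to actually exists; a small sketch follows (the concrete paths are assumptions):

```python
# Sketch: sanity-check that the files referenced by the config fields exist.
# The field -> path mapping mirrors the table above; paths are assumptions.
import os

config_files = {
    "data1_file": "qid_query.txt",
    "data2_file": "docid_doc.txt",
    "rel_file": "relation.train.fold1.txt",
    "embedding_file": "embed_wiki-pdc_d50_norm",
}

for field, path in config_files.items():
    status = "ok" if os.path.exists(path) else "MISSING"
    print(f"{field:>15} -> {path} [{status}]")
```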