Dataset separation #9

AnuradhaSK · 2019-05-19T04:25:52Z

Can anyone explain the logic behind the train/valid/test node separation in this code?
For the cora dataset, out of 2708 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as valid nodes, and the rest as train nodes. Similarly, for the pubmed dataset, out of 19717 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as valid nodes, and the rest as train nodes.
So the test:valid:train proportion (in percent) is 36.9 : 18.5 : 44.6 for cora and 5.1 : 2.5 : 92.4 for pubmed.
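
To make the split concrete, here is a minimal sketch of the scheme as described above. It is an illustration only, not the repository's actual code: the function name `split_nodes`, the `seed` parameter, and the use of Python's `random` module for shuffling are all assumptions.

```python
import random


def split_nodes(num_nodes, num_test=1000, num_valid=500, seed=0):
    """Shuffle node ids, then cut fixed-size test and valid sets.

    Hypothetical helper: the fixed sizes mirror the description above
    (first 1000 test, next 500 valid, remainder train).
    """
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    test = ids[:num_test]
    valid = ids[num_test:num_test + num_valid]
    train = ids[num_test + num_valid:]
    return train, valid, test


def split_percentages(num_nodes):
    """Return the (test, valid, train) shares in percent, one decimal place."""
    train, valid, test = split_nodes(num_nodes)
    return [round(100 * len(part) / num_nodes, 1) for part in (test, valid, train)]


if __name__ == "__main__":
    for name, n in [("cora", 2708), ("pubmed", 19717)]:
        print(name, split_percentages(n))
```

Because the test and valid sizes are fixed counts rather than fractions, the percentages differ between datasets: the train share is simply whatever remains after 1500 nodes are removed.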

Don't we have to keep the same test:valid:train ratio across datasets?
How can I separate a new dataset into these categories?

I believe that we need to separate nodes into train/valid/test categories for a node classification problem. What about the link prediction problem?
