Dataset separation #9

Open
AnuradhaSK opened this issue May 19, 2019 · 0 comments
Comments

@AnuradhaSK
Can anyone explain the logic behind the train/valid/test node split in this code?
For the cora dataset, out of 2708 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as valid nodes, and the rest as train nodes. Similarly, for the pubmed dataset, out of 19717 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as valid nodes, and the rest as train nodes.
So the test:valid:train proportion (in percent) is 36.9 : 18.5 : 44.6 for cora and 5.1 : 2.5 : 92.4 for pubmed.
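
To make the arithmetic concrete, here is a minimal sketch of the split as I understand it. The function names and the seed are hypothetical and not taken from the repository; only the fixed counts (1000 test, 500 valid, rest train) come from the behavior described above.

```python
import random


def split_by_fixed_counts(num_nodes, num_test=1000, num_valid=500, seed=0):
    """Shuffle node indices, then slice: first num_test -> test,
    next num_valid -> valid, everything left -> train.

    Hypothetical helper for illustration; not the repository's code.
    """
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for reproducibility
    test_idx = indices[:num_test]
    valid_idx = indices[num_test:num_test + num_valid]
    train_idx = indices[num_test + num_valid:]
    return test_idx, valid_idx, train_idx


def split_percentages(num_nodes, num_test=1000, num_valid=500):
    """Return (test, valid, train) shares in percent, rounded to one decimal."""
    num_train = num_nodes - num_test - num_valid
    return tuple(
        round(100.0 * count / num_nodes, 1)
        for count in (num_test, num_valid, num_train)
    )


if __name__ == "__main__":
    for name, num_nodes in (("cora", 2708), ("pubmed", 19717)):
        test_idx, valid_idx, train_idx = split_by_fixed_counts(num_nodes)
        print(name, len(test_idx), len(valid_idx), len(train_idx),
              split_percentages(num_nodes))
```

Because the test and valid counts are fixed while the dataset size changes, the train share grows with the number of nodes, which is where the different proportions come from.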

  1. Don't we have to keep the same test:valid:train ratio across datasets?

  2. How can I separate a new dataset into these categories?

I believe that we need to separate nodes into train/valid/test categories for a node classification problem. What about the link prediction problem?
