Using Graph Database #12
Comments
Progress
I have completed the functions that store data from Scopus CSV files as Neo4j nodes and edges, and I am currently storing all 2022 papers that involve Dutch researchers.

Why Scopus?
Scopus has very comprehensive paper data. In particular, its metadata includes authors' affiliations, countries, and paper keywords, which are not available on other paper search websites.

How?
More than 50,000 papers involve Dutch researchers in a single year, and the Scopus API cannot handle such a large amount of data. I therefore use the Scopus Document Search website, which requires an academic IP address (for example, via the UvA VPN).

The query string is as follows:

Neo4j database structure:
Person node properties:
Publication node properties:
IS_Author relationship properties:
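The property lists themselves are not shown in this thread, so the following is only a minimal sketch of how one Scopus CSV export might be split into Person, Publication, and IS_Author records before loading. The column names (`Authors`, `Author(s) ID`, `Title`, `Year`, `EID`, `Author Keywords`), the semicolon separators, and the property keys `scopus_id`, `eid`, and `position` are all assumptions for illustration, not the actual schema used here.

```python
import csv
import io


def split_multi(value):
    """Split a semicolon-separated cell and drop empty pieces."""
    return [part.strip() for part in (value or "").split(";") if part.strip()]


def parse_scopus_csv(text):
    """Turn CSV text into (persons, publications, authorships) lists.

    Column names and property keys are assumed, not taken from the project.
    """
    persons = {}       # author id -> Person properties (deduplicated)
    publications = {}  # EID -> Publication properties (deduplicated)
    authorships = []   # one record per IS_Author edge

    for row in csv.DictReader(io.StringIO(text)):
        eid = (row.get("EID") or "").strip()
        if not eid:
            continue  # skip rows without a publication identifier
        publications[eid] = {
            "eid": eid,
            "title": (row.get("Title") or "").strip(),
            "year": int(row["Year"]),
            "keywords": split_multi(row.get("Author Keywords")),
        }
        names = split_multi(row.get("Authors"))
        ids = split_multi(row.get("Author(s) ID"))
        # Pair each author id with the name at the same position.
        for position, (author_id, name) in enumerate(zip(ids, names), start=1):
            persons.setdefault(author_id, {"scopus_id": author_id, "name": name})
            authorships.append(
                {"scopus_id": author_id, "eid": eid, "position": position}
            )

    return list(persons.values()), list(publications.values()), authorships
```

Keying people and papers by an identifier rather than by name means an author who appears on several papers ends up as a single Person record with several IS_Author edges.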
Problem: storing the data takes a long time. Storing the metadata of 20,000 papers takes more than 6 hours, but there are 50,000+ papers every year.

Solved: after creating a CONSTRAINT on the Person and Publication nodes, it takes only 20 minutes to store a year's paper data.
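A likely explanation for the speed-up is that a uniqueness constraint in Neo4j is backed by an index, so each `MERGE` can look up an existing node by key instead of scanning every node with that label. The sketch below shows what such constraints and a batched load might look like. The constraint names, the property keys `scopus_id` and `eid`, and the `UNWIND` batching are assumptions added for illustration, not the code used in this project. The `CREATE CONSTRAINT ... FOR ... REQUIRE` form needs Neo4j 4.4 or newer.

```python
# Hypothetical constraint names and property keys; adjust to the real schema.
CONSTRAINTS = [
    "CREATE CONSTRAINT person_id IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.scopus_id IS UNIQUE",
    "CREATE CONSTRAINT publication_eid IF NOT EXISTS "
    "FOR (p:Publication) REQUIRE p.eid IS UNIQUE",
]

# One round trip creates or matches many nodes and edges at once.
MERGE_AUTHORSHIPS = """
UNWIND $rows AS row
MERGE (a:Person {scopus_id: row.scopus_id})
MERGE (p:Publication {eid: row.eid})
MERGE (a)-[:IS_Author]->(p)
"""


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def store_authorships(run, rows, batch_size=1000):
    """Create the constraints, then MERGE the rows in batches.

    `run` is any callable taking (query, **params), for example the
    `run` method of a neo4j driver session. Returns the number of rows sent.
    """
    for statement in CONSTRAINTS:
        run(statement)
    sent = 0
    for batch in batches(rows, batch_size):
        run(MERGE_AUTHORSHIPS, rows=batch)
        sent += len(batch)
    return sent


# Possible usage with the official driver (URI and credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     store_authorships(session.run, rows)
```

Passing `run` in as a parameter keeps the function independent of a live database, so the batching logic can be checked with a stub before running it against Neo4j.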
At the moment we are using SQLite databases, which is fine for the current amount of data. In the future the amount of data will become much larger, so I should try a graph database such as Neo4j, which should perform better and run much faster on relationship-heavy data such as author-publication links.