Migrating more than 160GiB of research data from Microsoft Academic Graph into an Analytics engine - Elasticsearch!

vwoloszyn/mag2elasticsearch

Using Elasticsearch on Microsoft Academic Graph (MAG)

Exploring more than 160 GiB of publications from Microsoft Academic Graph (MAG) using Elasticsearch!

1. Download Microsoft Academic Graph

Schema reference: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema

Data download (Zenodo): https://zenodo.org/record/2628216

File sizes in bytes:

     4564007 Affiliations.txt
 16528778635 Authors.txt
     2224843 ConferenceInstances.txt
      428103 ConferenceSeries.txt
    55188690 FieldsOfStudy.txt
     5689662 Journals.txt
 40976541540 PaperAuthorAffiliations.txt
 32446006785 PaperReferences.txt
     7763592 PaperResources.txt
 60213784152 Papers.txt
 23096534376 PaperUrls.txt
 ----------------------------------------
173337504385 bytes total (~161.4 GiB)
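
The total above can be checked from the per-file byte counts. A minimal Python sketch, with the sizes copied from the listing and `to_gib` as an illustrative helper name:

```python
# Per-file sizes in bytes, copied from the listing above.
SIZES = {
    "Affiliations.txt": 4564007,
    "Authors.txt": 16528778635,
    "ConferenceInstances.txt": 2224843,
    "ConferenceSeries.txt": 428103,
    "FieldsOfStudy.txt": 55188690,
    "Journals.txt": 5689662,
    "PaperAuthorAffiliations.txt": 40976541540,
    "PaperReferences.txt": 32446006785,
    "PaperResources.txt": 7763592,
    "Papers.txt": 60213784152,
    "PaperUrls.txt": 23096534376,
}


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


total = sum(SIZES.values())
print(f"{total} bytes (~{to_gib(total):.1f} GiB)")
# prints: 173337504385 bytes (~161.4 GiB)
```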

2. Elasticsearch with Docker

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.3.2
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.2

For more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

3. Installing mag2elasticsearch

Package dependencies are listed in the "requirements.txt" file. Clone the repository and install them with pip:

git clone https://github.com/vwoloszyn/mag2elasticsearch/
cd mag2elasticsearch
pip install -r requirements.txt

Command-line usage

Index the Papers table:

    python main.py -t Papers
  • Read only N records:
    python main.py --limit 6000000
  • Index only the institutions with the given ids:
    python main.py --onlyInstitutions 75951250 4577782 39343248
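
To illustrate what a record limit and an institution filter amount to, here is a rough Python sketch. It is not taken from main.py: the function names `read_rows` and `filter_by_institution` are hypothetical, and it assumes the MAG files are tab-separated with no header row and that the caller knows which column holds the affiliation id (column 2 in the toy sample below is an assumption, not a statement about the schema).

```python
import csv
import itertools
from typing import Iterable, Iterator, List, Optional, Set


def read_rows(lines: Iterable[str], limit: Optional[int] = None) -> Iterator[List[str]]:
    """Yield tab-separated rows, stopping after `limit` rows (None reads all)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return itertools.islice(reader, limit)


def filter_by_institution(
    rows: Iterable[List[str]], institution_ids: Set[str], column: int
) -> Iterator[List[str]]:
    """Keep only rows whose value in `column` is one of `institution_ids`."""
    for row in rows:
        if len(row) > column and row[column] in institution_ids:
            yield row


# Toy data standing in for a MAG file; the column layout is assumed.
sample = ["1\t10\t75951250\n", "2\t11\t999\n", "3\t12\t4577782\n"]

rows = read_rows(sample, limit=2)  # analogous to --limit 2
kept = list(filter_by_institution(rows, {"75951250", "4577782"}, column=2))
print(kept)
```

With a real file, `lines` would be an open file handle, for example `open("PaperAuthorAffiliations.txt", encoding="utf-8", newline="")`. Both functions are lazy, so the file is streamed rather than loaded into memory.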

Making a Query in Elasticsearch

http://localhost:9200/mag/_search?pretty=true&q=*:*
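
The same query can be issued from Python with only the standard library. This is a minimal sketch: the helper names `search_url` and `search` are illustrative, the index name `mag` and the host come from the URL above, and the `size` parameter is an added assumption about how many hits you want back.

```python
import json
import urllib.parse
import urllib.request


def search_url(index: str = "mag", query: str = "*:*",
               base_url: str = "http://localhost:9200", size: int = 10) -> str:
    """Build a URI-search URL for the given index and query string."""
    params = urllib.parse.urlencode({"q": query, "size": size, "pretty": "true"})
    return f"{base_url}/{index}/_search?{params}"


def search(index: str = "mag", query: str = "*:*", **kwargs) -> dict:
    """Run the query against a running Elasticsearch node and return the parsed JSON."""
    with urllib.request.urlopen(search_url(index, query, **kwargs), timeout=10) as resp:
        return json.load(resp)
```

With the container from step 2 running and the index populated, `search()["hits"]["hits"]` holds the matching documents.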
