Skip to content

Latest commit

 

History

History
60 lines (25 loc) · 1.19 KB

README.md

File metadata and controls

60 lines (25 loc) · 1.19 KB

Introduction

This repository contains code and data for my article "Neo4j for Diseases".

  1. The scripts are for data processing.

  2. The data folder contains the five CSV to be imported into Neo4j.

Prerequisite

Neo4j Desktop

Run

The data folder contain data from 2021. If you want to download the newest data, do these:

  1. Download the KEGG data with its API
python download_various_kegg.py ds [kegg_download_folder]
  1. Generate nodes and edges
python parse_disease.py [kegg_download_folder]
  1. Add taxonomy to pathogen. In step 2, a file called pathogen_tmp.csv is generated. We need to add the taxonomy to it via:
python add_taxonomy.py pathogen_tmp.csv > pathogen.csv
  1. Put all the CSV files, except pathogen_tmp.csv, into the Import folder of your Neo4j project. And then follow the instruction in the article.

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details