This repository contains code and data for my article "Neo4j for Diseases".
The Python scripts handle the data processing.
The data folder contains the five CSV files to be imported into Neo4j.
Requirement: Neo4j Desktop.
The data folder contains data from 2021. To download the newest data instead, do the following:
- Download the KEGG disease data via the KEGG API:
python download_various_kegg.py ds [kegg_download_folder]
- Generate the nodes and edges:
python parse_disease.py [kegg_download_folder]
- Add taxonomy to the pathogens. Step 2 generates a file called pathogen_tmp.csv; add the taxonomy to it with:
python add_taxonomy.py pathogen_tmp.csv > pathogen.csv
- Put all the CSV files, except pathogen_tmp.csv, into the Import folder of your Neo4j project, then follow the instructions in the article.
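The update steps can be chained in one Python script. This is a minimal sketch, not part of the repository: it assumes the three scripts sit in the current working directory and write their CSV output there. The helper names `pipeline_commands`, `csv_files_to_import` and `run_pipeline`, and the `neo4j_import_dir` parameter, are hypothetical.

```python
# Hypothetical driver for the update steps. Assumes the scripts live in the
# current working directory and write their CSV files there.
from __future__ import annotations

import shutil
import subprocess
import sys
from pathlib import Path


def pipeline_commands(kegg_dir: str) -> list[list[str]]:
    """Commands for steps 1 (download) and 2 (nodes and edges), in order."""
    py = sys.executable
    return [
        [py, "download_various_kegg.py", "ds", kegg_dir],
        [py, "parse_disease.py", kegg_dir],
    ]


def csv_files_to_import(folder: Path) -> list[Path]:
    """All CSV files in `folder` except the intermediate pathogen_tmp.csv."""
    return sorted(p for p in folder.glob("*.csv") if p.name != "pathogen_tmp.csv")


def run_pipeline(kegg_dir: str, neo4j_import_dir: str) -> None:
    for cmd in pipeline_commands(kegg_dir):
        subprocess.run(cmd, check=True)  # stop at the first failing step
    # Step 3: add_taxonomy.py prints to stdout, so capture it in pathogen.csv.
    with open("pathogen.csv", "w") as out:
        subprocess.run(
            [sys.executable, "add_taxonomy.py", "pathogen_tmp.csv"],
            stdout=out,
            check=True,
        )
    # Step 4: copy every CSV except pathogen_tmp.csv into Neo4j's import folder.
    for csv_file in csv_files_to_import(Path(".")):
        shutil.copy(csv_file, neo4j_import_dir)
```

For example, `run_pipeline("kegg_download", "/path/to/neo4j/import")`, where both paths are placeholders for your own folders. Using `check=True` means a failed download never silently feeds stale files into the later steps.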
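Step 2 turns the downloaded KEGG entries into nodes and edges. The sketch below is not the logic of parse_disease.py; it only illustrates how a KEGG flat-file entry can be split into fields, assuming the usual DBGET layout in which the field name occupies the first 12 columns, continuation lines leave those columns blank, and each entry ends with `///`. The function name `parse_kegg_flat` is hypothetical, and repeated fields (for example several REFERENCE blocks) are simply merged.

```python
from __future__ import annotations


def parse_kegg_flat(text: str) -> list[dict[str, list[str]]]:
    """Split KEGG flat-file text into entries; each maps field name -> lines."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    key: str | None = None
    for line in text.splitlines():
        if line.startswith("///"):  # end of one entry
            if current:
                records.append(current)
            current, key = {}, None
            continue
        # Field name in columns 1-12, value in the rest of the line.
        field, value = line[:12].strip(), line[12:].strip()
        if field:  # a new field starts
            key = field
            current.setdefault(key, [])
        if key is not None and value:  # continuation lines extend the last field
            current[key].append(value)
    if current:  # tolerate a missing trailing ///
        records.append(current)
    return records
```

Each record is then a plain dictionary, so fields such as NAME or PATHOGEN can be mapped to node properties or edges in whatever CSV layout the article uses.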
- Sixing Huang - Concept and Coding
This project is licensed under the MIT License; see the LICENSE file for details.