This is the concept acquisition pipeline for the concept graph construction of scientific

yuq-1s/Concept-Acquisition-Pipeline

Concept Acquisition Pipeline

Extract concepts from unstructured text, starting from a set of seed concepts.

Installation

pip install -r requirements.txt
mkdir -p processed_data/crawler_results
mkdir -p word_clustering/word_vectors
wget 'https://cloud.tsinghua.edu.cn/f/0c685ffb5fad4f6c9891/?dl=1' -O crawler_results.zip
unzip crawler_results.zip
wget 'https://cloud.tsinghua.edu.cn/f/a25be37fbab84b5e9c3b/?dl=1' -O word_clustering/word_vectors/sgns.baidubaike.bigram-char

Usage

Put the seed concepts in input_data/seeds/seed_concepts_123456, one per line. Put the unstructured text file in input_data/context/baike_context_123456. Here 123456 is the identifier passed to run.sh. Only the tf_idf and pagerank algorithms are currently supported; see details.md for more information. To run the graph_prop and average_distance algorithms, refer to luogan's repository.
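The input layout described above can be prepared with a short script. This is only a minimal sketch and not part of the repository: the helper name write_inputs and its parameters are hypothetical, and it assumes that both files are plain UTF-8 text and that the numeric suffix is the same identifier later passed to run.sh.

```python
from pathlib import Path


def write_inputs(task_id, seeds, context_text, root="."):
    """Write seed concepts (one per line) and the context text for task_id.

    Hypothetical helper: the file paths follow the layout described in the
    Usage section; UTF-8 encoding is an assumption.
    """
    root = Path(root)
    seed_path = root / "input_data" / "seeds" / f"seed_concepts_{task_id}"
    context_path = root / "input_data" / "context" / f"baike_context_{task_id}"

    # Create the input directories if they do not exist yet.
    for path in (seed_path, context_path):
        path.parent.mkdir(parents=True, exist_ok=True)

    # Strip whitespace and drop blank entries so each line holds one concept.
    cleaned = [s.strip() for s in seeds if s.strip()]
    seed_path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    context_path.write_text(context_text, encoding="utf-8")
    return seed_path, context_path
```

For example, write_inputs("123456", seeds, text) called from the repository root would create the two files named above, after which the pipeline can be started as shown next.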

./run.sh 123456

See the output concepts in baike_context_tf_idfmore_seed_nf_cluster_result_123456.json.
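The result file can then be inspected from Python. The sketch below is an illustration under stated assumptions: the function name load_result is hypothetical, the directory that holds the file is not stated in this README (the current directory is assumed by default), and the internal structure of the JSON is not documented here, so the code only parses the file and returns whatever it contains.

```python
import json
from pathlib import Path


def load_result(task_id, directory="."):
    """Parse the result JSON for task_id and return its contents.

    Hypothetical helper: the output directory is an assumption, and no
    particular JSON structure is assumed.
    """
    name = f"baike_context_tf_idfmore_seed_nf_cluster_result_{task_id}.json"
    path = Path(directory) / name
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling load_result("123456") returns the parsed JSON value; checking its type (list or dict) is a reasonable first step before processing it further.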

Languages

  • Python 50.2%
  • Jupyter Notebook 40.8%
  • C++ 6.2%
  • Shell 2.2%
  • CMake 0.6%