Concept Acquisition Pipeline

Extract concepts from unstructured text, given a set of seed concepts.

Installation

pip install -r requirements.txt
mkdir processed_data/crawler_results
mkdir word_clustering/word_vectors
wget 'https://cloud.tsinghua.edu.cn/f/0c685ffb5fad4f6c9891/?dl=1' -O crawler_results.zip
unzip crawler_results.zip
wget 'https://cloud.tsinghua.edu.cn/f/a25be37fbab84b5e9c3b/?dl=1' -O word_clustering/word_vectors/sgns.baidubaike.bigram-char
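After running the commands above, it can help to confirm that the expected paths exist before starting a run. The helper below is a minimal sketch. check_resources is a hypothetical name, not a script in the repository. It only checks that each path is present. It does not verify the contents of the unzipped archive, because this README does not describe that archive's layout.

```shell
#!/usr/bin/env bash
# check_resources (hypothetical helper, not part of the repository):
# report whether each given path exists; the exit status is the number of missing paths.
check_resources() {
  local path missing=0
  for path in "$@"; do
    if [ -e "$path" ]; then
      echo "ok      $path"
    else
      echo "missing $path"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example: check the directory and word-vector file created by the installation commands
# check_resources processed_data/crawler_results word_clustering/word_vectors/sgns.baidubaike.bigram-char
```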

Usage

Put the seed concepts in input_data/seeds/seed_concepts_123456, one per line. Put the unstructured text file in input_data/context/baike_context_123456. Currently only the tf_idf and pagerank algorithms are supported; see details.md for more details. To run the graph_prop and average_distance algorithms, please refer to luogan's repository.
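The input layout described above can be scripted. The following is a minimal sketch. prepare_inputs is a hypothetical helper, not part of the repository. It assumes the numeric suffix on both input files is the same ID later passed to run.sh, as in the 123456 example. The corpus file name and seed words in the example comment are placeholders.

```shell
#!/usr/bin/env bash
# prepare_inputs (hypothetical helper, not part of the repository):
# place seed concepts and the context text under the file names given in Usage.
#   $1     run ID (assumed to be the same ID later passed to ./run.sh)
#   $2     path to an existing unstructured text file
#   $3...  seed concepts, written one per line
prepare_inputs() {
  local run_id="$1" corpus="$2"
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "prepare_inputs: at least one seed concept is required" >&2
    return 1
  fi
  mkdir -p input_data/seeds input_data/context
  # printf repeats its format once per argument, giving one seed per line
  printf '%s\n' "$@" > "input_data/seeds/seed_concepts_${run_id}"
  cp "$corpus" "input_data/context/baike_context_${run_id}"
}

# Example (my_corpus.txt and the seed words are placeholders):
# prepare_inputs 123456 my_corpus.txt 进程 线程 内存管理
# ./run.sh 123456
```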

./run.sh 123456

The output concepts can be found in baike_context_tf_idfmore_seed_nf_cluster_result_123456.json.
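This README names the result file but not the directory it is written to. The sketch below therefore searches the working tree for it. find_result is a hypothetical helper, not part of the repository. The example comment pretty-prints the first match with Python's standard json.tool module. The sketch makes no assumption about the structure of the JSON inside.

```shell
#!/usr/bin/env bash
# find_result (hypothetical helper, not part of the repository):
# print the path(s) of the result file for a given run ID, searching from the current directory.
find_result() {
  local run_id="$1"
  find . -type f -name "baike_context_tf_idfmore_seed_nf_cluster_result_${run_id}.json"
}

# Example: pretty-print the first match
# python -m json.tool "$(find_result 123456 | head -n 1)"
```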

yuq-1s/Concept-Acquisition-Pipeline