# Data Analysis of Lagou

### Main Functions

- Scrape data from Lagou to keep up with the latest information on Internet careers
- Analyze and visualize the data
- Crawl job details and generate a word cloud as a "Job Impression"

### Note

Because Lagou's back-end API has changed, this repository may not work well at the moment. I will try to fix these problems and release V2.0 in the near future; V2.0_ALPHA is under development.

Thanks for your stars and watching! I will do my best to make the project better and more robust, with new features as well. Sorry for any inconvenience.

### Prerequisites

- Python >= 3.4
- Third-party libraries, installable with `pip install requests beautifulsoup4 jieba openpyxl`
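
Before running the spiders, a quick sanity check can confirm the interpreter version and that the libraries above are importable. The script below is a minimal sketch and is not part of this repository; the `check_environment` name is hypothetical. Note that some import names differ from the pip package names (beautifulsoup4 is imported as `bs4`).

```python
import importlib.util
import sys

# Import names of the third-party libraries listed above
# (the beautifulsoup4 package is imported as "bs4").
REQUIRED_MODULES = ["requests", "bs4", "jieba", "openpyxl"]


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 4)):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper for illustration, not part of this repository.
    """
    problems = []
    if sys.version_info < min_version:
        problems.append(
            "Python >= %d.%d is required, found %d.%d"
            % (min_version[0], min_version[1],
               sys.version_info[0], sys.version_info[1])
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("Environment looks ready.")
```

The script avoids f-strings so that it still runs on Python 3.4, the minimum version listed above.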

### Basic Usage

- Clone this project from GitHub.
- In the `readconfig()` method of lagouspider.py, change the path of job.xml to your local path: `configmap = toolkit.readconfig(YourLocalPath)`
- Run lagouspider.py to fetch job data as JSON.
- Run excelhelper.py to generate an Excel file for each job.
- Run jobdetailspider.py to fetch job recruitment details (updated in V1.3).
- Run analyser.py to segment the text into words and return the top 20 hot words (updated in V1.3).
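
The hot-words step boils down to word-frequency counting. The sketch below shows one way to compute the top 20 hot words with `collections.Counter`; it is an illustration, not the code in analyser.py. The `top_hot_words` name and the stop-word set are hypothetical. Tokenization is pluggable: it defaults to whitespace splitting so the example runs without extra packages, and for Chinese text you could pass `jieba.lcut`, jieba's segmentation function that returns a list of words.

```python
from collections import Counter

# Hypothetical stop words; a real analysis would use a much larger list.
DEFAULT_STOP_WORDS = {"and", "the", "with", "of"}


def top_hot_words(texts, n=20, tokenize=str.split, stop_words=DEFAULT_STOP_WORDS):
    """Count words across all texts and return the n most common (word, count) pairs.

    Hypothetical helper for illustration, not the implementation in analyser.py.
    """
    counter = Counter()
    for text in texts:
        for word in tokenize(text):
            word = word.strip().lower()
            # Skip single characters (often particles or punctuation) and stop words
            if len(word) < 2 or word in stop_words:
                continue
            counter[word] += 1
    return counter.most_common(n)


if __name__ == "__main__":
    details = ["Python and MySQL", "Python with Django", "MySQL and Python"]
    print(top_hot_words(details, n=2))  # → [('python', 3), ('mysql', 2)]
```

Keeping the tokenizer as a parameter separates segmentation (language-specific) from counting (language-neutral), so the same function works for English test data and for jieba-segmented Chinese job descriptions.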

### Analysis Results

For more information, please see my answer on Zhihu.