
TylerJackk/LagouSpider


LagouSpider

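Project Overview

A Scrapy-based crawler for Lagou (lagou.com) that scrapes job postings for a configurable set of positions and stores them in a MySQL database.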

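Features

  1. Configurable list of positions to crawl
  2. Storage in a MySQL database
  3. Server deployment with a scheduled daily run and an email report on completion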
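As a rough illustration of feature 2, a Scrapy item pipeline could turn each scraped item into a parameterized INSERT before handing it to a MySQL driver. The sketch below only builds the statement; the table name `jobs` and the columns `title` and `city` are hypothetical, since the real schema lives in create_table.sql and is not shown here. The `%s` placeholder style is the one used by MySQLdb and PyMySQL.

```python
def build_insert(table, item):
    """Return (sql, params) for a parameterized INSERT built from an item dict.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code (the item definition), never from scraped data.
    """
    columns = sorted(item)  # fixed order so column list and params line up
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO {0} ({1}) VALUES ({2})".format(
        table, ", ".join(columns), placeholders
    )
    params = tuple(item[c] for c in columns)
    return sql, params


# Hypothetical usage inside a pipeline's process_item():
#   sql, params = build_insert("jobs", dict(item))
#   cursor.execute(sql, params)
```

Passing the values separately from the SQL text lets the driver handle quoting, which matters for free-text fields such as job descriptions.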

Usage

Python environment

2.7.10

Dependencies

Install the dependencies listed in requirements.txt:

pip install -r requirements.txt 

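Configuration (settings.py)

Set the positions to crawl. Each entry must match the job segment of the corresponding Lagou URL exactly.

For example, in https://www.lagou.com/zhaopin/ziranyuyanchuli/2/ the job segment is ziranyuyanchuli.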

JOBS = {"Java", "Python", "PHP", "C++", "shujuwajue", "HTML5", "Android", "iOS", "webqianduan"}
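To make the URL convention concrete, here is a minimal sketch of how listing URLs might be derived from `JOBS`. The helper `build_listing_urls`, the `max_page` parameter, and the decision to percent-encode entries such as "C++" are assumptions for illustration, not the spider's actual code.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7

# URL pattern taken from the example above: /zhaopin/<job>/<page>/
BASE_URL = "https://www.lagou.com/zhaopin/{job}/{page}/"


def build_listing_urls(jobs, max_page=2):
    """Return one listing URL per (job, page) pair, sorted for stable output."""
    urls = []
    for job in sorted(jobs):
        # Assumption: characters like "+" are percent-encoded in the path.
        slug = quote(job, safe="")
        for page in range(1, max_page + 1):
            urls.append(BASE_URL.format(job=slug, page=page))
    return urls
```

A spider could then yield one request per URL from its `start_requests` method.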

# Run create_table.sql first
# Database settings
MYSQL_HOST = 'xxx.xx.xx.xx'
MYSQL_DBNAME = 'Spider'
MYSQL_USER = 'xx'
MYSQL_PASSWD = 'xx'
MYSQL_PORT = 0  # placeholder; MySQL's default port is 3306

# Mail settings
From_ADDR = '[email protected]'
TO_ADDR = '[email protected]'
PASSWORD = 'xxxx'
SMTP = 'smtp.163.com'
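The mail settings above suggest a report sent over SMTP once a crawl finishes. The following sketch shows one way such a message might be assembled with the standard library; the function name `build_report`, the subject line, and the body wording are hypothetical, and the project's real report may contain different information.

```python
from email.mime.text import MIMEText


def build_report(from_addr, to_addr, item_count):
    """Build a plain-text report message; sending is left to the caller."""
    body = "LagouSpider finished. Items stored: {0}".format(item_count)
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = "LagouSpider daily report"
    msg["From"] = from_addr
    msg["To"] = to_addr
    return msg


# Hypothetical sending step, using the values from settings.py:
#   import smtplib
#   server = smtplib.SMTP(SMTP)
#   server.login(From_ADDR, PASSWORD)
#   server.sendmail(From_ADDR, [TO_ADDR], msg.as_string())
#   server.quit()
```

Keeping message construction separate from delivery makes the report easy to check without a mail server.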

Run

scrapy crawl lagou

or

python main.py

Linux Deployment

Create an environment with virtualenv

scrapyd + SpiderKeeper

Start the scrapyd service in a new screen session

scrapyd

Change to the Scrapy project root

scrapyd-deploy name

Follow the SpiderKeeper documentation for the remaining steps

Configure the scheduled job in the SpiderKeeper dashboard
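For reference, scrapyd itself exposes an HTTP endpoint, schedule.json, that starts a spider run when it receives a POST with `project` and `spider` fields. The sketch below only builds such a request so the shape is visible; the project name `LagouSpider` and the host `http://localhost:6800` (scrapyd's default port) are assumptions about this deployment, and the spider name `lagou` comes from the run command above.

```python
try:
    from urllib.parse import urlencode  # Python 3
    from urllib.request import Request
except ImportError:
    from urllib import urlencode  # Python 2.7
    from urllib2 import Request


def build_schedule_request(project, spider, host="http://localhost:6800"):
    """Build (but do not send) a POST request to scrapyd's schedule.json."""
    # A list of pairs keeps the field order deterministic.
    data = urlencode([("project", project), ("spider", spider)]).encode("ascii")
    return Request(host + "/schedule.json", data=data)


# Hypothetical usage; sending requires a running scrapyd instance:
#   from urllib.request import urlopen
#   urlopen(build_schedule_request("LagouSpider", "lagou"))
```

With SpiderKeeper in place this call is normally not needed by hand, but it can help when checking that scrapyd accepts jobs.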

Job Analysis

In progress
