scraper

This repository contains our scraping code. The scraper pulls companies from /r/cscareerquestions and crawls the web for positions relating to those companies. We are also actively looking for other reliable sources of company names.

Installation

Make sure the following dependencies have been installed on your system.

Docker

You will also need to place a valid hibernate.cfg.xml file in the src/main/resources folder. This file is responsible for providing SQL database connection details, enabling the scraper to read and write companies/positions. Please see src/main/resources/hibernate.cfg.xml.example for an example.
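One way to create that file is to copy the bundled example and then edit it. This is a minimal sketch, assuming it is run from the root of the repository and that the example file exists at the path named above. It does not overwrite an existing configuration.

```shell
# Paths taken from the paragraph above, relative to the repository root.
example=src/main/resources/hibernate.cfg.xml.example
target=src/main/resources/hibernate.cfg.xml

# Copy the example only if it exists and no config is already in place.
if [ -f "$example" ] && [ ! -f "$target" ]; then
    cp "$example" "$target"
fi
```

After copying, replace the placeholder connection details in hibernate.cfg.xml with the values for your own SQL database.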

Usage

The following commands are assumed to be run from the root of the repository directory.

To fetch all companies and save them to the database, ignoring duplicates, use:

scripts/start_docker.sh -c

To fetch positions for every company already in the database and save them, ignoring duplicates, use:

scripts/start_docker.sh -p
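Because positions are fetched for companies already in the database, a full refresh would run the two commands in order. The sketch below wraps them in a helper. The function name run_scraper is hypothetical and not part of the repository. It assumes the current directory is the repository root.

```shell
# Hypothetical helper: fetch companies first, then positions.
run_scraper() {
    # Refuse to run if the script is missing, e.g. when called
    # from outside the repository root.
    if [ ! -x scripts/start_docker.sh ]; then
        echo "scripts/start_docker.sh not found; run from the repository root" >&2
        return 1
    fi
    # The position step only runs if the company step succeeds.
    scripts/start_docker.sh -c && scripts/start_docker.sh -p
}
```

Calling run_scraper from the repository root then performs both steps and stops if the company step fails.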
