Githunter-Scraper is a tool for scraping public information about repositories hosted on GitHub, GitLab, and other providers. The scraper works from an entry point (the trending page on GitHub/GitLab, a Mongo database, a list of an organization's members, etc.).
After cloning this repository, run this command in a terminal:
npm install
Then run a command like this:
node githunter-scraper.js --scraperPoint trending --provider github --nodes issuesV1
scraperPoint (required): The starting point from which the script gets the repositories to be scraped. The value trending means the script will crawl the GitHub explore page, trending tab.
provider (required): The provider from which all information should be read (e.g., github).
nodes (optional): The kind of information to read. Known nodes are: repository, issues, pulls, and commits. An example invocation combining these parameters is sketched below.
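For example, to collect user statistics for an organization's members, a run could look like the following sketch. Note that the --organization flag is an assumption here, mirroring the "organization" field used in the workflow input further down; check the script's argument parsing before relying on it.
# hypothetical flag: --organization mirrors the workflow's "organization" input field
node githunter-scraper.js --scraperPoint organization.members --provider github --nodes userStats --organization bancodobrasil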
You only need to execute the script startDocker.sh, present in the root of the application:
./startDocker.sh
Or simply run docker-compose up -d
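To confirm the stack came up, you can list the running services (the service names come from the docker-compose file):
# shows the state and exposed ports of each service defined in the compose file
docker-compose ps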
You can start a workflow defined inside ./conductor/server/provisioning
by sending a POST request to the conductor-server URL defined in the docker-compose file, like this:
curl -X POST \
  http://localhost:8080/api/workflow \
  -H 'Accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "scraper_users",
    "version": 1,
    "input": {
      "scraperPoint": "organization.members",
      "nodes": "userStats",
      "organization": "bancodobrasil",
      "provider": "github"
    }
  }'
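The POST above returns the id of the workflow execution. As a minimal sketch, assuming the standard Conductor workflow API, you can then inspect the execution status with:
# <workflowId> is a placeholder for the id string returned by the POST above
curl http://localhost:8080/api/workflow/<workflowId>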
Or you can start a workflow by scheduling it with Schellar, like this:
curl -X POST \
  http://localhost:3000/schedule \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "name": "scraper-users-minute",
    "enabled": true,
    "parallelRuns": false,
    "workflowName": "scraper_users",
    "workflowVersion": "1",
    "cronString": "* * * * *",
    "workflowContext": {
      "scraperPoint": "organization.members",
      "nodes": "userStats",
      "organization": "bancodobrasil",
      "provider": "github"
    },
    "fromDate": "2019-01-01T15:04:05Z",
    "toDate": "2029-07-01T15:04:05Z"
  }'
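To inspect or remove schedules afterwards, here is a sketch assuming Schellar's schedule endpoints keyed by schedule name (verify against your Schellar version):
# list all registered schedules
curl http://localhost:3000/schedule
# delete the schedule created above, keyed by its name
curl -X DELETE http://localhost:3000/schedule/scraper-users-minute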