Githunter-Scraper is a tool for scraping public information about repositories hosted on GitHub, GitLab, and other providers. The scraper works from an entry point (the trending page on GitHub/GitLab, a Mongo database, a list of an organization's members, etc.).
After cloning this repository, run this command in a terminal:
npm install
Then run a command like this:
node githunter-scraper.js --scraperPoint trending --provider github --nodes issuesV1
scraperPoint (required): The starting point from which the script gets the repositories to be scraped. The value trending means the script will crawl the GitHub explore page, trending tab.
provider (required): The provider from which all information should be read (e.g., github).
nodes (optional): The kind of information to read. Known nodes are: repository, issues, pulls, and commits. An example invocation combining these parameters is sketched below.
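For example, to collect user statistics for an organization's members, a run could look like the following sketch. Note that the --organization flag is an assumption here, mirroring the "organization" field used in the workflow input further down; check the script's argument parsing before relying on it.
# hypothetical flag: --organization mirrors the workflow's "organization" input field
node githunter-scraper.js --scraperPoint organization.members --provider github --nodes userStats --organization bancodobrasil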
You only need to execute the script startDocker.sh, present in the root of the application:
./startDocker.sh
Or simply run docker-compose up -d
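To confirm the stack came up, you can list the running services (the service names come from the docker-compose file):
# shows the state and exposed ports of each service defined in the compose file
docker-compose ps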
You can start a workflow defined inside ./conductor/server/provisioning
by sending a POST request to the conductor-server URL defined in the docker-compose file, like this:
curl -X POST \
  http://localhost:8080/api/workflow \
  -H 'Accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "scraper_users",
    "version": 1,
    "input": {
      "scraperPoint": "organization.members",
      "nodes": "userStats",
      "organization": "bancodobrasil",
      "provider": "github"
    }
  }'
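The POST above returns the id of the workflow execution. As a minimal sketch, assuming the standard Conductor workflow API, you can then inspect the execution status with:
# <workflowId> is a placeholder for the id string returned by the POST above
curl http://localhost:8080/api/workflow/<workflowId>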
Or you can start a workflow by scheduling it with Schellar, like this:
curl -X POST \
  http://localhost:3000/schedule \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "name": "scraper-users-minute",
    "enabled": true,
    "parallelRuns": false,
    "workflowName": "scraper_users",
    "workflowVersion": "1",
    "cronString": "* * * * *",
    "workflowContext": {
      "scraperPoint": "organization.members",
      "nodes": "userStats",
      "organization": "bancodobrasil",
      "provider": "github"
    },
    "fromDate": "2019-01-01T15:04:05Z",
    "toDate": "2029-07-01T15:04:05Z"
  }'
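To inspect or remove schedules afterwards, here is a sketch assuming Schellar's schedule endpoints keyed by schedule name (verify against your Schellar version):
# list all registered schedules
curl http://localhost:3000/schedule
# delete the schedule created above, keyed by its name
curl -X DELETE http://localhost:3000/schedule/scraper-users-minute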