Welcome to docudigger 👋

Document scraper for getting invoices automagically as PDFs (useful for taxes or a DMS)

🏠 Homepage

Configuration

All settings can be changed via CLI flags or environment variables (this also works when using Docker).

| Setting | Description | Default value |
| --- | --- | --- |
| AMAZON_USERNAME | Your Amazon username | `null` |
| AMAZON_PASSWORD | Your Amazon password | `null` |
| AMAZON_TLD | Amazon top-level domain | `de` |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | `2023` |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | `null` |
| ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | `true` |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | `./documents/` |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | `.pdf` |
| DEBUG | Debug flag (sets the log level to DEBUG) | `false` |
| SUBFOLDER_FOR_PAGES | Creates a subfolder for every scraped page/plugin | `false` |
| LOG_PATH | Sets the log path | `./logs/` |
| LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | `info` |
| RECURRING | Flag for executing the script periodically. Needs `RECURRING_PATTERN` to be set. Defaults to `true` when using the Docker container | `false` |
| RECURRING_PATTERN | Cron pattern to execute periodically. Needs `RECURRING` to be set to `true` | `*/30 * * * *` |
| TZ | Time zone used for Docker environments | `Europe/Berlin` |
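
As a rough sketch of the environment-variable route, the settings listed above can be exported in a shell before a scrape is started. The e-mail address and password are placeholders. That `scrape amazon` picks up its required username and password from `AMAZON_USERNAME` and `AMAZON_PASSWORD` is an assumption based on the note above, not on a tested run.

```shell
# Example values only; replace them with your own account data.
export AMAZON_USERNAME='you@example.com'   # placeholder address
export AMAZON_PASSWORD='change-me'         # placeholder password
export AMAZON_TLD='de'
export AMAZON_YEAR_FILTER='2023'
export ONLY_NEW='true'
export FILE_DESTINATION_FOLDER='./documents/'
export LOG_LEVEL='info'

# Start a scrape only when the CLI is actually installed.
if command -v docudigger >/dev/null 2>&1; then
  docudigger scrape amazon
else
  echo "docudigger not found; settings are exported for a later run"
fi
```

Variables that are not exported should fall back to the default values from the list.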

Install

npm install

Usage

$ npm install -g @disane-dev/docudigger
$ docudigger COMMAND
running command...
$ docudigger (--version)
@disane-dev/docudigger/2.0.7 linux-x64 node-v20.18.0
$ docudigger --help [COMMAND]
USAGE
  $ docudigger COMMAND
...

Important

Don't forget to include --ignore-scripts in your install command.

`docudigger scrape all`

Scrapes all websites periodically (default for docker environment)

USAGE
  $ docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]

FLAGS
  -c, --recurringCron=<value>  [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>        [default: ./logs/] Log path
  -r, --recurring
  --logLevel=<option>          [default: info] Specify level for logging.
                               <options: trace|debug|info|warn|error>

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Scrapes all websites periodically

EXAMPLES
  $ docudigger scrape all
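
A minimal sketch of a recurring run, using only the flags listed above. The pattern is in the five-field cron form (minute, hour, day of month, month, day of week) that the defaults use. The variable name `CRON_PATTERN` is only illustrative.

```shell
# Cron fields: minute hour day-of-month month day-of-week
CRON_PATTERN='*/30 * * * *'   # every 30 minutes

# Quote the pattern so the shell does not expand the asterisks.
if command -v docudigger >/dev/null 2>&1; then
  docudigger scrape all --recurring --recurringCron "$CRON_PATTERN" --logLevel debug
fi
```

According to the usage line, `-c` is meant to be combined with `-r`, so both are passed together here.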

`docudigger scrape amazon`

Used to get invoices from amazon

USAGE
  $ docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l
    <value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>]
    [--yearFilter <value>] [--pageFilter <value>] [--onlyNew]

FLAGS
  -c, --recurringCron=<value>        [default: */30 * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>              [default: ./logs/] Log path
  -p, --password=<value>             (required) Password
  -r, --recurring
  -t, --tld=<value>                  [default: de] Amazon top level domain
  -u, --username=<value>             (required) Username
  --fileDestinationFolder=<value>    [default: ./data/] Amazon top level domain
  --fileFallbackExentension=<value>  [default: .pdf] Amazon top level domain
  --logLevel=<option>                [default: info] Specify level for logging.
                                     <options: trace|debug|info|warn|error>
  --onlyNew                          Gets only new invoices
  --pageFilter=<value>               Filters a page
  --yearFilter=<value>               Filters a year

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Used to get invoices from amazon

  Scrapes amazon invoices

EXAMPLES
  $ docudigger scrape amazon
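
For illustration, a fuller invocation might look like the sketch below. It only uses flags from the help output above. The variable names `AMZ_USER`, `AMZ_PASS`, `YEAR` and `DEST` as well as the fallback credentials are placeholders, not part of the tool.

```shell
# Take credentials from the environment; fall back to placeholders.
AMZ_USER="${AMAZON_USERNAME:-you@example.com}"
AMZ_PASS="${AMAZON_PASSWORD:-change-me}"
YEAR='2023'
DEST='./documents/'

# Fetch only new invoices of one year from amazon.de.
if command -v docudigger >/dev/null 2>&1; then
  docudigger scrape amazon \
    -u "$AMZ_USER" \
    -p "$AMZ_PASS" \
    -t de \
    --yearFilter "$YEAR" \
    --onlyNew \
    --fileDestinationFolder "$DEST"
fi
```

Reading the password from a variable keeps it out of the command line that ends up in the shell history.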

Docker

docker run \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e AMAZON_TLD='de' \
  -e AMAZON_YEAR_FILTER='2024' \
  -e AMAZON_PAGE_FILTER='1' \
  -e LOG_LEVEL='info' \
  -v "C:/temp/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger
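
As an alternative sketch, the same settings can be kept in a file and passed with Docker's `--env-file` option, here combined with the recurring settings from the configuration list. The file name `docudigger.env`, the host folder `./documents` and the credentials are assumptions for this example. The container path is the one used in the command above.

```shell
# One KEY=value pair per line; values are taken literally, so no quotes.
cat > docudigger.env <<'EOF'
AMAZON_USERNAME=you@example.com
AMAZON_PASSWORD=change-me
AMAZON_TLD=de
AMAZON_YEAR_FILTER=2024
RECURRING=true
RECURRING_PATTERN=*/30 * * * *
TZ=Europe/Berlin
LOG_LEVEL=info
EOF
chmod 600 docudigger.env   # the file contains a password

# Start the container only when Docker is available.
if command -v docker >/dev/null 2>&1; then
  docker run \
    --env-file docudigger.env \
    -v "$(pwd)/documents:/home/node/docudigger" \
    ghcr.io/disane87/docudigger
fi
```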

Dev-Time 🪲

NPM

npm install
[Adjust the created .env to your needs]
npm run start

Author

👤 Marco Franke

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!

This README was generated with ❤️ by readme-md-generator
