Machine Learning Models
Since regex scanners tend to produce many false positive discoveries, machine learning models can be used to reduce the number of discoveries that must be analysed manually. In particular, the models automatically classify discoveries as false_positive (i.e., spam).
Each model needs an implementation (in the credentialdigger/models folder). Any binaries a model requires are downloaded automatically on the fly.
If you want to propose a new model to reduce false positive discoveries, please contact us (or open an issue in the project).
The Path Model uses regular expressions to match files that typically contain fake credentials.
After a pre-processing phase, the file path of a discovery is matched against a regular expression to estimate whether the credentials in that file are likely to be real. Indeed, according to our observations, documentation (e.g., README and .md files in general), tutorials, tests, virtual environments and dependencies pushed to the repository (e.g., node_modules) do not contain real secrets used in production.
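The path matching described above could be sketched roughly as follows. This is only an illustration: the pattern, the pre-processing step, and the names FAKE_PATH_PATTERN, preprocess and is_probable_false_positive are assumptions made for this example, not the regular expression or functions actually shipped in credentialdigger/models.

```python
import re

# Hypothetical pattern (an assumption for illustration, not the project's regex).
# It flags paths that look like documentation, tests, tutorials,
# virtual environments, or vendored dependencies.
FAKE_PATH_PATTERN = re.compile(
    r"(^|/)(docs?|documentation|tests?|examples?|tutorials?"
    r"|node_modules|venv|\.venv|site-packages)(/|$)"
    r"|(^|/)readme[^/]*$"
    r"|\.md$"
)


def preprocess(path: str) -> str:
    """Normalise a file path: trim whitespace, use forward slashes, lowercase."""
    return path.strip().replace("\\", "/").lower()


def is_probable_false_positive(path: str) -> bool:
    """Return True when the path looks like a file that holds fake credentials."""
    return FAKE_PATH_PATTERN.search(preprocess(path)) is not None


if __name__ == "__main__":
    for candidate in ("README.md", "tests/test_auth.py", "src/config/settings.py"):
        print(candidate, is_probable_false_positive(candidate))
```

Anchoring each directory name between path separators keeps a path such as src/latest/config.py from being flagged just because it contains the substring "test".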
Up to v4.3 we used an ML approach based on fasttext, but we switched to regular expressions in v4.4 since they proved to perform better without loss of precision. Please visit the OLD machine learning models page for further information regarding the old Path Model.
TODO