Machine Learning Models

Because regex scanners tend to produce many false positive discoveries, machine learning models can be used to reduce the number of discoveries that must be analysed manually. In particular, the models automatically classify discoveries as false_positive (i.e., spam).

Each model needs an implementation in the credentialdigger/models folder. Any binaries a model requires are downloaded automatically on the fly.

If you want to propose a new model to reduce false positive discoveries, please contact us or open an issue in the project.

Path Model

The Path Model uses regular expressions to match files that typically contain fake credentials.

After a pre-processing phase, the file path of a discovery is matched against a regular expression to guess whether the credentials it contains are real. According to our observations, documentation (e.g., README and .md files in general), tutorials, tests, virtual environments, and dependencies pushed to the repository (e.g., node_modules) don't contain real secrets used in production.
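
The path-matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation shipped in credentialdigger/models: the pattern, the preprocess and is_probable_false_positive helpers, the sample discoveries, and the "new" state label are all hypothetical and chosen just to mirror the categories listed above.

```python
import re

# Hypothetical pattern for illustration only; NOT the expression used by the tool.
# It flags documentation, tutorials, tests, virtual environments and dependencies.
FAKE_PATH_PATTERN = re.compile(
    r"(^|/)(docs?|examples?|tutorials?|tests?|node_modules|venv|\.venv|site-packages)(/|$)"
    r"|\.md$"
    r"|(^|/)readme(\.[a-z]+)?$"
)


def preprocess(file_path: str) -> str:
    """Assumed pre-processing step: unify path separators and lowercase."""
    return file_path.replace("\\", "/").lower()


def is_probable_false_positive(file_path: str) -> bool:
    """Return True when the path looks like a place that holds fake credentials."""
    return FAKE_PATH_PATTERN.search(preprocess(file_path)) is not None


# Made-up discoveries; the dictionary keys are assumptions for this sketch.
discoveries = [
    {"file_name": "README.md", "snippet": "password = 'example'"},
    {"file_name": "src/config/settings.py", "snippet": "password = 'hunter2'"},
    {"file_name": "node_modules/lib/index.js", "snippet": "token = 'abc'"},
]

for discovery in discoveries:
    flagged = is_probable_false_positive(discovery["file_name"])
    discovery["state"] = "false_positive" if flagged else "new"

# Only the discoveries that were not flagged remain for manual analysis.
to_review = [d["file_name"] for d in discoveries if d["state"] == "new"]
```

In this sketch only src/config/settings.py is left for manual review. Anchoring directory names on path separators keeps a name such as contest from matching the tests rule.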

Up to v4.3 we used an ML approach based on fastText, but we switched to regular expressions in v4.4 because they performed better without loss of precision. Please visit the OLD machine learning models page for further information regarding the old Path Model.
