v4.4.0

@marcorosa marcorosa released this 18 Oct 09:26

With this release we restructure the ML models to improve their precision. Moreover, the new models are integrated directly into the project, removing the painful download-and-link step that the former ones required.

All the changes are transparent to the end user (i.e., no API or function definition changed), so there was no need for a major upgrade to v5.

Path Model

We decided to deprecate the fasttext approach and switched to a regex to filter out false-positive file paths. According to our tests, this keeps good precision while decreasing the overhead.
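
As an illustration of the idea, here is a minimal sketch of a regex-based path filter. The pattern, the function names, and the `file_name` key are assumptions made for this example; they are not the regex or the data layout actually shipped with Credential Digger.

```python
import re

# Hypothetical pattern, NOT the one shipped with Credential Digger.
# It matches paths that often produce noisy discoveries: test, doc and
# example folders, vendored dependencies, markdown and lock files.
FALSE_POSITIVE_PATH = re.compile(
    r"(^|/)(tests?|docs?|examples?|node_modules|vendor)(/|$)|\.(md|lock)$",
    re.IGNORECASE,
)


def is_false_positive_path(file_path: str) -> bool:
    """Return True when the path matches the illustrative pattern."""
    return FALSE_POSITIVE_PATH.search(file_path) is not None


def filter_discoveries(discoveries):
    """Drop discoveries whose 'file_name' (assumed key) looks like a false positive."""
    return [d for d in discoveries if not is_false_positive_path(d["file_name"])]
```

A single compiled regex is evaluated once per path with no model to load, which is where the reduced overhead comes from in a sketch like this.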

PasswordModel (formerly SnippetModel)

We decided to deprecate the old fasttext double-model (extractor + classifier) approach and shift to an NLP approach based on CodeBERT. Overall, it is slower but far more precise, although it only works for passwords; hence the change of name from SnippetModel to PasswordModel.
Moreover, since the PasswordModel only works for passwords, we added a check in the Client to run this model only over password discoveries.
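
The category check can be pictured roughly as follows. This is only a sketch: the `cat`, `snippet` and `id` keys, the `"password"` label, and the `classify` callable are hypothetical stand-ins, not the Client's real internals.

```python
def run_password_model(discoveries, classify):
    """Apply `classify` only to discoveries in the password category.

    `discoveries` is assumed to be a list of dicts with 'id', 'cat' and
    'snippet' keys; `classify` is any callable that returns True when a
    snippet is judged to be a false positive. Both are illustrative.
    """
    flagged_ids = []
    for discovery in discoveries:
        # Skip everything that is not a password discovery: the model
        # is not meant to judge tokens, keys, or other categories.
        if discovery.get("cat") != "password":
            continue
        if classify(discovery["snippet"]):
            flagged_ids.append(discovery["id"])
    return flagged_ids
```

Filtering before classification also limits how often the slower model is invoked.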

AoB

  • The download function has been deprecated, and models are now managed automatically by Credential Digger
  • The generator was strongly linked to the SnippetModel, so it has been deprecated
  • The documentation has been updated, both in the README and in the wiki
  • We added a categories enum in the postgres db to steer users toward 4 main rule categories. To make the transition smoother, this enum is only enforced in new postgres installations
  • The UI has been updated to use the new models
  • We ported the incremental scan_snapshot from v4.3.1
  • Minor bug fixes
  • The UI now refreshes every 8s (was 5s)

Credits also go to @melisande1 for the wonderful work