New Features:
- Add `Pipeline` module for streamlining text anonymization
- Add better date handling and support in `DateGenerator`
Bug Fixes:
- Fix unit tests
- Change unit test framework to `pytest`
Breaking Changes:
- Change supported Python versions to 3.9 through 3.12
- Change default `model_name` for `LLMLabelGenerator` to `HuggingFaceTB/SmolLM2-1.7B-Instruct` (for ease of use)
New Features:
- Enable CPU utilization for `LLMLabelGenerator`
- Enable changing the input parameters for `LLMLabelGenerator` (`model_name` and `use_gpu`)
- Add additional unit tests for `NERExtractor`
Bug Fixes:
- Fix package documentation
Bug Fixes:
- Fix entity creation in `PatternExtractor`
- Fix documentation duplication
New Features:
- Add `Entity` regex group selection
- Add option to ignore `Entity` regex pattern in `LLMLabelGenerator.generate`
Breaking Changes:
- Rename `EntityExtractor` to `NERExtractor`
- Rename the input variable `output_gen` to `sub_variant` in `DateGenerator`
- Rename the input variable `entity_prefix` to `add_entity_attrs` in `LLMLabelGenerator.generate`
- Move the `regex` submodule from `anonipy.anonymize` to `anonipy.utils`
New Features:
- Add a pattern extractor named `PatternExtractor`, used to extract entities using spaCy pattern matching and regex
- Add a multi extractor named `MultiExtractor`, used to extract entities using multiple extractors
- Add the `DATE_TRANSFORM_VARIANTS` constant to help with the date generator
- Refine the `Entity` implementation
- Improve package documentation
New Features:
- Add automatic date format detection support to `DateGenerator`
New Features:
- Upgrade `gliner-spacy` to have cleaner code
- Add a function to help manually fix replacements after anonymization
New Features:
- Add GPU support and entity scores to `EntityExtractor`
- Standardize the function naming in strategies
New Features:
- Re-implement file reading methods and add unit tests
- Expand the test environment to all operating systems
New Features:
- Add unit tests
- Refine the `Entity` implementation
- Update documentation
Bug Fixes:
- Fix the `LANGUAGES` constant
New Features:
- Add `read_json` function
- Add `write_json` function
function - Add blog post on anonymizing collections of documents
- Reduce the number of viable suggestions used to create a substitute in `MaskLabelGenerator`
- Add the entity label to the replacements in strategies
Bug Fixes:
- Fix the entity regex checking in `EntityExtractor`
New Features:
- Add `write_file` function
- Add blog to the documentation
- Initial release