A tool that masks words in an image file. You specify which words to mask using any combination of the following three types:
- Regular expressions
- Keywords
- PII (Personally Identifiable Information)
Under the hood, it uses the text detection capability of Amazon Rekognition or Amazon Textract to detect text in the image, then uses Amazon Comprehend to detect PII entities in that text.
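The PII step of the flow described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the helper names `join_words` and `words_overlapping_pii` are hypothetical, and the sample dictionaries are hand-written stand-ins that only mimic the `Type`, `Score`, `BeginOffset`, and `EndOffset` fields returned by Amazon Comprehend's `detect_pii_entities`. The sketch assumes the detected words are joined with single spaces before being sent to Comprehend, so that character offsets can be mapped back to individual words.

```python
from typing import Dict, List, Tuple


def join_words(words: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces and record each word's [start, end) span."""
    spans = []
    pos = 0
    for word in words:
        spans.append((pos, pos + len(word)))
        pos += len(word) + 1  # +1 for the separating space
    return " ".join(words), spans


def words_overlapping_pii(
    spans: List[Tuple[int, int]],
    entities: List[Dict],
    threshold: float = 0.8,
) -> List[int]:
    """Return indices of words overlapping a PII entity scored at or above threshold."""
    hits = []
    for index, (start, end) in enumerate(spans):
        for entity in entities:
            confident = entity["Score"] >= threshold
            overlaps = start < entity["EndOffset"] and entity["BeginOffset"] < end
            if confident and overlaps:
                hits.append(index)
                break
    return hits


# Words as they might come back from text detection (sample data).
words = ["Contact", "jane@example.com", "or", "call", "555-0100"]
text, spans = join_words(words)

# Hand-written stand-in for a Comprehend response (assumption, not real output).
entities = [
    {"Type": "EMAIL", "Score": 0.99, "BeginOffset": 8, "EndOffset": 24},
    {"Type": "PHONE", "Score": 0.50, "BeginOffset": 33, "EndOffset": 41},
]

masked = words_overlapping_pii(spans, entities)
print([words[i] for i in masked])  # ['jane@example.com']
```

The low-confidence phone entity is skipped at the default threshold; lowering the threshold would select it as well.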
There are many use cases for such a tool, e.g. masking sensitive information (PII) in images shared on social media, second-hand car sales websites, and so on. The repository also includes sample code that uses this tool to detect and mask sensitive information in Slack.
- Python 3.8+
- AWS account
- Install the modules:
pip install -r src/requirements.txt
- Set up your AWS credentials.
- Follow the usage below to mask your first image.
Here are some examples:
- Mask AWS account ID in the design diagram:
python src/mask_it.py -i assets/design_diagram.png -r rules.yaml
- Mask the keyword "stack-set-id" in the code sample:
python src/mask_it.py -i assets/code_sample.png -k stack-set-id
usage: mask_it.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-r RULES_FILE] [-k KEYWORDS] [-e EXCLUDE_WORDS] [-t {document,photo}] [--pii] [--pii-confidence-threshold PII_CONFIDENCE_THRESHOLD] [--use-grey-rectangle] [--verbose]
options:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
Original image file
-o OUTPUT_FILE, --output-file OUTPUT_FILE
Masked image output file
-r RULES_FILE, --rules-file RULES_FILE
Rules configuration file
-k KEYWORDS, --keywords KEYWORDS
Mask the keywords (case insensitive)
-e EXCLUDE_WORDS, --exclude-words EXCLUDE_WORDS
Exclude the words (case insensitive)
-t {document,photo}, --image-type {document,photo}
Image type (default is document)
--pii Mask Personally Identifiable Information (PII)
--pii-confidence-threshold PII_CONFIDENCE_THRESHOLD
PII detection confidence threshold (in percentage). Default is 0.8
--use-grey-rectangle Use grey rectangle masking (default is asterisk)
--verbose Debugging mode
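For readers who want to see how such an interface is typically declared, here is a sketch of an `argparse` parser that accepts the same options as the help text above. It is reconstructed from the help output only and is not the actual code in `src/mask_it.py`; the `build_parser` name, the use of `action="append"` for repeatable options, and the `float` type for the threshold are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="mask_it.py")
    parser.add_argument("-i", "--input-file", required=True,
                        help="Original image file")
    parser.add_argument("-o", "--output-file",
                        help="Masked image output file")
    parser.add_argument("-r", "--rules-file",
                        help="Rules configuration file")
    # action="append" lets the flag be repeated: -k a -k b -> ["a", "b"]
    parser.add_argument("-k", "--keywords", action="append", default=[],
                        help="Mask the keywords (case insensitive)")
    parser.add_argument("-e", "--exclude-words", action="append", default=[],
                        help="Exclude the words (case insensitive)")
    parser.add_argument("-t", "--image-type", choices=["document", "photo"],
                        default="document",
                        help="Image type (default is document)")
    parser.add_argument("--pii", action="store_true",
                        help="Mask Personally Identifiable Information (PII)")
    parser.add_argument("--pii-confidence-threshold", type=float, default=0.8,
                        help="PII detection confidence threshold")
    parser.add_argument("--use-grey-rectangle", action="store_true",
                        help="Use grey rectangle masking (default is asterisk)")
    parser.add_argument("--verbose", action="store_true",
                        help="Debugging mode")
    return parser


args = build_parser().parse_args(
    ["-i", "assets/code_sample.png", "-k", "stack-set-id", "-k", "account-id"]
)
print(args.keywords)    # ['stack-set-id', 'account-id']
print(args.image_type)  # document
```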
- Use -t photo if the image is a photo of the real world (the default is -t document).
- Use -k <keyword1> -k <keyword2>... to specify multiple keywords.
- Use -e <keyword1> -e <keyword2>... to exclude multiple keywords (e.g. words that would otherwise be matched by the rules or PII detection).
- Use --pii to mask PII in an image, e.g. a vehicle registration plate in a car image.
- Use --pii-confidence-threshold to adjust the PII detection confidence threshold (in percentage).
- Use --use-grey-rectangle if you prefer grey rectangles rather than asterisks.
- Use --verbose to see detailed information.
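How keywords, regular expressions, and excluded words might interact can be illustrated with a small sketch. This reflects an assumed reading of the options, not the tool's actual code: `select_words` is a hypothetical helper, a word is taken to match when it equals a keyword case-insensitively or fully matches one of the regular expressions, and exclusions are assumed to take precedence over every match.

```python
import re
from typing import Iterable, List


def select_words(
    words: Iterable[str],
    keywords: Iterable[str] = (),
    patterns: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return the detected words that should be masked (assumed semantics)."""
    keyword_set = {k.lower() for k in keywords}
    exclude_set = {e.lower() for e in exclude}
    compiled = [re.compile(p) for p in patterns]

    selected = []
    for word in words:
        lowered = word.lower()
        if lowered in exclude_set:
            continue  # exclusions win over keywords and patterns
        if lowered in keyword_set or any(rx.fullmatch(word) for rx in compiled):
            selected.append(word)
    return selected


detected = ["Account", "123456789012", "Stack-Set-ID", "000000000000"]
result = select_words(
    detected,
    keywords=["stack-set-id"],
    patterns=[r"\d{12}"],  # a 12-digit number, the shape of an AWS account ID
    exclude=["000000000000"],
)
print(result)  # ['123456789012', 'Stack-Set-ID']
```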
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.