A package for text anonymization and deanonymization (pseudonymization) using Presidio.
- All built-in presidion recognizers
- Sort code recognizer (any 6 digit number seperated by spaces or dashes, OR malformed sequence of 6 numbers which are preceded by the words "sort code" )
The project follows a structured layout to organize the code, configurations, and documentation effectively. Below is an overview of the project structure:
presidio/
├── anonymizer/
│ ├── __init__.py
│ └── anonymizer.py
├── fastapi_app/
│ ├── __init__.py
│ └── main.py
├── notebooks/
│ ├── pseudonomyzation.ipynb
│ ├── SortCode_Anon.ipynb
│ ├── SortCode_Context.ipynb
│ └── Test.ipynb
├── README.md
├── environment.yml
├── setup.py
├── .gitignore
└── tests/
├── __init__.py
├── test_anonymizer.py
└── test_sortcode_recognizer.py
-
anonymizer/
: The main package directory containing the implementation of the anonymizer functionalities.__init__.py
: An initialization file for the package. It makes the directory a Python package and allows you to import thesetup_presidio
function.anonymizer.py
: Contains the implementation of the Presidio setup and custom anonymizer logic.
-
tests/
: Directory containing test cases for the project.__init__.py
: Initialization file for the tests directory.test_anonymizer.py
: Contains test cases for the anonymizer functionalities.test_sortcode_recognizer.py
: Contains test cases for the sort code recognizer.
-
fastapi_app/
: The main directory for the fastapi application__init__.py
: An initialization file for the package.main.py
: FastAPI application containing the endpoints to interact with the presidio package
-
notebooks/
: Directory containing Jupyter notebooks for dev work on presidio components. -
setup.py
: Setup script that packages up the presidio anonymizer code as a pip package. -
README.md
: The main documentation file for the project. It provides an overview, installation instructions, and usage examples. -
environment.yaml
: Conda environment yaml file. -
.gitignore
: Specifies files and directories that should be ignored by Git. This typically includes environment files like.env
.
conda env create -f environment.yml
To build the pip package with the anonymizer code we use the setup.py script, run this command in the project working directory.
python setup.py sdist bdist_wheel
Once the setup.py is complete it will create the dist folder which will contain the package to be uploaded to whatever package repository needed.
Run the following command to run the application locally:
uvicorn main:app --reload
The application will be available at http://127.0.0.1:8000/
API Endpoints:
GET /: Serve the index page.
POST /api/process: Anonymize input text, send it to the OpenAI API, and deanonymize the response.
Example Request:
curl -X POST "http://127.0.0.1:8000/api/process" -H "Content-Type: application/json" -d '{"text": "Your input text here."}'
Example response:
{
"text": "Processed and deanonymized response text."
}