niv1612/pii-challenge-306524120

Personally Identifiable Information (PII) Detector

The Data Direct service enables the sharing of Comma Separated Value (CSV) files. The objective of this assignment is to enhance the service by implementing a sanitization pipeline that removes PII.

Objective

Implement a sanitization pipeline that detects email addresses and keeps files containing them out of the output bucket.

Prerequisites

Instructions

You are provided with two S3 buckets: input and output. The input bucket stores uploaded files, while the output bucket holds sanitized files (objects). The sanitization pipeline should scan each object in the input bucket. If an email address is detected, the object should be moved to the blocked prefix within the input bucket. If no PII is detected, the object should be moved to the output bucket.
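The routing rule above can be sketched roughly as follows. This is a minimal illustration, not the reference solution: it assumes a boto3-style S3 client is passed in as `s3` (for example one pointed at Localstack), that the blocked prefix is spelled `blocked/`, and that `has_pii` is a caller-supplied check. The names `route_object`, `sanitize_bucket`, and `BLOCKED_PREFIX` are hypothetical.

```python
from typing import Callable

BLOCKED_PREFIX = "blocked/"  # assumed spelling of the "blocked" prefix


def route_object(s3, input_bucket: str, output_bucket: str, key: str,
                 has_pii: Callable[[bytes], bool]) -> str:
    """Move one object according to the PII check and return its destination."""
    body = s3.get_object(Bucket=input_bucket, Key=key)["Body"].read()
    if has_pii(body):
        dest_bucket, dest_key = input_bucket, BLOCKED_PREFIX + key
    else:
        dest_bucket, dest_key = output_bucket, key
    # S3 has no native "move": copy to the destination, then delete the source.
    s3.copy_object(Bucket=dest_bucket, Key=dest_key,
                   CopySource={"Bucket": input_bucket, "Key": key})
    s3.delete_object(Bucket=input_bucket, Key=key)
    return f"{dest_bucket}/{dest_key}"


def sanitize_bucket(s3, input_bucket: str, output_bucket: str,
                    has_pii: Callable[[bytes], bool]) -> None:
    """Scan every object in the input bucket, skipping already-blocked ones."""
    paginator = s3.get_paginator("list_objects_v2")
    # Collect keys first so that moving objects does not disturb the listing.
    keys = [obj["Key"]
            for page in paginator.paginate(Bucket=input_bucket)
            for obj in page.get("Contents", [])]
    for key in keys:
        if not key.startswith(BLOCKED_PREFIX):
            route_object(s3, input_bucket, output_bucket, key, has_pii)
```

Passing the client in, rather than creating it inside the functions, keeps the sketch independent of how the Localstack endpoint is configured. Reading the whole object into memory is a simplification that relies on the 64MB size limit.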

Clarification

  1. Each email address occupies its own dedicated value/column in the CSV
  2. No uploaded file is larger than 64MB
  3. Store your implementation in the my-solution Git branch
  4. Use the src directory for the implementation
  5. All status checks must pass before you submit the assignment
  6. You may add your own workflows, but do not modify existing steps
  7. Do not modify the tests directory
  8. We use Localstack; see Localstack AWS services supportability for more information
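Because each email address sits in its own cell, detection can be a per-cell check instead of a free-text search. The sketch below illustrates that idea under stated assumptions: the regular expression is a deliberately simple pattern, not a full address validator, and the function name `contains_email` is hypothetical.

```python
import csv
import io
import re

# Deliberately simple pattern (an assumption, not a full RFC 5322 validator):
# a local part, "@", and a domain containing at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def contains_email(csv_text: str) -> bool:
    """Return True if any CSV cell consists solely of an email address."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row in reader:
        for cell in row:
            # fullmatch: the whole cell must be an address, per the
            # "dedicated value/column" clarification.
            if EMAIL_RE.fullmatch(cell.strip()):
                return True
    return False
```

Since files are capped at 64MB, decoding a whole object to text before calling this function is workable. An address embedded in a longer cell is intentionally not matched here; swapping `fullmatch` for `search` would catch those at the cost of more false positives.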
