From 9289b73824f806230d52f9e28444512052dd4681 Mon Sep 17 00:00:00 2001 From: jzonthemtn Date: Wed, 23 Oct 2024 18:27:14 -0400 Subject: [PATCH] Deployed 96ff2d1 with MkDocs version: 1.6.1 --- policies/filter_strategies/index.html | 16 ++++++++-------- search/search_index.json | 2 +- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/policies/filter_strategies/index.html b/policies/filter_strategies/index.html index 2264b6e..bb8ce88 100644 --- a/policies/filter_strategies/index.html +++ b/policies/filter_strategies/index.html @@ -1708,14 +1708,14 @@

Filter Strategies

Filter Strategies

The filter strategies are described below. Each filter type can specify zero or more filter strategies. When no filter strategies are given, Philter will default to REDACT for that filter type. When multiple filter strategies are given for a single filter type, the filter strategies will be applied in order as they are listed in the policy, top to bottom.

The REDACT Filter Strategy

The REDACT filter strategy replaces sensitive information with a given redaction format. You can put variables in the redaction format that Philter will replace when performing the redaction.
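As a rough illustration of variable substitution in a redaction format, the sketch below expands the %t variable that appears in formats such as {{{REDACTED-%t}}} into a label for the type of sensitive information. The function name render_redaction and the label ssn are assumptions made for this example; this is not Philter's implementation, and other variables are not covered.

```python
def render_redaction(redaction_format: str, type_label: str) -> str:
    """Expand the %t variable into a label for the type of sensitive information.

    Hypothetical helper for illustration only, not part of Philter.
    """
    return redaction_format.replace("%t", type_label)


print(render_redaction("{{{REDACTED-%t}}}", "ssn"))  # {{{REDACTED-ssn}}}
```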

diff --git a/search/search_index.json b/search/search_index.json index 34b34a2..7c45193 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Philter","text":"

This documentation applies to Philter 2.4.0. If you are upgrading to this version see Upgrading.

Philter is an API-based application that finds and redacts sensitive information, such as protected health information (PHI), personally identifiable information (PII), and user-defined sensitive information, from natural language text. Philter is well suited to text processing pipelines where sensitive information needs to be removed, encrypted, or redacted from the text.

"},{"location":"#quick-start","title":"Quick Start","text":"

To get going fast, jump to one of the Quick Starts:

"},{"location":"#open-source","title":"Open Source","text":"

Philter is open source software.

"},{"location":"deidentification/","title":"De-identification Methods","text":"

There are several ways data can be de-identified, and which you use depends on the types of data you want to de-identify and your use-case for de-identifying the data. The terminology around the different methods is often used interchangeably, but there are differences between each method.

In this User's Guide, we may use the terms filter and redact interchangeably.

In Philter, de-identification methods vary for each type of sensitive information. For example, all types can be replaced or redacted, but only dates can be shifted and only zip codes can be truncated. How a de-identification method is applied by Philter is called a filter strategy. Each type of sensitive information can have one or more filter strategies, and the combination of the filter strategies you select is called a policy. A policy determines how a document will be de-identified.

The following is a list of de-identification methods that describes how each method works and its applicability to Philter. De-identifying a document is likely to require a combination of the following methods. For instance, you may want to redact names, encrypt credit card numbers, and shift appointment dates.
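To make two of the methods in this section concrete, here is a minimal sketch of date shifting and of bucketing a date into its year. The function names and the 30-day interval are assumptions chosen for illustration; in Philter these behaviors are configured through a policy rather than written as code.

```python
from datetime import date, timedelta


def shift_date(value: date, days: int) -> date:
    """Shift a date forward (positive days) or backward (negative days)."""
    return value + timedelta(days=days)


def bucket_date_to_year(value: date) -> int:
    """Bucket a date into its year, discarding the month and day."""
    return value.year


appointment = date(2000, 10, 10)
print(shift_date(appointment, -30))      # 2000-09-10
print(bucket_date_to_year(appointment))  # 2000
```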

"},{"location":"deidentification/#summary-of-deidentification-methods","title":"Summary of Deidentification Methods","text":"

| De-identification Method | Description |
| --- | --- |
| Replacement | Replaces sensitive information with a defined value. For example, you might want to replace a credit card number with the literal value \"CREDIT_CARD_NUMBER\". |
| Redaction and Masking | Removes sensitive information. Philter gives you a choice of how to remove the sensitive information, whether it is by replacing it with ***** (masking) or by some other set of characters. |
| Encryption | Encrypts sensitive information. |
| Date Shifting | Shifts dates either forward or backward by some interval. |
| Bucketing | Categorizes data into buckets based on the data. For example, Philter can bucket dates into years and zip codes by population. |

A difference between Philter and other services is that Philter does not send your data to a third party for de-identification. Philter runs in your cloud and your data stays in your cloud.

"},{"location":"deidentification/#deidentification-methods","title":"Deidentification Methods","text":""},{"location":"deidentification/#redaction-and-masking","title":"Redaction and Masking","text":"

Redaction and masking are two methods of de-identification that are often used interchangeably. The term redaction refers to removing a sensitive value from a document. When we hear the term redaction we often think of an image of a document with black bars across pieces of the text.

Masking is similar to redaction but allows for configuring how the sensitive value is removed. The most common example is using asterisks (e.g., ******) in place of a sensitive value.
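The following sketch shows the idea of masking on a single span of text. The helper name mask and its parameters are assumptions for this example only; it replaces each character of the span with a mask character so the length of the text is preserved.

```python
def mask(text: str, start: int, end: int, mask_char: str = "*") -> str:
    """Replace text[start:end] with mask_char, keeping the overall length."""
    return text[:start] + mask_char * (end - start) + text[end:]


sentence = "His SSN was 123-45-6789."
# The SSN occupies character positions 12 (inclusive) to 23 (exclusive).
print(mask(sentence, 12, 23))
```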

"},{"location":"deidentification/#replacement","title":"Replacement","text":"

Replacement is a method of de-identification that simply replaces a sensitive value with another value. Replacement is useful when the sensitive value is not needed once the document has been de-identified. Philter can replace a sensitive value with a preset value or with a random value.

In Philter's filter strategies, replacement is achieved by using the REDACT, STATIC_REPLACE, or RANDOM_REPLACE strategy.

"},{"location":"deidentification/#bucketing","title":"Bucketing","text":""},{"location":"deidentification/#date-shifting","title":"Date Shifting","text":""},{"location":"deidentification/#encryption","title":"Encryption","text":""},{"location":"evaluating-performance/","title":"How to Evaluate Philter's Performance","text":"

A common question we receive is: how well does Philter perform? Our answer is probably less than satisfactory, because it depends. Philter's performance is heavily dependent upon your individual data. Comparing metrics of Philter's performance between different customer datasets is like comparing apples and oranges.

If your data is not exactly like another customer's data then the metrics will not be applicable to your data. In terms of the classic information retrieval metrics precision and recall, comparing these values between customers can give false impressions about Philter's performance, both good and bad.

This guide walks you through how to evaluate Philter's performance. If you are just getting started with Philter, please see the Quick Starts instead. Then you can come back here to learn how to evaluate Philter's performance.

"},{"location":"evaluating-performance/#guide-to-evaluating-performance","title":"Guide to Evaluating Performance","text":"

We have created this guide to help you evaluate Philter's performance on your data. The guide involves determining the types of sensitive information you want to redact, configuring those filters, optimizing the configuration, and then capturing the performance metrics.

If you are using Philter we will gladly perform these steps for you and provide you a detailed Philter performance report generated from your data. Please contact us to start the process.

"},{"location":"evaluating-performance/#what-you-need","title":"What You Need","text":"

To evaluate Philter's performance you need:

"},{"location":"evaluating-performance/#configuring-philter","title":"Configuring Philter","text":"

Before we can begin our evaluation we need to create a policy. A policy is a file that defines the types of sensitive information that will be redacted and how they will be redacted. The policies are stored on the Philter instance under /opt/philter/policies. You can edit the policies directly there using a text editor or you can use Philter's API to upload a policy. In this case we recommend just using a text editor on the Philter instance to create a policy.

When using a text editor to create and edit a policy, be sure to save the policy often. Frequent saving can make editing a policy easier.

We also recommend placing your policy directory under source control so that you have a history and change log of your policies.

"},{"location":"evaluating-performance/#creating-a-policy","title":"Creating a Policy","text":"

Make a copy of the default policy, and we will modify the copy for our needs.

cp /opt/philter/policies/default.json /opt/philter/policies/evaluation.json

Now open /opt/philter/policies/evaluation.json in a text editor. (The content of evaluation.json will be similar to what's shown below but may have minor differences between different versions of Philter.)

{\n   \"name\": \"default\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The first thing we need to do is to set the name of the policy. Replace default with evaluation and save the file.

"},{"location":"evaluating-performance/#identifying-the-filters-you-need","title":"Identifying the Filters You Need","text":"

The rest of the file contains the filters that are enabled in the default policy. We need to make sure that each type of sensitive information that you want to redact is represented by a filter in this file. Look through the rest of the policy and determine which filters are listed that you do not need and also which filters you do need that are not listed.

"},{"location":"evaluating-performance/#disabling-filters-we-do-not-need","title":"Disabling Filters We Do Not Need","text":"

If a filter is listed in the policy and you do not need it, you have two options. You can either delete those lines from the policy and save the file, or you can set the filter's enabled property to false. Both options have the same effect, but using the enabled property lets you keep the filter configuration in the policy in case it is needed later.
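The second option can be sketched as a small script that edits a policy loaded as JSON. This example assumes the enabled property sits directly on the filter's object under identifiers; the helper name disable_filter is hypothetical, so check the filter's documentation for the exact placement before relying on it.

```python
import json


def disable_filter(policy: dict, filter_name: str) -> dict:
    """Return a copy of the policy with the named filter's enabled property set to False."""
    updated = json.loads(json.dumps(policy))  # deep copy via a JSON round trip
    # Assumption: "enabled" is a property of the filter object itself.
    updated["identifiers"][filter_name]["enabled"] = False
    return updated


policy = {
    "name": "evaluation",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [
                {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
            ]
        }
    },
}
print(json.dumps(disable_filter(policy, "emailAddress"), indent=3))
```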

"},{"location":"evaluating-performance/#enabling-filters-not-in-the-default-policy","title":"Enabling Filters Not in the Default Policy","text":"

Let's say you want to redact bitcoin addresses. The bitcoin address filter is not in the default policy. To add the bitcoin address filter we will refer to Philter's documentation on the bitcoin address filter, get the configuration, and copy it into the policy.

From the bitcoin address filter documentation we see the configuration for the bitcoin address filter is:

      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n

We can copy this configuration and paste it into our policy:

{\n   \"name\": \"evaluation\",\n   \"identifiers\": {\n      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The order of the filters in the policy does not matter and has no impact on performance. We typically place the filters in the policy alphabetically just to improve readability.

Repeat these steps until you have added a filter for each of the types of sensitive information you want to redact. Typically, the default redaction strategy and redactionFormat values for each filter should be fine for evaluation.
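If you prefer to script these edits, the sketch below adds a filter with the default REDACT strategy and keeps the filters in alphabetical order. It assumes the strategies key always follows the pattern seen in the examples above (the filter name followed by FilterStrategies); the helper name add_filter is hypothetical.

```python
import json


def add_filter(policy: dict, filter_name: str,
               redaction_format: str = "{{{REDACTED-%t}}}") -> dict:
    """Return a copy of the policy with a REDACT filter added, filters sorted by name."""
    updated = json.loads(json.dumps(policy))  # deep copy via a JSON round trip
    # Assumption: the strategies key is "<filterName>FilterStrategies".
    updated["identifiers"][filter_name] = {
        filter_name + "FilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": redaction_format}
        ]
    }
    # Order has no effect on filtering; sorting only improves readability.
    updated["identifiers"] = dict(sorted(updated["identifiers"].items()))
    return updated


policy = {"name": "evaluation", "identifiers": {"emailAddress": {}, "phoneNumber": {}}}
print(json.dumps(add_filter(policy, "bitcoinAddress"), indent=3))
```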

When finished modifying the policy, save the file and close the text editor. Now restart Philter for the policy changes to be loaded:

sudo systemctl restart philter\n
"},{"location":"evaluating-performance/#submitting-text-for-redaction","title":"Submitting Text for Redaction","text":"

With our policy in place we can now send text to Philter for redaction using that policy:

PhilterConfiguration philterConfiguration = ConfigFactory.create(PhilterConfiguration.class);\n\nFilterService filterService = new PhilterFilterService(philterConfiguration);\n\nFilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.TEXT_PLAIN);\n

The explain API endpoint produces a detailed description of the redaction. The response will include a list of spans that contain the start and stop positions of redacted text and the type of sensitive information that was redacted. Using this information we can compare the redacted information to our annotated file to calculate precision and recall metrics.
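As a sketch of how the spans could be pulled out of an explain response for comparison, the snippet below reads the appliedSpans list and keeps the start position, end position, and filter type of each span. The field names match the example explain response in the Filtering API documentation; the function name applied_spans and the trimmed sample response are illustrative.

```python
import json


def applied_spans(explain_response: str) -> list:
    """Return (characterStart, characterEnd, filterType) for each applied span."""
    body = json.loads(explain_response)
    return [
        (span["characterStart"], span["characterEnd"], span["filterType"])
        for span in body["explanation"]["appliedSpans"]
    ]


sample = """
{
  "filteredText": "{{{REDACTED-entity}}} was a patient and his ssn was {{{REDACTED-ssn}}}.",
  "explanation": {
    "appliedSpans": [
      {"characterStart": 0, "characterEnd": 17, "filterType": "NER_ENTITY"},
      {"characterStart": 48, "characterEnd": 59, "filterType": "SSN"}
    ],
    "ignoredSpans": []
  }
}
"""
print(applied_spans(sample))
```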

"},{"location":"evaluating-performance/#calculating-precision-and-recall","title":"Calculating Precision and Recall","text":"

Now we can calculate the precision and recall metrics.
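A minimal sketch of the calculation follows. It assumes spans are compared by exact match on start position, end position, and type, which is only one possible matching rule; partial-overlap matching would give different numbers. The spans in the example are made up for illustration.

```python
def precision_recall(predicted: set, annotated: set) -> tuple:
    """Compute precision and recall using exact-match spans.

    precision = true positives / spans Philter redacted
    recall    = true positives / spans in the annotated file
    """
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical spans: (start, end, type)
predicted = {(0, 17, "NER_ENTITY"), (48, 59, "SSN")}
annotated = {(0, 17, "NER_ENTITY"), (48, 59, "SSN"), (80, 91, "SSN")}
precision, recall = precision_recall(predicted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example every redacted span is correct, so precision is 1.0, but one annotated span was missed, so recall is two thirds.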

"},{"location":"monitoring_and_logging/","title":"Monitoring and Logging","text":""},{"location":"monitoring_and_logging/#service-management","title":"Service Management","text":"

Philter installs itself as a system service. The service can be controlled using the commands:

sudo systemctl stop philter\nsudo systemctl start philter\nsudo systemctl restart philter\nsudo systemctl status philter\n

Philter is installed in the /opt/philter directory. This directory contains the Philter binaries, configuration files, and supporting files.

"},{"location":"monitoring_and_logging/#metrics","title":"Metrics","text":"

Philter collects metrics while running to provide insights into its operation and the text being processed. The metrics collected include a count of the documents processed by Philter, counts of the types of sensitive information identified per type, and the entity confidence values of entities extracted by non-deterministic natural language processing methods. These metrics can be reported via JMX and to external services (Prometheus, Amazon CloudWatch, and Datadog).

"},{"location":"monitoring_and_logging/#reporting-metrics-to-prometheus","title":"Reporting Metrics to Prometheus","text":"

To enable Philter metric reporting to Prometheus, modify Philter's Settings to enable the Prometheus metrics. When enabled, the metrics HTTP endpoint will be http://philter-ip:9100/metrics.

Enable scraping of Philter's metrics in Prometheus' settings:

global:\n  scrape_interval: 10s\n\nscrape_configs:\n- job_name: philter\n  static_configs:\n  - targets: ['10.0.2.104:9100']\n

You may need to make port 9100 accessible to Prometheus. For example, if you launch Philter in AWS you will need to modify Philter's security group to permit inbound network traffic on port 9100 from Prometheus.

"},{"location":"monitoring_and_logging/#reporting-metrics-to-amazon-cloudwatch","title":"Reporting Metrics to Amazon CloudWatch","text":"

To enable Philter metric reporting to Amazon CloudWatch, modify Philter's Settings to set the AWS properties. When enabled, metrics are published to CloudWatch every 60 seconds by default.

The AWS IAM user or role being used should have PutMetricData permissions:

{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Sid\": \"VisualEditor0\",\n            \"Effect\": \"Allow\",\n            \"Action\": [\n                \"cloudwatch:PutMetricData\"\n            ],\n            \"Resource\": \"*\"\n        }\n    ]\n}\n

The metrics will be published to the Amazon CloudWatch namespace provided in Philter's settings. Amazon CloudWatch can then be used to visualize the metrics, set performance alarms, or perform other integrations with AWS services.

"},{"location":"monitoring_and_logging/#reporting-metrics-to-datadog","title":"Reporting Metrics to Datadog","text":"

Metrics will be published to Datadog every 60 seconds when enabled.

Metrics published to Datadog will have a philter prefix.

The metrics can be used to make graphs and dashboards.

"},{"location":"monitoring_and_logging/#reporting-metrics-to-jmx","title":"Reporting Metrics to JMX","text":"

Metrics in JMX can be viewed using VisualVM or a similar tool.

"},{"location":"monitoring_and_logging/#metrics-collected-and-reported","title":"Metrics Collected and Reported","text":"

The listing below shows an example of the metrics Philter collects and writes to standard out while running. The metrics reported to supported services such as JMX, Amazon CloudWatch and Datadog will contain the same metrics but may be represented or visualized differently between the services.

The metrics collected include:

These metrics will be reset when Philter is stopped and restarted.

"},{"location":"monitoring_and_logging/#logging","title":"Logging","text":"

Philter's log file can be viewed using the command journalctl -u philter. This log should be the first place checked for more information on Philter's status.

The log level can be set using the logging.level.root property in Philter's Settings.

Philter's log file may contain sensitive information. It is possible that through the normal use of Philter, sensitive information may be written to the log file.

"},{"location":"pii_phi_nppi/","title":"PII, PHI, and NPPI","text":"

Philter can redact many predefined types of sensitive information through filters. Each type of predefined sensitive information is described below.

"},{"location":"pii_phi_nppi/#predefined-types-of-pii-in-philter","title":"Predefined Types of PII in Philter","text":"

The types of sensitive information that Philter will identify are customizable. For example, if you are not interested in VIN numbers you can have Philter ignore them. This configuration is performed through Policies.

Because Philter only operates on text, the biometric identifiers and face images outlined in the HIPAA regulations as PHI are not applicable to Philter. The types of sensitive information and how Philter identifies each one are listed in the table below.

| # | Type of PHI | How Philter Identifies It |
| --- | --- | --- |
| 1 | Names | Ex: John Smith, Jane Doe |
| 2 | All geographical identifiers smaller than a state, except for the initial three digits of a zip code if, according to the current publicly available data from the U.S. Bureau of the Census: the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000 | Ex: 85055, 90213-1544 |
| 3 | Dates (other than year) directly related to an individual | Ex: 10-10-2000, 10/10/2000, October 10, 2000 |
| 4 | Phone Numbers | Ex: (304) 555-5555, 304-555-5555, 1-800-123-4567 |
| 5 | Fax numbers | Ex: (304) 555-5555, 304-555-5555, 1-800-123-4567 |
| 6 | Email addresses | Ex: john.fake.address@hotmail.com |
| 7 | Social Security numbers | Ex: 123-45-6789, 123456789 |
| 8 | Medical record numbers | Ex: 86637729, AB473-6021, 473-6AB021 |
| 9 | Health insurance beneficiary numbers | Ex: 86637729, AB473-6021, 473-6AB021 |
| 10 | Account numbers | Ex: 86637729, AB473-6021, 473-6AB021 |
| 11 | Certificate/license numbers | Ex: 86637729, AB473-6021, 473-6AB021 |
| 12 | Vehicle identifiers and serial numbers, including license plate numbers | Ex: WBAPM7G50ANL19218, 1GBJC34K3RE176005 |
| 13 | Device identifiers and serial numbers | Ex: H3SNPUHYEE7JD3H, 33778376 |
| 14 | Web Uniform Resource Locators (URLs) | Ex: myhomepage.com, http://myhomepage.com/folder/page.html, www.myhomepage.com/folder/page.html |
| 15 | Internet Protocol (IP) address numbers | Ex: 127.0.0.1, 192.168.3.58, 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
| 16 | Biometric identifiers, including finger, retinal and voice prints | |
| 17 | Full face photographic images and any comparable images | |
| 18 | Any other unique identifying number, characteristic, or code except the unique code assigned by the investigator to code the data | Ex: 86637729, AB473-6021, 473-6AB021 |
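The zip code rule in row 2 of the table above can be sketched as follows. The population figures are made-up placeholders, not census data, and the function name truncate_zip is hypothetical; the sketch only illustrates the 20,000-person threshold and is not Philter's implementation.

```python
def truncate_zip(zip_code: str, population_by_prefix: dict) -> str:
    """Keep the first three digits of a zip code only when that three-digit
    area contains more than 20,000 people; otherwise return 000."""
    prefix = zip_code[:3]
    # Unknown prefixes are treated as small areas, the more cautious choice.
    if population_by_prefix.get(prefix, 0) > 20000:
        return prefix
    return "000"


# Placeholder populations for illustration only.
populations = {"850": 50000, "902": 15000}
print(truncate_zip("85055", populations))       # 850
print(truncate_zip("90213-1544", populations))  # 000
```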

"},{"location":"settings/","title":"Settings","text":"

Phileas has settings to control how it operates. The settings and how to configure each are described below.

The types of sensitive information that Phileas identifies are defined in filter policies, outside of the configuration properties described on this page.

"},{"location":"settings/#configuring-phileas","title":"Configuring Phileas","text":""},{"location":"settings/#the-phileas-settings-file","title":"The Phileas Settings File","text":"

Phileas looks for its settings in an application.properties file.

"},{"location":"settings/#using-environment-variables","title":"Using Environment Variables","text":"

Properties set via environment variables take precedence over properties set in Phileas' settings file.

All of the following properties can also be set as environment variables by prepending PHILTER_ to the property name, changing periods to underscores, and using upper case. For example, the property filter.profiles.directory can be set using the environment variable PHILTER_FILTER_PROFILES_DIRECTORY by:

export PHILTER_FILTER_PROFILES_DIRECTORY=/profiles/\n

Using environment variables to configure Phileas instead of using Phileas' settings file can allow for easier configuration management when deploying Phileas.
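The naming rule and the precedence rule described above can be sketched as follows. This only illustrates the stated behavior and is not Phileas' actual configuration code; the function names to_env_var and resolve_setting are assumptions for this example.

```python
def to_env_var(property_name: str) -> str:
    """Derive the environment variable name for a property:
    prepend PHILTER_, change periods to underscores, and upper-case."""
    return "PHILTER_" + property_name.replace(".", "_").upper()


def resolve_setting(property_name: str, file_properties: dict, environ: dict):
    """Return the environment value if set, else the settings file value."""
    env_name = to_env_var(property_name)
    if env_name in environ:
        return environ[env_name]
    return file_properties.get(property_name)


print(to_env_var("filter.profiles.directory"))  # PHILTER_FILTER_PROFILES_DIRECTORY
```

In a real deployment you would pass os.environ as the environ argument; a plain dictionary is used here so the example can run without changing your environment.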

"},{"location":"settings/#policies","title":"Policies","text":"

| Setting | Description | Allowed Values | Default Value |
| --- | --- | --- | --- |
| filter.policies.directory | The directory in which to look for policies. | Any valid directory path. | ./policies/ |

"},{"location":"settings/#span-disambiguation","title":"Span Disambiguation","text":"

These values configure Phileas' span disambiguation feature to determine the most appropriate type of sensitive information when duplicate spans are identified. In a deployment of multiple Phileas instances, you must enable the cache service for span disambiguation to work as expected.

| Setting | Description | Allowed Values | Default Value |
| --- | --- | --- | --- |
| span.disambiguation.enabled | Whether or not to enable span disambiguation. | true, false | false |

"},{"location":"settings/#cache-service","title":"Cache Service","text":"

The cache service is required to use consistent anonymization and policies stored in Amazon S3. Phileas supports Redis as the backend cache. When Redis is not used, an in-memory cache is used instead. The in-memory cache is not recommended because all contents will be stored in memory on the local Phileas instance.

The cache will contain sensitive information. It is important that you take the necessary precautions to secure the cache itself and all communication between Phileas and the cache.

| Setting | Description | Allowed Values | Default Value |
| --- | --- | --- | --- |
| cache.redis.enabled | Whether or not to use Redis as the cache. | true, false | false |
| cache.redis.host | The hostname or IP address of the Redis cache. | Any valid Redis endpoint. | None |
| cache.redis.port | The Redis cache port. | Any valid port. | 6379 |
| cache.redis.auth.token | The Redis auth token. | Any valid token. | None |
| cache.redis.ssl | Whether or not to use SSL for communication with the Redis cache. | true, false | false |

The following Redis settings are only required when using a self-signed SSL certificate.

| Setting | Description | Allowed Values | Default Value |
| --- | --- | --- | --- |
| cache.redis.truststore | The path to the trust store. | Any valid file path. | None |
| cache.redis.truststore.password | The trust store password. | Any valid file path. | None |
| cache.redis.keystore | The path to the keystore. | Any valid file path. | None |
| cache.redis.keystore.password | The keystore password. | Any valid file path. | None |

"},{"location":"settings/#advanced-settings","title":"Advanced Settings","text":"

In most cases the settings below do not need to be changed. Contact us for more information on any of these settings.

| Setting | Description | Allowed Values | Default Value |
| --- | --- | --- | --- |
| ner.timeout.sec | Controls the timeout in seconds when performing named entity recognition. Longer text may require longer processing times. | An integer value. | 600 |
| ner.max.idle.connections | The maximum number of idle connections to maintain for the named entity recognition. More connections may improve performance in some cases. | An integer value. | 30 |
| ner.keep.alive.duration.ms | The amount of time in milliseconds to keep named entity recognition connections alive. Longer text may require longer processing times. | An integer value. | 60 |

"},{"location":"system_requirements/","title":"System Requirements","text":"

When launched from a cloud marketplace, Philter is pre-configured and contains all required dependencies.

Philter requires the following:

"},{"location":"upgrading/","title":"Upgrading Philter","text":"

We recommend reviewing the Philter Release Notes prior to upgrading.

"},{"location":"upgrading/#upgrading-from-a-2x-version","title":"Upgrading from a 2.x Version","text":"

Upgrading Philter to the newest version requires moving Philter's configuration to the new version of Philter. To upgrade Philter from a 2.x version, follow the steps below.

  1. Launch a new instance of the newest version of Philter.
  2. Copy your policies from /opt/philter/policies to the new instance.
  3. Copy your /opt/philter/philter.properties to the new instance, replacing the properties file there.
  4. Copy your /opt/philter/philter-ui.properties to the new instance, replacing the properties file there.
  5. If you have configured any SSL certificates for Philter, copy those files over to the new instance.
  6. Restart Philter: sudo systemctl restart philter.service && sudo systemctl restart philter-ui.service && sudo systemctl restart philter-ner.service
  7. Test the new Philter virtual machine to make sure it is behaving as expected.
  8. Decommission the old Philter instance.
"},{"location":"upgrading/#upgrading-from-a-1x-version","title":"Upgrading from a 1.x Version","text":"

Upgrading Philter to the newest version requires moving Philter's configuration to the new version of Philter. To upgrade Philter from a 1.x version, follow the steps below.

  1. Make local copies of your current Philter's properties files:
     - /opt/philter/philter.properties (prior to 1.10.1 the filename was /opt/philter/application.properties)
     - /opt/philter/philter-ui.properties (not applicable prior to version 1.10)
  2. Launch a new instance of the newest version of Philter.
  3. Replace the new virtual machine's properties files with your copies from step 1.
  4. Restart Philter: sudo systemctl restart philter.service && sudo systemctl restart philter-ui.service && sudo systemctl restart philter-ner.service
  5. Test the new Philter virtual machine to make sure it is behaving as expected.
  6. Decommission the old Philter instance.
"},{"location":"api_and_sdks/api/","title":"API","text":"

Philter's API is divided into three parts: the Filtering API, the Policies API, and the Alerts API.

The Philter SDKs provide convenient methods for using Philter's API methods for various programming languages.

"},{"location":"api_and_sdks/api/#securing-philters-api","title":"Securing Philter's API","text":"

Philter's API supports one-way and two-way SSL/TLS authentication. See the settings for more information.

"},{"location":"api_and_sdks/sdks/","title":"Client SDKs","text":"

Philter SDKs are available for use in your projects. The SDKs are licensed under the Apache License, version 2.0. Refer to the GitHub projects below for your language of choice for usage examples.

"},{"location":"api_and_sdks/api/alerts_api/","title":"Alerts API","text":"

The Alerts API provides endpoints for retrieving and deleting alerts. Alerts can optionally be generated when a filter strategy's condition is met. See Alerts for more information on Philter alerts.

The curl example commands shown on this page are written assuming Philter has been enabled for SSL and it is using a self-signed certificate. If launched from a cloud marketplace, SSL will be enabled automatically with a self-signed SSL certificate. See the SSL/TLS settings for more information.

"},{"location":"api_and_sdks/api/alerts_api/#get-alerts","title":"Get Alerts","text":"

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/alerts | Get alerts. |

Example request:

curl -k https://localhost:8080/api/alerts\n
"},{"location":"api_and_sdks/api/alerts_api/#delete-an-alert","title":"Delete an Alert","text":"

| Method | Endpoint | Description |
| --- | --- | --- |
| DELETE | /api/alerts/{alertId} | Delete an alert, where alertId is the ID of the alert to delete. |

Example request to delete an alert with id 12345:

curl -k -X DELETE https://localhost:8080/api/alerts/12345\n
"},{"location":"api_and_sdks/api/filtering_api/","title":"Filtering API","text":"

Philter's filtering API provides access to Philter's ability to filter sensitive information from text and to retrieve the health status of Philter.

The curl example commands shown on this page are written assuming Philter has been enabled for SSL and it is using a self-signed certificate. If launched from a cloud marketplace, SSL will be enabled automatically with a self-signed SSL certificate. See the SSL/TLS settings for more information.

Each filter request can optionally have a context. When not provided, the context defaults to none. Contexts provide a means for logically grouping your documents during filtering. For example, documents pertaining to one health care provider may be submitted under the context hospital1, and documents pertaining to another health care provider may be submitted under the context hospital2.

The context for each filter request impacts how sensitive information is replaced when found in the text. Consistent anonymization can be enabled at either the context or document level. When enabled at the context level, all instances of a given piece of sensitive information will be replaced consistently by the same value. This allows for maintaining meaning across all documents in the context.
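To illustrate what consistent replacement within a context means, here is a minimal sketch that remembers the replacement chosen for each value per context. It uses an in-memory dictionary and random tokens purely for illustration; the class name ConsistentReplacer, the ID- prefix, and the storage choice are assumptions and do not describe how Philter generates or stores replacement values.

```python
import secrets


class ConsistentReplacer:
    """Return the same replacement for the same value within a context."""

    def __init__(self):
        # Maps (context, original value) to the replacement chosen for it.
        self._replacements = {}

    def replace(self, context: str, value: str) -> str:
        key = (context, value)
        if key not in self._replacements:
            # Hypothetical replacement scheme: a random hexadecimal token.
            self._replacements[key] = "ID-" + secrets.token_hex(4)
        return self._replacements[key]


replacer = ConsistentReplacer()
first = replacer.replace("hospital1", "George Washington")
second = replacer.replace("hospital1", "George Washington")
print(first == second)  # True
```

Because the mapping holds the original sensitive values, any real store used for this purpose needs the same protection as the documents themselves.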

Each filter request submitted to Philter is automatically assigned a document identifier. The document identifier is an alphanumeric value unique to that request. No two documents should be assigned the same document identifier. The document identifier is returned in the x-document-id header with each filter or explain API response.
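As a sketch of reading the document identifier from a client, the example below posts plain text to the /api/filter endpoint described below and returns the x-document-id response header along with the redacted text. The base URL matches the curl examples on this page; the function names are assumptions, and certificate verification is disabled only to mirror the curl -k flag used with a self-signed certificate.

```python
import ssl
import urllib.request


def build_filter_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the filter endpoint with a plain text body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/filter",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def filter_text(base_url: str, text: str) -> tuple:
    """Send the text for filtering; return (redacted text, document identifier)."""
    # Skip certificate verification, like curl -k. Do not do this with a trusted CA.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_filter_request(base_url, text)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8"), response.headers.get("x-document-id")


# Example usage against a running Philter instance:
# redacted, document_id = filter_text("https://localhost:8080", "His SSN was 123-45-6789.")
```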

"},{"location":"api_and_sdks/api/filtering_api/#filter","title":"Filter","text":"

The filter endpoint receives plain text or a PDF document and returns the redacted text or redacted PDF document.

The types of sensitive information found and how each type is redacted are determined by the chosen policy.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/filter | Filter the given text. |

"},{"location":"api_and_sdks/api/filtering_api/#query-parameters","title":"Query Parameters","text":""},{"location":"api_and_sdks/api/filtering_api/#headers","title":"Headers","text":"

Example request to filter plain text:

curl -k -X POST \"https://localhost:8080/api/filter\" -d @file.txt -H \"Content-Type: text/plain\"\n

Example request to filter a PDF document:

curl -k -X POST \"https://localhost:8080/api/filter?\" --data-binary @file.pdf -H \"Content-Type: application/pdf\" -o redacted.zip\n
"},{"location":"api_and_sdks/api/filtering_api/#explain","title":"Explain","text":"

The explain endpoint behaves much like the filter endpoint in that it receives plain text and returns the redacted plain text. However, the explain endpoint also provides a detailed explanation describing how the text was redacted, and it does not support PDF documents.

The types of sensitive information found and how each type is redacted are determined by the chosen policy.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/explain | Filter the given text and provide a detailed explanation. |

"},{"location":"api_and_sdks/api/filtering_api/#query-parameters_1","title":"Query Parameters","text":""},{"location":"api_and_sdks/api/filtering_api/#headers_1","title":"Headers","text":"

Example explain request:

curl -k -X POST \"https://localhost:8080/api/explain\" -d @file.txt -H \"Content-Type: text/plain\"\n

Example explain response:

{\n  \"filteredText\": \"{{{REDACTED-entity}}} was a patient and his ssn was {{{REDACTED-ssn}}}.\",\n  \"context\": \"none\",\n  \"documentId\": \"7a906866-4fc9-44d6-9bc3-22728b93a602\",\n  \"explanation\": {\n    \"appliedSpans\": [\n      {\n        \"id\": \"c78fb69c-84d6-4189-b376-63791793cbd2\",\n        \"characterStart\": 0,\n        \"characterEnd\": 17,\n        \"filterType\": \"NER_ENTITY\",\n        \"context\": \"C1\",\n        \"documentId\": \"7a906866-4fc9-44d6-9bc3-22728b93a602\",\n        \"confidence\": 0.9189682900905609,\n        \"text\": \"George Washington\",\n        \"replacement\": \"{{{REDACTED-entity}}}\",\n        \"ignored\": false\n      },\n      {\n        \"id\": \"f4556f62-2f80-4edc-96f0-aa1d44802157\",\n        \"characterStart\": 48,\n        \"characterEnd\": 59,\n        \"filterType\": \"SSN\",\n        \"context\": \"C1\",\n        \"documentId\": \"7a906866-4fc9-44d6-9bc3-22728b93a602\",\n        \"confidence\": 1,\n        \"text\": \"123-45-6789\",\n        \"replacement\": \"{{{REDACTED-ssn}}}\",\n        \"ignored\": false\n      }\n    ],\n    \"ignoredSpans\": []\n  }\n}\n
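The offsets in the explain response can be checked against the input text. The following is a minimal Python sketch, not part of Philter, that re-applies the appliedSpans from the example response above. The original sentence is an assumption inferred from the span offsets and span text, and apply_spans is a hypothetical helper name.

```python
# Sketch: rebuild the filtered text from the spans of an explain response.
# Span values are copied from the example response; the original sentence
# is inferred from the offsets and is an assumption.

def apply_spans(text, spans):
    # Replace from the end of the text backwards so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s['characterStart'], reverse=True):
        start, end = span['characterStart'], span['characterEnd']
        text = text[:start] + span['replacement'] + text[end:]
    return text

original = 'George Washington was a patient and his ssn was 123-45-6789.'
applied_spans = [
    {'characterStart': 0, 'characterEnd': 17, 'text': 'George Washington',
     'replacement': '{{{REDACTED-entity}}}'},
    {'characterStart': 48, 'characterEnd': 59, 'text': '123-45-6789',
     'replacement': '{{{REDACTED-ssn}}}'},
]

# Each span's offsets should select exactly the text the span reports.
for span in applied_spans:
    assert original[span['characterStart']:span['characterEnd']] == span['text']

filtered = apply_spans(original, applied_spans)
print(filtered)
```

The sketch treats characterEnd as exclusive, which is consistent with the offsets in the example response.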
"},{"location":"api_and_sdks/api/filtering_api/#status","title":"Status","text":"

The status endpoint is useful in determining the current state of Philter. The status endpoint can be used by monitoring software to assess Philter's availability or by your cloud provider for purposes of determining Philter's health when deployed behind a load balancer.

Method Endpoint Description GET /api/status Gets the status of Philter.

Example request:

curl -k \"https://localhost:8080/api/status\"\n
"},{"location":"api_and_sdks/api/policies_api/","title":"Policies API","text":"

The Policies API provides endpoints for retrieving, uploading, and deleting policies.

The curl example commands shown on this page are written assuming Philter has been enabled for SSL and it is using a self-signed certificate. If launched from a cloud marketplace, SSL will be enabled automatically with a self-signed SSL certificate. See the SSL/TLS settings for more information.

"},{"location":"api_and_sdks/api/policies_api/#get-policy-names","title":"Get Policy Names","text":"Method Endpoint Description GET /api/policies Get the names of all policies.

Example request:

curl -k https://localhost:8080/api/policies\n
"},{"location":"api_and_sdks/api/policies_api/#get-a-policy","title":"Get a Policy","text":"Method Endpoint Description GET /api/policies/{policyName} Get the content of a policy, where {policyName} is the name of the policy to get.

Example request:

curl -k https://localhost:8080/api/policies/my-policy\n

Example response:

{\n  \"name\": \"just-phone-numbers\",\n  \"ignored\": [\n  ],\n  \"identifiers\": {\n    \"dictionaries\": [\n    ],\n    \"phoneNumber\": {\n      \"phoneNumberFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"api_and_sdks/api/policies_api/#upload-a-policy","title":"Upload a Policy","text":"Method Endpoint Description PUT /api/policies/{policyName} Upload a policy, where {policyName} is the name of the policy to get. If a policy with this name already exists it will be overwritten.

Example request:

curl -X PUT -H \"Content-Type: application/json\" -k https://localhost:8080/api/policies/my-policy -d @policy.json\n
"},{"location":"api_and_sdks/api/policies_api/#delete-a-policy","title":"Delete a Policy","text":"Method Endpoint Description DELETE /api/policies/{policyName} Delete a policy, where {policyName} is the name of the policy to delete.

Example request:

curl -X DELETE -k https://localhost:8080/api/policies/exprofile\n
"},{"location":"howtos/apache_proxy/","title":"How to Use an Apache Reverse Proxy with Philter","text":"

Running the Apache web server in front of Philter has a few benefits. For example, you can use Apache's authentication mechanisms to control who can access Philter's API, terminate SSL at Apache, and use Apache's logs for access statistics.

When terminating SSL at Apache, make sure that the Apache reverse proxy and Philter are running on the same host so unencrypted traffic is not sent over the network. To install and configure Apache on CentOS, RHEL, and Amazon Linux, follow the steps below. First, install Apache:

sudo yum install httpd\n

Create the Philter configuration by creating a configuration file at /etc/httpd/conf.d/philter.conf:

<VirtualHost *:80>\n\n  ProxyPreserveHost On\n  ServerName philter.mydomain.com\n\n  LogLevel warn\n  ErrorLog logs/philter.mydomain.com-error_log\n  CustomLog logs/philter.mydomain.com-access_log combined\n\n  <Location />\n    ProxyPass http://localhost:8080/\n    ProxyPassReverse http://localhost:8080/\n  </Location>\n\n</VirtualHost>\n

Start Apache:

sudo systemctl start httpd\n

Make sure it started successfully:

sudo systemctl status httpd\n

Set the Apache service to start automatically:

sudo systemctl enable httpd\n

Verify you can access Philter through the reverse proxy:

curl http://philter.mydomain.com/api/status\n
"},{"location":"howtos/evaluate_performance/","title":"How to Evaluate Philter's Performance","text":"

A common question we receive is how well Philter performs. Our answer is probably less than satisfactory: it depends. Philter's performance is heavily dependent upon your individual data. Comparing metrics of Philter's performance between different customer datasets is like comparing apples and oranges.

If your data is not exactly like another customer's data then the metrics will not be applicable to your data. In terms of the classic information retrieval metrics precision and recall, comparing these values between customers can give false impressions about Philter's performance, both good and bad.

This guide walks you through how to evaluate Philter's performance. If you are just getting started with Philter please see the Quick Starts instead. Then you can come back here to learn how to evaluate Philter's performance.

"},{"location":"howtos/evaluate_performance/#guide-to-evaluating-performance","title":"Guide to Evaluating Performance","text":"

We have created this guide to help you evaluate Philter's performance on your data. The guide involves determining the types of sensitive information you want to redact, configuring those filters, optimizing the configuration, and then capturing the performance metrics.

We will gladly perform these steps for you and provide you a detailed Philter performance report generated from your data. Please contact us to start the process.

"},{"location":"howtos/evaluate_performance/#what-you-need","title":"What You Need","text":"

To evaluate Philter's performance you need:

"},{"location":"howtos/evaluate_performance/#configuring-philter","title":"Configuring Philter","text":"

Before we can begin our evaluation we need to create a policy. A policy is a file that defines the types of sensitive information that will be redacted and how it will be redacted. The policies are stored on the Philter instance under /opt/philter/policies. You can edit the policies directly there using a text editor or you can use Philter's API to upload a policy. In this case we recommend just using a text editor on the Philter instance to create a policy.

When using a text editor to create and edit a policy, be sure to save the policy often. Frequent saving can make editing a policy easier.

We also recommend placing your policy directory under source control to keep a history and change log of your policies.

"},{"location":"howtos/evaluate_performance/#creating-a-policy","title":"Creating a Policy","text":"

Make a copy of the default policy and we will modify the copy for our needs.

cp /opt/philter/policies/default.json /opt/philter/policies/evaluation.json

Now open /opt/philter/policies/evaluation.json in a text editor. (The content of evaluation.json will be similar to what's shown below but may have minor differences between different versions of Philter.)

{\n   \"name\": \"default\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The first thing we need to do is to set the name of the policy. Replace default with evaluation and save the file.

"},{"location":"howtos/evaluate_performance/#identifying-the-filters-you-need","title":"Identifying the Filters You Need","text":"

The rest of the file contains the filters that are enabled in the default policy. We need to make sure that each type of sensitive information that you want to redact is represented by a filter in this file. Look through the rest of the policy and determine which filters are listed that you do not need and also which filters you do need that are not listed.

"},{"location":"howtos/evaluate_performance/#disabling-filters-we-do-not-need","title":"Disabling Filters We Do Not Need","text":"

If a filter is listed in the policy and you do not need the filter you have two options. You can either delete those lines from the policy and save the file, or you can set the filter's enabled property to false. Using the enabled property allows you to keep the filter configuration in the policy in case it is needed later but both options have the same effect.

"},{"location":"howtos/evaluate_performance/#enabling-filters-not-in-the-default-policy","title":"Enabling Filters Not in the Default Policy","text":"

Let's say you want to redact bitcoin addresses. The bitcoin address filter is not in the default policy. To add the bitcoin address filter we will refer to Philter's documentation on the bitcoin address filter, get the configuration, and copy it into the policy.

From the bitcoin address filter documentation we see the configuration for the bitcoin address filter is:

      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n

We can copy this configuration and paste it into our policy:

{\n   \"name\": \"evaluation\",\n   \"identifiers\": {\n      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The order of the filters in the policy does not matter and has no impact on performance. We typically place the filters in the policy alphabetically just to improve readability.
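If you prefer to script this edit rather than paste by hand, the following is a minimal Python sketch of the same step. It works on an in-memory copy of the policy shown above and is only an illustration. Reading from and writing to /opt/philter/policies/evaluation.json is left out.

```python
import json

# The evaluation policy before the bitcoin address filter is added.
policy = {
    'name': 'evaluation',
    'identifiers': {
        'phoneNumber': {
            'phoneNumberFilterStrategies': [
                {'strategy': 'REDACT', 'redactionFormat': '{{{REDACTED-%t}}}'}
            ]
        },
        'emailAddress': {
            'emailAddressFilterStrategies': [
                {'strategy': 'REDACT', 'redactionFormat': '{{{REDACTED-%t}}}'}
            ]
        },
    },
}

# Filter configuration taken from the bitcoin address filter documentation.
policy['identifiers']['bitcoinAddress'] = {
    'bitcoinAddressFilterStrategies': [
        {'strategy': 'REDACT', 'redactionFormat': '{{{REDACTED-%t}}}'}
    ]
}

# Sort the filters alphabetically. This only improves readability.
policy['identifiers'] = dict(sorted(policy['identifiers'].items()))

print(json.dumps(policy, indent=3))
```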

Repeat these steps until you have added a filter for each of the types of sensitive information you want to redact. Typically, the default redaction strategy and redactionFormat values for each filter should be fine for evaluation.

When finished modifying the policy, save the file and close the text editor. Now restart Philter for the policy changes to be loaded:

sudo systemctl restart philter\n
"},{"location":"howtos/evaluate_performance/#submitting-text-for-redaction","title":"Submitting Text for Redaction","text":"

With our policy in place we can now send text to Philter for redaction using that policy:

curl -k -X POST \"https://localhost:8080/api/filter?p=evaluation\" -d @file.txt -H \"Content-Type: text/plain\"\n

In the command above, we are sending the file file.txt to Philter. The ?p=evaluation tells Philter to apply the evaluation policy that we have been editing. Philter's response to this command will be the redacted contents of file.txt as defined in the policy.

"},{"location":"howtos/evaluate_performance/#comparing-documents","title":"Comparing Documents","text":"

With the original document file.txt and the redacted contents returned by Philter, we can now compare those files to begin evaluating Philter's performance. You can diff the text to find the redacted information or use some other method.

A visual comparison provides a quick overview of how Philter is performing on your text but does not give us precision and recall metrics. To calculate these metrics we must compare the redacted document with an annotated file instead of the original file. The annotated file should have the same contents as the original file but with the sensitive information marked.

There are many industry-standard ways to annotate text and many tools to assist with text annotation. We recommend using a tool to help you annotate and compare instead of performing only a visual comparison which does not provide metric values.

Let's resubmit the file to Philter, this time using the explain API endpoint:

curl -k -X POST \"https://localhost:8080/api/explain?p=evaluation\" -d @file.txt -H \"Content-Type: text/plain\"\n

The explain API endpoint produces a detailed description of the redaction. The response will include a list of spans that contain the start and stop positions of redacted text and the type of sensitive information that was redacted. Using this information we can compare the redacted information to our annotated file to calculate precision and recall metrics.

"},{"location":"howtos/evaluate_performance/#calculating-precision-and-recall","title":"Calculating Precision and Recall","text":"

Now we can calculate the precision and recall metrics.
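Precision is the fraction of spans Philter redacted that are also marked in the annotated file, and recall is the fraction of annotated spans that Philter redacted. The following is a minimal Python sketch using exact span matching. The span values are made up for illustration, and precision_recall is a hypothetical helper, not part of Philter. You may prefer a more lenient matching rule, such as counting partial overlaps.

```python
# Sketch: span-level precision and recall with exact matching.
# A span is a tuple of (characterStart, characterEnd, filterType).

def precision_recall(predicted, annotated):
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall

# Hypothetical spans taken from an explain response.
predicted = [(0, 17, 'NER_ENTITY'), (48, 59, 'SSN'), (60, 65, 'NER_ENTITY')]

# Hypothetical spans taken from the annotated file.
annotated = [(0, 17, 'NER_ENTITY'), (48, 59, 'SSN'),
             (75, 92, 'NER_ENTITY'), (100, 111, 'SSN')]

precision, recall = precision_recall(predicted, annotated)
print('precision:', round(precision, 2))
print('recall:', round(recall, 2))
```

In this made-up example two of the three redacted spans are correct and two of the four annotated spans were found.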

"},{"location":"howtos/signed_certificate/","title":"How to Use a Signed SSL Certificate with Philter","text":"

When Philter is deployed via the AWS Marketplace, Microsoft Azure Marketplace, or another third-party cloud marketplace, SSL will already be enabled via a self-signed certificate. We recommend replacing this self-signed certificate with a valid certificate issued to your organization by a trusted authority. The steps are described below.

First, create a private key and a certificate signing request (CSR) for Philter on your domain. In this walkthrough we are using the domain philter.yourdomain.com as an example.

openssl req -new -newkey rsa:2048 -nodes -keyout philter_yourdomain_com.key -out philter_yourdomain_com.csr\n

Submit the CSR to your SSL certificate vendor of choice and complete the SSL certificate ordering process. If prompted for a web server during the process, select Apache or Nginx. Once the process is complete and the certificate is issued you will receive a few files. The files you will need are summarized in the table below. The file names may vary and you may also receive other files as well.

File Name Description Creator philter_yourdomain_com.csr Certificate signing request Created by you philter_yourdomain_com.key Certificate private key Created by you philter_yourdomain_com.ca-bundle Intermediate certificates provided by the issuing authority Received from SSL authority philter_yourdomain_com.crt The SSL certificate for philter.yourdomain.com Received from SSL authority

When prompted for a keystore password we will use changeit. It's recommended you use a more secure password.

The first thing to do is to convert the certificate and the private key to PKCS12 format in philter.p12:

openssl pkcs12 -export -in philter_yourdomain_com.crt -inkey philter_yourdomain_com.key -name philter -out philter.p12\n

Now import the P12 file into a keystore philter.jks:

keytool -importkeystore -deststorepass changeit -destkeystore philter.jks -srckeystore philter.p12 -srcstoretype PKCS12\n

Add the intermediate certificate provided by the issuing authority to the keystore:

keytool -import -alias intermediate -trustcacerts -file philter_yourdomain_com.ca-bundle -keystore philter.jks\n

Update Philter's settings in application.properties:

# SSL certificate settings\nserver.ssl.key-store-type=JKS\nserver.ssl.key-store=/path/to/philter.jks\nserver.ssl.key-store-password=changeit\nserver.ssl.key-alias=philter\n

Restart Philter:

sudo systemctl restart philter\n

Execute an API status request to verify Philter is running as expected. With the -v option we can see the details of the SSL certificate:

curl -v https://philter.yourdomain.com:8080/api/status\n

Look in the response for details of the certificate. Our domain was philter.mtnfog.dev:

* Server certificate:\n*  subject: CN=philter.mtnfog.dev\n*  start date: Apr 21 00:00:00 2020 GMT\n*  expire date: Apr 21 23:59:59 2021 GMT\n*  subjectAltName: host \"philter.mtnfog.dev\" matched cert's \"philter.mtnfog.dev\"\n*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA\n*  SSL certificate verify ok.\n
"},{"location":"other_features/alerts/","title":"Alerts","text":"

Phileas can optionally generate alerts when a particular type of sensitive information is identified.

"},{"location":"other_features/alerts/#alert-conditions","title":"Alert Conditions","text":"

In a policy, each type of sensitive information can have zero or more filter strategies. Each filter strategy can optionally have a condition associated with it. When a condition is present, the filter strategy will only be satisfied when the condition is satisfied. For example, a condition may be created to only filter phone numbers that start with the digits 123 or only filter names that start with John. Filter strategy conditions give you granular control over the filtering process.

When a filter strategy condition is satisfied, Phileas can optionally generate an alert. This feature allows you to be notified when a particular type of sensitive information is identified.

"},{"location":"other_features/alerts/#enabling-alerts","title":"Enabling Alerts","text":"

Alerts are enabled on a per-condition basis. For instance, in the following policy to identify email addresses, a condition has been added to only match the email address test@test.com. Because the alert property is set to true, an alert will be generated when this condition is satisfied. By default, the alert property is set to false, which disables alerts for the condition.

{\n  \"name\": \"email-address-alert\",\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"id\": \"my-email-strategy\",\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n          \"condition\": \"token == \\\"test@test.com\\\"\",\n          \"alert\": true\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"other_features/alerts/#structure-of-an-alert","title":"Structure of an Alert","text":"

An alert contains the following information:

Property Name Description id A unique ID for the alert formatted as a UUID. filterProfile The name of the policy triggering the alert. strategyId The ID of the filter strategy triggering the alert. In the example above the id would be my-email-strategy. context The context. documentId The ID of the document which triggered the alert. filterType The filter type (\"email-address\", \"credit-card\", etc.) triggering the alert. date A timestamp when the alert was generated formatted as yyyy-MM-dd'T'HH:mm:ss.SSS'Z'."},{"location":"other_features/alerts/#retrieving-and-deleting-alerts","title":"Retrieving and Deleting Alerts","text":"

The alerts that Phileas has generated are available through Phileas' alerts API. This API allows for retrieving and deleting alerts. Using this API you can build sophisticated notification systems around Phileas' capabilities.

"},{"location":"other_features/consistent_anonymization/","title":"Consistent Anonymization","text":"

Anonymization in the context of Philter is the process of replacing certain values with random but similar values. For example, the identified name of \u201cJohn Smith\u201d may be replaced with \u201cDavid Jones\u201d, or an identified phone number of 123-555-9358 may be replaced by 842-436-2042. A VIN number will be replaced by a 17 character randomly selected VIN number that adheres to the standard for VIN numbers.

Anonymization is useful in instances where you want to remove sensitive information from text without changing the meaning of the text. Anonymization can be enabled for each type of sensitive information in the policy by setting the filter strategy to RANDOM_REPLACE. (See Policies for more information.)

"},{"location":"other_features/consistent_anonymization/#consistent-anonymization_1","title":"Consistent Anonymization","text":"

Consistent anonymization refers to the process of always anonymizing the same sensitive information with the same replacement values. For example, if the name \"John Smith\" is randomly replaced with \"Pete Baker\", all other occurrences of \"John Smith\" will also be replaced by \"Pete Baker.\"

Consistent anonymization can be done on the document level or on the context level. When enabled on the document level, \"John Smith\" will only be replaced by \"Pete Baker\" in the same document. If \"John Smith\" occurs in a separate document it will be anonymized with a different random name. When enabled on the context level, \"John Smith\" will be replaced by \"Pete Baker\" whenever \"John Smith\" is found in all documents in the same context.
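The difference between the two levels comes down to what the replacement cache is keyed on. The following is a minimal Python sketch of that idea. It is not Philter's implementation. The ConsistentAnonymizer class, the name lists, and the in-memory dictionary cache are all assumptions made for illustration.

```python
import random

# Hypothetical pools of replacement names.
FIRST_NAMES = ['Pete', 'David', 'Maria', 'Susan']
LAST_NAMES = ['Baker', 'Jones', 'Lopez', 'Clark']

class ConsistentAnonymizer:
    def __init__(self, scope, seed=None):
        if scope not in ('document', 'context'):
            raise ValueError('scope must be document or context')
        self.scope = scope
        self.cache = {}
        self.rng = random.Random(seed)

    def replace(self, token, context, document_id):
        # Document level keys the cache on the document identifier,
        # context level keys it on the context name.
        scope_key = document_id if self.scope == 'document' else context
        key = (scope_key, token)
        if key not in self.cache:
            first = self.rng.choice(FIRST_NAMES)
            last = self.rng.choice(LAST_NAMES)
            self.cache[key] = first + ' ' + last
        return self.cache[key]

by_context = ConsistentAnonymizer('context', seed=1)
first = by_context.replace('John Smith', 'hospital1', 'doc1')
second = by_context.replace('John Smith', 'hospital1', 'doc2')
print(first == second)
```

At the context level both documents share one cache entry, so the two replacements are always equal. At the document level each document gets its own entry and its own random draw.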

Enabling consistent anonymization on the context level requires a cache to store the sensitive information and the corresponding replacement values. If a single instance of Philter is running, its internal cache service (enabled by default) is the best choice and no additional configuration is required.

If multiple instances of Philter are deployed together, Philter requires access to a Redis cache service. See Philter's Settings on how to configure the cache.

When Philter is deployed in a cluster, a Redis cache is required to enable consistent anonymization.

The anonymization cache will contain PHI. It is important that you take the necessary precautions to secure the cache and all communication to and from the cache.

"},{"location":"other_features/dashboard/","title":"Dashboard","text":"

Philter includes a user interface dashboard that can be accessed at https://<Philter>:9000.

The Philter UI dashboard is intended only for configuration testing. Use Philter's API for document redaction.

The dashboard provides the ability to test Philter's configuration and manage policies. Text and PDF documents can be submitted through the dashboard to analyze the redacted text and modify your filter policies.

"},{"location":"other_features/span_disambiguation/","title":"Span Disambiguation","text":"

Span disambiguation is an optional feature in Philter that is disabled by default. Refer to Philter's Settings to enable and configure span disambiguation.

In Philter, a span is a piece of the input text that Philter has identified as sensitive information. A span has start and end positions, a confidence, a type, and other attributes. Ideally, each piece of identified sensitive information will only have a single span associated with it. In this case, the type of sensitive information is unambiguous. The goal of span disambiguation is to provide more accurate filtering by removing the potential ambiguities in the types of sensitive information for duplicate spans.

However, sometimes a piece of text can be identified by multiple spans, each having a different type of sensitive information. As a hypothetical example, given the input text My SSN is 123456789. Philter may identify 123456789 as both an SSN and a phone number. This type of scenario can be quite common, and its likelihood increases as the number of enabled filters in a policy increases.

"},{"location":"other_features/span_disambiguation/#how-philter-span-disambiguation-works","title":"How Philter' Span Disambiguation Works","text":"

When we read the sentence My SSN is 123456789. we can tell the span in question should be identified as an SSN because we can look at the text surrounding the span. We use the surrounding words to deduce the correct type of sensitive information for 123456789.

That is exactly how Philter's span disambiguation works. When presented with identical spans differing only by the type of sensitive information, Philter looks at the text surrounding the span in question in combination with the previous spans it has seen in the same context to determine which type of sensitive information is most likely to be correct. Philter then removes the ambiguous spans from the results and replaces them with a single span.

"},{"location":"other_features/span_disambiguation/#improves-over-time","title":"Improves Over Time","text":"

Because Philter is able to consider previously seen text to make its decision concerning ambiguous spans, Philter's span disambiguation gets \"smarter\" as more text is filtered. This is because Philter will have more text to consider in its calculations.

"},{"location":"other_features/span_disambiguation/#more-details","title":"More Details","text":""},{"location":"other_features/span_disambiguation/#span-disambiguation-and-confidence-values","title":"Span Disambiguation and Confidence Values","text":"

Span disambiguation is only invoked for spans that differ only by the type of sensitive information. This means the span's location (start and end positions), confidence, and all other values must match. If two spans have identical locations but have different confidence values, span disambiguation will not be applied and the span having the highest confidence will be used.
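The rule above can be sketched in Python as follows. This is an illustration of the selection logic only, not Philter's code. The resolve_spans helper and the PHONE_NUMBER filter type label are hypothetical, and treating any tie at the highest confidence as ambiguous is an assumption.

```python
from collections import defaultdict

def resolve_spans(spans):
    # Group spans that cover exactly the same location.
    by_location = defaultdict(list)
    for span in spans:
        location = (span['characterStart'], span['characterEnd'])
        by_location[location].append(span)

    resolved, ambiguous = [], []
    for group in by_location.values():
        best = max(span['confidence'] for span in group)
        top = [span for span in group if span['confidence'] == best]
        if len(top) == 1:
            # Confidence values differ, so the highest confidence wins.
            resolved.append(top[0])
        else:
            # The spans differ only by type and need disambiguation.
            ambiguous.append(top)
    return resolved, ambiguous

spans = [
    {'characterStart': 10, 'characterEnd': 19, 'filterType': 'SSN', 'confidence': 1.0},
    {'characterStart': 10, 'characterEnd': 19, 'filterType': 'PHONE_NUMBER', 'confidence': 1.0},
    {'characterStart': 30, 'characterEnd': 41, 'filterType': 'SSN', 'confidence': 0.9},
    {'characterStart': 30, 'characterEnd': 41, 'filterType': 'PHONE_NUMBER', 'confidence': 0.6},
]

resolved, ambiguous = resolve_spans(spans)
print(len(resolved), 'resolved by confidence')
print(len(ambiguous), 'group(s) left for disambiguation')
```

Only the first pair of spans, which match in location and confidence, would be handed to span disambiguation. The second pair is settled by confidence alone.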

"},{"location":"other_features/span_disambiguation/#cache-service","title":"Cache Service","text":"

When multiple instances of Philter are deployed alongside each other behind a load balancer, Philter's cache service should be configured and enabled. Philter will store the information needed to disambiguate spans in the cache so that the information is available to each instance of Philter. If only a single instance of Philter is running, the cache service is not required. However, the information needed to disambiguate spans will then be stored in memory and will be lost when Philter is stopped or restarted. Because of this, we recommend the cache service always be used unless there is a specific reason not to.

"},{"location":"other_features/span_disambiguation/#fine-tuning-the-span-disambiguation","title":"Fine-Tuning the Span Disambiguation","text":"

There are properties available to fine-tune how the span disambiguation operates. These properties are not documented because improper use of the properties could have a negative impact on performance. We will be glad to walk through these properties upon request.

"},{"location":"policies/document_analysis/","title":"Document Analysis","text":"

Philter analyzes received documents prior to redacting the document. This analysis is done to help Philter get a better understanding of the document. The results of the analysis are used to exclude certain document types from redaction and to improve Philter's redaction performance.

While not recommended, the automatic document analysis can be disabled in a policy. By default, document analysis is enabled.

Disabling document analysis will cause any policy features dependent on the results of the document analysis to not function.

An example policy with disabled document analysis is shown below.

{\n  \"name\": \"email-and-phone-numbers\",\n  \"config\": {\n    \"analysis\": {\n      \"enabled\": false\n    }\n  },\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/excluding_by_document_type/","title":"Excluding by Document Type","text":"

Philter can automatically detect certain types of documents and exclude those documents from redaction of certain sensitive information. For example, you may want to redact SSN/TINs in all but one type of document.

To exclude a document type from a specific filter, set the excludeDocumentTypes value to a list of document types to exclude for a filter strategy. Filter strategies for all filter types support the excludeDocumentTypes property.

An example to exclude email addresses from being redacted in a subpoena document is given below:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n               \"excludeDocumentTypes\": [\"SUBPOENA\"]\n            }\n         ]\n      }\n   }\n}\n

In this example, email addresses are redacted in all document types except documents Philter identifies as being subpoena documents.

"},{"location":"policies/excluding_by_document_type/#document-types-supported-by-automatic-detection","title":"Document Types Supported by Automatic Detection","text":"

Philter currently supports automatically detecting the following document types.

Document Type Document Description Subpoena Form 2540 Federal Bankruptcy - SUBPOENA FOR RULE 2004 EXAMINATION Subpoena Form 2550 - Federal Bankruptcy - SUBPOENA TO APPEAR AND TESTIFY Subpoena Form 2560 - Federal Bankruptcy - SUBPOENA TO TESTIFY AT A DEPOSITION Subpoena Form 2570 - Federal Bankruptcy - SUBPOENA TO PRODUCE DOCUMENTS Subpoena AO 88 - SUBPOENA TO APPEAR AND TESTIFY AT A HEARING OR TRIAL IN A CIVIL ACTION Subpoena AO 88A - SUBPOENA TO TESTIFY AT A DEPOSITION IN A CIVIL ACTION Subpoena AO 88B - SUBPOENA TO PRODUCE DOCUMENTS, INFORMATION, OR OBJECTS Subpoena AO 89 - SUBPOENA TO TESTIFY AT A HEARING OR TRIAL IN A CRIMINAL CASE Subpoena AO 90 - SUBPOENA TO TESTIFY AT A DEPOSITION IN A CRIMINAL CASE Subpoena AO 110 - SUBPOENA TO TESTIFY BEFORE A GRAND JURY"},{"location":"policies/filter_policies/","title":"Filter Policies","text":"

The types of sensitive information identified by Phileas and how that information is de-identified are controlled through policies. A policy is a file stored under Phileas\u2019s policies directory, which by default is located at /opt/Phileas/policies/. You can have an unlimited number of policies.

Each policy has a name that is used by Phileas to apply the appropriate de-identification methods. The name is passed to Phileas\u2019s API along with the text to be filtered when submitting text to Phileas. This provides flexibility and allows you to de-identify different types of documents in differing manners with a single instance of Phileas. For example, you may have a policy for bankruptcy documents and a separate policy for financial documents.

There are sample policies available for immediate use or customization to fit your use-cases.

"},{"location":"policies/filter_policies/#the-structure-of-a-policy","title":"The Structure of a Policy","text":"

A policy:

"},{"location":"policies/filter_policies/#an-example-policy","title":"An Example Policy","text":"

The following is an example policy. In the example below you can see the types of sensitive information that are enabled and the strategy for manipulating each type when found. This policy identifies email addresses and phone numbers and redacts each with the format given.

{\n   \"name\": \"email-and-phone-numbers\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

When an email address is identified by this policy, the email address is replaced with the text {{{REDACTED-email-address}}}. The %t gets replaced by the type of the filter. Likewise, when a phone number is found it is replaced with the text {{{REDACTED-phone-number}}}. You are free to change the redaction formats to whatever fits your use-case. See Filter Strategies for all replacement options.

The name of the policy is email-and-phone-numbers. Policies can be named anything you like but their names must be unique from all other policies. As a best practice, the policy should be saved as [name].json, e.g. email-and-phone-numbers.json.

"},{"location":"policies/filter_policies/#applying-a-policy-to-text","title":"Applying a Policy to Text","text":"

To use this policy we will save it as /opt/Phileas/policies/email-and-phone-numbers.json. We must restart Phileas for the new policy to be available for use. To apply the policy we will pass the policy's name to Phileas when making a filter request, as shown in the example request below.

curl -k -X POST \"https://localhost:8080/api/filter?c=context&p=email-and-phone-numbers\" \\\n  -d @file.txt -H \"Content-Type: text/plain\"\n

In this command, we have provided the parameter p along with a value that is the name of the policy we want to use for this request. If we had multiple policies in Phileas we could choose a different policy for this request simply by changing the name given to the parameter p. For more details see Phileas\u2019s API.

Phileas will process the contents of file.txt by applying the policy named email-and-phone-numbers. As we saw in the policy above, this policy redacts email addresses and phone numbers. Phileas will return the redacted text in response to the API call.
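
As a rough Python counterpart to the curl command, the sketch below builds the same request without sending it. The helper name build_filter_request and its defaults are illustrative assumptions; only the /api/filter path, the c and p parameters, and the text/plain content type come from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_filter_request(text, policy, context='context', base_url='https://localhost:8080'):
    # Hypothetical helper: mirrors the curl example by passing the context (c)
    # and the policy name (p) as query parameters.
    query = urlencode({'c': context, 'p': policy})
    return Request(
        base_url + '/api/filter?' + query,
        data=text.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


request = build_filter_request('Contact me at jane@example.com.', 'email-and-phone-numbers')
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. Certificate handling (the -k flag in the curl command) is left out of this sketch.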

To manipulate the sensitive information by methods other than redaction, see the Filter Strategies.

"},{"location":"policies/filter_strategies/","title":"Filter Strategies","text":"

A filter strategy defines how sensitive information identified by Philter should be manipulated, whether it is redacted, replaced, encrypted, or manipulated in some other fashion.

In a policy, you list the types of sensitive information that should be filtered. How Philter replaces each type of sensitive information is specific to each type. For instance, zip codes can be truncated based on the leading digits or zip code population while phone numbers are redacted. These replacements are performed by \"filter strategies.\"

Each filter can have one or more filter strategies and conditions can be used to determine when to apply each filter strategy.

A sample policy containing a filter strategy is shown below. In this example, email addresses will be redacted.

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

Most filter strategies apply to all types of data, but some apply to only a few types. For example, the TRUNCATE filter strategy applies only to the zip code filter.

"},{"location":"policies/filter_strategies/#filter-strategies_1","title":"Filter Strategies","text":"

The filter strategies are described below. Each filter type can specify zero or more filter strategies. When no filter strategies are given, Philter will default to REDACT for that filter type. When multiple filter strategies are given for a single filter type, the filter strategies will be applied in order as they are listed in the policy, top to bottom.

"},{"location":"policies/filter_strategies/#the-redact-filter-strategy","title":"The REDACT Filter Strategy","text":"

The REDACT filter strategy replaces sensitive information with a given redaction format. You can put variables in the redaction format that Philter will replace when performing the redaction.

The available redaction variables are:

Redaction Variable Description %t Will be replaced with the type of sensitive information. This is to allow you to know the type of sensitive information that was identified and redacted. %l Will be replaced by the given classification for the type of sensitive information. %v Will be replaced by the original value of the sensitive text. With %v you can annotate sensitive information instead of masking or removing it.

To redact sensitive information by replacing it with the type of sensitive information, the redaction format would be REDACTED-%t.
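
The substitution can be pictured with a short Python sketch. This is not Philter's implementation; the function name and its arguments are hypothetical, and only the %t, %l, and %v variables come from the table above.

```python
def apply_redaction_format(redaction_format, filter_type, classification='', value=''):
    # Hypothetical helper: substitute each documented variable into the format.
    replacements = {'%t': filter_type, '%l': classification, '%v': value}
    result = redaction_format
    for variable, replacement in replacements.items():
        result = result.replace(variable, replacement)
    return result


print(apply_redaction_format('{{{REDACTED-%t}}}', 'email-address'))
print(apply_redaction_format('<%t>%v</%t>', 'ssn', value='123-45-6789'))
```

The second call illustrates the annotation use of %v: the original value is kept and wrapped with its type instead of being masked.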

An example filter using the REDACT filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-crypto_replace-filter-strategy","title":"The CRYPTO_REPLACE Filter Strategy","text":"

The CRYPTO_REPLACE filter strategy replaces each identified piece of sensitive information by encrypting it using the AES encryption algorithm. To use this filter strategy, the policy must include the details of the encryption key as shown below:

{\n   \"name\":\"sample-profile\",\n   \"crypto\": {\n     \"key\": \"....\",\n     \"iv\": \"....\"\n   },\n   ...\n

In the snippet of a policy shown above, a crypto element is defined with a key and an initialization vector (iv). These two items are required to encrypt the sensitive information. To generate a key, run the following command:

openssl enc -e -aes-256-cbc -a -salt -P\n

You will be prompted to enter an encryption password. Once entered, the values of the key and iv will be shown. Copy and paste those values into the policy.

An example policy using the CRYPTO_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"crypto\": {\n     \"key\": \"....\",\n     \"iv\": \"....\"\n   },\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"CRYPTO_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-hash_sha256_replace-filter-strategy","title":"The HASH_SHA256_REPLACE Filter Strategy","text":"

The HASH_SHA256_REPLACE filter strategy replaces sensitive information with the SHA256 hash value of the sensitive information. To append a random salt value to each value prior to hashing, set the salt property to true. The salt value used will be returned in the explain response from Philter's API.

An example policy using the HASH_SHA256_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"HASH_SHA256_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n
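
To make the salting behavior concrete, here is a minimal Python sketch using the standard hashlib module. The function name, the salt length, and the hex encoding of the salt are assumptions made for illustration; the description above only states that a random salt is appended before hashing.

```python
import hashlib
import secrets


def hash_sha256_replace(token, salt=False):
    # Assumption: the salt is a random hex string appended to the token.
    salt_value = secrets.token_hex(8) if salt else ''
    digest = hashlib.sha256((token + salt_value).encode('utf-8')).hexdigest()
    # The salt is returned with the hash so the caller can record it.
    return digest, salt_value


unsalted, _ = hash_sha256_replace('jane@example.com')
salted, used_salt = hash_sha256_replace('jane@example.com', salt=True)
print(unsalted)
print(salted, used_salt)
```

Without a salt, equal inputs always produce equal hashes, so repeated values remain linkable across documents. With a random salt, each occurrence hashes differently.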
"},{"location":"policies/filter_strategies/#the-fpe_encrypt_replace-filter-strategy","title":"The FPE_ENCRYPT_REPLACE Filter Strategy","text":"

The FPE_ENCRYPT_REPLACE filter strategy uses format-preserving encryption (FPE) to encrypt the sensitive information. Philter uses the FF3-1 algorithm for format-preserving encryption. The FPE_ENCRYPT_REPLACE filter strategy requires a key and a tweak value. These values control the format-preserving encryption. For more information on these values and format-preserving encryption, refer to the resources below:

An example policy using the FPE_ENCRYPT_REPLACE filter strategy:

{\n   \"name\": \"credit-cards\",\n   \"identifiers\": {\n      \"creditCard\": {\n         \"creditCardFilterStrategies\": [\n            {\n               \"strategy\": \"FPE_ENCRYPT_REPLACE\",\n               \"key\": \"...\",\n               \"tweak\": \"...\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-random_replace-filter-strategy","title":"The RANDOM_REPLACE Filter Strategy","text":"

Replaces the identified text with a fake value but of the same type. For example, an SSN will be replaced by a random text having the format ###-##-####, such as 123-45-6789. An email address will be replaced with a randomly generated email address. Available to all filter types.

An example policy using the RANDOM_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"RANDOM_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-static_replace-filter-strategy","title":"The STATIC_REPLACE Filter Strategy","text":"

Replaces the identified text with a given static value. Available to all filter types.

An example policy using the STATIC_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"STATIC_REPLACE\",\n               \"staticReplacement\": \"some new value\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-truncate-filter-strategy","title":"The TRUNCATE Filter Strategy","text":"

Available only to zip codes, this strategy allows for truncating zip codes to only a select number of digits. Specify truncateDigits to set the desired number of leading digits to leave. For example, if truncateDigits is 2, the zip code 90210 will be truncated to 90***.

The TRUNCATE filter strategy is available only to the zip code filter. An example policy using the TRUNCATE filter strategy:

{\n   \"name\": \"zip-codes\",\n   \"identifiers\": {\n      \"zipCode\": {\n         \"zipCodeFilterStrategies\": [\n            {\n               \"strategy\": \"TRUNCATE\",\n               \"truncateDigits\": 3\n            }\n         ]\n      }\n   }\n}\n
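
The truncation described above can be sketched in a few lines of Python. The function name is hypothetical, and masking the removed digits with asterisks follows the 90210 example rather than any confirmed implementation detail.

```python
def truncate_zip(zip_code, truncate_digits):
    # Keep the leading digits and mask the remainder with asterisks.
    kept = zip_code[:truncate_digits]
    return kept + '*' * (len(zip_code) - len(kept))


print(truncate_zip('90210', 2))
print(truncate_zip('90210', 3))
```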
"},{"location":"policies/filter_strategies/#the-zero_leading-filter-strategy","title":"The ZERO_LEADING Filter Strategy","text":"

Available only to zip codes, this strategy changes the first 3 digits of a zip code to be 0. For example, the zip code 90210 will be changed to 00010.

The ZERO_LEADING filter strategy is only available to zip code filters. An example zip code filter using the ZERO_LEADING filter strategy:

{\n   \"name\": \"zip-codes\",\n   \"identifiers\": {\n      \"zipCode\": {\n         \"zipCodeFilterStrategies\": [\n            {\n               \"strategy\": \"ZERO_LEADING\"\n            }\n         ]\n      }\n   }\n}\n
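
A minimal Python sketch of the zeroing step, assuming a plain five-digit zip code; the function name is hypothetical.

```python
def zero_leading(zip_code):
    # Replace the first three digits with zeros and keep the rest unchanged.
    return '000' + zip_code[3:]


print(zero_leading('90210'))
```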
"},{"location":"policies/filter_strategies/#filter-strategy-conditions","title":"Filter Strategy Conditions","text":"

A replacement strategy can be applied based on the sensitive information meeting one or more conditions. For example, you can create a condition such that only dates of 11/05/2010 are replaced by using the condition token == \"11/05/2010\". The conditions that can be applied vary based on the type of sensitive information. For instance, zip codes can have conditions based on their population. Refer to each specific filter type for the conditions available.

The following is an example policy for credit cards that contains a condition to only redact credit card numbers that start with the digits 3000:

{\n  \"name\": \"default\",\n  \"identifiers\": {\n    \"creditCard\": {\n      \"creditCardFilterStrategies\": [\n        {\n          \"condition\": \"token startswith \\\"3000\\\"\",\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/filter_strategies/#combining-conditions","title":"Combining Conditions","text":"

Conditions can be joined through the use of the and keyword. When conditions are joined, each condition must be satisfied for the identified text to be filtered. If any of the conditions are not satisfied the identified text will not be filtered. Below is an example joined condition:

token != \"123-45-6789\" and context == \"my-context\"\n

This condition requires that the identified text (the token) not be equal to 123-45-6789 and the context be equal to my-context. Both of these conditions must be satisfied for the identified text to be filtered.
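
The all-must-hold behavior of joined conditions can be modeled in Python as below. This sketch does not parse the condition syntax; each condition is written by hand as a function, and all names are hypothetical.

```python
def all_conditions_met(conditions, token, context):
    # Joined conditions behave like a logical AND: every one must hold.
    return all(condition(token, context) for condition in conditions)


joined = [
    lambda token, context: token != '123-45-6789',
    lambda token, context: context == 'my-context',
]

print(all_conditions_met(joined, '111-22-3333', 'my-context'))
print(all_conditions_met(joined, '123-45-6789', 'my-context'))
```

The first call satisfies both conditions, so the text would be filtered. The second fails the token condition, so it would not.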

Conversely, conditions can be OR'd through the use of multiple filter strategies. For example, if we want to OR a condition on the token and a condition on the context, we would use two filter strategies:

\"ssnFilterStrategies\": [\n  {\n    \"condition\": \"token != \\\"123-45-6789\\\"\",\n    \"strategy\": \"REDACT\",\n    \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n  },\n  {\n    \"condition\": \"context == \\\"my-context\\\"\",\n    \"strategy\": \"REDACT\",\n    \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n  }        \n]\n
"},{"location":"policies/filters/","title":"Filters","text":"

A \"filter\" corresponds to a type of sensitive information. Phileas has filters for sensitive information such as names, addresses, ages, and many others.

There are predefined filters that are ready to be used, as well as custom filters that let you define your own filters to identify sensitive information outside of what the predefined filters can identify. An example of a custom filter is a filter to identify your patient account numbers, where the structure of an account number is specific to your organization.

Each filter is capable of identifying and redacting a specific type of sensitive information. For example, there is a filter for phone numbers, a filter for US social security numbers, and a filter for person's names. You can enable any combination of these filters based on the types of sensitive information you need to redact.

This section of the documentation describes the filters available in Phileas. The configuration options for each filter can vary due to the type of the sensitive information. For instance, only the zip code filter has a configuration to truncate the zip code.

A selection of filters and their configurations is called a policy. A policy describes how to de-identify a document.

"},{"location":"policies/filters/#predefined-filters","title":"Predefined Filters","text":""},{"location":"policies/filters/#persons-names","title":"Person's Names","text":"

Phileas uses several methods to identify person's names.

Type Description First Names Identifies common first names Surnames Identifies common surnames Person's Names (NER) Identifies full names using natural language processing analysis Physician's Names (NER) Identifies physician names using natural language processing analysis"},{"location":"policies/filters/#other-filters","title":"Other Filters","text":"Type Description Ages Identifies ages such as 3.5 years old Bank Routing Numbers Identifies bank routing numbers Bitcoin Addresses Identifies Bitcoin addresses such as 127NVqnjf8gB9BFAW2dnQeM6wqmy1gbGtv Cities Identifies common cities Counties Identifies common counties Credit Card Numbers Identifies VISA, American Express, MasterCard, and Discover credit card numbers Dates Identifies dates in many formats such as May 22, 1999 Driver's License Numbers Identifies driver's license numbers for all 50 US states Email Addresses Identifies email addresses Hospitals Identifies common hospital names Hospital Abbreviations Identifies common hospitals by their name abbreviations IBAN Codes Identifies international bank account numbers IP Addresses Identifies IPv4 and IPv6 addresses MAC Addresses Identifies network MAC addresses Passport Numbers Identifies US passport numbers Phone Numbers Identifies phone numbers Phone Number Extensions Identifies phone number extensions Sections Identifies sections in text denoted by SSNs and TINs Identifies US SSNs and TINs States Identifies US state names State Abbreviations Identifies US state names by their abbreviations Tracking Numbers Identifies UPS, FedEx, and USPS tracking numbers URLs Identifies URLs VINs Identifies vehicle identification numbers Zip Codes Identifies US zip codes"},{"location":"policies/filters/#custom-filter-types-of-sensitive-information","title":"Custom Filter Types of Sensitive Information","text":"

In addition to the predefined types of sensitive information listed in the table above, you can also define your own types of sensitive information. Through custom identifiers and dictionaries, Phileas can identify many other types of information that may be sensitive in your use-case. For example, if you have patient identifiers that follow a pattern of AA-00000 you can define a custom identifier for this sensitive information.

Phileas can be configured to identify sensitive information based on custom dictionaries. When a term in the dictionary is found in the text, Phileas will treat the term as sensitive information and apply the given filter strategy.

Custom dictionaries support fuzziness to accommodate misspellings. The replacement strategy for a custom dictionary has a sensitivityLevel that controls the amount of allowed fuzziness.

Type Description Custom Dictionaries Identifies sensitive information based on dictionary values. Custom Identifiers Identifies custom alphanumeric identifiers that may be used for medical record numbers, patient identifiers, account numbers, or other specific identifiers."},{"location":"policies/ignoring_specific_information/","title":"Ignoring Specific Information","text":"

Phileas can optionally ignore a list of terms and prevent those terms from being redacted. For example, if the name John Smith is being redacted and you do not want it to be redacted, you can add John Smith to an ignore list. Each time Phileas identifies sensitive information it will check the ignore lists to see if the sensitive information is to be ignored.

Phileas can ignore terms and patterns per-policy, meaning each policy can have its own unique list of terms or patterns to ignore.

"},{"location":"policies/ignoring_specific_information/#ignore-lists","title":"Ignore Lists","text":"

Ignore lists can be specified at the policy level and/or for each filter in the policy. When set for the policy, the list of ignored terms will be applied to all filter types. When set for a filter, the list of ignored terms will be applied only to that filter.

"},{"location":"policies/ignoring_specific_information/#ignore-list-for-a-policy","title":"Ignore List for a Policy","text":"

In the policy shown below, an ignore list is set at the level of the policy. The terms specified in the list will be ignored for all filter types enabled in the policy. Only the terms property is required. The name and caseSensitive properties are optional.

{\n   \"name\": \"example-policy\",\n   \"ignored\": [\n     {\n       \"name\": \"names to ignore\",\n       \"terms\": [\"john smith\", \"jane doe\"],\n       \"caseSensitive\": false\n     }\n   ],\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
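
The effect of the caseSensitive property can be illustrated with a small Python sketch. The function is hypothetical and assumes exact term comparison; the default used here matches the example policy above, not a documented default.

```python
def is_ignored(value, terms, case_sensitive=False):
    # Hypothetical check: compare the identified value against the ignore list,
    # folding case unless case-sensitive matching is requested.
    if case_sensitive:
        return value in terms
    return value.lower() in {term.lower() for term in terms}


terms = ['john smith', 'jane doe']
print(is_ignored('John Smith', terms))
print(is_ignored('John Smith', terms, case_sensitive=True))
```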

Terms to be ignored at the policy level can also be read from one or more files located on the local file system. The file must be formatted as one term per line.

{\n   \"name\": \"example-policy\",\n   \"ignored\": [\n     {\n       \"name\": \"names to ignore\",\n       \"terms\": [\"john smith\", \"jane doe\"],\n       \"files\": [\"/tmp/names.txt\"],\n       \"caseSensitive\": false\n     }\n   ],   \n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/ignoring_specific_information/#ignore-list-for-a-filter","title":"Ignore List for a Filter","text":"

In the policy shown below, an ignore list is set at the level of a filter. The terms specified in the list will be ignored only for that filter type. Each filter in a policy can have its own list of ignored terms. The listed terms are matched case-sensitively: \"John\" will be ignored if \"John\" is an ignored term, but not if only \"john\" is an ignored term.

{\n   \"name\": \"example-filter-profile\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"ignored\": [\"john smith\", \"jane doe\"],\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/ignoring_specific_information/#ignoring-patterns","title":"Ignoring Patterns","text":"

Phileas can ignore information based on a regular expression pattern. An example use of this feature is to ignore terms that are present in your text but are dynamic, such as logged timestamps. When using the date filter these timestamps may be identified as being sensitive but you do not want them redacted. With an ignore pattern we can ignore the logged timestamps.

"},{"location":"policies/ignoring_specific_information/#ignore-patterns","title":"Ignore Patterns","text":"

Ignore patterns can be specified at the policy level and/or at the level of each type of filter. When set at the policy level, the list of ignored patterns will be applied to all filter types. When set for an individual filter, the list of ignored patterns will be applied only to that filter.

"},{"location":"policies/ignoring_specific_information/#ignore-patterns-for-a-policy","title":"Ignore Patterns for a Policy","text":"

In the policy shown below, ignore patterns are set at the level of the policy. The patterns specified in the list will be ignored for all filter types enabled in the policy.

{\n   \"name\": \"example-policy\",\n   \"ignoredPatterns\": [\n     {\n       \"name\": \"ignore-room-numbers\",\n       \"pattern\": \"Room [A-Z0-4]{4}\"\n     }\n   ],\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
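
To show which values the example pattern covers, here is a short Python sketch using the standard re module. Whether Philter requires the pattern to match the whole identified value or only part of it is not stated here; the sketch assumes a full match, and the function name is hypothetical.

```python
import re

ROOM_PATTERN = re.compile('Room [A-Z0-4]{4}')


def matches_ignored_pattern(value, patterns):
    # Assumption: a value is ignored when one pattern matches it in full.
    return any(pattern.fullmatch(value) is not None for pattern in patterns)


print(matches_ignored_pattern('Room A123', [ROOM_PATTERN]))
print(matches_ignored_pattern('Room A567', [ROOM_PATTERN]))
```

The character class allows only uppercase letters and the digits 0 through 4, so the second value is not covered by the pattern.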
"},{"location":"policies/ignoring_specific_information/#ignore-patterns-for-a-filter","title":"Ignore Patterns for a Filter","text":"

In the policy shown below, ignore patterns are set at the level of a filter. The patterns specified in the list will be ignored only for that filter type. Each filter in a policy can have its own list of ignored patterns.

{\n   \"name\": \"example-policy\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"ignoredPatterns\": [\n           {\n             \"name\": \"ignore-room-numbers\",\n             \"pattern\": \"Room [A-Z0-4]{4}\"\n           }\n         ],\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/sample_policies/","title":"Sample Policies","text":"

This page lists some sample policies. You can use these policies either as-is or as starting points for customizing them to meet your specific de-identification needs.

These policies are examples and not an exhaustive list of all the sensitive information Phileas can identify. Items from each of these policies can be combined to make policies to meet your use-cases.

"},{"location":"policies/sample_policies/#email-addresses-and-phone-numbers","title":"Email Addresses and Phone Numbers","text":"

This policy finds email addresses and phone numbers and redacts them with {{{REDACTED-email-address}}} and {{{REDACTED-phone-number}}}, respectively.

{\n  \"name\": \"email-and-phone-numbers\",\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"phoneNumber\": {\n      \"phoneNumberFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#persons-names-and-ssns","title":"Persons Names and SSNs","text":"

This policy finds person's names and SSNs and redacts them with {{{REDACTED-entity}}} and {{{REDACTED-ssn}}}, respectively.

{\n  \"name\": \"persons-names-ssn\",\n  \"identifiers\": {\n    \"ner\": {\n      \"nerFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"ssn\": {\n      \"ssnFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#dates-urls-and-vins","title":"Dates, URLs, and VINs","text":"

This policy finds dates, URLs, and VINs. Dates and URLs are redacted with {{{REDACTED-date}}} and {{{REDACTED-url}}}, respectively. Each VIN is replaced by a randomly generated VIN.

{\n  \"name\": \"dates-urls-vin\",\n  \"identifiers\": {\n    \"date\": {\n      \"dateFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"url\": {\n      \"urlFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"vin\": {\n      \"vinFilterStrategies\": [\n        {\n          \"strategy\": \"RANDOM_REPLACE\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#ip-addresses","title":"IP Addresses","text":"

This policy finds IP addresses and replaces each identified IP address with the static text IP_ADDRESS as long as the IP address is not 127.0.0.1. (A condition on the filter strategy sets the IP address requirement.)

{\n  \"name\": \"ip-addresses\",\n  \"identifiers\": {\n    \"ipAddress\": {\n      \"ipAddressFilterStrategies\": [\n        {\n          \"strategy\": \"STATIC_REPLACE\",\n          \"staticReplacement\": \"IP_ADDRESS\",\n          \"condition\": \"token != \\\"127.0.0.1\\\"\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#zip-codes","title":"Zip Codes","text":"

This policy finds ZIP codes starting with 90 and truncates the zip code to just the first two digits.

{\n  \"name\": \"zip-codes\",\n  \"identifiers\": {\n    \"zipCode\": {\n      \"zipCodeFilterStrategies\": [\n        {\n          \"condition\": \"token startswith \\\"90\\\"\",\n          \"strategy\": \"TRUNCATE\",\n          \"truncateDigits\": 2\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#enable-text-splitting","title":"Enable Text Splitting","text":"

This policy enables text splitting for input over 10,000 characters.

{\n  \"name\": \"default-split-enabled\",\n  \"config\": {\n    \"splitting\": {\n      \"enabled\": true,\n      \"threshold\": 10000,\n      \"method\": \"newline\"\n    }\n  },\n  \"identifiers\": {\n    \"ssn\": {\n      \"ssnFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#globally-ignored-terms","title":"Globally Ignored Terms","text":"

This policy has a list of globally ignored terms.

{\n  \"name\": \"default-global-ignore\",\n  \"ignored\": [\n    {\n      \"name\": \"ignored credit cards\",\n      \"terms\": [\"4111111111111111\", \"0000000000000000\"]\n    }\n  ],\n  \"identifiers\": {\n    \"creditCard\": {\n      \"creditCardFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#generating-alerts","title":"Generating Alerts","text":"

This policy generates an alert when a matching email address is identified.

{\n  \"name\": \"email-address-alert\",\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n          \"condition\": \"token == \\\"test@test.com\\\"\",\n          \"alert\": true\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/splitting_input_text/","title":"Splitting Input Text","text":"

On a per-policy basis, Philter can split input text to process each split individually. This can improve performance and allows for handling long input text. Splitting is disabled by default.

An example split configuration in a policy is shown below.

{\n  \"name\": \"default\",\n  \"identifiers\": {}, \n  \"config\": {\n    \"splitting\": {\n      \"enabled\": true,\n      \"threshold\": 10000,\n      \"method\": \"newline\"\n    }\n  }\n}\n

In this example policy, splitting is enabled for inputs greater than or equal to 10,000 characters in length.

The method of splitting the text will be the newline method. This method will cause Philter to split the text based on the locations of new line characters in the input text. Additional methods of text splitting may be added in future versions.

Because the newline method splits text based on the locations of new line characters in the text, the text contained in the reassembled filter responses may not be an exact match of the input text. This is due to white space and other characters that may reside near the new line characters that get omitted during processing.

"},{"location":"policies/splitting_input_text/#text-splitting-policy-properties","title":"Text Splitting Policy Properties","text":"Property Description Allowed Values Default Value enabled Whether or not input texts are split. When false, requests with text exceeding the threshold generate an HTTP 413 PayloadTooLarge error response. true or false false threshold When to split the input text. Set to -1 to disable splitting. Any integer value. 10000 method How to split the text. newline newline"},{"location":"policies/splitting_input_text/#alternative-to-philter-splitting-text","title":"Alternative to Philter Splitting Text","text":"

In some cases it may be best to split your input text client side prior to sending the text to Philter. This gives you full control over how the text will be split and provides more predictable responses from Philter because you know how the text is split.

An example of splitting text into chunks prior to sending the text to Philter is given in the commands below:

# Given a large file called largefile.txt, split it into 10 KB pieces under /tmp.\n$ split -b 10k largefile.txt /tmp/segment\n\n# Now process the pieces. --data-binary is used because --data strips newlines from the file.\n$ curl -s -X POST -k \"https://philter:8080/api/filter?d=document1\" --data-binary \"@/tmp/segmentaa\" -H \"Content-type: text/plain\" > out1\n$ curl -s -X POST -k \"https://philter:8080/api/filter?d=document1\" --data-binary \"@/tmp/segmentab\" -H \"Content-type: text/plain\" > out2\n\n# Now recombine the outputs into a single file.\n$ cat out1 out2 > filtered.txt\n
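
Splitting by byte count can cut a line, or a piece of sensitive text, in half. As an alternative, here is a minimal sketch of client-side chunking that only breaks on line boundaries. The function name `chunk_lines` and the `max_chars` parameter are hypothetical, and each resulting chunk would still be sent to Philter separately, as in the commands above.

```python
def chunk_lines(text: str, max_chars: int = 10000) -> list[str]:
    """Group whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


sample = "a\n" * 10
pieces = chunk_lines(sample, max_chars=6)
print(len(pieces))
print("".join(pieces) == sample)
```

Because line endings are kept, joining the chunks reproduces the input exactly, which makes it easier to line up filtered output with the original text.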
"},{"location":"policies/filters/common_filters/ages/","title":"Ages","text":""},{"location":"policies/filters/common_filters/ages/#filter","title":"Filter","text":"

This filter identifies ages such as 3.5 years old in text.

"},{"location":"policies/filters/common_filters/ages/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/ages/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value ageFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/ages/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/ages/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. The filter will only be applied when the condition is satisfied. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/ages/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ages-example\",\n   \"identifiers\": {\n      \"age\": {\n         \"ageFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/bank-routing-numbers/","title":"Bank Routing Numbers","text":""},{"location":"policies/filters/common_filters/bank-routing-numbers/#filter","title":"Filter","text":"

This filter identifies bank routing numbers (ABA routing transit numbers) such as 111000025 in text. Identified routing numbers must pass checksum validation.
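
The documentation does not say which checksum is applied. As an illustration only, the sketch below implements the standard ABA routing number checksum, which weights the nine digits 3, 7, 1 repeating and requires the sum to be divisible by 10. The function name `is_valid_aba` is hypothetical and this is not Philter's code.

```python
def is_valid_aba(number: str) -> bool:
    """Check a 9-digit routing number with the standard ABA 3-7-1 checksum."""
    if len(number) != 9 or not (number.isascii() and number.isdigit()):
        return False
    d = [int(c) for c in number]
    total = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return total % 10 == 0


print(is_valid_aba("111000025"))
print(is_valid_aba("111000026"))
```

For 111000025 the weighted sum is 30, so the number passes. Changing the last digit to 6 gives 31, which fails.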

"},{"location":"policies/filters/common_filters/bank-routing-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/bank-routing-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value bankRoutingNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/bank-routing-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/bank-routing-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. The filter will only be applied when the condition is satisfied. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/bank-routing-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"bank-routing-number-example\",\n   \"identifiers\": {\n      \"bankRoutingNumber\": {\n         \"bankRoutingNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/bitcoin-addresses/","title":"Bitcoin Addresses","text":""},{"location":"policies/filters/common_filters/bitcoin-addresses/#filter","title":"Filter","text":"

This filter identifies bitcoin addresses such as 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 in text.

"},{"location":"policies/filters/common_filters/bitcoin-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/bitcoin-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value bitcoinAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/bitcoin-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/bitcoin-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/bitcoin-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"bitcoin-address-example\",\n   \"identifiers\": {\n      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/creditcards/","title":"Credit Cards","text":""},{"location":"policies/filters/common_filters/creditcards/#filter","title":"Filter","text":"

This filter identifies credit cards such as 378282246310005 in text.
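
The `onlyValidCreditCardNumbers` option refers to valid numbers without defining the check. A common validity test for card numbers is the Luhn checksum. The sketch below assumes that test for illustration and is not taken from Philter, and the function name `passes_luhn` is hypothetical.

```python
def passes_luhn(number: str) -> bool:
    """Return True when the digit string satisfies the Luhn checksum."""
    if not number or not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(passes_luhn("378282246310005"))
print(passes_luhn("378282246310006"))
```

The example number 378282246310005 sums to 60 under this scheme and passes. Altering its final digit makes it fail.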

"},{"location":"policies/filters/common_filters/creditcards/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/creditcards/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value creditCardFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyValidCreditCardNumbers When set to true, only valid credit card numbers will be filtered. true ignoreWhenInUnixTimestamp When set to true, only credit card numbers that do not match the pattern for a Unix timestamp will be filtered. false"},{"location":"policies/filters/common_filters/creditcards/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/creditcards/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/creditcards/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"credit-cards-example\",\n   \"identifiers\": {\n      \"creditcard\": {\n         \"onlyValidCreditCardNumbers\": false,\n         \"creditCardFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/dates/","title":"Dates","text":""},{"location":"policies/filters/common_filters/dates/#filter","title":"Filter","text":"

This filter identifies dates such as May 22, 2014 in text. The supported date formats are:

Format Example yyyy-MM-d 2020-05-10 MM-dd-yyyy 05-10-2020 M-d-y 5-10-2020 MMM dd May 5 or May 05 MMMM dd, yyyy May 5, 2020 or May 5 2020"},{"location":"policies/filters/common_filters/dates/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/dates/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value dateFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyValidDates When set to true, only valid dates will be filtered. false"},{"location":"policies/filters/common_filters/dates/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. SHIFT Shift the date by a number of months, days, and/or years. SHIFTRANDOM Shift the date by a random number of months, days, and years. RELATIVE Replace the date with words relative to the date."},{"location":"policies/filters/common_filters/dates/#filter-strategy-options","title":"Filter Strategy Options","text":"

The following filter strategy options are available for the RELATIVE filter strategy.

Option Description Default Value futureDates When true, future dates are replaced by relative words. When false, future dates are redacted. false

The following filter strategy options are available for the SHIFT filter strategy.

Option Description Default Value shiftDays The number of days to shift the date. Can be a negative or positive integer. Defaults to 0 if not specified. 0 shiftMonths The number of months to shift the date. Can be a negative or positive integer. Defaults to 0 if not specified. 0 shiftYears The number of years to shift the date. Can be a negative or positive integer. Defaults to 0 if not specified. 0"},{"location":"policies/filters/common_filters/dates/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != TOKEN Compares the sensitive text to some category, e.g. birthdate. is CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/dates/#differentiating-between-dates-and-birth-dates","title":"Differentiating Between Dates and Birth Dates","text":"

In some cases it may be necessary to redact birth dates and dates differently. Using conditions it is possible to determine if an identified date is a birth date. The conditional token is birthdate will determine if the identified date (token) is a birth date by analyzing the content surrounding the date.

"},{"location":"policies/filters/common_filters/dates/#example-policy-to-redact-dates","title":"Example Policy to Redact Dates","text":"

The following policy redacts dates.

{\n   \"name\": \"dates-example\",\n   \"identifiers\": {\n      \"date\": {\n         \"onlyValidDates\": false,\n         \"dateFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/dates/#example-policy-to-shift-dates","title":"Example Policy to Shift Dates","text":"

The following policy shifts dates forward by 2 days and 4 months.

{\n   \"name\": \"dates-example\",\n   \"identifiers\": {\n      \"date\": {\n         \"onlyValidDates\": false,\n         \"dateFilterStrategies\": [\n            {\n               \"strategy\": \"SHIFT\",\n               \"shiftDays\": 2,\n               \"shiftMonths\": 4,\n               \"shiftYears\": 0\n            }\n         ]\n      }\n   }\n}\n
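
To make the arithmetic concrete, here is a minimal sketch of shifting a date by days, months, and years. It is not Philter's implementation. The function name `shift_date`, the order of operations (years and months first, then days), and clamping the day to the end of a shorter month are all assumptions.

```python
import calendar
from datetime import date, timedelta


def shift_date(d: date, days: int = 0, months: int = 0, years: int = 0) -> date:
    """Shift a date by whole years and months, then by days."""
    # Count months from year 0 so that month overflow rolls into the year.
    month_index = d.year * 12 + (d.month - 1) + years * 12 + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    # Clamp the day, e.g. January 31 plus one month lands on the last day of February.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)


print(shift_date(date(2020, 5, 10), days=2, months=4))
print(shift_date(date(2020, 1, 31), months=1))
```

With the values from the policy above (2 days, 4 months, 0 years), May 10, 2020 becomes September 12, 2020 in this sketch.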
"},{"location":"policies/filters/common_filters/drivers-license-numbers/","title":"Driver's License Numbers","text":""},{"location":"policies/filters/common_filters/drivers-license-numbers/#filter","title":"Filter","text":"

This filter identifies driver's license numbers such as 194784357 in text. Driver's license number formats for all 50 US states are supported.

"},{"location":"policies/filters/common_filters/drivers-license-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/drivers-license-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value driversLicenseFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/drivers-license-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/drivers-license-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/drivers-license-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"drivers-license-example\",\n   \"identifiers\": {\n      \"driversLicense\": {\n         \"driversLicenseFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/email-addresses/","title":"Email Addresses","text":""},{"location":"policies/filters/common_filters/email-addresses/#filter","title":"Filter","text":"

This filter identifies email addresses such as john.fake.address@hotmail.com in text.

"},{"location":"policies/filters/common_filters/email-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/email-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value emailAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyStrictMatches When set to false, the pattern for identifying email addresses will be relaxed. Filtered email addresses will have a lower confidence, but filter performance will increase. true onlyValidTLDs When set to true, only email addresses that are for a top-level domain are filtered. false"},{"location":"policies/filters/common_filters/email-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/email-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/email-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"email-address-example\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/iban-codes/","title":"IBAN Codes","text":""},{"location":"policies/filters/common_filters/iban-codes/#filter","title":"Filter","text":"

This filter identifies International Bank Account Number (IBAN) codes such as HU4211773016111110180000000 in text.
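
The `onlyValidIBANCodes` option refers to valid codes without describing the check. The standard IBAN check is the mod-97 test: move the first four characters to the end, map letters to numbers (A=10 through Z=35), and require the result modulo 97 to equal 1. The sketch below assumes that test for illustration, the function name `iban_checksum_ok` is hypothetical, and country-specific length rules are not checked.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Return True when the IBAN passes the standard mod-97 check."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps 0-9 to themselves and A-Z to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_checksum_ok("GB82 WEST 1234 5698 7654 32"))
print(iban_checksum_ok("GB82 WEST 1234 5698 7654 33"))
```

Spaces are removed before the check, so codes grouped in sections of four are handled the same way as codes written without spaces.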

"},{"location":"policies/filters/common_filters/iban-codes/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/iban-codes/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value allowSpaces When true, IBAN codes will be allowed to contain spaces and grouped in sections of 4. Set to false to disallow spaces in IBAN codes. true ibanCodeFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyValidIBANCodes When set to true, only valid IBAN codes will be filtered. true"},{"location":"policies/filters/common_filters/iban-codes/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/iban-codes/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/iban-codes/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"iban-example\",\n   \"identifiers\": {\n      \"ibanCode\": {\n         \"onlyValidIBANCodes\": false,\n         \"ibanCodeFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/ip-addresses/","title":"IP Addresses","text":""},{"location":"policies/filters/common_filters/ip-addresses/#filter","title":"Filter","text":"

This filter identifies IPv4 and IPv6 addresses such as 127.0.0.1, 192.168.3.58, and 2001:0db8:85a3:0000:0000:8a2e:0370:7334 in text.

"},{"location":"policies/filters/common_filters/ip-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/ip-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value ipAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/ip-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/ip-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/ip-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ip-address-example\",\n   \"identifiers\": {\n      \"ipAddress\": {\n         \"ipAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/mac-addresses/","title":"MAC Addresses","text":""},{"location":"policies/filters/common_filters/mac-addresses/#filter","title":"Filter","text":"

This filter identifies MAC addresses in text.

"},{"location":"policies/filters/common_filters/mac-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/mac-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value macAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/mac-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/mac-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/mac-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"mac-address-example\",\n   \"identifiers\": {\n      \"macAddress\": {\n         \"macAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/passport-numbers/","title":"Passport Numbers","text":""},{"location":"policies/filters/common_filters/passport-numbers/#filter","title":"Filter","text":"

This filter identifies US passport numbers in text.

"},{"location":"policies/filters/common_filters/passport-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/passport-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value passportNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/passport-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/passport-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CLASSIFICATION Compares the issuing country of the passport number. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/passport-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"passport-number-example\",\n   \"identifiers\": {\n      \"passportNumber\": {\n         \"passportNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/phone-number-extensions/","title":"Phone Number Extensions","text":""},{"location":"policies/filters/common_filters/phone-number-extensions/#filter","title":"Filter","text":"

This filter identifies phone number extensions such as \"x100\" in text.

"},{"location":"policies/filters/common_filters/phone-number-extensions/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/phone-number-extensions/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value phoneNumberExtensionFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/phone-number-extensions/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/phone-number-extensions/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/phone-number-extensions/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"phone-number-ext-example\",\n   \"identifiers\": {\n      \"phoneNumberExtension\": {\n         \"phoneNumberExtensionFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      } \n   }     \n}\n
"},{"location":"policies/filters/common_filters/phone-numbers/","title":"Phone Numbers","text":""},{"location":"policies/filters/common_filters/phone-numbers/#filter","title":"Filter","text":"

This filter identifies phone and fax numbers such as (304) 555-5555, 304-555-5555, and 1-800-123-4567 in text.

"},{"location":"policies/filters/common_filters/phone-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/phone-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value phoneNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/phone-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/phone-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/phone-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"phone-number-example\",\n   \"identifiers\": {\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }     \n}\n
"},{"location":"policies/filters/common_filters/sections/","title":"Sections","text":""},{"location":"policies/filters/common_filters/sections/#filter","title":"Filter","text":"

This filter identifies sections in text between a given start regular expression pattern and a given end regular expression pattern.
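
The matching idea can be sketched in a few lines of Python. This is only an illustration of start/end pattern matching, not Philter's implementation: the find_sections helper and the [SECTION] placeholder are hypothetical names, and the START and END patterns are taken from the example policy on this page.

```python
import re

def find_sections(text, start_pattern, end_pattern):
    '''Return (begin, end) spans running from each start match to the next end match.'''
    # Non-greedy body so a section stops at the first end pattern after its start.
    section = re.compile('(?:' + start_pattern + ').*?(?:' + end_pattern + ')', re.DOTALL)
    return [match.span() for match in section.finditer(text)]

text = 'Notes. START patient history END More notes.'
spans = find_sections(text, 'START', 'END')

# Replace from the end of the text backwards so earlier offsets stay valid.
redacted = text
for begin, end in reversed(spans):
    redacted = redacted[:begin] + '[SECTION]' + redacted[end:]

print(redacted)
```

Because the body of the match is non-greedy, two separate sections in the same text are reported as two spans rather than one span covering everything between the first start and the last end.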

"},{"location":"policies/filters/common_filters/sections/#required-parameters","title":"Required Parameters","text":"Parameter Description Default Value startPattern A regular expression denoting the start of the section. None endPattern A regular expression denoting the end of the section. None"},{"location":"policies/filters/common_filters/sections/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value sectionFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/sections/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/sections/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/sections/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"sections-example\",\n   \"identifiers\": {\n      \"section\": {\n         \"startPattern\": \"START\",\n         \"endPattern\": \"END\",\n         \"sectionFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/ssns-and-tins/","title":"SSNs and TINs","text":""},{"location":"policies/filters/common_filters/ssns-and-tins/#filter","title":"Filter","text":"

This filter identifies US SSNs and TINs such as 123-45-6789 and 123456789 in text.

"},{"location":"policies/filters/common_filters/ssns-and-tins/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/ssns-and-tins/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value ssnFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/ssns-and-tins/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/ssns-and-tins/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/ssns-and-tins/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ssn-tin-example\",\n   \"identifiers\": {\n      \"ssn\": {\n         \"ssnFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/tracking-numbers/","title":"Tracking Numbers","text":""},{"location":"policies/filters/common_filters/tracking-numbers/#filter","title":"Filter","text":"

This filter identifies tracking numbers in text. FedEx, UPS, and USPS tracking number formats are supported.

"},{"location":"policies/filters/common_filters/tracking-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/tracking-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value trackingNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/tracking-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/tracking-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/tracking-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"tracking-numbers-example\",\n   \"identifiers\": {\n      \"trackingNumber\": {\n         \"trackingNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/urls/","title":"URLs","text":""},{"location":"policies/filters/common_filters/urls/#filter","title":"Filter","text":"

This filter identifies URLs such as myhomepage.com, http://myhomepage.com/folder/page.html, and www.myhomepage.com/folder/page.html in text.

"},{"location":"policies/filters/common_filters/urls/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/urls/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value urlFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None requireHttpWwwPrefix When set to true, only URLs that begin with http or www will be filtered. true"},{"location":"policies/filters/common_filters/urls/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/urls/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/urls/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"urls-example\",\n   \"identifiers\": {\n      \"url\": {\n         \"requireHttpWwwPrefix\": true,\n         \"urlFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/vins/","title":"VINs","text":""},{"location":"policies/filters/common_filters/vins/#filter","title":"Filter","text":"

This filter identifies 17-character vehicle identification numbers (VINs) such as WBAPM7G50ANL19218 and 1GBJC34K3RE176005 in text.

"},{"location":"policies/filters/common_filters/vins/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/vins/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value vinFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/vins/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/vins/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/vins/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"vins-example\",\n   \"identifiers\": {\n      \"vin\": {\n         \"vinFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/zip-codes/","title":"Zip Codes","text":""},{"location":"policies/filters/common_filters/zip-codes/#filter","title":"Filter","text":"

This filter identifies zip codes in text.

"},{"location":"policies/filters/common_filters/zip-codes/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/zip-codes/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value zipCodeFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None requireDelimiter When set to false, the filter will not require a dash in 9 digit zip codes, e.g. 12345-6789. Setting to false may increase the number of zip code false positives. true"},{"location":"policies/filters/common_filters/zip-codes/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. TRUNCATE Replace the sensitive text by removing the last x digits. (Set the number of digits using the truncateDigits parameter of the filter strategy.) ZERO_LEADING Replace the sensitive text by zeroing the first 3 digits."},{"location":"policies/filters/common_filters/zip-codes/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, != POPULATION Compares the population of the zip code against the 2010 census values. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/zip-codes/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"zip-code-example\",\n   \"identifiers\": {\n      \"zipCode\": {\n         \"zipCodeFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/custom_filters/dictionary/","title":"Dictionary","text":""},{"location":"policies/filters/custom_filters/dictionary/#filter","title":"Filter","text":"

This filter identifies custom text based on a given dictionary.

"},{"location":"policies/filters/custom_filters/dictionary/#required-parameters","title":"Required Parameters","text":"

At least one of terms or files must be provided.

Parameter Description Default Value terms A list of terms in the dictionary. None files A list of files containing terms one per line. None"},{"location":"policies/filters/custom_filters/dictionary/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None fuzzy When set to true, the dictionary will employ fuzzy comparisons. Use the sensitivity parameter to control the level of fuzziness. Setting this value to false will disable fuzziness and provide a higher level of performance. false classification Used to apply an arbitrary label to the identifier, such as \"patient-id\", or \"account-number.\" \"custom-identifier\" sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. Only applies when fuzzy is set to true. medium"},{"location":"policies/filters/custom_filters/dictionary/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/custom_filters/dictionary/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/custom_filters/dictionary/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"dictionary-example\",\n   \"identifiers\": {\n      \"dictionaries\": [\n         {\n            \"terms\": [\"john\", \"jane\", \"doe\"],\n            \"files\": [\"c:\\\\temp\\\\dictionary.txt\"],\n            \"fuzzy\": true,\n            \"sensitivity\": \"medium\",\n            \"sectionFilterStrategies\": [\n               {\n                  \"strategy\": \"REDACT\",\n                  \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n               }\n            ]\n         }\n      ]\n   }\n}\n
"},{"location":"policies/filters/custom_filters/identifier/","title":"Identifier","text":""},{"location":"policies/filters/custom_filters/identifier/#filter","title":"Filter","text":"

This filter identifies custom text based on a given regular expression.

The Identifier filter accepts a list of regular expression-based identifiers. See the policy at the bottom of this page for an example.

Note that backslashes in the regular expression will need to be escaped for the policy to be valid JSON.
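
As a minimal sketch of that escaping, the snippet below builds a policy in Python and serializes it with the standard json module, which doubles each backslash automatically. Python is used purely for illustration; the policy layout mirrors the example at the bottom of this page, and the pattern is the default value listed under Optional Parameters.

```python
import json

# Default pattern from the Optional Parameters table, written as a raw string
# so that Python itself leaves the backslashes untouched.
pattern = r'\b[A-Z0-9_-]{4,}\b'

# Policy layout copied from the example policy on this page.
policy = {
    'name': 'default',
    'identifiers': {
        'identifiers': [
            {'pattern': pattern, 'classification': 'custom-identifier'}
        ]
    },
}

# json.dumps escapes each backslash, producing text that is valid JSON.
policy_json = json.dumps(policy, indent=2)
print(policy_json)

# Parsing the JSON text gives back the original, single-backslash pattern.
round_tripped = json.loads(policy_json)['identifiers']['identifiers'][0]['pattern']
```

Writing the policy by hand works the same way: every backslash in the regular expression is typed twice in the JSON file.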

"},{"location":"policies/filters/custom_filters/identifier/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/custom_filters/identifier/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None caseSensitive When set to true, the regular expression will be case sensitive. true classification Used to apply an arbitrary label to the identifier, such as \"patient-id\", or \"account-number.\" \"custom-identifier\" pattern A regular expression for the identifier. Note that backslashes will need to be escaped. \\b[A-Z0-9_-]{4,}\\b"},{"location":"policies/filters/custom_filters/identifier/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/custom_filters/identifier/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, != CLASSIFICATION Compares the classification of the sensitive text. == , !="},{"location":"policies/filters/custom_filters/identifier/#example-policy","title":"Example Policy","text":"
{\n  \"name\": \"default\",\n  \"identifiers\": {\n    \"identifiers\": [\n      {\n        \"pattern\": \"[A-Z]{9}\",\n        \"caseSensitive\": false,\n        \"classification\": \"custom-identifier\",\n        \"enabled\": true,\n        \"identifierFilterStrategies\": [\n          {\n            \"strategy\": \"REDACT\",\n            \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n          }\n        ]        \n      }\n    ]\n  }\n}\n
"},{"location":"policies/filters/locations/cities/","title":"Cities","text":""},{"location":"policies/filters/locations/cities/#filter","title":"Filter","text":"

This filter identifies common US cities, as determined by the US census, in text.

"},{"location":"policies/filters/locations/cities/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/cities/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value cityFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/cities/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/cities/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/cities/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"cities-example\",\n   \"identifiers\": {\n      \"city\": {\n         \"sensitivity\": \"medium\",\n         \"cityFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/counties/","title":"Counties","text":""},{"location":"policies/filters/locations/counties/#filter","title":"Filter","text":"

This filter identifies common US counties, as determined by the US census, in text.

"},{"location":"policies/filters/locations/counties/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/counties/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value countyFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/counties/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/counties/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/counties/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"counties-example\",\n   \"identifiers\": {\n      \"county\": {\n         \"sensitivity\": \"medium\",\n         \"countyFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/hospital-abbreviations/","title":"Hospital Abbreviations","text":""},{"location":"policies/filters/locations/hospital-abbreviations/#filter","title":"Filter","text":"

This filter identifies US hospital abbreviations in text.

"},{"location":"policies/filters/locations/hospital-abbreviations/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/hospital-abbreviations/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value hospitalAbbreviationFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/hospital-abbreviations/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/hospital-abbreviations/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/hospital-abbreviations/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"hospital-abbreviations-example\",\n   \"identifiers\": {\n      \"hospitalAbbreviation\": {\n         \"sensitivity\": \"medium\",\n         \"hospitalAbbreviationFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
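
Policies like the example above follow a regular shape, so they can also be generated programmatically. The following is a minimal Python sketch and not part of Philter: the helper name build_redact_policy is hypothetical, and it assumes the strategies key is the identifier name followed by FilterStrategies, as in the examples in this documentation.

```python
import json


def build_redact_policy(name, identifier, redaction_format='{{{REDACTED-%t}}}'):
    # Hypothetical helper (not part of Philter): builds a policy dict
    # shaped like the example policy above.
    # Assumption: the strategies key is the identifier name followed by
    # 'FilterStrategies', as it is in the examples in this documentation.
    strategies_key = identifier + 'FilterStrategies'
    return {
        'name': name,
        'identifiers': {
            identifier: {
                'sensitivity': 'medium',
                strategies_key: [
                    {'strategy': 'REDACT', 'redactionFormat': redaction_format}
                ],
            }
        },
    }


policy = build_redact_policy('hospital-abbreviations-example', 'hospitalAbbreviation')
print(json.dumps(policy, indent=3))
```

Generating policies this way may be convenient when many filters share the same strategy, but check each generated key against the parameter table for the filter before using it.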
"},{"location":"policies/filters/locations/hospitals/","title":"Hospitals","text":""},{"location":"policies/filters/locations/hospitals/#filter","title":"Filter","text":"

This filter identifies US hospitals in text.

"},{"location":"policies/filters/locations/hospitals/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/hospitals/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value hospitalFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/hospitals/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/hospitals/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/hospitals/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"hospitals-example\",\n   \"identifiers\": {\n      \"hospital\": {\n         \"sensitivity\": \"medium\",\n         \"hospitalFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/state-abbreviations/","title":"State Abbreviations","text":""},{"location":"policies/filters/locations/state-abbreviations/#filter","title":"Filter","text":"

This filter identifies US state abbreviations in text.

"},{"location":"policies/filters/locations/state-abbreviations/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/state-abbreviations/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value stateAbbreviationFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/locations/state-abbreviations/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/state-abbreviations/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/state-abbreviations/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"states-abbreviations-example\",\n   \"identifiers\": {\n      \"stateAbbreviation\": {\n         \"stateAbbreviationFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/states/","title":"States","text":""},{"location":"policies/filters/locations/states/#filter","title":"Filter","text":"

This filter identifies US states in text.

"},{"location":"policies/filters/locations/states/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/states/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value stateFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/locations/states/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/states/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/states/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"states-example\",\n   \"identifiers\": {\n      \"state\": {\n         \"stateFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/first-names/","title":"First Names","text":""},{"location":"policies/filters/persons_names/first-names/#filter","title":"Filter","text":"

This filter identifies common first names, as listed by the US Census, in text.

"},{"location":"policies/filters/persons_names/first-names/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/first-names/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium firstNameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/first-names/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/persons_names/first-names/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/first-names/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"first-names-example\",\n   \"identifiers\": {\n      \"firstName\": {\n         \"firstNameFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/persons-names-ner/","title":"Person's Names (NER)","text":""},{"location":"policies/filters/persons_names/persons-names-ner/#filter","title":"Filter","text":"

This filter identifies persons' names in text using natural language processing (NLP) and named-entity recognition (NER).

"},{"location":"policies/filters/persons_names/persons-names-ner/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/persons-names-ner/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value removePunctuation When set to true, punctuation will be removed prior to analysis. false nerFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/persons-names-ner/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. ABBREVIATE Replace the sensitive text with the initials of the text."},{"location":"policies/filters/persons_names/persons-names-ner/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/persons-names-ner/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ner-example\",\n   \"identifiers\": {\n      \"ner\": {\n         \"nerFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/physician-names-ner/","title":"Physician Names","text":""},{"location":"policies/filters/persons_names/physician-names-ner/#filter","title":"Filter","text":"

This filter identifies physician names (e.g. Dr. John Smith) in text.

"},{"location":"policies/filters/persons_names/physician-names-ner/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/physician-names-ner/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value physicianNameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/physician-names-ner/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/persons_names/physician-names-ner/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/physician-names-ner/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"physician-names-example\",\n   \"identifiers\": {\n      \"physicianName\": {\n         \"physicianNameFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/surnames/","title":"Surnames","text":""},{"location":"policies/filters/persons_names/surnames/#filter","title":"Filter","text":"

This filter identifies common surnames, as listed by the US Census, in text.

"},{"location":"policies/filters/persons_names/surnames/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/surnames/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium surnameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/surnames/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/persons_names/surnames/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/surnames/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"surnames-example\",\n   \"identifiers\": {\n      \"surname\": {\n         \"surnameFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"quick_starts/quick_start_aws/","title":"Philter Quick Start on AWS","text":"

Philter on AWS is a virtual machine-based product that runs on its own EC2 instance. A free trial period is available during which there is no charge for the Philter software, but there may be charges for the underlying AWS infrastructure.

Cloud virtual machines launched from a cloud marketplace may not be immediately suitable for a HIPAA environment. Refer to your compliance officer for your organization's requirements to ensure compliance with all relevant regulations.

Here\u2019s a brief screencast showing how to launch Philter in AWS.

"},{"location":"quick_starts/quick_start_aws/#launch-philter-in-aws","title":"Launch Philter in AWS","text":"
  1. Go to Philter in the AWS Marketplace. On this page you can see the Philter overview, the pricing, and the supported EC2 instance types.
  2. Select an instance type. We recommend m5.large. The smaller instance types are intended only for testing and are not well-suited for production usage.
  3. Click the Continue to Subscribe button.
  4. View and accept Philter\u2019s license agreement. Then click Accept Terms.
  5. The subscription will now be created and you will be notified when it is ready. This usually takes less than a minute.
  6. Click the Continue to Configuration button to select the AMI, the version, and the region. We recommend using the newest version if multiple are available.
  7. Click the Continue to Launch button to launch Philter in your AWS account!

AWS will automatically open ports 22 (SSH) and 8080 (Philter API) in the Philter instance's security group. These ports must be open, but you may want to modify the security group to restrict access to specific CIDR ranges.
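
When choosing CIDR ranges for the security group, it can help to confirm that the addresses of your client machines actually fall inside them. The following is a minimal Python sketch using the standard ipaddress module. The function name is_allowed is hypothetical, and the ranges and addresses are illustrative placeholders, not recommendations.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    # Returns True when client_ip falls inside at least one CIDR range.
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Illustrative placeholder ranges (reserved documentation addresses).
allowed = ['203.0.113.0/24', '198.51.100.16/28']

print(is_allowed('203.0.113.25', allowed))
print(is_allowed('192.0.2.10', allowed))
```

A check like this only validates the ranges you intend to configure. The security group itself is still what enforces access.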

Congratulations! You have deployed Philter in AWS. You are now ready to filter text!

"},{"location":"quick_starts/quick_start_aws/#try-it-out","title":"Try it out!","text":"

With Philter now running, we can take it for a spin. We will send some text to Philter and inspect the response we get back. The Philter virtual machine running in your cloud account should have a public IP address (unless you customized the deployment). We will use that public IP address to interact with Philter.

Philter, by default, will be configured with an HTTPS listener on port 8080 using a self-signed certificate. It is recommended that prior to use in a production environment the self-signed certificate is replaced by a valid certificate owned by your organization.

In the command below, replace <PUBLIC_IP> with the virtual machine\u2019s public IP address or public host name.

curl -k -X POST https://<PUBLIC_IP>:8080/api/filter --data \"George Washington was a patient and his SSN is 123-45-6789.\" -H \"Content-type: text/plain\"\n

With this command we are sending the text in the command to Philter for filtering. Philter will identify the patient name (George Washington) and the SSN (123-45-6789) and redact those values in the response. You can always use curl to send text to Philter as in these examples, but there are also SDKs you can use to integrate Philter with your applications.
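
The same request can be made from application code. The following is a minimal Python sketch using only the standard library. The function names build_filter_request and filter_text are hypothetical and not part of any Philter SDK; the endpoint, port, and content type are taken from the curl command above. Certificate verification is disabled to mirror the -k flag, which is only appropriate while the self-signed certificate is in place.

```python
import ssl
import urllib.request


def build_filter_request(host, text, port=8080):
    # Mirrors the curl command above: POST plain text to /api/filter.
    url = 'https://{}:{}/api/filter'.format(host, port)
    return urllib.request.Request(
        url,
        data=text.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


def filter_text(host, text, port=8080):
    # Equivalent of curl -k: skip certificate verification because the
    # default certificate is self-signed. Do not do this in production.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_filter_request(host, text, port)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode('utf-8')


# Usage, with the public IP address or host name of your instance:
# print(filter_text('203.0.113.25', 'George Washington was a patient.'))
```

Keeping the request construction separate from the network call makes it possible to inspect or test the request without a running Philter instance.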

"},{"location":"quick_starts/quick_start_aws/#redacting-sensitive-information-from-text","title":"Redacting Sensitive Information from Text","text":"

The types of sensitive information that Philter identifies and removes are controlled by policies. By default, Philter includes a policy that covers many types of sensitive information, such as names and social security numbers. We can send text to Philter for filtering under this default policy with the following command:

curl -k -X POST https://localhost:8080/api/filter -d @file.txt -H \"Content-Type: text/plain\"\n

This command sends the contents of the file file.txt to Philter. Philter will apply the enabled filters and return a plain-text response consisting of the filtered text. (Replace localhost with the IP address or host name of Philter if you are not running the command on the machine where Philter is running.) You can also send text directly in the request instead of sending it as a file:

curl -k -X POST https://localhost:8080/api/filter --data \"Your text goes here...\" -H \"Content-type: text/plain\"\n
"},{"location":"quick_starts/quick_start_aws/#next-steps","title":"Next Steps","text":"

Now that you have Philter running and know how to send text to it, you are ready to integrate Philter into your existing workflows and systems. Philter\u2019s API documentation describes how to send files to Philter, and API clients for some languages are available on GitHub.

Be sure to check out Policies to see how you can customize the types of sensitive information Philter redacts!

"},{"location":"quick_starts/quick_start_azure/","title":"Philter Quick Start on Microsoft Azure","text":"

Philter on Microsoft Azure is a virtual machine-based product. A free trial period is available during which there is no charge for the Philter software, but there may be charges for the underlying Azure infrastructure.

Cloud virtual machines launched from a cloud marketplace may not be immediately suitable for a HIPAA environment. Refer to your compliance officer for your organization's requirements to ensure compliance with all relevant regulations.

"},{"location":"quick_starts/quick_start_azure/#launch-philter-on-microsoft-azure","title":"Launch Philter on Microsoft Azure","text":"
  1. Go to Philter in the Azure Marketplace.
  2. Click the Get It Now button.
  3. Review the information that is shown on the popup and click Continue when ready.
  4. You will now be asked to log in to your Microsoft Azure account if you were not already logged in.
  5. Click the Create button to begin making a Philter virtual machine.
  6. Enter the required details of the virtual machine and click the Review + create button.
  7. Review the virtual machine details and click Create when ready!

Your Philter virtual machine will now launch.

Microsoft Azure will automatically open ports 22 (SSH) and 8080 (Philter API). These ports must be open, but you may want to modify the network security group rules to restrict access to specific CIDR ranges.

Congratulations! You have deployed Philter in Azure. You are now ready to filter text!

"},{"location":"quick_starts/quick_start_azure/#try-it-out","title":"Try it out!","text":"

With Philter now running, we can take it for a spin. We will send some text to Philter and inspect the response we get back. The Philter virtual machine running in your cloud account should have a public IP address (unless you customized the deployment). We will use that public IP address to interact with Philter.

Philter, by default, will be configured with an HTTPS listener on port 8080 using a self-signed certificate. It is recommended that prior to use in a production environment the self-signed certificate is replaced by a valid certificate owned by your organization.

In the command below, replace <PUBLIC_IP> with the virtual machine\u2019s public IP address or public host name.

curl -k -X POST https://<PUBLIC_IP>:8080/api/filter --data \"George Washington was a patient and his SSN is 123-45-6789.\" -H \"Content-type: text/plain\"\n

With this command we are sending the text in the command to Philter for filtering. Philter will identify the patient name (George Washington) and the SSN (123-45-6789) and redact those values in the response. You can always use curl to send text to Philter as in these examples, but there are also SDKs you can use to integrate Philter with your applications.

"},{"location":"quick_starts/quick_start_azure/#redacting-sensitive-information-from-text","title":"Redacting Sensitive Information from Text","text":"

The types of sensitive information that Philter identifies and removes are controlled by policies. By default, Philter includes a policy that covers many types of sensitive information, such as names and social security numbers. We can send text to Philter for filtering under this default policy with the following command:

curl -k -X POST https://localhost:8080/api/filter -d @file.txt -H \"Content-Type: text/plain\"\n

This command sends the contents of the file file.txt to Philter. Philter will apply the enabled filters and return a plain-text response consisting of the filtered text. (Replace localhost with the IP address or host name of Philter if you are not running the command on the machine where Philter is running.) You can also send text directly in the request instead of sending it as a file:

curl -k -X POST https://localhost:8080/api/filter --data \"Your text goes here...\" -H \"Content-type: text/plain\"\n
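
The file-based command earlier in this section can also be reproduced from code. The following is a minimal Python sketch using the standard library; the function name build_file_request is hypothetical and not part of any Philter SDK. One difference worth knowing: curl -d strips carriage returns and newlines when it reads a file, whereas this sketch sends the file bytes unchanged (comparable to curl --data-binary).

```python
import urllib.request
from pathlib import Path


def build_file_request(path, host='localhost', port=8080):
    # Reads the file as bytes and prepares a POST to /api/filter.
    # Line breaks in the file are preserved in the request body.
    body = Path(path).read_bytes()
    return urllib.request.Request(
        'https://{}:{}/api/filter'.format(host, port),
        data=body,
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


# To send the request, pass it to urllib.request.urlopen together with
# an ssl.SSLContext that trusts the certificate installed on Philter.
```

Whether preserved line breaks matter depends on your text. For documents where layout carries meaning, sending the bytes unchanged is likely the safer choice.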
"},{"location":"quick_starts/quick_start_azure/#next-steps","title":"Next Steps","text":"

Now that you have Philter running and know how to send text to it, you are ready to integrate Philter into your existing workflows and systems. Philter\u2019s API documentation describes how to send files to Philter, and API clients for some languages are available on GitHub.

Be sure to check out Policies to see how you can customize the types of sensitive information Philter redacts!

"},{"location":"quick_starts/quick_start_gcp/","title":"Philter Quick Start on Google Cloud","text":"

Philter on Google Cloud is a virtual machine-based product. A free trial period is available during which there is no charge for the Philter software, but there may be charges for the underlying Google Cloud infrastructure.

Cloud virtual machines launched from a cloud marketplace may not be immediately suitable for a HIPAA environment. Refer to your compliance officer for your organization's requirements to ensure compliance with all relevant regulations.

"},{"location":"quick_starts/quick_start_gcp/#launch-philter-in-google-cloud","title":"Launch Philter in Google Cloud","text":"
  1. Go to Philter in the Google Cloud Marketplace.
  2. Click the Launch on Compute Engine button.

Virtual Machine Recommendations

The general purpose machine type is n2-standard-2 and this machine type should be adequate for most use-cases. We recommend 8 vCPUs and 8-16 GB of RAM for a production deployment.

Google Cloud will automatically open ports 22 (SSH) and 8080 (Philter API). These ports must be open, but you may want to modify the firewall rules to restrict access to specific CIDR ranges.

Congratulations! You have deployed Philter in Google Cloud. You are now ready to filter text!

"},{"location":"quick_starts/quick_start_gcp/#try-it-out","title":"Try it out!","text":"

With Philter now running, we can take it for a spin. We will send some text to Philter and inspect the response we get back. The Philter virtual machine running in your cloud account should have a public IP address (unless you customized the deployment). We will use that public IP address to interact with Philter.

Philter, by default, will be configured with an HTTPS listener on port 8080 using a self-signed certificate. It is recommended that prior to use in a production environment the self-signed certificate is replaced by a valid certificate owned by your organization.

In the command below, replace <PUBLIC_IP> with the virtual machine\u2019s public IP address or public host name.

curl -k -X POST https://<PUBLIC_IP>:8080/api/filter --data \"George Washington was a patient and his SSN is 123-45-6789.\" -H \"Content-type: text/plain\"\n

With this command we are sending the text in the command to Philter for filtering. Philter will identify the patient name (George Washington) and the SSN (123-45-6789) and redact those values in the response. You can always use curl to send text to Philter as in these examples, but there are also SDKs you can use to integrate Philter with your applications.

"},{"location":"quick_starts/quick_start_gcp/#redacting-sensitive-information-from-text","title":"Redacting Sensitive Information from Text","text":"

The types of sensitive information that Philter identifies and removes are controlled by policies. By default, Philter includes a policy that covers many types of sensitive information, such as names and social security numbers. We can send text to Philter for filtering under this default policy with the following command:

curl -k -X POST https://localhost:8080/api/filter -d @file.txt -H \"Content-Type: text/plain\"\n

This command sends the contents of the file file.txt to Philter. Philter will apply the enabled filters and return a plain-text response consisting of the filtered text. (Replace localhost with the IP address or host name of Philter if you are not running the command on the machine where Philter is running.) You can also send text directly in the request instead of sending it as a file:

curl -k -X POST https://localhost:8080/api/filter --data \"Your text goes here...\" -H \"Content-type: text/plain\"\n
"},{"location":"quick_starts/quick_start_gcp/#next-steps","title":"Next Steps","text":"

Now that you have Philter running and know how to send text to it, you are ready to integrate Philter into your existing workflows and systems. Philter\u2019s API documentation describes how to send files to Philter, and API clients for some languages are available on GitHub.

Be sure to check out Policies to see how you can customize the types of sensitive information Philter redacts!

"},{"location":"solutions/apache-nifi-and-philter/","title":"Apache NiFi and Philter","text":"

This article describes how Philter can be used with Apache NiFi to filter sensitive information such as PII and PHI within an Apache NiFi data flow.

Philter is available on the AWS, Azure, and Google Cloud marketplaces. So, fire up an instance of Philter and let's get started using it alongside your Apache NiFi data flow!

"},{"location":"solutions/apache-nifi-and-philter/#configuring-philter-with-cloudera-dataflow-cdf","title":"Configuring Philter with Cloudera DataFlow (CDF)","text":"

Philter is certified to work with Cloudera DataFlow (CDF) as a custom Apache NiFi processor. There are two options for deploying Philter with CDF.

"},{"location":"solutions/apache-nifi-and-philter/#option-1-using-philter-via-its-api","title":"Option 1 - Using Philter via its API","text":"

In the first option, a custom NiFi processor performs redaction by communicating with an instance of Philter through Philter's API. The processor sends text to Philter for redaction and receives back the redacted text. This option requires deploying an instance of Philter alongside your Cloudera DataFlow installation. Next, get the Philter NiFi processor from GitHub. Deploy the NAR file to CDF and make it accessible to Apache NiFi.

Configure the Philter processor by specifying the location of Philter and any other necessary connection configuration, as shown in the image below.

For a production environment, a cluster of Philter instances deployed behind a load balancer would provide improved performance and increased availability over a single instance.

"},{"location":"solutions/apache-nifi-and-philter/#option-2-using-philter-embedded-into-nifi","title":"Option 2 - Using Philter Embedded into NiFi","text":"

The second option does not require an instance of Philter. Please contact us to receive a NiFi processor with all of Philter's capabilities embedded in it. This processor performs the text redaction entirely within your NiFi data flow with no external communication required. This processor is significantly more performant than the processor in the first option. When you receive the processor NAR file from us, deploy it to NiFi.

Configure the processor as shown in the image below by specifying the name of the desired policy and filtering context:

"},{"location":"solutions/apache-nifi-and-philter/#creating-a-flow","title":"Creating a Flow","text":"

Both processors support the same transitions. The redacted transition contains the redacted version of the flow file's content. In the example flows shown below, the top flow uses the Philter processor utilizing Philter's API. The bottom flow uses the Philter embedded processor. As you can see, both flows are the same. The only differences are the middle processors and their individual configuration.

"},{"location":"solutions/consistent-anonymization-with-redis/","title":"Consistent Anonymization with Redis","text":"

The consistent anonymization feature in Philter ensures that filtered values are anonymized consistently across documents or contexts. When Philter is deployed in a cluster and is using consistent anonymization across contexts, a Redis cache is required. The cache stores the anonymized values so that all instances of Philter have access to the values.

The Redis cache will contain PHI. It is important to prepare your Redis cache such that it can contain PHI.

"},{"location":"solutions/consistent-anonymization-with-redis/#enabling-consistent-anonymization","title":"Enabling Consistent Anonymization","text":"

To enable consistent anonymization in Philter, set the following properties in Philter's configuration:

consistent.anonymization=true\nconsistent.anonymization.scope=context\n
"},{"location":"solutions/consistent-anonymization-with-redis/#configuring-redis-cache","title":"Configuring Redis Cache","text":"

To enable Philter to use the Redis cache, set the following options in Philter's configuration:

anonymization.cache.service=redis\nanonymization.cache.service.host=127.0.0.1\nanonymization.cache.service.port=6379\nanonymization.cache.service.ssl=true\n

Replace 127.0.0.1 with the IP address or host name of your Redis cache.
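
Before restarting Philter, it can help to confirm that the two groups of properties agree with each other. The following is a rough Python sketch of such a pre-flight check. The property names come from the settings shown above, while the helper functions and the rule they enforce (context-scoped consistent anonymization implies a Redis cache, as in a clustered deployment) are assumptions made for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        props[key.strip()] = value.strip()
    return props


def redis_cache_problems(props):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    enabled_as_shown = (
        props.get('consistent.anonymization') == 'true'
        and props.get('consistent.anonymization.scope') == 'context'
    )
    if enabled_as_shown and props.get('anonymization.cache.service') != 'redis':
        problems.append('anonymization.cache.service should be redis')
    if props.get('anonymization.cache.service') == 'redis':
        for key in ('anonymization.cache.service.host',
                    'anonymization.cache.service.port'):
            if not props.get(key):
                problems.append(f'{key} is not set')
    return problems
```

Passing the contents of Philter's configuration file to `parse_properties` and then to `redis_cache_problems` returns an empty list when the settings above are all present.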

If you are using Redis on AWS ElastiCache see ElastiCache for Redis In-Transit Encryption (TLS) for information on using in-transit encryption.

"},{"location":"solutions/consistent-anonymization-with-redis/#restart-philter","title":"Restart Philter","text":"

After starting (or restarting) Philter, Philter will use the Redis cache for consistent anonymization across contexts. You can restart Philter with the command:

sudo systemctl restart philter.service\n
"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/","title":"Deploying Philter in a HIPAA Environment","text":"

This is not intended to be a comprehensive or legal HIPAA guide so please refer to your HIPAA compliance or security officer prior to deploying and using Philter in a PHI environment.

The steps below outline how to configure a Philter deployment for encryption of data at rest and in motion.

"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#encryption-of-data-at-rest","title":"Encryption of Data at Rest","text":""},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#amazon-web-services","title":"Amazon Web Services","text":"
  1. Stop the Philter EC2 instance.
  2. Make an AMI of the instance.
  3. Make an encrypted copy of the Philter AMI.

The created AMI is encrypted. EC2 instances launched from the AMI will utilize an encrypted EBS volume and all snapshots will be encrypted. Refer to the AWS documentation Creating an Amazon EBS-Backed Linux AMI for assistance.

"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#encryption-of-data-in-motion","title":"Encryption of Data in Motion","text":""},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#amazon-web-services_1","title":"Amazon Web Services","text":"

If launched from the Amazon Web Services, Google Cloud, or Microsoft Azure marketplace, Philter's REST API will be pre-configured with a self-signed certificate. We recommend replacing the self-signed certificate with a certificate from a trusted certificate authority.

  1. Log in to the Philter EC2 instance via SSH. (On AWS the username is ec2-user. On Azure the username is centos.)
  2. Stop the Philter service: sudo systemctl stop philter.service
  3. Edit Philter's settings to utilize an SSL certificate.
  4. Start the Philter service: sudo systemctl start philter.service
  5. Connect to Philter's API and verify that the request succeeds and returns HTTP 200 OK: curl https://philter:8080/api/status
"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#related-links","title":"Related Links","text":""},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/","title":"Deploying Philter in AWS via a CloudFormation Template","text":"

AWS CloudFormation can be used to automate the creation and tear down of your AWS cloud resources in a repeatable manner. Philter can be included in your CloudFormation templates to also automate its deployment and configuration.

This article is designed to be a \"quick start\" into CloudFormation and Philter. This article describes a CloudFormation template suitable for deploying Philter for purposes of integration testing. A template for deploying Philter for production use requires a few more changes.

"},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/#finding-philters-ami","title":"Finding Philter's AMI","text":"

To begin, you must have the AMI (e.g. ami-123456789) of Philter.

Alternatively, to find the AMI, launch Philter from the AWS Marketplace. If you have not already subscribed to Philter, the AWS Marketplace will prompt you to do so. At the end of the subscription process you will be able to launch an instance into your AWS account. (You can select the smallest available instance size.) Do this and then navigate to your EC2 instances in the AWS Console.

In the EC2 Console, locate the newly launched Philter instance. It may still be in the \"Pending\" state if it has not finished launching. Click on the instance so that its details are displayed at the bottom of the EC2 Console. Locate the \"AMI\" property. This is the Philter AMI identifier. Make a note of this AMI or copy and paste it so you can reference it in your CloudFormation templates. You can now terminate the instance.

Note that when a new version of Philter is published to the AWS Marketplace it will have a different AMI identifier. If you want to use the newest version you will need to do the steps above again to find the new AMI identifier. See the Philter AWS AMIs for a sample script to automate finding the AMIs. If you have difficulties finding the Philter AMI identifier please contact us for assistance.
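
As a rough illustration of automating this lookup, the Python sketch below picks the most recently created image from a list of image descriptions. It is not the sample script mentioned above. The helper names are hypothetical, and `name_pattern` is a placeholder you must supply because the naming of Philter's AMIs is not documented here. The lookup itself assumes the `boto3` library and uses its EC2 `describe_images` call.

```python
def newest_image_id(images):
    """Return the ImageId with the latest CreationDate, or None if the list is empty."""
    if not images:
        return None
    # CreationDate values are ISO 8601 strings, so they sort chronologically as text.
    return max(images, key=lambda image: image['CreationDate'])['ImageId']


def find_philter_ami(region, name_pattern):
    """Look up AWS Marketplace images matching name_pattern in one region."""
    import boto3  # imported lazily so newest_image_id works without boto3 installed

    ec2 = boto3.client('ec2', region_name=region)
    response = ec2.describe_images(
        Owners=['aws-marketplace'],
        # name_pattern is a placeholder; the real Philter AMI name is an assumption.
        Filters=[{'Name': 'name', 'Values': [name_pattern]}],
    )
    return newest_image_id(response['Images'])
```

Because the AMI differs per region, a script like this would be run once for each region you deploy to.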

"},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/#cloudformation-template","title":"CloudFormation Template","text":"

You can use the AMI ID to launch one or more instances of Philter via your CloudFormation template. You can launch a single instance, multiple instances, or you can launch one or more instances as part of an autoscaling group. You have flexibility depending on your requirements for deploying Philter. In the example below we are going to launch a single instance of Philter.

We are going to base our template off the AWS sample for a single EC2 instance in a VPC. The sample template can be found here. This template creates a new VPC along with the required subnet and route table.

Note that we only replaced the Philter AMI for your region in the template. The Philter AMI will be different for each AWS region.

"},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/#launch-the-stack","title":"Launch the Stack","text":"

Now that we have the template we can create a stack from it. A stack is the set of resources that the template defines. You can think of a stack as being an instance of the template. We will use the AWS Console to create the stack. In the AWS Console navigate to the CloudFormation console. Locate the button to create a stack, walk through the steps uploading your template when prompted, and finish. Your new stack with Philter will now be launched. You can watch the stack's progress as CloudFormation creates its resources. When you are finished with the stack you can delete it and all resources that were created for the stack, such as the Philter instance, will be deleted.

If you try to launch a CloudFormation stack that uses a Philter AMI but you do not have an active subscription to Philter via the AWS Marketplace, the stack creation will fail. To remedy this, go to the AWS Marketplace and subscribe to Philter.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/","title":"Managing Philter\u2019s Configuration in an Auto-Scaling Environment","text":"

This article describes how Philter's configuration can be managed when Philter is deployed in an auto-scaling environment.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/#updating-philter-configuration-values","title":"Updating Philter Configuration Values","text":"

Philter reads its settings from the philter.properties file when Philter starts. This file must reside alongside Philter wherever Philter is deployed. When Philter is deployed in an auto-scaling environment, updating a configuration requires updating the configuration value on all instances of Philter. There are a few approaches that can be taken.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/#deployment-via-a-custom-machine-image","title":"Deployment via a Custom Machine Image","text":"

One way to update the configuration values is to use a custom (\"pre-baked\") machine image of Philter. When a configuration value needs to be changed, change it in the machine image and update the auto-scaling environment to use the latest machine image. Then begin replacing the currently running Philter instances with new instances launched from the updated machine image.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/#updating-configuration-using-an-external-file","title":"Updating Configuration using an External File","text":"

In this method, a copy of Philter's application.properties file is stored on a remote file system, such as Amazon S3. A cron job runs on each deployed Philter instance to periodically download the application.properties file, copy it to the appropriate location, and then restart the Philter service. This method allows you to modify Philter's configuration on all of the instances with fewer moving parts than the previous option.

The following is an example bash script that uses the AWS CLI to copy the application.properties file and restart Philter.

#!/bin/bash\naws s3 cp s3://your-bucket/application.properties /opt/philter/application.properties\nsudo systemctl restart philter.service\nsudo systemctl restart philter-ner.service\n
"},{"location":"solutions/monitoring-philter-in-aws/","title":"Monitoring Philter in AWS","text":"

A deployment of Philter in AWS can be monitored by multiple methods. Here we'll discuss some of the options available when Philter is used in AWS.

"},{"location":"solutions/monitoring-philter-in-aws/#monitoring-philters-application-log-with-cloudwatch-logs","title":"Monitoring Philter's Application Log with CloudWatch Logs","text":"

Although no sensitive information is purposely logged to Philter's log files, it is possible for sensitive information to be inadvertently included through some events. For this reason, it is important to ensure that the location where you store Philter's logs is suitable for containing sensitive information such as PHI and PII.

Philter's application log is located at /var/log/philter/philter.log. When deploying multiple instances of Philter it is useful to have the log files centralized in a single location. We can do this using CloudWatch Logs.

The first thing to do is to ensure the Philter instance has an appropriate IAM role and policy. The policy must allow write access to CloudWatch Logs. The following policy is sufficient:

{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogGroup\",\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\",\n        \"logs:DescribeLogStreams\"\n    ],\n      \"Resource\": [\n        \"arn:aws:logs:*:*:*\"\n    ]\n  }\n ]\n}\n

Next, install the CloudWatch Logs Agent on the instance. Configure the agent to send Philter's log to CloudWatch Logs. Modify the CloudWatch Logs configuration file to include Philter's log file:

[/var/log/philter/philter.log]\nfile = /var/log/philter/philter.log\nlog_group_name = /var/log/philter/philter.log\nlog_stream_name = {instance_id}\ndatetime_format = %b %d %H:%M:%S\n

After restarting the agent, Philter's log file will be available in the CloudWatch Logs console.

"},{"location":"solutions/monitoring-philter-in-aws/#monitoring-philters-availability-with-an-elastic-load-balancer","title":"Monitoring Philter's Availability with an Elastic Load Balancer","text":"

Philter's REST API includes an endpoint that returns the status of Philter. When operating normally, the /api/status endpoint returns HTTP 200 OK. This endpoint is ideal for monitoring by a service such as an Elastic Load Balancer's health checks. The full endpoint URL will be similar to https://instance:8080/api/status.

Note that, by default, Philter uses a self-signed SSL certificate for its HTTPS interface. In some situations it may be necessary to replace this self-signed certificate with a certificate signed by a trusted authority.
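
For checks made outside of a load balancer, the same endpoint can be polled from a script. The following Python sketch treats HTTP 200 from /api/status as healthy and everything else as unhealthy. The function name `is_philter_healthy` and the injectable `opener` parameter are illustrative choices, not part of Philter.

```python
import urllib.request


def is_philter_healthy(url, opener=urllib.request.urlopen, timeout=5):
    """Return True when the status endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so refused
        # connections, timeouts, TLS failures, and HTTP errors all land here.
        return False
```

With the default opener, certificate verification stays enabled, so an instance that still uses the self-signed certificate will be reported as unhealthy until the certificate is replaced.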

"},{"location":"solutions/monitoring-philter-in-aws/#monitoring-philters-metrics-with-cloudwatch-metrics","title":"Monitoring Philter's Metrics with CloudWatch Metrics","text":"

Philter captures various metrics during its operation. The metrics are exposed via JMX and can also be reported to CloudWatch Metrics as custom metrics. To enable metric reporting to CloudWatch, set the appropriate configuration settings in Philter's properties. (Refer to Philter's user documentation for a description of the configuration properties.) Then restart Philter for the changes to take effect. Philter will now publish metrics to CloudWatch Metrics.

These metrics can be used to trigger alerts based on certain thresholds or be used to trigger auto-scaling if Philter is deployed in an auto-scaling group.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/","title":"Using AWS Kinesis Firehose Transformations to Filter Sensitive Information from Streaming Text","text":"

AWS Kinesis Firehose is a managed streaming service designed to take large amounts of data from one place to another. For example, you can take data from places such as CloudWatch, AWS IoT, and custom applications using the AWS SDK to places such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, and others. In this post we will use Amazon S3 as the firehose's destination.

Sometimes you want to manipulate the data as it goes through the firehose. This example solution shows how Philter can be used with AWS Kinesis Firehose and AWS Lambda to remove sensitive information, such as PII and PHI, from the text as it travels through the firehose.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#prerequisites","title":"Prerequisites","text":"

You must have a running instance of Philter. If you don't already have a running instance of Philter you can launch one through the AWS Marketplace. It is not required that the instance of Philter be running in AWS, but it is required that the instance of Philter be accessible from your AWS Lambda function. Running Philter and your AWS Lambda function in your own VPC allows you to communicate locally with Philter from the function. Otherwise, Philter will need to be available over the public internet or accessible over a VPN connection. See all Philter launch options.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#configuring-the-firehose-and-the-lambda-function","title":"Configuring the Firehose and the Lambda Function","text":"

There is no need to duplicate an excellent blog post on creating a Firehose Data Transformation with AWS Lambda to establish the Firehose and Lambda function resources in AWS. So, refer to that blog post and substitute the Python 3 code below.

To start, create an AWS Firehose and configure an AWS Lambda transformation. When creating the AWS Lambda function, select Python 3.7 and use the following code to submit text to Philter's API.

from botocore.vendored import requests\nimport base64\n\ndef handler(event, context):\n\n    output = []\n\n    for record in event['records']:\n        payload=base64.b64decode(record[\"data\"])\n        headers = {'Content-type': 'text/plain'}\n        r = requests.post(\"https://PHILTER_IP:8080/api/filter\", verify=False, data=payload, headers=headers, timeout=20)\n        filtered = r.text\n        output_record = {\n            'recordId': record['recordId'],\n            'result': 'Ok',\n            'data': base64.b64encode(filtered.encode('utf-8') + b'\\n').decode('utf-8')\n        }\n        output.append(output_record)\n\n    return output\n

The following Kinesis Firehose test event can be used to test the function:

{\n  \"invocationId\": \"invocationIdExample\",\n  \"deliveryStreamArn\": \"arn:aws:kinesis:EXAMPLE\",\n  \"region\": \"us-east-1\",\n  \"records\": [\n    {\n      \"recordId\": \"49546986683135544286507457936321625675700192471156785154\",\n      \"approximateArrivalTimestamp\": 1495072949453,\n      \"data\": \"R2VvcmdlIFdhc2hpbmd0b24gd2FzIHByZXNpZGVudCBhbmQgaGlzIHNzbiB3YXMgMTIzLTQ1LTY3ODkgYW5kIGhlIGxpdmVkIGF0IDkwMjEwLiBQYXRpZW50IGlkIDAwMDc2YSBhbmQgOTM4MjFhLiBIZSBpcyBvbiBiaW90aW4uIERpYWdub3NlZCB3aXRoIEEwMTAwLg==\"\n    },\n    {\n      \"recordId\": \"49546986683135544286507457936321625675700192471156785154\",\n      \"approximateArrivalTimestamp\": 1495072949453,\n      \"data\": \"R2VvcmdlIFdhc2hpbmd0b24gd2FzIHByZXNpZGVudCBhbmQgaGlzIHNzbiB3YXMgMTIzLTQ1LTY3ODkgYW5kIGhlIGxpdmVkIGF0IDkwMjEwLiBQYXRpZW50IGlkIDAwMDc2YSBhbmQgOTM4MjFhLiBIZSBpcyBvbiBiaW90aW4uIERpYWdub3NlZCB3aXRoIEEwMTAwLg==\"\n    }    \n  ]\n}\n

This test event contains 2 messages and the data for each is base 64 encoded, which is the value \"He lived in 90210 and his SSN was 123-45-6789.\" When the test is executed the response will be:

[\n  \"He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.\",\n  \"He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.\"\n]\n

When executing the test, the AWS Lambda function will extract the data from the requests in the firehose and submit each to Philter for filtering. The responses from each request will be returned from the function as a JSON list.

Note that in our Python function we are ignoring Philter's self-signed certificate. You should use a valid signed certificate for Philter and never disable certificate validation on clients.

When data is now published to the Kinesis Firehose stream, the data will be processed by the AWS Lambda function and Philter prior to exiting the firehose at its configured destination.
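
The record handling inside the Lambda function can also be separated from the HTTP call so that it is easy to test locally. The sketch below is one possible arrangement rather than the function shown earlier: it wraps the transformed records in a `records` key, which is the response shape Firehose data transformations expect, and it marks a record as `ProcessingFailed` if filtering raises an error. The names `transform_records` and `filter_fn` are hypothetical, and `filter_fn` stands in for whatever code submits text to Philter.

```python
import base64


def transform_records(records, filter_fn):
    """Decode each Firehose record, filter its text, and re-encode the result."""
    output = []
    for record in records:
        try:
            text = base64.b64decode(record['data']).decode('utf-8')
            filtered = filter_fn(text)
            result = 'Ok'
            data = base64.b64encode((filtered + '\n').encode('utf-8')).decode('utf-8')
        except Exception:
            # Report the failure and hand the original data back unchanged.
            result = 'ProcessingFailed'
            data = record['data']
        output.append({
            'recordId': record['recordId'],
            'result': result,
            'data': data,
        })
    # Firehose reads the transformed records from the 'records' key.
    return {'records': output}
```

A handler would then reduce to `return transform_records(event['records'], call_philter)`, where `call_philter` is your own function that posts the text to Philter's API. Note that records marked as failed still carry the unfiltered text, so the location Firehose uses for failed records should be suitable for sensitive information.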

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#processing-data","title":"Processing Data","text":"

We can use the AWS CLI to publish data to our Kinesis Firehose stream called sensitive-text:

aws firehose put-record --delivery-stream-name sensitive-text --record \"He lived in 90210 and his SSN was 123-45-6789.\"\n

Check the destination Amazon S3 bucket and you will have a single object with the following line:

He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.\n

You're now ready to pump data through the firehose.
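
Applications will usually publish records through an SDK rather than the CLI. The following Python sketch assumes the `boto3` library and uses its Firehose `put_record` call with the same `sensitive-text` stream name. The helper names are illustrative only.

```python
def build_put_record_args(stream_name, text):
    """Build the keyword arguments for the Firehose put_record call."""
    return {
        'DeliveryStreamName': stream_name,
        'Record': {'Data': text.encode('utf-8')},
    }


def publish(stream_name, text):
    """Publish one record to the delivery stream and return its record ID."""
    import boto3  # imported lazily; AWS credentials are needed at call time

    firehose = boto3.client('firehose')
    response = firehose.put_record(**build_put_record_args(stream_name, text))
    return response['RecordId']
```

For example, `publish('sensitive-text', 'He lived in 90210 and his SSN was 123-45-6789.')` sends the same sentence used in the CLI example above.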

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#conclusion","title":"Conclusion","text":"

In this blog post we have created an AWS Firehose pipeline that uses an AWS Lambda function to remove sensitive information from the text in the streaming pipeline.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#resources","title":"Resources","text":""},{"location":"solutions/using-philter-with-microsoft-power-automate-flow/","title":"Using Philter with Microsoft Power Automate (Flow)","text":"

Microsoft Power Automate (formerly Microsoft Flow) is an online application to automate tasks using an intuitive online editor. Using the tool you can create automations that are triggered by events, such as the receiving of an email or a new file being stored in OneDrive. In this example solution we will create a trivial automation that uses Philter to filter sensitive information from text.

We will use an HTTP step to make the call to Philter. An upstream action is setting the content of Input that we are putting into the body of the message. The Input is plain text so we add an HTTP Content-Type header with the value of text/plain. In our example, the value of Input will be \"George Washington was president and his SSN was 123-45-6789.\" Be sure to replace the IP address in the URI with the IP address or hostname of your Philter instance.

We are now ready to run our flow. We can do so by clicking the Run button. You can now switch to the Runs view to see the run.

Clicking on our run we can see the results of the HTTP step.

In the screen capture above, we can see a summary of the HTTP step run. We see the body of the message that was sent to Philter. At the bottom we can see the filtered text that was returned by Philter.

Integrating Philter with Microsoft Power Automate is a fairly trivial exercise thanks to Philter's API. Although this example was trivial, it should show the potential possibilities for using Philter with Microsoft Power Automate.

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Philter","text":"

This documentation applies to Philter 2.4.0. If you are upgrading to this version see Upgrading.

Philter is an API-based application that finds and redacts sensitive information, such as protected health information (PHI), personally identifiable information (PII), and user-defined sensitive information, from natural language text. Philter is ideal for use in text processing pipelines where sensitive information needs to be removed, encrypted, or redacted from the text.

"},{"location":"#quick-start","title":"Quick Start","text":"

To get going fast, jump to one of the Quick Starts:

"},{"location":"#open-source","title":"Open Source","text":"

Philter is open source software.

"},{"location":"deidentification/","title":"De-identification Methods","text":"

There are several ways data can be de-identified, and which you use depends on the types of data you want to de-identify and your use-case for de-identifying the data. The terminology around the different methods is often used interchangeably, but there are differences between each method.

In this User's Guide, we may use the terms filter and redact interchangeably.

In Philter, de-identification methods vary for each type of sensitive information. For example, all types can be replaced or redacted, but only dates can be shifted and only zip codes can be truncated. How a de-identification method is applied by Philter is called a filter strategy. Each type of sensitive information can have one or more filter strategies, and the combination of the filter strategies you select is called a policy. A policy determines how a document will be de-identified.

The following is a list of de-identification methods that describes how each method works and its applicability to Philter. Deidentifying a document is likely to require a combination of the following methods. For instance, you may want to redact names, encrypt credit card numbers, and shift appointment dates.

"},{"location":"deidentification/#summary-of-deidentification-methods","title":"Summary of Deidentification Methods","text":"

De-identification Method: Description

Replacement: Replaces sensitive information with a defined value. For example, you might want to replace a credit card number with the literal value \"CREDIT_CARD_NUMBER\".

Redaction and Masking: Removes sensitive information. Philter gives you a choice of how to remove the sensitive information, whether it is by replacing it with ***** (masking) or by some other set of characters.

Encryption: Encrypts sensitive information.

Date Shifting: Shifts dates either forward or backward by some interval.

Bucketing: Categorizes data into buckets based on the data. For example, Philter can bucket dates into years and zip codes by population.

A difference between Philter and other services is that Philter does not send your data to a third party for de-identification. Philter runs in your cloud and your data stays in your cloud.

"},{"location":"deidentification/#deidentification-methods","title":"Deidentification Methods","text":""},{"location":"deidentification/#redaction-and-masking","title":"Redaction and Masking","text":"

Redaction and masking are two methods of de-identification that are often used interchangeably. The term redaction refers to removing a sensitive value from a document. When we hear the term redaction we often think of an image of a document with black bars across pieces of the text.

Masking is similar to redaction but allows for configuring how the sensitive value is removed. The most common example is using asterisks (i.e. ******) in place of a sensitive value.
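
To make the distinction concrete, the short Python sketch below applies both methods to the same span of text. It is an illustration only and not Philter's implementation. The `{{{REDACTED-%t}}}` format and its `%t` type variable follow the redaction format used in Philter's policies, while masking with one asterisk per original character is an assumption made for the example.

```python
def redact(text, start, end, filter_type, redaction_format='{{{REDACTED-%t}}}'):
    """Replace text[start:end] with the redaction format, expanding %t to the type."""
    replacement = redaction_format.replace('%t', filter_type)
    return text[:start] + replacement + text[end:]


def mask(text, start, end, mask_char='*'):
    """Replace text[start:end] with one mask character per original character."""
    return text[:start] + mask_char * (end - start) + text[end:]
```

Given the sentence "His SSN was 123-45-6789." and the span covering the number, `redact` produces a labeled placeholder for the `ssn` type, while `mask` keeps the sentence the same length and hides only the characters.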

"},{"location":"deidentification/#replacement","title":"Replacement","text":"

Replacement is a method of de-identification that simply replaces a sensitive value with another value. Replacement is useful when the sensitive value is not needed once the document has been de-identified. Philter can replace a sensitive value with a preset value or with a random value.

In Philter's filter strategies, replacement is achieved by using the REDACT, STATIC_REPLACE, or RANDOM_REPLACE strategy.

"},{"location":"deidentification/#bucketing","title":"Bucketing","text":""},{"location":"deidentification/#date-shifting","title":"Date Shifting","text":""},{"location":"deidentification/#encryption","title":"Encryption","text":""},{"location":"evaluating-performance/","title":"How to Evaluate Philter's Performance","text":"

A common question we receive is how well Philter performs. Our answer is probably less than satisfactory: it depends. Philter's performance depends heavily on your individual data, so comparing metrics of Philter's performance between different customer datasets is like comparing apples and oranges.

If your data is not exactly like another customer's data then the metrics will not be applicable to your data. In terms of the classic information retrieval metrics precision and recall, comparing these values between customers can give false impressions about Philter's performance, both good and bad.

This guide walks you through how to evaluate Philter's performance. If you are just getting started with Philter, please see the Quick Starts first. Then you can come back here to learn how to evaluate Philter's performance.

"},{"location":"evaluating-performance/#guide-to-evaluating-performance","title":"Guide to Evaluating Performance","text":"

We have created this guide to help you evaluate Philter's performance on your data. The guide involves determining the types of sensitive information you want to redact, configuring those filters, optimizing the configuration, and then capturing the performance metrics.

If you are using Philter we will gladly perform these steps for you and provide you a detailed Philter performance report generated from your data. Please contact us to start the process.

"},{"location":"evaluating-performance/#what-you-need","title":"What You Need","text":"

To evaluate Philter's performance you need:

"},{"location":"evaluating-performance/#configuring-philter","title":"Configuring Philter","text":"

Before we can begin our evaluation we need to create a policy. A policy is a file that defines the types of sensitive information that will be redacted and how it will be redacted. The policies are stored on the Philter instance under /opt/Philter/policies. You can edit the policies directly there using a text editor or you can use Philter's API to upload a policy. In this case we recommend just using a text editor on the Philter instance to create a policy.

When using a text editor to create and edit a policy, be sure to save the policy often. Frequent saving can make editing a policy easier.

We also recommend placing your policy directory under source control so that you have a history and change log of your policies.

"},{"location":"evaluating-performance/#creating-a-policy","title":"Creating a Policy","text":"

Make a copy of the default policy, and we will modify the copy for our needs.

cp /opt/Philter/policies/default.json /opt/Philter/policies/evaluation.json

Now open /opt/Philter/policies/evaluation.json in a text editor. (The content of evaluation.json will be similar to what's shown below but may have minor differences between different versions of Philter.)

{\n   \"name\": \"default\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The first thing we need to do is to set the name of the policy. Replace default with evaluation and save the file.
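
If you prefer to script these two steps, the copy and the rename can be combined. The following Python sketch is one way to do that with the standard library. The function name `copy_policy` is hypothetical, and the paths you pass in are the ones used in the steps above.

```python
import json
from pathlib import Path


def copy_policy(source_path, target_path, new_name):
    """Copy a policy file and set the copy's name property to new_name."""
    policy = json.loads(Path(source_path).read_text(encoding='utf-8'))
    policy['name'] = new_name
    Path(target_path).write_text(json.dumps(policy, indent=3) + '\n', encoding='utf-8')
    return policy
```

For example, `copy_policy('/opt/Philter/policies/default.json', '/opt/Philter/policies/evaluation.json', 'evaluation')` leaves the default policy untouched and writes a renamed copy.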

"},{"location":"evaluating-performance/#identifying-the-filters-you-need","title":"Identifying the Filters You Need","text":"

The rest of the file contains the filters that are enabled in the default policy. We need to make sure that each type of sensitive information that you want to redact is represented by a filter in this file. Look through the rest of the policy and determine which filters are listed that you do not need and also which filters you do need that are not listed.

"},{"location":"evaluating-performance/#disabling-filters-we-do-not-need","title":"Disabling Filters We Do Not Need","text":"

If a filter is listed in the policy and you do not need it, you have two options. You can either delete the filter's lines from the policy and save the file, or you can set the filter's enabled property to false. Using the enabled property allows you to keep the filter configuration in the policy in case it is needed later, but both options have the same effect.
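The second option can be sketched in a few lines of Python. This is a minimal illustration, not part of Philter: the helper name disable_filter is hypothetical, and it assumes the enabled property sits directly on the filter's object next to its filter strategies.

```python
import copy

STRATEGY = {'strategy': 'REDACT', 'redactionFormat': '{{{REDACTED-%t}}}'}


def disable_filter(policy, filter_name):
    # Return a copy of the policy with the named filter switched off.
    # Assumption: 'enabled' is a property on the filter object itself.
    updated = copy.deepcopy(policy)
    identifiers = updated.get('identifiers', {})
    if filter_name not in identifiers:
        raise KeyError('filter not present in policy: ' + filter_name)
    identifiers[filter_name]['enabled'] = False
    return updated


policy = {
    'name': 'evaluation',
    'identifiers': {
        'emailAddress': {'emailAddressFilterStrategies': [dict(STRATEGY)]},
        'phoneNumber': {'phoneNumberFilterStrategies': [dict(STRATEGY)]},
    },
}

# The phone number filter stays in the policy but is marked as disabled.
updated = disable_filter(policy, 'phoneNumber')
```

Because the helper works on a copy, the original policy object is left untouched, which makes it easy to compare the two versions before saving.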

"},{"location":"evaluating-performance/#enabling-filters-not-in-the-default-policy","title":"Enabling Filters Not in the Default Policy","text":"

Let's say you want to redact bitcoin addresses. The bitcoin address filter is not in the default policy. To add the bitcoin address filter we will refer to Philter's documentation on the bitcoin address filter, get the configuration, and copy it into the policy.

From the bitcoin address filter documentation we see the configuration for the bitcoin address filter is:

      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n

We can copy this configuration and paste it into our policy:

{\n   \"name\": \"evaluation\",\n   \"identifiers\": {\n      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The order of the filters in the policy does not matter and has no impact on performance. We typically place the filters in the policy alphabetically just to improve readability.

Repeat these steps until you have added a filter for each of the types of sensitive information you want to redact. Typically, the default redaction strategy and redactionFormat values for each filter should be fine for evaluation.
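If you prefer to script these edits rather than paste by hand, the steps above can be sketched as follows. The helper name add_filter is hypothetical; the filter configuration is the bitcoin address example shown earlier, and the alphabetical sort only mirrors the readability convention mentioned above.

```python
import json


def add_filter(policy, filter_name, filter_config):
    # Return a new policy dict that includes the given filter,
    # with the filters ordered alphabetically for readability.
    identifiers = dict(policy.get('identifiers', {}))
    identifiers[filter_name] = filter_config
    merged = dict(policy)
    merged['identifiers'] = dict(sorted(identifiers.items()))
    return merged


strategy = {'strategy': 'REDACT', 'redactionFormat': '{{{REDACTED-%t}}}'}

policy = {
    'name': 'evaluation',
    'identifiers': {
        'emailAddress': {'emailAddressFilterStrategies': [strategy]},
        'phoneNumber': {'phoneNumberFilterStrategies': [strategy]},
    },
}

bitcoin_filter = {'bitcoinAddressFilterStrategies': [strategy]}
updated = add_filter(policy, 'bitcoinAddress', bitcoin_filter)

# Serialize with the same three-space indentation used in the examples.
policy_json = json.dumps(updated, indent=3)
```

The resulting policy_json string can be written to the policy file in place of the hand-edited version.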

When finished modifying the policy, save the file and close the text editor. Now restart Philter for the policy changes to be loaded:

sudo systemctl restart philter\n
"},{"location":"evaluating-performance/#submitting-text-for-redaction","title":"Submitting Text for Redaction","text":"

With our policy in place we can now send text to Philter for redaction using that policy:

PhilterConfiguration philterConfiguration = ConfigFactory.create(PhilterConfiguration.class);\n\nFilterService filterService = new PhilterFilterService(philterConfiguration);\n\nFilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.TEXT_PLAIN);\n

The explain API endpoint produces a detailed description of the redaction. The response will include a list of spans that contain the start and stop positions of redacted text and the type of sensitive information that was redacted. Using this information we can compare the redacted information to our annotated file to calculate precision and recall metrics.

"},{"location":"evaluating-performance/#calculating-precision-and-recall","title":"Calculating Precision and Recall","text":"

Now we can calculate the precision and recall metrics.
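As a rough sketch of that calculation, the Python below compares the spans returned by Philter with the spans in an annotated file. It assumes each span is reduced to a (characterStart, characterEnd, filterType) tuple and that a redaction only counts as correct when all three values match an annotation exactly; the function names and the sample spans are hypothetical, and a partial-overlap criterion may suit your data better.

```python
def to_tuples(spans):
    # Reduce explain-style span objects to comparable tuples.
    return [(s['characterStart'], s['characterEnd'], s['filterType']) for s in spans]


def precision_recall(predicted, annotated):
    # Precision: share of redacted spans that were annotated.
    # Recall: share of annotated spans that were redacted.
    predicted_set = set(predicted)
    annotated_set = set(annotated)
    true_positives = len(predicted_set & annotated_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(annotated_set) if annotated_set else 0.0
    return precision, recall


# Hypothetical spans taken from a response.
predicted = to_tuples([
    {'characterStart': 0, 'characterEnd': 17, 'filterType': 'NER_ENTITY'},
    {'characterStart': 48, 'characterEnd': 59, 'filterType': 'SSN'},
    {'characterStart': 70, 'characterEnd': 80, 'filterType': 'NER_ENTITY'},
    {'characterStart': 120, 'characterEnd': 130, 'filterType': 'SSN'},
])

# Hypothetical spans from the annotated file.
annotated = [(0, 17, 'NER_ENTITY'), (48, 59, 'SSN'), (100, 111, 'SSN')]

precision, recall = precision_recall(predicted, annotated)
```

Here two of the four redacted spans match an annotation, and two of the three annotated spans were found.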

"},{"location":"monitoring_and_logging/","title":"Monitoring and Logging","text":""},{"location":"monitoring_and_logging/#service-management","title":"Service Management","text":"

Philter installs itself as a system service. The service can be controlled using the commands:

sudo systemctl stop philter\nsudo systemctl start philter\nsudo systemctl restart philter\nsudo systemctl status philter\n

Philter is installed in the /opt/philter directory. This directory contains the Philter binaries, configuration files, and supporting files.

"},{"location":"monitoring_and_logging/#metrics","title":"Metrics","text":"

Philter collects metrics while running to provide insights into its operation and the text being processed. The metrics collected include a count of the documents processed by Philter, counts of the sensitive information identified per type, and the confidence values of entities extracted by non-deterministic natural language processing methods. These metrics can be reported via JMX and to external services (Prometheus, Amazon CloudWatch, and Datadog).

"},{"location":"monitoring_and_logging/#reporting-metrics-to-prometheus","title":"Reporting Metrics to Prometheus","text":"

To enable Philter metric reporting to Prometheus modify Philter's Settings to enable the Prometheus metrics. When enabled, the metrics HTTP endpoint will be http://philter-ip:9100/metrics.

Enable scraping of Philter's metrics in Prometheus' settings:

global:\n  scrape_interval: 10s\n\nscrape_configs:\n- job_name: philter\n  static_configs:\n  - targets: ['10.0.2.104:9100']\n

You may need to make port 9100 accessible to Prometheus. For example, if you launch Philter in AWS you will need to modify Philter's security group to permit inbound network traffic on port 9100 from Prometheus.

"},{"location":"monitoring_and_logging/#reporting-metrics-to-amazon-cloudwatch","title":"Reporting Metrics to Amazon CloudWatch","text":"

To enable Philter metric reporting to Amazon CloudWatch, modify Philter's Settings to set the AWS properties. When enabled, metrics are published to CloudWatch every 60 seconds by default.

The AWS IAM user or role being used should have PutMetricData permissions:

{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Sid\": \"VisualEditor0\",\n            \"Effect\": \"Allow\",\n            \"Action\": [\n                \"cloudwatch:PutMetricData\"\n            ],\n            \"Resource\": \"*\"\n        }\n    ]\n}\n

The metrics will be published to the Amazon CloudWatch namespace provided in Philter's settings. Amazon CloudWatch can then be used to visualize the metrics, set performance alarms, or perform other integrations with AWS services.

"},{"location":"monitoring_and_logging/#reporting-metrics-to-datadog","title":"Reporting Metrics to Datadog","text":"

Metrics will be published to Datadog every 60 seconds when enabled.

Metrics published to Datadog will have a philter prefix.

The metrics can be used to make graphs and dashboards.

"},{"location":"monitoring_and_logging/#reporting-metrics-to-jmx","title":"Reporting Metrics to JMX","text":"

Metrics in JMX can be viewed using VisualVM or a similar tool.

"},{"location":"monitoring_and_logging/#metrics-collected-and-reported","title":"Metrics Collected and Reported","text":"

The listing below shows an example of the metrics Philter collects and writes to standard out while running. The metrics reported to supported services such as JMX, Amazon CloudWatch and Datadog will contain the same metrics but may be represented or visualized differently between the services.

The metrics collected include:

These metrics will be reset when Philter is stopped and restarted.

"},{"location":"monitoring_and_logging/#logging","title":"Logging","text":"

Philter's log file can be viewed using the command journalctl -u philter. This log should be the first place checked for more information on Philter's status.

The log level can be set using the logging.level.root property in Philter's Settings.

Philter's log file may contain sensitive information. It is possible that through the normal use of Philter, sensitive information may be written to the log file.

"},{"location":"pii_phi_nppi/","title":"PII, PHI, and NPPI","text":"

Philter can redact many predefined types of sensitive information through filters. Each type of predefined sensitive information is described below.

"},{"location":"pii_phi_nppi/#predefined-types-of-pii-in-philter","title":"Predefined Types of PII in Philter","text":"

The types of sensitive information that Philter will identify are customizable. For example, if you are not interested in vehicle identification numbers (VINs) you can have Philter ignore them. This configuration is performed through Policies.

Because Philter only operates on text, the biometric identifiers and face images outlined in the HIPAA regulations as PHI are not applicable to Philter. The types of sensitive information and how Philter identifies each one are listed in the table below.

Type of PHI and How Philter Identifies It

  1. Names. Ex: John Smith, Jane Doe
  2. All geographical identifiers smaller than a state, except for the initial three digits of a zip code if, according to the current publicly available data from the U.S. Bureau of the Census: the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000. Ex: 85055, 90213-1544
  3. Dates (other than year) directly related to an individual. Ex: 10-10-2000, 10/10/2000, October 10, 2000
  4. Phone numbers. Ex: (304) 555-5555, 304-555-5555, 1-800-123-4567
  5. Fax numbers. Ex: (304) 555-5555, 304-555-5555, 1-800-123-4567
  6. Email addresses. Ex: john.fake.address@hotmail.com
  7. Social Security numbers. Ex: 123-45-6789, 123456789
  8. Medical record numbers. Ex: 86637729, AB473-6021, 473-6AB021
  9. Health insurance beneficiary numbers. Ex: 86637729, AB473-6021, 473-6AB021
  10. Account numbers. Ex: 86637729, AB473-6021, 473-6AB021
  11. Certificate/license numbers. Ex: 86637729, AB473-6021, 473-6AB021
  12. Vehicle identifiers and serial numbers, including license plate numbers. Ex: WBAPM7G50ANL19218, 1GBJC34K3RE176005
  13. Device identifiers and serial numbers. Ex: H3SNPUHYEE7JD3H, 33778376
  14. Web Uniform Resource Locators (URLs). Ex: myhomepage.com, http://myhomepage.com/folder/page.html, www.myhomepage.com/folder/page.html
  15. Internet Protocol (IP) address numbers. Ex: 127.0.0.1, 192.168.3.58, 2001:0db8:85a3:0000:0000:8a2e:0370:7334
  16. Biometric identifiers, including finger, retinal and voice prints
  17. Full face photographic images and any comparable images
  18. Any other unique identifying number, characteristic, or code except the unique code assigned by the investigator to code the data. Ex: 86637729, AB473-6021, 473-6AB021

"},{"location":"settings/","title":"Settings","text":"

Phileas has settings to control how it operates. The settings and how to configure each are described below.

The configuration for the types of sensitive information that Phileas identifies is defined in filter policies, outside of the configuration properties described on this page.

"},{"location":"settings/#configuring-phileas","title":"Configuring Phileas","text":""},{"location":"settings/#the-phileas-settings-file","title":"The Phileas Settings File","text":"

Phileas looks for its settings in an application.properties file.

"},{"location":"settings/#using-environment-variables","title":"Using Environment Variables","text":"

Properties set via environment variables take precedence over properties set in Phileas' settings file.

All of the following properties can also be set as environment variables by prepending PHILTER_ to the property name, changing periods to underscores, and using uppercase letters. For example, the property filter.policies.directory can be set using the environment variable PHILTER_FILTER_POLICIES_DIRECTORY by:

export PHILTER_FILTER_POLICIES_DIRECTORY=/policies/\n

Using environment variables to configure Phileas instead of using Phileas' settings file can allow for easier configuration management when deploying Phileas.
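The naming rule described above is mechanical, so it can be sketched as a small helper. The function name to_environment_variable is hypothetical; the uppercase conversion is inferred from the example variable name rather than stated as a rule.

```python
def to_environment_variable(property_name, prefix='PHILTER_'):
    # Prepend the prefix, change periods to underscores, and uppercase.
    return prefix + property_name.replace('.', '_').upper()


span_variable = to_environment_variable('span.disambiguation.enabled')
cache_variable = to_environment_variable('cache.redis.host')
```

A helper like this can be useful when generating deployment configuration from an existing properties file.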

"},{"location":"settings/#policies","title":"Policies","text":"Setting Description Allowed Values Default Value filter.policies.directory The directory in which to look for policies. Any valid directory path. ./policies/"},{"location":"settings/#span-disambiguation","title":"Span Disambiguation","text":"

These values configure Phileas' span disambiguation feature to determine the most appropriate type of sensitive information when duplicate spans are identified. In a deployment of multiple Phileas instances, you must enable the cache service for span disambiguation to work as expected.

Setting Description Allowed Values Default Value span.disambiguation.enabled Whether or not to enable span disambiguation. true, false false"},{"location":"settings/#cache-service","title":"Cache Service","text":"

The cache service is required to use consistent anonymization and policies stored in Amazon S3. Phileas supports Redis as the backend cache. When Redis is not used, an in-memory cache is used instead. The in-memory cache is not recommended because all contents will be stored in memory on the local Phileas instance.

The cache will contain sensitive information. It is important that you take the necessary precautions to secure the cache itself and all communication between Phileas and the cache.

Setting Description Allowed Values Default Value cache.redis.enabled Whether or not to use Redis as the cache. true, false false cache.redis.host The hostname or IP address of the Redis cache. Any valid Redis endpoint. None cache.redis.port The Redis cache port. Any valid port. 6379 cache.redis.auth.token The Redis auth token. Any valid token. None cache.redis.ssl Whether or not to use SSL for communication with the Redis cache. true, false false

The following Redis settings are only required when using a self-signed SSL certificate.

Setting Description Allowed Values Default Value cache.redis.truststore The path to the trust store. Any valid file path. None cache.redis.truststore.password The trust store password. Any valid password. None cache.redis.keystore The path to the keystore. Any valid file path. None cache.redis.keystore.password The keystore password. Any valid password. None"},{"location":"settings/#advanced-settings","title":"Advanced Settings","text":"

In most cases the settings below do not need to be changed. Contact us for more information on any of these settings.

Setting Description Allowed Values Default Value ner.timeout.sec Controls the timeout in seconds when performing named entity recognition. Longer text may require longer processing times. An integer value. 600 ner.max.idle.connections The maximum number of idle connections to maintain for the named entity recognition. More connections may improve performance in some cases. An integer value. 30 ner.keep.alive.duration.ms The amount of time in milliseconds to keep named entity recognition connections alive. Longer text may require longer processing times. An integer value. 60"},{"location":"system_requirements/","title":"System Requirements","text":"

When launched from a cloud marketplace, Philter is pre-configured and contains all required dependencies.

Philter requires the following:

"},{"location":"upgrading/","title":"Upgrading Philter","text":"

We recommend reviewing the Philter Release Notes prior to upgrading.

"},{"location":"upgrading/#upgrading-from-a-2x-version","title":"Upgrading from a 2.x Version","text":"

Upgrading Philter to the newest version requires moving Philter's configuration to the new version of Philter. To upgrade Philter from a 2.x version, follow the steps below.

  1. Launch a new instance of the newest version of Philter.
  2. Copy your policies from /opt/philter/policies to the new instance.
  3. Copy your /opt/philter/philter.properties to the new instance.
  4. Copy your /opt/philter/philter-ui.properties to the new instance.
  5. Replace the new virtual machine's properties files with your copies from steps 3 and 4.
  6. If you have configured any SSL certificates for Philter, copy those files over to the new instance.
  7. Restart Philter: sudo systemctl restart philter.service && sudo systemctl restart philter-ui.service && sudo systemctl restart philter-ner.service
  8. Test the new Philter virtual machine to make sure it is behaving as expected.
  9. Decommission the old Philter instance.
"},{"location":"upgrading/#upgrading-from-a-1x-version","title":"Upgrading from a 1.x Version","text":"

Upgrading Philter to the newest version requires moving Philter's configuration to the new version of Philter. To upgrade Philter from a 1.x version, follow the steps below.

  1. Make local copies of your current Philter's properties files:
       /opt/philter/philter.properties (prior to 1.10.1 the filename was /opt/philter/application.properties)
       /opt/philter/philter-ui.properties (not applicable prior to version 1.10)
  2. Launch a new instance of the newest version of Philter.
  3. Replace the new virtual machine's properties files with your copies from step 1.
  4. Restart Philter: sudo systemctl restart philter.service && sudo systemctl restart philter-ui.service && sudo systemctl restart philter-ner.service
  5. Test the new Philter virtual machine to make sure it is behaving as expected.
  6. Decommission the old Philter instance.
"},{"location":"api_and_sdks/api/","title":"API","text":"

Philter's API is divided into three parts: the Filtering API, the Policies API, and the Alerts API.

The Philter SDKs provide convenient methods for using Philter's API methods for various programming languages.

"},{"location":"api_and_sdks/api/#securing-philters-api","title":"Securing Philter's API","text":"

Philter's API supports one-way and two-way SSL/TLS authentication. See the settings for more information.

"},{"location":"api_and_sdks/sdks/","title":"Client SDKs","text":"

Philter SDKs are available for use in your projects. The SDKs are licensed under the Apache License, version 2.0. Refer to the GitHub projects below for your language of choice for usage examples.

"},{"location":"api_and_sdks/api/alerts_api/","title":"Alerts API","text":"

The Alerts API provides endpoints for retrieving and deleting alerts. Alerts can optionally be generated when a filter strategy's condition is met. See Alerts for more information on Philter alerts.

The curl example commands shown on this page are written assuming Philter has been enabled for SSL and it is using a self-signed certificate. If launched from a cloud marketplace, SSL will be enabled automatically with a self-signed SSL certificate. See the SSL/TLS settings for more information.

"},{"location":"api_and_sdks/api/alerts_api/#get-alerts","title":"Get Alerts","text":"Method Endpoint Description GET /api/alerts Get alerts.

Example request:

curl -k https://localhost:8080/api/alerts\n
"},{"location":"api_and_sdks/api/alerts_api/#delete-an-alert","title":"Delete an Alert","text":"Method Endpoint Description DELETE /api/alerts/{alertId} Delete an alert, where alertId is the ID of the alert to delete.

Example request to delete an alert with id 12345:

curl -k -X DELETE https://localhost:8080/api/alerts/12345\n
"},{"location":"api_and_sdks/api/filtering_api/","title":"Filtering API","text":"

Philter\u2019s filtering API provides access to Philter\u2019s ability to filter sensitive information from text and to retrieve the health status of Philter.

The curl example commands shown on this page are written assuming Philter has been enabled for SSL and it is using a self-signed certificate. If launched from a cloud marketplace, SSL will be enabled automatically with a self-signed SSL certificate. See the SSL/TLS settings for more information.

Each filter request can optionally have a context. When not provided, the context defaults to none. Contexts provide a means for logically grouping your documents during filtering. For example, documents pertaining to one health care provider may be submitted under the context hospital1, and documents pertaining to another health care provider may be submitted under the context hospital2.

The context for each filter request impacts how sensitive information is replaced when found in the text. Consistent anonymization can be enabled at either the context or document level. When enabled at the context level, all instances of a given piece of sensitive information will be replaced consistently by the same value. This allows for maintaining meaning across all documents in the context.

Each filter request submitted to Philter is automatically assigned a document identifier. The document identifier is an alphanumeric value unique to that request. No two documents should be assigned the same document identifier. The document identifier is returned in the x-document-id header with each filter or explain API response.

"},{"location":"api_and_sdks/api/filtering_api/#filter","title":"Filter","text":"

The filter endpoint receives plain text or a PDF document and returns the redacted text or redacted PDF document.

The types of sensitive information found and how each type is redacted are determined by the chosen policy.

Method Endpoint Description POST /api/filter Filter the given text."},{"location":"api_and_sdks/api/filtering_api/#query-parameters","title":"Query Parameters","text":""},{"location":"api_and_sdks/api/filtering_api/#headers","title":"Headers","text":"

Example request to filter plain text:

curl -k -X POST \"https://localhost:8080/api/filter\" -d @file.txt -H \"Content-Type: text/plain\"\n
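The same plain-text request can be sketched with the Python standard library. This is an illustration only: the helper names build_filter_request and send are hypothetical, the host and port are taken from the curl example, and certificate verification is disabled to mirror curl's -k flag for a self-signed certificate. Do not disable verification when a trusted certificate is installed.

```python
import ssl
import urllib.request


def build_filter_request(base_url, text):
    # POST the text to the filter endpoint as text/plain.
    return urllib.request.Request(
        url=base_url + '/api/filter',
        data=text.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


def send(request):
    # Equivalent of curl -k: skip verification of a self-signed certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode('utf-8')


request = build_filter_request('https://localhost:8080', 'His SSN was 123-45-6789.')
# Calling send(request) against a running Philter instance returns the redacted text.
```

Building the request separately from sending it keeps the request easy to inspect and to reuse for the explain endpoint.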

Example request to filter a PDF document:

curl -k -X POST \"https://localhost:8080/api/filter\" --data-binary @file.pdf -H \"Content-Type: application/pdf\" -o redacted.zip\n
"},{"location":"api_and_sdks/api/filtering_api/#explain","title":"Explain","text":"

The explain endpoint behaves much like the filter endpoint in that it receives plain text and returns the redacted plain text. However, the explain endpoint also provides a detailed explanation describing how the text was redacted. The explain endpoint does not support PDF documents.

The types of sensitive information found and how each type is redacted are determined by the chosen policy.

Method Endpoint Description POST /api/explain Filter the given text and provide a detailed explanation."},{"location":"api_and_sdks/api/filtering_api/#query-parameters_1","title":"Query Parameters","text":""},{"location":"api_and_sdks/api/filtering_api/#headers_1","title":"Headers","text":"

Example explain request:

curl -k -X POST \"https://localhost:8080/api/explain\" -d @file.txt -H \"Content-Type: text/plain\"\n

Example explain response:

{\n  \"filteredText\": \"{{{REDACTED-entity}}} was a patient and his ssn was {{{REDACTED-ssn}}}.\",\n  \"context\": \"none\",\n  \"documentId\": \"7a906866-4fc9-44d6-9bc3-22728b93a602\",\n  \"explanation\": {\n    \"appliedSpans\": [\n      {\n        \"id\": \"c78fb69c-84d6-4189-b376-63791793cbd2\",\n        \"characterStart\": 0,\n        \"characterEnd\": 17,\n        \"filterType\": \"NER_ENTITY\",\n        \"context\": \"C1\",\n        \"documentId\": \"7a906866-4fc9-44d6-9bc3-22728b93a602\",\n        \"confidence\": 0.9189682900905609,\n        \"text\": \"George Washington\",\n        \"replacement\": \"{{{REDACTED-entity}}}\",\n        \"ignored\": false\n      },\n      {\n        \"id\": \"f4556f62-2f80-4edc-96f0-aa1d44802157\",\n        \"characterStart\": 48,\n        \"characterEnd\": 59,\n        \"filterType\": \"SSN\",\n        \"context\": \"C1\",\n        \"documentId\": \"7a906866-4fc9-44d6-9bc3-22728b93a602\",\n        \"confidence\": 1,\n        \"text\": \"123-45-6789\",\n        \"replacement\": \"{{{REDACTED-ssn}}}\",\n        \"ignored\": false\n      }\n    ],\n    \"ignoredSpans\": []\n  }\n}\n
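To see how the applied spans relate to the filtered text, the sketch below re-applies the two spans from the example response to an input sentence. The input sentence is an assumption reconstructed from the span text and offsets, the helper name apply_spans is hypothetical, and characterEnd is treated as exclusive because that is consistent with the offsets in the example.

```python
def apply_spans(text, spans):
    # Replace each span with its replacement value, working from the end
    # of the text so that earlier offsets remain valid.
    redacted = text
    for span in sorted(spans, key=lambda s: s['characterStart'], reverse=True):
        start, end = span['characterStart'], span['characterEnd']
        redacted = redacted[:start] + span['replacement'] + redacted[end:]
    return redacted


# Assumed input text, reconstructed from the spans in the example response.
original = 'George Washington was a patient and his ssn was 123-45-6789.'

applied_spans = [
    {'characterStart': 0, 'characterEnd': 17, 'replacement': '{{{REDACTED-entity}}}'},
    {'characterStart': 48, 'characterEnd': 59, 'replacement': '{{{REDACTED-ssn}}}'},
]

filtered = apply_spans(original, applied_spans)
```

The same offsets are what you would compare against an annotated file when evaluating redaction quality.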
"},{"location":"api_and_sdks/api/filtering_api/#status","title":"Status","text":"

The status endpoint is useful in determining the current state of Philter. The status endpoint can be used by monitoring software to assess Philter's availability or by your cloud provider for purposes of determining Philter's health when deployed behind a load balancer.

Method Endpoint Description GET /api/status Gets the status of Philter.

Example request:

curl -k \"https://localhost:8080/api/status\"\n
"},{"location":"api_and_sdks/api/policies_api/","title":"Policies API","text":"

The Policies API provides endpoints for retrieving, uploading, and deleting policies.

The curl example commands shown on this page are written assuming Philter has been enabled for SSL and it is using a self-signed certificate. If launched from a cloud marketplace, SSL will be enabled automatically with a self-signed SSL certificate. See the SSL/TLS settings for more information.

"},{"location":"api_and_sdks/api/policies_api/#get-policy-names","title":"Get Policy Names","text":"Method Endpoint Description GET /api/policies Get the names of all policies.

Example request:

curl -k https://localhost:8080/api/policies\n
"},{"location":"api_and_sdks/api/policies_api/#get-a-policy","title":"Get a Policy","text":"Method Endpoint Description GET /api/policies/{policyName} Get the content of a policy, where {policyName} is the name of the policy to get.

Example request:

curl -k https://localhost:8080/api/policies/my-policy\n

Example response:

{\n  \"name\": \"just-phone-numbers\",\n  \"ignored\": [\n  ],\n  \"identifiers\": {\n    \"dictionaries\": [\n    ],\n    \"phoneNumber\": {\n      \"phoneNumberFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"api_and_sdks/api/policies_api/#upload-a-policy","title":"Upload a Policy","text":"Method Endpoint Description PUT /api/policies/{policyName} Upload a policy, where {policyName} is the name of the policy to upload. If a policy with this name already exists it will be overwritten.

Example request:

curl -X PUT -H \"Content-Type: application/json\" -k https://localhost:8080/api/policies/my-policy -d @policy.json\n
"},{"location":"api_and_sdks/api/policies_api/#delete-a-policy","title":"Delete a Policy","text":"Method Endpoint Description DELETE /api/policies/{policyName} Delete a policy, where {policyName} is the name of the policy to delete.

Example request:

curl -X DELETE -k https://localhost:8080/api/policies/exprofile\n
"},{"location":"howtos/apache_proxy/","title":"How to Use an Apache Reverse Proxy with Philter","text":"

Running the Apache web server in front of Philter can have a few benefits. For example, you can use Apache's authentication mechanisms to have greater control over who can access Philter's API, terminate SSL at Apache, and use Apache's logs for access statistics.

When terminating SSL at Apache, make sure that the Apache reverse proxy and Philter are running on the same host so unencrypted traffic is not sent over the network. To install and configure Apache on CentOS, RHEL, and Amazon Linux, follow the steps below. First, install Apache:

sudo yum install httpd\n

Create the Philter configuration by creating a configuration file at /etc/httpd/conf.d/philter.conf:

<VirtualHost *:80>\n\n  ProxyPreserveHost On\n  ServerName philter.mydomain.com\n\n  LogLevel warn\n  ErrorLog logs/philter.mydomain.com-error_log\n  CustomLog logs/philter.mydomain.com-access_log combined\n\n  <Location />\n    ProxyPass http://localhost:8080/\n    ProxyPassReverse http://localhost:8080/\n  </Location>\n\n</VirtualHost>\n

Start Apache:

sudo systemctl start httpd\n

Make sure it started successfully:

sudo systemctl status httpd\n

Set the Apache service to start automatically:

sudo systemctl enable httpd\n

Verify you can access Philter through the reverse proxy:

curl http://philter.mydomain.com/api/status\n
"},{"location":"howtos/evaluate_performance/","title":"How to Evaluate Philter's Performance","text":"

A common question we receive is: how well does Philter perform? Our answer to this question is probably less than satisfactory because it simply depends. What does it depend on? Philter's performance is heavily dependent upon your individual data. Comparing metrics of Philter's performance between different customer datasets is like comparing apples and oranges.

If your data is not exactly like another customer's data then the metrics will not be applicable to your data. In terms of the classic information retrieval metrics precision and recall, comparing these values between customers can give false impressions about Philter's performance, both good and bad.

This guide walks you through how to evaluate Philter's performance. If you are just getting started with Philter please see the Quick Starts instead. Then you can come back here to learn how to evaluate Philter's performance.

"},{"location":"howtos/evaluate_performance/#guide-to-evaluating-performance","title":"Guide to Evaluating Performance","text":"

We have created this guide to help you evaluate Philter's performance on your data. The guide involves determining the types of sensitive information you want to redact, configuring those filters, optimizing the configuration, and then capturing the performance metrics.

We will gladly perform these steps for you and provide you with a detailed Philter performance report generated from your data. Please contact us to start the process.

"},{"location":"howtos/evaluate_performance/#what-you-need","title":"What You Need","text":"

To evaluate Philter's performance you need:

"},{"location":"howtos/evaluate_performance/#configuring-philter","title":"Configuring Philter","text":"

Before we can begin our evaluation we need to create a policy. A policy is a file that defines the types of sensitive information that will be redacted and how it will be redacted. The policies are stored on the Philter instance under /opt/philter/policies. You can edit the policies directly there using a text editor or you can use Philter's API to upload a policy. In this case we recommend just using a text editor on the Philter instance to create a policy.

When using a text editor to create and edit a policy, be sure to save the policy often. Frequent saving can make editing a policy easier.

We also recommend placing your policy directory under source control so that you have a history and change log of your policies.

"},{"location":"howtos/evaluate_performance/#creating-a-policy","title":"Creating a Policy","text":"

Make a copy of the default policy and we will modify the copy for our needs.

cp /opt/philter/policies/default.json /opt/philter/policies/evaluation.json

Now open /opt/philter/policies/evaluation.json in a text editor. (The content of evaluation.json will be similar to what's shown below but may have minor differences between different versions of Philter.)

{\n   \"name\": \"default\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The first thing we need to do is to set the name of the policy. Replace default with evaluation and save the file.

"},{"location":"howtos/evaluate_performance/#identifying-the-filters-you-need","title":"Identifying the Filters You Need","text":"

The rest of the file contains the filters that are enabled in the default policy. We need to make sure that each type of sensitive information that you want to redact is represented by a filter in this file. Look through the rest of the policy and determine which filters are listed that you do not need and also which filters you do need that are not listed.

"},{"location":"howtos/evaluate_performance/#disabling-filters-we-do-not-need","title":"Disabling Filters We Do Not Need","text":"

If a filter is listed in the policy and you do not need the filter you have two options. You can either delete those lines from the policy and save the file, or you can set the filter's enabled property to false. Using the enabled property allows you to keep the filter configuration in the policy in case it is needed later but both options have the same effect.
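
As a rough illustration of the two options, the sketch below works on a policy that has already been loaded into a Python dictionary. It assumes the enabled property sits directly inside the filter's own object (for example inside phoneNumber); that placement and both function names are assumptions made for this example, so check the documentation for the filter you are editing.

```python
import copy


def disable_filter(policy, filter_name):
    # Option 1: keep the filter configuration but switch the filter off.
    # Assumption: 'enabled' belongs directly inside the filter's object.
    updated = copy.deepcopy(policy)
    updated['identifiers'][filter_name]['enabled'] = False
    return updated


def remove_filter(policy, filter_name):
    # Option 2: delete the filter's configuration from the policy entirely.
    updated = copy.deepcopy(policy)
    del updated['identifiers'][filter_name]
    return updated
```

Both functions return a modified copy, so the original dictionary is left untouched until you decide which version to save.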

"},{"location":"howtos/evaluate_performance/#enabling-filters-not-in-the-default-policy","title":"Enabling Filters Not in the Default Policy","text":"

Let's say you want to redact bitcoin addresses. The bitcoin address filter is not in the default policy. To add the bitcoin address filter we will refer to Philter's documentation on the bitcoin address filter, get the configuration, and copy it into the policy.

From the bitcoin address filter documentation we see the configuration for the bitcoin address filter is:

      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n

We can copy this configuration and paste it into our policy:

{\n   \"name\": \"evaluation\",\n   \"identifiers\": {\n      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

The order of the filters in the policy does not matter and has no impact on performance. We typically place the filters in the policy alphabetically just to improve readability.

Repeat these steps until you have added a filter for each of the types of sensitive information you want to redact. Typically, the default redaction strategy and redactionFormat values for each filter should be fine for evaluation.

When finished modifying the policy, save the file and close the text editor. Now restart Philter for the policy changes to be loaded:

sudo systemctl restart philter\n
"},{"location":"howtos/evaluate_performance/#submitting-text-for-redaction","title":"Submitting Text for Redaction","text":"

With our policy in place we can now send text to Philter for redaction using that policy:

curl -k -X POST \"https://localhost:8080/api/filter?p=evaluation\" -d @file.txt -H \"Content-Type: text/plain\"\n

In the command above, we are sending the file file.txt to Philter. The ?p=evaluation tells Philter to apply the evaluation policy that we have been editing. Philter's response to this command will be the redacted contents of file.txt as defined in the policy.
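
If you would rather submit text from a script than from curl, the same request can be sketched with Python's standard library. The endpoint, the p parameter, and the Content-Type header are taken from the command above; the function names and the default base URL are assumptions for this example.

```python
import ssl
import urllib.parse
import urllib.request


def build_filter_request(text, policy, base_url='https://localhost:8080'):
    # Build the POST request that mirrors the curl command above.
    query = urllib.parse.urlencode({'p': policy})
    return urllib.request.Request(
        url=f'{base_url}/api/filter?{query}',
        data=text.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


def send(request):
    # Like curl -k, this skips certificate verification. Only do this for a
    # self-signed certificate on an instance you control.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode('utf-8')
```

Calling send(build_filter_request(text, 'evaluation')) should return the redacted text, assuming Philter is reachable at the base URL.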

"},{"location":"howtos/evaluate_performance/#comparing-documents","title":"Comparing Documents","text":"

With the original document file.txt and the redacted contents returned by Philter, we can now compare those files to begin evaluating Philter's performance. You can diff the text to find the redacted information or use some other method.

A visual comparison provides a quick overview of how Philter is performing on your text but does not give us precision and recall metrics. To calculate these metrics we must compare the redacted document with an annotated file instead of the original file. The annotated file should have the same contents as the original file but with the sensitive information marked.

There are many industry-standard ways to annotate text and many tools to assist with text annotation. We recommend using a tool to help you annotate and compare instead of performing only a visual comparison which does not provide metric values.

Let's resubmit the file to Philter, but this time use the explain API endpoint:

curl -k -X POST \"https://localhost:8080/api/explain?p=evaluation\" -d @file.txt -H \"Content-Type: text/plain\"\n

The explain API endpoint produces a detailed description of the redaction. The response will include a list of spans that contain the start and stop positions of redacted text and the type of sensitive information that was redacted. Using this information we can compare the redacted information to our annotated file to calculate precision and recall metrics.

"},{"location":"howtos/evaluate_performance/#calculating-precision-and-recall","title":"Calculating Precision and Recall","text":"

Now we can calculate the precision and recall metrics.
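
One simple way to do the calculation is sketched below in Python. It assumes the spans from the explain response and the annotations from your annotated file have both been reduced to (start, end, type) tuples, and that a redaction counts as correct only when all three values match exactly. Both the tuple layout and the exact-match rule are assumptions for this example; you may prefer to give credit for partial overlaps.

```python
def precision_recall(predicted, annotated):
    # predicted: spans reported by Philter; annotated: spans marked by hand.
    predicted = set(predicted)
    annotated = set(annotated)
    true_positives = len(predicted & annotated)
    # Precision: the share of redactions that were actually sensitive.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: the share of sensitive items that were redacted.
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall
```

For instance, if Philter reports three spans, two of which match the annotations, and the annotated file marks four items in total, precision is 2/3 and recall is 2/4.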

"},{"location":"howtos/signed_certificate/","title":"How to Use a Signed SSL Certificate with Philter","text":"

When Philter is deployed via the AWS Marketplace, Windows Azure Marketplace or other third-party cloud marketplace, SSL will already be enabled via a self-signed certificate. It is recommended you replace this self-signed certificate with a valid certificate issued to your organization by a trusted authority. The instructions for how to do this are described below.

First, create a private key and a certificate signing request (CSR) for Philter on your domain. In this walkthrough we are using the domain philter.yourdomain.com as an example.

openssl req -new -newkey rsa:2048 -nodes -keyout philter_yourdomain_com.key -out philter_yourdomain_com.csr\n

Submit the CSR to your SSL certificate vendor of choice and complete the SSL certificate ordering process. If prompted for a web server during the process, select Apache or Nginx. Once the process is complete and the certificate is issued you will receive a few files. The files you will need are summarized in the table below. The file names may vary and you may also receive other files as well.

File Name Description Creator philter_yourdomain_com.csr Certificate signing request Created by you philter_yourdomain_com.key Certificate private key Created by you philter_yourdomain_com.ca-bundle Intermediate certificates provided by the issuing authority Received from SSL authority philter_yourdomain_com.crt The SSL certificate for philter.yourdomain.com Received from SSL authority

When prompted for a keystore password we will use changeit. It's recommended you use a more secure password.

The first thing to do is to convert the certificate and the private key to PKCS12 format in philter.p12:

openssl pkcs12 -export -in philter_yourdomain_com.crt -inkey philter_yourdomain_com.key -name philter -out philter.p12\n

Now import the P12 file into a keystore philter.jks:

keytool -importkeystore -deststorepass changeit -destkeystore philter.jks -srckeystore philter.p12 -srcstoretype PKCS12\n

Add the intermediate certificate provided by the issuing authority to the keystore:

keytool -import -alias intermediate -trustcacerts -file philter_yourdomain_com.ca-bundle -keystore philter.jks\n

Update Philter's settings in application.properties:

# SSL certificate settings\nserver.ssl.key-store-type=JKS\nserver.ssl.key-store=/path/to/philter.jks\nserver.ssl.key-store-password=changeit\nserver.ssl.key-alias=philter\n

Restart Philter:

sudo systemctl restart philter\n

Execute an API status request to verify Philter is running as expected. With the -v option we can see the details of the SSL certificate:

curl -v https://philter.yourdomain.com:8080/api/status\n

Look in the response for details of the certificate. Our domain was philter.mtnfog.dev:

* Server certificate:\n*  subject: CN=philter.mtnfog.dev\n*  start date: Apr 21 00:00:00 2020 GMT\n*  expire date: Apr 21 23:59:59 2021 GMT\n*  subjectAltName: host \"philter.mtnfog.dev\" matched cert's \"philter.mtnfog.dev\"\n*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA\n*  SSL certificate verify ok.\n
"},{"location":"other_features/alerts/","title":"Alerts","text":"

Phileas can optionally generate alerts when a particular type of sensitive information is identified.

"},{"location":"other_features/alerts/#alert-conditions","title":"Alert Conditions","text":"

In a policy, each type of sensitive information can have zero or more filter strategies. Each filter strategy can optionally have a condition associated with it. When a condition is present, the filter strategy will only be satisfied when the condition is satisfied. For example, a condition may be created to only filter phone numbers that start with the digits 123 or only filter names that start with John. Filter strategy conditions give you granular control over the filtering process.

When a filter strategy condition is satisfied, Phileas can optionally generate an alert. This feature allows you to be notified when a particular type of sensitive information is identified.

"},{"location":"other_features/alerts/#enabling-alerts","title":"Enabling Alerts","text":"

Alerts are enabled on a per-condition basis. For instance, in the following policy to identify email addresses, a condition has been added to match only the email address test@test.com. Because the alert property is set to true, an alert will be generated when this condition is satisfied. By default, the alert property is set to false, which disables alerts for the condition.

{\n  \"name\": \"email-address-alert\",\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"id\": \"my-email-strategy\",\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n          \"condition\": \"token == \\\"test@test.com\\\"\",\n          \"alert\": true\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"other_features/alerts/#structure-of-an-alert","title":"Structure of an Alert","text":"

An alert contains the following information:

Property Name Description id A unique ID for the alert formatted as a UUID. filterProfile The name of the policy triggering the alert. strategyId The ID of the filter strategy triggering the alert. In the example above the id would be my-email-strategy. context The context. documentId The ID of the document which triggered the alert. filterType The filter type (\"email-address\", \"credit-card\", etc.) triggering the alert. date A timestamp when the alert was generated formatted as yyyy-MM-dd'T'HH:mm:ss.SSS'Z'."},{"location":"other_features/alerts/#retrieving-and-deleting-alerts","title":"Retrieving and Deleting Alerts","text":"

The alerts that Phileas has generated are available through Phileas' alerts API. This API allows for retrieving and deleting alerts. Using this API you can build sophisticated notification systems around Phileas' capabilities.

"},{"location":"other_features/consistent_anonymization/","title":"Consistent Anonymization","text":"

Anonymization in the context of Philter is the process of replacing certain values with random but similar values. For example, the identified name of \u201cJohn Smith\u201d may be replaced with \u201cDavid Jones\u201d, or an identified phone number of 123-555-9358 may be replaced by 842-436-2042. A VIN number will be replaced by a 17 character randomly selected VIN number that adheres to the standard for VIN numbers.

Anonymization is useful in instances where you want to remove sensitive information from text without changing the meaning of the text. Anonymization can be enabled for each type of sensitive information in the policy by setting the filter strategy to RANDOM_REPLACE. (See Policies for more information.)

"},{"location":"other_features/consistent_anonymization/#consistent-anonymization_1","title":"Consistent Anonymization","text":"

Consistent anonymization refers to the process of always anonymizing the same sensitive information with the same replacement values. For example, if the name \"John Smith\" is randomly replaced with \"Pete Baker\", all other occurrences of \"John Smith\" will also be replaced by \"Pete Baker.\"

Consistent anonymization can be done on the document level or on the context level. When enabled on the document level, \"John Smith\" will only be replaced by \"Pete Baker\" in the same document. If \"John Smith\" occurs in a separate document it will be anonymized with a different random name. When enabled on the context level, \"John Smith\" will be replaced by \"Pete Baker\" wherever \"John Smith\" is found in any document in the same context.
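
The difference between the two levels comes down to what the replacement is keyed on. The sketch below is a conceptual illustration in Python, not Philter's implementation: the class name, the in-memory dictionary, and the list of candidate names are all assumptions for this example.

```python
import random


class ConsistentAnonymizer:
    def __init__(self, candidates, scope='document', seed=None):
        if scope not in ('document', 'context'):
            raise ValueError('scope must be document or context')
        self._candidates = list(candidates)
        self._scope = scope
        self._random = random.Random(seed)
        self._cache = {}

    def replace(self, token, context, document_id):
        # Context scope ignores the document, so every document in the
        # context shares one replacement. Document scope keys on both.
        if self._scope == 'context':
            key = (context, token)
        else:
            key = (context, document_id, token)
        if key not in self._cache:
            self._cache[key] = self._random.choice(self._candidates)
        return self._cache[key]
```

With document scope, each new document draws its own random replacement for the same name. With context scope, the first replacement chosen is reused in every later document, which is why the cache has to be shared when more than one instance is running.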

Enabling consistent anonymization on the context level requires a cache to store the sensitive information and the corresponding replacement values. If a single instance of Philter is running, its internal cache service (enabled by default) is the best choice and no additional configuration is required.

If multiple instances of Philter are deployed together, Philter requires access to a Redis cache service. See Philter's Settings for how to configure the cache.

When Philter is deployed in a cluster, a Redis cache is required to enable consistent anonymization.

The anonymization cache will contain PHI. It is important that you take the necessary precautions to secure the cache and all communication to and from the cache.

"},{"location":"other_features/dashboard/","title":"Dashboard","text":"

Philter includes a user interface dashboard that can be accessed at https://<Philter>:9000.

The Philter UI dashboard is intended only for configuration testing. Use Philter's API for document redaction.

The dashboard provides the ability to test Philter's configuration and manage policies. Text and PDF documents can be submitted through the dashboard to analyze the redacted text and modify your filter policies.

"},{"location":"other_features/span_disambiguation/","title":"Span Disambiguation","text":"

Span disambiguation is an optional feature in Philter that is disabled by default. Refer to Philter's Settings to enable and configure span disambiguation.

In Philter, a span is a piece of the input text that Philter has identified as sensitive information. A span has start and end positions, a confidence, a type, and other attributes. Ideally, each piece of identified sensitive information will only have a single span associated with it. In this case, the type of sensitive information is unambiguous. The goal of span disambiguation is to provide more accurate filtering by removing the potential ambiguities in the types of sensitive information for duplicate spans.

However, sometimes a piece of text can be identified by multiple spans, each having a different type of sensitive information. As a hypothetical example, given the input text My SSN is 123456789., Philter identifies 123456789 as both an SSN and a phone number. This scenario can be quite common, and its likelihood increases as the number of enabled filters in a policy increases.

"},{"location":"other_features/span_disambiguation/#how-philter-span-disambiguation-works","title":"How Philter' Span Disambiguation Works","text":"

When we read the sentence My SSN is 123456789. we can tell the span in question should be identified as an SSN because we can look at the text surrounding the span. We use the surrounding words to deduce the correct type of sensitive information for 123456789.

That is exactly how Philter's span disambiguation works. When presented with identical spans differing only by the type of sensitive information, Philter looks at the text surrounding the span in question, in combination with the previous spans it has seen in the same context, to determine which type of sensitive information is most likely to be correct. Philter then removes the ambiguous spans from the results and replaces them with a single span.

"},{"location":"other_features/span_disambiguation/#improves-over-time","title":"Improves Over Time","text":"

Because Philter is able to consider previously seen text to make its decision concerning ambiguous spans, Philter's span disambiguation gets \"smarter\" as more text is filtered. This is because Philter will have more text to consider in its calculations.

"},{"location":"other_features/span_disambiguation/#more-details","title":"More Details","text":""},{"location":"other_features/span_disambiguation/#span-disambiguation-and-confidence-values","title":"Span Disambiguation and Confidence Values","text":"

Span disambiguation is only invoked for spans that differ only by the type of sensitive information. This means the span's location (start and end positions), confidence, and all other values must match. If two spans have identical locations but have different confidence values, span disambiguation will not be applied and the span having the highest confidence will be used.
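
The selection rule described above can be sketched as follows. This is an illustration in Python rather than Philter's code: the Span type, the resolve function, and the disambiguate callback are hypothetical names, and sending only the spans tied for the highest confidence to the callback is an assumption about how ties among three or more spans would be handled.

```python
from collections import defaultdict
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    type: str
    confidence: float


def resolve(spans, disambiguate):
    # Group spans that cover exactly the same piece of text.
    by_location = defaultdict(list)
    for span in spans:
        by_location[(span.start, span.end)].append(span)

    resolved = []
    for location in sorted(by_location):
        group = by_location[location]
        top = max(span.confidence for span in group)
        best = [span for span in group if span.confidence == top]
        if len(best) == 1:
            # Confidences differ: the highest wins, no disambiguation.
            resolved.append(best[0])
        else:
            # Spans differ only by type: ask the disambiguator to choose.
            resolved.append(disambiguate(best))
    return resolved
```

The disambiguate callback stands in for the context-based decision described earlier and is only called when confidence alone cannot settle the question.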

"},{"location":"other_features/span_disambiguation/#cache-service","title":"Cache Service","text":"

When multiple instances of Philter are deployed alongside each other behind a load balancer, Philter's cache service should be configured and enabled. Philter will store the information needed to disambiguate spans in the cache so that the information is available to each instance of Philter. If only a single instance of Philter is running, the cache service is not required; however, the information needed to disambiguate spans will then be stored in memory and will be lost when Philter is stopped or restarted. Because of this, we recommend always using the cache service unless there is a specific reason not to.

"},{"location":"other_features/span_disambiguation/#fine-tuning-the-span-disambiguation","title":"Fine-Tuning the Span Disambiguation","text":"

There are properties available to fine-tune how the span disambiguation operates. These properties are not documented because improper use of the properties could have a negative impact on performance. We will be glad to walk through these properties upon request.

"},{"location":"policies/document_analysis/","title":"Document Analysis","text":"

Philter analyzes received documents prior to redacting the document. This analysis is done to help Philter get a better understanding of the document. The results of the analysis are used to exclude certain document types from redaction and to improve Philter's redaction performance.

While not recommended, the automatic document analysis can be disabled in a policy. By default, document analysis is enabled.

Disabling document analysis will cause any policy features dependent on the results of the document analysis to not function.

An example policy with disabled document analysis is shown below.

{\n  \"name\": \"email-and-phone-numbers\",\n  \"config\": {\n    \"analysis\": {\n      \"enabled\": false\n    }\n  },\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/excluding_by_document_type/","title":"Excluding by Document Type","text":"

Philter can automatically detect certain types of documents and exclude those documents from redaction of certain sensitive information. For example, you may want to redact SSN/TINs in all but one type of document.

To exclude a document type from a specific filter, set the excludeDocumentTypes value to a list of document types to exclude for a filter strategy. Filter strategies for all filter types support the excludeDocumentTypes property.

An example to exclude email addresses from being redacted in a subpoena document is given below:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n               \"excludeDocumentTypes\": [\"SUBPOENA\"]\n            }\n         ]\n      }\n   }\n}\n

In this example, email addresses are redacted in all document types except documents Philter identifies as being subpoena documents.

"},{"location":"policies/excluding_by_document_type/#document-types-supported-by-automatic-detection","title":"Document Types Supported by Automatic Detection","text":"

Philter currently supports automatically detecting the following document types.

Document Type Document Description Subpoena Form 2540 Federal Bankruptcy - SUBPOENA FOR RULE 2004 EXAMINATION Subpoena Form 2550 - Federal Bankruptcy - SUBPOENA TO APPEAR AND TESTIFY Subpoena Form 2560 - Federal Bankruptcy - SUBPOENA TO TESTIFY AT A DEPOSITION Subpoena Form 2570 - Federal Bankruptcy - SUBPOENA TO PRODUCE DOCUMENTS Subpoena AO 88 - SUBPOENA TO APPEAR AND TESTIFY AT A HEARING OR TRIAL IN A CIVIL ACTION Subpoena AO 88A - SUBPOENA TO TESTIFY AT A DEPOSITION IN A CIVIL ACTION Subpoena AO 88B - SUBPOENA TO PRODUCE DOCUMENTS, INFORMATION, OR OBJECTS Subpoena AO 89 - SUBPOENA TO TESTIFY AT A HEARING OR TRIAL IN A CRIMINAL CASE Subpoena AO 90 - SUBPOENA TO TESTIFY AT A DEPOSITION IN A CRIMINAL CASE Subpoena AO 110 - SUBPOENA TO TESTIFY BEFORE A GRAND JURY"},{"location":"policies/filter_policies/","title":"Filter Policies","text":"

The types of sensitive information identified by Phileas and how that information is de-identified are controlled through policies. A policy is a file stored under Phileas\u2019s policies directory, which by default is located at /opt/Phileas/policies/. You can have an unlimited number of policies.

Each policy has a name that is used by Phileas to apply the appropriate de-identification methods. The name is passed to Phileas\u2019s API along with the text to be filtered when submitting text to Phileas. This provides flexibility and allows you to de-identify different types of documents in differing manners with a single instance of Phileas. For example, you may have a policy for bankruptcy documents and a separate policy for financial documents.

There are sample policies available for immediate use or customization to fit your use-cases.

"},{"location":"policies/filter_policies/#the-structure-of-a-policy","title":"The Structure of a Policy","text":"

A policy:

"},{"location":"policies/filter_policies/#an-example-policy","title":"An Example Policy","text":"

The following is an example policy. In the example below you can see the types of sensitive information that are enabled and the strategy for manipulating each type when found. This policy identifies email addresses and phone numbers and redacts each with the format given.

{\n   \"name\": \"email-and-phone-numbers\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      },\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

When an email address is identified by this policy, the email address is replaced with the text {{{REDACTED-email-address}}}. The %t gets replaced by the type of the filter. Likewise, when a phone number is found it is replaced with the text {{{REDACTED-phone-number}}}. You are free to change the redaction formats to whatever fits your use-case. See Filter Strategies for all replacement options.

The name of the policy is email-and-phone-numbers. Policies can be named anything you like but their names must be unique from all other policies. As a best practice, the policy should be saved as [name].json, e.g. email-and-phone-numbers.json.

"},{"location":"policies/filter_policies/#applying-a-policy-to-text","title":"Applying a Policy to Text","text":"

To use this policy we will save it as /opt/Phileas/policies/email-and-phone-numbers.json. We must restart Phileas for the new policy to be available for use. To apply the policy we will pass the policy's name to Phileas when making a filter request, as shown in the example request below.

curl -k -X POST \"https://localhost:8080/api/filter?c=context&p=email-and-phone-numbers\" \\\n  -d @file.txt -H \"Content-Type: text/plain\"\n

In this command, we have provided the parameter p along with a value that is the name of the policy we want to use for this request. If we had multiple policies in Phileas we could choose a different policy for this request simply by changing the name given to the parameter p. For more details see Phileas\u2019s API.

Phileas will process the contents of file.txt by applying the policy named email-and-phone-numbers. As we saw in the policy above, this policy redacts email addresses and phone numbers. Phileas will return the redacted text in response to the API call.

To manipulate the sensitive information by methods other than redaction, see the Filter Strategies.

"},{"location":"policies/filter_strategies/","title":"Filter Strategies","text":"

A filter strategy defines how sensitive information identified by Philter should be manipulated, whether it is redacted, replaced, encrypted, or manipulated in some other fashion.

In a policy, you list the types of sensitive information that should be filtered. How Philter replaces each type of sensitive information is specific to each type. For instance, zip codes can be truncated based on the leading digits or zip code population while phone numbers are redacted. These replacements are performed by \"filter strategies.\"

Each filter can have multiple filter strategies, and conditions can be used to determine when each filter strategy applies.

A sample policy containing a filter strategy is shown below. In this example, email addresses will be redacted.

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n

Most of the filter strategies apply to all types of data; however, some filter strategies only apply to a few types. For example, the TRUNCATE filter strategy only applies to the zip code filter.

"},{"location":"policies/filter_strategies/#filter-strategies_1","title":"Filter Strategies","text":"

The filter strategies are described below. Each filter type can specify zero or more filter strategies. When no filter strategies are given, Philter will default to REDACT for that filter type. When multiple filter strategies are given for a single filter type, the filter strategies will be applied in order as they are listed in the policy, top to bottom.

"},{"location":"policies/filter_strategies/#the-redact-filter-strategy","title":"The REDACT Filter Strategy","text":"

The REDACT filter strategy replaces sensitive information with a given redaction format. You can put variables in the redaction format that Philter will replace when performing the redaction.

The available redaction variables are:

Redaction Variable Description %t Will be replaced with the type of sensitive information. This is to allow you to know the type of sensitive information that was identified and redacted. %l Will be replaced by the given classification for the type of sensitive information. %v Will be replaced by the original value of the sensitive text. With %v you can annotate sensitive information instead of masking or removing it.

To redact sensitive information by replacing it with the type of sensitive information, the redaction format would be REDACTED-%t.
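
To make the substitution concrete, here is a small Python sketch of how a redaction format with these variables could be expanded. It only illustrates the behavior described in the table; the function name and its parameters are assumptions, and this is not Philter's code.

```python
import re


def apply_redaction_format(redaction_format, filter_type, classification, value):
    # Map each redaction variable to the text that should replace it.
    values = {'%t': filter_type, '%l': classification, '%v': value}
    # A single pass keeps text inserted for one variable from being
    # treated as another variable.
    return re.sub(r'%[tlv]', lambda match: values[match.group(0)], redaction_format)
```

For example, expanding the format {{{REDACTED-%t}}} for an email address yields {{{REDACTED-email-address}}}, while a format such as [%t: %v] keeps the original value and annotates it with its type.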

An example filter using the REDACT filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-crypto_replace-filter-strategy","title":"The CRYPTO_REPLACE Filter Strategy","text":"

The CRYPTO_REPLACE filter strategy replaces each identified piece of sensitive information by encrypting it using the AES encryption algorithm. To use this filter strategy, the policy must include the details of the encryption key as shown below:

{\n   \"name\":\"sample-profile\",\n   \"crypto\": {\n     \"key\": \"....\",\n     \"iv\": \"....\"\n   },\n   ...\n

In the snippet of a policy shown above, a crypto element is defined with a key and an initialization vector (iv). These two items are required to encrypt the sensitive information. To generate a key, run the following command:

openssl enc -e -aes-256-cbc -a -salt -P\n

You will be prompted to enter an encryption password. Once entered, the values of the key and iv will be shown. Copy and paste those values into the policy.

An example policy using the CRYPTO_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"crypto\": {\n     \"key\": \"....\",\n     \"iv\": \"....\"\n   },\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"CRYPTO_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-hash_sha256_replace-filter-strategy","title":"The HASH_SHA256_REPLACE Filter Strategy","text":"

The HASH_SHA256_REPLACE filter strategy replaces sensitive information with the SHA256 hash value of the sensitive information. To append a random salt value to each value prior to hashing, set the salt property to true. The salt value used will be returned in the explain response from Philter's API.
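
The idea can be sketched with Python's hashlib. The UTF-8 encoding, the hexadecimal output, and the length and format of the salt are assumptions made for this example; Philter's exact output format may differ.

```python
import hashlib
import secrets


def hash_sha256_replace(token, salt=False):
    # When salting is requested, generate a random salt and append it to
    # the value before hashing, as described above.
    salt_value = secrets.token_hex(8) if salt else ''
    digest = hashlib.sha256((token + salt_value).encode('utf-8')).hexdigest()
    # The salt is returned so the hash can be reproduced later.
    return digest, salt_value
```

Without a salt, the same input always produces the same hash, which lets you correlate values across documents. With a salt, each replacement is different, so the salt has to be recorded if the hash must be verified later.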

An example policy using the HASH_SHA256_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"HASH_SHA256_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-fpe_encrypt_replace-filter-strategy","title":"The FPE_ENCRYPT_REPLACE Filter Strategy","text":"

The FPE_ENCRYPT_REPLACE filter strategy uses format-preserving encryption (FPE) to encrypt the sensitive information. Philter uses the FF3-1 algorithm for format-preserving encryption. The FPE_ENCRYPT_REPLACE filter strategy requires a key and a tweak value. These values control the format-preserving encryption. For more information on these values and format-preserving encryption, refer to the resources below:

An example policy using the FPE_ENCRYPT_REPLACE filter strategy:

{\n   \"name\": \"credit-cards\",\n   \"identifiers\": {\n      \"creditCardNumbers\": {\n         \"creditCardNumbersFilterStrategies\": [\n            {\n               \"strategy\": \"FPE_ENCRYPT_REPLACE\",\n               \"key\": \"...\",\n               \"tweak\": \"...\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-random_replace-filter-strategy","title":"The RANDOM_REPLACE Filter Strategy","text":"

The RANDOM_REPLACE filter strategy replaces the identified text with a fake value of the same type. For example, an SSN will be replaced by random text having the format ###-##-####, such as 123-45-6789. An email address will be replaced with a randomly generated email address. This strategy is available to all filter types.

An example policy using the RANDOM_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"RANDOM_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-static_replace-filter-strategy","title":"The STATIC_REPLACE Filter Strategy","text":"

Replaces the identified text with a given static value. Available to all filter types.

An example policy using the STATIC_REPLACE filter strategy:

{\n   \"name\": \"email-address\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"STATIC_REPLACE\",\n               \"staticReplacement\": \"some new value\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filter_strategies/#the-truncate-filter-strategy","title":"The TRUNCATE Filter Strategy","text":"

Available only to zip codes, this strategy allows for truncating zip codes to only a select number of digits. Specify truncateDigits to set the desired number of leading digits to leave. For example, if truncateDigits is 2, the zip code 90210 will be truncated to 90***.

An example policy using the TRUNCATE filter strategy to keep the first three digits of each zip code:

{\n   \"name\": \"zip-codes\",\n   \"identifiers\": {\n      \"zipCode\": {\n         \"zipCodeFilterStrategies\": [\n            {\n               \"strategy\": \"TRUNCATE\",\n               \"truncateDigits\": 3\n            }\n         ]\n      }\n   }\n}\n
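
The truncation described above can be sketched in a few lines of Python. The function name truncate_zip is hypothetical and the sketch assumes a plain five-digit zip code; it is meant only to make the truncateDigits behavior concrete, not to mirror Philter's code.

```python
def truncate_zip(zip_code, truncate_digits):
    # Keep the leading digits and mask every remaining character with '*'.
    kept = zip_code[:truncate_digits]
    return kept + '*' * (len(zip_code) - len(kept))

print(truncate_zip('90210', 2))  # 90***
```
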
"},{"location":"policies/filter_strategies/#the-zero_leading-filter-strategy","title":"The ZERO_LEADING Filter Strategy","text":"

Available only to zip codes, this strategy changes the first 3 digits of a zip code to be 0. For example, the zip code 90210 will be changed to 00010.

An example policy using the ZERO_LEADING filter strategy:

{\n   \"name\": \"zip-codes\",\n   \"identifiers\": {\n      \"zipCode\": {\n         \"zipCodeFilterStrategies\": [\n            {\n               \"strategy\": \"ZERO_LEADING\"\n            }\n         ]\n      }\n   }\n}\n
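
A minimal Python sketch of the ZERO_LEADING behavior, assuming a plain five-digit zip code. The function name zero_leading is hypothetical and is not part of Philter.

```python
def zero_leading(zip_code):
    # Replace the first three characters with zeros and keep the rest.
    return '000' + zip_code[3:]

print(zero_leading('90210'))  # 00010
```
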
"},{"location":"policies/filter_strategies/#filter-strategy-conditions","title":"Filter Strategy Conditions","text":"

A replacement strategy can be applied based on the sensitive information meeting one or more conditions. For example, you can create a condition such that only dates of 11/05/2010 are replaced by using the condition token == \"11/05/2010\". The conditions that can be applied vary based on the type of sensitive information. For instance, zip codes can have conditions based on their population. Refer to each specific filter type for the conditions available.

The following is an example policy for credit cards that contains a condition to only redact credit card numbers that start with the digits 3000:

{\n  \"name\": \"default\",\n  \"identifiers\": {\n    \"creditCard\": {\n      \"creditCardFilterStrategies\": [\n        {\n          \"condition\": \"token startswith \\\"3000\\\"\",\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/filter_strategies/#combining-conditions","title":"Combining Conditions","text":"

Conditions can be joined with the and keyword. When conditions are joined, every condition must be satisfied for the identified text to be filtered. If any condition is not satisfied, the identified text will not be filtered. Below is an example joined condition:

token != \"123-45-6789\" and context == \"my-context\"\n

This condition requires that the identified text (the token) not be equal to 123-45-6789 and the context be equal to my-context. Both of these conditions must be satisfied for the identified text to be filtered.

Conversely, conditions can be OR'd through the use of multiple filter strategies. For example, if we want to OR a condition on the token and a condition on the context, we would use two filter strategies:

\"ssnFilterStrategies\": [\n  {\n    \"condition\": \"token != \\\"123-45-6789\\\"\",\n    \"strategy\": \"REDACT\",\n    \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n  },\n  {\n    \"condition\": \"context == \\\"my-context\\\"\",\n    \"strategy\": \"REDACT\",\n    \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n  }        \n]\n
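
The difference between joined conditions and multiple filter strategies can be sketched in Python. This is only an illustration: should_filter and the lambda-based conditions are hypothetical stand-ins for the condition expressions, and the sketch assumes that identified text is filtered as soon as one strategy has all of its conditions satisfied.

```python
def should_filter(strategies, token, context):
    # Strategies are checked in order; any one of them may match (OR).
    for strategy in strategies:
        # Every condition within a single strategy must hold (AND).
        if all(cond(token, context) for cond in strategy['conditions']):
            return True
    return False

# One strategy with two joined conditions.
joined = [{'conditions': [lambda t, c: t != '123-45-6789',
                          lambda t, c: c == 'my-context']}]

# Two strategies with one condition each behave like an OR.
either = [{'conditions': [lambda t, c: t != '123-45-6789']},
          {'conditions': [lambda t, c: c == 'my-context']}]

print(should_filter(joined, '123-45-6789', 'my-context'))  # False
print(should_filter(either, '123-45-6789', 'my-context'))  # True
```
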
"},{"location":"policies/filters/","title":"Filters","text":"

A \"filter\" corresponds to a type of sensitive information. Phileas has filters for sensitive information such as names, addresses, ages, and many others.

There are predefined filters that are ready to use, as well as custom filters that let you define your own rules to identify sensitive information outside of what the predefined filters can identify. An example of a custom filter is a filter to identify your patient account numbers, where the structure of an account number is specific to your organization.

Each filter is capable of identifying and redacting a specific type of sensitive information. For example, there is a filter for phone numbers, a filter for US social security numbers, and a filter for person's names. You can enable any combination of these filters based on the types of sensitive information you need to redact.

This section of the documentation describes the filters available in Phileas. The configuration options for each filter can vary due to the type of the sensitive information. For instance, only the zip code filter has a configuration to truncate the zip code.

A selection of filters and their configurations is called a policy. A policy describes how to de-identify a document.

"},{"location":"policies/filters/#predefined-filters","title":"Predefined Filters","text":""},{"location":"policies/filters/#persons-names","title":"Person's Names","text":"

Phileas uses several methods to identify person's names.

Type Description First Names Identifies common first names Surnames Identifies common surnames Person's Names (NER) Identifies full names using natural language processing analysis Physician's Names (NER) Identifies physician names using natural language processing analysis"},{"location":"policies/filters/#other-filters","title":"Other Filters","text":"Type Description Ages Identifies ages such as 3.5 years old Bank Routing Numbers Identifies bank routing numbers Bitcoin Addresses Identifies Bitcoin addresses such as 127NVqnjf8gB9BFAW2dnQeM6wqmy1gbGtv Cities Identifies common cities Counties Identifies common counties Credit Card Numbers Identifies VISA, American Express, MasterCard, and Discover credit card numbers Dates Identifies dates in many formats such as May 22, 1999 Driver's License Numbers Identifies driver's license numbers for all 50 US states Email Addresses Identifies email addresses Hospitals Identifies common hospital names Hospital Abbreviations Identifies common hospitals by their name abbreviations IBAN Codes Identifies international bank account numbers IP Addresses Identifies IPv4 and IPv6 addresses MAC Addresses Identifies network MAC addresses Passport Numbers Identifies US passport numbers Phone Numbers Identifies phone numbers Phone Number Extensions Identifies phone number extensions Sections Identifies sections in text denoted by SSNs and TINs Identifies US SSNs and TINs States Identifies US state names State Abbreviations Identifies US state names by their abbreviations Tracking Numbers Identifies UPS, FedEx, and USPS tracking numbers URLs Identifies URLs VINs Identifies vehicle identification numbers Zip Codes Identifies US zip codes"},{"location":"policies/filters/#custom-filter-types-of-sensitive-information","title":"Custom Filter Types of Sensitive Information","text":"

In addition to the predefined types of sensitive information listed in the table above, you can also define your own types of sensitive information. Through custom identifiers and dictionaries, Phileas can identify many other types of information that may be sensitive in your use-case. For example, if you have patient identifiers that follow a pattern of AA-00000 you can define a custom identifier for this sensitive information.

Phileas can be configured to identify sensitive information based on custom dictionaries. When a term in the dictionary is found in the text, Phileas will treat the term as sensitive information and apply the given filter strategy.

Custom dictionaries support fuzziness to accommodate misspellings. The replacement strategy for a custom dictionary has a sensitivityLevel that controls the amount of allowed fuzziness.

Type Description Custom Dictionaries Identifies sensitive information based on dictionary values. Custom Identifiers Identifies custom alphanumeric identifiers that may be used for medical record numbers, patient identifiers, account number, or other specific identifier."},{"location":"policies/ignoring_specific_information/","title":"Ignoring Specific Information","text":"

Phileas can optionally ignore a list of terms and prevent those terms from being redacted. For example, if the name John Smith is being redacted and you do not want it to be redacted, you can add John Smith to an ignore list. Each time Phileas identifies sensitive information it will check the ignore lists to see if the sensitive information is to be ignored.

Phileas can ignore terms and patterns per-policy, meaning each policy can have its own unique list of terms or patterns to ignore.

"},{"location":"policies/ignoring_specific_information/#ignore-lists","title":"Ignore Lists","text":"

Ignore lists can be specified at the policy level and/or for each filter in the policy. When set for the policy, the list of ignored terms will be applied to all filter types. When set for a filter, the list of ignored terms will be applied only to that filter.

"},{"location":"policies/ignoring_specific_information/#ignore-list-for-a-policy","title":"Ignore List for a Policy","text":"

In the policy shown below, an ignore list is set at the level of the policy. The terms specified in the list will be ignored for all filter types enabled in the policy. Only the terms property is required. The name and caseSensitive properties are optional.

{\n   \"name\": \"example-policy\",\n   \"ignored\": [\n     {\n       \"name\": \"names to ignore\",\n       \"terms\": [\"john smith\", \"jane doe\"],\n       \"caseSensitive\": false\n     }\n   ],\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
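
How the caseSensitive property affects matching can be sketched in Python. The function is_ignored is hypothetical, and the sketch assumes an ignored term must equal the whole identified text; Phileas may match differently.

```python
def is_ignored(text, terms, case_sensitive=False):
    # Compare the identified text against each ignored term.
    if case_sensitive:
        return text in terms
    return text.lower() in [term.lower() for term in terms]

terms = ['john smith', 'jane doe']
print(is_ignored('John Smith', terms, case_sensitive=False))  # True
print(is_ignored('John Smith', terms, case_sensitive=True))   # False
```
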

Terms to be ignored at the policy level can also be read from one or more files located on the local file system. The file must be formatted as one term per line.

{\n   \"name\": \"example-policy\",\n   \"ignored\": [\n     {\n       \"name\": \"names to ignore\",\n       \"terms\": [\"john smith\", \"jane doe\"],\n       \"files\": [\"/tmp/names.txt\"],\n       \"caseSensitive\": false\n     }\n   ],\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/ignoring_specific_information/#ignore-list-for-a-filter","title":"Ignore List for a Filter","text":"

In the policy shown below, an ignore list is set at the level of a filter. The terms specified in the list will be ignored only for that filter type. Each filter in a policy can have its own list of ignored terms. Matching against these terms is case-sensitive: \"John\" will be ignored if \"John\" is in the list, but not if only \"john\" is in the list.

{\n   \"name\": \"example-filter-profile\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"ignored\": [\"john smith\", \"jane doe\"],\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/ignoring_specific_information/#ignoring-patterns","title":"Ignoring Patterns","text":"

Phileas can ignore information based on a regular expression pattern. An example use of this feature is to ignore terms that are present in your text but are dynamic, such as logged timestamps. When using the date filter these timestamps may be identified as being sensitive but you do not want them redacted. With an ignore pattern we can ignore the logged timestamps.

"},{"location":"policies/ignoring_specific_information/#ignore-patterns","title":"Ignore Patterns","text":"

Ignore patterns can be specified at the policy level and/or at the level of each type of filter. When set at the policy level, the list of ignored patterns will be applied to all filter types. When set for an individual filter, the list of ignored patterns will be applied only to that filter.

"},{"location":"policies/ignoring_specific_information/#ignore-patterns-for-a-policy","title":"Ignore Patterns for a Policy","text":"

In the policy shown below, ignore patterns are set at the level of the policy. The patterns specified in the list will be ignored for all filter types enabled in the policy.

{\n   \"name\": \"example-policy\",\n   \"ignoredPatterns\": [\n     {\n       \"name\": \"ignore-room-numbers\",\n       \"pattern\": \"Room [A-Z0-4]{4}\"\n     }\n   ],\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
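
To see which text the example pattern would ignore, the Python sketch below tests a few strings against it. The function matches_ignored_pattern is hypothetical, and the sketch assumes a pattern must match the whole identified text. It uses Python's re module, whose regular expression syntax may differ in places from the syntax Phileas accepts.

```python
import re

def matches_ignored_pattern(text, patterns):
    # Treat the text as ignored when any pattern matches it in full.
    return any(re.fullmatch(p, text) for p in patterns)

patterns = ['Room [A-Z0-4]{4}']
print(matches_ignored_pattern('Room A123', patterns))  # True
print(matches_ignored_pattern('Room 5678', patterns))  # False
```
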
"},{"location":"policies/ignoring_specific_information/#ignore-patterns-for-a-filter","title":"Ignore Patterns for a Filter","text":"

In the policy shown below, ignore patterns are set at the level of a filter. The patterns specified in the list will be ignored only for that filter type. Each filter in a policy can have its own list of ignored patterns.

{\n   \"name\": \"example-policy\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"ignoredPatterns\": [\n           {\n             \"name\": \"ignore-room-numbers\",\n             \"pattern\": \"Room [A-Z0-4]{4}\"\n           }\n         ],\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/sample_policies/","title":"Sample Policies","text":"

This page lists some sample policies. You can use these policies either as-is or as starting points for customizing them to meet your specific de-identification needs.

These policies are examples and not an exhaustive list of all the sensitive information Phileas can identify. Items from each of these policies can be combined to make policies to meet your use-cases.

"},{"location":"policies/sample_policies/#email-addresses-and-phone-numbers","title":"Email Addresses and Phone Numbers","text":"

This policy finds email addresses and phone numbers and redacts them with {{{REDACTED-email-address}}} and {{{REDACTED-phone-number}}}, respectively.

{\n  \"name\": \"email-and-phone-numbers\",\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"phoneNumber\": {\n      \"phoneNumberFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#persons-names-and-ssns","title":"Persons Names and SSNs","text":"

This policy finds person's names and SSNs and redacts them with {{{REDACTED-entity}}} and {{{REDACTED-ssn}}}, respectively.

{\n  \"name\": \"persons-names-ssn\",\n  \"identifiers\": {\n    \"ner\": {\n      \"nerFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"ssn\": {\n      \"ssnFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#dates-urls-and-vins","title":"Dates, URLs, and VINs","text":"

This policy finds dates, URLs, and VINs. Dates and URLs are redacted with {{{REDACTED-date}}} and {{{REDACTED-url}}}, respectively. Each VIN is replaced by a randomly generated VIN.

{\n  \"name\": \"dates-urls-vin\",\n  \"identifiers\": {\n    \"date\": {\n      \"dateFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"url\": {\n      \"urlFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    },\n    \"vin\": {\n      \"vinFilterStrategies\": [\n        {\n          \"strategy\": \"RANDOM_REPLACE\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#ip-addresses","title":"IP Addresses","text":"

This policy finds IP addresses and replaces each identified IP address with the static text IP_ADDRESS as long as the IP address is not 127.0.0.1. (A condition on the filter strategy sets the IP address requirement.)

{\n  \"name\": \"ip-addresses\",\n  \"identifiers\": {\n    \"ipAddress\": {\n      \"ipAddressFilterStrategies\": [\n        {\n          \"strategy\": \"STATIC_REPLACE\",\n          \"staticReplacement\": \"IP_ADDRESS\",\n          \"condition\": \"token != \\\"127.0.0.1\\\"\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#zip-codes","title":"Zip Codes","text":"

This policy finds ZIP codes starting with 90 and truncates the zip code to just the first two digits.

{\n  \"name\": \"zip-codes\",\n  \"identifiers\": {\n    \"zipCode\": {\n      \"zipCodeFilterStrategies\": [\n        {\n          \"condition\": \"token startswith \\\"90\\\"\",\n          \"strategy\": \"TRUNCATE\",\n          \"truncateDigits\": 2\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#enable-text-splitting","title":"Enable Text Splitting","text":"

This policy enables text splitting for input over 10,000 characters.

{\n  \"name\": \"default-split-enabled\",\n  \"config\": {\n    \"splitting\": {\n      \"enabled\": true,\n      \"threshold\": 10000,\n      \"method\": \"newline\"\n    }\n  },\n  \"identifiers\": {\n    \"ssn\": {\n      \"ssnFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#globally-ignored-terms","title":"Globally Ignored Terms","text":"

This policy has a list of globally ignored terms.

{\n  \"name\": \"default-global-ignore\",\n  \"ignored\": [\n    {\n      \"name\": \"ignored credit cards\",\n      \"terms\": [\"4111111111111111\", \"0000000000000000\"]\n    }\n  ],\n  \"identifiers\": {\n    \"creditCard\": {\n      \"creditCardFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/sample_policies/#generating-alerts","title":"Generating Alerts","text":"

This policy generates an alert when a matching email address is identified.

{\n  \"name\": \"email-address-alert\",\n  \"identifiers\": {\n    \"emailAddress\": {\n      \"emailAddressFilterStrategies\": [\n        {\n          \"strategy\": \"REDACT\",\n          \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n          \"condition\": \"token == \\\"test@test.com\\\"\",\n          \"alert\": true\n        }\n      ]\n    }\n  }\n}\n
"},{"location":"policies/splitting_input_text/","title":"Splitting Input Text","text":"

On a per-policy basis, Philter can split input text to process each split individually. This can improve performance and allows for handling long input text. Splitting is disabled by default.

An example split configuration in a policy is shown below:

{\n  \"name\": \"default\",\n  \"identifiers\": {}, \n  \"config\": {\n    \"splitting\": {\n      \"enabled\": true,\n      \"threshold\": 10000,\n      \"method\": \"newline\"\n    }\n  }\n}\n

In this example policy, splitting is enabled for inputs greater than or equal to 10,000 characters in length.

The method of splitting the text will be the newline method. This method will cause Philter to split the text based on the locations of new line characters in the input text. Additional methods of text splitting may be added in future versions.

Because the newline method splits text based on the locations of new line characters in the text, the text contained in the reassembled filter responses may not be an exact match of the input text. This is due to white space and other characters that may reside near the new line characters that get omitted during processing.
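
The effect described above can be illustrated with a short Python sketch. This is not Philter's splitting code: split_on_newlines is a hypothetical function that assumes text at or above the threshold is split at new line characters and that surrounding white space and empty lines are dropped.

```python
def split_on_newlines(text, threshold):
    # Text shorter than the threshold is processed as a single piece.
    if len(text) < threshold:
        return [text]
    # Otherwise split at line boundaries, trimming white space and empty lines.
    return [line.strip() for line in text.splitlines() if line.strip()]

text = '''first line
second line

third line'''

pieces = split_on_newlines(text, 10)
print(pieces)  # ['first line', 'second line', 'third line']

# Rejoining the pieces loses the blank line, so the result differs from the input.
rejoined = chr(10).join(pieces)
print(rejoined == text)  # False
```
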

"},{"location":"policies/splitting_input_text/#text-splitting-policy-properties","title":"Text Splitting Policy Properties","text":"Property Description Allowed Values Default Value enabled Whether or not input texts are split. Whether or not input texts are split. When false, requests with text exceeding the threshold generate a HTTP 413 PayloadTooLarge error response. true or false false threshold When to split the input text. Set to -1 to disable splitting. Any integer value. 10000 method How to split the text. newline newline"},{"location":"policies/splitting_input_text/#alternative-to-philter-splitting-text","title":"Alternative to Philter Splitting Text","text":"

In some cases it may be best to split your input text client side prior to sending the text to Philter. This gives you full control over how the text will be split and provides more predictable responses from Philter because you know how the text is split.

An example of splitting text into chunks prior to sending the text to Philter is given in the commands below:

# Given a large file called largefile.txt, split it into 10 KB pieces\n# named /tmp/segmentaa, /tmp/segmentab, and so on.\n$ split -b 10k largefile.txt /tmp/segment\n\n# Now process the pieces. --data-binary sends each file unchanged;\n# plain --data would strip the new line characters from it.\n$ curl -s -X POST -k \"https://philter:8080/api/filter?d=document1\" --data-binary \"@/tmp/segmentaa\" -H \"Content-type: text/plain\" > out1\n$ curl -s -X POST -k \"https://philter:8080/api/filter?d=document1\" --data-binary \"@/tmp/segmentab\" -H \"Content-type: text/plain\" > out2\n\n# Now recombine the outputs into a single file.\n$ cat out1 out2 > filtered.txt\n
"},{"location":"policies/filters/common_filters/ages/","title":"Ages","text":""},{"location":"policies/filters/common_filters/ages/#filter","title":"Filter","text":"

This filter identifies ages such as 3.5 years old in text.

"},{"location":"policies/filters/common_filters/ages/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/ages/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value ageFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/ages/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/ages/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. The filter will only be applied when the condition is satisfied. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/ages/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ages-example\",\n   \"identifiers\": {\n      \"age\": {\n         \"ageFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/bank-routing-numbers/","title":"Bank Routing Numbers","text":""},{"location":"policies/filters/common_filters/bank-routing-numbers/#filter","title":"Filter","text":"

This filter identifies bank routing numbers (ABA routing transit numbers) such as 111000025 in text. Identified routing numbers must pass checksum validation.

"},{"location":"policies/filters/common_filters/bank-routing-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/bank-routing-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value bankRoutingNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/bank-routing-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/bank-routing-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. The filter will only be applied when the condition is satisfied. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/bank-routing-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"bank-routing-number-example\",\n   \"identifiers\": {\n      \"bankRoutingNumber\": {\n         \"bankRoutingNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/bitcoin-addresses/","title":"Bitcoin Addresses","text":""},{"location":"policies/filters/common_filters/bitcoin-addresses/#filter","title":"Filter","text":"

This filter identifies bitcoin addresses such as 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 in text.

"},{"location":"policies/filters/common_filters/bitcoin-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/bitcoin-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value bitcoinAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/bitcoin-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/bitcoin-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/bitcoin-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"bitcoin-address-example\",\n   \"identifiers\": {\n      \"bitcoinAddress\": {\n         \"bitcoinAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/creditcards/","title":"Credit Cards","text":""},{"location":"policies/filters/common_filters/creditcards/#filter","title":"Filter","text":"

This filter identifies credit cards such as 378282246310005 in text.

"},{"location":"policies/filters/common_filters/creditcards/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/creditcards/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value creditCardFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyValidCreditCardNumbers When set to true, only valid credit card numbers will be filtered. true ignoreWhenInUnixTimestamp When set to true, only credit card numbers that do not match the pattern for a Unix timestamp will be filtered. false"},{"location":"policies/filters/common_filters/creditcards/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/creditcards/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/creditcards/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"credit-cards-example\",\n   \"identifiers\": {\n      \"creditcard\": {\n         \"onlyValidCreditCardNumbers\": false,\n         \"creditCardFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
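As a further sketch, the following policy keeps validation enabled, skips numbers that look like Unix timestamps, and leaves only the last four characters of each credit card number. The policy name is hypothetical; the parameter and strategy names come from the tables above, and any extra options that LAST_4 might accept are not shown.

{\n   \"name\": \"credit-cards-last4-example\",\n   \"identifiers\": {\n      \"creditcard\": {\n         \"onlyValidCreditCardNumbers\": true,\n         \"ignoreWhenInUnixTimestamp\": true,\n         \"creditCardFilterStrategies\": [\n            {\n               \"strategy\": \"LAST_4\"\n            }\n         ]\n      }\n   }\n}\n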
"},{"location":"policies/filters/common_filters/dates/","title":"Dates","text":""},{"location":"policies/filters/common_filters/dates/#filter","title":"Filter","text":"

This filter identifies dates such as May 22, 2014 in text. The supported date formats are:

Format Example yyyy-MM-d 2020-05-10 MM-dd-yyyy 05-10-2020 M-d-y 5-10-2020 MMM dd May 5 or May 05 MMMM dd, yyyy May 5, 2020 or May 5 2020"},{"location":"policies/filters/common_filters/dates/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/dates/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value dateFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyValidDates When set to true, only valid dates will be filtered. false"},{"location":"policies/filters/common_filters/dates/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. SHIFT Shift the date by a number of months, days, and/or years. SHIFTRANDOM Shift the date by a random number of months, days, and years. RELATIVE Replace the date with words relative to the date."},{"location":"policies/filters/common_filters/dates/#filter-strategy-options","title":"Filter Strategy Options","text":"

The following filter strategy options are available for the RELATIVE filter strategy.

Option Description Default Value futureDates When true, future dates are replaced by relative words. When false, future dates are redacted. false
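
A minimal sketch of a policy that uses the RELATIVE strategy and also replaces future dates with relative words. It assumes that futureDates is set inside the strategy object, in the same way the shift options are set in the SHIFT example policy; the policy name is hypothetical.

{\n   \"name\": \"dates-relative-example\",\n   \"identifiers\": {\n      \"date\": {\n         \"dateFilterStrategies\": [\n            {\n               \"strategy\": \"RELATIVE\",\n               \"futureDates\": true\n            }\n         ]\n      }\n   }\n}\n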

The following filter strategy options are available for the SHIFT filter strategy.

Option Description Default Value shiftDays The number of days to shift the date. Can be a negative or positive integer. Defaults to 0 if not specified. 0 shiftMonths The number of months to shift the date. Can be a negative or positive integer. Defaults to 0 if not specified. 0 shiftYears The number of years to shift the date. Can be a negative or positive integer. Defaults to 0 if not specified. 0"},{"location":"policies/filters/common_filters/dates/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != TOKEN Compares the sensitive text to some category, e.g. birthdate. is CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/dates/#differentiating-between-dates-and-birth-dates","title":"Differentiating Between Dates and Birth Dates","text":"

In some cases it may be necessary to redact birth dates differently from other dates. A condition can determine whether an identified date is a birth date: the condition token is birthdate checks whether the identified date (the token) is a birth date by analyzing the text surrounding the date.
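
The following sketch shows how such a condition might be attached to a filter strategy. It assumes the condition is given as a condition property on the strategy object and that strategies are evaluated in the order listed, so that birth dates are redacted and the second strategy shifts the remaining dates by 30 days. The policy name and the shift value are hypothetical; see Conditions for the authoritative syntax.

{\n   \"name\": \"birth-dates-example\",\n   \"identifiers\": {\n      \"date\": {\n         \"dateFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\",\n               \"condition\": \"token is birthdate\"\n            },\n            {\n               \"strategy\": \"SHIFT\",\n               \"shiftDays\": 30\n            }\n         ]\n      }\n   }\n}\n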

"},{"location":"policies/filters/common_filters/dates/#example-policy-to-redact-dates","title":"Example Policy to Redact Dates","text":"

The following policy redacts dates.

{\n   \"name\": \"dates-example\",\n   \"identifiers\": {\n      \"date\": {\n         \"onlyValidDates\": false,\n         \"dateFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/dates/#example-policy-to-shift-dates","title":"Example Policy to Shift Dates","text":"

The following policy shifts dates forward by 2 days and 4 months.

{\n   \"name\": \"dates-example\",\n   \"identifiers\": {\n      \"date\": {\n         \"onlyValidDates\": false,\n         \"dateFilterStrategies\": [\n            {\n               \"strategy\": \"SHIFT\",\n               \"shiftDays\": 2,\n               \"shiftMonths\": 4,\n               \"shiftYears\": 0\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/drivers-license-numbers/","title":"Driver's License Numbers","text":""},{"location":"policies/filters/common_filters/drivers-license-numbers/#filter","title":"Filter","text":"

This filter identifies driver's license numbers such as 194784357 in text. Driver's license number formats for all 50 US states are supported.

"},{"location":"policies/filters/common_filters/drivers-license-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/drivers-license-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value driversLicenseFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/drivers-license-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/drivers-license-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/drivers-license-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"drivers-license-example\",\n   \"identifiers\": {\n      \"driversLicense\": {\n         \"driversLicenseFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/email-addresses/","title":"Email Addresses","text":""},{"location":"policies/filters/common_filters/email-addresses/#filter","title":"Filter","text":"

This filter identifies email addresses such as john.fake.address@hotmail.com in text.

"},{"location":"policies/filters/common_filters/email-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/email-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value emailAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyStrictMatches When set to false, the pattern for identifying email addresses will be relaxed. Filtered email addresses will have a lower confidence, but filter performance will increase. true onlyValidTLDs When set to true, only email addresses that are for a top-level domain are filtered. false"},{"location":"policies/filters/common_filters/email-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/email-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/email-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"email-address-example\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
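As a further sketch, the following policy relaxes the matching pattern, restricts matches using the onlyValidTLDs option, and replaces each email address with its SHA256 hash. The policy name is hypothetical; the parameter and strategy names come from the tables above.

{\n   \"name\": \"email-address-hash-example\",\n   \"identifiers\": {\n      \"emailAddress\": {\n         \"onlyStrictMatches\": false,\n         \"onlyValidTLDs\": true,\n         \"emailAddressFilterStrategies\": [\n            {\n               \"strategy\": \"HASH_SHA256_REPLACE\"\n            }\n         ]\n      }\n   }\n}\n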
"},{"location":"policies/filters/common_filters/iban-codes/","title":"IBAN Codes","text":""},{"location":"policies/filters/common_filters/iban-codes/#filter","title":"Filter","text":"

This filter identifies IBAN (International Bank Account Number) codes such as HU4211773016111110180000000 in text.

"},{"location":"policies/filters/common_filters/iban-codes/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/iban-codes/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value allowSpaces When true, IBAN codes will be allowed to contain spaces and grouped in sections of 4. Set to false to disallow spaces in IBAN codes. true ibanCodeFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None onlyValidIBANCodes When set to true, only valid IBAN codes will be filtered. true"},{"location":"policies/filters/common_filters/iban-codes/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/iban-codes/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/iban-codes/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"iban-example\",\n   \"identifiers\": {\n      \"ibanCode\": {\n         \"onlyValidIBANCodes\": false,\n         \"ibanCodeFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
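As a further sketch, the following policy disallows spaces in IBAN codes, filters only valid codes, and leaves just the last four characters of each match. The policy name is hypothetical; the parameter and strategy names come from the tables above.

{\n   \"name\": \"iban-last4-example\",\n   \"identifiers\": {\n      \"ibanCode\": {\n         \"allowSpaces\": false,\n         \"onlyValidIBANCodes\": true,\n         \"ibanCodeFilterStrategies\": [\n            {\n               \"strategy\": \"LAST_4\"\n            }\n         ]\n      }\n   }\n}\n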
"},{"location":"policies/filters/common_filters/ip-addresses/","title":"IP Addresses","text":""},{"location":"policies/filters/common_filters/ip-addresses/#filter","title":"Filter","text":"

This filter identifies IPv4 and IPv6 addresses such as 127.0.0.1, 192.168.3.58, and 2001:0db8:85a3:0000:0000:8a2e:0370:7334 in text.

"},{"location":"policies/filters/common_filters/ip-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/ip-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value ipAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/ip-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/ip-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/ip-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ip-address-example\",\n   \"identifiers\": {\n      \"ipAddress\": {\n         \"ipAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/mac-addresses/","title":"MAC Addresses","text":""},{"location":"policies/filters/common_filters/mac-addresses/#filter","title":"Filter","text":"

This filter identifies MAC addresses in text.

"},{"location":"policies/filters/common_filters/mac-addresses/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/mac-addresses/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value macAddressFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/mac-addresses/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/mac-addresses/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/mac-addresses/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"mac-address-example\",\n   \"identifiers\": {\n      \"macAddress\": {\n         \"macAddressFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/passport-numbers/","title":"Passport Numbers","text":""},{"location":"policies/filters/common_filters/passport-numbers/#filter","title":"Filter","text":"

This filter identifies US passport numbers in text.

"},{"location":"policies/filters/common_filters/passport-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/passport-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value passportNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/passport-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE)"},{"location":"policies/filters/common_filters/passport-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CLASSIFICATION Compares the issuing country of the passport number. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/passport-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"passport-number-example\",\n   \"identifiers\": {\n      \"passportNumber\": {\n         \"passportNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/phone-number-extensions/","title":"Phone Number Extensions","text":""},{"location":"policies/filters/common_filters/phone-number-extensions/#filter","title":"Filter","text":"

This filter identifies phone number extensions such as \"x100\" in text.

"},{"location":"policies/filters/common_filters/phone-number-extensions/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/phone-number-extensions/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value phoneNumberExtensionFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/phone-number-extensions/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/phone-number-extensions/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/phone-number-extensions/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"phone-number-ext-example\",\n   \"identifiers\": {\n      \"phoneNumberExtension\": {\n         \"phoneNumberExtensionFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/phone-numbers/","title":"Phone Numbers","text":""},{"location":"policies/filters/common_filters/phone-numbers/#filter","title":"Filter","text":"

This filter identifies phone and fax numbers such as (304) 555-5555, 304-555-5555, and 1-800-123-4567 in text.

"},{"location":"policies/filters/common_filters/phone-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/phone-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value phoneNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/phone-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/phone-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/phone-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"phone-number-example\",\n   \"identifiers\": {\n      \"phoneNumber\": {\n         \"phoneNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/sections/","title":"Sections","text":""},{"location":"policies/filters/common_filters/sections/#filter","title":"Filter","text":"

This filter identifies sections in text between a given start regular expression pattern and a given end regular expression pattern.

"},{"location":"policies/filters/common_filters/sections/#required-parameters","title":"Required Parameters","text":"Parameter Description Default Value startPattern A regular expression denoting the start of the section. None endPattern A regular expression denoting the end of the section. None"},{"location":"policies/filters/common_filters/sections/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value sectionFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/sections/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/sections/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/sections/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"sections-example\",\n   \"identifiers\": {\n      \"section\": {\n         \"startPattern\": \"START\",\n         \"endPattern\": \"END\",\n         \"sectionFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/ssns-and-tins/","title":"SSNs and TINs","text":""},{"location":"policies/filters/common_filters/ssns-and-tins/#filter","title":"Filter","text":"

This filter identifies US SSNs and TINs such as 123-45-6789 and 123456789 in text.

"},{"location":"policies/filters/common_filters/ssns-and-tins/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/ssns-and-tins/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value ssnFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/ssns-and-tins/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/ssns-and-tins/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/ssns-and-tins/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ssn-tin-example\",\n   \"identifiers\": {\n      \"ssn\": {\n         \"ssnFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/tracking-numbers/","title":"Tracking Numbers","text":""},{"location":"policies/filters/common_filters/tracking-numbers/#filter","title":"Filter","text":"

This filter identifies tracking numbers in text. FedEx, UPS, and USPS tracking number formats are supported.

"},{"location":"policies/filters/common_filters/tracking-numbers/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/tracking-numbers/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value trackingNumberFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/tracking-numbers/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/tracking-numbers/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/tracking-numbers/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"tracking-numbers-example\",\n   \"identifiers\": {\n      \"trackingNumber\": {\n         \"trackingNumberFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/urls/","title":"URLs","text":""},{"location":"policies/filters/common_filters/urls/#filter","title":"Filter","text":"

This filter identifies URLs such as myhomepage.com, http://myhomepage.com/folder/page.html, and www.myhomepage.com/folder/page.html in text.

"},{"location":"policies/filters/common_filters/urls/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/urls/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value urlFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None requireHttpWwwPrefix When set to true, only URLs that begin with http or www will be filtered. true"},{"location":"policies/filters/common_filters/urls/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/common_filters/urls/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/urls/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"urls-example\",\n   \"identifiers\": {\n      \"url\": {\n         \"requireHttpWwwPrefix\": true,\n         \"urlFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/vins/","title":"VINs","text":""},{"location":"policies/filters/common_filters/vins/#filter","title":"Filter","text":"

This filter identifies 17-character vehicle identification numbers (VINs) such as WBAPM7G50ANL19218 and 1GBJC34K3RE176005 in text.

"},{"location":"policies/filters/common_filters/vins/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/vins/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value vinFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/common_filters/vins/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.
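To make two of the strategies available to this filter concrete, here is a minimal Python sketch of what LAST_4 and HASH_SHA256_REPLACE describe. It is an illustration only: the function names `last_4` and `hash_sha256` are hypothetical, and the UTF-8 encoding and lowercase hexadecimal output are assumptions. Philter's actual replacement values may be produced differently.

```python
import hashlib


def last_4(token: str) -> str:
    """Keep only the last four characters of the sensitive text."""
    return token[-4:]


def hash_sha256(token: str) -> str:
    """Return the SHA-256 digest of the text as lowercase hex (UTF-8 encoding assumed)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


vin = "WBAPM7G50ANL19218"
print(last_4(vin))             # 9218
print(len(hash_sha256(vin)))   # 64 hexadecimal characters
```

Note that a plain hash is deterministic: the same input always yields the same replacement, which keeps values linkable across documents.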

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. FPE_ENCRYPT_REPLACE Replace the sensitive text with a value generated by format-preserving encryption (FPE) LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/common_filters/vins/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/vins/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"vins-example\",\n   \"identifiers\": {\n      \"vin\": {\n         \"vinFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/common_filters/zip-codes/","title":"Zip Codes","text":""},{"location":"policies/filters/common_filters/zip-codes/#filter","title":"Filter","text":"

This filter identifies zip codes in text.

"},{"location":"policies/filters/common_filters/zip-codes/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/common_filters/zip-codes/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value zipCodeFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None requireDelimiter When set to false, the filter will not require a dash in 9 digit zip codes, e.g. 12345-6789. Setting to false may increase the number of zip code false positives. true"},{"location":"policies/filters/common_filters/zip-codes/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.
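The two zip-code-specific strategies, TRUNCATE and ZERO_LEADING, can be pictured with the short Python sketch below. It follows only the one-line descriptions on this page. The helper names are hypothetical, and the exact output Philter produces (for example, whether removed digits are padded or masked) is not specified here, so treat the results as an approximation.

```python
def truncate(zip_code: str, truncate_digits: int) -> str:
    """Drop the last `truncate_digits` characters of the zip code."""
    if truncate_digits <= 0:
        return zip_code
    return zip_code[:-truncate_digits]


def zero_leading(zip_code: str) -> str:
    """Replace the first three digits with zeros and keep the remainder."""
    return "000" + zip_code[3:]


print(truncate("12345", 2))    # 123
print(zero_leading("12345"))   # 00045
```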

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. TRUNCATE Replace the sensitive text by removing the last x digits. (Set the number of digits using the truncateDigits parameter of the filter strategy.) ZERO_LEADING Replace the sensitive text by zeroing the first 3 digits."},{"location":"policies/filters/common_filters/zip-codes/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, != POPULATION Compares the population of the zip code against the 2010 census values. < , <=, > , >=, ==, !="},{"location":"policies/filters/common_filters/zip-codes/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"zip-code-example\",\n   \"identifiers\": {\n      \"zipCode\": {\n         \"zipCodeFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/custom_filters/dictionary/","title":"Dictionary","text":""},{"location":"policies/filters/custom_filters/dictionary/#filter","title":"Filter","text":"

This filter identifies custom text based on a given dictionary.

"},{"location":"policies/filters/custom_filters/dictionary/#required-parameters","title":"Required Parameters","text":"

At least one of terms or files must be provided.

Parameter Description Default Value terms A list of terms in the dictionary. None files A list of files containing terms one per line. None"},{"location":"policies/filters/custom_filters/dictionary/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None fuzzy When set to true, the dictionary will employ fuzzy comparisons. Use the sensitivity parameter to control the level of fuzziness. Setting this value to false will disable fuzziness and provide a higher level of performance. false classification Used to apply an arbitrary label to the identifier, such as \"patient-id\", or \"account-number.\" \"custom-identifier\" sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. Only applies when fuzzy is set to true. medium"},{"location":"policies/filters/custom_filters/dictionary/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/custom_filters/dictionary/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/custom_filters/dictionary/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"dictionary-example\",\n   \"identifiers\": {\n      \"dictionaries\": [\n         {\n            \"terms\": [\"john\", \"jane\", \"doe\"],\n            \"files\": [\"c:\\\\temp\\\\dictionary.txt\"],\n            \"fuzzy\": true,\n            \"sensitivity\": \"medium\",\n            \"sectionFilterStrategies\": [\n               {\n                  \"strategy\": \"REDACT\",\n                  \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n               }\n            ]\n         }\n      ]\n   }\n}\n
"},{"location":"policies/filters/custom_filters/identifier/","title":"Identifier","text":""},{"location":"policies/filters/custom_filters/identifier/#filter","title":"Filter","text":"

This filter identifies custom text based on a given regular expression.

The Identifier filter accepts a list of regular expression-based identifiers. See the policy at the bottom of this page for an example.

Note that backslashes in the regular expression will need to be escaped for the policy to be valid JSON.
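One way to avoid escaping by hand is to build the policy as a data structure and let a JSON library serialize it. The Python sketch below does this with the standard `json` module. The policy name `identifier-escaping-example` and the helper `build_identifier_policy` are illustrative assumptions; the pattern is the default value documented for the `pattern` parameter.

```python
import json


def build_identifier_policy(pattern: str) -> str:
    """Return a policy JSON string containing one regex-based identifier."""
    policy = {
        "name": "identifier-escaping-example",  # hypothetical policy name
        "identifiers": {
            "identifiers": [
                {
                    "pattern": pattern,
                    "caseSensitive": True,
                    "classification": "custom-identifier",
                    "identifierFilterStrategies": [
                        {
                            "strategy": "REDACT",
                            "redactionFormat": "{{{REDACTED-%t}}}",
                        }
                    ],
                }
            ]
        },
    }
    # json.dumps doubles each backslash so the output is valid JSON.
    return json.dumps(policy, indent=2)


# A raw string keeps the regular expression exactly as written.
policy_json = build_identifier_policy(r"\b[A-Z0-9_-]{4,}\b")
print(policy_json)
```

In the printed policy the pattern appears as `\\b[A-Z0-9_-]{4,}\\b`; parsing the JSON again yields the original single-backslash expression.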

"},{"location":"policies/filters/custom_filters/identifier/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/custom_filters/identifier/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None caseSensitive When set to true, the regular expression will be case sensitive. true classification Used to apply an arbitrary label to the identifier, such as \"patient-id\", or \"account-number.\" \"custom-identifier\" pattern A regular expression for the identifier. Note that backslashes will need to be escaped. \\b[A-Z0-9_-]{4,}\\b"},{"location":"policies/filters/custom_filters/identifier/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. LAST_4 Replace the sensitive text with just the last four characters of the text."},{"location":"policies/filters/custom_filters/identifier/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, != CLASSIFICATION Compares the classification of the sensitive text. == , !="},{"location":"policies/filters/custom_filters/identifier/#example-policy","title":"Example Policy","text":"
{\n  \"name\": \"default\",\n  \"identifiers\": {\n    \"identifiers\": [\n      {\n        \"pattern\": \"[A-Z]{9}\",\n        \"caseSensitive\": false,\n        \"classification\": \"custom-identifier\",\n        \"enabled\": true,\n        \"identifierFilterStrategies\": [\n          {\n            \"strategy\": \"REDACT\",\n            \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n          }\n        ]        \n      }\n    ]\n  }\n}\n
"},{"location":"policies/filters/locations/cities/","title":"Cities","text":""},{"location":"policies/filters/locations/cities/#filter","title":"Filter","text":"

This filter identifies common US cities, as determined by the US census, in text.

"},{"location":"policies/filters/locations/cities/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/cities/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value cityFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/cities/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/cities/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/cities/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"cities-example\",\n   \"identifiers\": {\n      \"city\": {\n         \"sensitivity\": \"medium\",\n         \"cityFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/counties/","title":"Counties","text":""},{"location":"policies/filters/locations/counties/#filter","title":"Filter","text":"

This filter identifies common US counties, as determined by the US census, in text.

"},{"location":"policies/filters/locations/counties/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/counties/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value countyFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/counties/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/counties/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/counties/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"counties-example\",\n   \"identifiers\": {\n      \"county\": {\n         \"sensitivity\": \"medium\",\n         \"countyFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/hospital-abbreviations/","title":"Hospital Abbreviations","text":""},{"location":"policies/filters/locations/hospital-abbreviations/#filter","title":"Filter","text":"

This filter identifies US hospital abbreviations in text.

"},{"location":"policies/filters/locations/hospital-abbreviations/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/hospital-abbreviations/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value hospitalAbbreviationFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/hospital-abbreviations/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/hospital-abbreviations/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/hospital-abbreviations/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"hospital-abbreviations-example\",\n   \"identifiers\": {\n      \"hospitalAbbreviation\": {\n         \"sensitivity\": \"medium\",\n         \"hospitalAbbreviationFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/hospitals/","title":"Hospitals","text":""},{"location":"policies/filters/locations/hospitals/#filter","title":"Filter","text":"

This filter identifies US hospitals in text.

"},{"location":"policies/filters/locations/hospitals/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/hospitals/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value hospitalFilterStrategies A list of filter strategies. None sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium"},{"location":"policies/filters/locations/hospitals/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/hospitals/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/hospitals/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"hospitals-example\",\n   \"identifiers\": {\n      \"hospital\": {\n         \"sensitivity\": \"medium\",\n         \"hospitalFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/state-abbreviations/","title":"State Abbreviations","text":""},{"location":"policies/filters/locations/state-abbreviations/#filter","title":"Filter","text":"

This filter identifies US state abbreviations in text.

"},{"location":"policies/filters/locations/state-abbreviations/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/state-abbreviations/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value stateAbbreviationsFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/locations/state-abbreviations/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/state-abbreviations/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/state-abbreviations/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"states-abbreviations-example\",\n   \"identifiers\": {\n      \"stateAbbreviation\": {\n         \"stateAbbreviationFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/locations/states/","title":"States","text":""},{"location":"policies/filters/locations/states/#filter","title":"Filter","text":"

This filter identifies US states in text.

"},{"location":"policies/filters/locations/states/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/locations/states/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value stateFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/locations/states/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/locations/states/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/locations/states/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"states-example\",\n   \"identifiers\": {\n      \"state\": {\n         \"stateFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/first-names/","title":"First Names","text":""},{"location":"policies/filters/persons_names/first-names/#filter","title":"Filter","text":"

This filter identifies common first names, as identified by the US census, in text.

"},{"location":"policies/filters/persons_names/first-names/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/first-names/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium firstNameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/first-names/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given the default strategy of REDACT is used. When multiple filter strategies are given the filter strategies will be applied in order as they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/persons_names/first-names/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/first-names/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"first-names-example\",\n   \"identifiers\": {\n      \"firstName\": {\n         \"firstNameFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/persons-names-ner/","title":"Person's Names (NER)","text":""},{"location":"policies/filters/persons_names/persons-names-ner/#filter","title":"Filter","text":"

This filter identifies persons' names in text using natural language processing (NLP) and named-entity recognition (NER).

"},{"location":"policies/filters/persons_names/persons-names-ner/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/persons-names-ner/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value removePunctuation When set to true, punctuation will be removed prior to analysis. false firstNameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/persons-names-ner/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order in which they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value. ABBREVIATE Replace the sensitive text with the initials of the text."},{"location":"policies/filters/persons_names/persons-names-ner/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/persons-names-ner/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"ner-example\",\n   \"identifiers\": {\n      \"ner\": {\n         \"nerFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/physician-names-ner/","title":"Physician Names","text":""},{"location":"policies/filters/persons_names/physician-names-ner/#filter","title":"Filter","text":"

This filter identifies physician names (e.g. Dr. John Smith) in text.

"},{"location":"policies/filters/persons_names/physician-names-ner/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/physician-names-ner/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value physicianNameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/physician-names-ner/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order in which they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/persons_names/physician-names-ner/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/physician-names-ner/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"physician-names-example\",\n   \"identifiers\": {\n      \"physicianName\": {\n         \"physicianNameFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"policies/filters/persons_names/surnames/","title":"Surnames","text":""},{"location":"policies/filters/persons_names/surnames/#filter","title":"Filter","text":"

This filter identifies common surnames, as identified by the US Census, in text.

"},{"location":"policies/filters/persons_names/surnames/#required-parameters","title":"Required Parameters","text":"

This filter has no required parameters.

"},{"location":"policies/filters/persons_names/surnames/#optional-parameters","title":"Optional Parameters","text":"Parameter Description Default Value sensitivity Controls the \"fuzziness\" of allowed values to account for misspellings and derivations. Valid values are low, medium, and high. medium surnameFilterStrategies A list of filter strategies. None enabled When set to false, the filter will be disabled and not applied true ignored A list of terms to be ignored by the filter. None"},{"location":"policies/filters/persons_names/surnames/#filter-strategies","title":"Filter Strategies","text":"

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of REDACT is used. When multiple filter strategies are given, they are applied in the order in which they are listed. See Filter Strategies for details.

Strategy Description REDACT Replace the sensitive text with a placeholder. RANDOM_REPLACE Replace the sensitive text with a similar, random value. STATIC_REPLACE Replace the sensitive text with a given value. CRYPTO_REPLACE Replace the sensitive text with its encrypted value. HASH_SHA256_REPLACE Replace the sensitive text with its SHA256 hash value."},{"location":"policies/filters/persons_names/surnames/#conditions","title":"Conditions","text":"

Each filter strategy may have one condition. See Conditions for details.

Conditional Description Operators TOKEN Compares the value of the sensitive text. == , != CONTEXT Compares the filtering context. == , != CONFIDENCE Compares the confidence in the sensitive text against a threshold value. < , <=, > , >=, ==, !="},{"location":"policies/filters/persons_names/surnames/#example-policy","title":"Example Policy","text":"
{\n   \"name\": \"surnames-example\",\n   \"identifiers\": {\n      \"surname\": {\n         \"surnameFilterStrategies\": [\n            {\n               \"strategy\": \"REDACT\",\n               \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n            }\n         ]\n      }\n   }\n}\n
"},{"location":"quick_starts/quick_start_aws/","title":"Philter Quick Start on AWS","text":"

Philter on AWS is a virtual machine-based product that runs on its own EC2 instance. A free trial period is available during which there is no charge for the Philter software but there may be charges for the underlying AWS infrastructure.

Cloud virtual machines launched from a cloud marketplace may not be immediately suitable for a HIPAA environment. Refer to your compliance officer for your organization's requirements to ensure compliance with all relevant regulations.

Here\u2019s a brief screencast showing how to launch Philter in AWS.

"},{"location":"quick_starts/quick_start_aws/#launch-philter-in-aws","title":"Launch Philter in AWS","text":"
  1. Go to Philter in the AWS Marketplace. On this page you can see the Philter overview, the pricing, and the supported EC2 instance types.
  2. Select an instance type. We recommend m5.large. The smaller instance types are intended only for testing and are not well-suited for production usage.
  3. Click the Continue to Subscribe button.
  4. View and accept Philter\u2019s license agreement. Then click Accept Terms.
  5. The subscription will now be created, and you will be notified when it is ready. This usually takes less than a minute.
  6. Click the Continue to Configuration button to select the AMI, the version, and the region. We recommend using the newest version if multiple are available.
  7. Click the Continue to Launch button to launch Philter in your AWS account!

AWS will automatically open ports 22 (SSH) and 8080 (Philter API) for the Philter instance's security group. These ports are required to be open but you may want to modify the security groups to limit their scope of availability by restricting access to specific CIDR ranges.

Congratulations! You have deployed Philter in AWS. You are now ready to filter text!

"},{"location":"quick_starts/quick_start_aws/#try-it-out","title":"Try it out!","text":"

With Philter now running, we can take it for a spin. We will send some text to Philter and inspect the response we get back. The Philter virtual machine running in your cloud account should have a public IP address (unless you customized the deployment). We will use that public IP address to interact with Philter.

Philter, by default, will be configured with an HTTPS listener on port 8080 using a self-signed certificate. It is recommended that prior to use in a production environment the self-signed certificate is replaced by a valid certificate owned by your organization.

In the command below, replace <PUBLIC_IP> with the virtual machine\u2019s public IP address or public host name.

curl -k -X POST https://<PUBLIC_IP>:8080/api/filter --data \"George Washington was a patient and his SSN is 123-45-6789.\" -H \"Content-type: text/plain\"\n

With this command we are sending the text in the command to Philter for filtering. Philter will identify the patient name (George Washington) and the SSN (123-45-6789) and redact those values in the response. You can always use curl to send text to Philter as in these examples, but SDKs are also available for integrating Philter with your applications.
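
As a quick sanity check, the response can be scanned for anything still shaped like a Social Security number. The sketch below is an illustration only: has_ssn_pattern and check_filtered are hypothetical helper names, not part of Philter, and the pattern only covers the NNN-NN-NNNN form used in the example.

```shell
#!/bin/bash
# Succeeds (exit 0) when stdin contains text shaped like an SSN (NNN-NN-NNNN).
has_ssn_pattern() {
  grep -Eq '[0-9]{3}-[0-9]{2}-[0-9]{4}'
}

# Reads a filtered response on stdin and reports whether an SSN-like value remains.
check_filtered() {
  if has_ssn_pattern; then
    echo 'WARNING: response still contains an SSN-like value'
    return 1
  fi
  echo 'OK: no SSN-like value found'
}
```

To use it, add -s to the curl command above so that only the response body is printed, and pipe the output into check_filtered.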

"},{"location":"quick_starts/quick_start_aws/#redacting-sensitive-information-from-text","title":"Redacting Sensitive Information from Text","text":"

The types of sensitive information that Philter identifies and removes are controlled by policies. By default, Philter includes a policy that covers many types of sensitive information, such as names and social security numbers. We can send text to Philter for filtering with this default policy using the following command:

curl -k -X POST https://localhost:8080/api/filter -d @file.txt -H \"Content-Type: text/plain\"\n

This command sends the contents of the file file.txt to Philter. Philter will apply the enabled filters and return a plain-text response consisting of the filtered text. (Replace localhost with the IP address or host name of Philter if you are not running the command where Philter is running.) You can also send text directly in the request instead of sending it as a file:

curl -k -X POST https://localhost:8080/api/filter --data \"Your text goes here...\" -H \"Content-type: text/plain\"\n
"},{"location":"quick_starts/quick_start_aws/#next-steps","title":"Next Steps","text":"

Now that you have Philter running and know how to send text to it, you are ready to integrate Philter into your existing workflow and systems. Philter\u2019s API details how to send files to Philter. Client libraries for Philter\u2019s API are available on GitHub for some languages.

Be sure to check out Policies to see how you can customize the types of sensitive information Philter redacts!

"},{"location":"quick_starts/quick_start_azure/","title":"Philter Quick Start on Microsoft Azure","text":"

Philter on Microsoft Azure is a virtual machine-based product. A free trial period is available during which there is no charge for the Philter software but there may be charges for the underlying Azure infrastructure.

Cloud virtual machines launched from a cloud marketplace may not be immediately suitable for a HIPAA environment. Refer to your compliance officer for your organization's requirements to ensure compliance with all relevant regulations.

"},{"location":"quick_starts/quick_start_azure/#launch-philter-on-microsoft-azure","title":"Launch Philter on Microsoft Azure","text":"
  1. Go to Philter in the Azure Marketplace.
  2. Click the Get It Now button.
  3. Review the information that is shown on the popup and click Continue when ready.
  4. You will now be asked to log in to your Microsoft Azure account if you were not already logged in.
  5. Click the Create button to begin making a Philter virtual machine.
  6. Enter the required details of the virtual machine and click the Review + create button.
  7. Review the virtual machine details and click Create when ready!

Your Philter virtual machine will now launch.

Microsoft Azure will automatically open ports 22 (SSH) and 8080 (Philter API). These ports are required to be open, but you may want to modify the network security group rules to restrict access to specific CIDR ranges.

Congratulations! You have deployed Philter in Azure. You are now ready to filter text!

"},{"location":"quick_starts/quick_start_azure/#try-it-out","title":"Try it out!","text":"

With Philter now running, we can take it for a spin. We will send some text to Philter and inspect the response we get back. The Philter virtual machine running in your cloud account should have a public IP address (unless you customized the deployment). We will use that public IP address to interact with Philter.

Philter, by default, will be configured with an HTTPS listener on port 8080 using a self-signed certificate. It is recommended that prior to use in a production environment the self-signed certificate is replaced by a valid certificate owned by your organization.

In the command below, replace <PUBLIC_IP> with the virtual machine\u2019s public IP address or public host name.

curl -k -X POST https://<PUBLIC_IP>:8080/api/filter --data \"George Washington was a patient and his SSN is 123-45-6789.\" -H \"Content-type: text/plain\"\n

With this command we are sending the text in the command to Philter for filtering. Philter will identify the patient name (George Washington) and the SSN (123-45-6789) and redact those values in the response. You can always use curl to send text to Philter as in these examples, but SDKs are also available for integrating Philter with your applications.

"},{"location":"quick_starts/quick_start_azure/#redacting-sensitive-information-from-text","title":"Redacting Sensitive Information from Text","text":"

The types of sensitive information that Philter identifies and removes are controlled by policies. By default, Philter includes a policy that covers many types of sensitive information, such as names and social security numbers. We can send text to Philter for filtering with this default policy using the following command:

curl -k -X POST https://localhost:8080/api/filter -d @file.txt -H \"Content-Type: text/plain\"\n

This command sends the contents of the file file.txt to Philter. Philter will apply the enabled filters and return a plain-text response consisting of the filtered text. (Replace localhost with the IP address or host name of Philter if you are not running the command where Philter is running.) You can also send text directly in the request instead of sending it as a file:

curl -k -X POST https://localhost:8080/api/filter --data \"Your text goes here...\" -H \"Content-type: text/plain\"\n
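
When the text to filter spans several lines, note that curl's -d/--data option strips carriage returns and newlines when it reads the body from a file, while --data-binary sends the input unchanged. The sketch below wraps the request in a hypothetical helper named filter_stdin that reads the text from standard input; the function name is an assumption made for illustration.

```shell
#!/bin/bash
# Sends stdin to Philter's filter endpoint and prints the filtered text.
# --data-binary @- posts stdin exactly as received, keeping line breaks.
# Replace localhost with Philter's IP address or host name when running remotely.
filter_stdin() {
  curl -k -s -X POST -H 'Content-Type: text/plain' --data-binary @- https://localhost:8080/api/filter
}
```

For example, filter_stdin < file.txt > file.filtered.txt would write the filtered text to a new file, assuming Philter is reachable at the address in the function.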
"},{"location":"quick_starts/quick_start_azure/#next-steps","title":"Next Steps","text":"

Now that you have Philter running and know how to send text to it, you are ready to integrate Philter into your existing workflow and systems. Philter\u2019s API details how to send files to Philter. Client libraries for Philter\u2019s API are available on GitHub for some languages.

Be sure to check out Policies to see how you can customize the types of sensitive information Philter redacts!

"},{"location":"quick_starts/quick_start_gcp/","title":"Philter Quick Start on Google Cloud","text":"

Philter on Google Cloud is a virtual machine-based product. A free trial period is available during which there is no charge for the Philter software but there may be charges for the underlying Google Cloud infrastructure.

Cloud virtual machines launched from a cloud marketplace may not be immediately suitable for a HIPAA environment. Refer to your compliance officer for your organization's requirements to ensure compliance with all relevant regulations.

"},{"location":"quick_starts/quick_start_gcp/#launch-philter-in-google-cloud","title":"Launch Philter in Google Cloud","text":"
  1. Go to Philter in the Google Cloud Marketplace.
  2. Click the Launch on Compute Engine button.

Virtual Machine Recommendations

The general purpose machine type is n2-standard-2 and this machine type should be adequate for most use-cases. We recommend 8 vCPUs and 8-16 GB of RAM for a production deployment.

Google Cloud will automatically open ports 22 (SSH) and 8080 (Philter API). These ports are required to be open, but you may want to modify the firewall rules to restrict access to specific CIDR ranges.

Congratulations! You have deployed Philter in Google Cloud. You are now ready to filter text!

"},{"location":"quick_starts/quick_start_gcp/#try-it-out","title":"Try it out!","text":"

With Philter now running, we can take it for a spin. We will send some text to Philter and inspect the response we get back. The Philter virtual machine running in your cloud account should have a public IP address (unless you customized the deployment). We will use that public IP address to interact with Philter.

Philter, by default, will be configured with an HTTPS listener on port 8080 using a self-signed certificate. It is recommended that prior to use in a production environment the self-signed certificate is replaced by a valid certificate owned by your organization.

In the command below, replace <PUBLIC_IP> with the virtual machine\u2019s public IP address or public host name.

curl -k -X POST https://<PUBLIC_IP>:8080/api/filter --data \"George Washington was a patient and his SSN is 123-45-6789.\" -H \"Content-type: text/plain\"\n

With this command we are sending the text in the command to Philter for filtering. Philter will identify the patient name (George Washington) and the SSN (123-45-6789) and redact those values in the response. You can always use curl to send text to Philter as in these examples, but SDKs are also available for integrating Philter with your applications.

"},{"location":"quick_starts/quick_start_gcp/#redacting-sensitive-information-from-text","title":"Redacting Sensitive Information from Text","text":"

The types of sensitive information that Philter identifies and removes are controlled by policies. By default, Philter includes a policy that covers many types of sensitive information, such as names and social security numbers. We can send text to Philter for filtering with this default policy using the following command:

curl -k -X POST https://localhost:8080/api/filter -d @file.txt -H \"Content-Type: text/plain\"\n

This command sends the contents of the file file.txt to Philter. Philter will apply the enabled filters and return a plain-text response consisting of the filtered text. (Replace localhost with the IP address or host name of Philter if you are not running the command where Philter is running.) You can also send text directly in the request instead of sending it as a file:

curl -k -X POST https://localhost:8080/api/filter --data \"Your text goes here...\" -H \"Content-type: text/plain\"\n
"},{"location":"quick_starts/quick_start_gcp/#next-steps","title":"Next Steps","text":"

Now that you have Philter running and know how to send text to it, you are ready to integrate Philter into your existing workflow and systems. Philter\u2019s API details how to send files to Philter. Client libraries for Philter\u2019s API are available on GitHub for some languages.

Be sure to check out Policies to see how you can customize the types of sensitive information Philter redacts!

"},{"location":"solutions/apache-nifi-and-philter/","title":"Apache NiFi and Philter","text":"

This article describes how Philter can be used with Apache NiFi to filter sensitive information such as PII and PHI within an Apache NiFi data flow.

Philter is available on the AWS, Azure, and Google Cloud marketplaces. So, fire up an instance of Philter and let's get started using it alongside your Apache NiFi data flow!

"},{"location":"solutions/apache-nifi-and-philter/#configuring-philter-with-cloudera-dataflow-cdf","title":"Configuring Philter with Cloudera DataFlow (CDF)","text":"

Philter is certified to work with Cloudera DataFlow (CDF) as a custom Apache NiFi processor. There are two options for deploying Philter with CDF.

"},{"location":"solutions/apache-nifi-and-philter/#option-1-using-philter-via-its-api","title":"Option 1 - Using Philter via its API","text":"

In the first option, a custom NiFi processor performs redaction by communicating with an instance of Philter through Philter's API. The processor sends text to Philter for redaction and receives back the redacted text. This option requires deploying an instance of Philter alongside your Cloudera DataFlow installation. Next, get the Philter NiFi processor from GitHub. Deploy the NAR file to CDF and make it accessible to Apache NiFi.

Configure the Philter processor by specifying the location of Philter and any other necessary connection configuration, as shown in the image below.

For a production environment, a cluster of Philter instances deployed behind a load balancer would provide improved performance and increased availability over a single instance.

"},{"location":"solutions/apache-nifi-and-philter/#option-2-using-philter-embedded-into-nifi","title":"Option 2 - Using Philter Embedded into NiFi","text":"

The second option does not require an instance of Philter. Please contact us to receive a NiFi processor with all of Philter's capabilities embedded in it. This processor performs the text redaction entirely within your NiFi data flow with no external communication required. This processor is significantly more performant than the processor in the first option. When you receive the processor NAR file from us, deploy it to NiFi.

Configure the processor as shown in the image below by specifying the name of the desired policy and filtering context:

"},{"location":"solutions/apache-nifi-and-philter/#creating-a-flow","title":"Creating a Flow","text":"

Both processors support the same transitions. The redacted transition contains the redacted version of the flow file's content. In the example flows shown below, the top flow uses the Philter processor utilizing Philter's API. The bottom flow uses the Philter embedded processor. As you can see, both flows are the same. The only differences are the middle processors and their individual configuration.

"},{"location":"solutions/consistent-anonymization-with-redis/","title":"Consistent Anonymization with Redis","text":"

The consistent anonymization feature in Philter ensures that filtered values are anonymized consistently across documents or contexts. When Philter is deployed in a cluster and is using consistent anonymization across contexts, a Redis cache is required. The cache stores the anonymized values so that all instances of Philter have access to the values.

The Redis cache will contain PHI, so secure it accordingly (for example, with encryption in transit and restricted network access) before enabling this feature.

"},{"location":"solutions/consistent-anonymization-with-redis/#enabling-consistent-anonymization","title":"Enabling Consistent Anonymization","text":"

To enable consistent anonymization in Philter, set the following properties in Philter's configuration:

consistent.anonymization=true\nconsistent.anonymization.scope=context\n
"},{"location":"solutions/consistent-anonymization-with-redis/#configuring-redis-cache","title":"Configuring Redis Cache","text":"

To enable Philter to use the Redis cache, set the following options in Philter's configuration:

anonymization.cache.service=redis\nanonymization.cache.service.host=127.0.0.1\nanonymization.cache.service.port=6379\nanonymization.cache.service.ssl=true\n

Replace 127.0.0.1 with the IP address or host name of your Redis cache.
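
Before restarting Philter, it can be worth confirming that the cache is reachable from the Philter host. The sketch below assumes redis-cli is installed and was built with TLS support (the --tls option); check_redis is a hypothetical helper name, and depending on your certificates you may also need options such as --cacert.

```shell
#!/bin/bash
# Pings the Redis cache over TLS; a healthy server answers PONG.
# Replace 127.0.0.1 and 6379 with the host and port from Philter's configuration.
check_redis() {
  if redis-cli -h 127.0.0.1 -p 6379 --tls ping | grep -q '^PONG'; then
    echo 'Redis cache is reachable'
  else
    echo 'Redis cache is NOT reachable'
    return 1
  fi
}
```

Calling check_redis on each Philter instance gives a quick indication of whether the host, port, and TLS settings above are usable from that instance.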

If you are using Redis on AWS ElastiCache see ElastiCache for Redis In-Transit Encryption (TLS) for information on using in-transit encryption.

"},{"location":"solutions/consistent-anonymization-with-redis/#restart-philter","title":"Restart Philter","text":"

After starting (or restarting) Philter, Philter will use the Redis cache for consistent anonymization across contexts. You can restart Philter with the command:

sudo systemctl restart philter.service\n
"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/","title":"Deploying Philter in a HIPAA Environment","text":"

This is not intended to be a comprehensive or legal HIPAA guide so please refer to your HIPAA compliance or security officer prior to deploying and using Philter in a PHI environment.

The steps below outline how to configure a Philter deployment for encryption of data at rest and in motion.

"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#encryption-of-data-at-rest","title":"Encryption of Data at Rest","text":""},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#amazon-web-services","title":"Amazon Web Services","text":"
  1. Stop the Philter EC2 instance.
  2. Make an AMI of the instance.
  3. Make an encrypted copy of the Philter AMI.

The created AMI is encrypted. EC2 instances launched from the AMI will utilize an encrypted EBS volume and all snapshots will be encrypted. Refer to the AWS documentation Creating an Amazon EBS-Backed Linux AMI for assistance.

"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#encryption-of-data-in-motion","title":"Encryption of Data in Motion","text":""},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#amazon-web-services_1","title":"Amazon Web Services","text":"

If launched from the Amazon Web Services, Google Cloud, or Microsoft Azure marketplace Philter's REST API will be pre-configured with a self-signed certificate. It is recommended you replace the self-signed certificate with a certificate from a trusted certificate authority.

  1. Log in to the Philter EC2 instance via SSH. (On AWS the username is ec2-user. On Azure the username is centos.)
  2. Stop the Philter service: sudo systemctl stop philter.service
  3. Edit Philter's settings to utilize an SSL certificate.
  4. Start the Philter service: sudo systemctl start philter.service
  5. Connect to Philter's API and verify that the request succeeds and returns HTTP 200 OK: curl https://philter:8080/api/status
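
The final verification step can be scripted so that it fails loudly when the certificate is not trusted or Philter is not responding. This is a minimal sketch: status_code and verify_status are hypothetical helper names, and the host name philter is taken from the step above.

```shell
#!/bin/bash
# Prints only the HTTP status code returned by Philter's status endpoint.
# There is deliberately no -k flag: the request should succeed only when
# the new certificate is trusted by this machine.
status_code() {
  curl -s -o /dev/null -w '%{http_code}' https://philter:8080/api/status
}

# Reports success only for HTTP 200.
verify_status() {
  case $(status_code) in
    200) echo 'Philter returned HTTP 200' ;;
    *) echo 'Status check failed'; return 1 ;;
  esac
}
```

When curl cannot connect or rejects the certificate, it reports the status code as 000, which verify_status treats as a failure.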
"},{"location":"solutions/deploying-philter-in-a-hipaa-environment/#related-links","title":"Related Links","text":""},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/","title":"Deploying Philter in AWS via a CloudFormation Template","text":"

AWS CloudFormation can be used to automate the creation and tear down of your AWS cloud resources in a repeatable manner. Philter can be included in your CloudFormation templates to also automate its deployment and configuration.

This article is designed to be a \"quick start\" into CloudFormation and Philter. This article describes a CloudFormation template suitable for deploying Philter for purposes of integration testing. A template for deploying Philter for production use requires a few more changes.

"},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/#finding-philters-ami","title":"Finding Philter's AMI","text":"

To begin, you must have the AMI ID (e.g. ami-123456789) of Philter.

To find the AMI ID, launch Philter from the AWS Marketplace. If you have not already subscribed, the AWS Marketplace will prompt you to subscribe to Philter. At the end of the subscription process you will be able to launch an instance into your AWS account. (You can select the smallest available instance size.) Do this and then navigate to your EC2 instances in the AWS Console.

In the EC2 Console, locate the newly launched Philter instance. It may still be in the \"Pending\" state if it has not finished launching. Click on the instance so that its details are displayed at the bottom of the EC2 Console. Locate the \"AMI\" property. This is the Philter AMI identifier. Make a note of it so you can reference it in your CloudFormation templates. You can now terminate the instance.

Note that when a new version of Philter is published to the AWS Marketplace it will have a different AMI identifier. If you want to use the newest version you will need to do the steps above again to find the new AMI identifier. See the Philter AWS AMIs for a sample script to automate finding the AMIs. If you have difficulties finding the Philter AMI identifier please contact us for assistance.
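
As a rough illustration of what automating the lookup could look like, the AWS CLI can list AWS Marketplace images and pick the most recently created one. This is a sketch under assumptions, not the sample script referenced above: latest_philter_ami is a hypothetical helper name, and the *philter* name filter is a guess that should be adjusted to the actual image name.

```shell
#!/bin/bash
# Prints the ID of the newest AWS Marketplace image whose name matches the
# filter, in the region configured for the AWS CLI.
# ASSUMPTION: the *philter* name pattern is a guess at how the image is named.
latest_philter_ami() {
  aws ec2 describe-images --owners aws-marketplace --filters 'Name=name,Values=*philter*' --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text
}
```

Because AMI identifiers differ per region, the lookup has to be repeated for each region you deploy to, for example by adding the --region option.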

"},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/#cloudformation-template","title":"CloudFormation Template","text":"

You can use the AMI ID to launch one or more instances of Philter via your CloudFormation template. You can launch a single instance, multiple instances, or you can launch one or more instances as part of an autoscaling group. You have flexibility depending on your requirements for deploying Philter. In the example below we are going to launch a single instance of Philter.

We are going to base our template off the AWS sample for a single EC2 instance in a VPC. The sample template can be found here. This template creates a new VPC along with the required subnet and route table.

Note that the only change we made to the sample template is replacing its AMI with the Philter AMI for your region. The Philter AMI will be different for each AWS region.

"},{"location":"solutions/deploying-philter-via-an-aws-cloudformation-template/#launch-the-stack","title":"Launch the Stack","text":"

Now that we have the template we can create a stack from it. A stack is the set of resources that the template defines. You can think of a stack as being an instance of the template. We will use the AWS Console to create the stack. In the AWS Console navigate to the CloudFormation console. Locate the button to create a stack, walk through the steps uploading your template when prompted, and finish. Your new stack with Philter will now be launched. You can watch the stack's progress as CloudFormation creates its resources. When you are finished with the stack you can delete it and all resources that were created for the stack, such as the Philter instance, will be deleted.

If you try to launch a CloudFormation stack that uses a Philter AMI but you do not have an active subscription to Philter via the AWS Marketplace the stack creation will fail. To remedy this, go to the AWS Marketplace and subscribe to Philter.
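
The console steps above can also be performed with the AWS CLI, which is convenient for repeatable integration-test environments. In this sketch the stack name philter-test and the template file philter-template.json are placeholders, and launch_stack and delete_stack are hypothetical helper names.

```shell
#!/bin/bash
# ASSUMPTIONS: philter-test and philter-template.json are placeholder names.

# Creates the stack, then blocks until CloudFormation reports it complete.
launch_stack() {
  aws cloudformation create-stack --stack-name philter-test --template-body file://philter-template.json || return 1
  aws cloudformation wait stack-create-complete --stack-name philter-test
}

# Deletes the stack and the resources it created, including the Philter instance.
delete_stack() {
  aws cloudformation delete-stack --stack-name philter-test
}
```

If the account has no active Philter subscription, the wait command is expected to return a non-zero status because the stack creation fails.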

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/","title":"Managing Philter\u2019s Configuration in an Auto-Scaling Environment","text":"

This article describes how Philter's configuration can be managed when Philter is deployed in an auto-scaling environment.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/#updating-philter-configuration-values","title":"Updating Philter Configuration Values","text":"

Philter reads its settings from the philter.properties file when Philter starts. This file must reside alongside Philter wherever Philter is deployed. When Philter is deployed in an auto-scaling environment, updating a configuration requires updating the configuration value on all instances of Philter. There are a few approaches that can be taken.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/#deployment-via-a-custom-machine-image","title":"Deployment via a Custom Machine Image","text":"

One way to update the configuration values is to use a custom (\"pre-baked\") machine image of Philter. When a configuration value needs to be changed, change it in the machine image and update the auto-scaling environment to use the latest machine image. Then begin replacing the currently running Philter instances with new instances launched from the updated machine image.

"},{"location":"solutions/managing-philters-configuration-in-an-auto-scaling-environment/#updating-configuration-using-an-external-file","title":"Updating Configuration using an External File","text":"

In this method, a copy of Philter's application.properties file is stored on a remote file system, such as Amazon S3. A cron job runs on each deployed Philter instance to periodically download the application.properties file, copy it to the appropriate location, and then restart the Philter service. This method allows you to modify the configuration of Philter on all of the instances with fewer moving parts than the previous option.

The following is an example bash script that uses the AWS CLI to download the properties file from Amazon S3 and restart Philter.

#!/bin/bash\naws s3 cp s3://your-bucket/application.properties /opt/philter/application.properties\nsudo systemctl restart philter.service\nsudo systemctl restart philter-ner.service\n
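
The script above restarts Philter on every run, even when the downloaded file is identical to the one already in place. One possible refinement, sketched below, is to compare the two files and restart only when the contents differ. The function names and the idea of downloading to a temporary path first are assumptions made for this example, not part of Philter.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def install_if_changed(downloaded, installed):
    """Copy `downloaded` over `installed` only when the contents differ.

    Returns True when the installed file was replaced, meaning the caller
    should restart the Philter services. Returns False when the download is
    missing or identical to the installed file.
    """
    new_digest = file_digest(downloaded)
    if new_digest is None or new_digest == file_digest(installed):
        return False
    Path(installed).write_bytes(Path(downloaded).read_bytes())
    return True
```

A cron job could download the file to a temporary location, call install_if_changed with that location and the installed path, and run the systemctl restart commands only when it returns True.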
"},{"location":"solutions/monitoring-philter-in-aws/","title":"Monitoring Philter in AWS","text":"

A deployment of Philter in AWS can be monitored by multiple methods. Here we'll discuss some of the options available when Philter is used in AWS.

"},{"location":"solutions/monitoring-philter-in-aws/#monitoring-philters-application-log-with-cloudwatch-logs","title":"Monitoring Philter's Application Log with CloudWatch Logs","text":"

Although no sensitive information is purposely logged to Philter's log files, it is possible for sensitive information to be inadvertently included through some events. For this reason, it is important to ensure that the location where you store Philter's logs is suitable for containing sensitive information such as PHI and PII.

Philter's application log is located at /var/log/philter/philter.log. When deploying multiple instances of Philter it is useful to have the log files centralized in a single location. We can do this using CloudWatch Logs.

The first thing to do is to ensure the Philter instance has an appropriate IAM role and policy. The policy must allow write access to CloudWatch Logs. The following policy is sufficient:

{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogGroup\",\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\",\n        \"logs:DescribeLogStreams\"\n    ],\n      \"Resource\": [\n        \"arn:aws:logs:*:*:*\"\n    ]\n  }\n ]\n}\n

Next, install the CloudWatch Logs Agent on the instance. Configure the agent to send Philter's log to CloudWatch Logs. Modify the CloudWatch Logs configuration file to include Philter's log file:

[/var/log/philter/philter.log]\nfile = /var/log/philter/philter.log\nlog_group_name = /var/log/philter/philter.log\nlog_stream_name = {instance_id}\ndatetime_format = %b %d %H:%M:%S\n

After restarting the agent, Philter's log file will be available in the CloudWatch Logs console.

"},{"location":"solutions/monitoring-philter-in-aws/#monitoring-philters-availability-with-an-elastic-load-balancer","title":"Monitoring Philter's Availability with an Elastic Load Balancer","text":"

Philter's REST API includes an endpoint that returns the status of Philter. When operating normally, the /api/status endpoint returns HTTP 200 OK. This endpoint is ideal for monitoring by a service such as an Elastic Load Balancer's health checks. The full endpoint URL will be similar to https://instance:8080/api/status.

Note that, by default, Philter uses a self-signed SSL certificate for its HTTPS interface. In some situations it may be necessary to replace this self-signed certificate with a certificate signed by a trusted authority.
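
Outside of a load balancer, the same check can be scripted. The following is a minimal sketch using only the Python standard library. It assumes only what is stated above, namely that /api/status returns HTTP 200 when Philter is operating normally. The function name and the verify_tls switch are hypothetical and exist only for this example.

```python
import ssl
import urllib.error
import urllib.request


def philter_is_healthy(base_url, timeout=5.0, verify_tls=True):
    """Return True when GET {base_url}/api/status answers with HTTP 200."""
    context = None
    if base_url.startswith("https://"):
        context = ssl.create_default_context()
        if not verify_tls:
            # Only for instances still using the default self-signed certificate.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    url = base_url.rstrip("/") + "/api/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and non-2xx responses all count as unhealthy.
        return False
```

For example, philter_is_healthy("https://instance:8080", verify_tls=False) would check an instance that still uses the self-signed certificate. Once a trusted certificate is installed, leave verify_tls at its default.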

"},{"location":"solutions/monitoring-philter-in-aws/#monitoring-philters-metrics-with-cloudwatch-metrics","title":"Monitoring Philter's Metrics with CloudWatch Metrics","text":"

Philter captures various metrics during its operation. The metrics are exposed via JMX and can also be reported to CloudWatch Metrics as custom metrics. To enable metric reporting to CloudWatch, set the appropriate configuration settings in Philter's properties. (Refer to Philter's user documentation for a description of the configuration properties.) Then restart Philter for the changes to take effect. Philter will now publish metrics to CloudWatch Metrics.

These metrics can be used to trigger alerts based on certain thresholds or be used to trigger auto-scaling if Philter is deployed in an auto-scaling group.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/","title":"Using AWS Kinesis Firehose Transformations to Filter Sensitive Information from Streaming Text","text":"

AWS Kinesis Firehose is a managed streaming service designed to take large amounts of data from one place to another. For example, you can take data from places such as CloudWatch, AWS IoT, and custom applications using the AWS SDK to places such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, and others. In this post we will use Amazon S3 as the firehose's destination.

Sometimes you want to manipulate the data as it goes through the firehose. This example solution shows how Philter can be used with AWS Kinesis Firehose and AWS Lambda to remove sensitive information, such as PII and PHI, from the text as it travels through the firehose.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#prerequisites","title":"Prerequisites","text":"

You must have a running instance of Philter. If you don't already have a running instance of Philter you can launch one through the AWS Marketplace. It is not required that the instance of Philter be running in AWS, but it is required that the instance of Philter be accessible from your AWS Lambda function. Running Philter and your AWS Lambda function in your own VPC allows you to communicate locally with Philter from the function. Otherwise, Philter will need to be available over the public internet or accessible over a VPN connection. See all Philter launch options.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#configuring-the-firehose-and-the-lambda-function","title":"Configuring the Firehose and the Lambda Function","text":"

There is no need to duplicate an excellent blog post on creating a Firehose Data Transformation with AWS Lambda to establish the Firehose and Lambda function resources in AWS. So, refer to that blog post and substitute the Python 3 code below.

To start, create an AWS Firehose and configure an AWS Lambda transformation. When creating the AWS Lambda function, select Python 3.7 and use the following code to submit text to Philter's API.

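# The requests library is not included in the Lambda runtime (botocore.vendored.requests\n# has been removed), so bundle it with the function's deployment package or a layer.\nimport base64\n\nimport requests\n\n\ndef handler(event, context):\n    output = []\n\n    for record in event['records']:\n        # Firehose delivers each record's data Base64-encoded.\n        payload = base64.b64decode(record['data'])\n        headers = {'Content-Type': 'text/plain'}\n        # verify=False skips TLS validation for Philter's self-signed certificate.\n        r = requests.post(\"https://PHILTER_IP:8080/api/filter\", verify=False, data=payload, headers=headers, timeout=20)\n        filtered = r.text\n        output.append({\n            'recordId': record['recordId'],\n            'result': 'Ok',\n            'data': base64.b64encode(filtered.encode('utf-8') + b'\\n').decode('utf-8')\n        })\n\n    # Firehose expects the transformed records under the 'records' key.\n    return {'records': output}\n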

The following Kinesis Firehose test event can be used to test the function:

{\n  \"invocationId\": \"invocationIdExample\",\n  \"deliveryStreamArn\": \"arn:aws:kinesis:EXAMPLE\",\n  \"region\": \"us-east-1\",\n  \"records\": [\n    {\n      \"recordId\": \"49546986683135544286507457936321625675700192471156785154\",\n      \"approximateArrivalTimestamp\": 1495072949453,\n      \"data\": \"SGUgbGl2ZWQgaW4gOTAyMTAgYW5kIGhpcyBTU04gd2FzIDEyMy00NS02Nzg5Lg==\"\n    },\n    {\n      \"recordId\": \"49546986683135544286507457936321625675700192471156785155\",\n      \"approximateArrivalTimestamp\": 1495072949453,\n      \"data\": \"SGUgbGl2ZWQgaW4gOTAyMTAgYW5kIGhpcyBTU04gd2FzIDEyMy00NS02Nzg5Lg==\"\n    }\n  ]\n}\n

This test event contains two records. The data of each record is the Base64 encoding of the text \"He lived in 90210 and his SSN was 123-45-6789.\" When the test is executed, the filtered text returned for the two records, once Base64-decoded, will be:

[\n  \"He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.\",\n  \"He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.\"\n]\n

When executing the test, the AWS Lambda function extracts the data from each record in the firehose and submits it to Philter for filtering. The function returns one output record per input record, each containing the original recordId, a result of Ok, and the Base64-encoded filtered text.

Note that in our Python function we are disabling certificate validation to accommodate Philter's self-signed certificate. In production, install a certificate signed by a trusted authority on Philter and leave certificate validation enabled on clients.

Now, when data is published to the Kinesis Firehose stream, it will be processed by the AWS Lambda function and Philter before exiting the firehose at its configured destination.
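
Because Firehose carries record data Base64-encoded, it can help to have small helpers for building test events and inspecting the function's output. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the event contains only the fields the Lambda function reads.

```python
import base64


def make_test_event(texts):
    """Build a Firehose-style test event whose records carry Base64 text."""
    return {
        "invocationId": "invocationIdExample",
        "region": "us-east-1",
        "records": [
            {
                # Any unique string works as a record ID in a local test.
                "recordId": str(index),
                "data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
            }
            for index, text in enumerate(texts)
        ],
    }


def decode_records(records):
    """Return the plain text carried by each record's Base64 `data` field."""
    return [base64.b64decode(r["data"]).decode("utf-8") for r in records]
```

Passing the output records of the Lambda function to decode_records shows the filtered text that will be written to the destination.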

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#processing-data","title":"Processing Data","text":"

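We can use the AWS CLI to publish data to our Kinesis Firehose stream called sensitive-text. The record is passed as a structure with a Data field. With version 2 of the AWS CLI, include the --cli-binary-format raw-in-base64-out option so that the raw text is accepted as-is:

aws firehose put-record --cli-binary-format raw-in-base64-out --delivery-stream-name sensitive-text --record '{\"Data\": \"He lived in 90210 and his SSN was 123-45-6789.\"}'\n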

Check the destination Amazon S3 bucket and you will have a single object with the following line:

He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.\n

You're now ready to pump data through the firehose.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#conclusion","title":"Conclusion","text":"

In this blog post we have created an AWS Firehose pipeline that uses an AWS Lambda function to remove sensitive information from the text in the streaming pipeline.

"},{"location":"solutions/using-aws-kinesis-firehose-transformations-to-filter-sensitive-information-from-streaming-text/#resources","title":"Resources","text":""},{"location":"solutions/using-philter-with-microsoft-power-automate-flow/","title":"Using Philter with Microsoft Power Automate (Flow)","text":"

Microsoft Power Automate (formerly Microsoft Flow) is an online application to automate tasks using an intuitive online editor. Using the tool you can create automations that are triggered by events, such as the receiving of an email or a new file being stored in OneDrive. In this example solution we will create a trivial automation that uses Philter to filter sensitive information from text.

We will use an HTTP step to make the call to Philter. An upstream action sets the value of Input, which we place in the body of the request. Because Input is plain text, we add an HTTP Content-Type header with the value text/plain. In our example, the value of Input will be \"George Washington was president and his SSN was 123-45-6789.\" Be sure to replace the IP address in the URI with the IP address or hostname of your Philter instance.

We are now ready to run our flow. We can do so by clicking the Run button. You can now switch to the Runs view to see the run.

Clicking on our run we can see the results of the HTTP step.

In the screen capture above, we can see a summary of the HTTP step run. We see the body of the message that was sent to Philter. At the bottom we can see the filtered text that was returned by Philter.

Integrating Philter with Microsoft Power Automate is straightforward thanks to Philter's API. Although this example was trivial, it shows the possibilities for using Philter with Microsoft Power Automate.
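
For readers who want to see the HTTP step outside of Power Automate, the sketch below builds the equivalent request with the Python standard library: a POST of plain text with a Content-Type header of text/plain. The /api/filter path is the one used elsewhere in this documentation. The function name and the base URL argument are assumptions made for this example.

```python
import urllib.request


def build_filter_request(base_url, text):
    """Build (but do not send) the POST request the HTTP step makes to Philter."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/filter",
        data=text.encode("utf-8"),  # The body is the raw text to filter.
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
```

Passing the request to urllib.request.urlopen sends it, and the response body is the filtered text returned by Philter.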

"}]} \ No newline at end of file