Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capt 1527/keep irp national insurance and name for 2 years #2961

Merged

Conversation

rjlynch
Copy link
Contributor

@rjlynch rjlynch commented Jul 2, 2024

Background

International relocation payments have different rules around scrubbing PII from submitted claims

We need to amend the purge rules for IRP only so that applicant name and NiNo are retained for 2 years.

Changes

There's a few commits in this pr, here's an over view of the main changes:

Changes how we identify claims from the same claimant.

Previously we used TRN to identify claims from the same claimant, however new policies, such as IRP, don't capture a TRN. We've now switched to using national insurance number to identify multiple claims belonging to the same claimant. This is a bit unrelated to the main work in this pr but was required to fix a test and we'll run into the issue eventually. See this commit for where these changes are implemented. I think this commit deserves the most scrutiny as I'm not 100% sure of the knock on effects of using NINO to identify claimants

Adds policy specific data scrubbers

IRP has a different retention policy for some PII fields. We've replaced the Claim::PersonalDataScrubber with policy specific Policies::<PolicyName>::ClaimPersonalDataScrubbers to allow overwriting how PII is scrubbed on a per policy level. See this commit

Refactor ClaimPersonalDataScrubber

We've moved some methods around and separated out finding claims to be scrubbed from scrubbing the claim, extracting a Claim::Scrubber class. This makes it easier for policy specific claim scrubbers to override the behaviours in the base class. See these commits 1, 2, 3

Retain NINO and claimant name for IRP claims

Scrub national insurance number and claimant name after 2 years. See this commit

Scrub PII from journey session

We may want to consider doing this when we submit the claim but this commit updates the claim scrubber to also remove all answers from the journey session

@rjlynch rjlynch force-pushed the CAPT-1527/keep-irp-national-insurance-and-name-for-2-years branch from 02ff6f4 to 973bfb6 Compare July 2, 2024 13:44
Copy link
Contributor

@slorek slorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach of defining the behaviours in policy specific classes is the right one. I would lean to moving these to the Policies namespace as well so that all of the policy specific code is housed together rather than dotted around the codebase and having to know where to look. Also agree with implementing the NOOP classes before merging. Thanks.

@rjlynch rjlynch force-pushed the CAPT-1527/keep-irp-national-insurance-and-name-for-2-years branch 8 times, most recently from 268dca6 to 70cae3f Compare July 8, 2024 09:24
@rjlynch rjlynch requested review from slorek and kenfodder July 8, 2024 09:40
@rjlynch rjlynch marked this pull request as ready for review July 8, 2024 09:40
Comment on lines 25 to 39
def scrub_amendment!(amendment)
amendment_data_to_scrub = attributes_to_delete & amendment.claim_changes.keys.map(&:to_s)
personal_data_mask = amendment_data_to_scrub.to_h { |attr| [attr, nil] }
amendment.claim_changes.merge!(personal_data_mask)
amendment.personal_data_removed_at = Time.zone.now
amendment.save!
end

def scrub_claim!
personal_data_mask = attributes_to_delete.to_h { |attr| [attr, nil] }
attributes_to_set = personal_data_mask.merge(
personal_data_removed_at: Time.zone.now
)
claim.update_columns(attributes_to_set)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we call save! on the amendment but update_columns on the claim.
so save! will fire validation and callbacks but update_columns will not
is there a reason for the difference? or should they both be calling the same thing?

Copy link
Contributor Author

@rjlynch rjlynch Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous version of this used update_all for persisting the changes to the claim and save! for persisting the changes to the amendment so I wanted to preserve that behaviour. Not sure if this is actually required but I assumed it was previously done like this for a reason!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using save! here will mean the callbacks on the claim fail, as there are a bunch of normalise_* callbacks that assume those fields are populated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to use save because the dfe_analytics gem does not hook into update_all or update_columns. This causes data inconsistencies in the BigQuery database (although PII is not sent there in the first place).

Copy link
Contributor Author

@rjlynch rjlynch Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use save due to the callbacks Alkesh highlighted. Does the existing data scrubbing implementation create data inconsistency issues in big query? We could try using save in a suppress callbacks block. Looks like DfE analytics uses callbacks to trigger it's events

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we update the 3 affected callbacks (for ni_number, first_name and surname so they don't get called if that field is nil?
e.g.
before_save :normalise_ni_number, if: [:national_insurance_number, :national_insurance_number_changed?]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we update the 3 affected callbacks (for ni_number, first_name and surname so they don't get called if that field is nil? e.g. before_save :normalise_ni_number, if: [:national_insurance_number, :national_insurance_number_changed?]

That seems to do the trick 👍

Comment on lines 25 to 39
def scrub_amendment!(amendment)
amendment_data_to_scrub = attributes_to_delete & amendment.claim_changes.keys.map(&:to_s)
personal_data_mask = amendment_data_to_scrub.to_h { |attr| [attr, nil] }
amendment.claim_changes.merge!(personal_data_mask)
amendment.personal_data_removed_at = Time.zone.now
amendment.save!
end

def scrub_claim!
personal_data_mask = attributes_to_delete.to_h { |attr| [attr, nil] }
attributes_to_set = personal_data_mask.merge(
personal_data_removed_at: Time.zone.now
)
claim.update_columns(attributes_to_set)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to use save because the dfe_analytics gem does not hook into update_all or update_columns. This causes data inconsistencies in the BigQuery database (although PII is not sent there in the first place).

scrub_claims(old_rejected_claims)
scrub_claims(old_paid_claims)
old_rejected_claims
.where(personal_data_removed_at: nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: a named scope for this condition would be a small improvement.

with_email_details
with_mobile_details
with_bank_details
with_employment_details
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this trait with_employment_details seems to be missing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a duplicate - it already exists in db/migrate/20240709110147_add_award_amount_to_international_relocation_payments_eligibilities.rb

@rjlynch rjlynch force-pushed the CAPT-1527/keep-irp-national-insurance-and-name-for-2-years branch from 752b3c8 to 7cb3a78 Compare July 24, 2024 11:03
@rjlynch rjlynch force-pushed the CAPT-1527/keep-irp-national-insurance-and-name-for-2-years branch 2 times, most recently from 1c00120 to fa47478 Compare July 30, 2024 10:26
rjlynch added 7 commits August 1, 2024 11:21
When validating that all claims for a given payment belong to the same
claimant we can no longer use TRN as some journeys don't capture a trn.
Eg if we have two claims one for ECP and one for IRP then the ECP claim
will have a TRN but the IRP claim will not and so this check will fail.
After discussion with the policy team we've decided to use the NINO as
part of this check. There is some concern that applicants may mistype
their NINO, if this proves an issue in practice we can just drop the
NINO from the list of compared attributes and rely on the matching bank
details.

We've also removed the check on top ups that a candidate matching
claim's policy is in the `OTHER_CLAIMABLE_POLICIES` list of the claim
we're checking against. This check was in place to support some polices
having trns and some not having trns and as we're now matching on NINO
it is no longer needed.
We'll be wanting to have a sub class of the personal_data_scrubber for
each policy. We're updating the tests to set the policy to make pulling
out shared examples easier.
Pulls the examples in this class into shared examples. We'll soon be
introducing subclasses for this class so we'll want some shared examples
Different policies will have different rules around what data we need to
retain. This commit adds policy specific subclasses for handling
removing personal data. Subclasses can implement policy specific
behaviour by redefining the PERSONAL_DATA_ATTRIBUTES_TO_DELETE constant
or overwriting the `scrub_completed_claims` method.
Moves the `Claims::PersonalDataScrubber` into the `Policies` namespace
as this class is only subclassed by specific policies.
We'll want to use different date ranges on these querys so we're
seperating the claim scoping from the date filtering.
Extracts the scrubbing logic from the claim_personal_data_scrubber into
it's own class. Claim personal data scrubber was responsible for both
finding the claims to scrub and scrubbing them. This commit moves the
responsibility for scrubbing the data to a separate class as we're going
to want to scrub different attributes depending on the policy we're
scrubbing claims for.
The claim data scrubber preformed and `update_all` when nullifying the
attributes on the claim, we've preserved this behaviour of skipping
callbacks / validations by using `update_column`.
Previously we scrubbed all the claims inside a single transaction, this
commit changes the behaviour slightly in that we open a transaction for
each claim we scrub. This should slightly improve throughput (and makes
sense based on how the code is called) but slightly changes the
behaviour in that if there is an error when removing personal data from
a claim previous claim's changes won't be rolled back. Likely not an
issue but worth calling out.
rjlynch added 7 commits August 1, 2024 11:21
Moves the filtering of claims by `personal_data_removed_at` up a level.
This will allow subclasses to use the querys to find rejected and paid
claims again if they need to make a second pass through the data.
Updates the IRP claim scrubber to retain the national insurance number
and name details for 2 years before scrubbing them too.

When this method is called by the job we first call super to remove the
not retained PII fields from the claim then we deal with scrubbing any
retained PII claims.
Once we're at the point of removing PII from the claim we can safely
drop any attributes stored in the journey session. We may want to
consider purging the session content when the claim form is submitted
instead.
We want to make sure each policy is covered by this spec to ensure a
claim personal data scrubber is created for any new policies we add.
Looks like this was left over from when we split up the head teacher
question
PR feedback suggestion from Steve
This lets us call save when scrubbing attributes, rather than using
`update_columns` which bypasses dfe analytics
@rjlynch rjlynch force-pushed the CAPT-1527/keep-irp-national-insurance-and-name-for-2-years branch from fa47478 to f96cb79 Compare August 1, 2024 10:21
@rjlynch rjlynch merged commit 2ffc884 into master Aug 1, 2024
14 checks passed
@rjlynch rjlynch deleted the CAPT-1527/keep-irp-national-insurance-and-name-for-2-years branch August 1, 2024 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants