-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capt 1527/keep irp national insurance and name for 2 years #2961
Capt 1527/keep irp national insurance and name for 2 years #2961
Conversation
02ff6f4
to
973bfb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the approach of defining the behaviours in policy specific classes is the right one. I would lean to moving these to the Policies
namespace as well so that all of the policy specific code is housed together rather than dotted around the codebase and having to know where to look. Also agree with implementing the NOOP classes before merging. Thanks.
268dca6
to
70cae3f
Compare
app/models/claim/scrubber.rb
Outdated
def scrub_amendment!(amendment) | ||
amendment_data_to_scrub = attributes_to_delete & amendment.claim_changes.keys.map(&:to_s) | ||
personal_data_mask = amendment_data_to_scrub.to_h { |attr| [attr, nil] } | ||
amendment.claim_changes.merge!(personal_data_mask) | ||
amendment.personal_data_removed_at = Time.zone.now | ||
amendment.save! | ||
end | ||
|
||
def scrub_claim! | ||
personal_data_mask = attributes_to_delete.to_h { |attr| [attr, nil] } | ||
attributes_to_set = personal_data_mask.merge( | ||
personal_data_removed_at: Time.zone.now | ||
) | ||
claim.update_columns(attributes_to_set) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we call save!
on the amendment but update_columns
on the claim.
so save!
will fire validation and callbacks but update_columns
will not
is there a reason for the difference? or should they both be calling the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous version of this used update_all
for persisting the changes to the claim and save!
for persisting the changes to the amendment so I wanted to preserve that behaviour. Not sure if this is actually required but I assumed it was previously done like this for a reason!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using save!
here will mean the callbacks on the claim fail, as there are a bunch of normalise_*
callbacks that assume those fields are populated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be preferable to use save
because the dfe_analytics gem does not hook into update_all
or update_columns
. This causes data inconsistencies in the BigQuery database (although PII is not sent there in the first place).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use save
due to the callbacks Alkesh highlighted. Does the existing data scrubbing implementation create data inconsistency issues in big query? We could try using Looks like DfE analytics uses callbacks to trigger it's eventssave
in a suppress callbacks block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we update the 3 affected callbacks (for ni_number
, first_name
and surname
so they don't get called if that field is nil?
e.g.
before_save :normalise_ni_number, if: [:national_insurance_number, :national_insurance_number_changed?]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we update the 3 affected callbacks (for
ni_number
,first_name
andsurname
so they don't get called if that field is nil? e.g.before_save :normalise_ni_number, if: [:national_insurance_number, :national_insurance_number_changed?]
That seems to do the trick 👍
app/models/claim/scrubber.rb
Outdated
def scrub_amendment!(amendment) | ||
amendment_data_to_scrub = attributes_to_delete & amendment.claim_changes.keys.map(&:to_s) | ||
personal_data_mask = amendment_data_to_scrub.to_h { |attr| [attr, nil] } | ||
amendment.claim_changes.merge!(personal_data_mask) | ||
amendment.personal_data_removed_at = Time.zone.now | ||
amendment.save! | ||
end | ||
|
||
def scrub_claim! | ||
personal_data_mask = attributes_to_delete.to_h { |attr| [attr, nil] } | ||
attributes_to_set = personal_data_mask.merge( | ||
personal_data_removed_at: Time.zone.now | ||
) | ||
claim.update_columns(attributes_to_set) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be preferable to use save
because the dfe_analytics gem does not hook into update_all
or update_columns
. This causes data inconsistencies in the BigQuery database (although PII is not sent there in the first place).
scrub_claims(old_rejected_claims) | ||
scrub_claims(old_paid_claims) | ||
old_rejected_claims | ||
.where(personal_data_removed_at: nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: a named scope for this condition would be a small improvement.
with_email_details | ||
with_mobile_details | ||
with_bank_details | ||
with_employment_details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this trait with_employment_details
seems to be missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a duplicate - it already exists in db/migrate/20240709110147_add_award_amount_to_international_relocation_payments_eligibilities.rb
752b3c8
to
7cb3a78
Compare
1c00120
to
fa47478
Compare
When validating that all claims for a given payment belong to the same claimant we can no longer use TRN as some journeys don't capture a trn. Eg if we have two claims one for ECP and one for IRP then the ECP claim will have a TRN but the IRP claim will not and so this check will fail. After discussion with the policy team we've decided to use the NINO as part of this check. There is some concern that applicants may mistype their NINO, if this proves an issue in practice we can just drop the NINO from the list of compared attributes and rely on the matching bank details. We've also removed the check on top ups that a candidate matching claim's policy is in the `OTHER_CLAIMABLE_POLICIES` list of the claim we're checking against. This check was in place to support some polices having trns and some not having trns and as we're now matching on NINO it is no longer needed.
We'll be wanting to have a sub class of the personal_data_scrubber for each policy. We're updating the tests to set the policy to make pulling out shared examples easier.
Pulls the examples in this class into shared examples. We'll soon be introducing subclasses for this class so we'll want some shared examples
Different policies will have different rules around what data we need to retain. This commit adds policy specific subclasses for handling removing personal data. Subclasses can implement policy specific behaviour by redefining the PERSONAL_DATA_ATTRIBUTES_TO_DELETE constant or overwriting the `scrub_completed_claims` method.
Moves the `Claims::PersonalDataScrubber` into the `Policies` namespace as this class is only subclassed by specific policies.
We'll want to use different date ranges on these querys so we're seperating the claim scoping from the date filtering.
Extracts the scrubbing logic from the claim_personal_data_scrubber into it's own class. Claim personal data scrubber was responsible for both finding the claims to scrub and scrubbing them. This commit moves the responsibility for scrubbing the data to a separate class as we're going to want to scrub different attributes depending on the policy we're scrubbing claims for. The claim data scrubber preformed and `update_all` when nullifying the attributes on the claim, we've preserved this behaviour of skipping callbacks / validations by using `update_column`. Previously we scrubbed all the claims inside a single transaction, this commit changes the behaviour slightly in that we open a transaction for each claim we scrub. This should slightly improve throughput (and makes sense based on how the code is called) but slightly changes the behaviour in that if there is an error when removing personal data from a claim previous claim's changes won't be rolled back. Likely not an issue but worth calling out.
Moves the filtering of claims by `personal_data_removed_at` up a level. This will allow subclasses to use the querys to find rejected and paid claims again if they need to make a second pass through the data.
Updates the IRP claim scrubber to retain the national insurance number and name details for 2 years before scrubbing them too. When this method is called by the job we first call super to remove the not retained PII fields from the claim then we deal with scrubbing any retained PII claims.
Once we're at the point of removing PII from the claim we can safely drop any attributes stored in the journey session. We may want to consider purging the session content when the claim form is submitted instead.
We want to make sure each policy is covered by this spec to ensure a claim personal data scrubber is created for any new policies we add.
Looks like this was left over from when we split up the head teacher question
PR feedback suggestion from Steve
This lets us call save when scrubbing attributes, rather than using `update_columns` which bypasses dfe analytics
fa47478
to
f96cb79
Compare
Background
International relocation payments have different rules around scrubbing PII from submitted claims
Changes
There's a few commits in this pr, here's an over view of the main changes:
Changes how we identify claims from the same claimant.
Previously we used TRN to identify claims from the same claimant, however new policies, such as IRP, don't capture a TRN. We've now switched to using national insurance number to identify multiple claims belonging to the same claimant. This is a bit unrelated to the main work in this pr but was required to fix a test and we'll run into the issue eventually. See this commit for where these changes are implemented. I think this commit deserves the most scrutiny as I'm not 100% sure of the knock on effects of using NINO to identify claimants
Adds policy specific data scrubbers
IRP has a different retention policy for some PII fields. We've replaced the
Claim::PersonalDataScrubber
with policy specificPolicies::<PolicyName>::ClaimPersonalDataScrubber
s to allow overwriting how PII is scrubbed on a per policy level. See this commitRefactor
ClaimPersonalDataScrubber
We've moved some methods around and separated out finding claims to be scrubbed from scrubbing the claim, extracting a
Claim::Scrubber
class. This makes it easier for policy specific claim scrubbers to override the behaviours in the base class. See these commits 1, 2, 3Retain NINO and claimant name for IRP claims
Scrub national insurance number and claimant name after 2 years. See this commit
Scrub PII from journey session
We may want to consider doing this when we submit the claim but this commit updates the claim scrubber to also remove all answers from the journey session