Add temporary soft delete bulk rake task #1543
Merged
Context
Multiple Zendesk tickets about `whitehall` assets being live when they should not be triggered an analysis of the problematic data - see PR. We have now investigated asset deletion states further, and we've produced a set of assets that should be deleted, according to `whitehall` - see extraction rake task.

This PR is just for the data patch. Future work will deal with fixing the underlying issues in the code.
The rake task
This PR defines a rake task that soft deletes each asset ID passed through in a `csv` file. Since `Attachments` and `AttachmentDatas` are already dealt with on the `whitehall` side, we can just soft delete the corresponding assets in `asset-manager`.

We discovered a set of assets that have an invalid deleted `state` field, most likely from older code logic. These are giving a "State is invalid" validation error. We're resetting them to "uploaded" state in order to pass validation.

NB: There are around 600 assets in this invalid state, but they were not flagged up for deletion on our side. Work to fix these is captured in this card here.
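To make the flow above concrete, here is a minimal, self-contained sketch of the behaviour described: read asset IDs from CSV, reset any invalid state to "uploaded", then soft delete. It is not the code in this PR. `SketchAsset`, `VALID_STATES`, `soft_delete_from_csv` and the one-ID-per-row CSV layout are all assumptions made for illustration, and the in-memory struct stands in for the real `asset-manager` model.

```ruby
require "csv"

# Assumed list of valid states, for illustration only.
VALID_STATES = %w[unscanned clean infected uploaded].freeze

# Hypothetical in-memory stand-in for an asset record (not the real model).
SketchAsset = Struct.new(:id, :state, :deleted_at, keyword_init: true) do
  def valid?
    VALID_STATES.include?(state)
  end

  # Mimics a soft delete that runs validation before saving.
  def soft_delete!(now)
    raise ArgumentError, "State is invalid" unless valid?

    self.deleted_at = now
  end
end

# Soft deletes every asset whose ID appears in the CSV text.
# Assumes one ID per row in the first column and no header row.
def soft_delete_from_csv(csv_text, assets_by_id, now: Time.now)
  summary = { deleted: [], reset: [], missing: [], skipped: [] }

  CSV.parse(csv_text).each do |row|
    id = row.first.to_s.strip
    next if id.empty?

    asset = assets_by_id[id]
    if asset.nil?
      summary[:missing] << id
      next
    end

    if asset.deleted_at
      # Already soft deleted; leave it alone.
      summary[:skipped] << id
      next
    end

    unless asset.valid?
      # Reset the invalid state so the record passes validation.
      asset.state = "uploaded"
      summary[:reset] << id
    end

    asset.soft_delete!(now)
    summary[:deleted] << id
  end

  summary
end
```

Returning a summary rather than printing per row makes it easy to check, after a run, how many IDs were missing or needed a state reset.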
The data extraction
We now know that there are a few broken flows in `whitehall` that are continuously producing incorrect asset states, specifically unintentionally live assets. The two main ones we've identified are:

We've extracted all the `AttachmentData`s where the `deleted?` method is true, along with the relevant database fields, to help guide our understanding of the data. The fact that the `deleted?` method is true is a consequence of a previous discard event and has no bearing on the bugs. If the reported asset has a missing link or a replacement that is in draft, it is likely to have been the result of one of the known bugs.

We had a choice to "fix" the data in more meaningful ways, but since `whitehall` already treats these as deleted, i.e. the `deleted?` method is true, we chose to mass delete these assets to keep the data "in sync". We've ensured that the data is not serving any live editions and that it is not a risk to delete it.

Splitting the data extraction based on the `deleted?` method evaluation was a way to reason about deletion state in `whitehall`, since there is no obvious way in which that state is represented in the database. Whilst a soft delete is applied to the `Attachment` model, it is the `AttachmentData` model that actually manages the `Assets`. It deduces the deletion state based on the `Attachment` state and the `Edition` visibility. This logic is captured in this `deleted?` method, which is evaluated whenever we try to update `asset-manager`. It must be true for a `delete` value to be sent, and false for the other updates (`replacement`, `draft`, `redirect_url`) to be sent through.

NB: The current data extracted to be fixed is based on an older integration dump (end of October 2024), meaning there may be wrong Assets that got created between then and the time an upcoming code fix goes live. Nonetheless, this data patch should eliminate a great deal of risk from past data.
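The gating role of `deleted?` described in this section can be sketched as below. This is an illustration only, not the `whitehall` implementation: `update_allowed?` and `NON_DELETE_UPDATES` are hypothetical names, and the sketch does not model how `deleted?` itself is derived from the `Attachment` state and `Edition` visibility.

```ruby
# Update types other than delete, as listed in the description above.
NON_DELETE_UPDATES = %i[replacement draft redirect_url].freeze

# Hypothetical helper: would this update be sent to asset-manager,
# given the current result of deleted? on the AttachmentData?
def update_allowed?(update, deleted:)
  case update
  when :delete
    deleted # a delete is only sent when deleted? is true
  when *NON_DELETE_UPDATES
    !deleted # every other update is only sent when deleted? is false
  else
    raise ArgumentError, "unknown update: #{update}"
  end
end
```

Under this reading, a record whose `deleted?` result is wrong gets neither the delete it needs nor a correct non-delete update, which is one way an asset could end up unintentionally live.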
Next steps
The other half of the data, where `AttachmentData`'s `deleted?` evaluates to false, will be dealt with next. We will attempt a more comprehensive data "fix" in that case, rather than deletion. We'll attempt to fix the logic that causes the ongoing issues in an upcoming card.

Fixing this data removes a big part of the risk, as does fixing the code. Nonetheless, there are batches of assets we've looked at whose states we cannot account for. We've started ideating around some remodelling work of the attachments/assets workflow, to make it simpler, easier to understand and maintain, and bug-free.
Links
Trello
Original Platform card
Assets investigation document
Co-authored-by @minhngocd