This ticket is to document a problem Cornell had in destroying a stream:
On 5/11/2022, Cornell created a new stream called "2022-May" and uploaded a full set of records (about 6 million) to it. Frances Webb at Cornell realized there was a problem with the records, so they created a second stream, uploaded a corrected full set of records to it, and asked us to destroy the first stream. We agreed, since the stream hadn't been processed yet and so no one could have harvested the data.
When we tried to destroy the stream, the job timed out.
On 5/12/2022, Frances tried again to destroy the stream. This time, the job appears to have been successful: the "2022-May" stream no longer appears on Cornell's provider page.
Questions:
What caused the job to time out on 5/11/2022? Was it the sheer quantity of records in the stream, or was it the fact that they were brand-new, and thus presumably queued for processing? (A batched-deletion sketch touching on the timeout follows these questions.)
What caused the job to succeed on 5/12/2022? Is the fact that the successful attempt was made by one of the organization's owners, instead of an admin unaffiliated with the org, relevant?
What historical data should POD retain about a destroyed stream? Should we be retaining, and displaying in the UI, some information about streams that no longer exist?
Should POD provide more feedback to the requester when a "destroy" job fails to complete?
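This ticket doesn't show POD's actual stack, so purely as an illustration of the timeout and feedback questions: one common way to keep a destroy of ~6 million records from timing out is to delete in bounded batches from a background job, committing and reporting progress (or failure) back to the requester after each batch. A minimal sketch, assuming a SQL store; the `records`/`streams` tables and the `destroy_stream_in_batches` helper are hypothetical names, not POD's:

```python
import sqlite3

BATCH_SIZE = 10_000

def destroy_stream_in_batches(db: sqlite3.Connection, stream_id: int,
                              report=print) -> None:
    """Delete a stream's records in fixed-size batches, committing and
    reporting progress after each batch, so no single huge DELETE holds
    one transaction (or one web request) open long enough to time out."""
    total = 0
    while True:
        cur = db.execute(
            "DELETE FROM records WHERE id IN "
            "(SELECT id FROM records WHERE stream_id = ? LIMIT ?)",
            (stream_id, BATCH_SIZE),
        )
        db.commit()
        total += cur.rowcount
        # The report callback is where feedback to the requester would
        # hook in (UI status, email, job log, etc.).
        report(f"stream {stream_id}: {total:,} records destroyed so far")
        if cur.rowcount < BATCH_SIZE:
            break  # last partial batch; nothing left to delete
    db.execute("DELETE FROM streams WHERE id = ?", (stream_id,))
    db.commit()
    report(f"stream {stream_id}: destroy complete")
```

Because each batch commits independently, a failed or interrupted destroy leaves the stream partially emptied rather than silently rolled back, and the final "destroy complete" report only fires when the job actually finished.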
Comment from Frances:
"I’m still suspecting that it was the ongoing analysis jobs. It might be useful if the destroy function could identify current and pending jobs associated with the stream and cancel them. As long as it doesn’t cancel jobs associated with other streams. I noticed that when I open the “Processing status” tab under a stream, the listed processes were neither filtered to that stream nor identified the streams they were associated with. (Very noticeable after I made two full uploads at nearly the same time.) So there might need to be more work needed to associate jobs with streams and not just organizations."