Problem destroying stream #802

Open

bobpersing opened this issue May 12, 2022 · 1 comment

This ticket is to document a problem Cornell had in destroying a stream:

On 5/11/2022, Cornell created a new stream called "2022-May" and uploaded a full set of records (about 6 million) to it. Frances Webb at Cornell realized there was a problem with the records, so they created a second stream, uploaded a corrected full set of records to it, and asked us to destroy the first stream. We agreed, since the stream hadn't been processed yet and thus no one could have harvested the data.

When we tried to destroy the stream, the job timed out.

On 5/12/2022, Frances tried again to destroy the stream. This time, the job appears to have been successful: the "2022-May" stream no longer appears on Cornell's provider page.

Questions:

  • What caused the job to time out on 5/11/2022? Was it the sheer quantity of records in the stream, or was it the fact that they were brand-new, and thus presumably queued for processing?
  • What caused the job to succeed on 5/12/2022? Is the fact that the successful attempt was made by one of the organization's owners, instead of an admin unaffiliated with the org, relevant?
  • What historical data should POD retain about a destroyed stream? Should we be retaining, and displaying in the UI, some information about streams that no longer exist?
  • Should POD provide more feedback to the requester when a "destroy" job fails to complete?
bobpersing (Author) commented

Comment from Frances:
"I’m still suspecting that it was the ongoing analysis jobs. It might be useful if the destroy function could identify current and pending jobs associated with the stream and cancel them. As long as it doesn’t cancel jobs associated with other streams. I noticed that when I open the “Processing status” tab under a stream, the listed processes were neither filtered to that stream nor identified the streams they were associated with. (Very noticeable after I made two full uploads at nearly the same time.) So there might need to be more work needed to associate jobs with streams and not just organizations."
