level_3 cal collisions causing missing intermediate files #8729

Open
stscijgbot-jp opened this issue Aug 26, 2024 · 6 comments

@stscijgbot-jp
Collaborator

stscijgbot-jp commented Aug 26, 2024

Issue JP-3717 was created on JIRA by Hien Tran:

Ops has seen evidence that concurrent level 3 pipeline processes for associations with common input members can step on each other, leaving intermediate files (e.g., *outlier_id2.fits) missing and causing a crash.

A recent example is jw01568-c1000_20240819t100727_image3_00001 and jw01568-c1004_20240819t100727_image3_00001. The c1000 asn consists of observations o001 and o002, while the c1004 asn contains o001, o002, and o003, so ALL of the members in c1000 are also in c1004. When the c1000 process produced its intermediate files and then cleaned them up, it also removed the identically named intermediate files written by the c1004 process, so those files were unavailable when the second (c1004) process needed them.

The ALOG.out logs for the two processes are attached, along with an sdiff between the listing of the *outlier_id2.fits files generated according to the alog for the failed c1004 process and the listing of those available on disk. Note that all the missing files are for o001 and o002, exactly those that were wiped out by the c1000 process.
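
The failure mode described above can be reproduced in miniature. The following is only a sketch, not the pipeline's actual code: `write_intermediates` and `cleanup` are hypothetical stand-ins for a level 3 run that writes one intermediate file per member and later deletes them, the member names are made up, and the naming rule (exposure name plus the suffix quoted in this report) is an assumption.

```python
import tempfile
from pathlib import Path


def intermediate_name(member: str) -> str:
    # Assumption: the name depends only on the exposure, not the association.
    return f"{member}_outlier_id2.fits"


def write_intermediates(outdir: Path, members: list[str]) -> list[Path]:
    """Hypothetical stand-in for a run writing its intermediate files."""
    paths = [outdir / intermediate_name(m) for m in members]
    for p in paths:
        p.write_bytes(b"")  # placeholder content
    return paths


def cleanup(paths: list[Path]) -> None:
    """Hypothetical stand-in for the end-of-run cleanup."""
    for p in paths:
        p.unlink(missing_ok=True)


def simulate_collision() -> list[str]:
    """Return the c1004 intermediates that vanish when c1000 cleans up."""
    c1000 = ["o001_exp1", "o002_exp1"]               # made-up member names
    c1004 = ["o001_exp1", "o002_exp1", "o003_exp1"]
    with tempfile.TemporaryDirectory() as tmp:
        outdir = Path(tmp)                           # shared output directory
        first = write_intermediates(outdir, c1000)
        second = write_intermediates(outdir, c1004)
        cleanup(first)                               # c1000 finishes first
        return sorted(p.name for p in second if not p.exists())
```

In this sketch only the o001 and o002 files go missing for the second run, which matches the pattern seen in the attached sdiff.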

@stscijgbot-jp
Collaborator Author

Comment by Tyler Pauly on JIRA:

One solution to the issue could be to alter the intermediate filenames to include an association or product name string, such that an exposure residing in multiple level 3 associations would have unique intermediate filenames if multiple associations are being processed simultaneously.
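
A minimal sketch of that naming idea, assuming the association ID (for example `c1000`) is available when the intermediate file is written. The helper `intermediate_path`, its parameters, and the position of the ID within the filename are all hypothetical and not an existing jwst API; the suffix is the one quoted in this report.

```python
from pathlib import Path


def intermediate_path(outdir, exposure, asn_id, suffix="outlier_id2"):
    """Build an intermediate filename that embeds the association ID.

    Hypothetical helper: the real pipeline may place the ID elsewhere
    in the name or use the product name instead.
    """
    return Path(outdir) / f"{exposure}_{asn_id}_{suffix}.fits"


# The same exposure processed under two associations gets two distinct files.
a = intermediate_path("out", "exposure1", "c1000")
b = intermediate_path("out", "exposure1", "c1004")
```

With names like these, cleanup by one process could only ever match files that carry its own association ID.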

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

What version of jwst was used for these runs?

@stscijgbot-jp
Collaborator Author

Comment by Melanie Clarke on JIRA:

Another possible solution, discussed elsewhere, might be to save the necessary intermediate data to temp files instead of to named files in the output directory.
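
A rough sketch of the temp-file approach, using Python's standard `tempfile` module. The function `process_with_private_intermediates` is hypothetical and only illustrates the isolation property; it does not reflect how the pipeline steps actually pass data between each other.

```python
import tempfile
from pathlib import Path


def process_with_private_intermediates(members):
    """Write intermediates to a per-process temp dir; return how many
    were still present when the (hypothetical) later step read them."""
    with tempfile.TemporaryDirectory(prefix="level3_") as tmp:
        tmpdir = Path(tmp)  # unique per call, so not shared between processes
        paths = []
        for member in members:
            path = tmpdir / f"{member}_outlier_id2.fits"
            path.write_bytes(b"")  # placeholder for real intermediate data
            paths.append(path)
        # Later steps would consume the intermediates here.
        available = sum(p.exists() for p in paths)
    # The directory and its contents are removed when the block exits.
    return available
```

Because each process gets its own directory, cleanup in one run cannot remove files another run still needs, regardless of how the files are named.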

@stscijgbot-jp
Collaborator Author

Comment by Katie Kaleida on JIRA:

If we are removing group associations (which it looks like we are likely going to do, per JSSET-236), does this problem go away?

@stscijgbot-jp
Collaborator Author

Comment by Hien Tran on JIRA:

Removing group association candidates would mitigate the problem but won't eliminate it completely: any time there is a -c candidate containing multiple observations, even a legitimate one, there is a chance of it colliding with the -o candidates for those observations.
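
To illustrate why any multi-observation -c candidate can collide with -o candidates, here is a small sketch that lists which associations share inputs. The function `overlapping_associations` and the mapping passed to it are hypothetical; a real check would read the member lists from the association files.

```python
from itertools import combinations


def overlapping_associations(asn_members):
    """Return (asn_a, asn_b, shared) for each pair of associations
    that have at least one input in common."""
    overlaps = []
    for (a, members_a), (b, members_b) in combinations(
        sorted(asn_members.items()), 2
    ):
        shared = set(members_a) & set(members_b)
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps


# Made-up example: one -c candidate spanning two observations.
example = {
    "o001": ["o001"],
    "o002": ["o002"],
    "c1000": ["o001", "o002"],
}
pairs = overlapping_associations(example)
```

Here the -c candidate overlaps with both -o candidates, while the two -o candidates do not overlap with each other.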

@stscijgbot-jp
Collaborator Author

Comment by Hien Tran on JIRA:

We're currently seeing this issue affect the reprocessing of program 01207 with b11.1.1. The following associations are stepping on each other:

jw01207-o002_20250102t121019_spec3_00001

jw01207-o004_20250102t121019_spec3_00002

jw01207-c1000_20250102t121019_spec3_00001 (consisting of o002, o003, o004)

jw01207-c1000_20250102t121019_spec3_00002 (consisting of o002, o003, o004)
