Faux-atomic write #8

clbarnes · 2022-06-15T12:18:35Z

Mentioned elsewhere but worth its own issue:

It would be really helpful for downstream processing purposes for the in-process writing to be done to some file which is named differently to the final output, and then at the end of the process, rename it. Currently, it's nontrivial to tell whether a file is still being written to or whether it's complete. For the purposes of per-slice post-processing (e.g. converting to a sensible format), it would be nice to regularly run a script which just looks for files of the right name and deals with them.

This should be a relatively small change: the current software should just write to f"{currentname}.part" and then do rename(f"{currentname}.part", currentname) at the end of the process.
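A minimal Python sketch of that idea, assuming the writer can be changed to work this way (the acquisition software's actual language and code are not shown in this issue). The names write_then_rename and completed_files are hypothetical, as is the .dat suffix default, which is borrowed from the file names mentioned later in the thread.

```python
import os
from pathlib import Path


def write_then_rename(final_path: Path, data: bytes) -> None:
    """Write to '<final_path>.part', then rename it to final_path once complete."""
    part_path = final_path.with_name(final_path.name + ".part")
    with open(part_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    # The rename stays within one directory, so it is a single file-system
    # operation; os.replace also overwrites an existing target.
    os.replace(part_path, final_path)


def completed_files(directory: Path, suffix: str = ".dat") -> list[Path]:
    """Files a post-processing script can safely pick up.

    'x.dat.part' has the suffix '.part', so in-progress files are skipped.
    """
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == suffix
    )
```

A periodically run script could then call completed_files(some_directory) and process whatever it returns, without needing to guess whether a file is still being written.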

trautmane · 2022-07-20T17:47:17Z

To address this at Janelia, a companion .keep file is written after each dat file write completes.

For example:

/cygdrive/d/UploadFlags/0522-09_ZF-Card^E^^Images^Zebrafish^Y2022^M07^D12^Merlin-6257_22-07-12_153254_0-0-1.dat^keep

is written for

/cygdrive/e/Images/Zebrafish/Y2022/M07/D12/Merlin-6257_22-07-12_153254_0-0-1.dat

I'm not sure how/where this is done since I'm just a consumer of this data, but it might be available to you already.
I like the simplicity of your suggested .part naming scheme - the .keep file names are horrid because they are embedding so much information into the name.

However, a few advantages to the .keep file approach are:

You can see what is done and ready for transfer in one place (you don't need to scan the filesystem).
You can remove the .keep files post-transfer to easily track what remains to be transferred/processed. This could also be accomplished by removing the .dat from the scope, but we have not done that.
The .keep file name also includes a data set or project name (0522-09_ZF-Card in the example above) that is useful for organizing the data post transfer. This could be pulled from .dat header data instead.

I'm not a big fan of the .keep file setup, but I thought it was worth mentioning that it exists and how we currently use it.
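For comparison, the marker-file pattern can be sketched in Python roughly as follows. This is not the Janelia implementation, which is not shown here: the naming is deliberately simplified to "<data file name>.keep" in a single flags directory (an assumption), and all function names are hypothetical.

```python
from pathlib import Path

MARKER_SUFFIX = ".keep"


def marker_for(data_path: Path, flags_dir: Path) -> Path:
    # Hypothetical naming: just the data file's name plus ".keep".
    return flags_dir / (data_path.name + MARKER_SUFFIX)


def write_with_marker(data_path: Path, data: bytes, flags_dir: Path) -> None:
    """Write the data file, and only then create its empty marker."""
    data_path.write_bytes(data)
    flags_dir.mkdir(parents=True, exist_ok=True)
    marker_for(data_path, flags_dir).touch()


def ready_files(flags_dir: Path, data_dir: Path) -> list[Path]:
    """List data files that have a marker, by looking only at flags_dir."""
    return sorted(
        data_dir / m.name[: -len(MARKER_SUFFIX)]
        for m in flags_dir.glob("*" + MARKER_SUFFIX)
    )


def mark_transferred(data_path: Path, flags_dir: Path) -> None:
    """Remove the marker once the data file has been transferred or processed."""
    marker_for(data_path, flags_dir).unlink()
```

A flat name like this only works when all data files live in one directory; the real .keep names described above embed the path and a project name instead.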

clbarnes · 2022-07-20T17:54:02Z

Thanks! That is another way of doing it.

A halfway house would be to have the part files kept in a parallel directory hierarchy (under an in_progress/ directory or something) and then moved into the complete/ hierarchy. So long as they're on the same file system, this should be just as fast, while keeping the first advantage you listed. There could be an equivalent processed/ hierarchy which satisfies the second advantage. I think the third property is probably best addressed another layer up, if possible.
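A rough Python sketch of that layout, assuming a single root directory containing in_progress/, complete/ and processed/ trees on the same file system. The function names are hypothetical and only illustrate the idea.

```python
import os
from pathlib import Path


def in_progress_path(root: Path, relative: Path) -> Path:
    """Where the writer should put a file while it is still being written."""
    path = root / "in_progress" / relative
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def promote(
    root: Path,
    relative: Path,
    stage_from: str = "in_progress",
    stage_to: str = "complete",
) -> Path:
    """Move a file to the same relative path under the next stage's tree."""
    src = root / stage_from / relative
    dst = root / stage_to / relative
    dst.parent.mkdir(parents=True, exist_ok=True)
    # A plain rename: cheap on one file system, but it raises OSError if the
    # two trees are on different file systems.
    os.replace(src, dst)
    return dst


def pending(root: Path, stage: str = "complete") -> list[Path]:
    """Relative paths of everything currently sitting in one stage."""
    base = root / stage
    if not base.exists():
        return []
    return sorted(p.relative_to(base) for p in base.rglob("*") if p.is_file())
```

The writer would call promote(root, relative) when it finishes a file, and a post-processing script would call promote(root, relative, "complete", "processed") after handling it, so pending(root) always lists what remains to be done.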
