Some operations fail with out-of-/tmp-space errors #887

Open
keflavich opened this issue Aug 23, 2023 · 12 comments

Comments

@keflavich
Contributor

@d-l-walker @ashleythomasbarnes please help fill in details!

The brief version is: running some code in this file:
https://github.com/ACES-CMZ/reduction_ACES/blob/main/aces/joint_deconvolution/reproject_mosaic_funcs.py
resulted in failures because the /tmp drive got filled up.

This is almost certainly a side effect of dask caching files to /tmp directories.

We need to add documentation about this problem and/or do a filesystem size check before dumping things to /tmp.
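For anyone hitting this in the meantime, dask's scratch location can be redirected before the computation runs; a minimal sketch, assuming dask is indeed the component writing to /tmp and that /scratch/dask-tmp is a placeholder for a directory with enough free space:

import dask

# Redirect dask's on-disk scratch space away from /tmp.
# '/scratch/dask-tmp' is a placeholder; point it at any filesystem with room.
dask.config.set({'temporary_directory': '/scratch/dask-tmp'})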

@e-koch
Contributor

e-koch commented Aug 23, 2023

Do we need to add a kwarg to set the temp directory to write to? Or write to the current directory like CASA?

@keflavich
Contributor Author

I think this is a documentation need first. Writing to the current directory is not a better default option; whether it helps depends on the machine and the architecture of the storage system. But we should try to prevent writing temp files larger than the tmp drive. Frankly, I think dask should be doing this, but we will lose users if we don't come up with a solution.
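Something like this stdlib-only guard is the kind of check I mean (a sketch; the threshold and error type are illustrative assumptions):

import shutil
import tempfile

def check_tmp_space(required_bytes):
    # Fail fast if the temp filesystem can't hold the scratch data.
    tmpdir = tempfile.gettempdir()
    free = shutil.disk_usage(tmpdir).free
    if free < required_bytes:
        raise OSError(
            f"Need {required_bytes / 1e9:.1f} GB of scratch space but {tmpdir} "
            f"has only {free / 1e9:.1f} GB free; set TMPDIR to a larger filesystem."
        )

check_tmp_space(60e9)  # e.g., before an operation that spills a ~60 GB cube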

@e-koch
Contributor

e-koch commented Aug 23, 2023

@keflavich
Contributor Author

@ashleythomasbarnes Could you fill in more details about what you're trying? I think we can come to a solution but we need tracebacks and/or details about what went wrong.

@ashleythomasbarnes

I'm trying to create a mean spectrum of a large MUSE datacube (~60 GB), but this was filling up /tmp on the computer I was using. The code I'm running (with spectral_cube.__version__ == '0.6.2') is given below. I can check whether this is solved by the solutions from @e-koch.

from astropy.io import fits
from spectral_cube import SpectralCube

infile = '../data/ngc0628c/muse/NGC0628-0.92asec.fits'
hdu = fits.open(infile)[1]          # the cube is in the first extension
cube = SpectralCube.read(hdu)
cube.allow_huge_operations = True   # required for operations on a ~60 GB cube
spec_mean = cube.mean(axis=(1, 2))  # mean over both spatial axes -> mean spectrum

@keflavich
Contributor Author

@ashleythomasbarnes Thanks, that's helpful. Could you confirm that the cube is being read as a DaskSpectralCube?

There are a few workarounds for this. Some are to do with dask, as noted above, but another approach is to force a non-dask spectral cube and do spec_mean = cube.mean(axis=(1,2), strategy='slice'), which will do a channel-by-channel mean and therefore only load a small fraction of the cube into memory at any given time.

@e-koch
Contributor

e-koch commented Feb 1, 2024

Or set the temporary directory to a location that has sufficient storage:

TMPDIR='mydir' python cube_script.py
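The same thing can be done from inside Python, as long as the variable is set before the tempfile module picks its default directory (a sketch; /scratch/tmp is a placeholder path):

import os
os.environ['TMPDIR'] = '/scratch/tmp'  # must happen before tempfile is first used

import tempfile
print(tempfile.gettempdir())  # now reports /scratch/tmp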

@ashleythomasbarnes

I don't think so, @adamginsburg...
I'm not explicitly passing use_dask=True when loading, and this is the cube I'm using:

SpectralCube with shape=(3761, 1426, 1412):
 n_x:   1412  type_x: RA---TAN  unit_x: deg    range:    24.133237 deg:   24.214712 deg
 n_y:   1426  type_y: DEC--TAN  unit_y: deg    range:    15.741643 deg:   15.820816 deg
 n_s:   3761  type_s: AWAV      unit_s: Angstrom  range:     4700.000 Angstrom:    9400.000 Angstrom

@keflavich
Contributor Author

OK, then there's a different answer here. Try my suggestion, spec_mean = cube.mean(axis=(1,2), how='slice') (note: the keyword is how, not strategy).

The other thing you can do is pass the memmap_dir keyword or specify TMPDIR globally to force it to write somewhere else.
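Putting that together for your cube (a sketch of the suggestion above; the 'slice' strategy averages channel by channel, so only a small fraction of the cube is in memory at any given time):

from astropy.io import fits
from spectral_cube import SpectralCube

hdu = fits.open('../data/ngc0628c/muse/NGC0628-0.92asec.fits')[1]
cube = SpectralCube.read(hdu)      # plain (non-dask) SpectralCube
cube.allow_huge_operations = True

# 'slice' iterates over the cube channel by channel instead of loading it whole
spec_mean = cube.mean(axis=(1, 2), how='slice')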

@keflavich
Contributor Author

@e-koch the real issue here, though: how did Ash hit a case where tempfiles were being used? Tempfiles are only created by the parallel versions of the code, which .mean doesn't access, afaict.
https://github.com/radio-astro-tools/spectral-cube/blob/master/spectral_cube/spectral_cube.py#L2922-L2924

@e-koch
Contributor

e-koch commented Feb 2, 2024

Is it the memory mapping in astropy.io.fits?

@keflavich
Contributor Author

No, that's not relevant: FITS memory mapping just maps the file that's already on disk; it won't create any new files in temp directories, at least afaik.
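i.e., with memmap the array is backed by the bytes of the file itself (a minimal illustration; the pixel indices are arbitrary):

from astropy.io import fits

# memmap=True (astropy's default) maps the existing file; no temp copy is made
hdul = fits.open('../data/ngc0628c/muse/NGC0628-0.92asec.fits', memmap=True)
data = hdul[1].data
spectrum = data[:, 700, 700]  # reading a slice pages just those bytes from disk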
