Redoing hero calc because of boundary masking issue #1
Damn. How annoying, sorry.
That doesn't sound hard to fix though.
Pretty sure you don't need that for this calculation? It's just 3 sets of 1D bins.
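E.g. a minimal sketch with xhistogram (variable names and bin ranges are illustrative, not taken from the repo):

```python
import numpy as np
from xhistogram.xarray import histogram

# Hypothetical bin edges for the three fields in the joint PDF;
# the actual ranges would come from the data.
zeta_bins = np.linspace(-1e-4, 1e-4, 101)
strain_bins = np.linspace(0, 1e-4, 101)
div_bins = np.linspace(-1e-4, 1e-4, 101)

# zeta, strain, div are assumed to be named xarray DataArrays on the same grid;
# the histogram reduces over the horizontal dims and keeps time.
jpdf = histogram(zeta, strain, div,
                 bins=[zeta_bins, strain_bins, div_bins],
                 dim=["i", "j"])
```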
@jbusecke or @dhruvbalwada would have to help you with that.
Not much else. The number of procs should only affect how fast it goes, not whether or not it completes. The important thing is that you give each processor enough memory. The calculation is parallel in time, so you would just get one timestep to work and then scale it up. I would use the latest version of dask, and if that behaves weirdly you could downgrade to the one I used...
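E.g. a sketch of that scale-up path (`compute_jpdf` is a hypothetical wrapper around the vort/strain/histogram steps, not a function from this repo):

```python
# Verify the pipeline on a single timestep before scaling up.
ds_one = ds.isel(time=slice(0, 1))
result = compute_jpdf(ds_one)
result.load()  # if this completes within one worker's memory, widen the time slice
```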
I can't tell you that, I can only help give an idea of how easy / hard it would be to redo (+ help fix the bugs in xGCM).
Sorry to hear about the trouble @cspencerjones, happy to support you with infrastructure from LEAP side!
Are you doing this on the LEAP-hub? If so then definitely not. We would want to store the output somewhere else for the long-term, but for now write away!
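E.g. a sketch of the write, assuming the output ends up as an xarray Dataset (the bucket path is illustrative, not the actual LEAP bucket):

```python
# Write the histogram output to cloud storage as zarr;
# requires gcsfs, which the hub images typically include.
jpdf.to_zarr("gs://<persistent-bucket>/hero-calc/jpdf.zarr", mode="w")
```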
This is bad! We definitely need to fix that. Is there a simple demo that shows that setting fill_value on the decorator has no effect? On a larger note, maybe we should pad with the values of the connected faces? That might be overkill here (and we need to fix the above issue regardless), but I just wanted to throw it out there that we can properly pad LLC grids! I am very busy at the moment but have been wanting to push on xgcm more lately. Is there a possibility for collaboration (which would justify time investment on my side)? Happy to chat more.
We deliberately avoided doing this - I think the logic was that without connected faces the vort/strain/div functions were supposed to propagate NaNs in slightly from the boundaries, and we reasoned that that should not have much effect on the histograms... Sounds like we should check that assumption explicitly though. I would be hesitant to add that functionality to this calculation at this point, as I think it's unlikely that it (a) is simple and (b) works nicely with dask without a rabbit hole of work.
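One way to check it explicitly (a minimal sketch; `zeta` and the dim names are assumptions about this repo's data layout):

```python
import numpy as np
from xhistogram.xarray import histogram

# Compare the histogram of the full field against one with a few
# boundary-adjacent cells trimmed off.
bins = np.linspace(-1e-4, 1e-4, 101)
h_full = histogram(zeta, bins=[bins], dim=["i", "j"])
h_trim = histogram(zeta.isel(i=slice(2, -2), j=slice(2, -2)),
                   bins=[bins], dim=["i", "j"])

# If boundary NaNs really don't matter, the normalized histograms agree closely.
print(float(abs(h_full / h_full.sum() - h_trim / h_trim.sum()).max()))
```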
Fair
Thanks @jbusecke and @TomNicholas . It seems like it will be a bit easier than I thought.
I just wrote an xgcm bug report with a minimal working example here: xgcm/xgcm#652.
Yes I am! I will have a go at writing a small amount first and ask you if I have any issues. I'm still running into some problems getting the cell with the xhistogram code to work, but I haven't really debugged it at all; I'll post again here with a minimal example if I'm still stuck after I've spent some time on it.
Hi @jbusecke. I have the code working when I use small amounts of data. When I try to use the number of timeslices per day that I want to, I keep getting disk spillage and it eventually crashes. I think I need to give the workers more memory. When I try to give workers more memory, I get this error:
When I look at the cluster options, I can't see a way to raise the limit. Is this something that is controlled on your end?
Hey @cspencerjones, this is controlled by 2i2c. I think that 8GB/core is the highest memory ratio we can get for the workers.
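For reference, a sketch of how worker resources are requested on a 2i2c-style hub via dask-gateway (which options exist, and their caps, depend on the hub configuration):

```python
from dask_gateway import Gateway

gateway = Gateway()
options = gateway.cluster_options()
options.worker_memory = 8   # GiB per worker; requests above the hub cap are rejected
cluster = gateway.new_cluster(options)
cluster.scale(20)
client = cluster.get_client()
```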
8GB of memory usage per worker is a lot; I would like to understand why that's even necessary. The calculation should be embarrassingly parallel in time.
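E.g. a sketch of what "embarrassingly parallel in time" implies for chunking:

```python
# One chunk per timestep keeps each task's working set to a single
# 2D field, so per-worker memory stays roughly constant as the
# time range grows.
ds = ds.chunk({"time": 1})
```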
Yeah, I agree with Tom here in general, but I also understand that @cspencerjones might not have a lot of time at hand to debug.
This issue is really for @TomNicholas. I decided to post this here rather than over on Slack because it probably requires a bit of thinking.
It seems like there are issues with the JPDFs. My current hypothesis is that this is caused by `fill_value`, which we would expect to be set here. But this `fill_value` is ignored. The way to make sure it is used is to call `ζ = vort(grid, ds.U, ds.dxC, ds.V, ds.dyC, ds.rAz, axis=5 * [("Y", "X")], fill_value=np.nan)`, which @TomNicholas didn't do in the most recent version of the calculation. I have not yet written an xgcm issue about this.

The current lack of masking creates some horrible artifacts in the JPDFs. So it seems like I need to redo the hero calc, or abandon the JPDFs altogether. If I were to redo the calculation, I would presumably need a non-standard environment on LEAP that includes xgcm/xhistogram#59, and I would need permissions to write to a persistent bucket? What else would I need to know (e.g. how many procs should I use)?
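For concreteness, the two call patterns side by side (the call itself is the one quoted above; the zero-fill behaviour of the ignored default is my reading of the bug):

```python
import numpy as np

# What was expected to work: fill_value set once on the grid-ufunc decorator.
# In practice the decorator-level fill_value is silently ignored (the xgcm bug),
# so the boundary gets filled with the default and contaminates the JPDFs.
ζ_unmasked = vort(grid, ds.U, ds.dxC, ds.V, ds.dyC, ds.rAz,
                  axis=5 * [("Y", "X")])

# The workaround: pass fill_value explicitly at call time so boundary
# points come out as NaN and drop out of the histograms.
ζ = vort(grid, ds.U, ds.dxC, ds.V, ds.dyC, ds.rAz,
         axis=5 * [("Y", "X")], fill_value=np.nan)
```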
Is this all just more trouble than it's worth? Should I give up?