Add eNATL recipe #75

Open · jbusecke wants to merge 25 commits into main
Conversation

@jbusecke (Contributor) commented Dec 11, 2023:

Towards #73

DO NOT MERGE AS IS. HIGHLY EXPERIMENTAL

@jbusecke (Contributor, Author):

pre-commit.ci autofix

@jbusecke (Contributor, Author):

@cisaacstern this failed with the same error I encountered earlier. Could you take a look at this?

@jbusecke (Contributor, Author):

Trying out pangeo-forge/deploy-recipe-action#27.
There is probably a better way to manage this, but let's see.

@jbusecke (Contributor, Author):

OK, I was able to deploy this using the super hacky changes made in pangeo-forge/deploy-recipe-action#27.

I will check later whether the Dataflow job ran successfully.

But perhaps more importantly, we need to wait and see how the discussion over at pangeo-forge/deploy-recipe-action#27 goes. Sorry for the delay.

@jbusecke (Contributor, Author):

pre-commit.ci autofix

@jbusecke (Contributor, Author):

Yay, this worked!

[screenshot]

@auraoupa do you have access to the LEAP hub? You can inspect the dataset with the following snippet:

import xarray as xr
path = 'gs://leap-persistent-ro/data-library/enatl60-blbt02-595733423-7175544257-1/eNATL60_BLBT02.zarr'
ds = xr.open_dataset(path, engine='zarr', chunks={})
ds

Two things I noticed:

  • We should probably rename time_counter to time.
  • The land points are not NaN but some very large fill value. I would always prefer that to be NaN, but I will leave the decision up to @auraoupa. A minimal sketch of both changes follows this list.
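
A minimal sketch of both changes, applied after opening (the 1e20 threshold is an assumption based on what I see in the data):

import xarray as xr

path = 'gs://leap-persistent-ro/data-library/enatl60-blbt02-595733423-7175544257-1/eNATL60_BLBT02.zarr'
ds = xr.open_dataset(path, engine='zarr', chunks={})
# rename the NEMO time dimension to the conventional 'time'
ds = ds.rename({'time_counter': 'time'})
# replace the large land fill value with NaN (threshold is an assumption)
ds = ds.where(abs(ds) < 1e20)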

@jbusecke (Contributor, Author):

There is something even weirder going on around the halo land values:

ds = ds.where(abs(ds)<1e20)
ds['vosaline'].isel(time_counter=0)

gives me:

[plot: vosaline at the first time step, with spurious low values remaining on land]

What is the best way to get rid of those low values on land in a reliable way?
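
For reference, one blunt extension of the snippet above also treats exact zeros as land, though that is risky for variables where 0 is a valid ocean value (hence the question):

# ASSUMPTION: land is encoded either as |value| >= 1e20 or as exactly 0
ds_masked = ds.where((abs(ds) < 1e20) & (ds != 0))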

@jbusecke (Contributor, Author):

Finally, we should think about the chunking of the final product. These are all things we can and should discuss before we have the deployment fully figured out.

@auraoupa:

Thank you @jbusecke for advancing so quickly on this!
About your remarks:

  • There are two masking values inside each variable: one for land where the computing processor was all land (1e+20), and one for land where the computing processor had both ocean and land (0). We handle this by using the official masks from this file: tmask for T variables, umask and vmask for U and V variables, etc. Maybe I should have processed the data in advance so it is already masked with NaN in the proper areas. Can this be done via a pangeo-forge recipe, or do we upload the mask and grid files alongside the data? (A sketch of the mask-based approach follows this list.)
  • About the time index name: time_counter or time, both work for me.
  • About the chunking: I usually do something like {'time_counter': 1, 'x': 1000, 'y': 1000}, but you can adjust it if you feel it has to be bigger or smaller.
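
For concreteness, a minimal sketch of the mask-based approach (the mesh-mask filename and the grid assignments here are assumptions):

import xarray as xr

grid = xr.open_dataset('mesh_mask.nc')  # hypothetical mask/grid file uploaded alongside the data
# T-point variables use tmask; U- and V-point variables use umask/vmask
# (dimension names and levels may need aligning between mask and data)
ds['vosaline'] = ds['vosaline'].where(grid['tmask'] == 1)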

@jbusecke (Contributor, Author):

Thanks for the quick response, @auraoupa.

Let's deal with the most challenging issue first:

there are two masking values inside each variable: one for land where the computing processor was all land (1e+20), and one for land where the computing processor had both ocean and land (0). We handle this by using the official masks from this file: tmask for T variables, umask and vmask for U and V variables, etc. Maybe I should have processed the data in advance so it is already masked with NaN in the proper areas. Can this be done via a pangeo-forge recipe, or do we upload the mask and grid files alongside the data?

I think ideally each file would contain the masks as coordinates; then we could apply the masking to each file and also retain the masks in the final output (this might be very important for budget analyses etc.). Roughly what I have in mind is sketched below.
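
A rough sketch of such a per-file preprocess step (the function and mask names here are hypothetical):

def attach_masks(ds, masks):
    # keep the static masks as coordinates so they survive concatenation
    ds = ds.assign_coords(tmask=masks['tmask'], umask=masks['umask'], vmask=masks['vmask'])
    # apply the grid-appropriate mask, e.g. tmask for T-point variables
    ds['vosaline'] = ds['vosaline'].where(ds['tmask'] == 1)
    return ds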

I have raised pangeo-forge/pangeo-forge-recipes#663 to discuss this more broadly. Just as a heads-up, this will probably not move before next week at the earliest, since folks are at AGU.

About the time index name: time_counter or time, both work for me.

I already renamed it to 'time', hehe.

About the chunking: I usually do something like {'time_counter': 1, 'x': 1000, 'y': 1000}, but you can adjust it if you feel it has to be bigger or smaller.

That seems fairly small to me. I would aim for chunk sizes in the 100-200 MB range, but this is a detail we can discuss at the end. A quick back-of-the-envelope calculation is below.
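
For scale, assuming float32 data (4 bytes per value):

chunk_small = 1 * 1000 * 1000 * 4   # {'time': 1, 'x': 1000, 'y': 1000} -> ~4 MB
chunk_big = 30 * 1000 * 1000 * 4    # {'time': 30, 'x': 1000, 'y': 1000} -> ~120 MB, in the target range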

@auraoupa:

Hi @jbusecke, I hope you had a nice end of 2023, and I wish you the best for 2024!
I reprocessed my dataset so that the land-mask values are uniform, and at the same time corrected the nav_lat and nav_lon coordinates (there was missing data on land). Maybe it would be faster to process these files instead of trying to do it with pangeo-forge? I just added the new name and Zenodo record in the eNATL60 feedstock.

@jbusecke (Contributor, Author) commented Jan 24, 2024:

Thanks for doing this @auraoupa. This might unblock us here. I will keep track of it over at pgf.

auraoupa and others added 2 commits January 24, 2024 13:10
* Update eNATL60.py

new name and zenodo records for homogeneously masked data

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@jbusecke (Contributor, Author):

Seems like we are getting an error that time_counter is not found. I suspect it was renamed, @auraoupa? I'll check on that quickly.

@jbusecke (Contributor, Author):

OK, I can confirm the dimension is now named 'time':
[screenshot: dataset repr showing the 'time' dimension]

@jbusecke (Contributor, Author):

@auraoupa, should 't_mask' be dependent on time? That seems like an error to me. It's easily fixable though; see the sketch below.
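
If the time axis on the mask is degenerate or repeated, a one-line fix on our side could be (assuming the dimension is already named 'time'):

ds['t_mask'] = ds['t_mask'].isel(time=0, drop=True)  # drop the spurious time dependence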

@jbusecke (Contributor, Author):

pre-commit.ci autofix

@jbusecke (Contributor, Author):

Well, that's a new one (cc @cisaacstern):

Error message from worker: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 997, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/transforms/core.py", line 1961, in <lambda>
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/pangeo_forge_recipes/aggregation.py", line 285, in schema_to_zarr
    ds.to_zarr(target_store, mode="w", compute=False)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/core/dataset.py", line 2521, in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/api.py", line 1832, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/api.py", line 1362, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/zarr.py", line 657, in store
    self.set_variables(
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/zarr.py", line 779, in set_variables
    writer.add(v.data, zarr_array, region)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/common.py", line 241, in add
    target[region] = source
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/core.py", line 1495, in __setitem__
    self.set_orthogonal_selection(pure_selection, value, fields=fields)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/core.py", line 1682, in set_orthogonal_selection
    indexer = OrthogonalIndexer(selection, self)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/indexing.py", line 620, in __init__
    dim_indexer = SliceDimIndexer(dim_sel, dim_len, dim_chunk_len)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/indexing.py", line 182, in __init__
    self.nchunks = ceildiv(self.dim_len, self.dim_chunk_len)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/indexing.py", line 167, in ceildiv
    return math.ceil(a / b)
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 300, in _execute
    response = task()
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 375, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 639, in do_instruction
    return getattr(self, request_type)(
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 677, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1113, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 237, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 570, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 572, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 263, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 953, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 954, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1437, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1526, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 636, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1621, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "apache_beam/runners/common.py", line 1734, in apache_beam.runners.common._OutputHandler._write_value_to_tag
  File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 953, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 954, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1437, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1526, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 636, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1621, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "apache_beam/runners/common.py", line 1734, in apache_beam.runners.common._OutputHandler._write_value_to_tag
  File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 953, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 954, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1437, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1526, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 995, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1621, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "apache_beam/runners/common.py", line 1734, in apache_beam.runners.common._OutputHandler._write_value_to_tag
  File "apache_beam/runners/worker/operations.py", line 352, in apache_beam.runners.worker.operations.GeneralPurposeConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 951, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 953, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 954, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1437, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1547, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 997, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/transforms/core.py", line 1961, in <lambda>
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/pangeo_forge_recipes/aggregation.py", line 285, in schema_to_zarr
    ds.to_zarr(target_store, mode="w", compute=False)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/core/dataset.py", line 2521, in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/api.py", line 1832, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/api.py", line 1362, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/zarr.py", line 657, in store
    self.set_variables(
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/zarr.py", line 779, in set_variables
    writer.add(v.data, zarr_array, region)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/xarray/backends/common.py", line 241, in add
    target[region] = source
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/core.py", line 1495, in __setitem__
    self.set_orthogonal_selection(pure_selection, value, fields=fields)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/core.py", line 1682, in set_orthogonal_selection
    indexer = OrthogonalIndexer(selection, self)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/indexing.py", line 620, in __init__
    dim_indexer = SliceDimIndexer(dim_sel, dim_len, dim_chunk_len)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/indexing.py", line 182, in __init__
    self.nchunks = ceildiv(self.dim_len, self.dim_chunk_len)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/zarr/indexing.py", line 167, in ceildiv
    return math.ceil(a / b)
ZeroDivisionError: division by zero [while running 'Create|OpenURLWithFSSpec|OpenWithXarray|Preprocess|StoreToZarr/StoreToZarr/PrepareZarrTarget/Map(schema_to_zarr)-ptransform-42']

The ZeroDivisionError comes from zarr computing the number of chunks as ceil(dim_len / dim_chunk_len), so a chunk length of 0 must be getting through for one of the dimensions. Let me change the target_chunks to see if this goes away.

@jbusecke (Contributor, Author):

OK, now I am getting yet another error that I cannot quite grok:

Error message from worker: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 995, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1611, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/pangeo_forge_recipes/rechunking.py", line 74, in split_fragment
    raise ValueError("A dimsize of 0 means that this fragment has not been properly indexed.")
ValueError: A dimsize of 0 means that this fragment has not been properly indexed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 300, in _execute
    response = task()
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 375, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 639, in do_instruction
    return getattr(self, request_type)(
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/sdk_worker.py", line 677, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1113, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 237, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 570, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 572, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 263, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 953, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 954, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1437, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1526, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 995, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1621, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "apache_beam/runners/common.py", line 1734, in apache_beam.runners.common._OutputHandler._write_value_to_tag
  File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 953, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 954, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1437, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1547, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 851, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 995, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1611, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/pangeo_forge_recipes/rechunking.py", line 74, in split_fragment
    raise ValueError("A dimsize of 0 means that this fragment has not been properly indexed.")
ValueError: A dimsize of 0 means that this fragment has not been properly indexed. [while running 'Create|OpenURLWithFSSpec|OpenWithXarray|Preprocess|StoreToZarr/StoreToZarr/Rechunk/FlatMap(split_fragment)-ptransform-35']

@cisaacstern, could we dig into this in the coming days? Sorry, this will still be blocked for now, @auraoupa.

@auraoupa:

Seems like we are getting an error that time_counter is not found. I suspect it was renamed, @auraoupa? I'll check on that quickly.

Yes, I forgot about that; it is time now.

@auraoupa, should 't_mask' be dependent on time? That seems like an error to me. It's easily fixable though!

No, it is indeed not dependent on time; sorry I missed that.

Thanks, and good luck with the unusual errors...

@SammyAgrawal:

Fixing:

  • Try manually editing the time coordinates one by one to see what the issue is; remove the coords and try with just the data variables, concatenating along time. Add back each piece until the whole thing works (a sketch follows below).
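
A sketch of that debugging loop (the coordinate names are placeholders):

# start from just the data variables, dropping all non-index coordinates
slim = ds.reset_coords(drop=True)
for name in ['nav_lat', 'nav_lon']:  # placeholder coordinate names
    slim = slim.assign_coords({name: ds[name]})
    # ...re-run the write step here and check whether the failure reappears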

@jbusecke (Contributor, Author):

@SammyAgrawal can you move any further discussion to https://github.com/leap-stc/eNATL_feedstock and close this?
