
The `time_counter` variable is not being chunked with the given chunk strategy #14

Open

soutobias opened this issue Sep 23, 2024 · 1 comment

@soutobias (Member):
One observed oddity is that the time_counter dataset has a chunk size of 31 (presumably fixed by the length of January?). This is despite the explicit request of:

-cs '{"time_counter": 1, "x": 720, "y": 360}'

in each call. Maybe this is related to the unwarranted warning? Chunksizes have been respected elsewhere; just not for time_counter itself. E.g.:

ncdump -h -s https://noc-msm-o.s3-ext.jc.rl.ac.uk/npd12-j001-t1d-1976/T1d/#mode=nczarr,s3
netcdf \#mode\=nczarr\,s3 {
dimensions:
	y = 3605 ;
	x = 4320 ;
	nvertex = 4 ;
	time_counter = 366 ;
	axis_nbounds = 2 ;
.
.

  	double time_centered(time_counter) ;
  		time_centered:bounds = "time_centered_bounds" ;
  		time_centered:calendar = "gregorian" ;
  		time_centered:long_name = "Time axis" ;
  		time_centered:standard_name = "time" ;
  		time_centered:time_origin = "1900-01-01 00:00:00" ;
  		time_centered:units = "seconds since 1900-01-01 00:00:00" ;
  		time_centered:_Storage = "chunked" ;
  		time_centered:_ChunkSizes = 1 ;
  		time_centered:_Filter = "32001,0,0,0,0,5,1,1" ;
  		time_centered:_Codecs = "[{\"blocksize\": 0, \"clevel\": 5, \"cname\": \"lz4\", \"id\": \"blosc\", \"shuffle\": 1}]" ;
  		time_centered:_Endianness = "little" ;
  	double time_counter(time_counter) ;
  		time_counter:axis = "T" ;
  		time_counter:bounds = "time_counter_bounds" ;
  		time_counter:calendar = "gregorian" ;
  		time_counter:long_name = "Time axis" ;
  		time_counter:standard_name = "time" ;
  		time_counter:time_origin = "1900-01-01 00:00:00" ;
  		time_counter:units = "seconds since 1900-01-01" ;
  		time_counter:_Storage = "chunked" ;
    VVVVVVVVVVVVVVVVVVVVVVV
  		time_counter:_ChunkSizes = 31 ;
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  		time_counter:_Filter = "32001,0,0,0,0,5,1,1" ;
  		time_counter:_Codecs = "[{\"blocksize\": 0, \"clevel\": 5, \"cname\": \"lz4\", \"id\": \"blosc\", \"shuffle\": 1}]" ;
  		time_counter:_Endianness = "little" ;
  	float tossq_con(time_counter, y, x) ;
  		tossq_con:cell_methods = "time: mean (interval: 300 s)" ;
  		tossq_con:coordinates = "time_centered nav_lat nav_lon" ;
  		tossq_con:interval_operation = "300 s" ;
  		tossq_con:interval_write = "1 d" ;
  		tossq_con:long_name = "square_of_sea_surface_conservative_temperature" ;
  		tossq_con:missing_value = 1.00000002004088e+20 ;
  		tossq_con:online_operation = "average" ;
  		tossq_con:standard_name = "square_of_sea_surface_temperature" ;
  		tossq_con:units = "degC2" ;
  		tossq_con:_Storage = "chunked" ;
  		tossq_con:_ChunkSizes = 1, 360, 720 ;
  		tossq_con:_Filter = "32001,0,0,0,0,5,1,1" ;
  		tossq_con:_Codecs = "[{\"blocksize\": 0, \"clevel\": 5, \"cname\": \"lz4\", \"id\": \"blosc\", \"shuffle\": 1}]" ;

I've added debug output (not present in the production code) to investigate why the time_counter variable isn't being chunked as expected:

if len(new_chunking) > 0:
    print(f"Rechunking {variable} to {new_chunking}")
    ds_filepath[variable] = ds_filepath[variable].chunk(new_chunking)
    print(f"New chunking: {ds_filepath[variable].chunks}")

The output I'm seeing is:

Rechunking tossq_con to {'time_counter': 1, 'x': 720, 'y': 360}
New chunking: ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (360, 360, 360, 360, 360, 360, 360, 360, 360, 360, 5), (720, 720, 720, 720, 720, 720))
Rechunking nav_lat to {'x': 720, 'y': 360}
New chunking: ((360, 360, 360, 360, 360, 360, 360, 360, 360, 360, 5), (720, 720, 720, 720, 720, 720))
Rechunking nav_lon to {'x': 720, 'y': 360}
New chunking: ((360, 360, 360, 360, 360, 360, 360, 360, 360, 360, 5), (720, 720, 720, 720, 720, 720))
Rechunking time_centered to {'time_counter': 1}
New chunking: ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),)
Rechunking time_counter to {'time_counter': 1}
New chunking: None

It seems that time_counter isn't being chunked even though the code runs without errors.
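This matches the known behaviour that xarray stores dimension coordinates as in-memory index variables, which `.chunk()` silently skips. A minimal reproduction sketch (assuming xarray and dask are installed; the variable names mirror this dataset, but the array sizes here are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# A dimension coordinate ("time_counter") is backed by an IndexVariable,
# which xarray keeps eagerly in memory and never converts to a dask array.
ds = xr.Dataset(
    {"tossq_con": (("time_counter", "y", "x"), np.zeros((4, 6, 8)))},
    coords={"time_counter": pd.date_range("1976-01-01", periods=4)},
)
ds = ds.chunk({"time_counter": 1, "y": 3, "x": 4})

print(ds["tossq_con"].chunks)     # data variable is chunked as requested
print(ds["time_counter"].chunks)  # None -- the index coordinate is untouched
```

The `None` printed for the coordinate is the same "New chunking: None" seen in the debug output above.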

soutobias self-assigned this Sep 23, 2024
@soutobias (Member, Author):

As pointed out by @accowa, this could be an xarray bug (pydata/xarray#6204).

Given this, I think the only ways to solve the problem now are to rename the coordinate (which I don't recommend), or to use the zarr library directly (rather than xarray) for the first upload. In that case, right after uploading the data, I would open the time_counter variable and rechunk it to the chosen strategy. Any data appended later will automatically follow the chunking defined in the first upload.
