[Dev Question] What's the purpose/use of Grid and co types? #31

Closed
Datseris opened this issue Jan 11, 2020 · 4 comments

@Datseris
Contributor

It is useful for the community to have a way to transform a loaded NetCDF variable into an array with dimension data attached to it, like what this package does. We will implement such an interface in NCDatasets.jl (cf. Alexander-Barth/NCDatasets.jl#60), but unfortunately there are numerous AxisArray-like packages. I've had a look at all of them and, at least on the surface, this one seems to be the strongest candidate.

But since it has been developed by a single person, and I would like at least a basic understanding before basing NCDatasets.jl on it, I have some dev questions. The first concerns the Grid types/subtypes/supertypes, which I don't yet understand.

  1. What is their purpose?
  2. Why is the dimension data type by itself not enough? For example, shouldn't a Range always be a "standard grid" (AlignedGrid, as it's called here), while a Vector should always be a BoundedGrid?
@rafaqz
Owner

rafaqz commented Jan 13, 2020

They will eventually move here:
https://github.com/JuliaGeo/DimensionalArrayTraits.jl

The design had a lot of input from multiple people in JuliaGeo, not just me.

The grid type deals with the many ways that dimension indices are stored: ranges, which are equally spaced; vectors (you can't be sure they are equally spaced, but they can be); matrices for all lat/lon combinations; vectors with coordinate transforms; categorical indices; or no index at all.

On some of these grids the Near selector makes sense. But categories logically can't be nearest to anything. They have no size. At almost always makes sense, but might need different techniques for performance.
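A rough sketch of that distinction, using plain Julia dispatch. The type and function names here (`GridSketch`, `SpacedSketch`, `CategoricalSketch`, `at`, `near`) are made up for illustration and are not the package's actual definitions:

```julia
# Hypothetical types for illustration only; not the package's actual definitions.
abstract type GridSketch end
struct SpacedSketch <: GridSketch end       # numeric index where distance is meaningful
struct CategoricalSketch <: GridSketch end  # labels with no notion of distance

# Exact match: defined for every kind of index.
at(::GridSketch, index, x) = findfirst(==(x), index)

# Nearest match: only defined where values can be subtracted and compared.
near(::SpacedSketch, index, x) = argmin(abs.(index .- x))
near(::CategoricalSketch, index, x) =
    throw(ArgumentError("nearest match is undefined for a categorical index"))

lon = 10:10:50
landcover = [:forest, :water, :urban]

i_at   = at(SpacedSketch(), lon, 30)                 # exact hit on a numeric index
i_near = near(SpacedSketch(), lon, 27)               # closest value to 27
i_cat  = at(CategoricalSketch(), landcover, :water)  # exact hit on a label
```

Calling `near(CategoricalSketch(), landcover, :water)` throws, which is the point: the grid type decides which selectors are legal.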

Some datasets come with a dimension reversed. Or the data reversed and the dimension forwards. Or both. If we want searchsortedfirst to work and plots to come out the right way around, we need to know and track which is which all the time. So you need the Ordered type.
(Edit: the reason we don't just fix the array order is that this often happens lazily. It may be expensive to actually reverse the data when we know it's always in reverse, say from some particular C API source like GDAL.)
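A minimal sketch of why the order has to be tracked, assuming hypothetical `Forward`/`Reverse` marker types and a made-up `lookup_index` helper. The only real API used is Base's `searchsortedfirst` with its `rev` keyword:

```julia
# Hypothetical order markers; the names are assumptions, not the package's API.
abstract type OrderSketch end
struct Forward <: OrderSketch end
struct Reverse <: OrderSketch end

# searchsortedfirst assumes ascending order unless told otherwise via `rev`,
# so the stored order decides which call is valid.
lookup_index(index, x, ::Forward) = searchsortedfirst(index, x)
lookup_index(index, x, ::Reverse) = searchsortedfirst(index, x; rev=true)

lat_fwd = [10.0, 20.0, 30.0, 40.0]
lat_rev = reverse(lat_fwd)   # e.g. how some sources deliver latitude

i_fwd = lookup_index(lat_fwd, 30.0, Forward())
i_rev = lookup_index(lat_rev, 30.0, Reverse())
```

Both lookups land on the element holding 30.0 without ever reversing the stored index.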

Then you want a bounding box for each dimension, so that bounds work no matter what the contents are after every subset/view that you take of the data. This is easy with a range, harder with a vector.
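To make the range-versus-vector difference concrete, here is a small sketch in plain Julia. The helper names `bounds_from_ends` and `bounds_by_scan` are invented for this example:

```julia
# Sorted index (either direction): the bounds are just the two ends.
bounds_from_ends(index) = minmax(first(index), last(index))

# Arbitrary vector: nothing can be assumed, so every element must be scanned.
bounds_by_scan(index) = extrema(index)

r = 10:10:500
r_sub = r[3:7]          # indexing a range with a range gives another range

v = [5.0, 1.0, 9.0, 3.0]
v_sub = view(v, 2:4)    # unsorted values 1.0, 9.0, 3.0

range_bounds  = bounds_from_ends(r_sub)   # cheap: two lookups
vector_bounds = bounds_by_scan(v_sub)     # has to touch all elements
wrong_bounds  = bounds_from_ends(v_sub)   # misses 9.0 because v is unsorted
```

This ignores cell width entirely; it only shows that a subset of a range still knows its ends, while a subset of an arbitrary vector does not.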

If you take the mean over the time dimension, the bounds should stay the same but the step size should now cover the whole span. It should do that for any reducing method. Then a plot of aggregated data will show the right time span.
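A sketch of that bookkeeping, under the assumption that each index value marks the start of a cell of width `step`. `RegularDimSketch`, `bounds_sketch`, and `reduce_sketch` are hypothetical names, not the package's implementation:

```julia
# Hypothetical regular dimension: index values mark cell starts (an assumption).
struct RegularDimSketch
    index::Vector{Float64}
    step::Float64
end

# Bounds run from the first cell start to the end of the last cell.
bounds_sketch(d::RegularDimSketch) = (first(d.index), last(d.index) + d.step)

# Reducing over the dimension leaves one cell whose step spans everything.
reduce_sketch(d::RegularDimSketch) =
    RegularDimSketch([first(d.index)], length(d.index) * d.step)

tdim = RegularDimSketch(collect(0.0:1.0:9.0), 1.0)
tdim_reduced = reduce_sketch(tdim)
```

Here `bounds_sketch` gives the same answer before and after the reduction, while the step grows from 1.0 to 10.0.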

etc etc etc.

Grids track all that stuff so that plots always plot correctly, we can always use bounds, and we can use the same methods on really quite different underlying data.

But just use GeoData.jl. It already works for NCDatasets.jl lol

@rafaqz
Owner

rafaqz commented Jan 15, 2020

Also, ranges really map to RegularGrid, not AlignedGrid. Aligned just means not rotated or warped, while Regular means regularly spaced as well.

Arrays can be RegularGrid too if you know they have the same step size. This may be redundant - but the issue is we are working with a bunch of lower level sources that don't always make sense.

You will also notice with LinRange that a single value range has a step of zero:

julia> step((LinRange(1,11,6))[1:2])
2.0

julia> step((LinRange(1,11,6))[1:1])
0.0

But we want to know the step size of a single-value range. We may want to concatenate it with something or plot with the step as a label. So we need to track it somewhere else.
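One way to picture "tracking it somewhere else" is a wrapper that records the step at construction and carries it through indexing. `SteppedIndexSketch` is a hypothetical type written for this example:

```julia
# Hypothetical wrapper that remembers the original step alongside the range.
struct SteppedIndexSketch{R<:AbstractRange}
    index::R
    step::Float64
end

# Record the step once, while the range still has more than one value.
SteppedIndexSketch(r::AbstractRange) = SteppedIndexSketch(r, Float64(step(r)))

# Subsetting narrows the range but keeps the recorded step.
Base.getindex(s::SteppedIndexSketch, I::AbstractUnitRange{<:Integer}) =
    SteppedIndexSketch(s.index[I], s.step)

s = SteppedIndexSketch(LinRange(1, 11, 6))
single = s[1:1]

range_step   = step(single.index)  # what the one-element LinRange reports
tracked_step = single.step         # what was recorded up front
```

The one-element range still reports a zero step, but the recorded value survives the subset.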

AlignedGrid knows nothing about the size of the steps, so the bounds are slightly inaccurate, as the last cell isn't counted.

@Datseris
Contributor Author

Thanks, this clarified a lot. The only question that remains: if you do x = X(10:10:500), the grid that x gets is UnknownGrid(). But since the given object is a range, shouldn't it be automatically detected as RegularGrid?

@rafaqz
Owner

rafaqz commented Jan 15, 2020

UnknownGrid is really a flag for formatdims, which replaces it when you construct the array. RegularGrid is the default and will have the right values for most ranges (except a LinRange of length 1).

It might be better to have the grid selection happen when you make the dim; it has just evolved that way, because the dims are also checked at that point to make sure they match the array. I'll think about moving it. (Also, I think it is because I allow passing a tuple instead of a range, and it makes the range. But I might remove that.)
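The placeholder pattern described above can be sketched roughly like this. It is not the real formatdims; `DimSketch`, `UnknownGridSketch`, `RegularGridSketch`, and `format_sketch` are invented names, and only range indices are handled:

```julia
# Hypothetical stand-ins for the grid flag and the default grid.
struct UnknownGridSketch end
struct RegularGridSketch
    step::Float64
end

struct DimSketch{I,G}
    index::I
    grid::G
end

# A freshly made dim carries only the placeholder grid.
DimSketch(index) = DimSketch(index, UnknownGridSketch())

# At array construction: check the dim against the axis, then swap the placeholder.
function format_sketch(d::DimSketch{<:AbstractRange,UnknownGridSketch}, axis_length::Integer)
    length(d.index) == axis_length ||
        throw(DimensionMismatch("dim length does not match array axis"))
    return DimSketch(d.index, RegularGridSketch(Float64(step(d.index))))
end

# A dim whose grid was set explicitly is only checked, never overwritten.
function format_sketch(d::DimSketch, axis_length::Integer)
    length(d.index) == axis_length ||
        throw(DimensionMismatch("dim length does not match array axis"))
    return d
end

x = DimSketch(10:10:500)
A = zeros(50)
x_formatted = format_sketch(x, size(A, 1))
```

Doing the swap inside the `DimSketch(index)` constructor instead would give the behaviour asked about in the question, at the cost of separating it from the length check.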

We could also improve the grid selection logic and simplify the argument passing for setting bounds etc., although most of the time a package will define the grid, not users.

I also want to polish the no-grid use cases where you just use it like NamedDims as an axis marker, and none of this stuff even happens.
