DiskArrays for Variable's #205

Merged on Sep 29, 2023 (42 commits merged into master from the diskarrays branch).

Changes shown below are from 32 of the 42 commits.

Commits (42):
d75c6cb  replace indexing with read/writeblock! (tcarion, Apr 14, 2023)
378be0d  fix slicing issue when reading (tcarion, Apr 17, 2023)
cb35349  Merge branch 'Alexander-Barth:master' into diskarrays (tcarion, Apr 17, 2023)
75fcfa4  fix other reading indexing (tcarion, Apr 17, 2023)
c3dbfbc  pass more tests (tcarion, Apr 17, 2023)
012678b  small typo (tcarion, Apr 18, 2023)
88f3a5f  macro for diskarrays methods for Variable (tcarion, Apr 24, 2023)
1bac35e  avoid methods conflict with `view` on Variable (tcarion, Apr 24, 2023)
33816fe  adapt some tests (tcarion, Apr 24, 2023)
d11cc27  pass test_check_size (tcarion, Apr 24, 2023)
9e03a54  Merge branch 'master' into diskarrays (tcarion, Apr 24, 2023)
821b67f  fix read and write scalar variable (tcarion, Apr 24, 2023)
e0191ea  dispatch on _write_data_to_nc (tcarion, Apr 24, 2023)
a7551b7  convert data before writing (tcarion, Apr 24, 2023)
b6a4296  dispatch on _read_data_from_nc! (tcarion, Apr 25, 2023)
7ef519c  @rafaqz suggestion (tcarion, Apr 25, 2023)
4371216  write/readblock! for abstractvector inds (@rafaqz) (tcarion, Apr 26, 2023)
f13a09d  fix test_variable (tcarion, Apr 26, 2023)
2e8199b  broadcast for test unlim (tcarion, Apr 26, 2023)
5605f4c  new api for writing to unlim vars (tcarion, Apr 26, 2023)
fce58de  avoid method conflict for view(::Variable) (tcarion, Apr 26, 2023)
3cba846  another unlim dim update (tcarion, Apr 26, 2023)
52acdac  add chunking (tcarion, Apr 26, 2023)
6b0ab28  delegate chunking for cfvar (tcarion, May 16, 2023)
508baca  remove unused method (tcarion, May 16, 2023)
6dec5a0  basic implementation of `grow!` (tcarion, May 16, 2023)
bfb9f16  Merge branch 'master' into diskarrays (tcarion, May 16, 2023)
1f2f5e4  remove grow! from test for now (tcarion, May 16, 2023)
43a1e5b  fix issue with strided ranges (tcarion, May 24, 2023)
9a2735b  Merge branch 'master' into diskarrays (tcarion, Jun 6, 2023)
94c6310  remove comments (tcarion, Jun 7, 2023)
90e585b  remove getindex for cfvariable (tcarion, Jun 8, 2023)
7d32de5  Merge branch 'master' into diskarrays (Jun 12, 2023)
3705d64  fix performance script (Jun 13, 2023)
ec87d27  fix wrong fallback method for readblock! (tcarion, Jun 14, 2023)
6f0c381  Merge branch 'master' into diskarrays (tcarion, Jun 27, 2023)
fd9579d  Merge branch 'Alexander-Barth:master' into diskarrays (tcarion, Jul 4, 2023)
bb58df1  fix growing unlim issue when writing (tcarion, Jul 5, 2023)
d6fbe43  remove grow! (tcarion, Jul 5, 2023)
0873d6d  separate methods for readblock (tcarion, Jul 5, 2023)
7bd3961  Merge branch 'master' into diskarrays (tcarion, Aug 4, 2023)
fc4e099  Merge branch 'Alexander-Barth:master' into diskarrays (tcarion, Sep 25, 2023)
Project.toml (2 additions, 1 deletion)

@@ -10,6 +10,7 @@ CFTime = "179af706-886a-5703-950a-314cd64e0468"
CommonDataModel = "1fbeeb36-5f17-413c-809b-666fb144f157"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DiskArrays = "3c3547ce-8d99-4f5e-a174-61eb10b00ae3"
NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
NetworkOptions = "ca575930-c2e3-43a9-ace4-1e988b2c1908"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
@@ -24,10 +25,10 @@ julia = "1.3"

[extras]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
IntervalSets = "8197267c-284f-5f27-9208-e0e47529a953"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
IntervalSets = "8197267c-284f-5f27-9208-e0e47529a953"

[targets]
test = ["Dates", "Test", "Random", "Printf", "IntervalSets"]
src/NCDatasets.jl (5 additions)

@@ -38,6 +38,9 @@ import CommonDataModel: AbstractDataset, AbstractVariable,
groupnames, group,
dimnames, dim,
attribnames, attrib
import DiskArrays
import DiskArrays: readblock!, writeblock!, eachchunk, haschunks
using DiskArrays: @implement_diskarray

function __init__()
NetCDF_jll.is_available() && init_certificate_authority()
@@ -65,6 +68,8 @@ include("ncgen.jl")
include("select.jl")
include("precompile.jl")

@implement_diskarray NCDatasets.Variable
Contributor:

Are we aiming to move more of these to DiskArrays.jl?

Just having Variable means this PR doesn't allow Rasters.jl integration of GRIB or NetCDF.

Contributor Author (@tcarion, Jun 8, 2023):

I'm experimenting with this on this PR: JuliaGeo/CommonDataModel.jl/pull/9.

The issue I'm currently facing is, I think, related to the growth of unlimited dimensions. As you predicted, this causes problems when CFVariable implements DiskArrays.

Unfortunately, I can't show you the error stack properly here since it needs the combination of this branch with JuliaGeo/CommonDataModel.jl/pull/9. Is there a way to tell GitHub to run CI with a specific branch of a dependency?

Contributor:

You can add a Project.toml and Manifest.toml to /test if you need to, but it's a bit of fiddling and maybe no better than just pasting errors in issues.

Yes, that is fairly predictable. You will need to rebuild the chunking after grow!.
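[Editorial note] For reference, one way to reproduce such a branch combination locally or in CI, sketched under assumptions: the branch name "cdm-diskarrays" is hypothetical and stands in for the branch behind JuliaGeo/CommonDataModel.jl/pull/9.

```julia
# Editorial sketch: run this NCDatasets checkout against a specific branch of
# CommonDataModel.jl in a throwaway environment. The branch name "cdm-diskarrays"
# is hypothetical.
using Pkg
Pkg.activate(temp = true)                      # fresh, disposable environment
Pkg.develop(path = ".")                        # the local NCDatasets.jl checkout
Pkg.add(PackageSpec(url = "https://github.com/JuliaGeo/CommonDataModel.jl",
                    rev = "cdm-diskarrays"))   # pin the dependency to that branch
Pkg.test("NCDatasets")                         # run the test suite with that combination
```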


export CatArrays
export CFTime
export daysinmonth, daysinyear, yearmonthday, yearmonth, monthday
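[Editorial note] An aside on the `@implement_diskarray NCDatasets.Variable` call above: the following is a rough sketch of the contract the macro is assumed to build on, inferred from how this PR uses it. The `ToyDiskVector` type is hypothetical, not a verbatim DiskArrays.jl listing.

```julia
# Illustrative sketch of the assumed DiskArrays.jl contract: a type that defines
# block reads/writes and chunk queries gets the array interface wired up for it.
import DiskArrays
import DiskArrays: readblock!, writeblock!, eachchunk, haschunks

struct ToyDiskVector{T} <: AbstractVector{T}
    data::Vector{T}            # stands in for data living on disk
end
Base.size(v::ToyDiskVector) = size(v.data)

# Copy the requested block into the preallocated output buffer `aout`.
readblock!(v::ToyDiskVector, aout, r::AbstractUnitRange...) = (aout .= view(v.data, r...))
# Write the block `ain` back at the requested ranges.
writeblock!(v::ToyDiskVector, ain, r::AbstractUnitRange...) = (v.data[r...] = ain; ain)
haschunks(v::ToyDiskVector) = DiskArrays.Unchunked()
eachchunk(v::ToyDiskVector) = DiskArrays.GridChunks(v, size(v))   # one chunk covering everything

# The macro then routes indexing, broadcasting, iteration, etc. through the methods above.
DiskArrays.@implement_diskarray ToyDiskVector
```

With those four methods in place, scalar and range indexing go through block reads and writes, which is what this PR sets up for Variable.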
src/cfvariable.jl (1 addition, 1 deletion)

@@ -403,7 +403,7 @@ function _range_indices_dest(of,v,rest...)
end
range_indices_dest(ri...) = _range_indices_dest((),ri...)

function Base.getindex(v::Union{CFVariable,Variable,MFVariable,SubVariable},indices::Union{Int,Colon,AbstractRange{<:Integer},Vector{Int}}...)
function Base.getindex(v::Union{MFVariable,SubVariable},indices::Union{Int,Colon,AbstractRange{<:Integer},Vector{Int}}...)
@debug "transform vector of indices to ranges"

sz_source = size(v)
src/subvariable.jl (1 addition, 1 deletion)

@@ -91,7 +91,7 @@ close(ds)
```

"""
Base.view(v::AbstractVariable,indices::Union{Int,Colon,AbstractVector{Int}}...) = SubVariable(v,indices...)
Base.view(v::Union{CFVariable, DeferVariable, MFCFVariable},indices::Union{Int,Colon,AbstractVector{Int}}...) = SubVariable(v,indices...)
Base.view(v::SubVariable,indices::CartesianIndex) = view(v,indices.I...)
Base.view(v::SubVariable,indices::CartesianIndices) = view(v,indices.indices...)

src/variable.jl (50 additions, 131 deletions)

@@ -309,85 +309,73 @@
nomissing(a::AbstractArray,value) = a
export nomissing


function Base.getindex(v::Variable,indexes::Int...)
function readblock!(v::Variable, aout, indexes::TI...) where TI <: Union{AbstractUnitRange,StepRange}
Contributor:

Suggested change:
- function readblock!(v::Variable, aout, indexes::TI...) where TI <: Union{AbstractUnitRange,StepRange}
+ function readblock!(v::Variable, aout, indexes::Union{<:AbstractUnitRange,<:StepRange}...)

This dispatch was not correct so we were using the fallback for AbstractVector.

I get roughly the same performance as master after this change.
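[Editorial note] A toy illustration of the dispatch issue (not DiskArrays internals): a type variable used in a Vararg is "diagonal", so every index argument must share one concrete type, and mixed range types fall through to the broader method.

```julia
# Toy functions only: why `indexes::TI...` with a shared type variable misses
# mixed index types, while a per-argument Union does not.
f(indexes::TI...) where TI <: Union{AbstractUnitRange,StepRange} = :specific
f(indexes::AbstractVector...) = :fallback

g(indexes::Union{AbstractUnitRange,StepRange}...) = :specific
g(indexes::AbstractVector...) = :fallback

f(1:3, 4:6)     # :specific, both arguments are UnitRange{Int}
f(1:3, 1:2:9)   # :fallback, no single concrete TI fits both UnitRange and StepRange
g(1:3, 1:2:9)   # :specific, each argument matches the Union on its own
```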

datamode(v.ds)
return nc_get_var1(eltype(v),v.ds.ncid,v.varid,[i-1 for i in indexes[ndims(v):-1:1]])
_read_data_from_nc!(v, aout, indexes...)
return aout
end

function Base.setindex!(v::Variable{T,N},data,indexes::Int...) where N where T
@debug "$(@__LINE__)"
datamode(v.ds)
# use zero-based indexes and reversed order
nc_put_var1(v.ds.ncid,v.varid,[i-1 for i in indexes[ndims(v):-1:1]],T(data))
return data
function _read_data_from_nc!(v::Variable, aout, indexes::Int...)
aout .= nc_get_var1(eltype(v),v.ds.ncid,v.varid,[i-1 for i in reverse(indexes)])
end

function Base.getindex(v::Variable{T,N},indexes::Colon...) where {T,N}
datamode(v.ds)
data = Array{T,N}(undef,size(v))
nc_get_var!(v.ds.ncid,v.varid,data)

# special case for scalar NetCDF variable
if N == 0
return data[]
else
return data
end
function _read_data_from_nc!(v::Variable{T,N}, aout, indexes::TR...) where {T,N} where TR <: Union{StepRange{Int,Int},UnitRange{Int}}
start,count,stride,jlshape = ncsub(indexes)
nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,aout)
end

function Base.setindex!(v::Variable{T,N},data::T,indexes::Colon...) where {T,N}
@debug "setindex! colon $data"
datamode(v.ds) # make sure that the file is in data mode
tmp = fill(data,size(v))
nc_put_var(v.ds.ncid,v.varid,tmp)
return data
function _read_data_from_nc!(v::Variable{T,N}, aout, indexes::Union{Int,Colon,AbstractRange{<:Integer}}...) where {T,N}
sz = size(v)
start,count,stride = ncsub2(sz,indexes...)
jlshape = _shape_after_slice(sz,indexes...)
nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,aout)
end

# union types cannot be used to avoid ambiguity
for data_type = [Number, String, Char]
@eval begin
# call to v .= 123
function Base.setindex!(v::Variable{T,N},data::$data_type) where {T,N}
@debug "setindex! $data"
datamode(v.ds) # make sure that the file is in data mode
tmp = fill(convert(T,data),size(v))
nc_put_var(v.ds.ncid,v.varid,tmp)
return data
end
_read_data_from_nc!(v::Variable, aout) = _read_data_from_nc!(v, aout, 1)

Base.setindex!(v::Variable,data::$data_type,indexes::Colon...) = setindex!(v::Variable,data)
function writeblock!(v::Variable, data, indexes::TI...) where TI <: Union{AbstractUnitRange,StepRange}
datamode(v.ds)
_write_data_to_nc(v, data, indexes...)
return data
end

function Base.setindex!(v::Variable{T,N},data::$data_type,indexes::StepRange{Int,Int}...) where {T,N}
datamode(v.ds) # make sure that the file is in data mode
start,count,stride,jlshape = ncsub(indexes[1:ndims(v)])
tmp = fill(convert(T,data),jlshape)
nc_put_vars(v.ds.ncid,v.varid,start,count,stride,tmp)
return data
end
end
function _write_data_to_nc(v::Variable{T,N},data,indexes::Int...) where {T,N}
nc_put_var1(v.ds.ncid,v.varid,[i-1 for i in reverse(indexes)],T(data[1]))
end

function Base.setindex!(v::Variable{T,N},data::AbstractArray{T,N},indexes::Colon...) where {T,N}
datamode(v.ds) # make sure that the file is in data mode
_write_data_to_nc(v::Variable, data) = _write_data_to_nc(v, data, 1)

nc_put_var(v.ds.ncid,v.varid,data)
return data
function _write_data_to_nc(v::Variable{T, N}, data, indexes::StepRange{Int,Int}...) where {T, N}
start,count,stride,jlshape = ncsub(indexes)
nc_put_vars(v.ds.ncid,v.varid,start,count,stride,T.(data))
Contributor (@rafaqz, Apr 24, 2023):

Probably DiskArrays.jl won't let you do this optimisation either. You may need to override these methods on OrdinalRange (https://github.com/meggart/DiskArrays.jl/blob/main/src/batchgetindex.jl#L166-L216) so that it can happen here rather than in DiskArrays.jl.

Just the readblock! and writeblock! methods, and rewrite the internals from _readblock! and _writeblock!; they're just separate to make the macros easier to work on.

But you can also leave this optimisation for later too.

Contributor:

I think defining readblock! and writeblock! for StepRanges should be sufficient to go into these optimisations. DiskArrays does provide some fallback routines in case only AbstractUnitRanges are supported by the backend, but they should not be called in this case.
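[Editorial note] For context, a sketch of the access pattern this optimisation targets; the file "example.nc" and the variable name are hypothetical.

```julia
# Editorial sketch. With the StepRange methods in this diff, the goal is for a
# strided selection to map onto a single strided nc_get_vars! call rather than a
# dense read followed by an in-memory subset.
using NCDatasets
ds = NCDataset("example.nc")   # hypothetical file containing a variable "v"
v = ds["v"].var                # the low-level NCDatasets.Variable
a = v[1:2:21, 5:10]            # strided read along the first dimension
close(ds)
```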

end

function Base.setindex!(v::Variable{T,N},data::AbstractArray{T2,N},indexes::Colon...) where {T,T2,N}
datamode(v.ds) # make sure that the file is in data mode
tmp =
if T <: Integer
round.(T,data)
else
convert(Array{T,N},data)
end
function _write_data_to_nc(v::Variable, data, indexes::Union{AbstractRange{<:Integer}}...)
ind = prod(length.(indexes)) == 1 ? first.(indexes) : normalizeindexes(size(v),indexes)
return _write_data_to_nc(v, data, ind...)
end

nc_put_var(v.ds.ncid,v.varid,tmp)
return data
function grow!(v::CFVariable, data, indexes::Union{Integer, Colon}...)
unlimdims = unlimited(Dimensions(CommonDataModel.dataset(v)))
length(unlimdims) == 1 || error("Only 1 unlimited dimension is supported")
alldims = dimnames(v)
iunlimdim = findfirst(==(unlimdims[1]), alldims)
icol = findfirst(x -> x isa Colon, indexes)
Contributor:

What if there are other colons? Shouldn't you just check that the iunlimdim index is a Colon?

Contributor Author:

You're right, I must admit I implemented this pretty rapidly, and I didn't really test it. But I guess we'll have to look more deeply into it given the issue above!

Contributor:

As you saw with the bugs in the DiskArrays edge cases, it's actually pretty hard to do it properly.

iunlimdim == icol || ArgumentError("The Colon in the arguments must match the unlimited dimension.")
inds = replace(indexes, Colon() => 1:length(data))
v[inds...] = data
return v
end

getchunksize(v::Variable) = getchunksize(haschunks(v),v)
getchunksize(::DiskArrays.Chunked, v::Variable) = chunking(v)[2]
# getchunksize(::DiskArrays.Unchunked, v::Variable) = DiskArrays.estimate_chunksize(v)
getchunksize(::DiskArrays.Unchunked, v::Variable) = size(v)
eachchunk(v::CFVariable) = eachchunk(v.var)
Contributor:

These are defined on CFVariable, but it's not implementing the rest of the DiskArrays.jl interface.
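[Editorial note] For reference, a sketch of how the chunking queries are intended to be used; the file "chunked.nc" and the variable name are hypothetical.

```julia
# Editorial sketch of chunk-aligned access through the methods defined in this diff.
using NCDatasets
import DiskArrays

ds = NCDataset("chunked.nc")          # hypothetical file with a chunked variable "temp"
v = ds["temp"].var                    # low-level Variable
DiskArrays.haschunks(v)               # DiskArrays.Chunked() or DiskArrays.Unchunked()
for c in DiskArrays.eachchunk(v)      # each c is a tuple of index ranges for one chunk
    block = v[c...]                   # read exactly one chunk through readblock!
    # ... process block ...
end
close(ds)
```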

haschunks(v::CFVariable) = haschunks(v.var)
eachchunk(v::Variable) = DiskArrays.GridChunks(v, Tuple(getchunksize(v)))
haschunks(v::Variable) = (chunking(v)[1] == :contiguous ? DiskArrays.Unchunked() : DiskArrays.Chunked())

_normalizeindex(n,ind::Base.OneTo) = 1:1:ind.stop
_normalizeindex(n,ind::Colon) = 1:1:n
_normalizeindex(n,ind::Int) = ind:1:ind
@@ -441,74 +429,5 @@ end
return start,count,stride
end

function Base.getindex(v::Variable{T,N},indexes::TR...) where {T,N} where TR <: Union{StepRange{Int,Int},UnitRange{Int}}
start,count,stride,jlshape = ncsub(indexes[1:N])
data = Array{T,N}(undef,jlshape)

datamode(v.ds)
nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,data)
return data
end

function Base.setindex!(v::Variable{T,N},data::T,indexes::StepRange{Int,Int}...) where {T,N}
datamode(v.ds) # make sure that the file is in data mode
start,count,stride,jlshape = ncsub(indexes[1:ndims(v)])
tmp = fill(data,jlshape)
nc_put_vars(v.ds.ncid,v.varid,start,count,stride,tmp)
return data
end

function Base.setindex!(v::Variable{T,N},data::Array{T,N},indexes::StepRange{Int,Int}...) where {T,N}
datamode(v.ds) # make sure that the file is in data mode
start,count,stride,jlshape = ncsub(indexes[1:ndims(v)])
nc_put_vars(v.ds.ncid,v.varid,start,count,stride,data)
return data
end

# data can be Array{T2,N} or BitArray{N}
function Base.setindex!(v::Variable{T,N},data::AbstractArray,indexes::StepRange{Int,Int}...) where {T,N}
datamode(v.ds) # make sure that the file is in data mode
start,count,stride,jlshape = ncsub(indexes[1:ndims(v)])

tmp = convert(Array{T,ndims(data)},data)
nc_put_vars(v.ds.ncid,v.varid,start,count,stride,tmp)

return data
end




function Base.getindex(v::Variable{T,N},indexes::Union{Int,Colon,AbstractRange{<:Integer}}...) where {T,N}
sz = size(v)
start,count,stride = ncsub2(sz,indexes...)
jlshape = _shape_after_slice(sz,indexes...)
data = Array{T}(undef,jlshape)

datamode(v.ds)
nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,data)

return data
end

# NetCDF scalars indexed as []
Base.getindex(v::Variable{T, 0}) where T = v[1]



function Base.setindex!(v::Variable,data,indexes::Union{Int,Colon,AbstractRange{<:Integer}}...)
ind = normalizeindexes(size(v),indexes)

# make arrays out of scalars (arrays can have zero dimensions)
if (ndims(data) == 0) && !(data isa AbstractArray)
data = fill(data,length.(ind))
end

return v[ind...] = data
end


Base.getindex(v::Union{MFVariable,DeferVariable,Variable},ci::CartesianIndices) = v[ci.indices...]
Base.setindex!(v::Union{MFVariable,DeferVariable,Variable},data,ci::CartesianIndices) = setindex!(v,data,ci.indices...)


Base.getindex(v::Union{MFVariable,DeferVariable},ci::CartesianIndices) = v[ci.indices...]
Base.setindex!(v::Union{MFVariable,DeferVariable},data,ci::CartesianIndices) = setindex!(v,data,ci.indices...)
test/test_check_size.jl (6 additions, 6 deletions)

@@ -17,10 +17,10 @@ defVar(ds, "w", Float64, ("x", "Time"))

for i in 1:10
ds["Time"][i] = i
ds["a"][:,i] = 1
@test_throws NCDatasets.NetCDFError ds["u"][:,i] = collect(1:9)
@test_throws NCDatasets.NetCDFError ds["v"][:,i] = collect(1:11)
@test_throws NCDatasets.NetCDFError ds["w"][:,i] = reshape(collect(1:20), 10, 2)
ds["a"][:,i] .= 1
@test_throws DimensionMismatch ds["u"][:,i] = collect(1:9)
@test_throws DimensionMismatch ds["v"][:,i] = collect(1:11)
@test_throws DimensionMismatch ds["w"][:,i] = reshape(collect(1:20), 10, 2)

# ignore singleton dimension
ds["w"][:,i] = reshape(collect(1:10), 1, 1, 10, 1)
@@ -29,11 +29,11 @@ end
ds["w"][:,:] = ones(10,10)

# w should grow along the unlimited dimension
ds["w"][:,:] = ones(10,15)
ds["w"][:,1:15] = ones(10,15)
@test size(ds["w"]) == (10,15)

# w cannot grow along a fixed dimension
@test_throws NCDatasets.NetCDFError ds["w"][:,:] = ones(11,15)
@test_throws DimensionMismatch ds["w"][:,:] = ones(11,15)

# NetCDF: Index exceeds dimension bound
@test_throws NCDatasets.NetCDFError ds["u"][100,100]
test/test_corner_cases.jl (1 addition, 1 deletion)

@@ -46,7 +46,7 @@ b = dropdims([1.], dims=(1,))
NCDataset(fname,"c") do ds
time = defDim(ds,"time",Inf)
v = defVar(ds,"temp",Float32,("time",))
ds["temp"][1:1] = b
ds["temp"][1] = b
@test ds["temp"][1] == 1
end

test/test_scalar.jl (1 addition, 1 deletion)

@@ -32,7 +32,7 @@ for (T,data) in ((Float32,123.f0),
end

NCDataset(filename,"r") do ds
v2 = ds["scalar"][:]
v2 = ds["scalar"][1]
@test v2 == data
end
rm(filename)
test/test_variable.jl (4 additions, 4 deletions)

@@ -19,7 +19,7 @@ NCDatasets.NCDataset(filename,"c") do ds

v = NCDatasets.defVar(ds,"small",Float64,("lon","lat"))
# @test_throws Union{NCDatasets.NetCDFError,DimensionMismatch} v[:] = zeros(sz[1]+1,sz[2])
@test_throws NCDatasets.NetCDFError v[1:sz[1],1:sz[2]] = zeros(sz[1]+1,sz[2])
@test_throws DimensionMismatch v[1:sz[1],1:sz[2]] = zeros(sz[1]+1,sz[2])
@test_throws NCDatasets.NetCDFError v[sz[1]+1,1] = 1
@test_throws NCDatasets.NetCDFError v[-1,1] = 1

@@ -78,7 +78,7 @@ NCDataset(filename,"c") do ds
"units" => "degree_Celsius",
"long_name" => "Temperature"
])
@test ds["temp"][:] == data
@test ds["temp"][:] == data[:]
@test eltype(ds["temp"].var) == Int32
@test ds.dim["lon"] == sz[1]
@test ds.dim["lat"] == sz[2]
@@ -149,10 +149,10 @@ NCDataset(filename,"c") do ds
end

defVar(ds,"scalar",123.)
@test ds["scalar"][:] == 123.
@test ds["scalar"][1] == 123.

# test indexing with symbols #101
@test ds[:scalar][:] == 123.
@test ds[:scalar][1] == 123.
end
rm(filename)

test/test_variable_unlim.jl (1 addition, 1 deletion)

@@ -19,7 +19,7 @@ NCDatasets.NCDataset(filename,"c") do ds

for j = 1:sz[2]
data[:,j] .= T(j)
v[:,j] = T(j)
v[:,j] = fill(T(j), sz[1])
end

@test all(v[:,:] == data)