Write seems to consume a lot of memory #317

Closed
ayushpatnaikgit opened this issue Sep 25, 2022 · 13 comments · Fixed by #320

@ayushpatnaikgit

Hi,
I have a relatively large data cube of size 7014×7277×1×123, and I want to write it as a netCDF file. I have plenty of free memory relative to the size of the data cube, but write still throws an out-of-memory error.

julia> varinfo()
  name                    size summary
  –––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––
  Base                         Module
  Core                         Module
  InteractiveUtils 254.190 KiB Module
  Main                         Module
  ans                  0 bytes Nothing
  datacube           23.388 GiB 7014×7277×1×123 Raster{Float32,4}

julia> write("test.nc", datacube)
ERROR: OutOfMemoryError()

Relative to the data cube, I seem to have a lot more memory.

ayush@crayshrimp:~$ free -h
              total        used        free      shared  buff/cache   available                                                                  
Mem:           62Gi        15Gi        47Gi       0.0Ki       314Mi        46Gi                                                                  
Swap:          31Gi        25Gi       6.9Gi

It seems write is consuming a lot of memory. On the other hand, I was able to use Raster to load this same data without any issues.
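
As a back-of-the-envelope check (a sketch, not output from the session above): the cube holds 7014×7277×1×123 Float32 values at 4 bytes each, which agrees with the size varinfo() reports. Any step in the write path that copies the whole array would need roughly the same amount again. The variable names are illustrative.

```julia
# Rough memory arithmetic for the data cube; names are illustrative.
dims = (7014, 7277, 1, 123)
nbytes = prod(dims) * sizeof(Float32)   # raw payload in bytes
gib = nbytes / 2^30                     # payload in GiB

println("array payload:      ", round(gib; digits = 3), " GiB")
println("with one full copy: ", round(2gib; digits = 3), " GiB")
```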

@rafaqz
Owner

rafaqz commented Sep 25, 2022

Ok interesting. I wonder if it's in Rasters.jl or NCDatasets.jl.

With GDAL formats such as TIFF you can stream-write larger-than-memory files with DiskArrays.jl, but this isn't implemented in NCDatasets.jl.

This TODO comment in the source describes the solution:

var = NCD.defVar(ds, key, eltyp, dimnames; attrib=attrib, kw...)
# TODO do this with DiskArrays broadcast ??
var[:] = parent(read(A))

@ayushpatnaikgit
Author

In this case, the variable is a lot smaller than the memory, so there shouldn't be a need to stream write.
I have saved it as a .jld file using JLD.jl for now.

For reference, here is the full error message:

ERROR: OutOfMemoryError()                                                                                                                         
Stacktrace:                                                                                                                                       
  [1] Array                                                                                                                                       
    @ ./boot.jl:469 [inlined]                                                                                                                     
  [2] Array                                                                                                                                       
    @ ./array.jl:563 [inlined]                                                                                                                    
  [3] Array                                                                                                                                       
    @ ./boot.jl:481 [inlined]                                                                                                                     
  [4] modify(f::Type{Array}, A::Raster{Float32, 4, Tuple{X{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered
, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.
LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, X{Colon}}}, Y{Mapped{Float
64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, Di
mensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{
Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, Y{Colon}}}, Band{DimensionalData.Dimensions.LookupArrays.Categorical{Int64, Vector{Int64}, Dimens
ionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}}, Ti{Dimen
sionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimension
s.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, D
ict{Symbol, Any}}}}}, Tuple{}, Array{Float32, 4}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, Fl
oat32})                                                                                                                                           
    @ Rasters ~/.julia/packages/Rasters/fANS1/src/array.jl:117                                                                                    
  [5] read(x::Raster{Float32, 4, Tuple{X{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.
Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Cente
r}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, X{Colon}}}, Y{Mapped{Float64, Vector{Float64
}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dime
nsions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, D
ict{Symbol, Any}}, EPSG, EPSG, Y{Colon}}}, Band{DimensionalData.Dimensions.LookupArrays.Categorical{Int64, Vector{Int64}, DimensionalData.Dimensio
ns.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}}, Ti{DimensionalData.Dimensi
ons.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Reg
ular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}
}}, Tuple{}, Array{Float32, 4}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, Float32})
    @ Rasters ~/.julia/packages/Rasters/fANS1/src/read.jl:9
  [6] _ncdwritevar!(ds::NCDatasets.NCDataset{Nothing}, A::Raster{Float32, 4, Tuple{X{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.L
ookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{D
imensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, 
X{Colon}}}, Y{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Exp
licit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensi
ons.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, Y{Colon}}}, Band{DimensionalData.Dimensions.LookupArrays.Categorical{In
t64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict
{Symbol, Any}}}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered,
 DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.M
etadata{Rasters.NCDfile, Dict{Symbol, Any}}}}}, Tuple{}, Array{Float32, 4}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfi
le, Dict{Symbol, Any}}, Missing}; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters ~/.julia/packages/Rasters/fANS1/src/sources/ncdatasets.jl:409
  [7] _ncdwritevar!
    @ ~/.julia/packages/Rasters/fANS1/src/sources/ncdatasets.jl:382 [inlined]
  [8] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Float32, 4, Tuple{X{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.L
ookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{D
imensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, 
X{Colon}}}, Y{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Exp
licit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensi
ons.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, Y{Colon}}}, Band{DimensionalData.Dimensions.LookupArrays.Categorical{In
t64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict
{Symbol, Any}}}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered,
 DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.M
etadata{Rasters.NCDfile, Dict{Symbol, Any}}}}}, Tuple{}, Array{Float32, 4}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfi
le, Dict{Symbol, Any}}, Missing}; append::Bool, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters ~/.julia/packages/Rasters/fANS1/src/sources/ncdatasets.jl:67
  [9] write
    @ ~/.julia/packages/Rasters/fANS1/src/sources/ncdatasets.jl:64 [inlined]                                                                     
 [10] write(filename::String, A::Raster{Float32, 4, Tuple{X{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, X{Colon}}}, Y{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, Y{Colon}}}, Band{DimensionalData.Dimensions.LookupArrays.Categorical{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}}}, Tuple{}, Array{Float32, 4}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, Missing}; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})                                                                      
    @ Rasters ~/.julia/packages/Rasters/fANS1/src/write.jl:11
 [11] write(filename::String, A::Raster{Float32, 4, Tuple{X{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, X{Colon}}}, Y{Mapped{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Explicit{Matrix{Float64}}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Center}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, EPSG, EPSG, Y{Colon}}}, Band{DimensionalData.Dimensions.LookupArrays.Categorical{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}}}}, Tuple{}, Array{Float32, 4}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{Symbol, Any}}, Missing})
    @ Rasters ~/.julia/packages/Rasters/fANS1/src/write.jl:11
 [12] top-level scope
    @ REPL[14]:1

@rafaqz
Owner

rafaqz commented Sep 26, 2022

The problem is that NCDatasets.jl is making a full copy of the array before writing it. There may be a way around this.

@ayushpatnaikgit
Author

ayushpatnaikgit commented Sep 27, 2022

I took the example from NCDatasets.jl/README.md and modified it to increase the size of the matrix. The write functionality seems to work.

using NCDatasets
using DataStructures
# This creates a new NetCDF file test.nc.
# The mode "c" stands for creating a new file (clobber)
ds = Dataset("test.nc","c")

defDim(ds,"lon",7014)
defDim(ds,"lat",7277)
defDim(ds, "time", 123)

# Define a global attribute
ds.attrib["title"] = "this is a test file"

# Define the variables temperature with the attribute units
v = defVar(ds,"temperature",Float32,("lon","lat","time"), attrib = OrderedDict(
    "units" => "degree Celsius"))

# add additional attributes
v.attrib["comments"] = "this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx"

# Generate some example data

data = [Float32(i+j+k) for i = 1:7014, j = 1:7277, k = 1:123]

varinfo()
  name                    size summary
  –––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––
  Base                         Module
  Core                         Module
  InteractiveUtils 254.190 KiB Module
  Main                         Module
  ans                 23.387 GiB 7014×7277×123 Matrix{Float32}
  data                23.387 GiB 7014×7277×123 Matrix{Float32}
   v                  601 bytes 7014×7277×123 NCDatasets.CFVariable{Float32, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Nothing, Nothing, Nothing, Nothing, Nothing}}}
v[:,:,:] = data # I believe this is by reference and not a copy. 
close(ds)
exit()

However:

using Rasters
raster = Raster("test.nc") # Everything is fine till this point. The data generated earlier is loaded correctly. 
write("test2.nc", raster)                
ERROR: OutOfMemoryError()       

@rafaqz
Owner

rafaqz commented Sep 27, 2022

Yeah, the problem was the generic setindex! syntax I was using; I wasn't aware that some forms of setindex! on NCDatasets.Variable allocate a new array. This should be fixed now with #320.

(Try your example with v[:] = and you will get the same error)

@ayushpatnaikgit
Author

ayushpatnaikgit commented Sep 28, 2022

For the example from NCDatasets.jl/README.md, Rasters.write on the fix_ncd_write branch works, but v[:] = data also works.

I am testing the original case in which the error first occurred; give me a couple of days.

@ayushpatnaikgit
Author

There is a different issue writing the original datacube this time. I think missing is causing some trouble.

I have tried to replicate it using the example from NCDatasets.jl/README.md.

julia> using Rasters
julia> raster = Raster("test.nc") # test.nc is described above at: https://github.com/rafaqz/Rasters.jl/issues/317#issuecomment-1258990373
julia> write("test2.nc", raster) # this works now (using fix_ncd_write branch)

julia> raster = rebuild(raster; missingval = nothing)
julia> raster = replace_missing(raster, missing)
julia> write("test2.nc", raster) # This doesn't work. 

ERROR: OutOfMemoryError()                                                                        
Stacktrace:                                                                                      
  [1] Array                                                                                      
    @ ./boot.jl:461 [inlined]
  [2] Array
    @ ./boot.jl:468 [inlined]
  [3] CFinvtransformdata
    @ ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:702 [inlined]
  [4] setindex!(::NCDatasets.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 
3, NCDatasets.NCDataset{Nothing}}, NCDatasets.Attributes{NCDatasets.NCDataset{Nothing}}, NamedTup
le{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_facto
r), Tuple{Float32, Tuple{}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, ::Array{Float32, 3},
::Colon, ::Colon, ::Colon)
    @ NCDatasets ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:764
  [5] _ncdwritevar!(ds::NCDatasets.NCDataset{Nothing}, A::Raster{Union{Missing, Float32}, 3, Tupl
e{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, Dimen
sionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Sym
bol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missin
g}; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:410
  [6] _ncdwritevar!
    @ ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:380 [inlined]
  [7] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Union{Missing, Float32}, 3, Tupl
e{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, Dimen
sionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Sym
bol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missin
g}; append::Bool, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:67
  [8] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Union{Missing, Float32}, 3, Tupl
e{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, Dimen
sionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Sym
bol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missin
g})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:64                            
  [9] write(filename::String, A::Raster{Union{Missing, Float32}, 3, Tuple{Dim{:lon, DimensionalDa
ta.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.Loo
kupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Symbol, DimensionalData.Dime
nsions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing}; kw::Base.Pairs{Symbol
, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
 [10] write(filename::String, A::Raster{Union{Missing, Float32}, 3, Tuple{Dim{:lon, DimensionalDa
ta.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.Loo
kupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Symbol, DimensionalData.Dime
nsions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
 [11] top-level scope
    @ REPL[52]:1

julia> varinfo()
  name                    size summary                                        
  –––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––––––––––––––––
  Base                         Module                                         
  Core                         Module                                         
  InteractiveUtils 254.190 KiB Module                                         
  Main                         Module                                         
  ans                  0 bytes Nothing
  raster            29.234 GiB 7014×7277×123 Raster{Union{Missing, Float32},3}

I seem to have sufficient memory.

ayush@crayshrimp:~$ free -h
              total        used        free      shared  buff/cache   available                 
Mem:           62Gi        15Gi        47Gi       0.0Ki       281Mi        46Gi                 
Swap:          31Gi        20Gi        11Gi
ayush@crayshrimp:~$
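
The jump from 23.387 GiB to 29.234 GiB is consistent with how Julia stores an array of Union{Missing, Float32}: the 4-byte payload plus one selector byte per element. The sketch below only does the arithmetic; the names are illustrative, and the "one more full Float32 array" figure is an assumption based on the Array allocation inside CFinvtransformdata in the stack trace above.

```julia
# Size arithmetic for the 7014×7277×123 cube; names are illustrative.
n = 7014 * 7277 * 123

plain_gib = n * sizeof(Float32) / 2^30          # Array{Float32, 3}
union_gib = n * (sizeof(Float32) + 1) / 2^30    # payload + 1 selector byte per element

# Assumption: the write path allocates one more full Float32 array.
peak_gib = union_gib + plain_gib

println("Float32 array:                 ", round(plain_gib; digits = 3), " GiB")
println("Union{Missing, Float32} array: ", round(union_gib; digits = 3), " GiB")
println("with one temporary Float32 copy: ", round(peak_gib; digits = 3), " GiB")
```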

Even the following doesn't work:

julia> raster = Float32.(raster)
julia> write("test2.nc", raster) # This doesn't work. 
ERROR: OutOfMemoryError()                                                                        
Stacktrace:                                                                                      
  [1] Array                                                                                      
    @ ./boot.jl:461 [inlined]                                                                    
  [2] Array                                                                                      
    @ ./boot.jl:468 [inlined]                                                                    
  [3] Array                                                                                      
    @ ./array.jl:563 [inlined]                                                                   
  [4] Array                                                                                      
    @ ./boot.jl:481 [inlined]                                                                    
  [5] modify(f::Type{Array}, A::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.Loo
kupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLooku
p{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int
64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rast
ers.NCDfile, Dict{String, Any}}, Float32})                                                       
    @ Rasters ~/JuliaPackages/Rasters.jl/src/array.jl:117                                        
  [6] read(x::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup
{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64
}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, A
rray{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{
String, Any}}, Float32})   
 [7] _ncdwritevar!(ds::NCDatasets.NCDataset{Nothing}, A::Raster{Float32, 3, Tuple{Dim{:lon, Dime
nsionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArra
ys.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.
LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing}; kw::Base.Pairs{Symbol, Union
{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:410
  [8] _ncdwritevar!
    @ ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:380 [inlined]
  [9] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Float32, 3, Tuple{Dim{:lon, Dime
nsionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArra
ys.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.
LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing}; append::Bool, kw::Base.Pairs
{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
 [10] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Float32, 3, Tuple{Dim{:lon, Dime
nsionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArra
ys.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.
LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:64
 [11] write(filename::String, A::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.Lo
okupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{In
t64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Ras
ters.NCDfile, Dict{String, Any}}, Missing}; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(
), Tuple{}}})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
 [12] write(filename::String, A::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.Lo
okupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{In
t64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Ras
ters.NCDfile, Dict{String, Any}}, Missing})
    @ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
 [13] top-level scope
    @ REPL[57]:1

julia> varinfo()
  name                    size summary                                        
  –––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––––––––––––––––
  Base                         Module                                         
  Core                         Module                                         
  InteractiveUtils 254.190 KiB Module                                         
  Main                         Module                                         
  ans                  0 bytes Nothing
  raster            23.387 GiB 7014×7277×123 Raster{Float32,3}

@rafaqz
Owner

rafaqz commented Sep 30, 2022

Does this work with NCDatasets.jl?

We really need to be able to broadcast the write to disk chunk by chunk, as we do with TIF files in the ArchGDAL backend.

Then we can broadcast the replacement of missing rather than making a copy of the array.

The problem is that any manipulation of your data as a whole array at least doubles the memory use.

(We may need to write with NetCDF.jl instead of NCDatasets.jl, then we can broadcast the write chunk by chunk)
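
A minimal sketch of the chunk-by-chunk idea, using only Base Julia and a raw binary file as a stand-in for a NetCDF variable. It replaces missing slice by slice through an in-place broadcast, so the only extra allocation is a single slice-sized buffer. `write_slices` and `fillvalue` are hypothetical names, not Rasters.jl, NetCDF.jl, or NCDatasets.jl API.

```julia
# Stream a 3-D array to a raw binary file one time slice at a time,
# replacing `missing` with a fill value per slice.
# `write_slices` and `fillvalue` are hypothetical names for illustration.
function write_slices(path::AbstractString,
                      A::AbstractArray{<:Union{Missing, Float32}, 3};
                      fillvalue::Float32 = -9999f0)
    buf = Array{Float32}(undef, size(A, 1), size(A, 2))   # one reusable slice buffer
    open(path, "w") do io
        for k in axes(A, 3)
            # Fused in-place broadcast: writes into `buf`, no full-size copy of A.
            buf .= coalesce.(view(A, :, :, k), fillvalue)
            write(io, buf)
        end
    end
    return path
end

# Small stand-in for the real cube.
A = Array{Union{Missing, Float32}}(undef, 4, 3, 2)
A .= reshape(Float32.(1:24), 4, 3, 2)
A[1, 1, 1] = missing

path = write_slices(tempname(), A)

# Read the raw values back to check the round trip.
back = Array{Float32}(undef, size(A))
read!(path, back)
```

With this pattern, peak extra memory scales with one slice rather than with the whole array; a real backend would write each slice into the on-disk variable instead of a raw file.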

@ayushpatnaikgit
Author

No, it doesn't work.

# Continuing from https://github.com/rafaqz/Rasters.jl/issues/317#issuecomment-1258990373

julia> data = convert(Array{Union{Missing, Float32}, 3}, data)
julia> data[1,1,1] = missing
julia> v[:,:,:] = data

ERROR: OutOfMemoryError()                                                                        
Stacktrace:                                                                                      
 [1] Array                                                                                       
   @ ./boot.jl:461 [inlined]                                                                     
 [2] Array                                                                                       
   @ ./boot.jl:468 [inlined]                                                                     
 [3] CFinvtransformdata                                                                          
   @ ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:702 [inlined]                          
 [4] setindex!(::NCDatasets.CFVariable{Float32, 3, NCDatasets.Variable{Float32, 3, NCDataset{Noth
ing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale
_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Nothing, No
thing, Nothing, Nothing, Nothing}}}, ::Array{Union{Missing, Float32}, 3}, ::Colon, ::Colon, ::Col
on)                                                                                              
   @ NCDatasets ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:764                         
 [5] top-level scope                                                                             
   @ REPL[31]:1 
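
One possible workaround, sketched at toy scale: assign one time step at a time, on the assumption that the temporary array allocated during the conversion is then only as large as the slice being written. This has not been verified against the 23 GiB case. The sizes, the file path, and the fill value -9999 are illustrative; defining the variable with a fillvalue is what gives missing an on-disk representation.

```julia
using NCDatasets

# Small stand-in for the 7014×7277×123 cube; sizes and path are illustrative.
nlon, nlat, ntime = 4, 3, 2
data = Array{Union{Missing, Float32}}(undef, nlon, nlat, ntime)
data .= reshape(Float32.(1:(nlon * nlat * ntime)), nlon, nlat, ntime)
data[1, 1, 1] = missing

path = tempname() * ".nc"
ds = NCDataset(path, "c")
defDim(ds, "lon", nlon)
defDim(ds, "lat", nlat)
defDim(ds, "time", ntime)

# The fill value is an arbitrary choice for this sketch.
v = defVar(ds, "temperature", Float32, ("lon", "lat", "time"); fillvalue = -9999f0)

# Assign one time step at a time so each conversion touches a single slice.
for k in 1:ntime
    v[:, :, k] = data[:, :, k]
end
close(ds)

# Read back to confirm the missing value survived the round trip.
back = NCDataset(path) do ds
    ds["temperature"][:, :, :]
end
```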

@rafaqz
Owner

rafaqz commented Sep 30, 2022

Ok interesting. Seems we need to do this with a lazy broadcast over chunks. I will look at using NetCDF.jl for this.

@rafaqz rafaqz reopened this Oct 2, 2022
@felixcremer
Contributor

This might be fixed now that NCDatasets uses DiskArrays.

@felixcremer
Contributor

I tried to reproduce this, but since my memory size is different I couldn't find an array size that is small enough to write with NCDatasets but large enough to kill the Rasters.write function. Or this is actually already fixed on the latest Rasters.

@rafaqz
Owner

rafaqz commented Jan 17, 2024

I think it's not fixed, because NCDatasets.jl only partially implements DiskArrays; we need #416 merged.

@rafaqz rafaqz closed this as completed May 19, 2024