`write` seems to consume a lot of memory #317
Comments
Ok interesting. I wonder if it's in Rasters.jl or NCDatasets.jl. With GDAL files like tiff you can stream write larger than memory files with DiskArrays.jl, but this isn't implemented in NCDatasets.jl. This comment is describing the solution: Rasters.jl/src/sources/ncdatasets.jl Lines 405 to 407 in 098b75d
|
In this case, the variable is a lot smaller than the memory, so there shouldn't be a need to stream write. For reference, here is the full error message:
|
The problem is that NCDatasets.jl is making a full copy of the array before writing it. There may be a way around this. |
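One possible way around a full-array copy is to assign the variable one slab at a time, so that any temporary created during the assignment is only as large as a single slab. The following is a minimal sketch with small stand-in sizes; the file name, the variable name, and the choice of slicing along the time dimension are assumptions made for illustration, not the fix that was eventually applied in Rasters.jl.

```julia
using NCDatasets

# Small stand-in sizes; the cube in this thread is 7014×7277×123.
nlon, nlat, ntime = 8, 6, 4
data = [Float32(i + j + k) for i in 1:nlon, j in 1:nlat, k in 1:ntime]

# Hypothetical output path inside a temporary directory.
path = joinpath(mktempdir(), "slabs.nc")
ds = NCDataset(path, "c")
defDim(ds, "lon", nlon)
defDim(ds, "lat", nlat)
defDim(ds, "time", ntime)
v = defVar(ds, "temperature", Float32, ("lon", "lat", "time"))

# Assign one time step per iteration. data[:, :, k] copies a single
# lon×lat slab, so the extra memory needed is one slab, not the cube.
for k in 1:ntime
    v[:, :, k] = data[:, :, k]
end
close(ds)
```

Whether this avoids the out-of-memory error depends on the NCDatasets.jl version and on how much memory a single slab needs, so treat it as something to try rather than a guaranteed fix.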
I took the example from NCDatasets.jl/README.md and modified it to increase the size of the matrix. The write functionality seems to work.
using NCDatasets
using DataStructures
# This creates a new NetCDF file test.nc.
# The mode "c" stands for creating a new file (clobber)
ds = Dataset("test.nc","c")
defDim(ds,"lon",7014)
defDim(ds,"lat",7277)
defDim(ds, "time", 123)
# Define a global attribute
ds.attrib["title"] = "this is a test file"
# Define the variables temperature with the attribute units
v = defVar(ds,"temperature",Float32,("lon","lat","time"), attrib = OrderedDict(
"units" => "degree Celsius"))
# add additional attributes
v.attrib["comments"] = "this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx"
# Generate some example data
data = [Float32(i+j+k) for i = 1:7014, j = 1:7277, k = 1:123]
varinfo()
name size summary
–––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––
Base Module
Core Module
InteractiveUtils 254.190 KiB Module
Main Module
ans 23.387 GiB 7014×7277×123 Matrix{Float32}
data 23.387 GiB 7014×7277×123 Matrix{Float32}
v 601 bytes 7014×7277×123 NCDatasets.CFVariable{Float32, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Nothing, Nothing, Nothing, Nothing, Nothing}}}
v[:,:,:] = data # I believe this is by reference and not a copy.
close(ds)
exit()
However:
using Rasters
raster = Raster("test.nc") # Everything is fine till this point. The data generated earlier is loaded correctly.
write("test2.nc", raster)
ERROR: OutOfMemoryError() |
Yeah, the problem was the generic setindex! syntax I was using. I wasn't aware that some forms of setindex! on NCDatasets.Variable allocate a new array. This should be fixed now with #320 (Try your example with |
For the example from NCDatasets.jl/README.md, I am testing for the original case for which the error had initially occurred, give me a couple of days. |
There is a different issue writing the original datacube this time. I think I have replicated it using the example from NCDatasets.jl/README.md.
julia> using Rasters
julia> raster = Raster("test.nc") # test.nc is described above at: https://github.com/rafaqz/Rasters.jl/issues/317#issuecomment-1258990373
julia> write("test2.nc", raster) # this works now (using fix_ncd_write branch)
julia> raster = rebuild(raster; missingval = nothing)
julia> raster = replace_missing(raster, missing)
julia> write("test2.nc", raster) # This doesn't work.
ERROR: OutOfMemoryError()
Stacktrace:
[1] Array
@ ./boot.jl:461 [inlined]
[2] Array
@ ./boot.jl:468 [inlined]
[3] CFinvtransformdata
@ ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:702 [inlined]
[4] setindex!(::NCDatasets.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32,
3, NCDatasets.NCDataset{Nothing}}, NCDatasets.Attributes{NCDatasets.NCDataset{Nothing}}, NamedTup
le{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_facto
r), Tuple{Float32, Tuple{}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, ::Array{Float32, 3},
::Colon, ::Colon, ::Colon)
@ NCDatasets ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:764
[5] _ncdwritevar!(ds::NCDatasets.NCDataset{Nothing}, A::Raster{Union{Missing, Float32}, 3, Tupl
e{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, Dimen
sionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Sym
bol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missin
g}; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:410
[6] _ncdwritevar!
@ ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:380 [inlined]
[7] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Union{Missing, Float32}, 3, Tupl
e{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, Dimen
sionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Sym
bol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missin
g}; append::Bool, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:67
[8] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Union{Missing, Float32}, 3, Tupl
e{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, Dimen
sionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Sym
bol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missin
g})
@ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:64
[9] write(filename::String, A::Raster{Union{Missing, Float32}, 3, Tuple{Dim{:lon, DimensionalDa
ta.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.Loo
kupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Symbol, DimensionalData.Dime
nsions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing}; kw::Base.Pairs{Symbol
, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
[10] write(filename::String, A::Raster{Union{Missing, Float32}, 3, Tuple{Dim{:lon, DimensionalDa
ta.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.Loo
kupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}}, Tuple{}, Array{Union{Missing, Float32}, 3}, Symbol, DimensionalData.Dime
nsions.LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing})
@ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
[11] top-level scope
@ REPL[52]:1
julia> varinfo()
name size summary
–––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––––––––––––––––
Base Module
Core Module
InteractiveUtils 254.190 KiB Module
Main Module
ans 0 bytes Nothing
raster 29.234 GiB 7014×7277×123 Raster{Union{Missing, Float32},3}
I seem to have sufficient memory.
Even the following doesn't work:
julia> raster = Float32.(raster)
julia> write("test2.nc", raster) # This doesn't work.
ERROR: OutOfMemoryError()
Stacktrace:
[1] Array
@ ./boot.jl:461 [inlined]
[2] Array
@ ./boot.jl:468 [inlined]
[3] Array
@ ./array.jl:563 [inlined]
[4] Array
@ ./boot.jl:481 [inlined]
[5] modify(f::Type{Array}, A::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.Loo
kupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLooku
p{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int
64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rast
ers.NCDfile, Dict{String, Any}}, Float32})
@ Rasters ~/JuliaPackages/Rasters.jl/src/array.jl:117
[6] read(x::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.LookupArrays.NoLookup
{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64
}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, A
rray{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.NCDfile, Dict{
String, Any}}, Float32})
[7] _ncdwritevar!(ds::NCDatasets.NCDataset{Nothing}, A::Raster{Float32, 3, Tuple{Dim{:lon, Dime
nsionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArra
ys.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.
LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing}; kw::Base.Pairs{Symbol, Union
{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:410
[8] _ncdwritevar!
@ ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:380 [inlined]
[9] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Float32, 3, Tuple{Dim{:lon, Dime
nsionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArra
ys.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.
LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing}; append::Bool, kw::Base.Pairs
{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
[10] write(filename::String, ::Type{Rasters.NCDfile}, A::Raster{Float32, 3, Tuple{Dim{:lon, Dime
nsionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimen
sions.LookupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArra
ys.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.
LookupArrays.Metadata{Rasters.NCDfile, Dict{String, Any}}, Missing})
@ Rasters ~/JuliaPackages/Rasters.jl/src/sources/ncdatasets.jl:64
[11] write(filename::String, A::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.Lo
okupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{In
t64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Ras
ters.NCDfile, Dict{String, Any}}, Missing}; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(
), Tuple{}}})
@ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
[12] write(filename::String, A::Raster{Float32, 3, Tuple{Dim{:lon, DimensionalData.Dimensions.Lo
okupArrays.NoLookup{Base.OneTo{Int64}}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.NoLook
up{Base.OneTo{Int64}}}, Dim{:time, DimensionalData.Dimensions.LookupArrays.NoLookup{Base.OneTo{In
t64}}}}, Tuple{}, Array{Float32, 3}, Symbol, DimensionalData.Dimensions.LookupArrays.Metadata{Ras
ters.NCDfile, Dict{String, Any}}, Missing})
@ Rasters ~/JuliaPackages/Rasters.jl/src/write.jl:11
[13] top-level scope
@ REPL[57]:1
julia> varinfo()
name size summary
–––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––––––––––––––––––––
Base Module
Core Module
InteractiveUtils 254.190 KiB Module
Main Module
ans 0 bytes Nothing
raster 23.387 GiB 7014×7277×123 Raster{Float32,3} |
Does this work with NCDatasets.jl? We really need to be able to broadcast the write to disk chunk by chunk, as we do with TIF files in the ArchGDAL backend. Then we can broadcast the replacement of The problem is that doing any manipulation of your data as a whole array at minimum doubles the memory use. (We may need to write with NetCDF.jl instead of NCDatasets.jl; then we can broadcast the write chunk by chunk.) |
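The sizes reported by varinfo() in this thread can be checked with quick arithmetic, which also shows why one whole-array temporary is enough to roughly double the footprint. This is a back-of-the-envelope sketch: it assumes a 64-bit Julia session and that an Array{Union{Missing, Float32}} stores one extra type-tag byte per element, which is how Julia lays out arrays of small isbits unions.

```julia
# Element count of the 7014×7277×123 cube discussed in this thread.
n = 7014 * 7277 * 123

# Convert a byte count to GiB.
gib(bytes) = bytes / 2^30

# Plain Float32 storage: 4 bytes per element.
float32_gib = gib(n * sizeof(Float32))

# Union{Missing, Float32} storage: 4 data bytes plus 1 type-tag byte
# per element (assumption stated in the lead-in).
union_gib = gib(n * (sizeof(Float32) + 1))

# Holding the Union array plus one full Float32 temporary at once,
# as a whole-array conversion before writing would require.
peak_gib = union_gib + float32_gib

println("Float32 cube:      ", round(float32_gib; digits = 3), " GiB")
println("Union cube:        ", round(union_gib; digits = 3), " GiB")
println("Union + temporary: ", round(peak_gib; digits = 3), " GiB")
```

The first two results agree with the 23.387 GiB and 29.234 GiB figures in the varinfo() listings, and the third is the approximate peak when the whole cube is converted in one step.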
No, it doesn't work.
# Continuing from https://github.com/rafaqz/Rasters.jl/issues/317#issuecomment-1258990373
julia> data = convert(Array{Union{Missing, Float32}, 3}, data)
julia> data[1,1,1] = missing
julia> v[:,:,:] = data
ERROR: OutOfMemoryError()
Stacktrace:
[1] Array
@ ./boot.jl:461 [inlined]
[2] Array
@ ./boot.jl:468 [inlined]
[3] CFinvtransformdata
@ ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:702 [inlined]
[4] setindex!(::NCDatasets.CFVariable{Float32, 3, NCDatasets.Variable{Float32, 3, NCDataset{Noth
ing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale
_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Nothing, No
thing, Nothing, Nothing, Nothing}}}, ::Array{Union{Missing, Float32}, 3}, ::Colon, ::Colon, ::Col
on)
@ NCDatasets ~/.julia/packages/NCDatasets/EkOvO/src/cfvariable.jl:764
[5] top-level scope
@ REPL[31]:1 |
Ok interesting. Seems we need to do this with a lazy broadcast over chunks. I will look at using NetCDF.jl for this. |
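The idea of replacing missing values chunk by chunk, rather than on the whole array, can be sketched in plain Julia. `write_slabs!` below is a hypothetical helper and not part of Rasters.jl, NCDatasets.jl, or NetCDF.jl; it assumes the destination accepts `dest[:, :, k] = buf`, which an in-memory Array does and which a disk-backed variable would also need to support.

```julia
# Hypothetical helper: copy `src` into `dest` one slab at a time along
# the third dimension, replacing `missing` with `fillvalue` on the way.
function write_slabs!(dest, src::AbstractArray{<:Any,3}, fillvalue)
    # One reusable buffer the size of a single slab.
    buf = Array{typeof(fillvalue)}(undef, size(src, 1), size(src, 2))
    for k in axes(src, 3)
        # The fused broadcast fills `buf` in place, so no temporary
        # the size of the whole cube is created.
        buf .= coalesce.(view(src, :, :, k), fillvalue)
        dest[:, :, k] = buf
    end
    return dest
end

# Small demonstration with an in-memory destination.
src = Array{Union{Missing,Float32}}(undef, 4, 3, 2)
src .= 1f0
src[1, 1, 1] = missing

dest = zeros(Float32, size(src))
write_slabs!(dest, src, -9999f0)
```

The extra memory here is one slab-sized buffer regardless of how many slabs the cube has. Slicing along the last dimension is an arbitrary choice for the sketch; a real implementation would more likely follow the chunk layout of the file.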
This might be fixed now that NCDatasets uses DiskArrays. |
I tried to reproduce this, but since my memory size is different I couldn't find an array size that is small enough to write with NCDatasets but large enough to kill the Rasters.write function. Or this is actually already fixed on the latest Rasters. |
I think it's not fixed, because NCDatasets.jl only partially implements DiskArrays. We need #416 merged. |
Hi,
I have a relatively large data cube of size 7014×7277×1×123, and I want to write this as a netCDF file. I have a lot of space in memory, relative to the size of the data cube, but `write` is still giving an out of memory error.
Relative to the data cube, I seem to have a lot more memory.
It seems `write` is consuming a lot of memory. On the other hand, I was able to use `Raster` to load this same data without any issues.