In a standard anneal, a user will use the `standard_driver` function (OpenQuantumBase.jl/src/matrix_util.jl, lines 104 to 110 in c567d61):

```julia
function standard_driver(num_qubit; sp = false)
    # Build the string "XII…" + "IXI…" + … , i.e. the transverse-field
    # driver ∑ᵢ Xᵢ on `num_qubit` qubits, then translate it into a matrix.
    res = ""
    for idx = 1:num_qubit
        res = res * "I"^(idx - 1) * "X" * "I"^(num_qubit - idx) * "+"
    end
    q_translate(res[1:end-1], sp = sp)  # drop the trailing "+"
end
```
This generates a matrix of type `Array{Complex{Float64},2}`. While we've shown that casting this to a `CuArray`, i.e. `cu(standard_driver(n))`, is sufficient for a speed-up, it is not optimal. Ideally, the GPU should only deal with `Float32`s, and perhaps even better, with real numbers only.
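As a quick check of question 1 below, one could compare an explicit single-precision conversion against keeping double precision. A minimal sketch, assuming a CUDA-capable setup; `n = 10`, the test vector, and the benchmark harness are illustrative, and note that `cu` (unlike the `CuArray` constructor) may already perform the `Float64` → `Float32` conversion on its own:

```julia
using CUDA, BenchmarkTools, OpenQuantumBase

n = 10
H = standard_driver(n)              # Array{Complex{Float64},2} on the CPU
v = rand(ComplexF64, 2^n)

d_H64 = CuArray(H)                  # keeps Complex{Float64} on the GPU
d_v64 = CuArray(v)
d_H32 = CuArray(ComplexF32.(H))     # explicit single-precision conversion
d_v32 = CuArray(ComplexF32.(v))

@btime CUDA.@sync $d_H64 * $d_v64   # double-precision matvec
@btime CUDA.@sync $d_H32 * $d_v32   # single-precision matvec
```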
Furthermore, the `DenseHamiltonian` constructor performs "scalar operations" by indexing the `m` array (see OpenQuantumBase.jl/src/hamiltonian/dense_hamiltonian.jl, lines 31 to 48 in c567d61).
These can be turned off with `CUDA.allowscalar(false)` or `CuArray.allowscalar(false)`, or something along those lines.
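For reference, a minimal sketch of what `CUDA.allowscalar(false)` does: it makes any element-wise indexing of a `CuArray` throw an error instead of silently round-tripping through the host, while array-level operations still run as GPU kernels:

```julia
using CUDA

CUDA.allowscalar(false)   # error on any scalar indexing of a CuArray

a = CUDA.rand(4, 4)       # CuArray{Float32,2}
sum(a)                    # fine: reduction runs as a GPU kernel
a .+ 1f0                  # fine: broadcast runs as a GPU kernel
# a[1, 1]                 # would throw "scalar getindex is disallowed"
```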
Questions/things to resolve:
1.) Does converting matrices to `Array{Complex{Float32},2}` before casting to `CuArray` help GPU performance? If so, add this support.
2.) Is there any speed to be gained by converting complex numbers to two real numbers instead of using the `Complex` type (see the sketch after this list)? Does CUDA handle that for us?
3.) Does `CUDA.allowscalar(false)` actually help us? If not, is there a way to remove scalar operations from the `DenseHamiltonian` constructor in the first place, so that scalar operations don't occur on the GPU?
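For question 2, a minimal sketch of the real/imaginary split, using H(x + iy) = (Ax − By) + i(Ay + Bx) for H = A + iB. The `RealSplitOperator` helper and `apply` function are hypothetical names for illustration, not part of OpenQuantumBase:

```julia
using CUDA, LinearAlgebra

# Hypothetical helper: store H = A + im*B as two real Float32 arrays
# and apply it with four real matrix-vector products:
#   H * (x + im*y) = (A*x - B*y) + im*(A*y + B*x)
struct RealSplitOperator
    A::CuMatrix{Float32}   # real part of H
    B::CuMatrix{Float32}   # imaginary part of H
end

RealSplitOperator(H::AbstractMatrix) =
    RealSplitOperator(CuArray(Float32.(real(H))), CuArray(Float32.(imag(H))))

# Apply to a state split as (x, y) = (real, imag); returns the split result.
apply(op::RealSplitOperator, x::CuVector{Float32}, y::CuVector{Float32}) =
    (op.A * x .- op.B * y, op.A * y .+ op.B * x)
```

Whether four real matvecs actually beat the single complex matvec that cuBLAS already provides for `Complex{Float32}` arrays is exactly what would need benchmarking.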
I tried to disable scalar indexing in try_gpu_accel.jl (by just adding `CUDA.allowscalar(false)` at line 64). However, it turns out that it is not as trivial as adding a single line; it gives the errors below. I suppose some changes to the `DenseHamiltonian` constructor are needed in order to disable scalar indexing.
```
ERROR: LoadError: scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /home1/chaoxian/.julia/packages/GPUArrays/uaFZh/src/host/indexing.jl:41
 [3] getindex at /home1/chaoxian/.julia/packages/GPUArrays/uaFZh/src/host/indexing.jl:96 [inlined]
 [4] macro expansion at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:46 [inlined]
 [5] unroll_tuple at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:43 [inlined]
 [6] _convert at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:35 [inlined]
 [7] convert at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:32 [inlined]
 [8] StaticArrays.SArray{Tuple{8,8},T,2,L} where L where T(::CuArray{Complex{Float64},2}) at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:7
 [9] (::OpenQuantumBase.var"#180#182"{Symbol,Tuple{Int64,Int64}})(::CuArray{Complex{Float64},2}) at /home1/chaoxian/.julia/packages/OpenQuantumBase/YkEiX/src/base_util.jl:0
 [10] iterate at ./generator.jl:47 [inlined]
 [11] collect(::Base.Generator{Array{CuArray{Complex{Float64},2},1},OpenQuantumBase.var"#180#182"{Symbol,Tuple{Int64,Int64}}}) at ./array.jl:665
 [12] DenseHamiltonian(::Array{Function,1}, ::Array{CuArray{Complex{Float64},2},1}; unit::Symbol, EIGS::typeof(EIGEN_DEFAULT)) at /home1/chaoxian/.julia/packages/OpenQuantumBase/YkEiX/src/hamiltonian/dense_hamiltonian.jl:41
 [13] anneal_spin_glass_gpu(::Int64, ::Int64) at /home1/chaoxian/final_project/accelqat/cuda/try_gpu_accel_ds.jl:58
 [14] top-level scope at ./util.jl:175
 [15] include(::Module, ::String) at ./Base.jl:377
 [16] exec_options(::Base.JLOptions) at ./client.jl:288
 [17] _start() at ./client.jl:484
in expression starting at /home1/chaoxian/final_project/accelqat/cuda/try_gpu_accel_ds.jl:67
```
Just a reminder: as shown in the stack trace, line 41 of dense_hamiltonian.jl will need changes. There might also be more changes needed beyond that, as mentioned by @naezzell.
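For what it's worth, the stack trace shows where the scalar indexing comes from: the `SArray` conversion reached from dense_hamiltonian.jl line 41 builds the static matrix element by element (frames [4]–[8]), which is scalar `getindex` on a `CuArray`. One possible direction, as a hypothetical sketch rather than the actual constructor code, is to branch on the array type and skip the static conversion for GPU arrays; `to_static` is an illustrative name, and the real fix would live inside the `DenseHamiltonian` constructor:

```julia
using CUDA, StaticArrays

# Hypothetical sketch: only convert small CPU matrices to SMatrix;
# leave CuArrays alone, since SMatrix construction indexes element-wise.
to_static(m::AbstractMatrix) = SMatrix{size(m, 1), size(m, 2)}(m)
to_static(m::CuArray) = m   # no conversion: avoids scalar getindex on the GPU
```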