Optimizing Hamiltonian constructor for GPU acceleration #41

Open
naezzell opened this issue Dec 8, 2020 · 1 comment

@naezzell (Member) commented Dec 8, 2020

In a standard anneal, a user will use the standard_driver function

function standard_driver(num_qubit; sp = false)
    res = ""
    for idx = 1:num_qubit
        res = res * "I"^(idx - 1) * "X" * "I"^(num_qubit - idx) * "+"
    end
    q_translate(res[1:end-1], sp = sp)
end
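For reference (a worked example, not from the original issue): with num_qubit = 2 the loop builds the Pauli string "XI+IX+", the trailing "+" is dropped, and q_translate returns the dense matrix σx ⊗ I + I ⊗ σx:

H = standard_driver(2)   # 4×4 Array{Complex{Float64},2} representing σx ⊗ I + I ⊗ σx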

This generates a matrix of type Array{Complex{Float64},2}. While we've shown that casting this as a CuArray, i.e. cu(standard_driver(n)), is sufficient for a speed-up, it is not optimal. Ideally, the GPU should only deal with Float32s, and perhaps, even better, with real numbers only.
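A minimal sketch of the proposed single-precision path (assuming CUDA.jl is installed; ComplexF32 is the standard Julia alias for Complex{Float32}):

using CUDA

H = standard_driver(10)              # Array{Complex{Float64},2} on the CPU
H_gpu   = CuArray(H)                 # double-precision device copy
H_gpu32 = CuArray(ComplexF32.(H))    # convert to Complex{Float32} before the upload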

Furthermore, the DenseHamiltonian constructor performs "scalar operations" by indexing the m array (see the constructor below):

function DenseHamiltonian(funcs, mats; unit = :h, EIGS = EIGEN_DEFAULT)
    if any((x) -> size(x) != size(mats[1]), mats)
        throw(ArgumentError("Matrices in the list do not have the same size."))
    end
    if is_complex(funcs, mats)
        mats = complex.(mats)
    end
    hsize = size(mats[1])
    # use static array for size smaller than 100
    if hsize[1] <= 10
        mats = [SMatrix{hsize[1],hsize[2]}(unit_scale(unit) * m) for m in mats]
    else
        mats = unit_scale(unit) * mats
    end
    cache = similar(mats[1])
    EIGS = EIGS(cache)
    DenseHamiltonian{eltype(mats[1])}(funcs, mats, cache, hsize, EIGS)
end

Scalar indexing can be turned off with CUDA.allowscalar(false) (or GPUArrays.allowscalar(false), or something along those lines).
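A small demonstration of what the flag does (a sketch; CUDA.rand and allowscalar are standard CUDA.jl calls):

using CUDA
CUDA.allowscalar(false)   # any scalar getindex/setindex! on a CuArray now throws

A = CUDA.rand(4, 4)
sum(A)                    # fine: executes as a bulk GPU reduction
# A[1, 1]                 # would raise the "scalar getindex is disallowed" error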

Questions / things to resolve:
1.) Does converting matrices to Array{Complex{Float32},2} before casting them as CuArrays help GPU performance? If so, add this support (see the benchmark sketch after this list).
2.) Is there any speed to be gained by representing complex numbers as two real numbers instead of a Complex type? Does CUDA handle that for us?
3.) Does CUDA.allowscalar(false) actually help us? If not, is there a way to remove scalar operations from the DenseHamiltonian constructor in the first place, so that scalar operations don't occur on the GPU?
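As promised in question 1, a micro-benchmark sketch (assumes CUDA.jl and BenchmarkTools.jl are available, and standard_driver as defined above):

using CUDA, BenchmarkTools

H   = standard_driver(12)          # 4096×4096 Array{Complex{Float64},2}
d64 = CuArray(H)                   # Complex{Float64} device copy
d32 = CuArray(ComplexF32.(H))      # Complex{Float32} device copy

# CUDA.@sync blocks until the kernel finishes, so @btime measures real GPU time
@btime CUDA.@sync $d64 * $d64
@btime CUDA.@sync $d32 * $d32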

@naezzell naezzell added the GPU improvements related to GPU acceleration label Dec 8, 2020
@SuperElephant (Collaborator) commented:

I tried to disable scalar indexing in try_gpu_accel.jl (by just adding CUDA.allowscalar(false) at line 64). However, it turns out this is not as trivial as adding a single line: it produces the errors below. I suppose some changes to the DenseHamiltonian function are needed in order to disable scalar indexing.

ERROR: LoadError: scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /home1/chaoxian/.julia/packages/GPUArrays/uaFZh/src/host/indexing.jl:41
 [3] getindex at /home1/chaoxian/.julia/packages/GPUArrays/uaFZh/src/host/indexing.jl:96 [inlined]
 [4] macro expansion at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:46 [inlined]
 [5] unroll_tuple at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:43 [inlined]
 [6] _convert at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:35 [inlined]
 [7] convert at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:32 [inlined]
 [8] StaticArrays.SArray{Tuple{8,8},T,2,L} where L where T(::CuArray{Complex{Float64},2}) at /home1/chaoxian/.julia/packages/StaticArrays/l7lu2/src/convert.jl:7
 [9] (::OpenQuantumBase.var"#180#182"{Symbol,Tuple{Int64,Int64}})(::CuArray{Complex{Float64},2}) at /home1/chaoxian/.julia/packages/OpenQuantumBase/YkEiX/src/base_util.jl:0
 [10] iterate at ./generator.jl:47 [inlined]
 [11] collect(::Base.Generator{Array{CuArray{Complex{Float64},2},1},OpenQuantumBase.var"#180#182"{Symbol,Tuple{Int64,Int64}}}) at ./array.jl:665
 [12] DenseHamiltonian(::Array{Function,1}, ::Array{CuArray{Complex{Float64},2},1}; unit::Symbol, EIGS::typeof(EIGEN_DEFAULT)) at /home1/chaoxian/.julia/packages/OpenQuantumBase/YkEiX/src/hamiltonian/dense_hamiltonian.jl:41
 [13] anneal_spin_glass_gpu(::Int64, ::Int64) at /home1/chaoxian/final_project/accelqat/cuda/try_gpu_accel_ds.jl:58
 [14] top-level scope at ./util.jl:175
 [15] include(::Module, ::String) at ./Base.jl:377
 [16] exec_options(::Base.JLOptions) at ./client.jl:288
 [17] _start() at ./client.jl:484
in expression starting at /home1/chaoxian/final_project/accelqat/cuda/try_gpu_accel_ds.jl:67

Just a reminder: as shown in the stacktrace, line 41 of dense_hamiltonian.jl will need changes. There might also be more changes needed beyond that, as mentioned by @naezzell:

mats = [SMatrix{hsize[1],hsize[2]}(unit_scale(unit) * m) for m in mats]
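One possible direction (a hypothetical sketch, not a maintained fix): converting a CuArray into an SMatrix reads every element with scalar getindex, so the static-array fast path could simply be skipped for GPU inputs:

# hypothetical change: only build SMatrix caches for CPU arrays,
# leaving CuArrays in bulk form so no scalar indexing occurs
if hsize[1] <= 10 && !(mats[1] isa CUDA.CuArray)
    mats = [SMatrix{hsize[1],hsize[2]}(unit_scale(unit) * m) for m in mats]
else
    mats = unit_scale(unit) * mats
end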
