CUDA_Graphs for Arbor
- Arbor has a lot of small CUDA kernels
- Most individual kernels do not fill the GPU
- Also, there is a lot of potential parallelism in these kernels
- Mechanisms are compiled from NMODL files provided by the user
  - this happens ahead of time (AOT)
- These are added to regions at initialisation time
  - this means that there are multiple sets per simulation
  - these remain static across the run
- We have the following structure
#+begin_example cpp
while (!done) {
    // Compute reversal potentials
    for (auto& m: revpot_mechanisms_) {
        m->nrn_current();                                  // KERNEL
    }
    // Mark events for each mechanism
    state_->mark_events_until_after();                     // KERNEL
    // Compute new currents
    state_->update_currents(mechanisms_);                  // multiple KERNEL
    // Add current contribution from gap junctions
    state_->add_gj_current();                              // KERNEL
    // Get rid of processed elements
    state_->drop_consumed_events();                        // KERNEL
    // Update integration step times.
    state_->update_dt(dt_max, tfinal);                     // KERNEL
    // Take samples at cell time if sample time in this step interval.
    state_->advance_samples(sample_time_, sample_value_);  // KERNEL
    // Integrate voltage by matrix solve.
    state_->integrate();                                   // KERNEL
    // Integrate mechanism state.
    for (auto& m: mechanisms_) {
        m->nrn_state();                                    // KERNEL
    }
    // Update ion concentrations.
    state_->ions_init_concentration();                     // KERNEL
    for (auto& m: mechanisms_) {
        m->write_ions();                                   // KERNEL
    }
    // Update time and test for spike threshold crossings.
    threshold_watcher_.test();                             // KERNEL
    state_->swap_times();                                  // Pointer swap
}
#+end_example
- All updates to mechanism state and current are independent
  - But assembling ion concentrations and currents needs to be atomic/synchronised (the contributions are added together)
  - These additions depend on the relevant states having been zeroed first
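A minimal host-side sketch of this zero-then-accumulate ordering, in plain C++ rather than CUDA. The names `atomic_add`, `contribution`, `zero_currents` and `add_currents` are hypothetical and not Arbor's API; on the GPU the addition would be a device `atomicAdd` rather than the compare-and-swap loop used here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Add `value` to `target` atomically using a compare-and-swap loop.
// (std::atomic<double>::fetch_add is only available from C++20.)
inline void atomic_add(std::atomic<double>& target, double value) {
    double seen = target.load();
    // On failure compare_exchange_weak refreshes `seen`, so we just retry.
    while (!target.compare_exchange_weak(seen, seen + value)) {}
}

// Hypothetical stand-in for one mechanism's contributions:
// current[i] is added to the control volume node_index[i].
struct contribution {
    std::vector<std::size_t> node_index;
    std::vector<double> current;
};

// Step 1: zero the shared accumulator. Must finish before any addition.
inline void zero_currents(std::vector<std::atomic<double>>& total) {
    for (auto& t: total) t.store(0.0);
}

// Step 2: accumulate one mechanism. Calls for different mechanisms may
// overlap in time, because every write goes through atomic_add.
inline void add_currents(const contribution& c,
                         std::vector<std::atomic<double>>& total) {
    assert(c.node_index.size() == c.current.size());
    for (std::size_t i = 0; i < c.node_index.size(); ++i) {
        atomic_add(total[c.node_index[i]], c.current[i]);
    }
}
```

The only ordering constraint in the sketch is that `zero_currents` completes before the first `add_currents`; the order of the `add_currents` calls relative to each other does not matter.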
- Structure of update currents
#+begin_example
---+- rev_pot_0 -+---+- zero_Ca -+---+- mark_events_0 --- deliver_events_0 --- current_0 -+---
   +- rev_pot_1 -+   +- zero_Na -+   +- mark_events_1 --- deliver_events_1 --- current_1 -+
   +- ...       -+   +- ...     -+   +- ...
#+end_example
- This seems to be the most attractive target
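One way to read the structure of update currents is as a dependency graph whose nodes can be grouped into waves of mutually independent kernels. The sketch below is plain host C++ under that assumption; `task`, `waves` and `update_currents_graph` are hypothetical names, and treating each join as "every node of the previous stage" is an interpretation of the diagram, not something taken from Arbor's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A node in a hypothetical task graph: a kernel name plus the indices of
// the nodes that must finish before it may start.
struct task {
    std::string name;
    std::vector<std::size_t> deps;
};

// Assign each node a wave: 0 for nodes without dependencies, otherwise one
// more than the latest wave among its dependencies. Two nodes in the same
// wave have no path between them, so they could run concurrently.
// Assumes nodes are listed with dependencies before dependents.
inline std::vector<std::size_t> waves(const std::vector<task>& graph) {
    std::vector<std::size_t> wave(graph.size(), 0);
    for (std::size_t i = 0; i < graph.size(); ++i) {
        for (auto d: graph[i].deps) {
            assert(d < i);
            wave[i] = std::max(wave[i], wave[d] + 1);
        }
    }
    return wave;
}

// The update-currents structure for two mechanisms and two ions.
inline std::vector<task> update_currents_graph() {
    return {
        {"rev_pot_0",        {}},      // 0
        {"rev_pot_1",        {}},      // 1
        {"zero_Ca",          {0, 1}},  // 2
        {"zero_Na",          {0, 1}},  // 3
        {"mark_events_0",    {2, 3}},  // 4
        {"deliver_events_0", {4}},     // 5
        {"current_0",        {5}},     // 6
        {"mark_events_1",    {2, 3}},  // 7
        {"deliver_events_1", {7}},     // 8
        {"current_1",        {8}},     // 9
    };
}
```

With two mechanisms this gives five waves of two kernels each, which is the parallelism a graph-based launch could expose beyond issuing the ten kernels one after another.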
- We have to adjust three kernels
nrn_current
#+begin_example cpp
__global__ void nrn_current(mechanism_gpu_hh_pp_ params_) {
    // ...
    // Write to global currents, needs to be atomic adds, if parallel
    ik = gk*(v-ek);
    il = params_.gl[tid_]*(v-params_.el[tid_]);
    // ...
}
#+end_example
nrn_state
#+begin_example cpp
__global__ void nrn_state(mechanism_gpu_hh_pp_ params_) {
    // Update params_
}
#+end_example
deliver_events
#+begin_example cpp
__global__ void deliver_events(int mech_id_,
                               mechanism_gpu_exp2syn_pp_ params_,
                               deliverable_event_stream_state events) {
    // Consume events
}
#+end_example
- arguments will change between calls
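Because the arguments change between calls, a recorded launch sequence has to allow its arguments to be updated without being recorded again. A minimal sketch of that idea in plain host C++, with no CUDA runtime involved; `deliver_args`, `recorded_launch` and `recorded_graph` are hypothetical names. With CUDA Graphs proper, the corresponding step would presumably be updating kernel node parameters on the instantiated graph (for example via `cudaGraphExecKernelNodeSetParams`), which this sketch only imitates.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the arguments passed to deliver_events.
struct deliver_args {
    int mech_id;
    std::size_t n_events;  // changes from step to step
};

// A recorded launch: the callable is fixed at record time, while the
// argument slot can be overwritten before each replay.
struct recorded_launch {
    std::function<void(const deliver_args&)> kernel;
    deliver_args args;
};

struct recorded_graph {
    std::vector<recorded_launch> launches;

    // Record a launch once; returns a handle for later argument updates.
    std::size_t record(std::function<void(const deliver_args&)> k,
                       deliver_args a) {
        launches.push_back({std::move(k), a});
        return launches.size() - 1;
    }

    // Overwrite the arguments of a recorded launch without re-recording.
    void set_args(std::size_t handle, deliver_args a) {
        assert(handle < launches.size());
        launches[handle].args = a;
    }

    // Replay all launches in the order they were recorded.
    void replay() const {
        for (const auto& l: launches) l.kernel(l.args);
    }
};
```

The set of launches stays static across the run, matching the static set of mechanisms, and only `set_args` is called per step before `replay`.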