diff --git a/previews/PR26/devices/a100_sxm2/index.html b/previews/PR26/devices/a100_sxm2/index.html index bae2058..edf669d 100644 --- a/previews/PR26/devices/a100_sxm2/index.html +++ b/previews/PR26/devices/a100_sxm2/index.html @@ -186,4 +186,4 @@ 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 - 1 1 1 1 1 1 1 0 + 1 1 1 1 1 1 1 0 diff --git a/previews/PR26/devices/geforce_gtx_1650/index.html b/previews/PR26/devices/geforce_gtx_1650/index.html index 295e5b1..aa6adf0 100644 --- a/previews/PR26/devices/geforce_gtx_1650/index.html +++ b/previews/PR26/devices/geforce_gtx_1650/index.html @@ -89,4 +89,4 @@ Supports cooperative kernel launch: Yes Supports multi-device co-op kernel launch: Yes Device PCI domain ID / bus ID / device ID: 0 / 1 / 0 -Compute mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) +Compute mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) diff --git a/previews/PR26/devices/v100_sxm2/index.html b/previews/PR26/devices/v100_sxm2/index.html index 01a30ba..7abff47 100644 --- a/previews/PR26/devices/v100_sxm2/index.html +++ b/previews/PR26/devices/v100_sxm2/index.html @@ -166,4 +166,4 @@ 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 0 0 1 0 1 1 0 1 - 0 0 0 1 1 1 1 0 + 0 0 0 1 1 1 1 0 diff --git a/previews/PR26/examples/data_bandwidth/index.html b/previews/PR26/examples/data_bandwidth/index.html index 3b13975..4ef1c22 100644 --- a/previews/PR26/examples/data_bandwidth/index.html +++ b/previews/PR26/examples/data_bandwidth/index.html @@ -90,4 +90,4 @@ 452.429 454.293 445.094 454.151 nothing 453.472 451.474 453.981 454.89 454.066 453.84 453.84 451.194 nothing 453.274 451.53 453.925 453.02 454.293 456.459 451.839 451.951 nothing 455.032 - 454.208 416.936 454.265 435.947 452.035 437.836 451.895 nothing + 454.208 416.936 454.265 435.947 452.035 437.836 451.895 nothing diff --git a/previews/PR26/examples/gpuinfo/index.html b/previews/PR26/examples/gpuinfo/index.html index 2cb4b57..c892118 100644 
--- a/previews/PR26/examples/gpuinfo/index.html +++ b/previews/PR26/examples/gpuinfo/index.html @@ -66,4 +66,4 @@ 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 - 1 1 1 1 1 1 1 0

It turns out that querying p2p access support via CU_DEVICE_P2P_ATTRIBUTE_ACCESS_SUPPORTED or via cuDeviceCanAccessPeer may lead to different results (see this Stack Overflow thread). In gpuinfo_p2p_access() we therefore use both methods and, if the results differ, we print both matrices (not shown above).
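The check itself is a one-liner (a sketch; requires at least two NVIDIA GPUs and assumes CUDA.jl is installed alongside GPUInspector):

```julia
using GPUInspector
using CUDA  # loads the NVIDIA backend extension

gpuinfo_p2p_access()  # prints the p2p access matrix (both matrices if the two queries disagree)
```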

diff --git a/previews/PR26/examples/gpustresstest/index.html b/previews/PR26/examples/gpustresstest/index.html index 931cba0..9ca68e0 100644 --- a/previews/PR26/examples/gpustresstest/index.html +++ b/previews/PR26/examples/gpustresstest/index.html @@ -117,4 +117,4 @@ 56 │⣶⣶⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ └────────────────────────────────────────┘ ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀20⠀ - ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀Time [s]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀

Note that, for convenience, stresstest also takes a monitoring keyword argument that can be used to automatically start and stop the monitoring.
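A sketch of how the monitoring keyword can be combined with the plotting utilities (requires a functional CUDA.jl setup):

```julia
using GPUInspector
using CUDA

# monitoring=true starts/stops monitoring automatically and returns a MonitoringResults object
results = stresstest(CUDA.devices(); duration=20, monitoring=true)
plot_monitoring_results(results)  # unicode plots of temperature, power, utilization, ...
```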

diff --git a/previews/PR26/examples/peakflops_gpu/index.html b/previews/PR26/examples/peakflops_gpu/index.html index 89049e0..604cece 100644 --- a/previews/PR26/examples/peakflops_gpu/index.html +++ b/previews/PR26/examples/peakflops_gpu/index.html @@ -57,4 +57,4 @@ Peakflops (TFLOP/s): ├ tensorcores: true ├ dtype: TensorFloat32 - └ max: 155.5 + └ max: 155.5 diff --git a/previews/PR26/explanations/dgx/index.html b/previews/PR26/explanations/dgx/index.html index 18ca831..22bef67 100644 --- a/previews/PR26/explanations/dgx/index.html +++ b/previews/PR26/explanations/dgx/index.html @@ -1,2 +1,2 @@ -DGX Details · GPUInspector.jl
+DGX Details · GPUInspector.jl
diff --git a/previews/PR26/index.html b/previews/PR26/index.html index 0c82bb9..4b1d992 100644 --- a/previews/PR26/index.html +++ b/previews/PR26/index.html @@ -1,3 +1,3 @@ GPUInspector · GPUInspector.jl

GPUInspector.jl

Installation

The package is registered in the General registry and can readily be added by using the Pkg REPL mode.

] add GPUInspector

Note: The minimal required Julia version is 1.6, but we strongly recommend using Julia >= 1.7; some features might not be available in Julia 1.6!
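Equivalently, the package (together with a backend such as CUDA.jl) can be installed via the functional Pkg API:

```julia
using Pkg
Pkg.add("GPUInspector")
Pkg.add("CUDA")  # optional, but needed for the NVIDIA backend features
```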

Getting Started

Backends

GPUInspector itself only provides limited functionality. Most of its features come to life (through package extensions) when you load a GPU backend, such as CUDA.jl. Hence, most of the time, you want to add one of these backends to your Julia environment alongside GPUInspector and then run

using GPUInspector
using CUDA # loading a GPU backend triggers the pkg extension to load

Note that you can check the current backend with the backend() function (and set it manually via backend!).
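For example (a sketch; the backend type name NVIDIABackend is an assumption about the package's API):

```julia
using GPUInspector
using CUDA

backend()                  # query the currently active backend
backend!(NVIDIABackend())  # set it manually (CUDA.jl must already be loaded)
```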

Examples


diff --git a/previews/PR26/refs/UnitPrefixedBytes/index.html b/previews/PR26/refs/UnitPrefixedBytes/index.html index 3e2383e..e1a28a0 100644 --- a/previews/PR26/refs/UnitPrefixedBytes/index.html +++ b/previews/PR26/refs/UnitPrefixedBytes/index.html @@ -1,5 +1,5 @@ -Unit-Prefixed Bytes · GPUInspector.jl

Unit-Prefixed Bytes

Index

References

GPUInspector.UnitPrefixedBytesType

Abstract type representing an amount of data, i.e. a certain number of bytes, with a unit prefix (also "metric prefix"). Examples include the SI prefixes, like KB, MB, and GB, but also the binary prefixes (ISO/IEC 80000), like KiB, MiB, and GiB.

See https://en.wikipedia.org/wiki/Binary_prefix for more information.

GPUInspector.bytesMethod
bytes(x::Number)

Returns an appropriate UnitPrefixedBytes object, representing the number of bytes.

Note: This function is type unstable by construction!

See simplify for what "appropriate" means here.

GPUInspector.bytesMethod
bytes(x::UnitPrefixedBytes)

Return the number of bytes (without prefix) as Float64.
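A short REPL sketch of the two bytes methods (the Float64 output follows from 1 MiB = 1024^2 bytes):

```julia
julia> using GPUInspector

julia> bytes(MiB(1))  # number of bytes, without prefix, as Float64
1.048576e6

julia> bytes(2^30)    # an "appropriate" UnitPrefixedBytes (type unstable by construction!)
```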

GPUInspector.change_baseMethod

Toggle between

  • Base 10, SI prefixes, i.e. factors of 1000
  • Base 2, ISO/IEC prefixes, i.e. factors of 1024

Example:

julia> change_base(KB(13))
 ~12.7 KiB
 
 julia> change_base(KiB(13))
@@ -7,4 +7,4 @@
 ~38.15 MiB
 
 julia> simplify(B(40_000_000); base=10)
40.0 MB
diff --git a/previews/PR26/refs/backends/index.html b/previews/PR26/refs/backends/index.html index b53f654..3455f22 100644 --- a/previews/PR26/refs/backends/index.html +++ b/previews/PR26/refs/backends/index.html @@ -1,3 +1,3 @@ Backends · GPUInspector.jl

Backends

Index

References

GPUInspector.backend!Method
backend!(b::Backend)
Set the GPU backend (manually). Note that the corresponding backend package (e.g. CUDA.jl) must already be loaded in the active Julia session (otherwise an exception is thrown).

GPUInspector.backendMethod

Returns the currently active GPU backend.

GPUInspector.backendinfoMethod

Query information about a specific backend, e.g., what functionality the backend currently supports.

GPUInspector.clear_gpu_memoryMethod
clear_gpu_memory(; device, gc)

Reclaim the unused memory of a GPU.

GPUInspector.deviceMethod

Return the current device of the active backend.

GPUInspector.devicesMethod

Return the devices of the active backend.

GPUInspector.functionalMethod

Check if GPUInspector, and its GPU backend (e.g. CUDA.jl), is available and functional. If not, print some hopefully useful debug information (or turn it off with verbose=false).
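A typical guard at the top of a benchmark script might look like this (a sketch):

```julia
using GPUInspector
using CUDA

if !functional()
    error("GPUInspector or its GPU backend is not functional on this system.")
end
```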

diff --git a/previews/PR26/refs/cuda_extension/index.html b/previews/PR26/refs/cuda_extension/index.html index 77aec7f..bab30d5 100644 --- a/previews/PR26/refs/cuda_extension/index.html +++ b/previews/PR26/refs/cuda_extension/index.html @@ -1,3 +1,3 @@ CUDA Extension · GPUInspector.jl

CUDA Extension

Index

References

CUDAExt.get_gpu_utilizationFunction
get_gpu_utilization(device=CUDA.device())

Get the current utilization of the given CUDA device in percent.

CUDAExt.get_gpu_utilizationsFunction
get_gpu_utilizations(devices=CUDA.devices())

Get the current utilization of the given CUDA devices in percent.

CUDAExt.get_power_usageMethod
get_power_usage(device=CUDA.device())

Get current power usage of the given CUDA device in Watts.

CUDAExt.get_power_usagesFunction
get_power_usages(devices=CUDA.devices())

Get current power usage of the given CUDA devices in Watts.

CUDAExt.get_temperatureFunction
get_temperature(device=CUDA.device())

Get current temperature of the given CUDA device in degrees Celsius.

CUDAExt.get_temperaturesFunction
get_temperatures(devices=CUDA.devices())

Get current temperature of the given CUDA devices in degrees Celsius.

CUDAExt.gpuidFunction

Get GPU index of the given device.

Note: GPU indices start with zero.

CUDAExt._kernel_fmaMethod

Dummy kernel doing _kernel_fma_nfmas() many FMAs (default: 100_000).

CUDAExt._peakflops_gpu_fmasMethod
_peakflops_gpu_fmas(; size::Integer=5_000_000, dtype=Float32, nbench=5, nkernel=5, device=CUDA.device(), verbose=true)

Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform _kernel_fma_nfmas() * size many FMAs on CUDA cores.

Keyword arguments:

  • device (default: CUDA.device()): CUDA device to be used.
  • dtype (default: Float32): element type of the matrices.
  • size (default: 5_000_000): length of vectors.
  • nkernel (default: 5): number of kernel calls that make up one benchmarking sample.
  • nbench (default: 5): number of measurements to be performed the best of which is used for the TFLOP/s computation.
  • verbose (default: true): toggle printing.
  • io (default: stdout): set the stream where the results should be printed.
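Counting one FMA as two floating-point operations, the estimate presumably boils down to the following back-of-the-envelope arithmetic (flops_estimate is a hypothetical helper, not part of the package):

```julia
# TFLOP/s from the best measured kernel time (in seconds), assuming each of the
# `size` vector elements performs `nfmas` FMAs (2 FLOPs each)
flops_estimate(time_s; size=5_000_000, nfmas=100_000) =
    2 * nfmas * size / time_s / 1e12
```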
CUDAExt._peakflops_gpu_wmmasMethod
_peakflops_gpu_wmmas()

Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform _kernel_wmma_nwmmas() many WMMAs on Tensor Cores.

Keyword arguments:

  • device (default: CUDA.device()): CUDA device to be used.
  • dtype (default: Float16): element type of the matrices. We currently only support Float16 (Int8, :TensorFloat32, :BFloat16, and Float64 might or might not work).
  • nkernel (default: 10): number of kernel calls that make up one benchmarking sample.
  • nbench (default: 5): number of measurements to be performed the best of which is used for the TFLOP/s computation.
  • threads (default: max. threads per block): how many threads to use per block (part of the kernel launch configuration).
  • blocks (default: 2048): how many blocks to use (part of the kernel launch configuration).
  • verbose (default: true): toggle printing.
  • io (default: stdout): set the stream where the results should be printed.
CUDAExt.peakflops_gpu_matmulMethod
peakflops_gpu_matmul(; device, dtype=Float32, size=2^14, nmatmuls=5, nbench=5, verbose=true)

Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform nmatmuls many (in-place) matrix-matrix multiplications.

Keyword arguments:

  • device (default: CUDA.device()): CUDA device to be used.
  • dtype (default: Float32): element type of the matrices.
  • size (default: 2^14): matrices will have dimensions (size, size).
  • nmatmuls (default: 5): number of matmuls that will make up the kernel to be timed.
  • nbench (default: 5): number of measurements to be performed the best of which is used for the TFLOP/s computation.
  • verbose (default: true): toggle printing.
  • io (default: stdout): set the stream where the results should be printed.

See also: peakflops_gpu_matmul_scaling, peakflops_gpu_matmul_graphs.
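The underlying FLOP count is standard: one (n, n) matrix-matrix multiplication costs about 2n^3 operations, so the estimate is presumably of the form (tflops_matmul is a hypothetical helper):

```julia
# TFLOP/s from timing `nmatmuls` in-place multiplications of (n, n) matrices
tflops_matmul(time_s; n=2^14, nmatmuls=5) = 2 * n^3 * nmatmuls / time_s / 1e12
```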

CUDAExt.peakflops_gpu_matmul_scalingMethod
peakflops_gpu_matmul_scaling(peakflops_func = peakflops_gpu_matmul; verbose=true) -> sizes, flops

Assesses the scaling of the given peakflops_function (defaults to peakflops_gpu_matmul) with increasing matrix size. If verbose=true (default), displays a unicode plot. Returns the considered sizes and TFLOP/s. For further options, see peakflops_gpu_matmul.

CUDAExt.StressTestBatchedType

GPU stress test (matrix multiplications) in which we try to run for a given time period. We try to keep the CUDA stream continuously busy with matmuls at any point in time. Concretely, we submit batches of matmuls and, after half of them, record a CUDA event. On the host, after submitting a batch, we (non-blockingly) synchronize on, i.e. wait for, the CUDA event and, if we haven't exceeded the desired duration already, submit another batch.

CUDAExt.StressTestEnforcedType

GPU stress test (matrix multiplications) in which we run almost precisely for a given time period (duration is enforced).

CUDAExt.StressTestFixedIterType

GPU stress test (matrix multiplications) in which we run for a given number of iterations, or try to run for a given time period (with potentially high uncertainty!). In the latter case, we estimate how long a synced matmul takes and set niter accordingly.

CUDAExt.StressTestStoreResultsType

GPU stress test (matrix multiplications) in which we store all matmul results and try to run as many iterations as possible for a certain memory limit (default: 90% of free memory).

This stress test is somewhat inspired by gpu-burn by Ville Timonen.

CUDAExt.alloc_memMethod
alloc_mem(memsize::UnitPrefixedBytes; devs=(CUDA.device(),), dtype=Float32)

Allocates memory on the devices whose IDs are provided via devs. Returns a vector of memory handles (i.e. CuArrays).

Examples:

alloc_mem(MiB(1024)) # allocate on the currently active device
alloc_mem(B(40_000_000); devs=(0,1)) # allocate on GPU0 and GPU1
CUDAExt.hastensorcoresFunction

Checks whether the given CuDevice has Tensor Cores.

CUDAExt.toggle_tensorcoremathFunction
toggle_tensorcoremath([enable::Bool; verbose=true])

Switches the CUDA.math_mode between CUDA.FAST_MATH (enable=true) and CUDA.DEFAULT_MATH (enable=false). For matmuls of CuArray{Float32}s, this should have the effect of using/enabling and not using/disabling tensor cores. Of course, this only works on supported devices and CUDA versions.

If no arguments are provided, this function toggles between the two math modes.
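Usage sketch:

```julia
toggle_tensorcoremath(true)   # CUDA.FAST_MATH: Float32 matmuls may use tensor cores
toggle_tensorcoremath(false)  # CUDA.DEFAULT_MATH: back to regular CUDA cores
toggle_tensorcoremath()       # toggle between the two modes
```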

GPUInspector.memory_bandwidth_saxpyMethod

Extra keyword arguments:

  • cublas (default: true): toggle between CUDA.axpy! and a custom _saxpy_gpu_kernel!.

(This method is from the NVIDIA Backend.)

GPUInspector.gpuinfoMethod
gpuinfo(deviceid::Integer)

Print out detailed information about the NVIDIA GPU with the given deviceid.

Heavily inspired by the CUDA sample "deviceQueryDrv.cpp".

(This method is from the NVIDIA Backend.)

GPUInspector.monitoring_stopMethod
monitoring_stop(; verbose=true) -> results

Specifically, results is a named tuple with the following keys:

  • time: the (relative) times at which we measured
  • temperature, power, compute, mem

(This method is from the NVIDIA Backend.)

diff --git a/previews/PR26/refs/data_bandwidth/index.html b/previews/PR26/refs/data_bandwidth/index.html index f3a10b0..f2e70e5 100644 --- a/previews/PR26/refs/data_bandwidth/index.html +++ b/previews/PR26/refs/data_bandwidth/index.html @@ -3,4 +3,4 @@ host2device_bandwidth(MiB(1024)) host2device_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidthMethod

p2p_bandwidth(; kwargs...) Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-gpu memory bandwidth estimate (in GiB/s) derived from it.

Keyword arguments:

  • memsize (default: B(40_000_000)): memory size to be used
  • src (default: 0): source device
  • dst (default: 1): destination device
  • nbench (default: 5): number of time measurements (i.e. p2p memcopies)
  • verbose (default: true): set to false to turn off any printing.
  • hist (default: false): when true, a UnicodePlots-based histogram is printed.
  • times (default: false): toggle printing of measured times.
  • alternate (default: false): alternate src and dst, i.e. copy data back and forth.
  • dtype (default: Float32): see alloc_mem.
  • io (default: stdout): set the stream where the results should be printed.

Examples:

p2p_bandwidth()
 p2p_bandwidth(MiB(1024))
p2p_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth_allMethod
p2p_bandwidth_all(; kwargs...)

Run p2p_bandwidth for all combinations of available devices. Returns a matrix with the p2p memory bandwidth estimates.
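For example (a sketch; requires multiple GPUs):

```julia
using GPUInspector, CUDA

# matrix of GiB/s estimates; the diagonal entries are presumably `nothing`,
# as in the example output on the data bandwidth example page
bw = p2p_bandwidth_all()
```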

GPUInspector.p2p_bandwidth_bidirectionalMethod

Same as p2p_bandwidth but measures the bidirectional bandwidth (copying data back and forth).

GPUInspector.p2p_bandwidth_bidirectional_allMethod

Same as p2p_bandwidth_all but measures the bidirectional bandwidth (copying data back and forth).

diff --git a/previews/PR26/refs/gpuinfo/index.html b/previews/PR26/refs/gpuinfo/index.html index c18176a..3403fe0 100644 --- a/previews/PR26/refs/gpuinfo/index.html +++ b/previews/PR26/refs/gpuinfo/index.html @@ -1,2 +1,2 @@ -GPU Information · GPUInspector.jl

GPU Information

Heavily inspired by the cuda sample deviceQueryDrv.

Index

References

GPUInspector.gpuinfoMethod
gpuinfo([device]; kwargs...)

Print out detailed information about the GPU with the given device id.

Note: Device ids start at zero!
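For example:

```julia
using GPUInspector, CUDA

gpuinfo()   # detailed information about the currently active GPU
gpuinfo(0)  # the GPU with device id 0 (ids start at zero!)
```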

GPUInspector.ngpusMethod

Returns the number of available GPUs for the given/current backend.

diff --git a/previews/PR26/refs/gpustresstest/index.html b/previews/PR26/refs/gpustresstest/index.html index f661f99..417cd6f 100644 --- a/previews/PR26/refs/gpustresstest/index.html +++ b/previews/PR26/refs/gpustresstest/index.html @@ -1,2 +1,2 @@ -GPU Stress Test · GPUInspector.jl

GPU Stresstest

Index

References

GPUInspector.stresstestMethod
stresstest(; kwargs...)

Run a GPU stress test (matrix multiplication) on one or multiple GPU devices, as specified by the positional argument. If no argument is provided (only) the currently active GPU will be used.

Keyword arguments:

Choose one of the following (or none):

  • duration: stress test will take about the given time in seconds. (StressTestBatched)
  • enforced_duration: stress test will take almost precisely the given time in seconds. (StressTestEnforced)
  • approx_duration: stress test will hopefully take approximately the given time in seconds. No promises made! (StressTestFixedIter)
  • niter: stress test will run the given number of matrix-multiplications, however long that will take. (StressTestFixedIter)
  • mem: number (<:Real) between 0 and 1, indicating the fraction of the available GPU memory that should be used, or a <:UnitPrefixedBytes indicating an absolute memory limit. (StressTestStoreResults)

General settings:

  • devices (default: e.g. [CUDA.device()]): GPU devices to be included in the stress test
  • dtype (default: Float32): element type of the matrices
  • monitoring (default: false): enable automatic monitoring, in which case a MonitoringResults object is returned.
  • size (default: 2048): matrices of size (size, size) will be used
  • verbose (default: true): toggle printing of information
  • parallel (default: true): If true, will (try to) run each GPU test on a different Julia thread. Make sure to have enough Julia threads.
  • threads (default: nothing): If parallel == true, this argument may be used to specify the Julia threads to use.
  • clearmem (default: false): If true, we call clear_all_gpus_memory after the stress test.
  • io (default: stdout): set the stream where the results should be printed.

When duration is specified (i.e. StressTestBatched) there is also:

  • batch_duration (default: ceil(Int, duration/10)): desired duration of one batch of matmuls.
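Putting a few of these options together (a sketch; requires a functional CUDA.jl setup):

```julia
using GPUInspector
using CUDA

stresstest(CUDA.devices(); duration=60)                # ~60 s on all GPUs (StressTestBatched)
stresstest(CUDA.device(); niter=1_000, clearmem=true)  # fixed number of matmuls, then free memory
```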
diff --git a/previews/PR26/refs/monitoring/index.html b/previews/PR26/refs/monitoring/index.html index 0b4e5ae..aff4a84 100644 --- a/previews/PR26/refs/monitoring/index.html +++ b/previews/PR26/refs/monitoring/index.html @@ -1,2 +1,2 @@ -GPU Monitoring · GPUInspector.jl

GPU Monitoring

Index

References

GPUInspector.livemonitor_powerusageMethod
livemonitor_powerusage(duration) -> times, powerusage

Monitor the power usage of GPU(s) (in Watts) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the power usage as a Vector{Vector{Float64}}.

For general keyword arguments, see livemonitor_something.

GPUInspector.livemonitor_somethingMethod
livemonitor_something(f, duration) -> times, values

Monitor some property of GPU(s), as specified through the function f, over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the measured values as a Vector{Vector{Float64}}.

The function f will be called on a vector of devices and should return a vector of Float64 values.

Keyword arguments:

  • freq (default: 1): polling rate in Hz.
  • devices (default: e.g. NVML.devices()): Devices to monitor.
  • plot (default: false): Create a unicode plot after the monitoring.
  • liveplot (default: false): Create and update a unicode plot during the monitoring. Use optional ylims to specify fixed y limits.
  • title (default: ""): Title used in unicode plots.
  • ylabel (default: "Values"): y label used in unicode plots.

See: livemonitor_temperature, livemonitor_powerusage
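For example (a sketch):

```julia
using GPUInspector, CUDA

times, temps = livemonitor_temperature(30; freq=2, plot=true)  # poll at 2 Hz for 30 s
times, power = livemonitor_powerusage(30; liveplot=true)       # live unicode plot while monitoring
```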

GPUInspector.livemonitor_temperatureMethod
livemonitor_temperature(duration) -> times, temperatures

Monitor the temperature of GPU(s) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the temperatures as a Vector{Vector{Float64}}.

For general keyword arguments, see livemonitor_something.

GPUInspector.monitoring_startMethod
monitoring_start(; devices, kwargs...)

Start monitoring of GPU temperature, utilization, power usage, etc.

Keyword arguments:

  • freq (default: 1): polling rate in Hz.
  • devices (default: e.g. CUDA.devices()): GPU devices to monitor.
  • thread (default: Threads.nthreads()): id of the Julia thread that should run the monitoring.
  • verbose (default: true): toggle verbose output.

See also monitoring_stop.
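The two functions are meant to bracket a workload (a sketch; note that the monitoring runs on its own Julia thread, see the thread keyword above):

```julia
using GPUInspector, CUDA

monitoring_start(; freq=1)
stresstest(CUDA.device(); duration=10)  # the workload to be monitored
results = monitoring_stop()
```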

GPUInspector.savefig_monitoring_resultsFunction
savefig_monitoring_results(r::MonitoringResults, symbols=keys(r.results); ext=:pdf)

Save plots of the quantities specified through symbols of a MonitoringResults object to disk. Note: Only available if CairoMakie.jl is loaded next to GPUInspector.jl.

GPUInspector.MonitoringResultsType

Struct to hold the results of monitoring. This includes the time points (times), the monitored devices (devices), as well as a dictionary holding the (vector-)values of different quantities (identified by symbols) at each of the time points.

GPUInspector.plot_monitoring_resultsFunction
plot_monitoring_results(r::MonitoringResults, symbols=keys(r.results))

Plot the quantities specified through symbols of a MonitoringResults object. Will generate a textual in-terminal / in-logfile plot using UnicodePlots.jl.

diff --git a/previews/PR26/refs/peakflops_gpu/index.html b/previews/PR26/refs/peakflops_gpu/index.html index dc7e513..4d09133 100644 --- a/previews/PR26/refs/peakflops_gpu/index.html +++ b/previews/PR26/refs/peakflops_gpu/index.html @@ -1,2 +1,2 @@ -Peakflops · GPUInspector.jl

Peakflops

Index

References

GPUInspector.theoretical_peakflops_gpuMethod

Estimates the theoretical peak performance of a GPU device in TFLOP/s.

Keyword arguments:

  • verbose (default: true): toggle printing of information
  • device (default: e.g. CUDA.device()): GPU device to be analyzed
  • dtype (default: Float32): element type of the matrices
  • io (default: stdout): set the stream where the results should be printed.
diff --git a/previews/PR26/refs/stresstest_cpu/index.html b/previews/PR26/refs/stresstest_cpu/index.html index f48ea2c..9c8cc19 100644 --- a/previews/PR26/refs/stresstest_cpu/index.html +++ b/previews/PR26/refs/stresstest_cpu/index.html @@ -1,2 +1,2 @@ -CPU Stress Test · GPUInspector.jl

CPU Stresstest

Index

References

GPUInspector.stresstest_cpuMethod
stresstest_cpu(core_or_cores)

Run a CPU stress test (matrix multiplication) on one or multiple CPU cores, as specified by the positional argument. If no argument is provided (only) the currently active CPU core will be used.

Keyword arguments:

  • duration: stress test will take about the given time in seconds.
  • dtype (default: Float64): element type of the matrices
  • size (default: floor(Int, sqrt(L2_cachesize() / sizeof(dtype)))): matrices of size (size, size) will be used
  • verbose (default: true): toggle printing of information
  • parallel (default: true): If true, will (try to) run each CPU core test on a different Julia thread. Make sure to have enough Julia threads.
  • threads (default: nothing): If parallel == true, this argument may be used to specify the Julia threads to use.
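For example (a sketch; how cores are specified follows the positional core_or_cores argument above, here assumed to accept a range of core IDs):

```julia
using GPUInspector

stresstest_cpu(0:3; duration=30)  # stress CPU cores 0 to 3 for about 30 s
```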
diff --git a/previews/PR26/refs/utility/index.html b/previews/PR26/refs/utility/index.html index 3b17004..9633218 100644 --- a/previews/PR26/refs/utility/index.html +++ b/previews/PR26/refs/utility/index.html @@ -1,2 +1,2 @@ -Utility · GPUInspector.jl
+Utility · GPUInspector.jl
diff --git a/previews/PR26/search/index.html b/previews/PR26/search/index.html index d34808d..92c9596 100644 --- a/previews/PR26/search/index.html +++ b/previews/PR26/search/index.html @@ -1,2 +1,2 @@ -Search · GPUInspector.jl

Loading search...
