[L0 v2] implement USM allocation functions using UMF #2016
Compute Benchmarks level_zero_v2 run (--verbose):

Summary
result is better

Charts

| Benchmark | Section (parameters) | This PR | baseline |
| --- | --- | --- | --- |
| api_overhead_benchmark_sycl SubmitKernel out of order | SubmitKernel(api=sycl, Profiling=0, Ioq=0, DiscardEvents=0, NumKernels=10, KernelExecTime=1, MeasureCompletion=0) | 45.05 μs | 50.186 μs |
| api_overhead_benchmark_sycl SubmitKernel in order | SubmitKernel(api=sycl, Profiling=0, Ioq=1, DiscardEvents=0, NumKernels=10, KernelExecTime=1, MeasureCompletion=0) | 41.6 μs | 49.206 μs |
| api_overhead_benchmark_ur SubmitKernel out of order | SubmitKernel(api=ur, Profiling=0, Ioq=0, DiscardEvents=0, NumKernels=10, KernelExecTime=1, MeasureCompletion=0) | 25.512 μs | 31.972 μs |
| api_overhead_benchmark_ur SubmitKernel in order | SubmitKernel(api=ur, Profiling=0, Ioq=1, DiscardEvents=0, NumKernels=10, KernelExecTime=1, MeasureCompletion=0) | 25.338 μs | 29.597 μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | QueueInOrderMemcpy(api=sycl, IsCopyOnly=0, sourcePlacement=Device, destinationPlacement=Device, size=1KB, count=100) | 346.822 μs | 478.666 μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | QueueInOrderMemcpy(api=sycl, IsCopyOnly=0, sourcePlacement=Host, destinationPlacement=Device, size=1KB, count=100) | 182.595 μs | 277.92 μs |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | QueueMemcpy(api=sycl, sourcePlacement=Device, destinationPlacement=Device, size=1KB) | 10.465 μs | 9.227 μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | ExecImmediateCopyQueue(api=sycl, IsCopyOnly=1, MeasureCompletionTime=0, src=Device, dst=Device, size=1KB, ioq=0) | 3.324 μs | 4.546 μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | ExecImmediateCopyQueue(api=sycl, IsCopyOnly=1, MeasureCompletionTime=0, src=Host, dst=Host, size=1KB, ioq=1) | 3.306 μs | 3.58 μs |
| Velocity-Bench Hashtable | hashtable | 202.004102 M keys/sec | 176.888802 M keys/sec |
| Velocity-Bench Bitcracker | bitcracker | 35.7243 s | 35.8488 s |
| Velocity-Bench Easywave | easywave | 427 ms | 389.0 ms |
| Velocity-Bench QuickSilver | QuickSilver | 116.09 MMS/CTT | 117.3 MMS/CTT |
| Velocity-Bench Sobel Filter | sobel_filter | 928.734 ms | 856.488 ms |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | StreamMemory(api=sycl, type=Triad, size=10KB, useEvents=0, contents=Zeros, memoryPlacement=Device) | - | 1.895 μs |
| miscellaneous_benchmark_sycl VectorSum | VectorSum(api=sycl, numberOfElementsX=512, numberOfElementsY=256, numberOfElementsZ=256) | - | 862.689 μs |
| Velocity-Bench CudaSift | cudaSift | - | 270.543 ms |
Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

hashtable
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify
Output: hashtable - total time for whole calculation: 0.664431 s

bitcracker
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output: ---------> BitCracker: BitLocker password cracking tool <--------- ==================================
force-pushed from e45a48d to ccf3a88
@pbalcer can we bump the UMF version used by UR to the latest main already?
Compute Benchmarks level_zero_v2 run (--verbose):

Summary
Total 130 benchmarks in mean. (result is better)

Performance change in benchmark groups
Relative perf in group Velocity-Bench (5): 99.677%
Relative perf in group Runtime (52): 108.854%
Relative perf in group MicroBench (17): 100.379%
Relative perf in group Pattern (14): 101.123%
Relative perf in group ScalarProduct (6): 102.451%
Relative perf in group USM (17): 101.780%
Relative perf in group SYCL2020 (2): 100.325%
Relative perf in group VectorAddition (3): 100.826%
Relative perf in group Polybench (13): 99.899%
Relative perf in group ReductionAtomic (4): 109.534%
Relative perf in group Kmeans (1): 99.889%
Relative perf in group LinearRegressionCoeff (1): 96.093%
Relative perf in group LinearRegression (1): 99.726%
Relative perf in group MatmulChain (1): 99.837%
Relative perf in group MolecularDynamics (1): 100.000%
Relative perf in group api (6): cannot calculate
Relative perf in group memory (4): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
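
The thread does not say how the benchmark script derives these percentages, so the following is only a sketch of one plausible per-benchmark ratio, not the script's actual method. It assumes a result above 100% means the PR is better, and it returns no value when either side is missing, which is one way a "cannot calculate" entry could arise. `Measurement` and `relativePerf` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical record for one benchmark: either value may be missing,
// e.g. when a run produced a baseline number but no number for the PR.
struct Measurement {
    std::optional<double> thisPr;
    std::optional<double> baseline;
    bool lowerIsBetter;  // true for times (μs, ms, s), false for throughput
};

// Returns the relative performance in percent, where a value above 100
// means the PR did better than the baseline. Returns std::nullopt when
// the ratio cannot be formed (missing or non-positive input).
inline std::optional<double> relativePerf(const Measurement &m) {
    if (!m.thisPr || !m.baseline || *m.thisPr <= 0.0 || *m.baseline <= 0.0) {
        return std::nullopt;
    }
    const double ratio = m.lowerIsBetter ? *m.baseline / *m.thisPr
                                         : *m.thisPr / *m.baseline;
    return ratio * 100.0;
}
```

Under this assumed definition, the earlier SubmitKernel out-of-order result (45.05 μs for this PR against a 50.186 μs baseline) comes out at about 111%, while a benchmark with only a baseline value yields no result. How the script aggregates per-benchmark ratios into a group figure is not stated in the thread and is not modeled here.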
Details
Benchmark details - environment, command, output...

Velocity-Bench Hashtable
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/shared-actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify
Output: hashtable - total time for whole calculation: 0.371803 s

Velocity-Bench Bitcracker
Environment Variables: UR_ADAPTERS_FORCE_LOAD=/home/test-user/shared-actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so
Command: /home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output: ---------> BitCracker: BitLocker password cracking tool <--------- ==================================
Compute Benchmarks level_zero_v2 run (--verbose):

Summary
Total 140 benchmarks in mean. (result is better)

Performance change in benchmark groups
Relative perf in group api (6): 112.443%
Relative perf in group memory (4): 92.471%
Relative perf in group Velocity-Bench (5): 99.912%
Relative perf in group Runtime (52): 108.515%
Relative perf in group MicroBench (17): 100.130%
Relative perf in group Pattern (14): 100.481%
Relative perf in group ScalarProduct (6): 102.720%
Relative perf in group USM (17): 103.018%
Relative perf in group SYCL2020 (2): 100.380%
Relative perf in group VectorAddition (3): 102.729%
Relative perf in group Polybench (13): 99.899%
Relative perf in group ReductionAtomic (4): 111.775%
Relative perf in group Kmeans (1): 99.833%
Relative perf in group LinearRegressionCoeff (1): 91.634%
Relative perf in group LinearRegression (1): 99.726%
Relative perf in group MatmulChain (1): 99.837%
Relative perf in group MolecularDynamics (1): 100.000%
Relative perf in group miscellaneous (1): cannot calculate
Details
Benchmark details - environment, command, output...

api_overhead_benchmark_sycl SubmitKernel out of order
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_sycl SubmitKernel in order
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_ur SubmitKernel out of order
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_ur SubmitKernel in order
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Environment Variables:
Command: /home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

Velocity-Bench Hashtable
Environment Variables:
Command: /home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify
Output: hashtable - total time for whole calculation: 0.366547 s

Velocity-Bench Bitcracker
Environment Variables:
Command: /home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output: ---------> BitCracker: BitLocker password cracking tool <--------- ==================================
to include oneapi-src/unified-runtime#2016
addPool took a unique pool handle by reference and later passed it as an rvalue ref to unordered_map::try_emplace. Make addPool take an rvalue ref to make it clear that ownership passes to the pool manager.
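
A minimal sketch of the signature change described in this commit message, using stand-in types rather than the actual Unified Runtime classes. `Pool`, `UniquePool`, `PoolManager`, the `int` key, and `getPool` are all assumptions made for illustration; only `addPool` and `unordered_map::try_emplace` come from the message itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a memory pool; not the UR/UMF pool type.
struct Pool {
    explicit Pool(std::string n) : name(std::move(n)) {}
    std::string name;
};
using UniquePool = std::unique_ptr<Pool>;

class PoolManager {
public:
    // Taking an rvalue reference forces callers to write std::move(pool),
    // so the ownership transfer is visible at the call site.
    bool addPool(int descriptor, UniquePool &&pool) {
        // try_emplace only moves from `pool` when the key is not yet present;
        // on a duplicate key the caller's handle is left untouched.
        return pools.try_emplace(descriptor, std::move(pool)).second;
    }

    // Non-owning lookup; returns nullptr for an unknown descriptor.
    const Pool *getPool(int descriptor) const {
        auto it = pools.find(descriptor);
        return it == pools.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<int, UniquePool> pools;
};
```

With this signature, `manager.addPool(1, std::move(handle))` compiles while `manager.addPool(1, handle)` does not, which is the point of the change: a caller cannot hand over the pool without saying so.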
to include oneapi-src/unified-runtime#2016
to include oneapi-src/unified-runtime#2016
to include compilation fix for older compilers and fixes for L0 provider
Use UMF L0 provider
Calling loader APIs is incorrect - handles would have to be translated to and from loader handles. Also, using loader APIs without explicitly linking with the loader results in a linking failure on Windows. Fix this by using function pointers.
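
The general shape of "use function pointers instead of linking against exported symbols" can be sketched as below. Everything here is hypothetical: `fake_driver`, `DriverTable`, `lookupSymbol`, and `loadTable` are illustrative names, and the lookup is a stub; the commit message does not say which mechanism the adapter uses to obtain the real driver entry points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string_view>

// Hypothetical driver entry points standing in for real driver functions.
namespace fake_driver {
inline int memAlloc(std::size_t size, void **out) {
    *out = std::malloc(size);
    return *out ? 0 : 1;
}
inline int memFree(void *ptr) {
    std::free(ptr);
    return 0;
}
}  // namespace fake_driver

// Table of function pointers the caller dispatches through. Nothing in the
// calling code references an exported symbol directly, so no link-time
// dependency on the library that provides them is needed.
struct DriverTable {
    int (*pfnMemAlloc)(std::size_t, void **) = nullptr;
    int (*pfnMemFree)(void *) = nullptr;
};

using GenericFn = void (*)();

// Stub lookup (an assumption): a real implementation would ask the driver
// or the platform for the address of the named function at run time.
inline GenericFn lookupSymbol(std::string_view name) {
    if (name == "memAlloc") return reinterpret_cast<GenericFn>(&fake_driver::memAlloc);
    if (name == "memFree") return reinterpret_cast<GenericFn>(&fake_driver::memFree);
    return nullptr;
}

// Fills the table once; returns false if any entry point is missing.
inline bool loadTable(DriverTable &t) {
    t.pfnMemAlloc = reinterpret_cast<decltype(t.pfnMemAlloc)>(lookupSymbol("memAlloc"));
    t.pfnMemFree = reinterpret_cast<decltype(t.pfnMemFree)>(lookupSymbol("memFree"));
    return t.pfnMemAlloc != nullptr && t.pfnMemFree != nullptr;
}
```

After a successful `loadTable(table)`, allocation goes through `table.pfnMemAlloc(size, &ptr)` and release through `table.pfnMemFree(ptr)`. Because the pointers come from the driver side rather than the loader, the handles passed to them would not need translating, which matches the motivation given in the message.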
to include oneapi-src/unified-runtime#2016
Based on: #2012