[L0 v2] implement USM allocation functions using UMF #2016

Merged
merged 5 commits into from
Oct 4, 2024

@igchor igchor commented Aug 27, 2024

Based on: #2012

@igchor igchor requested review from a team as code owners August 27, 2024 23:02
@github-actions github-actions bot added common Changes or additions to common utilities conformance Conformance test suite issues. level-zero L0 adapter specific issues labels Aug 27, 2024
@github-actions github-actions bot added the ci/cd Continuous integration/delivery label Aug 28, 2024

Compute Benchmarks level_zero_v2 run (with params: --filter Hashtable):
https://github.com/oneapi-src/unified-runtime/actions/runs/10607122480

Compute Benchmarks level_zero_v2 run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10619463957

Compute Benchmarks level_zero_v2 run (--verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10619463957
Job status: failure. Test status: failure.

Compute Benchmarks level_zero_v2 run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10619793698

Compute Benchmarks level_zero_v2 run (--verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10619793698
Job status: failure. Test status: failure.

Compute Benchmarks level_zero_v2 run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10620485923

Compute Benchmarks level_zero_v2 run (--verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10620485923
Job status: failure. Test status: failure.

Compute Benchmarks level_zero_v2 run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10623623541

Compute Benchmarks level_zero_v2 run (--verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/10623623541
Job status: success. Test status: success.

Summary

result is better

| Benchmark | This PR | baseline |
|---|---|---|
| api_overhead_benchmark_sycl SubmitKernel out of order | 45.05 | 50.186 |
| api_overhead_benchmark_sycl SubmitKernel in order | 41.6 | 49.206 |
| api_overhead_benchmark_ur SubmitKernel out of order | 25.512 | 31.972 |
| api_overhead_benchmark_ur SubmitKernel in order | 25.338 | 29.597 |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 346.822 | 478.666 |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 182.595 | 277.92 |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 10.465 | 9.227 |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 3.324 | 4.546 |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 3.306 | 3.58 |
| Velocity-Bench Hashtable | 202.004102 | 176.888802 |
| Velocity-Bench Bitcracker | 35.7243 | 35.8488 |
| Velocity-Bench Easywave | 427 | 389.0 |
| Velocity-Bench QuickSilver | 116.09 | 117.3 |
| Velocity-Bench Sobel Filter | 928.734 | 856.488 |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | - | 1.895 |
| miscellaneous_benchmark_sycl VectorSum | - | 862.689 |
| Velocity-Bench CudaSift | - | 270.543 |

Charts

[Bar charts, one per benchmark in the summary, each comparing This PR against baseline. Units shown in the charts: μs for the api_overhead_benchmark, memory_benchmark, and miscellaneous_benchmark entries; M keys/sec for Hashtable; s for Bitcracker; ms for Easywave, Sobel Filter, and CudaSift; MMS/CTT for QuickSilver. The StreamMemory, VectorSum, and CudaSift charts contain a baseline bar only.]

Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),45.050,44.394,8.96%,41.997,781.648,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),41.600,45.988,24.08%,21.211,567.860,[CPU],[us]

SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.512,25.731,9.15%,19.203,508.170,[CPU],[us]

SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.338,25.022,9.74%,23.012,507.432,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),346.822,345.414,6.46%,322.391,822.965,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),182.595,162.637,27.71%,161.288,472.563,[CPU],[us]

QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),10.465,10.317,23.84%,7.922,197.517,[CPU],[us]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),3.324,3.209,23.99%,2.842,203.942,[CPU],[us]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/bench_workdir/compute-benchmarks-build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),3.306,3.188,26.68%,2.821,204.096,[CPU],[us]
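
The compute-benchmarks entries above all report one CSV row under the header `TestCase,Mean,Median,StdDev,Min,Max,Type`. The sketch below shows one way such a row could be parsed for further comparison. It is an illustration only: the `BenchResult` class and `parse_result_line` function are hypothetical names, not part of the benchmark tooling, and it assumes that each row carries eight fields, with the trailing `[CPU]` and `[us]` fields both falling under the `Type` header.

```python
import csv
from dataclasses import dataclass


@dataclass
class BenchResult:
    """One parsed result row (hypothetical helper type, not from the tooling)."""
    test_case: str
    mean: float
    median: float
    stddev_pct: float  # StdDev is reported as a percentage, e.g. "8.96%"
    minimum: float
    maximum: float
    kind: str          # e.g. "CPU"
    unit: str          # e.g. "us"


def parse_result_line(line: str) -> BenchResult:
    """Parse a single CSV result row as printed with --csv --noHeaders."""
    row = next(csv.reader([line]))
    if len(row) != 8:
        raise ValueError(f"expected 8 fields, got {len(row)}")
    name, mean, median, stddev, lo, hi, kind, unit = row
    return BenchResult(
        test_case=name,
        mean=float(mean),
        median=float(median),
        stddev_pct=float(stddev.rstrip("%")),
        minimum=float(lo),
        maximum=float(hi),
        kind=kind.strip("[]"),
        unit=unit.strip("[]"),
    )


# Sample row copied from the first SubmitKernel entry in the Details section.
sample = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),45.050,44.394,8.96%,41.997,781.648,[CPU],[us]"
)
result = parse_result_line(sample)
print(result.mean, result.median, result.unit)
```

Comparing `mean` with `median` and `maximum` in the parsed record makes the outliers visible: in several runs the maximum is more than ten times the median.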

hashtable

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.664431 s
202.004102 million keys/second

bitcracker

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.0151186 s
bitcracker - total time for whole calculation: 35.7243 s

easywave

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 7.219910e-01 6.300730e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 5.226400e-01 7.647550e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 5.478500e-01 7.819800e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 5.990010e-01 8.531390e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.799900e-01 7.997050e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 5.333440e-01 7.794890e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 5.204820e-01 7.799120e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 4.628000e-01 7.968360e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 5.313590e-01 7.994310e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 5.163230e-01 7.746930e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.310e+07 1.310e+07 1.310e+07 0.000e+00 100.00
cycleInit 10 5.336e+06 5.336e+06 5.336e+06 0.000e+00 100.00
cycleTracking 10 7.760e+06 7.760e+06 7.760e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.941e+06 4.941e+06 4.941e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.358e+05 2.358e+05 2.358e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 7.280e+02 7.280e+02 7.280e+02 0.000e+00 100.00
Figure Of Merit 116.09 [Num Mega Segments / Cycle Tracking Time]
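
The figure of merit is labeled "Num Mega Segments / Cycle Tracking Time". As a sanity check, the sketch below recomputes it from the output above, assuming that the numerator is the sum of the `num_seg` column over the ten cycles (in millions) and that the denominator is the cumulative `cycleTracking` timer converted to seconds. That reading is inferred from the label rather than from the QuickSilver source, and the function name is illustrative.

```python
# num_seg column for cycles 0..9, copied from the per-cycle table above.
NUM_SEGMENTS = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking time from the timer table, in microseconds.
CYCLE_TRACKING_MICROSECS = 7.760e6


def figure_of_merit(segments, tracking_microsecs):
    """Mega segments per second of cycle tracking (assumed definition)."""
    mega_segments = sum(segments) / 1e6
    tracking_seconds = tracking_microsecs / 1e6
    return mega_segments / tracking_seconds


fom = figure_of_merit(NUM_SEGMENTS, CYCLE_TRACKING_MICROSECS)
print(f"{fom:.2f}")
```

Under these assumptions the result agrees with the reported 116.09 to two decimal places; the timer value is itself rounded to four significant digits, so further digits are not meaningful.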

sobel_filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 15.0138 s
sobelfilter - total time for whole calculation: 0.928734 s

@igchor igchor force-pushed the usm_rebased branch 5 times, most recently from e45a48d to ccf3a88 Compare August 30, 2024 17:36
igchor commented Sep 25, 2024

@pbalcer can we bump the UMF version used by UR to the latest main already?

@igchor igchor marked this pull request as ready for review September 27, 2024 16:22

github-actions bot commented Oct 1, 2024

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11124493690

github-actions bot commented Oct 1, 2024

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11124493690
Job status: failure. Test status: skipped.

github-actions bot commented Oct 1, 2024

Compute Benchmarks level_zero run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/11131954357

github-actions bot commented Oct 1, 2024

Compute Benchmarks level_zero_v2 run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/11133967294

github-actions bot commented Oct 1, 2024

Compute Benchmarks level_zero_v2 run (--verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/11133967294
Job status: success. Test status: success.

Summary

Total 130 benchmarks in mean.
Geomean 103.960%.
Improved 39 Regressed 9 (threshold 0.50%)

(result is better)

Performance change in benchmark groups

Relative perf in group Velocity-Bench (5): 99.677%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Hashtable | 360.991416 M keys/sec | 356.852 M keys/sec | 101.16% | 1.16% | . |
| Velocity-Bench Bitcracker | 35.358500 s | 35.544 s | 100.52% | 0.52% | . |
| Velocity-Bench CudaSift | 221.393000 ms | 222.221 ms | 100.37% | 0.37% | . |
| Velocity-Bench Sobel Filter | 556.629 ms | 549.907000 ms | 98.79% | -1.21% | . |
| Velocity-Bench QuickSilver | 115.310 MMS/CTT | 118.170000 MMS/CTT | 97.58% | -2.42% | . |

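
The relative-performance figures for the Velocity-Bench group can be reproduced with a few lines of arithmetic. The sketch below is not the bot's actual script, which does not appear in this thread. It assumes that relative performance is baseline divided by This PR for time-based results, This PR divided by baseline for throughput results (Hashtable and QuickSilver are treated as higher-is-better based on their units), and that a benchmark counts as improved or regressed when it deviates from 100% by more than the 0.50% threshold. All names in the code are illustrative.

```python
import math

# (name, this_pr, baseline, higher_is_better), values copied from the table above.
ROWS = [
    ("Hashtable", 360.991416, 356.852, True),     # M keys/sec
    ("Bitcracker", 35.3585, 35.544, False),       # s
    ("CudaSift", 221.393, 222.221, False),        # ms
    ("Sobel Filter", 556.629, 549.907, False),    # ms
    ("QuickSilver", 115.310, 118.170, True),      # MMS/CTT
]

THRESHOLD = 0.005  # 0.50%, as stated in the summary


def relative_perf(this_pr, baseline, higher_is_better):
    """Ratio above 1.0 means This PR is better (assumed convention)."""
    return this_pr / baseline if higher_is_better else baseline / this_pr


def geomean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: relative_perf(pr, base, hib) for name, pr, base, hib in ROWS}
improved = [name for name, r in ratios.items() if r - 1 > THRESHOLD]
regressed = [name for name, r in ratios.items() if 1 - r > THRESHOLD]
group_geomean = geomean(list(ratios.values()))

for name, r in ratios.items():
    print(f"{name}: {r * 100:.2f}%")
print(f"group geomean: {group_geomean * 100:.3f}%")
print("improved:", improved, "regressed:", regressed)
```

With these assumptions the per-benchmark percentages and the group value of 99.677% match the report. Whether the bot applies the threshold in exactly this way is not confirmed by the output.
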
Relative perf in group Runtime (52): 108.854%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Runtime_BlockedTransform_iter_128_blocksize_1024 | 0.301000 ms | 0.544 ms | 180.73% | 80.73% | ++++++++++ |
| Runtime_BlockedTransform_iter_64_blocksize_1024 | 0.348000 ms | 0.625 ms | 179.60% | 79.60% | ++++++++++ |
| Runtime_BlockedTransform_iter_256_blocksize_1024 | 0.317000 ms | 0.499 ms | 157.41% | 57.41% | +++++++ |
| Runtime_BlockedTransform_iter_256_blocksize_2048 | 0.266000 ms | 0.395 ms | 148.50% | 48.50% | ++++++ |
| Runtime_BlockedTransform_iter_512_blocksize_2048 | 0.297000 ms | 0.403 ms | 135.69% | 35.69% | ++++ |
| Runtime_BlockedTransform_iter_64_blocksize_2048 | 0.252000 ms | 0.340 ms | 134.92% | 34.92% | ++++ |
| Runtime_BlockedTransform_iter_256_blocksize_4096 | 0.268000 ms | 0.354 ms | 132.09% | 32.09% | ++++ |
| Runtime_BlockedTransform_iter_512_blocksize_4096 | 0.276000 ms | 0.359 ms | 130.07% | 30.07% | ++++ |
| Runtime_BlockedTransform_iter_64_blocksize_4096 | 0.235000 ms | 0.304 ms | 129.36% | 29.36% | ++++ |
| Runtime_BlockedTransform_iter_512_blocksize_8192 | 0.278000 ms | 0.345 ms | 124.10% | 24.10% | +++ |
| Runtime_BlockedTransform_iter_256_blocksize_8192 | 0.284000 ms | 0.334 ms | 117.61% | 17.61% | ++ |
| Runtime_BlockedTransform_iter_128_blocksize_4096 | 0.227000 ms | 0.263 ms | 115.86% | 15.86% | ++ |
| Runtime_BlockedTransform_iter_128_blocksize_2048 | 0.253000 ms | 0.288 ms | 113.83% | 13.83% | ++ |
| Runtime_BlockedTransform_iter_128_blocksize_8192 | 0.234000 ms | 0.255 ms | 108.97% | 8.97% | + |
| Runtime_BlockedTransform_iter_512_blocksize_1024 | 0.450000 ms | 0.486 ms | 108.00% | 8.00% | + |
| Runtime_BlockedTransform_iter_64_blocksize_8192 | 0.275000 ms | 0.285 ms | 103.64% | 3.64% | . |
| Runtime_BlockedTransform_iter_128_blocksize_262144 | 2.469000 ms | 2.469 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_256 | 0.341000 ms | 0.341 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_16384 | 2.241000 ms | 2.241 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_131072 | 2.469000 ms | 2.469 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_262144 | 2.367000 ms | 2.367 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_256 | 0.084000 ms | 0.084 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_262144 | 2.580000 ms | 2.580 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_524288 | 2.578000 ms | 2.578 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_65536 | 2.591000 ms | 2.591 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_65536 | 2.513000 ms | 2.513 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_32768 | 2.364000 ms | 2.364 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_16384 | 2.686000 ms | 2.686 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_16384 | 2.288000 ms | 2.288 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_32768 | 2.421000 ms | 2.421 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_65536 | 2.608000 ms | 2.608 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_256 | 0.156000 ms | 0.156 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_256 | 0.079000 ms | 0.079 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_32768 | 2.421000 ms | 2.421 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_524288 | 2.573000 ms | 2.573 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_65536 | 2.543000 ms | 2.543 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_16384 | 2.551000 ms | 2.551 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_32768 | 2.450000 ms | 2.450 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_131072 | 2.424000 ms | 2.424 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_262144 | 2.519000 ms | 2.519 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_524288 | 2.492000 ms | 2.492 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_131072 | 2.519000 ms | 2.519 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_131072 | 2.559000 ms | 2.559 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_524288 | 2.748000 ms | 2.748 ms | 100.00% | 0.00% | . |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 5.215 ms | 5.187000 ms | 99.46% | -0.54% | . |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 5.828 ms | 5.579000 ms | 95.73% | -4.27% | - |
| Runtime_DAGTaskThroughput_SingleTask | 7.844 ms | 7.396000 ms | 94.29% | -5.71% | - |
| Runtime_DAGTaskThroughput_BasicParallelFor | 6.856 ms | 6.184000 ms | 90.20% | -9.80% | - |
| Runtime_IndependentDAGTaskThroughput_SingleTask | - | 7.088000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | - | 5.602000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | - | 5.613000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | - | 5.818000 ms | | | |

Relative perf in group MicroBench (17): 100.379%
Benchmark This PR baseline Relative perf Change -
MicroBench_L2_fp32_4 0.025000 ms 0.026 ms 104.00% 4.00% .
MicroBench_Arith_int32_512 0.072000 ms 0.073 ms 101.39% 1.39% .
MicroBench_LocalMem_int32_4096 0.228000 ms 0.228 ms 100.00% 0.00% .
MicroBench_LocalMem_fp32_4096 0.200000 ms 0.200 ms 100.00% 0.00% .
MicroBench_L2_int32_1 0.033000 ms 0.033 ms 100.00% 0.00% .
MicroBench_L2_fp32_16 0.025000 ms 0.025 ms 100.00% 0.00% .
MicroBench_L2_fp32_2 0.026000 ms 0.026 ms 100.00% 0.00% .
MicroBench_L2_fp32_8 0.025000 ms 0.025 ms 100.00% 0.00% .
MicroBench_L2_fp32_1 0.025000 ms 0.025 ms 100.00% 0.00% .
MicroBench_L2_int32_2 0.027000 ms 0.027 ms 100.00% 0.00% .
MicroBench_L2_int32_8 0.026000 ms 0.026 ms 100.00% 0.00% .
MicroBench_L2_int32_16 0.026000 ms 0.026 ms 100.00% 0.00% .
MicroBench_L2_int32_4 0.026000 ms 0.026 ms 100.00% 0.00% .
MicroBench_Arith_fp32_512 0.032000 ms 0.032 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 794.532000 ms -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 1613.821000 ms -
MicroBench_sf_fp32_16 - 0.025000 ms
Relative perf in group Pattern (14): 101.123%
Benchmark This PR baseline Relative perf Change -
Pattern_SegmentedReduction_NDRange_int64 0.015000 ms 0.016 ms 106.67% 6.67% +
Pattern_Reduction_NDRange_fp32 0.024000 ms 0.025 ms 104.17% 4.17% +
Pattern_SegmentedReduction_NDRange_int32 0.026000 ms 0.027 ms 103.85% 3.85% .
Pattern_Reduction_NDRange_int32 0.075000 ms 0.076 ms 101.33% 1.33% .
Pattern_Reduction_Hierarchical_int32 0.052000 ms 0.052 ms 100.00% 0.00% .
Pattern_Reduction_NDRange_int64 0.052000 ms 0.052 ms 100.00% 0.00% .
Pattern_Reduction_Hierarchical_int64 0.050000 ms 0.050 ms 100.00% 0.00% .
Pattern_Reduction_Hierarchical_fp32 0.049000 ms 0.049 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_int16 0.030000 ms 0.030 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_int32 0.028000 ms 0.028 ms 100.00% 0.00% .
Pattern_SegmentedReduction_NDRange_int16 0.044000 ms 0.044 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_fp32 0.030000 ms 0.030 ms 100.00% 0.00% .
Pattern_SegmentedReduction_NDRange_fp32 0.014000 ms 0.014 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_int64 0.029000 ms 0.029 ms 100.00% 0.00% .
Relative perf in group ScalarProduct (6): 102.451%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 0.128000 ms 0.149 ms 116.41% 16.41% ++
ScalarProduct_NDRange_int64 0.098000 ms 0.099 ms 101.02% 1.02% .
ScalarProduct_Hierarchical_int64 0.063000 ms 0.063 ms 100.00% 0.00% .
ScalarProduct_Hierarchical_int32 0.062000 ms 0.062 ms 100.00% 0.00% .
ScalarProduct_NDRange_fp32 0.040000 ms 0.040 ms 100.00% 0.00% .
ScalarProduct_Hierarchical_fp32 0.060 ms 0.059000 ms 98.33% -1.67% .
Relative perf in group USM (17): 101.780%
Benchmark This PR baseline Relative perf Change -
USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1 0.010000 ms 0.011 ms 110.00% 10.00% +
USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1 0.013000 ms 0.014 ms 107.69% 7.69% +
USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1 0.018000 ms 0.019 ms 105.56% 5.56% +
USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1 0.410000 ms 0.426 ms 103.90% 3.90% .
USM_Allocation_latency_fp32_shared 0.116000 ms 0.117 ms 100.86% 0.86% .
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.778000 ms 1.791 ms 100.73% 0.73% .
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 3.258000 ms 3.277 ms 100.58% 0.58% .
USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch 15.412000 ms 15.462 ms 100.32% 0.32% .
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.871000 ms 1.876 ms 100.27% 0.27% .
USM_Latency_fp32_in_order__ 33.604000 ms 33.686 ms 100.24% 0.24% .
USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch 14.140000 ms 14.174 ms 100.24% 0.24% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 3.104000 ms 3.110 ms 100.19% 0.19% .
USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch 13.719000 ms 13.740 ms 100.15% 0.15% .
USM_Latency_fp32_out_of_order__ 46.678000 ms 46.732 ms 100.12% 0.12% .
USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch 15.307000 ms 15.324 ms 100.11% 0.11% .
USM_Allocation_latency_fp32_device 0.008000 ms 0.008 ms 100.00% 0.00% .
USM_Allocation_latency_fp32_host 0.002000 ms 0.002 ms 100.00% 0.00% .
Relative perf in group SYCL2020 (2): 100.325%
Benchmark This PR baseline Relative perf Change -
SYCL2020_Accessors_Latency_fp32_out_of_order__ 70.397000 ms 70.770 ms 100.53% 0.53% .
SYCL2020_Accessors_Latency_fp32_in_order__ 68.287000 ms 68.370 ms 100.12% 0.12% .
Relative perf in group VectorAddition (3): 100.826%
Benchmark This PR baseline Relative perf Change -
VectorAddition_int64 0.040000 ms 0.041 ms 102.50% 2.50% .
VectorAddition_int32 0.037000 ms 0.037 ms 100.00% 0.00% .
VectorAddition_fp32 0.033000 ms 0.033 ms 100.00% 0.00% .
Relative perf in group Polybench (13): 99.899%
Benchmark This PR baseline Relative perf Change -
Polybench_2mm 1.238000 ms 1.239 ms 100.08% 0.08% .
Polybench_3mm 1.744000 ms 1.745 ms 100.06% 0.06% .
Polybench_Atax 6.901000 ms 6.902 ms 100.01% 0.01% .
Polybench_Gramschmidt 285.038000 ms 285.054 ms 100.01% 0.01% .
Polybench_2DConvolution 0.229000 ms 0.229 ms 100.00% 0.00% .
Polybench_Gesummv 7.308000 ms 7.308 ms 100.00% 0.00% .
Polybench_Bicg 5.134 ms 5.133000 ms 99.98% -0.02% .
Polybench_Syrk 3.225 ms 3.222000 ms 99.91% -0.09% .
Polybench_Syr2k 6.393 ms 6.386000 ms 99.89% -0.11% .
Polybench_Mvt 3.657 ms 3.650000 ms 99.81% -0.19% .
Polybench_Correlation 95.859 ms 95.656000 ms 99.79% -0.21% .
Polybench_Covariance 95.656 ms 94.948000 ms 99.26% -0.74% .
Polybench_Gemm 3.962000 ms -
Relative perf in group ReductionAtomic (4): 109.534%
Benchmark This PR baseline Relative perf Change -
ReductionAtomic_int64 0.036000 ms 0.041 ms 113.89% 13.89% ++
ReductionAtomic_int32 0.036000 ms 0.041 ms 113.89% 13.89% ++
ReductionAtomic_fp32 0.036000 ms 0.039 ms 108.33% 8.33% +
ReductionAtomic_fp64 0.041000 ms 0.042 ms 102.44% 2.44% .
Relative perf in group Kmeans (1): 99.889%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 1.797 ms 1.795000 ms 99.89% -0.11% .
Relative perf in group LinearRegressionCoeff (1): 96.093%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 1.459 ms 1.402000 ms 96.09% -3.91% .
Relative perf in group LinearRegression (1): 99.726%
Benchmark This PR baseline Relative perf Change -
LinearRegression_fp32 0.365 ms 0.364000 ms 99.73% -0.27% .
Relative perf in group MatmulChain (1): 99.837%
Benchmark This PR baseline Relative perf Change -
MatmulChain 11.064 ms 11.046000 ms 99.84% -0.16% .
Relative perf in group MolecularDynamics (1): 100.000%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.065000 ms 0.065 ms 100.00% 0.00% .
Relative perf in group api (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order - 23.181000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 23.015000 μs
api_overhead_benchmark_ur SubmitKernel out of order - 14.612000 μs
api_overhead_benchmark_ur SubmitKernel in order - 13.659000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.406000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.690000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 - 228.730000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 - 121.251000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.882000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.193000 μs
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 857.978000 μs
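
For anyone reading the tables above: the "Relative perf" column appears to be the baseline time divided by this PR's time, so values above 100% mean this PR is faster, and the per-group figure is consistent with a geometric mean of those ratios. The sketch below checks that reading against the ScalarProduct rows. It is an inference from the numbers shown here, not the benchmark script's actual code, and the function names are made up for illustration.

```python
from math import exp, log


def relative_perf(this_pr_ms, baseline_ms):
    """Relative perf in percent: baseline time divided by this PR's time."""
    return baseline_ms / this_pr_ms * 100.0


def group_relative_perf(rows):
    """Geometric mean of the per-benchmark ratios, in percent."""
    ratios = [baseline / this_pr for this_pr, baseline in rows]
    return exp(sum(log(r) for r in ratios) / len(ratios)) * 100.0


# (This PR, baseline) times in ms, copied from the ScalarProduct rows above.
scalar_product = [
    (0.128, 0.149),  # ScalarProduct_NDRange_int32
    (0.098, 0.099),  # ScalarProduct_NDRange_int64
    (0.063, 0.063),  # ScalarProduct_Hierarchical_int64
    (0.062, 0.062),  # ScalarProduct_Hierarchical_int32
    (0.040, 0.040),  # ScalarProduct_NDRange_fp32
    (0.060, 0.059),  # ScalarProduct_Hierarchical_fp32
]

print(f"{relative_perf(0.128, 0.149):.2f}%")
print(f"{group_relative_perf(scalar_product):.3f}%")
```

The same calculation on the four ReductionAtomic rows also lands on the reported group value, which is why a geometric mean (rather than an arithmetic one) seems the likely choice.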

Benchmark details - environment, command, output...
Velocity-Bench Hashtable

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/shared-actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.371803 s
360.991416 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/shared-actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00425163 s
bitcracker - total time for whole calculation: 35.3585 s

Velocity-Bench CudaSift

Environment Variables:

UR_ADAPTERS_FORCE_LOAD=/home/test-user/shared-actions-runner/_work/unified-runtime/unified-runtime/ur-repo/build/lib/libur_adapter_level_zero_v2.so

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1129 1264 30.6544% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1266 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1266 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1107 1276 30.057% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1270 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1209 1246 32.8265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1254 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1276 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1111 1260 30.1656% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1259 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1123 1273 30.4914% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1262 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1118 1270 30.3557% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1269 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1090 1261 29.5954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1019 1266 27.6677% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1267 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1267 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1072 1264 29.1067% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1263 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1275 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1275 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1067 1256 28.9709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1273 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1202 1254 32.6364% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1269 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1259 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1249 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1180 1267 32.0391% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1113 1255 30.2199% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1126 1253 30.5729% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1273 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1269 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1256 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1261 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1270 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1258 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1276 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1085 1269 29.4597% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1261 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1248 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1097 1272 29.7855% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1270 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1264 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1114 1269 30.2471% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 221.393 ms

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.326610e-01 6.302120e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.655510e-01 7.765590e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.617220e-01 7.859080e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.602270e-01 8.468590e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.616570e-01 8.143440e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.618570e-01 7.876820e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.385570e-01 7.809320e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.300500e-01 8.062310e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.293980e-01 8.082470e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.299770e-01 7.756320e-01 0.000000e+00

| Timer Name | Cumulative number of calls | Cumulative microSecs min | Cumulative microSecs avg | Cumulative microSecs max | Cumulative microSecs stddev | Cumulative Efficiency Rating |
|---|---|---|---|---|---|---|
| main | 1 | 1.139e+07 | 1.139e+07 | 1.139e+07 | 0.000e+00 | 100.00 |
| cycleInit | 10 | 3.572e+06 | 3.572e+06 | 3.572e+06 | 0.000e+00 | 100.00 |
| cycleTracking | 10 | 7.813e+06 | 7.813e+06 | 7.813e+06 | 0.000e+00 | 100.00 |
| cycleTracking_Kernel | 104 | 4.948e+06 | 4.948e+06 | 4.948e+06 | 0.000e+00 | 100.00 |
| cycleTracking_MPI | 117 | 2.269e+05 | 2.269e+05 | 2.269e+05 | 0.000e+00 | 100.00 |
| cycleTracking_Test_Done | 0 | 0.000e+00 | 0.000e+00 | 0.000e+00 | 0.000e+00 | 0.00 |
| cycleFinalize | 20 | 4.200e+02 | 4.200e+02 | 4.200e+02 | 0.000e+00 | 100.00 |
Figure Of Merit 115.31 [Num Mega Segments / Cycle Tracking Time]
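
As a sanity check on the Figure Of Merit line, the value can be reproduced from the per-cycle table in the same output: sum the `num_seg` column, convert to mega segments, and divide by the summed `cycleTracking` column. This is a sketch that assumes the `cycleTracking` column is in seconds (its sum matches the 7.813e+06 microSecs reported by the timer); `figure_of_merit` is an illustrative helper, not a QuickSilver function.

```python
# Per-cycle values copied from the QuickSilver cycle table in the output above.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]
# cycleTracking column, assumed to be seconds per cycle.
cycle_tracking_s = [
    0.630212, 0.776559, 0.785908, 0.846859, 0.814344,
    0.787682, 0.780932, 0.806231, 0.808247, 0.775632,
]


def figure_of_merit(segments, tracking_seconds):
    """Mega segments processed per second of cycle-tracking time."""
    return sum(segments) / 1e6 / sum(tracking_seconds)


print(f"{figure_of_merit(num_seg, cycle_tracking_s):.2f}")  # prints 115.31
```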

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.45846 s
sobelfilter - total time for whole calculation: 0.556629 s

Runtime_BlockedTransform_iter_128_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002465', '0.002469', '0.002358', '0.002358 0.002469 0.002569', '0.000105', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000251', '0.000268', '0.000212', '0.000212 0.000268 0.000274', '0.000034', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.002352', '0.000341', '0.000186', '0.000186 0.000341 0.006528', '0.003618', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002506', '0.002241', '0.002218', '0.002218 0.002241 0.003058', '0.000479', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000313', '0.000301', '0.000250', '0.000250 0.000301 0.000389', '0.000070', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002720', '0.002469', '0.002258', '0.002258 0.002469 0.003434', '0.000627', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000268', '0.000252', '0.000247', '0.000247 0.000252 0.000305', '0.000032', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002654', '0.002367', '0.002305', '0.002305 0.002367 0.003289', '0.000551', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000095', '0.000084', '0.000077', '0.000077 0.000084 0.000123', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000300', '0.000284', '0.000278', '0.000278 0.000284 0.000338', '0.000033', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002683', '0.002580', '0.002427', '0.002427 0.002580 0.003041', '0.000320', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002673', '0.002578', '0.002186', '0.002186 0.002578 0.003254', '0.000540', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002824', '0.002591', '0.002512', '0.002512 0.002591 0.003370', '0.000474', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002559', '0.002513', '0.002472', '0.002472 0.002513 0.002691', '0.000116', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002597', '0.002364', '0.002245', '0.002245 0.002364 0.003182', '0.000510', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000261', '0.000275', '0.000231', '0.000231 0.000275 0.000277', '0.000026', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000268', '0.000276', '0.000245', '0.000245 0.000276 0.000283', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002771', '0.002686', '0.002300', '0.002300 0.002686 0.003328', '0.000519', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002376', '0.002288', '0.002252', '0.002252 0.002288 0.002587', '0.000184', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002499', '0.002421', '0.002141', '0.002141 0.002421 0.002934', '0.000402', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002577', '0.002608', '0.002157', '0.002157 0.002608 0.002967', '0.000406', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000262', '0.000253', '0.000249', '0.000249 0.000253 0.000284', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000207', '0.000156', '0.000122', '0.000122 0.000156 0.000345', '0.000120', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000116', '0.000079', '0.000076', '0.000076 0.000079 0.000192', '0.000066', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000218', '0.000227', '0.000197', '0.000197 0.000227 0.000231', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000252', '0.000235', '0.000234', '0.000234 0.000235 0.000287', '0.000030', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000765', '0.000348', '0.000301', '0.000301 0.000348 0.001647', '0.000764', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000320', '0.000317', '0.000296', '0.000296 0.000317 0.000347', '0.000026', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002445', '0.002421', '0.002410', '0.002410 0.002421 0.002506', '0.000052', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000289', '0.000278', '0.000268', '0.000268 0.000278 0.000322', '0.000029', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000449', '0.000450', '0.000413', '0.000413 0.000450 0.000483', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000232', '0.000234', '0.000223', '0.000223 0.000234 0.000238', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002655', '0.002573', '0.002562', '0.002562 0.002573 0.002831', '0.000152', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002574', '0.002543', '0.002440', '0.002440 0.002543 0.002739', '0.000152', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002923', '0.002551', '0.002523', '0.002523 0.002551 0.003697', '0.000670', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002386', '0.002450', '0.002212', '0.002212 0.002450 0.002496', '0.000153', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002449', '0.002424', '0.002306', '0.002306 0.002424 0.002617', '0.000157', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002657', '0.002519', '0.002191', '0.002191 0.002519 0.003261', '0.000548', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002515', '0.002492', '0.002465', '0.002465 0.002492 0.002589', '0.000065', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002705', '0.002519', '0.002329', '0.002329 0.002519 0.003268', '0.000496', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002730', '0.002559', '0.002296', '0.002296 0.002559 0.003335', '0.000540', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002712', '0.002748', '0.002529', '0.002529 0.002748 0.002858', '0.000167', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000255', '0.000266', '0.000226', '0.000226 0.000266 0.000271', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000297', '0.000297', '0.000294', '0.000294 0.000297 0.000300', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.006839', '0.006856', '0.006567', '0.006567 0.006856 0.007092', '0.000263', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.008119', '0.007844', '0.007174', '0.007174 0.007844 0.009339', '0.001109', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005283', '0.005215', '0.005173', '0.005173 0.005215 0.005460', '0.000155', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005823', '0.005828', '0.005565', '0.005565 0.005828 0.006077', '0.000256', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.793877', '0.794532', '0.792546', '0.792546 0.794532 0.794553', '0.001153', '34.067402', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '27.000000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '1.624108', '1.613821', '1.613743', '1.613743 1.613821 1.644759', '0.017885', '16.731285', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '27.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000243', '0.000228', '0.000210', '0.000210 0.000228 0.000292', '0.000043', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000207', '0.000200', '0.000196', '0.000196 0.000200 0.000225', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_L2_int32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000050', '0.000033', '0.000027', '0.000027 0.000033 0.000091', '0.000036', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000032', '0.000025', '0.000025', '0.000025 0.000025 0.000047', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000034', '0.000026', '0.000026', '0.000026 0.000026 0.000050', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000025', '0.000025', '0.000025 0.000025 0.000038', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000031', '0.000025', '0.000024', '0.000024 0.000025 0.000043', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000025', '0.000023', '0.000023 0.000025 0.000040', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000038', '0.000027', '0.000026', '0.000026 0.000027 0.000060', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000026', '0.000025', '0.000025 0.000026 0.000040', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000034', '0.000026', '0.000025', '0.000025 0.000026 0.000050', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000031', '0.000026', '0.000025', '0.000025 0.000026 0.000043', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000064', '0.000052', '0.000052', '0.000052 0.000052 0.000086', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000041', '0.000024', '0.000022', '0.000022 0.000024 0.000076', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000061', '0.000052', '0.000047', '0.000047 0.000052 0.000085', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000090', '0.000075', '0.000059', '0.000059 0.000075 0.000135', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000184', '0.000050', '0.000045', '0.000045 0.000050 0.000456', '0.000236', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000056', '0.000049', '0.000048', '0.000048 0.000049 0.000072', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000107', '0.000098', '0.000041', '0.000041 0.000098 0.000183', '0.000071', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000074', '0.000063', '0.000062', '0.000062 0.000063 0.000098', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000083', '0.000062', '0.000060', '0.000060 0.000062 0.000127', '0.000038', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000087', '0.000060', '0.000059', '0.000059 0.000060 0.000142', '0.000048', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000170', '0.000128', '0.000097', '0.000097 0.000128 0.000285', '0.000101', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000071', '0.000040', '0.000039', '0.000039 0.000040 0.000133', '0.000054', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000030', '0.000029', '0.000029 0.000030 0.000038', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000041', '0.000026', '0.000024', '0.000024 0.000026 0.000074', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000032', '0.000028', '0.000027', '0.000027 0.000028 0.000041', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000072', '0.000044', '0.000030', '0.000030 0.000044 0.000142', '0.000061', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000030', '0.000028', '0.000028 0.000030 0.000041', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000016', '0.000014', '0.000013', '0.000013 0.000014 0.000022', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000017', '0.000015', '0.000014', '0.000014 0.000015 0.000021', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000032', '0.000029', '0.000029', '0.000029 0.000029 0.000037', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.032041', '0.033604', '0.028811', '0.028811 0.033604 0.033706', '0.002797', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.045736', '0.046678', '0.043567', '0.043567 0.046678 0.046964', '0.001884', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.075511', '0.070397', '0.070184', '0.070184 0.070397 0.085952', '0.009042', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.068167', '0.068287', '0.067740', '0.067740 0.068287 0.068472', '0.000381', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000041', '0.000008', '0.000002', '0.000002 0.000008 0.000112', '0.000062', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000153', '0.000116', '0.000104', '0.000104 0.000116 0.000239', '0.000075', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000003', '0.000002', '0.000002', '0.000002 0.000002 0.000004', '0.000001', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001866', '0.001871', '0.001856', '0.001856 0.001871 0.001872', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.003103', '0.003104', '0.003019', '0.003019 0.003104 0.003186', '0.000084', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001790', '0.001778', '0.001772', '0.001772 0.001778 0.001821', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.013719', '0.013719', '0.013719', '0.013719', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.015307', '0.015307', '0.015307', '0.015307', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.014140', '0.014140', '0.014140', '0.014140', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.015412', '0.015412', '0.015412', '0.015412', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.004721', '0.003258', '0.003182', '0.003182 0.003258 0.007725', '0.002601', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000025', '0.000018', '0.000017', '0.000017 0.000018 0.000040', '0.000013', '0.667255', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000517', '0.000410', '0.000151', '0.000151 0.000410 0.000992', '0.000430', '0.075735', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000035', '0.000010', '0.000008', '0.000008 0.000010 0.000087', '0.000045', '1.454881', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000089', '0.000013', '0.000008', '0.000008 0.000013 0.000245', '0.000135', '1.354170', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000060', '0.000037', '0.000031', '0.000031 0.000037 0.000114', '0.000046', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000047', '0.000040', '0.000036', '0.000036 0.000040 0.000066', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000039', '0.000033', '0.000030', '0.000030 0.000033 0.000054', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2DConvolution

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2DConvolution --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2DConvolution.csv

Output:

['Polybench_2DConvolution', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000235', '0.000229', '0.000213', '0.000213 0.000229 0.000262', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001261', '0.001238', '0.001235', '0.001235 0.001238 0.001311', '0.000043', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001745', '0.001744', '0.001734', '0.001734 0.001744 0.001756', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_Arith_fp32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_fp32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000043', '0.000032', '0.000029', '0.000029 0.000032 0.000069', '0.000022', '1072.151508', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

MicroBench_Arith_int32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_int32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000148', '0.000072', '0.000058', '0.000058 0.000072 0.000314', '0.000144', '539.034740', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006907', '0.006901', '0.006871', '0.006871 0.006901 0.006950', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000040', '0.000036', '0.000034', '0.000034 0.000036 0.000052', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000215', '0.000036', '0.000028', '0.000028 0.000036 0.000582', '0.000318', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000061', '0.000036', '0.000031', '0.000031 0.000036 0.000117', '0.000048', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000218', '0.000041', '0.000036', '0.000036 0.000041 0.000577', '0.000311', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Bicg

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/bicg --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Bicg.csv --size=20480

Output:

['Polybench_Bicg', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.005134', '0.005134', '0.005134', '0.005134', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Correlation

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/correlation --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Correlation.csv --size=2048

Output:

['Polybench_Correlation', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.095859', '0.095859', '0.095859', '0.095859', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Covariance

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/covariance --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Covariance.csv --size=2048

Output:

['Polybench_Covariance', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.095656', '0.095656', '0.095656', '0.095656', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gemm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gemm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gemm.csv --size=8192

Output:

['Polybench_Gemm', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.003962', '0.003962', '0.003962', '0.003962', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gesummv

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gesummv --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gesummv.csv --size=8192

Output:

['Polybench_Gesummv', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.007308', '0.007308', '0.007308', '0.007308', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gramschmidt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gramschmidt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gramschmidt.csv --size=512

Output:

['Polybench_Gramschmidt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.285038', '0.285038', '0.285038', '0.285038', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '67108864', '0.001803', '0.001797', '0.001789', '0.001789 0.001797 0.001822', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001670', '0.001459', '0.001053', '0.001053 0.001459 0.002499', '0.000746', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegression_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_error --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegression.csv --size=640000

Output:

['LinearRegression_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000390', '0.000365', '0.000351', '0.000351 0.000365 0.000452', '0.000055', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MatmulChain

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/matmulchain --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MatmulChain.csv --size=2048

Output:

['MatmulChain', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.011064', '0.011064', '0.011064', '0.011064', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000080', '0.000065', '0.000057', '0.000057 0.000065 0.000119', '0.000034', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Mvt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mvt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Mvt.csv --size=32767

Output:

['Polybench_Mvt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.003657', '0.003657', '0.003657', '0.003657', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syr2k

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syr2k --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syr2k.csv --size=6144

Output:

['Polybench_Syr2k', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.006393', '0.006393', '0.006393', '0.006393', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syrk

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syrk --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syrk.csv --size=4096

Output:

['Polybench_Syrk', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.003225', '0.003225', '0.003225', '0.003225', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

github-actions bot commented Oct 1, 2024

Compute Benchmarks level_zero_v2 run (with params: --verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/11135017485

github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero_v2 run (--verbose):
https://github.com/oneapi-src/unified-runtime/actions/runs/11135017485
Job status: success. Test status: success.

Summary

Total 140 benchmarks in mean.
Geomean 103.998%.
Improved 41 Regressed 15 (threshold 0.50%)

(result is better)
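
The report does not say how these figures are derived. The sketch below is one reading that is consistent with the numbers in this comment, not the benchmark script's actual code. It assumes that relative perf is `baseline / this PR` for lower-is-better metrics (and the inverse for throughput metrics), that each group figure is the geometric mean of its rows, and that the threshold marks a row as improved or regressed when its change exceeds ±0.50%. The function names are hypothetical; the input values are the six rows of the api group reported in this comment.

```python
import math

# (this_pr, baseline) mean times in microseconds for the six "api" group rows
# reported in this comment; lower is better for all of them.
API_GROUP = [
    (1.440, 2.406),    # sycl ExecImmediateCopyQueue out of order
    (1.454, 1.690),    # sycl ExecImmediateCopyQueue in order
    (20.741, 23.181),  # sycl SubmitKernel out of order
    (14.553, 14.612),  # ur SubmitKernel out of order
    (14.162, 13.659),  # ur SubmitKernel in order
    (23.935, 23.015),  # sycl SubmitKernel in order
]


def relative_perf(this_pr, baseline, lower_is_better=True):
    """Relative performance in percent; above 100 means this PR is better (assumed formula)."""
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return 100.0 * ratio


def geomean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def classify(rel, threshold=0.50):
    """Label a row by its change in percentage points (assumed use of the threshold)."""
    change = rel - 100.0
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "regressed"
    return "unchanged"


rels = [relative_perf(this_pr, baseline) for this_pr, baseline in API_GROUP]
group_geomean = geomean(rels)
labels = [classify(r) for r in rels]

print(f"group geomean: {group_geomean:.3f}%")
print({label: labels.count(label) for label in ("improved", "regressed", "unchanged")})
```

Under these assumptions the group value should land very close to the 112.443% reported for the api group, with the 0.41% row falling inside the threshold and counting as neither improved nor regressed.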

Performance change in benchmark groups

Relative perf in group api (6): 112.443%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 1.440000 μs | 2.406 μs | 167.08% | 67.08% | ++++++++ |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.454000 μs | 1.690 μs | 116.23% | 16.23% | ++ |
| api_overhead_benchmark_sycl SubmitKernel out of order | 20.741000 μs | 23.181 μs | 111.76% | 11.76% | + |
| api_overhead_benchmark_ur SubmitKernel out of order | 14.553000 μs | 14.612 μs | 100.41% | 0.41% | . |
| api_overhead_benchmark_ur SubmitKernel in order | 14.162 μs | 13.659000 μs | 96.45% | -3.55% | . |
| api_overhead_benchmark_sycl SubmitKernel in order | 23.935 μs | 23.015000 μs | 96.16% | -3.84% | . |

Relative perf in group memory (4): 92.471%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 83.294000 μs | 121.251 μs | 145.57% | 45.57% | ++++++ |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.319 μs | 3.193000 μs | 96.20% | -3.80% | . |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 7.273 μs | 5.882000 μs | 80.87% | -19.13% | -- |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 354.302 μs | 228.730000 μs | 64.56% | -35.44% | ---- |

Relative perf in group Velocity-Bench (5): 99.912%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Hashtable | 366.167768 M keys/sec | 356.852 M keys/sec | 102.61% | 2.61% | . |
| Velocity-Bench Bitcracker | 35.414000 s | 35.544 s | 100.37% | 0.37% | . |
| Velocity-Bench CudaSift | 222.970 ms | 222.221000 ms | 99.66% | -0.34% | . |
| Velocity-Bench Sobel Filter | 554.651 ms | 549.907000 ms | 99.14% | -0.86% | . |
| Velocity-Bench QuickSilver | 115.610 MMS/CTT | 118.170000 MMS/CTT | 97.83% | -2.17% | . |

Relative perf in group Runtime (52): 108.515%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Runtime_BlockedTransform_iter_128_blocksize_1024 | 0.300000 ms | 0.544 ms | 181.33% | 81.33% | ++++++++++ |
| Runtime_BlockedTransform_iter_64_blocksize_1024 | 0.351000 ms | 0.625 ms | 178.06% | 78.06% | ++++++++++ |
| Runtime_BlockedTransform_iter_256_blocksize_1024 | 0.317000 ms | 0.499 ms | 157.41% | 57.41% | +++++++ |
| Runtime_BlockedTransform_iter_256_blocksize_2048 | 0.282000 ms | 0.395 ms | 140.07% | 40.07% | +++++ |
| Runtime_BlockedTransform_iter_64_blocksize_2048 | 0.254000 ms | 0.340 ms | 133.86% | 33.86% | ++++ |
| Runtime_BlockedTransform_iter_512_blocksize_2048 | 0.304000 ms | 0.403 ms | 132.57% | 32.57% | ++++ |
| Runtime_BlockedTransform_iter_256_blocksize_4096 | 0.272000 ms | 0.354 ms | 130.15% | 30.15% | ++++ |
| Runtime_BlockedTransform_iter_512_blocksize_4096 | 0.280000 ms | 0.359 ms | 128.21% | 28.21% | +++ |
| Runtime_BlockedTransform_iter_64_blocksize_4096 | 0.242000 ms | 0.304 ms | 125.62% | 25.62% | +++ |
| Runtime_BlockedTransform_iter_512_blocksize_8192 | 0.277000 ms | 0.345 ms | 124.55% | 24.55% | +++ |
| Runtime_BlockedTransform_iter_256_blocksize_8192 | 0.275000 ms | 0.334 ms | 121.45% | 21.45% | +++ |
| Runtime_BlockedTransform_iter_128_blocksize_2048 | 0.254000 ms | 0.288 ms | 113.39% | 13.39% | ++ |
| Runtime_BlockedTransform_iter_128_blocksize_4096 | 0.236000 ms | 0.263 ms | 111.44% | 11.44% | + |
| Runtime_BlockedTransform_iter_128_blocksize_8192 | 0.233000 ms | 0.255 ms | 109.44% | 9.44% | + |
| Runtime_BlockedTransform_iter_64_blocksize_8192 | 0.265000 ms | 0.285 ms | 107.55% | 7.55% | + |
| Runtime_BlockedTransform_iter_512_blocksize_1024 | 0.456000 ms | 0.486 ms | 106.58% | 6.58% | + |
| Runtime_BlockedTransform_iter_64_blocksize_65536 | 2.591000 ms | 2.591 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_524288 | 2.578000 ms | 2.578 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_16384 | 2.686000 ms | 2.686 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_524288 | 2.573000 ms | 2.573 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_131072 | 2.519000 ms | 2.519 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_32768 | 2.421000 ms | 2.421 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_16384 | 2.551000 ms | 2.551 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_262144 | 2.580000 ms | 2.580 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_256 | 0.156000 ms | 0.156 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_524288 | 2.748000 ms | 2.748 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_65536 | 2.543000 ms | 2.543 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_32768 | 2.364000 ms | 2.364 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_32768 | 2.421000 ms | 2.421 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_131072 | 2.559000 ms | 2.559 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_256 | 0.084000 ms | 0.084 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_16384 | 2.288000 ms | 2.288 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_256 | 0.341000 ms | 0.341 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_16384 | 2.241000 ms | 2.241 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_32768 | 2.450000 ms | 2.450 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_262144 | 2.469000 ms | 2.469 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_262144 | 2.367000 ms | 2.367 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_131072 | 2.469000 ms | 2.469 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_64_blocksize_524288 | 2.492000 ms | 2.492 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_65536 | 2.513000 ms | 2.513 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_256_blocksize_262144 | 2.519000 ms | 2.519 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_256 | 0.079000 ms | 0.079 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_512_blocksize_65536 | 2.608000 ms | 2.608 ms | 100.00% | 0.00% | . |
| Runtime_BlockedTransform_iter_128_blocksize_131072 | 2.424000 ms | 2.424 ms | 100.00% | 0.00% | . |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 5.226 ms | 5.187000 ms | 99.25% | -0.75% | . |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 5.869 ms | 5.579000 ms | 95.06% | -4.94% | - |
| Runtime_DAGTaskThroughput_SingleTask | 7.869 ms | 7.396000 ms | 93.99% | -6.01% | - |
| Runtime_DAGTaskThroughput_BasicParallelFor | 6.893 ms | 6.184000 ms | 89.71% | -10.29% | - |
| Runtime_IndependentDAGTaskThroughput_SingleTask | - | 7.088000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | - | 5.602000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | - | 5.613000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | - | 5.818000 ms | | | |

Relative perf in group MicroBench (17): 100.130%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MicroBench_Arith_int32_512 | 0.072000 ms | 0.073 ms | 101.39% | 1.39% | . |
| MicroBench_LocalMem_int32_4096 | 0.227000 ms | 0.228 ms | 100.44% | 0.44% | . |
| MicroBench_LocalMem_fp32_4096 | 0.200000 ms | 0.200 ms | 100.00% | 0.00% | . |
| MicroBench_L2_fp32_8 | 0.025000 ms | 0.025 ms | 100.00% | 0.00% | . |
| MicroBench_L2_int32_2 | 0.027000 ms | 0.027 ms | 100.00% | 0.00% | . |
| MicroBench_L2_int32_8 | 0.026000 ms | 0.026 ms | 100.00% | 0.00% | . |
| MicroBench_L2_int32_1 | 0.033000 ms | 0.033 ms | 100.00% | 0.00% | . |
| MicroBench_L2_int32_16 | 0.026000 ms | 0.026 ms | 100.00% | 0.00% | . |
| MicroBench_L2_fp32_2 | 0.026000 ms | 0.026 ms | 100.00% | 0.00% | . |
| MicroBench_L2_fp32_4 | 0.026000 ms | 0.026 ms | 100.00% | 0.00% | . |
| MicroBench_L2_fp32_16 | 0.025000 ms | 0.025 ms | 100.00% | 0.00% | . |
| MicroBench_L2_fp32_1 | 0.025000 ms | 0.025 ms | 100.00% | 0.00% | . |
| MicroBench_L2_int32_4 | 0.026000 ms | 0.026 ms | 100.00% | 0.00% | . |
| MicroBench_Arith_fp32_512 | 0.032000 ms | 0.032 ms | 100.00% | 0.00% | . |
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | 794.503000 ms | - | | | |
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | 1589.121000 ms | - | | | |
| MicroBench_sf_fp32_16 | - | 0.025000 ms | | | |

Relative perf in group Pattern (14): 100.481%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Pattern_SegmentedReduction_NDRange_int32 | 0.026000 ms | 0.027 ms | 103.85% | 3.85% | . |
| Pattern_Reduction_NDRange_int32 | 0.074000 ms | 0.076 ms | 102.70% | 2.70% | . |
| Pattern_SegmentedReduction_NDRange_int16 | 0.043000 ms | 0.044 ms | 102.33% | 2.33% | . |
| Pattern_Reduction_NDRange_fp32 | 0.025000 ms | 0.025 ms | 100.00% | 0.00% | . |
| Pattern_Reduction_NDRange_int64 | 0.052000 ms | 0.052 ms | 100.00% | 0.00% | . |
| Pattern_Reduction_Hierarchical_int64 | 0.050000 ms | 0.050 ms | 100.00% | 0.00% | . |
| Pattern_Reduction_Hierarchical_int32 | 0.052000 ms | 0.052 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_Hierarchical_int16 | 0.030000 ms | 0.030 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_NDRange_int64 | 0.016000 ms | 0.016 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_Hierarchical_fp32 | 0.030000 ms | 0.030 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_Hierarchical_int64 | 0.029000 ms | 0.029 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_Hierarchical_int32 | 0.028000 ms | 0.028 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_NDRange_fp32 | 0.014000 ms | 0.014 ms | 100.00% | 0.00% | . |
| Pattern_Reduction_Hierarchical_fp32 | 0.050 ms | 0.049000 ms | 98.00% | -2.00% | . |

Relative perf in group ScalarProduct (6): 102.720%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ScalarProduct_NDRange_int32 | 0.126000 ms | 0.149 ms | 118.25% | 18.25% | ++ |
| ScalarProduct_NDRange_int64 | 0.098000 ms | 0.099 ms | 101.02% | 1.02% | . |
| ScalarProduct_Hierarchical_int32 | 0.062000 ms | 0.062 ms | 100.00% | 0.00% | . |
| ScalarProduct_NDRange_fp32 | 0.040000 ms | 0.040 ms | 100.00% | 0.00% | . |
| ScalarProduct_Hierarchical_int64 | 0.063000 ms | 0.063 ms | 100.00% | 0.00% | . |
| ScalarProduct_Hierarchical_fp32 | 0.060 ms | 0.059000 ms | 98.33% | -1.67% | . |

Relative perf in group USM (17): 103.018%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1 | 0.009000 ms | 0.011 ms | 122.22% | 22.22% | +++ |
| USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1 | 0.012000 ms | 0.014 ms | 116.67% | 16.67% | ++ |
| USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1 | 0.018000 ms | 0.019 ms | 105.56% | 5.56% | + |
| USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1 | 0.408000 ms | 0.426 ms | 104.41% | 4.41% | + |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 1.775000 ms | 1.791 ms | 100.90% | 0.90% | . |
| USM_Allocation_latency_fp32_shared | 0.116000 ms | 0.117 ms | 100.86% | 0.86% | . |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 3.249000 ms | 3.277 ms | 100.86% | 0.86% | . |
| USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch | 15.392000 ms | 15.462 ms | 100.45% | 0.45% | . |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.868000 ms | 1.876 ms | 100.43% | 0.43% | . |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 3.098000 ms | 3.110 ms | 100.39% | 0.39% | . |
| USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch | 15.271000 ms | 15.324 ms | 100.35% | 0.35% | . |
| USM_Latency_fp32_in_order__ | 33.582000 ms | 33.686 ms | 100.31% | 0.31% | . |
| USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch | 14.133000 ms | 14.174 ms | 100.29% | 0.29% | . |
| USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch | 13.704000 ms | 13.740 ms | 100.26% | 0.26% | . |
| USM_Latency_fp32_out_of_order__ | 46.617000 ms | 46.732 ms | 100.25% | 0.25% | . |
| USM_Allocation_latency_fp32_device | 0.008000 ms | 0.008 ms | 100.00% | 0.00% | . |
| USM_Allocation_latency_fp32_host | 0.002000 ms | 0.002 ms | 100.00% | 0.00% | . |

Relative perf in group SYCL2020 (2): 100.380%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| SYCL2020_Accessors_Latency_fp32_out_of_order__ | 70.324000 ms | 70.770 ms | 100.63% | 0.63% | . |
| SYCL2020_Accessors_Latency_fp32_in_order__ | 68.284000 ms | 68.370 ms | 100.13% | 0.13% | . |

Relative perf in group VectorAddition (3): 102.729%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| VectorAddition_int64 | 0.039000 ms | 0.041 ms | 105.13% | 5.13% | + |
| VectorAddition_fp32 | 0.032000 ms | 0.033 ms | 103.12% | 3.12% | . |
| VectorAddition_int32 | 0.037000 ms | 0.037 ms | 100.00% | 0.00% | . |

Relative perf in group Polybench (13): 99.899%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Polybench_2mm | 1.238000 ms | 1.239 ms | 100.08% | 0.08% | . |
| Polybench_3mm | 1.744000 ms | 1.745 ms | 100.06% | 0.06% | . |
| Polybench_Atax | 6.901000 ms | 6.902 ms | 100.01% | 0.01% | . |
| Polybench_Gramschmidt | 285.038000 ms | 285.054 ms | 100.01% | 0.01% | . |
| Polybench_2DConvolution | 0.229000 ms | 0.229 ms | 100.00% | 0.00% | . |
| Polybench_Gesummv | 7.308000 ms | 7.308 ms | 100.00% | 0.00% | . |
| Polybench_Bicg | 5.134 ms | 5.133000 ms | 99.98% | -0.02% | . |
| Polybench_Syrk | 3.225 ms | 3.222000 ms | 99.91% | -0.09% | . |
| Polybench_Syr2k | 6.393 ms | 6.386000 ms | 99.89% | -0.11% | . |
| Polybench_Mvt | 3.657 ms | 3.650000 ms | 99.81% | -0.19% | . |
| Polybench_Correlation | 95.859 ms | 95.656000 ms | 99.79% | -0.21% | . |
| Polybench_Covariance | 95.656 ms | 94.948000 ms | 99.26% | -0.74% | . |
| Polybench_Gemm | 3.962000 ms | - | | | |

Relative perf in group ReductionAtomic (4): 111.775%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ReductionAtomic_int64 | 0.035000 ms | 0.041 ms | 117.14% | 17.14% | ++ |
| ReductionAtomic_int32 | 0.036000 ms | 0.041 ms | 113.89% | 13.89% | ++ |
| ReductionAtomic_fp32 | 0.035000 ms | 0.039 ms | 111.43% | 11.43% | + |
| ReductionAtomic_fp64 | 0.040000 ms | 0.042 ms | 105.00% | 5.00% | + |

Relative perf in group Kmeans (1): 99.833%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Kmeans_fp32 | 1.798 ms | 1.795000 ms | 99.83% | -0.17% | . |

Relative perf in group LinearRegressionCoeff (1): 91.634%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | 1.530 ms | 1.402000 ms | 91.63% | -8.37% | - |

Relative perf in group LinearRegression (1): 99.726%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegression_fp32 | 0.365 ms | 0.364000 ms | 99.73% | -0.27% | . |

Relative perf in group MatmulChain (1): 99.837%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MatmulChain | 11.064 ms | 11.046000 ms | 99.84% | -0.16% | . |

Relative perf in group MolecularDynamics (1): 100.000%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MolecularDynamics | 0.065000 ms | 0.065 ms | 100.00% | 0.00% | . |

Relative perf in group miscellaneous (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | - | 857.978000 μs | | | |

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),20.741,20.672,3.98%,19.902,238.087,[CPU],[us]
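
The result rows in these details carry eight comma-separated fields although the header names only seven columns: the trailing `Type` column is followed by a bracketed unit. A minimal Python sketch of reading one such row is shown here, assuming that layout holds for every row and that test-case names never contain commas; `parse_result_row` and the dictionary keys are hypothetical names, not part of the benchmark tooling.

```python
import csv
import io

# Row copied from the SubmitKernel (out of order, SYCL) output in this comment.
ROW = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),20.741,20.672,3.98%,19.902,238.087,[CPU],[us]"
)


def parse_result_row(line):
    """Split one result row into named fields (assumes exactly eight fields)."""
    fields = next(csv.reader(io.StringIO(line)))
    name, mean, median, stddev, lowest, highest, kind, unit = fields
    return {
        "test_case": name,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),  # reported as a percentage
        "min": float(lowest),
        "max": float(highest),
        "type": kind.strip("[]"),   # e.g. CPU
        "unit": unit.strip("[]"),   # e.g. us or GB/s
    }


result = parse_result_row(ROW)
print(result["test_case"].split("(")[0], result["mean"], result["unit"])
```

Reading the unit from the row itself matters when comparing against the summary: the StreamMemory row, for example, reports `[GB/s]` rather than a time.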

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.935,23.921,5.07%,22.638,379.553,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.553,14.712,4.86%,11.154,24.421,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.162,14.328,4.72%,11.114,27.283,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),354.302,353.823,1.41%,345.916,437.412,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),83.294,83.169,1.53%,80.815,150.463,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),7.273,7.261,11.04%,6.002,70.649,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.319,3.372,4.94%,0.753,3.623,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),1.440,1.438,16.34%,1.292,72.864,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.454,1.452,16.45%,1.312,73.774,[CPU],[us]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.366547 s
366.167768 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00436064 s
bitcracker - total time for whole calculation: 35.414 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1261 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1259 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1164 1273 31.6047% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1270 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1266 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1114 1249 30.2471% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1257 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1264 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1096 1275 29.7583% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1254 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1252 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1254 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1276 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1253 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1198 1256 32.5278% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1125 1265 30.5458% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1261 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1270 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1036 1263 28.1292% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1181 1274 32.0662% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1272 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1248 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1083 1259 29.4054% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1259 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1263 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1276 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1205 1246 32.7179% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1249 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1211 1262 32.8808% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1261 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1057 1261 28.6994% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1260 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1256 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1041 1253 28.265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1256 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1257 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1262 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1248 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1269 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1259 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1266 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1266 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1271 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1273 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 222.97 ms

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.269280e-01 6.267850e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.629290e-01 7.693460e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.605860e-01 7.796830e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.634980e-01 8.474720e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.593980e-01 8.125190e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.591380e-01 7.866250e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.583160e-01 7.819350e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.289100e-01 8.043680e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.291570e-01 8.072430e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.289190e-01 7.766490e-01 0.000000e+00

Timer name (cumulative values)  Number of calls  Min (microSecs)  Avg (microSecs)  Max (microSecs)  Stddev (microSecs)  Efficiency rating
main                            1                1.137e+07        1.137e+07        1.137e+07        0.000e+00           100.00
cycleInit                       10               3.578e+06        3.578e+06        3.578e+06        0.000e+00           100.00
cycleTracking                   10               7.793e+06        7.793e+06        7.793e+06        0.000e+00           100.00
cycleTracking_Kernel            104              4.950e+06        4.950e+06        4.950e+06        0.000e+00           100.00
cycleTracking_MPI               117              2.120e+05        2.120e+05        2.120e+05        0.000e+00           100.00
cycleTracking_Test_Done         0                0.000e+00        0.000e+00        0.000e+00        0.000e+00           0.00
cycleFinalize                   20               4.180e+02        4.180e+02        4.180e+02        0.000e+00           100.00
Figure Of Merit 115.61 [Num Mega Segments / Cycle Tracking Time]
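
The figure of merit can be cross-checked against the per-cycle table printed earlier in this run. The sketch below assumes that the `num_seg` column counts segments per cycle and that the `cycleTracking` column is in seconds. Both are inferred from the fact that the per-cycle times sum to the 7.793e+06 microSecs reported for the `cycleTracking` timer. The function name `figure_of_merit` is hypothetical.

```python
# Per-cycle rows copied from the Quicksilver output above (17 columns each).
ROWS = """\
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.269280e-01 6.267850e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.629290e-01 7.693460e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.605860e-01 7.796830e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.634980e-01 8.474720e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.593980e-01 8.125190e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.591380e-01 7.866250e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.583160e-01 7.819350e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.289100e-01 8.043680e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.291570e-01 8.072430e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.289190e-01 7.766490e-01 0.000000e+00
"""

NUM_SEG_COL = 12         # zero-based index of the "num_seg" column
CYCLE_TRACKING_COL = 15  # zero-based index of "cycleTracking" (assumed seconds)


def figure_of_merit(rows_text: str) -> float:
    """Mega segments processed per second of cycle-tracking time."""
    total_segments = 0
    tracking_seconds = 0.0
    for line in rows_text.strip().splitlines():
        fields = line.split()
        total_segments += int(fields[NUM_SEG_COL])
        tracking_seconds += float(fields[CYCLE_TRACKING_COL])
    return (total_segments / 1e6) / tracking_seconds


fom = figure_of_merit(ROWS)
print(f"{fom:.2f}")  # agrees with the logged 115.61 to two decimals
```

Using the rounded timer value (7.793 s) instead of the per-cycle sum gives about 115.60, so small last-digit differences against the log are expected.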

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.42447 s
sobelfilter - total time for whole calculation: 0.554651 s

Runtime_BlockedTransform_iter_64_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002824', '0.002591', '0.002512', '0.002512 0.002591 0.003370', '0.000474', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
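
Each sycl-bench result in this log is printed as a Python-style list literal, so it can be read back with `ast.literal_eval`. The column positions used below (mean, median, min, samples, stddev at indices 11 to 15) are an assumption inferred from the numbers themselves, since the log does not print a header row.

```python
import ast
import statistics

# Result row copied verbatim from the log line above.
ROW_TEXT = """
['Runtime_BlockedTransform_iter_64_blocksize_65536', 'PASS',
 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A',
 '8192', '819200', '0.002824', '0.002591', '0.002512',
 '0.002512 0.002591 0.003370', '0.000474', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A',
 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A',
 'N/A']
"""

row = ast.literal_eval(ROW_TEXT.strip())

# Assumed layout: index 14 holds the space-separated run times in seconds.
samples = [float(s) for s in row[14].split()]

mean = statistics.mean(samples)      # compare with row[11]
median = statistics.median(samples)  # compare with row[12]
minimum = min(samples)               # compare with row[13]
stdev = statistics.stdev(samples)    # sample stddev, compare with row[15]

print(row[0], row[1])
print(f"mean={mean:.6f} median={median:.6f} min={minimum:.6f} stdev={stdev:.6f}")
```

With only three runs per configuration (`--num-runs=3`), a single slow run moves the mean and the standard deviation noticeably, so the median is probably the more stable number to compare across configurations.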

Runtime_BlockedTransform_iter_256_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002673', '0.002578', '0.002186', '0.002186 0.002578 0.003254', '0.000540', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000279', '0.000265', '0.000255', '0.000255 0.000265 0.000316', '0.000033', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002771', '0.002686', '0.002300', '0.002300 0.002686 0.003328', '0.000519', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002655', '0.002573', '0.002562', '0.002562 0.002573 0.002831', '0.000152', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000316', '0.000300', '0.000237', '0.000237 0.000300 0.000412', '0.000088', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002705', '0.002519', '0.002329', '0.002329 0.002519 0.003268', '0.000496', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000266', '0.000272', '0.000245', '0.000245 0.000272 0.000282', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002445', '0.002421', '0.002410', '0.002410 0.002421 0.002506', '0.000052', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002923', '0.002551', '0.002523', '0.002523 0.002551 0.003697', '0.000670', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002683', '0.002580', '0.002427', '0.002427 0.002580 0.003041', '0.000320', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000257', '0.000282', '0.000181', '0.000181 0.000282 0.000306', '0.000067', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000207', '0.000156', '0.000122', '0.000122 0.000156 0.000345', '0.000120', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000268', '0.000280', '0.000244', '0.000244 0.000280 0.000281', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002712', '0.002748', '0.002529', '0.002529 0.002748 0.002858', '0.000167', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002574', '0.002543', '0.002440', '0.002440 0.002543 0.002739', '0.000152', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002597', '0.002364', '0.002245', '0.002245 0.002364 0.003182', '0.000510', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000244', '0.000233', '0.000231', '0.000231 0.000233 0.000268', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002499', '0.002421', '0.002141', '0.002141 0.002421 0.002934', '0.000402', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002730', '0.002559', '0.002296', '0.002296 0.002559 0.003335', '0.000540', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000095', '0.000084', '0.000077', '0.000077 0.000084 0.000123', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000263', '0.000275', '0.000237', '0.000237 0.000275 0.000278', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002376', '0.002288', '0.002252', '0.002252 0.002288 0.002587', '0.000184', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.002352', '0.000341', '0.000186', '0.000186 0.000341 0.006528', '0.003618', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000458', '0.000456', '0.000454', '0.000454 0.000456 0.000463', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000257', '0.000254', '0.000254', '0.000254 0.000254 0.000262', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_16384

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_16384', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002506', '0.002241', '0.002218', '0.002218 0.002241 0.003058', '0.000479', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_32768

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_32768', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002386', '0.002450', '0.002212', '0.002212 0.002450 0.002496', '0.000153', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002465', '0.002469', '0.002358', '0.002358 0.002469 0.002569', '0.000105', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000269', '0.000254', '0.000251', '0.000251 0.000254 0.000304', '0.000030', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000282', '0.000277', '0.000265', '0.000265 0.000277 0.000306', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000784', '0.000351', '0.000299', '0.000299 0.000351 0.001703', '0.000796', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002654', '0.002367', '0.002305', '0.002305 0.002367 0.003289', '0.000551', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000302', '0.000304', '0.000297', '0.000297 0.000304 0.000306', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000256', '0.000242', '0.000234', '0.000234 0.000242 0.000292', '0.000032', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002720', '0.002469', '0.002258', '0.002258 0.002469 0.003434', '0.000627', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_524288

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_524288', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002515', '0.002492', '0.002465', '0.002465 0.002492 0.002589', '0.000065', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000237', '0.000236', '0.000231', '0.000231 0.000236 0.000245', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002559', '0.002513', '0.002472', '0.002472 0.002513 0.002691', '0.000116', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_262144

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_262144', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002657', '0.002519', '0.002191', '0.002191 0.002519 0.003261', '0.000548', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_256

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_256', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000116', '0.000079', '0.000076', '0.000076 0.000079 0.000192', '0.000066', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_65536

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_65536', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002577', '0.002608', '0.002157', '0.002157 0.002608 0.002967', '0.000406', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000320', '0.000317', '0.000296', '0.000296 0.000317 0.000347', '0.000026', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_131072

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_131072', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '8192', '819200', '0.002449', '0.002424', '0.002306', '0.002306 0.002424 0.002617', '0.000157', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.008194', '0.007869', '0.007301', '0.007301 0.007869 0.009411', '0.001092', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005326', '0.005226', '0.005196', '0.005196 0.005226 0.005557', '0.000201', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.006890', '0.006893', '0.006498', '0.006498 0.006893 0.007280', '0.000391', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005799', '0.005869', '0.005554', '0.005554 0.005869 0.005973', '0.000218', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.794704', '0.794503', '0.794299', '0.794299 0.794503 0.795310', '0.000534', '33.992225', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '27.000000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '1.610564', '1.589121', '1.579281', '1.579281 1.589121 1.663289', '0.045926', '17.096392', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '27.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000208', '0.000200', '0.000197', '0.000197 0.000200 0.000228', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000242', '0.000227', '0.000210', '0.000210 0.000227 0.000289', '0.000042', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_L2_fp32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000029', '0.000025', '0.000024', '0.000024 0.000025 0.000039', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000034', '0.000027', '0.000026', '0.000026 0.000027 0.000049', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000031', '0.000026', '0.000025', '0.000025 0.000026 0.000041', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000050', '0.000033', '0.000027', '0.000027 0.000033 0.000091', '0.000036', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000034', '0.000026', '0.000025', '0.000025 0.000026 0.000049', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000032', '0.000026', '0.000024', '0.000024 0.000026 0.000047', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000026', '0.000024', '0.000024 0.000026 0.000042', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000025', '0.000025', '0.000025 0.000025 0.000048', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000025', '0.000024', '0.000024 0.000025 0.000041', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000031', '0.000026', '0.000025', '0.000025 0.000026 0.000042', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000025', '0.000022', '0.000022 0.000025 0.000052', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000058', '0.000052', '0.000047', '0.000047 0.000052 0.000074', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000090', '0.000074', '0.000056', '0.000056 0.000074 0.000139', '0.000044', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000075', '0.000050', '0.000046', '0.000046 0.000050 0.000130', '0.000047', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000112', '0.000052', '0.000050', '0.000050 0.000052 0.000236', '0.000107', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000056', '0.000050', '0.000046', '0.000046 0.000050 0.000072', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000081', '0.000062', '0.000060', '0.000060 0.000062 0.000121', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000107', '0.000098', '0.000041', '0.000041 0.000098 0.000183', '0.000071', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000167', '0.000126', '0.000095', '0.000095 0.000126 0.000280', '0.000099', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000071', '0.000040', '0.000037', '0.000037 0.000040 0.000137', '0.000057', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000091', '0.000060', '0.000058', '0.000058 0.000060 0.000156', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000079', '0.000063', '0.000060', '0.000060 0.000063 0.000114', '0.000030', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000072', '0.000043', '0.000031', '0.000031 0.000043 0.000142', '0.000061', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000030', '0.000029', '0.000029 0.000030 0.000038', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000024', '0.000016', '0.000013', '0.000013 0.000016 0.000042', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000034', '0.000030', '0.000026', '0.000026 0.000030 0.000044', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000035', '0.000029', '0.000028', '0.000028 0.000029 0.000048', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000028', '0.000027', '0.000027 0.000028 0.000034', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000035', '0.000026', '0.000023', '0.000023 0.000026 0.000056', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000016', '0.000014', '0.000013', '0.000013 0.000014 0.000021', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.047065', '0.046617', '0.043214', '0.043214 0.046617 0.051365', '0.004094', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.068717', '0.068284', '0.068121', '0.068121 0.068284 0.069747', '0.000895', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.073694', '0.070324', '0.068810', '0.068810 0.070324 0.081949', '0.007189', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.032337', '0.033582', '0.029582', '0.029582 0.033582 0.033848', '0.002390', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000043', '0.000008', '0.000002', '0.000002 0.000008 0.000118', '0.000065', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000002', '0.000002', '0.000001', '0.000001 0.000002 0.000003', '0.000001', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000153', '0.000116', '0.000104', '0.000104 0.000116 0.000239', '0.000075', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.003145', '0.003098', '0.003092', '0.003092 0.003098 0.003246', '0.000088', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.015271', '0.015271', '0.015271', '0.015271', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.014133', '0.014133', '0.014133', '0.014133', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.004844', '0.003249', '0.003231', '0.003231 0.003249 0.008053', '0.002779', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001796', '0.001775', '0.001748', '0.001748 0.001775 0.001864', '0.000061', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.013704', '0.013704', '0.013704', '0.013704', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001870', '0.001868', '0.001866', '0.001866 0.001868 0.001875', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.015392', '0.015392', '0.015392', '0.015392', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000091', '0.000012', '0.000008', '0.000008 0.000012 0.000254', '0.000141', '1.497722', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000025', '0.000018', '0.000017', '0.000017 0.000018 0.000040', '0.000013', '0.667255', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000031', '0.000009', '0.000007', '0.000007 0.000009 0.000079', '0.000041', '1.713957', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000519', '0.000408', '0.000184', '0.000184 0.000408 0.000965', '0.000402', '0.062321', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000038', '0.000032', '0.000029', '0.000029 0.000032 0.000053', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000046', '0.000039', '0.000036', '0.000036 0.000039 0.000064', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000060', '0.000037', '0.000031', '0.000031 0.000037 0.000114', '0.000046', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2DConvolution

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2DConvolution --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2DConvolution.csv

Output:

['Polybench_2DConvolution', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000235', '0.000229', '0.000213', '0.000213 0.000229 0.000262', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001261', '0.001238', '0.001235', '0.001235 0.001238 0.001311', '0.000043', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001745', '0.001744', '0.001734', '0.001734 0.001744 0.001756', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_Arith_fp32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_fp32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000036', '0.000032', '0.000028', '0.000028 0.000032 0.000049', '0.000011', '1122.888969', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

MicroBench_Arith_int32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_int32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000147', '0.000072', '0.000058', '0.000058 0.000072 0.000311', '0.000142', '534.452977', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006907', '0.006901', '0.006871', '0.006871 0.006901 0.006950', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000061', '0.000036', '0.000031', '0.000031 0.000036 0.000117', '0.000048', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000210', '0.000035', '0.000027', '0.000027 0.000035 0.000567', '0.000309', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000194', '0.000040', '0.000035', '0.000035 0.000040 0.000506', '0.000271', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000041', '0.000035', '0.000034', '0.000034 0.000035 0.000053', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Bicg

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/bicg --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Bicg.csv --size=20480

Output:

['Polybench_Bicg', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.005134', '0.005134', '0.005134', '0.005134', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Correlation

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/correlation --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Correlation.csv --size=2048

Output:

['Polybench_Correlation', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.095859', '0.095859', '0.095859', '0.095859', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Covariance

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/covariance --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Covariance.csv --size=2048

Output:

['Polybench_Covariance', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.095656', '0.095656', '0.095656', '0.095656', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gemm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gemm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gemm.csv --size=8192

Output:

['Polybench_Gemm', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.003962', '0.003962', '0.003962', '0.003962', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gesummv

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gesummv --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gesummv.csv --size=8192

Output:

['Polybench_Gesummv', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.007308', '0.007308', '0.007308', '0.007308', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gramschmidt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gramschmidt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gramschmidt.csv --size=512

Output:

['Polybench_Gramschmidt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.285038', '0.285038', '0.285038', '0.285038', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '67108864', '0.001812', '0.001798', '0.001796', '0.001796 0.001798 0.001842', '0.000026', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001723', '0.001530', '0.001245', '0.001245 0.001530 0.002395', '0.000599', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegression_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_error --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegression.csv --size=640000

Output:

['LinearRegression_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000394', '0.000365', '0.000344', '0.000344 0.000365 0.000474', '0.000070', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MatmulChain

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/matmulchain --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MatmulChain.csv --size=2048

Output:

['MatmulChain', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.011064', '0.011064', '0.011064', '0.011064', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000079', '0.000065', '0.000054', '0.000054 0.000065 0.000117', '0.000034', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Mvt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mvt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Mvt.csv --size=32767

Output:

['Polybench_Mvt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.003657', '0.003657', '0.003657', '0.003657', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syr2k

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syr2k --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syr2k.csv --size=6144

Output:

['Polybench_Syr2k', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.006393', '0.006393', '0.006393', '0.006393', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syrk

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syrk --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syrk.csv --size=4096

Output:

['Polybench_Syrk', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024', '0.003225', '0.003225', '0.003225', '0.003225', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

igchor added a commit to igchor/llvm that referenced this pull request Oct 3, 2024
@igchor igchor mentioned this pull request Oct 3, 2024
addPool took the unique pool handle by reference and
later passed it as an rvalue ref to unordered_map::try_emplace.

Make addPool take an rvalue ref to make it clear that
it passes ownership to the pool manager.
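A minimal sketch of the ownership pattern this commit message describes, not the actual Unified Runtime code. `Pool`, `UniquePool`, `PoolManager` and the string key are hypothetical stand-ins. Only the use of an rvalue reference parameter forwarded to `unordered_map::try_emplace` comes from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; not the Unified Runtime types.
struct Pool {
    int id;
};
using UniquePool = std::unique_ptr<Pool>;

class PoolManager {
  public:
    // An rvalue-reference parameter forces callers to write std::move(...),
    // so the ownership transfer is visible at the call site.
    bool addPool(const std::string &key, UniquePool &&pool) {
        assert(pool != nullptr && "caller must pass a valid pool");
        // try_emplace moves from `pool` only if the key was not present.
        return pools.try_emplace(key, std::move(pool)).second;
    }

    std::size_t size() const { return pools.size(); }

  private:
    std::unordered_map<std::string, UniquePool> pools;
};
```

With this signature, `manager.addPool("device", std::move(p))` leaves `p` empty on success, while a call with an already-present key leaves the caller's handle untouched, because `try_emplace` does not move from its arguments when no insertion happens.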
igchor added a commit to igchor/llvm that referenced this pull request Oct 3, 2024
igchor added a commit to igchor/llvm that referenced this pull request Oct 3, 2024
to include compilation fix for older compilers and
fixes for L0 provider
Calling loader APIs is incorrect: handles would have
to be translated to and from loader handles.

Also, using loader APIs without explicitly linking with
the loader results in a linking failure on Windows.

Fix this by using function pointers.
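The idea of calling through function pointers instead of linking against exported symbols can be sketched as follows. This is an illustrative assumption, not the code from this PR: `DriverHandle`, `AllocFn`, `DriverDispatch`, `allocThroughDispatch` and `fakeDriverAlloc` are all hypothetical names, and the error codes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical handle type and entry-point signature, for illustration only.
using DriverHandle = void *;
using AllocFn = int (*)(DriverHandle, std::size_t, void **);

// Table of entry points filled in at runtime, so the caller never
// references a loader symbol at link time.
struct DriverDispatch {
    AllocFn alloc = nullptr;
};

// Calls the driver entry point through the table; returns -1 if the
// pointer was never populated.
int allocThroughDispatch(const DriverDispatch &dispatch, DriverHandle handle,
                         std::size_t size, void **out) {
    assert(out != nullptr);
    if (dispatch.alloc == nullptr) {
        return -1;
    }
    return dispatch.alloc(handle, size, out);
}

// Stand-in for a real driver function, used only to exercise the table.
static char fakeStorage[64];
int fakeDriverAlloc(DriverHandle, std::size_t size, void **out) {
    if (size > sizeof(fakeStorage)) {
        return -2;
    }
    *out = fakeStorage;
    return 0;
}
```

Because the call goes through a pointer obtained at runtime, the handle is passed to the function as-is and no import of the function's symbol is needed at link time.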
igchor added a commit to igchor/llvm that referenced this pull request Oct 3, 2024
@pbalcer pbalcer merged commit 7e9d9d4 into oneapi-src:main Oct 4, 2024
75 checks passed
Labels
ci/cd Continuous integration/devliery common Changes or additions to common utilities conformance Conformance test suite issues. level-zero L0 adapter specific issues
2 participants