[Feature] Cudagraphs #986

Merged: 24 commits, Sep 16, 2024

@vmoens (Contributor) commented Sep 11, 2024

Adds a CudaGraphModule class that provides a user-friendly interface to CUDA graphs for PyTorch callables.

The class enables fast execution of GPU operations with near-zero CPU overhead, while running essential checks on the wrapped function. Documentation and example usage are included.
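For readers unfamiliar with the underlying mechanism, here is a minimal sketch of the capture-and-replay pattern that CUDA graphs rely on, written against the stock `torch.cuda.CUDAGraph` API. It is not the implementation from this PR: the helper name `graph_callable` and its `warmup` parameter are hypothetical, it handles a single tensor input only, and it falls back to eager execution when no GPU is available.

```python
import torch


def graph_callable(fn, example_input, warmup=3):
    """Hypothetical helper: capture `fn` in a CUDA graph and return a replay function.

    Returns `fn` unchanged when CUDA is unavailable, so the sketch still runs on CPU.
    """
    if not torch.cuda.is_available():
        return fn

    # Graph replay reads from fixed memory addresses, so keep a static input buffer.
    static_in = example_input.clone().cuda()

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(warmup):
            fn(static_in)
    torch.cuda.current_stream().wait_stream(side)

    # Record the kernels launched by `fn` into a graph.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = fn(static_in)

    def replay(x):
        # Copy new data into the captured buffer, then relaunch the whole graph.
        static_in.copy_(x)
        graph.replay()
        # Clone so the caller's result is not overwritten by the next replay.
        return static_out.clone()

    return replay


x = torch.ones(4)
double_plus_one = graph_callable(lambda t: t * 2 + 1, x)
result = double_plus_one(x)
```

The static buffers are what make replay cheap: one graph launch replaces many individual kernel launches. The price is that input shapes and the sequence of operations must stay fixed across calls, which is presumably the kind of condition the checks mentioned above are there to guard.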

cc @mikaylagawarecki @albanD @eellison @BoyuanFeng @Chillee

@facebook-github-bot added the CLA Signed label Sep 11, 2024

github-actions bot commented Sep 11, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}10$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 44.9230μs | 20.2935μs | 49.2769 KOps/s | 48.1950 KOps/s | $\color{#35bf28}+2.24\%$ |
| test_plain_set_stack_nested | 40.4560μs | 20.2741μs | 49.3240 KOps/s | 46.7545 KOps/s | $\textbf{\color{#35bf28}+5.50\%}$ |
| test_plain_set_nested_inplace | 64.9810μs | 22.1341μs | 45.1791 KOps/s | 44.2036 KOps/s | $\color{#35bf28}+2.21\%$ |
| test_plain_set_stack_nested_inplace | 69.3900μs | 22.3665μs | 44.7098 KOps/s | 44.7502 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_items | 18.0340μs | 4.1504μs | 240.9414 KOps/s | 236.3517 KOps/s | $\color{#35bf28}+1.94\%$ |
| test_items_nested | 0.5622ms | 0.3298ms | 3.0324 KOps/s | 3.0677 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_items_nested_locked | 0.6390ms | 0.3299ms | 3.0314 KOps/s | 3.0491 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_items_nested_leaf | 0.1573ms | 84.9869μs | 11.7665 KOps/s | 11.6020 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_items_stack_nested | 0.6346ms | 0.3315ms | 3.0166 KOps/s | 3.0437 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_items_stack_nested_leaf | 0.1584ms | 86.6158μs | 11.5452 KOps/s | 11.5263 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_items_stack_nested_locked | 0.6486ms | 0.3379ms | 2.9593 KOps/s | 3.0109 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_keys | 23.0130μs | 3.6910μs | 270.9295 KOps/s | 277.0112 KOps/s | $\color{#d91a1a}-2.20\%$ |
| test_keys_nested | 0.2683ms | 97.6110μs | 10.2448 KOps/s | 10.5283 KOps/s | $\color{#d91a1a}-2.69\%$ |
| test_keys_nested_locked | 0.7958ms | 0.1025ms | 9.7570 KOps/s | 10.1924 KOps/s | $\color{#d91a1a}-4.27\%$ |
| test_keys_nested_leaf | 0.1583ms | 82.6219μs | 12.1033 KOps/s | 12.7868 KOps/s | $\textbf{\color{#d91a1a}-5.35\%}$ |
| test_keys_stack_nested | 0.2019ms | 97.6924μs | 10.2362 KOps/s | 10.4511 KOps/s | $\color{#d91a1a}-2.06\%$ |
| test_keys_stack_nested_leaf | 0.1796ms | 82.4706μs | 12.1255 KOps/s | 12.5887 KOps/s | $\color{#d91a1a}-3.68\%$ |
| test_keys_stack_nested_locked | 0.3090ms | 0.1025ms | 9.7589 KOps/s | 10.0299 KOps/s | $\color{#d91a1a}-2.70\%$ |
| test_values | 10.8282μs | 1.0767μs | 928.7664 KOps/s | 1.0048 MOps/s | $\textbf{\color{#d91a1a}-7.56\%}$ |
| test_values_nested | 96.7000μs | 47.7035μs | 20.9628 KOps/s | 21.1891 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_values_nested_locked | 0.1220ms | 47.3255μs | 21.1303 KOps/s | 21.0452 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_values_nested_leaf | 97.7920μs | 42.0639μs | 23.7733 KOps/s | 23.7393 KOps/s | $\color{#35bf28}+0.14\%$ |
| test_values_stack_nested | 98.8570μs | 47.6410μs | 20.9903 KOps/s | 20.8766 KOps/s | $\color{#35bf28}+0.54\%$ |
| test_values_stack_nested_leaf | 0.1074ms | 42.4419μs | 23.5616 KOps/s | 23.9206 KOps/s | $\color{#d91a1a}-1.50\%$ |
| test_values_stack_nested_locked | 95.8890μs | 48.1760μs | 20.7572 KOps/s | 21.0088 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_membership | 17.4620μs | 0.9270μs | 1.0788 MOps/s | 1.1799 MOps/s | $\textbf{\color{#d91a1a}-8.57\%}$ |
| test_membership_nested | 49.2120μs | 2.5806μs | 387.4999 KOps/s | 379.7985 KOps/s | $\color{#35bf28}+2.03\%$ |
| test_membership_nested_leaf | 20.3480μs | 2.5677μs | 389.4509 KOps/s | 365.7579 KOps/s | $\textbf{\color{#35bf28}+6.48\%}$ |
| test_membership_stacked_nested | 50.0130μs | 2.5734μs | 388.5879 KOps/s | 386.2551 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_membership_stacked_nested_leaf | 17.6330μs | 2.5907μs | 385.9894 KOps/s | 381.2843 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_membership_nested_last | 50.9050μs | 3.8850μs | 257.4010 KOps/s | 263.9171 KOps/s | $\color{#d91a1a}-2.47\%$ |
| test_membership_nested_leaf_last | 40.5360μs | 3.8082μs | 262.5890 KOps/s | 261.4246 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_membership_stacked_nested_last | 44.7830μs | 3.8342μs | 260.8085 KOps/s | 187.8979 KOps/s | $\textbf{\color{#35bf28}+38.80\%}$ |
| test_membership_stacked_nested_leaf_last | 29.1440μs | 3.8692μs | 258.4527 KOps/s | 186.9683 KOps/s | $\textbf{\color{#35bf28}+38.23\%}$ |
| test_nested_getleaf | 52.3370μs | 11.2425μs | 88.9478 KOps/s | 93.7320 KOps/s | $\textbf{\color{#d91a1a}-5.10\%}$ |
| test_nested_get | 65.3420μs | 10.0614μs | 99.3897 KOps/s | 97.6181 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_stacked_getleaf | 36.0280μs | 10.6302μs | 94.0716 KOps/s | 92.5325 KOps/s | $\color{#35bf28}+1.66\%$ |
| test_stacked_get | 54.0810μs | 10.0832μs | 99.1753 KOps/s | 98.6730 KOps/s | $\color{#35bf28}+0.51\%$ |
| test_nested_getitemleaf | 59.5710μs | 10.9185μs | 91.5873 KOps/s | 90.1808 KOps/s | $\color{#35bf28}+1.56\%$ |
| test_nested_getitem | 34.6640μs | 10.1537μs | 98.4863 KOps/s | 96.3202 KOps/s | $\color{#35bf28}+2.25\%$ |
| test_stacked_getitemleaf | 55.7540μs | 10.9515μs | 91.3116 KOps/s | 91.5176 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_stacked_getitem | 31.0270μs | 10.1550μs | 98.4733 KOps/s | 98.7140 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_lock_nested | 1.1537ms | 0.4853ms | 2.0608 KOps/s | 2.0636 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_lock_stack_nested | 0.8272ms | 0.4533ms | 2.2062 KOps/s | 2.2181 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_unlock_nested | 95.0731ms | 0.4981ms | 2.0076 KOps/s | 2.4515 KOps/s | $\textbf{\color{#d91a1a}-18.11\%}$ |
| test_unlock_stack_nested | 0.7198ms | 0.3687ms | 2.7121 KOps/s | 2.6688 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_flatten_speed | 0.6353ms | 0.1096ms | 9.1200 KOps/s | 9.5276 KOps/s | $\color{#d91a1a}-4.28\%$ |
| test_unflatten_speed | 0.6418ms | 0.4692ms | 2.1312 KOps/s | 2.1521 KOps/s | $\color{#d91a1a}-0.97\%$ |
| test_common_ops | 5.2125ms | 1.1294ms | 885.4351 Ops/s | 886.7956 Ops/s | $\color{#d91a1a}-0.15\%$ |
| test_creation | 29.6960μs | 2.0851μs | 479.5972 KOps/s | 478.8909 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_creation_empty | 79.5810μs | 17.4802μs | 57.2077 KOps/s | 54.3249 KOps/s | $\textbf{\color{#35bf28}+5.31\%}$ |
| test_creation_nested_1 | 46.8570μs | 20.2265μs | 49.4402 KOps/s | 46.6593 KOps/s | $\textbf{\color{#35bf28}+5.96\%}$ |
| test_creation_nested_2 | 94.4130μs | 24.3210μs | 41.1167 KOps/s | 37.7734 KOps/s | $\textbf{\color{#35bf28}+8.85\%}$ |
| test_clone | 0.1906ms | 17.3746μs | 57.5553 KOps/s | 55.1652 KOps/s | $\color{#35bf28}+4.33\%$ |
| test_getitem[int] | 0.7693ms | 16.4333μs | 60.8519 KOps/s | 58.6557 KOps/s | $\color{#35bf28}+3.74\%$ |
| test_getitem[slice_int] | 0.1361ms | 30.0574μs | 33.2697 KOps/s | 31.8464 KOps/s | $\color{#35bf28}+4.47\%$ |
| test_getitem[range] | 0.1920ms | 58.0061μs | 17.2396 KOps/s | 17.2734 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_getitem[tuple] | 0.1291ms | 25.0450μs | 39.9281 KOps/s | 38.9486 KOps/s | $\color{#35bf28}+2.51\%$ |
| test_getitem[list] | 0.3352ms | 53.3772μs | 18.7346 KOps/s | 18.8283 KOps/s | $\color{#d91a1a}-0.50\%$ |
| test_setitem_dim[int] | 54.5210μs | 33.3627μs | 29.9736 KOps/s | 29.0004 KOps/s | $\color{#35bf28}+3.36\%$ |
| test_setitem_dim[slice_int] | 0.1061ms | 61.2532μs | 16.3257 KOps/s | 15.8716 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_setitem_dim[range] | 0.1560ms | 84.1818μs | 11.8790 KOps/s | 11.4403 KOps/s | $\color{#35bf28}+3.83\%$ |
| test_setitem_dim[tuple] | 0.1299ms | 51.2740μs | 19.5031 KOps/s | 19.6915 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_setitem | 0.1974ms | 30.3870μs | 32.9088 KOps/s | 32.2092 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_set | 0.1545ms | 29.9652μs | 33.3721 KOps/s | 32.7659 KOps/s | $\color{#35bf28}+1.85\%$ |
| test_set_shared | 1.3080ms | 0.2217ms | 4.5113 KOps/s | 4.7228 KOps/s | $\color{#d91a1a}-4.48\%$ |
| test_update | 0.1823ms | 36.6645μs | 27.2743 KOps/s | 26.7648 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_update_nested | 0.1850ms | 47.1423μs | 21.2124 KOps/s | 20.5825 KOps/s | $\color{#35bf28}+3.06\%$ |
| test_update__nested | 0.1902ms | 35.7440μs | 27.9767 KOps/s | 27.1634 KOps/s | $\color{#35bf28}+2.99\%$ |
| test_set_nested | 0.1955ms | 32.3786μs | 30.8846 KOps/s | 29.9505 KOps/s | $\color{#35bf28}+3.12\%$ |
| test_set_nested_new | 0.1753ms | 37.7761μs | 26.4718 KOps/s | 25.7835 KOps/s | $\color{#35bf28}+2.67\%$ |
| test_select | 0.2235ms | 55.0188μs | 18.1756 KOps/s | 17.4702 KOps/s | $\color{#35bf28}+4.04\%$ |
| test_select_nested | 0.1162ms | 59.4482μs | 16.8214 KOps/s | 15.5991 KOps/s | $\textbf{\color{#35bf28}+7.84\%}$ |
| test_exclude_nested | 0.1409ms | 75.4135μs | 13.2602 KOps/s | 12.5679 KOps/s | $\textbf{\color{#35bf28}+5.51\%}$ |
| test_empty[True] | 0.5865ms | 0.3153ms | 3.1714 KOps/s | 3.1584 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_empty[False] | 13.1043μs | 1.1972μs | 835.3161 KOps/s | 815.3780 KOps/s | $\color{#35bf28}+2.45\%$ |
| test_unbind_speed | 0.5586ms | 0.2950ms | 3.3895 KOps/s | 3.2168 KOps/s | $\textbf{\color{#35bf28}+5.37\%}$ |
| test_unbind_speed_stack0 | 0.5148ms | 0.2951ms | 3.3885 KOps/s | 3.3218 KOps/s | $\color{#35bf28}+2.01\%$ |
| test_unbind_speed_stack1 | 0.1002s | 0.8245ms | 1.2129 KOps/s | 1.3490 KOps/s | $\textbf{\color{#d91a1a}-10.09\%}$ |
| test_split | 2.2368ms | 1.9894ms | 502.6717 Ops/s | 442.3783 Ops/s | $\textbf{\color{#35bf28}+13.63\%}$ |
| test_chunk | 0.1004s | 2.3722ms | 421.5580 Ops/s | 437.1694 Ops/s | $\color{#d91a1a}-3.57\%$ |
| test_creation[device0] | 0.2404ms | 0.1180ms | 8.4764 KOps/s | 8.3648 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_creation_from_tensor | 3.5688ms | 0.1190ms | 8.4058 KOps/s | 8.6127 KOps/s | $\color{#d91a1a}-2.40\%$ |
| test_add_one[memmap_tensor0] | 0.1924ms | 7.5459μs | 132.5227 KOps/s | 131.8736 KOps/s | $\color{#35bf28}+0.49\%$ |
| test_contiguous[memmap_tensor0] | 28.4940μs | 1.8931μs | 528.2217 KOps/s | 519.2956 KOps/s | $\color{#35bf28}+1.72\%$ |
| test_stack[memmap_tensor0] | 51.8160μs | 5.5005μs | 181.8027 KOps/s | 173.6110 KOps/s | $\color{#35bf28}+4.72\%$ |
| test_memmaptd_index | 1.0791ms | 0.3992ms | 2.5048 KOps/s | 2.4579 KOps/s | $\color{#35bf28}+1.91\%$ |
| test_memmaptd_index_astensor | 0.9725ms | 0.4803ms | 2.0818 KOps/s | 2.0242 KOps/s | $\color{#35bf28}+2.85\%$ |
| test_memmaptd_index_op | 1.5788ms | 1.0090ms | 991.0662 Ops/s | 961.3237 Ops/s | $\color{#35bf28}+3.09\%$ |
| test_serialize_model | 0.1286s | 0.1187s | 8.4239 Ops/s | 8.2945 Ops/s | $\color{#35bf28}+1.56\%$ |
| test_serialize_model_pickle | 0.4501s | 0.3885s | 2.5743 Ops/s | 2.5043 Ops/s | $\color{#35bf28}+2.80\%$ |
| test_serialize_weights | 0.1253s | 0.1166s | 8.5768 Ops/s | 8.4700 Ops/s | $\color{#35bf28}+1.26\%$ |
| test_serialize_weights_returnearly | 0.2714s | 0.1739s | 5.7518 Ops/s | 6.3371 Ops/s | $\textbf{\color{#d91a1a}-9.24\%}$ |
| test_serialize_weights_pickle | 0.4633s | 0.4071s | 2.4563 Ops/s | 1.0700 Ops/s | $\textbf{\color{#35bf28}+129.57\%}$ |
| test_serialize_weights_filesystem | 0.1454s | 0.1393s | 7.1805 Ops/s | 6.9728 Ops/s | $\color{#35bf28}+2.98\%$ |
| test_serialize_model_filesystem | 0.1592s | 0.1494s | 6.6917 Ops/s | 6.1864 Ops/s | $\textbf{\color{#35bf28}+8.17\%}$ |
| test_reshape_pytree | 87.9230μs | 38.1607μs | 26.2050 KOps/s | 26.0186 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_reshape_td | 96.0180μs | 45.4522μs | 22.0011 KOps/s | 20.0203 KOps/s | $\textbf{\color{#35bf28}+9.89\%}$ |
| test_view_pytree | 0.1150ms | 37.7418μs | 26.4958 KOps/s | 25.9513 KOps/s | $\color{#35bf28}+2.10\%$ |
| test_view_td | 0.1323ms | 51.1945μs | 19.5333 KOps/s | 18.1305 KOps/s | $\textbf{\color{#35bf28}+7.74\%}$ |
| test_unbind_pytree | 0.1091ms | 35.3821μs | 28.2629 KOps/s | 28.0645 KOps/s | $\color{#35bf28}+0.71\%$ |
| test_unbind_td | 0.3179ms | 44.0500μs | 22.7015 KOps/s | 22.1488 KOps/s | $\color{#35bf28}+2.50\%$ |
| test_split_pytree | 0.1150ms | 37.8935μs | 26.3898 KOps/s | 26.7752 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_split_td | 0.4719ms | 56.7056μs | 17.6350 KOps/s | 16.9400 KOps/s | $\color{#35bf28}+4.10\%$ |
| test_add_pytree | 0.1046ms | 44.2861μs | 22.5805 KOps/s | 22.6278 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_add_td | 0.1751ms | 80.8281μs | 12.3719 KOps/s | 11.5184 KOps/s | $\textbf{\color{#35bf28}+7.41\%}$ |
| test_compile_add_one_nested[tensordict-compile] | 0.1230ms | 55.9847μs | 17.8620 KOps/s | 17.5972 KOps/s | $\color{#35bf28}+1.51\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.4491ms | 0.1913ms | 5.2285 KOps/s | 5.1628 KOps/s | $\color{#35bf28}+1.27\%$ |
| test_compile_add_one_nested[pytree-compile] | 0.1092ms | 55.6432μs | 17.9717 KOps/s | 17.5278 KOps/s | $\color{#35bf28}+2.53\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.3122ms | 0.1432ms | 6.9844 KOps/s | 7.1369 KOps/s | $\color{#d91a1a}-2.14\%$ |
| test_compile_copy_nested[tensordict-compile] | 0.1330ms | 20.5550μs | 48.6501 KOps/s | 47.5267 KOps/s | $\color{#35bf28}+2.36\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1475ms | 67.8650μs | 14.7351 KOps/s | 13.9720 KOps/s | $\textbf{\color{#35bf28}+5.46\%}$ |
| test_compile_copy_nested[pytree-compile] | 0.1517ms | 77.0162μs | 12.9843 KOps/s | 13.3580 KOps/s | $\color{#d91a1a}-2.80\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1437ms | 69.6883μs | 14.3496 KOps/s | 14.8160 KOps/s | $\color{#d91a1a}-3.15\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.3579ms | 0.1709ms | 5.8531 KOps/s | 5.7329 KOps/s | $\color{#35bf28}+2.10\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.3771ms | 0.1887ms | 5.3006 KOps/s | 4.9766 KOps/s | $\textbf{\color{#35bf28}+6.51\%}$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.1056ms | 45.5960μs | 21.9318 KOps/s | 20.9762 KOps/s | $\color{#35bf28}+4.56\%$ |
| test_compile_add_one_flat[tensorclass-eager] | 0.6287ms | 70.1591μs | 14.2533 KOps/s | 14.0212 KOps/s | $\color{#35bf28}+1.66\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.5138ms | 0.1782ms | 5.6127 KOps/s | 5.7211 KOps/s | $\color{#d91a1a}-1.90\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.5411ms | 0.2925ms | 3.4194 KOps/s | 3.4210 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_compile_add_self_flat[tensordict-eager] | 0.3260ms | 0.2009ms | 4.9766 KOps/s | 4.7272 KOps/s | $\textbf{\color{#35bf28}+5.28\%}$ |
| test_compile_add_self_flat[tensordict-compile] | 0.3645ms | 0.1729ms | 5.7848 KOps/s | 5.7723 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.2768ms | 63.3542μs | 15.7843 KOps/s | 15.3514 KOps/s | $\color{#35bf28}+2.82\%$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.1187ms | 46.9298μs | 21.3084 KOps/s | 20.8859 KOps/s | $\color{#35bf28}+2.02\%$ |
| test_compile_add_self_flat[pytree-eager] | 0.4222ms | 0.2295ms | 4.3568 KOps/s | 4.1970 KOps/s | $\color{#35bf28}+3.81\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.2851ms | 0.1744ms | 5.7324 KOps/s | 5.6980 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_compile_copy_flat[tensordict-compile] | 0.2272ms | 0.1020ms | 9.8051 KOps/s | 9.6413 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_compile_copy_flat[tensordict-eager] | 0.1297ms | 56.5739μs | 17.6760 KOps/s | 17.2323 KOps/s | $\color{#35bf28}+2.57\%$ |
| test_compile_copy_flat[pytree-compile] | 0.1783ms | 78.0534μs | 12.8117 KOps/s | 13.1015 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_compile_copy_flat[pytree-eager] | 0.1296ms | 69.1689μs | 14.4574 KOps/s | 14.0662 KOps/s | $\color{#35bf28}+2.78\%$ |
| test_compile_assign_and_add[tensordict-compile] | 0.3072ms | 0.1981ms | 5.0482 KOps/s | 5.0185 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_compile_assign_and_add[tensordict-eager] | 1.9909ms | 1.6202ms | 617.2111 Ops/s | 581.5811 Ops/s | $\textbf{\color{#35bf28}+6.13\%}$ |
| test_compile_assign_and_add[pytree-compile] | 0.3910ms | 0.1961ms | 5.0991 KOps/s | 5.2101 KOps/s | $\color{#d91a1a}-2.13\%$ |
| test_compile_assign_and_add[pytree-eager] | 1.3513ms | 1.0979ms | 910.8126 Ops/s | 894.0810 Ops/s | $\color{#35bf28}+1.87\%$ |
| test_compile_assign_and_add_stack[compile] | 0.7460ms | 0.4207ms | 2.3772 KOps/s | 2.3856 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_compile_assign_and_add_stack[eager] | 3.9110ms | 3.6417ms | 274.5944 Ops/s | 263.9524 Ops/s | $\color{#35bf28}+4.03\%$ |
| test_compile_indexing[tensor-tensordict-compile] | 0.1221ms | 32.8905μs | 30.4039 KOps/s | 28.2334 KOps/s | $\textbf{\color{#35bf28}+7.69\%}$ |
| test_compile_indexing[tensor-tensordict-eager] | 1.0511ms | 49.0853μs | 20.3727 KOps/s | 20.3076 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1111ms | 29.4633μs | 33.9406 KOps/s | 32.8467 KOps/s | $\color{#35bf28}+3.33\%$ |
| test_compile_indexing[tensor-tensorclass-eager] | 75.5410μs | 29.0467μs | 34.4274 KOps/s | 35.3385 KOps/s | $\color{#d91a1a}-2.58\%$ |
| test_compile_indexing[tensor-pytree-compile] | 95.7290μs | 28.9491μs | 34.5433 KOps/s | 32.9601 KOps/s | $\color{#35bf28}+4.80\%$ |
| test_compile_indexing[tensor-pytree-eager] | 73.3270μs | 28.6086μs | 34.9546 KOps/s | 35.0490 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.1638ms | 74.3798μs | 13.4445 KOps/s | 13.4693 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.5501ms | 27.3785μs | 36.5250 KOps/s | 34.2444 KOps/s | $\textbf{\color{#35bf28}+6.66\%}$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.1512ms | 69.3353μs | 14.4227 KOps/s | 14.7557 KOps/s | $\color{#d91a1a}-2.26\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 95.7880μs | 22.3375μs | 44.7678 KOps/s | 43.6262 KOps/s | $\color{#35bf28}+2.62\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.1462ms | 69.0134μs | 14.4899 KOps/s | 14.6067 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_compile_indexing[slice-pytree-eager] | 74.4690μs | 22.4651μs | 44.5136 KOps/s | 43.3276 KOps/s | $\color{#35bf28}+2.74\%$ |
| test_compile_indexing[int-tensordict-compile] | 0.1459ms | 74.0870μs | 13.4976 KOps/s | 13.5066 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_compile_indexing[int-tensordict-eager] | 1.1046ms | 27.5795μs | 36.2589 KOps/s | 35.0932 KOps/s | $\color{#35bf28}+3.32\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.1622ms | 69.1238μs | 14.4668 KOps/s | 14.8093 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_compile_indexing[int-tensorclass-eager] | 0.2910ms | 22.4763μs | 44.4913 KOps/s | 43.9466 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_compile_indexing[int-pytree-compile] | 0.1705ms | 69.8266μs | 14.3212 KOps/s | 14.6290 KOps/s | $\color{#d91a1a}-2.10\%$ |
| test_compile_indexing[int-pytree-eager] | 64.2700μs | 22.1624μs | 45.1215 KOps/s | 44.0837 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_mod_add[eager] | 86.5940μs | 24.3472μs | 41.0724 KOps/s | 38.9623 KOps/s | $\textbf{\color{#35bf28}+5.42\%}$ |
| test_mod_add[compile] | 0.1124ms | 36.7599μs | 27.2035 KOps/s | 25.4371 KOps/s | $\textbf{\color{#35bf28}+6.94\%}$ |
| test_mod_add[compile-overhead] | 0.1103ms | 37.4027μs | 26.7360 KOps/s | 25.5667 KOps/s | $\color{#35bf28}+4.57\%$ |
| test_mod_wrap[eager] | 0.3999ms | 0.2059ms | 4.8574 KOps/s | 4.7879 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_mod_wrap[compile] | 0.4364ms | 0.2286ms | 4.3743 KOps/s | 4.2372 KOps/s | $\color{#35bf28}+3.23\%$ |
| test_mod_wrap[compile-overhead] | 0.6142ms | 0.2303ms | 4.3425 KOps/s | 4.3084 KOps/s | $\color{#35bf28}+0.79\%$ |
| test_mod_wrap_and_backward[eager] | 11.6447ms | 10.7584ms | 92.9502 Ops/s | 83.0654 Ops/s | $\textbf{\color{#35bf28}+11.90\%}$ |
| test_mod_wrap_and_backward[compile] | 12.6682ms | 11.2616ms | 88.7974 Ops/s | 80.9293 Ops/s | $\textbf{\color{#35bf28}+9.72\%}$ |
| test_mod_wrap_and_backward[compile-overhead] | 12.8029ms | 11.7793ms | 84.8949 Ops/s | 76.8817 Ops/s | $\textbf{\color{#35bf28}+10.42\%}$ |
| test_seq_add[eager] | 0.2220ms | 91.2945μs | 10.9536 KOps/s | 10.8818 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_seq_add[compile] | 0.1412ms | 62.2768μs | 16.0573 KOps/s | 15.8304 KOps/s | $\color{#35bf28}+1.43\%$ |
| test_seq_add[compile-overhead] | 0.1345ms | 60.3700μs | 16.5645 KOps/s | 16.0084 KOps/s | $\color{#35bf28}+3.47\%$ |
| test_seq_wrap[eager] | 0.9736ms | 0.3842ms | 2.6028 KOps/s | 2.6324 KOps/s | $\color{#d91a1a}-1.13\%$ |
| test_seq_wrap[compile] | 1.3953ms | 0.2676ms | 3.7376 KOps/s | 3.7310 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_seq_wrap[compile-overhead] | 1.3553ms | 0.2638ms | 3.7912 KOps/s | 3.7460 KOps/s | $\color{#35bf28}+1.21\%$ |
| test_func_call_runtime[False-eager] | 0.9140ms | 0.5100ms | 1.9607 KOps/s | 1.9093 KOps/s | $\color{#35bf28}+2.70\%$ |
| test_func_call_runtime[False-compile] | 1.0596ms | 0.5069ms | 1.9729 KOps/s | 1.9354 KOps/s | $\color{#35bf28}+1.94\%$ |
| test_func_call_runtime[False-compile-overhead] | 0.6384ms | 0.4962ms | 2.0153 KOps/s | 1.9451 KOps/s | $\color{#35bf28}+3.61\%$ |
| test_func_call_runtime[True-eager] | 1.1820ms | 0.7278ms | 1.3740 KOps/s | 1.3395 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_func_call_runtime[True-compile] | 0.8449ms | 0.5120ms | 1.9533 KOps/s | 1.8909 KOps/s | $\color{#35bf28}+3.30\%$ |
| test_func_call_runtime[True-compile-overhead] | 0.8746ms | 0.5095ms | 1.9627 KOps/s | 1.9130 KOps/s | $\color{#35bf28}+2.60\%$ |
| test_func_call_cm_runtime[False-eager] | 0.8985ms | 0.5053ms | 1.9789 KOps/s | 1.9385 KOps/s | $\color{#35bf28}+2.08\%$ |
| test_func_call_cm_runtime[False-compile] | 0.6190ms | 0.4984ms | 2.0063 KOps/s | 1.9294 KOps/s | $\color{#35bf28}+3.99\%$ |
| test_func_call_cm_runtime[False-compile-overhead] | 0.6502ms | 0.4988ms | 2.0049 KOps/s | 1.9384 KOps/s | $\color{#35bf28}+3.43\%$ |
| test_func_call_cm_runtime[True-eager] | 0.9924ms | 0.8487ms | 1.1782 KOps/s | 1.1234 KOps/s | $\color{#35bf28}+4.88\%$ |
| test_func_call_cm_runtime[True-compile] | 1.0836ms | 0.7246ms | 1.3801 KOps/s | 1.3542 KOps/s | $\color{#35bf28}+1.92\%$ |
| test_func_call_cm_runtime[True-compile-overhead] | 1.2257ms | 0.7350ms | 1.3606 KOps/s | 1.3427 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_vmap_func_call_cm_runtime[eager] | 2.4160ms | 1.8641ms | 536.4582 Ops/s | 519.7422 Ops/s | $\color{#35bf28}+3.22\%$ |
| test_vmap_func_call_cm_runtime[compile] | 2.6960ms | 1.9084ms | 524.0116 Ops/s | 505.5672 Ops/s | $\color{#35bf28}+3.65\%$ |
| test_vmap_func_call_cm_runtime[compile-overhead] | 2.6632ms | 1.9053ms | 524.8545 Ops/s | 506.0725 Ops/s | $\color{#35bf28}+3.71\%$ |
| test_distributed | 0.2607ms | 0.1242ms | 8.0518 KOps/s | 7.7921 KOps/s | $\color{#35bf28}+3.33\%$ |
| test_tdmodule | 38.1920μs | 16.9065μs | 59.1490 KOps/s | 53.2644 KOps/s | $\textbf{\color{#35bf28}+11.05\%}$ |
| test_tdmodule_dispatch | 54.3320μs | 35.3376μs | 28.2985 KOps/s | 27.4899 KOps/s | $\color{#35bf28}+2.94\%$ |
| test_tdseq | 42.1180μs | 20.7290μs | 48.2415 KOps/s | 45.8338 KOps/s | $\textbf{\color{#35bf28}+5.25\%}$ |
| test_tdseq_dispatch | 71.5230μs | 40.8272μs | 24.4935 KOps/s | 23.7227 KOps/s | $\color{#35bf28}+3.25\%$ |
| test_instantiation_functorch | 1.7908ms | 1.5856ms | 630.6907 Ops/s | 613.4312 Ops/s | $\color{#35bf28}+2.81\%$ |
| test_instantiation_td | 1.9742ms | 1.1625ms | 860.1888 Ops/s | 843.1359 Ops/s | $\color{#35bf28}+2.02\%$ |
| test_exec_functorch | 0.4194ms | 0.1826ms | 5.4777 KOps/s | 5.4454 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_exec_functional_call | 0.2801ms | 0.1719ms | 5.8173 KOps/s | 5.7499 KOps/s | $\color{#35bf28}+1.17\%$ |
| test_exec_td | 0.2430ms | 0.1670ms | 5.9872 KOps/s | 5.8888 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_exec_td_decorator | 1.0469ms | 0.2199ms | 4.5471 KOps/s | 4.3907 KOps/s | $\color{#35bf28}+3.56\%$ |
| test_vmap_mlp_speed[True-True] | 0.9610ms | 0.6419ms | 1.5580 KOps/s | 1.5537 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_vmap_mlp_speed[True-False] | 0.8668ms | 0.6456ms | 1.5490 KOps/s | 1.5673 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_vmap_mlp_speed[False-True] | 0.8030ms | 0.4988ms | 2.0049 KOps/s | 2.0044 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_vmap_mlp_speed[False-False] | 0.7794ms | 0.4966ms | 2.0138 KOps/s | 2.0322 KOps/s | $\color{#d91a1a}-0.91\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.4049ms | 0.6247ms | 1.6008 KOps/s | 1.5974 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.0215ms | 0.6189ms | 1.6159 KOps/s | 1.5905 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7300ms | 0.5062ms | 1.9757 KOps/s | 1.8883 KOps/s | $\color{#35bf28}+4.63\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.8666ms | 0.5063ms | 1.9750 KOps/s | 1.9434 KOps/s | $\color{#35bf28}+1.63\%$ |
| test_to_module_speed[True] | 1.5307ms | 1.2838ms | 778.9516 Ops/s | 755.1510 Ops/s | $\color{#35bf28}+3.15\%$ |
| test_to_module_speed[False] | 1.3788ms | 1.2464ms | 802.3412 Ops/s | 779.9303 Ops/s | $\color{#35bf28}+2.87\%$ |
| test_tc_init | 76.4920μs | 44.6406μs | 22.4012 KOps/s | 21.8162 KOps/s | $\color{#35bf28}+2.68\%$ |
| test_tc_init_nested | 0.1589ms | 86.5809μs | 11.5499 KOps/s | 11.1188 KOps/s | $\color{#35bf28}+3.88\%$ |
| test_tc_first_layer_tensor | 16.8410μs | 1.5769μs | 634.1392 KOps/s | 634.0496 KOps/s | $\color{#35bf28}+0.01\%$ |
| test_tc_first_layer_nontensor | 46.7770μs | 4.8887μs | 204.5529 KOps/s | 208.9749 KOps/s | $\color{#d91a1a}-2.12\%$ |
| test_tc_second_layer_tensor | 18.0430μs | 2.8746μs | 347.8732 KOps/s | 342.8935 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_tc_second_layer_nontensor | 46.9070μs | 6.3099μs | 158.4811 KOps/s | 159.8643 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_unbind | 0.4668s | 13.1897ms | 75.8165 Ops/s | 77.2346 Ops/s | $\color{#d91a1a}-1.84\%$ |
| test_full_like | 7.8778ms | 6.8227ms | 146.5697 Ops/s | 123.1951 Ops/s | $\textbf{\color{#35bf28}+18.97\%}$ |
| test_zeros_like | 12.1542ms | 6.5106ms | 153.5945 Ops/s | 343.8066 Ops/s | $\textbf{\color{#d91a1a}-55.33\%}$ |
| test_ones_like | 17.3182ms | 8.1008ms | 123.4452 Ops/s | 283.8320 Ops/s | $\textbf{\color{#d91a1a}-56.51\%}$ |
| test_clone | 16.0662ms | 9.6616ms | 103.5024 Ops/s | 190.0642 Ops/s | $\textbf{\color{#d91a1a}-45.54\%}$ |
| test_squeeze | 70.1110μs | 12.8497μs | 77.8226 KOps/s | 77.2793 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_unsqueeze | 0.1780ms | 94.6330μs | 10.5671 KOps/s | 10.6218 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_split | 0.5272ms | 0.1945ms | 5.1414 KOps/s | 4.9864 KOps/s | $\color{#35bf28}+3.11\%$ |
| test_permute | 0.5094ms | 0.2277ms | 4.3917 KOps/s | 4.3368 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_stack | 27.9671ms | 25.0587ms | 39.9063 Ops/s | 39.7633 Ops/s | $\color{#35bf28}+0.36\%$ |
| test_cat | 26.6968ms | 25.8799ms | 40.1931 Ops/s | 40.6181 Ops/s | $\color{#d91a1a}-1.05\%$ |
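
The columns in the benchmark table are related by simple arithmetic, sketched below. This is inferred from the reported numbers and is not taken from the benchmark bot's source: the Ops column appears to be the reciprocal of Mean, and Change appears to be the relative difference between Ops and Ops on Repo HEAD. The function names are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean per-call runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative change of the PR throughput against repo HEAD, in percent (assumed formula)."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values for test_plain_set_nested, taken from the CPU table.
mean_s = 20.2935e-6   # Mean: 20.2935 microseconds
ops_pr = 49.2769e3    # Ops: 49.2769 KOps/s
ops_head = 48.1950e3  # Ops on Repo HEAD: 48.1950 KOps/s

print(f"{ops_per_second(mean_s) / 1e3:.4f} KOps/s")
print(f"{percent_change(ops_pr, ops_head):+.2f}%")
```

Because the table shows rounded values, recomputing a row this way can differ from the reported Change in the last digit. The bold entries in the table look like changes of roughly 5% or more in either direction, which would match the Improved and Worsened counts in the summary, though the exact threshold is not stated in the comment.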

github-actions bot commented Sep 11, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}50$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.6514ms | 14.9261μs | 66.9967 KOps/s | 66.4069 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_plain_set_stack_nested | 39.6020μs | 15.0438μs | 66.4727 KOps/s | 65.5422 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_plain_set_nested_inplace | 51.8030μs | 15.9367μs | 62.7484 KOps/s | 62.2163 KOps/s | $\color{#35bf28}+0.86\%$ |
| test_plain_set_stack_nested_inplace | 47.5920μs | 15.7533μs | 63.4787 KOps/s | 62.6470 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_items | 26.5310μs | 2.8922μs | 345.7579 KOps/s | 345.4870 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_items_nested | 0.3550ms | 0.3158ms | 3.1668 KOps/s | 3.2197 KOps/s | $\color{#d91a1a}-1.64\%$ |
| test_items_nested_locked | 0.3633ms | 0.3157ms | 3.1678 KOps/s | 3.1755 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_items_nested_leaf | 85.2940μs | 63.5398μs | 15.7382 KOps/s | 15.8589 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_items_stack_nested | 0.3662ms | 0.3192ms | 3.1326 KOps/s | 3.2007 KOps/s | $\color{#d91a1a}-2.13\%$ |
| test_items_stack_nested_leaf | 0.1179ms | 64.6095μs | 15.4776 KOps/s | 15.3885 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_items_stack_nested_locked | 0.4911ms | 0.3172ms | 3.1523 KOps/s | 3.1736 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_keys | 41.1320μs | 3.4333μs | 291.2648 KOps/s | 293.8637 KOps/s | $\color{#d91a1a}-0.88\%$ |
| test_keys_nested | 81.0450μs | 55.2874μs | 18.0873 KOps/s | 18.0979 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_keys_nested_locked | 1.9791ms | 60.4679μs | 16.5377 KOps/s | 16.6672 KOps/s | $\color{#d91a1a}-0.78\%$ |
| test_keys_nested_leaf | 75.4850μs | 46.8898μs | 21.3266 KOps/s | 21.4808 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_keys_stack_nested | 89.3750μs | 55.5851μs | 17.9904 KOps/s | 18.1204 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_keys_stack_nested_leaf | 72.2240μs | 47.0151μs | 21.2698 KOps/s | 21.1171 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_keys_stack_nested_locked | 98.5550μs | 60.4043μs | 16.5551 KOps/s | 16.5501 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_values | 5.2803μs | 0.8063μs | 1.2403 MOps/s | 1.2466 MOps/s | $\color{#d91a1a}-0.51\%$ |
| test_values_nested | 55.1430μs | 27.4572μs | 36.4203 KOps/s | 36.2484 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_values_nested_locked | 55.3330μs | 29.3581μs | 34.0622 KOps/s | 34.1040 KOps/s | $\color{#d91a1a}-0.12\%$ |
| test_values_nested_leaf | 52.1730μs | 24.1460μs | 41.4148 KOps/s | 40.9809 KOps/s | $\color{#35bf28}+1.06\%$ |
| test_values_stack_nested | 70.2340μs | 27.9314μs | 35.8020 KOps/s | 34.6152 KOps/s | $\color{#35bf28}+3.43\%$ |
| test_values_stack_nested_leaf | 0.1904ms | 24.5474μs | 40.7375 KOps/s | 39.4354 KOps/s | $\color{#35bf28}+3.30\%$ |
| test_values_stack_nested_locked | 0.1368ms | 29.8349μs | 33.5178 KOps/s | 32.4375 KOps/s | $\color{#35bf28}+3.33\%$ |
| test_membership | 1.7696μs | 0.4702μs | 2.1269 MOps/s | 2.1291 MOps/s | $\color{#d91a1a}-0.11\%$ |
| test_membership_nested | 16.9010μs | 1.7436μs | 573.5240 KOps/s | 572.5567 KOps/s | $\color{#35bf28}+0.17\%$ |
| test_membership_nested_leaf | 11.5107μs | 1.6966μs | 589.4073 KOps/s | 581.7905 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_membership_stacked_nested | 50.3130μs | 1.7742μs | 563.6446 KOps/s | 562.8179 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_membership_stacked_nested_leaf | 17.4910μs | 1.7768μs | 562.8190 KOps/s | 565.5517 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_membership_nested_last | 31.4420μs | 2.6454μs | 378.0137 KOps/s | 386.4512 KOps/s | $\color{#d91a1a}-2.18\%$ |
| test_membership_nested_leaf_last | 24.5120μs | 2.6409μs | 378.6530 KOps/s | 387.2543 KOps/s | $\color{#d91a1a}-2.22\%$ |
| test_membership_stacked_nested_last | 32.4320μs | 2.6013μs | 384.4222 KOps/s | 311.8175 KOps/s | $\textbf{\color{#35bf28}+23.28\%}$ |
| test_membership_stacked_nested_leaf_last | 24.1710μs | 2.6199μs | 381.6881 KOps/s | 313.2047 KOps/s | $\textbf{\color{#35bf28}+21.87\%}$ |
| test_nested_getleaf | 37.5420μs | 6.1533μs | 162.5148 KOps/s | 163.7981 KOps/s | $\color{#d91a1a}-0.78\%$ |
| test_nested_get | 0.2014ms | 5.7306μs | 174.5011 KOps/s | 173.9776 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_stacked_getleaf | 30.9610μs | 6.0826μs | 164.4022 KOps/s | 166.3888 KOps/s | $\color{#d91a1a}-1.19\%$ |
| test_stacked_get | 40.2030μs | 5.7138μs | 175.0152 KOps/s | 177.8551 KOps/s | $\color{#d91a1a}-1.60\%$ |
| test_nested_getitemleaf | 29.5120μs | 6.1969μs | 161.3722 KOps/s | 162.4935 KOps/s | $\color{#d91a1a}-0.69\%$ |
| test_nested_getitem | 30.8920μs | 5.7839μs | 172.8928 KOps/s | 172.3563 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_stacked_getitemleaf | 52.0320μs | 6.1872μs | 161.6230 KOps/s | 163.4587 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_stacked_getitem | 32.6720μs | 5.7103μs | 175.1213 KOps/s | 173.5111 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_lock_nested | 4.6840ms | 0.4207ms | 2.3772 KOps/s | 2.3286 KOps/s | $\color{#35bf28}+2.09\%$ |
| test_lock_stack_nested | 0.4896ms | 0.3863ms | 2.5888 KOps/s | 2.5613 KOps/s | $\color{#35bf28}+1.08\%$ |
| test_unlock_nested | 0.7771ms | 0.3583ms | 2.7907 KOps/s | 2.7175 KOps/s | $\color{#35bf28}+2.69\%$ |
| test_unlock_stack_nested | 0.4442ms | 0.3257ms | 3.0704 KOps/s | 3.0250 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_flatten_speed | 0.2614ms | 81.2788μs | 12.3033 KOps/s | 12.5452 KOps/s | $\color{#d91a1a}-1.93\%$ |
| test_unflatten_speed | 0.3220ms | 0.2817ms | 3.5494 KOps/s | 3.5589 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_common_ops | 1.4757ms | 1.2784ms | 782.2133 Ops/s | 715.3211 Ops/s | $\textbf{\color{#35bf28}+9.35\%}$ |
| test_creation | 22.2810μs | 1.4910μs | 670.7071 KOps/s | 685.7907 KOps/s | $\color{#d91a1a}-2.20\%$ |
| test_creation_empty | 54.3130μs | 17.2695μs | 57.9055 KOps/s | 58.0553 KOps/s | $\color{#d91a1a}-0.26\%$ |
| test_creation_nested_1 | 54.5430μs | 19.0441μs | 52.5098 KOps/s | 52.1250 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_creation_nested_2 | 51.5030μs | 21.9907μs | 45.4738 KOps/s | 44.8428 KOps/s | $\color{#35bf28}+1.41\%$ |
| test_clone | 60.4230μs | 29.1330μs | 34.3254 KOps/s | 33.6433 KOps/s | $\color{#35bf28}+2.03\%$ |
| test_getitem[int] | 1.2589ms | 16.3756μs | 61.0665 KOps/s | 59.7698 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_getitem[slice_int] | 0.1689ms | 27.8924μs | 35.8520 KOps/s | 34.8501 KOps/s | $\color{#35bf28}+2.88\%$ |
| test_getitem[range] | 0.2202ms | 0.1096ms | 9.1257 KOps/s | 9.0603 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_getitem[tuple] | 0.1189ms | 23.6985μs | 42.1968 KOps/s | 38.4168 KOps/s | $\textbf{\color{#35bf28}+9.84\%}$ |
| test_getitem[list] | 0.2777ms | 97.8718μs | 10.2174 KOps/s | 9.4981 KOps/s | $\textbf{\color{#35bf28}+7.57\%}$ |
| test_setitem_dim[int] | 69.1230μs | 44.5219μs | 22.4609 KOps/s | 20.5426 KOps/s | $\textbf{\color{#35bf28}+9.34\%}$ |
| test_setitem_dim[slice_int] | 0.2062ms | 67.9904μs | 14.7080 KOps/s | 14.5897 KOps/s | $\color{#35bf28}+0.81\%$ |
| test_setitem_dim[range] | 0.1545ms | 0.1268ms | 7.8836 KOps/s | 7.7972 KOps/s | $\color{#35bf28}+1.11\%$ |
| test_setitem_dim[tuple] | 0.1887ms | 60.9941μs | 16.3950 KOps/s | 15.3802 KOps/s | $\textbf{\color{#35bf28}+6.60\%}$ |
| test_setitem | 0.1886ms | 42.4822μs | 23.5393 KOps/s | 21.2928 KOps/s | $\textbf{\color{#35bf28}+10.55\%}$ |
| test_set | 0.2243ms | 43.4876μs | 22.9951 KOps/s | 21.8438 KOps/s | $\textbf{\color{#35bf28}+5.27\%}$ |
| test_set_shared | 0.3759ms | 51.3771μs | 19.4639 KOps/s | 19.0598 KOps/s | $\color{#35bf28}+2.12\%$ |
| test_update | 0.2019ms | 51.1781μs | 19.5396 KOps/s | 18.0391 KOps/s | $\textbf{\color{#35bf28}+8.32\%}$ |
| test_update_nested | 0.2413ms | 58.1344μs | 17.2015 KOps/s | 16.0899 KOps/s | $\textbf{\color{#35bf28}+6.91\%}$ |
| test_update__nested | 0.2085ms | 58.3793μs | 17.1293 KOps/s | 15.2957 KOps/s | $\textbf{\color{#35bf28}+11.99\%}$ |
| test_set_nested | 0.1889ms | 44.3097μs | 22.5684 KOps/s | 20.7481 KOps/s | $\textbf{\color{#35bf28}+8.77\%}$ |
| test_set_nested_new | 0.1969ms | 47.7227μs | 20.9544 KOps/s | 19.3306 KOps/s | $\textbf{\color{#35bf28}+8.40\%}$ |
| test_select | 0.5409ms | 61.8753μs | 16.1615 KOps/s | 15.1368 KOps/s | $\textbf{\color{#35bf28}+6.77\%}$ |
| test_select_nested | 70.8340μs | 42.3386μs | 23.6191 KOps/s | 23.3867 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_exclude_nested | 0.1708ms | 58.5018μs | 17.0935 KOps/s | 16.6223 KOps/s | $\color{#35bf28}+2.83\%$ |
| test_empty[True] | 0.2715ms | 0.2431ms | 4.1131 KOps/s | 4.1205 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_empty[False] | 3.5252μs | 0.7446μs | 1.3429 MOps/s | 1.3614 MOps/s | $\color{#d91a1a}-1.35\%$ |
| test_to | 73.3540μs | 25.2846μs | 39.5498 KOps/s | 34.2046 KOps/s | $\textbf{\color{#35bf28}+15.63\%}$ |
| test_to_nonblocking | 54.2130μs | 24.5241μs | 40.7761 KOps/s | 34.9949 KOps/s | $\textbf{\color{#35bf28}+16.52\%}$ |
| test_unbind_speed | 0.3551ms | 0.2834ms | 3.5281 KOps/s | 3.4417 KOps/s | $\color{#35bf28}+2.51\%$ |
| test_unbind_speed_stack0 | 0.4064ms | 0.2826ms | 3.5386 KOps/s | 3.4559 KOps/s | $\color{#35bf28}+2.39\%$ |
| test_unbind_speed_stack1 | 93.5340ms | 0.7179ms | 1.3930 KOps/s | 1.3878 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_split | 95.4369ms | 2.2020ms | 454.1411 Ops/s | 456.1561 Ops/s | $\color{#d91a1a}-0.44\%$ |
| test_chunk | 95.4070ms | 2.2101ms | 452.4593 Ops/s | 453.0936 Ops/s | $\color{#d91a1a}-0.14\%$ |
| test_creation[device0] | 0.3415ms | 0.1262ms | 7.9219 KOps/s | 7.7358 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_creation_from_tensor | 0.3751ms | 0.1280ms | 7.8145 KOps/s | 7.4055 KOps/s | $\textbf{\color{#35bf28}+5.52\%}$ |
| test_add_one[memmap_tensor0] | 0.1314ms | 8.9055μs | 112.2906 KOps/s | 107.1517 KOps/s | $\color{#35bf28}+4.80\%$ |
| test_contiguous[memmap_tensor0] | 38.6720μs | 2.2045μs | 453.6206 KOps/s | 434.0575 KOps/s | $\color{#35bf28}+4.51\%$ |
| test_stack[memmap_tensor0] | 35.1420μs | 6.6089μs | 151.3111 KOps/s | 136.7417 KOps/s | $\textbf{\color{#35bf28}+10.65\%}$ |
| test_memmaptd_index | 1.0846ms | 0.4308ms | 2.3214 KOps/s | 2.2738 KOps/s | $\color{#35bf28}+2.09\%$ |
| test_memmaptd_index_astensor | 0.7358ms | 0.4877ms | 2.0504 KOps/s | 2.0006 KOps/s | $\color{#35bf28}+2.49\%$ |
| test_memmaptd_index_op | 1.4485ms | 1.0602ms | 943.2355 Ops/s | 903.1674 Ops/s | $\color{#35bf28}+4.44\%$ |
| test_serialize_model | 0.1289s | 0.1285s | 7.7795 Ops/s | 7.7773 Ops/s | $\color{#35bf28}+0.03\%$ |
| test_serialize_model_pickle | 1.3709s | 1.2169s | 0.8218 Ops/s | 0.8244 Ops/s | $\color{#d91a1a}-0.31\%$ |
| test_serialize_weights | 0.2174s | 0.1415s | 7.0683 Ops/s | 7.7707 Ops/s | $\textbf{\color{#d91a1a}-9.04\%}$ |
| test_serialize_weights_returnearly | 0.2469s | 56.4852ms | 17.7037 Ops/s | 15.7402 Ops/s | $\textbf{\color{#35bf28}+12.47\%}$ |
| test_serialize_weights_pickle | 1.3646s | 1.2153s | 0.8228 Ops/s | 0.8219 Ops/s | $\color{#35bf28}+0.11\%$ |
test_reshape_pytree 0.1223ms 36.3498μs 27.5105 KOps/s 26.1450 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_reshape_td 0.1697ms 42.4095μs 23.5796 KOps/s 22.3848 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_view_pytree 64.3130μs 36.0009μs 27.7771 KOps/s 26.5827 KOps/s $\color{#35bf28}+4.49\%$
test_view_td 0.1306ms 47.4525μs 21.0737 KOps/s 20.1497 KOps/s $\color{#35bf28}+4.59\%$
test_unbind_pytree 0.1396ms 35.1143μs 28.4784 KOps/s 27.6548 KOps/s $\color{#35bf28}+2.98\%$
test_unbind_td 0.4433ms 43.5653μs 22.9540 KOps/s 22.0131 KOps/s $\color{#35bf28}+4.27\%$
test_split_pytree 0.2231ms 46.7419μs 21.3941 KOps/s 21.2976 KOps/s $\color{#35bf28}+0.45\%$
test_split_td 0.6813ms 56.2747μs 17.7700 KOps/s 17.5443 KOps/s $\color{#35bf28}+1.29\%$
test_add_pytree 0.2056ms 56.5120μs 17.6954 KOps/s 17.0978 KOps/s $\color{#35bf28}+3.49\%$
test_add_td 0.2396ms 91.6473μs 10.9114 KOps/s 10.1651 KOps/s $\textbf{\color{#35bf28}+7.34\%}$
test_compile_add_one_nested[tensordict-compile] 0.4101ms 0.2090ms 4.7848 KOps/s 4.5530 KOps/s $\textbf{\color{#35bf28}+5.09\%}$
test_compile_add_one_nested[tensordict-eager] 0.2956ms 0.1566ms 6.3848 KOps/s 6.3454 KOps/s $\color{#35bf28}+0.62\%$
test_compile_add_one_nested[pytree-compile] 0.2859ms 0.1458ms 6.8572 KOps/s 6.6952 KOps/s $\color{#35bf28}+2.42\%$
test_compile_add_one_nested[pytree-eager] 0.3608ms 0.1854ms 5.3923 KOps/s 5.4190 KOps/s $\color{#d91a1a}-0.49\%$
test_compile_copy_nested[tensordict-compile] 96.2650μs 21.4911μs 46.5309 KOps/s 48.0107 KOps/s $\color{#d91a1a}-3.08\%$
test_compile_copy_nested[tensordict-eager] 91.5750μs 44.0388μs 22.7073 KOps/s 21.8589 KOps/s $\color{#35bf28}+3.88\%$
test_compile_copy_nested[pytree-compile] 0.2179ms 62.4114μs 16.0227 KOps/s 15.6158 KOps/s $\color{#35bf28}+2.61\%$
test_compile_copy_nested[pytree-eager] 81.4540μs 49.4013μs 20.2424 KOps/s 20.2417 KOps/s $+0.00\%$
test_compile_add_one_flat[tensordict-compile] 0.3718ms 0.3199ms 3.1264 KOps/s 2.9682 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_compile_add_one_flat[tensordict-eager] 0.3460ms 0.2072ms 4.8268 KOps/s 4.7215 KOps/s $\color{#35bf28}+2.23\%$
test_compile_add_one_flat[tensorclass-compile] 0.2393ms 0.1276ms 7.8378 KOps/s 7.4036 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1230ms 60.6710μs 16.4823 KOps/s 15.4431 KOps/s $\textbf{\color{#35bf28}+6.73\%}$
test_compile_add_one_flat[pytree-compile] 0.4645ms 0.3178ms 3.1464 KOps/s 2.8835 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_compile_add_one_flat[pytree-eager] 0.7824ms 0.6239ms 1.6028 KOps/s 1.5748 KOps/s $\color{#35bf28}+1.78\%$
test_compile_add_self_flat[tensordict-eager] 0.3529ms 0.2480ms 4.0322 KOps/s 3.9637 KOps/s $\color{#35bf28}+1.73\%$
test_compile_add_self_flat[tensordict-compile] 0.3755ms 0.3205ms 3.1197 KOps/s 2.9339 KOps/s $\textbf{\color{#35bf28}+6.33\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1528ms 71.0347μs 14.0776 KOps/s 13.2373 KOps/s $\textbf{\color{#35bf28}+6.35\%}$
test_compile_add_self_flat[tensorclass-compile] 0.2551ms 0.1287ms 7.7674 KOps/s 7.3379 KOps/s $\textbf{\color{#35bf28}+5.85\%}$
test_compile_add_self_flat[pytree-eager] 0.6917ms 0.5251ms 1.9042 KOps/s 1.8293 KOps/s $\color{#35bf28}+4.09\%$
test_compile_add_self_flat[pytree-compile] 0.4046ms 0.3185ms 3.1394 KOps/s 2.9715 KOps/s $\textbf{\color{#35bf28}+5.65\%}$
test_compile_copy_flat[tensordict-compile] 0.1098ms 18.4235μs 54.2786 KOps/s 53.3083 KOps/s $\color{#35bf28}+1.82\%$
test_compile_copy_flat[tensordict-eager] 65.5630μs 27.3091μs 36.6178 KOps/s 37.1136 KOps/s $\color{#d91a1a}-1.34\%$
test_compile_copy_flat[pytree-compile] 0.1127ms 69.8641μs 14.3135 KOps/s 14.3410 KOps/s $\color{#d91a1a}-0.19\%$
test_compile_copy_flat[pytree-eager] 86.9750μs 51.5708μs 19.3908 KOps/s 19.4543 KOps/s $\color{#d91a1a}-0.33\%$
test_compile_assign_and_add[tensordict-compile] 2.3247ms 0.8121ms 1.2313 KOps/s 1.0941 KOps/s $\textbf{\color{#35bf28}+12.55\%}$
test_compile_assign_and_add[tensordict-eager] 3.5099ms 3.1492ms 317.5377 Ops/s 303.9940 Ops/s $\color{#35bf28}+4.46\%$
test_compile_assign_and_add[pytree-compile] 2.2893ms 0.8028ms 1.2456 KOps/s 1.1045 KOps/s $\textbf{\color{#35bf28}+12.78\%}$
test_compile_assign_and_add[pytree-eager] 3.3448ms 3.1310ms 319.3817 Ops/s 294.5170 Ops/s $\textbf{\color{#35bf28}+8.44\%}$
test_compile_indexing[tensor-tensordict-compile] 0.2590ms 0.1088ms 9.1892 KOps/s 8.8437 KOps/s $\color{#35bf28}+3.91\%$
test_compile_indexing[tensor-tensordict-eager] 0.2027ms 60.1744μs 16.6184 KOps/s 14.7427 KOps/s $\textbf{\color{#35bf28}+12.72\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.2661ms 0.1037ms 9.6452 KOps/s 9.4257 KOps/s $\color{#35bf28}+2.33\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2105ms 45.2470μs 22.1009 KOps/s 22.3963 KOps/s $\color{#d91a1a}-1.32\%$
test_compile_indexing[tensor-pytree-compile] 0.2947ms 0.1068ms 9.3657 KOps/s 9.3817 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_indexing[tensor-pytree-eager] 0.2174ms 45.2771μs 22.0862 KOps/s 22.3843 KOps/s $\color{#d91a1a}-1.33\%$
test_compile_indexing[slice-tensordict-compile] 0.2867ms 0.1394ms 7.1724 KOps/s 6.7591 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_compile_indexing[slice-tensordict-eager] 0.2713ms 25.8138μs 38.7389 KOps/s 37.2753 KOps/s $\color{#35bf28}+3.93\%$
test_compile_indexing[slice-tensorclass-compile] 0.2818ms 0.1326ms 7.5398 KOps/s 7.3647 KOps/s $\color{#35bf28}+2.38\%$
test_compile_indexing[slice-tensorclass-eager] 70.7940μs 21.1782μs 47.2183 KOps/s 46.5154 KOps/s $\color{#35bf28}+1.51\%$
test_compile_indexing[slice-pytree-compile] 0.2983ms 0.1376ms 7.2692 KOps/s 7.3426 KOps/s $\color{#d91a1a}-1.00\%$
test_compile_indexing[slice-pytree-eager] 0.1422ms 21.1067μs 47.3783 KOps/s 47.2710 KOps/s $\color{#35bf28}+0.23\%$
test_compile_indexing[int-tensordict-compile] 0.2778ms 0.1439ms 6.9509 KOps/s 6.8242 KOps/s $\color{#35bf28}+1.86\%$
test_compile_indexing[int-tensordict-eager] 0.5163ms 25.7435μs 38.8447 KOps/s 38.7692 KOps/s $\color{#35bf28}+0.19\%$
test_compile_indexing[int-tensorclass-compile] 0.2755ms 0.1358ms 7.3627 KOps/s 7.0545 KOps/s $\color{#35bf28}+4.37\%$
test_compile_indexing[int-tensorclass-eager] 60.1640μs 22.4886μs 44.4669 KOps/s 47.7510 KOps/s $\textbf{\color{#d91a1a}-6.88\%}$
test_compile_indexing[int-pytree-compile] 0.2970ms 0.1356ms 7.3771 KOps/s 7.0429 KOps/s $\color{#35bf28}+4.74\%$
test_compile_indexing[int-pytree-eager] 0.3866ms 22.3070μs 44.8290 KOps/s 46.8846 KOps/s $\color{#d91a1a}-4.38\%$
test_mod_add[eager] 0.1962ms 34.3479μs 29.1139 KOps/s 28.4694 KOps/s $\color{#35bf28}+2.26\%$
test_mod_add[compile] 0.2186ms 73.3171μs 13.6394 KOps/s 13.3429 KOps/s $\color{#35bf28}+2.22\%$
test_mod_add[compile-overhead] 0.2607ms 0.1359ms 7.3610 KOps/s 6.9809 KOps/s $\textbf{\color{#35bf28}+5.44\%}$
test_mod_wrap[eager] 0.3864ms 0.2441ms 4.0974 KOps/s 3.7931 KOps/s $\textbf{\color{#35bf28}+8.02\%}$
test_mod_wrap[compile] 0.4148ms 0.3112ms 3.2132 KOps/s 3.1062 KOps/s $\color{#35bf28}+3.44\%$
test_mod_wrap[compile-overhead] 7.6371ms 4.0700ms 245.7001 Ops/s 255.1650 Ops/s $\color{#d91a1a}-3.71\%$
test_mod_wrap_and_backward[eager] 1.5508ms 1.3449ms 743.5661 Ops/s 689.5112 Ops/s $\textbf{\color{#35bf28}+7.84\%}$
test_mod_wrap_and_backward[compile] 2.4506ms 1.3253ms 754.5177 Ops/s 686.7726 Ops/s $\textbf{\color{#35bf28}+9.86\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3216ms 0.9064ms 1.1033 KOps/s 971.3738 Ops/s $\textbf{\color{#35bf28}+13.58\%}$
test_seq_add[eager] 0.2591ms 0.1041ms 9.6096 KOps/s 9.4666 KOps/s $\color{#35bf28}+1.51\%$
test_seq_add[compile] 0.5786ms 81.0229μs 12.3422 KOps/s 11.6530 KOps/s $\textbf{\color{#35bf28}+5.91\%}$
test_seq_add[compile-overhead] 0.2559ms 0.1150ms 8.6960 KOps/s 8.3618 KOps/s $\color{#35bf28}+4.00\%$
test_seq_wrap[eager] 0.5628ms 0.4045ms 2.4721 KOps/s 2.4371 KOps/s $\color{#35bf28}+1.44\%$
test_seq_wrap[compile] 0.4674ms 0.3165ms 3.1593 KOps/s 2.9739 KOps/s $\textbf{\color{#35bf28}+6.23\%}$
test_seq_wrap[compile-overhead] 0.3138ms 0.2257ms 4.4307 KOps/s 4.2634 KOps/s $\color{#35bf28}+3.92\%$
test_func_call_runtime[False-eager] 0.9792ms 0.7851ms 1.2738 KOps/s 1.2522 KOps/s $\color{#35bf28}+1.73\%$
test_func_call_runtime[False-compile] 0.9952ms 0.8381ms 1.1931 KOps/s 1.2254 KOps/s $\color{#d91a1a}-2.63\%$
test_func_call_runtime[False-compile-overhead] 0.4943ms 0.3602ms 2.7759 KOps/s 2.6744 KOps/s $\color{#35bf28}+3.80\%$
test_func_call_runtime[True-eager] 1.0529ms 0.8946ms 1.1178 KOps/s 1.0886 KOps/s $\color{#35bf28}+2.68\%$
test_func_call_runtime[True-compile] 0.9912ms 0.8276ms 1.2084 KOps/s 1.1793 KOps/s $\color{#35bf28}+2.47\%$
test_func_call_runtime[True-compile-overhead] 0.5413ms 0.3944ms 2.5353 KOps/s 2.4603 KOps/s $\color{#35bf28}+3.05\%$
test_func_call_cm_runtime[False-eager] 0.8320ms 0.7268ms 1.3759 KOps/s 1.3249 KOps/s $\color{#35bf28}+3.85\%$
test_func_call_cm_runtime[False-compile] 0.9537ms 0.8208ms 1.2184 KOps/s 1.2195 KOps/s $\color{#d91a1a}-0.09\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4854ms 0.3631ms 2.7540 KOps/s 2.6586 KOps/s $\color{#35bf28}+3.59\%$
test_func_call_cm_runtime[True-eager] 1.1361ms 0.9874ms 1.0128 KOps/s 974.6907 Ops/s $\color{#35bf28}+3.91\%$
test_func_call_cm_runtime[True-compile] 1.0106ms 0.8526ms 1.1729 KOps/s 1.1243 KOps/s $\color{#35bf28}+4.32\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5669ms 0.4198ms 2.3820 KOps/s 2.2912 KOps/s $\color{#35bf28}+3.96\%$
test_vmap_func_call_cm_runtime[eager] 2.5694ms 2.0724ms 482.5296 Ops/s 478.0238 Ops/s $\color{#35bf28}+0.94\%$
test_vmap_func_call_cm_runtime[compile] 1.0803ms 0.8977ms 1.1140 KOps/s 1.1174 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5252ms 0.4230ms 2.3640 KOps/s 2.2650 KOps/s $\color{#35bf28}+4.37\%$
test_distributed 3.1119ms 0.2189ms 4.5688 KOps/s 8.4800 KOps/s $\textbf{\color{#d91a1a}-46.12\%}$
test_tdmodule 45.9230μs 15.1584μs 65.9702 KOps/s 66.6670 KOps/s $\color{#d91a1a}-1.05\%$
test_tdmodule_dispatch 51.3720μs 30.3527μs 32.9460 KOps/s 32.3328 KOps/s $\color{#35bf28}+1.90\%$
test_tdseq 35.5420μs 15.9980μs 62.5077 KOps/s 63.8824 KOps/s $\color{#d91a1a}-2.15\%$
test_tdseq_dispatch 63.5530μs 33.1984μs 30.1220 KOps/s 29.8781 KOps/s $\color{#35bf28}+0.82\%$
test_instantiation_functorch 2.0231ms 1.8516ms 540.0629 Ops/s 521.4223 Ops/s $\color{#35bf28}+3.57\%$
test_instantiation_td 1.7997ms 1.1885ms 841.3810 Ops/s 816.5350 Ops/s $\color{#35bf28}+3.04\%$
test_exec_functorch 0.2901ms 0.2094ms 4.7752 KOps/s 4.6653 KOps/s $\color{#35bf28}+2.36\%$
test_exec_functional_call 0.3567ms 0.2086ms 4.7939 KOps/s 4.6622 KOps/s $\color{#35bf28}+2.82\%$
test_exec_td 0.3587ms 0.2137ms 4.6804 KOps/s 4.5080 KOps/s $\color{#35bf28}+3.82\%$
test_exec_td_decorator 0.5882ms 0.2545ms 3.9288 KOps/s 3.6734 KOps/s $\textbf{\color{#35bf28}+6.95\%}$
test_vmap_mlp_speed[True-True] 0.8233ms 0.6853ms 1.4591 KOps/s 1.3949 KOps/s $\color{#35bf28}+4.60\%$
test_vmap_mlp_speed[True-False] 0.8225ms 0.6831ms 1.4639 KOps/s 1.3809 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_vmap_mlp_speed[False-True] 0.9909ms 0.5727ms 1.7462 KOps/s 1.6439 KOps/s $\textbf{\color{#35bf28}+6.23\%}$
test_vmap_mlp_speed[False-False] 0.7303ms 0.5746ms 1.7405 KOps/s 1.6340 KOps/s $\textbf{\color{#35bf28}+6.52\%}$
test_vmap_mlp_speed_decorator[True-True] 0.8747ms 0.6971ms 1.4345 KOps/s 1.4164 KOps/s $\color{#35bf28}+1.28\%$
test_vmap_mlp_speed_decorator[True-False] 1.0107ms 0.7009ms 1.4266 KOps/s 1.4089 KOps/s $\color{#35bf28}+1.26\%$
test_vmap_mlp_speed_decorator[False-True] 0.7382ms 0.5863ms 1.7056 KOps/s 1.6062 KOps/s $\textbf{\color{#35bf28}+6.19\%}$
test_vmap_mlp_speed_decorator[False-False] 0.8015ms 0.6134ms 1.6302 KOps/s 1.6065 KOps/s $\color{#35bf28}+1.48\%$
test_vmap_transformer_speed[True-True] 8.5432ms 8.3447ms 119.8372 Ops/s 117.4525 Ops/s $\color{#35bf28}+2.03\%$
test_vmap_transformer_speed[True-False] 8.5293ms 8.3130ms 120.2939 Ops/s 117.6782 Ops/s $\color{#35bf28}+2.22\%$
test_vmap_transformer_speed[False-True] 8.4607ms 8.1389ms 122.8667 Ops/s 120.2123 Ops/s $\color{#35bf28}+2.21\%$
test_vmap_transformer_speed[False-False] 8.5834ms 8.1946ms 122.0311 Ops/s 120.6638 Ops/s $\color{#35bf28}+1.13\%$
test_vmap_transformer_speed_decorator[True-True] 20.3675ms 19.5528ms 51.1434 Ops/s 50.3290 Ops/s $\color{#35bf28}+1.62\%$
test_vmap_transformer_speed_decorator[True-False] 20.2711ms 19.6276ms 50.9488 Ops/s 50.7365 Ops/s $\color{#35bf28}+0.42\%$
test_vmap_transformer_speed_decorator[False-True] 20.2029ms 19.5081ms 51.2607 Ops/s 51.2882 Ops/s $\color{#d91a1a}-0.05\%$
test_vmap_transformer_speed_decorator[False-False] 20.1613ms 19.4185ms 51.4973 Ops/s 51.0228 Ops/s $\color{#35bf28}+0.93\%$
test_to_module_speed[True] 1.5128ms 0.9251ms 1.0809 KOps/s 1.0623 KOps/s $\color{#35bf28}+1.75\%$
test_to_module_speed[False] 1.3270ms 0.8968ms 1.1151 KOps/s 1.0804 KOps/s $\color{#35bf28}+3.21\%$
test_tc_init 71.6240μs 34.7541μs 28.7736 KOps/s 28.5148 KOps/s $\color{#35bf28}+0.91\%$
test_tc_init_nested 0.1079ms 70.0218μs 14.2813 KOps/s 13.6737 KOps/s $\color{#35bf28}+4.44\%$
test_tc_first_layer_tensor 5.0817μs 0.6548μs 1.5271 MOps/s 1.4771 MOps/s $\color{#35bf28}+3.39\%$
test_tc_first_layer_nontensor 27.4410μs 2.2216μs 450.1296 KOps/s 451.4787 KOps/s $\color{#d91a1a}-0.30\%$
test_tc_second_layer_tensor 9.5403μs 1.4135μs 707.4777 KOps/s 726.4368 KOps/s $\color{#d91a1a}-2.61\%$
test_tc_second_layer_nontensor 82.0840μs 2.9802μs 335.5438 KOps/s 339.1467 KOps/s $\color{#d91a1a}-1.06\%$
test_unbind 0.1962s 11.9545ms 83.6507 Ops/s 92.4827 Ops/s $\textbf{\color{#d91a1a}-9.55\%}$
test_full_like 0.7522ms 0.5752ms 1.7386 KOps/s 1.7306 KOps/s $\color{#35bf28}+0.46\%$
test_zeros_like 0.3535ms 0.1981ms 5.0468 KOps/s 5.0517 KOps/s $\color{#d91a1a}-0.10\%$
test_ones_like 0.3570ms 0.1981ms 5.0475 KOps/s 5.0547 KOps/s $\color{#d91a1a}-0.14\%$
test_clone 0.5659ms 0.4140ms 2.4156 KOps/s 2.4102 KOps/s $\color{#35bf28}+0.22\%$
test_squeeze 0.1542ms 10.8458μs 92.2014 KOps/s 96.0070 KOps/s $\color{#d91a1a}-3.96\%$
test_unsqueeze 0.2372ms 78.1162μs 12.8014 KOps/s 13.0979 KOps/s $\color{#d91a1a}-2.26\%$
test_split 0.3846ms 0.1589ms 6.2951 KOps/s 6.0507 KOps/s $\color{#35bf28}+4.04\%$
test_permute 0.2407ms 0.1809ms 5.5280 KOps/s 5.2144 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_stack 1.3995ms 0.8780ms 1.1389 KOps/s 1.1679 KOps/s $\color{#d91a1a}-2.49\%$
test_cat 1.2587ms 1.2318ms 811.8374 Ops/s 811.7973 Ops/s $+0.00\%$

@vmoens vmoens added the enhancement New feature or request label Sep 13, 2024
@vmoens vmoens merged commit fc323e5 into main Sep 16, 2024
46 of 54 checks passed
@vmoens vmoens deleted the cudagraphsmodule branch September 16, 2024 22:47
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request
2 participants