[Feature] TensorDict.softmax #1163

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

@vmoens (Contributor) commented Jan 7, 2025

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 7, 2025
ghstack-source-id: a88bebc23e6aaa02ec297db72dbda68ec9628ce7
Pull Request resolved: #1163
@facebook-github-bot added the CLA Signed label Jan 7, 2025
@vmoens added the enhancement label Jan 7, 2025

github-actions bot commented Jan 7, 2025

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: 8. Worsened: 43.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 41.5370μs 20.6825μs 48.3500 KOps/s 50.2283 KOps/s $\color{#d91a1a}-3.74\%$
test_plain_set_stack_nested 46.6270μs 21.0241μs 47.5645 KOps/s 50.3000 KOps/s $\textbf{\color{#d91a1a}-5.44\%}$
test_plain_set_nested_inplace 75.5000μs 22.8587μs 43.7470 KOps/s 45.6296 KOps/s $\color{#d91a1a}-4.13\%$
test_plain_set_stack_nested_inplace 70.0400μs 22.7993μs 43.8609 KOps/s 45.4315 KOps/s $\color{#d91a1a}-3.46\%$
test_items 34.6240μs 4.2400μs 235.8478 KOps/s 234.7238 KOps/s $\color{#35bf28}+0.48\%$
test_items_nested 0.6321ms 0.4123ms 2.4253 KOps/s 2.4725 KOps/s $\color{#d91a1a}-1.91\%$
test_items_nested_locked 0.7342ms 0.4094ms 2.4426 KOps/s 2.4712 KOps/s $\color{#d91a1a}-1.16\%$
test_items_nested_leaf 0.1469ms 77.0068μs 12.9859 KOps/s 13.0968 KOps/s $\color{#d91a1a}-0.85\%$
test_items_stack_nested 0.6686ms 0.4118ms 2.4286 KOps/s 2.4431 KOps/s $\color{#d91a1a}-0.59\%$
test_items_stack_nested_leaf 0.1358ms 80.6572μs 12.3982 KOps/s 12.7020 KOps/s $\color{#d91a1a}-2.39\%$
test_items_stack_nested_locked 0.8507ms 0.4113ms 2.4311 KOps/s 2.4437 KOps/s $\color{#d91a1a}-0.51\%$
test_keys 31.1280μs 3.5769μs 279.5703 KOps/s 278.5926 KOps/s $\color{#35bf28}+0.35\%$
test_keys_nested 0.3112ms 0.1683ms 5.9419 KOps/s 5.9631 KOps/s $\color{#d91a1a}-0.36\%$
test_keys_nested_locked 0.8168ms 0.1740ms 5.7463 KOps/s 5.7824 KOps/s $\color{#d91a1a}-0.62\%$
test_keys_nested_leaf 1.8908ms 0.1465ms 6.8239 KOps/s 6.8921 KOps/s $\color{#d91a1a}-0.99\%$
test_keys_stack_nested 0.2594ms 0.1654ms 6.0472 KOps/s 6.0089 KOps/s $\color{#35bf28}+0.64\%$
test_keys_stack_nested_leaf 0.2337ms 0.1437ms 6.9607 KOps/s 6.9153 KOps/s $\color{#35bf28}+0.66\%$
test_keys_stack_nested_locked 0.2863ms 0.1719ms 5.8182 KOps/s 5.8255 KOps/s $\color{#d91a1a}-0.13\%$
test_values 10.2336μs 1.0755μs 929.7863 KOps/s 965.1051 KOps/s $\color{#d91a1a}-3.66\%$
test_values_nested 99.6960μs 63.3521μs 15.7848 KOps/s 16.0818 KOps/s $\color{#d91a1a}-1.85\%$
test_values_nested_locked 0.1190ms 63.1521μs 15.8348 KOps/s 16.0108 KOps/s $\color{#d91a1a}-1.10\%$
test_values_nested_leaf 0.1429ms 72.0608μs 13.8772 KOps/s 13.8986 KOps/s $\color{#d91a1a}-0.15\%$
test_values_stack_nested 0.1576ms 63.1629μs 15.8321 KOps/s 14.8917 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_values_stack_nested_leaf 0.1381ms 72.0790μs 13.8737 KOps/s 13.6705 KOps/s $\color{#35bf28}+1.49\%$
test_values_stack_nested_locked 0.1099ms 63.5926μs 15.7251 KOps/s 15.6537 KOps/s $\color{#35bf28}+0.46\%$
test_membership 13.9060μs 0.8950μs 1.1173 MOps/s 1.1129 MOps/s $\color{#35bf28}+0.40\%$
test_membership_nested 55.7250μs 2.9220μs 342.2286 KOps/s 344.5993 KOps/s $\color{#d91a1a}-0.69\%$
test_membership_nested_leaf 44.5530μs 2.9653μs 337.2365 KOps/s 347.9678 KOps/s $\color{#d91a1a}-3.08\%$
test_membership_stacked_nested 45.7550μs 2.9083μs 343.8493 KOps/s 345.5924 KOps/s $\color{#d91a1a}-0.50\%$
test_membership_stacked_nested_leaf 23.7140μs 2.9143μs 343.1352 KOps/s 345.7949 KOps/s $\color{#d91a1a}-0.77\%$
test_membership_nested_last 28.7130μs 4.3573μs 229.5011 KOps/s 232.4218 KOps/s $\color{#d91a1a}-1.26\%$
test_membership_nested_leaf_last 35.3560μs 4.3806μs 228.2789 KOps/s 227.4231 KOps/s $\color{#35bf28}+0.38\%$
test_membership_stacked_nested_last 33.9930μs 5.1236μs 195.1742 KOps/s 231.0654 KOps/s $\textbf{\color{#d91a1a}-15.53\%}$
test_membership_stacked_nested_leaf_last 26.6300μs 5.1558μs 193.9568 KOps/s 229.8296 KOps/s $\textbf{\color{#d91a1a}-15.61\%}$
test_nested_getleaf 39.9040μs 10.6623μs 93.7883 KOps/s 92.9834 KOps/s $\color{#35bf28}+0.87\%$
test_nested_get 31.3180μs 10.1690μs 98.3380 KOps/s 97.1560 KOps/s $\color{#35bf28}+1.22\%$
test_stacked_getleaf 45.8150μs 10.5876μs 94.4501 KOps/s 93.2720 KOps/s $\color{#35bf28}+1.26\%$
test_stacked_get 40.9260μs 10.2551μs 97.5128 KOps/s 99.2313 KOps/s $\color{#d91a1a}-1.73\%$
test_nested_getitemleaf 35.9670μs 11.5475μs 86.5990 KOps/s 89.4442 KOps/s $\color{#d91a1a}-3.18\%$
test_nested_getitem 66.6640μs 10.0908μs 99.0997 KOps/s 95.4990 KOps/s $\color{#35bf28}+3.77\%$
test_stacked_getitemleaf 35.8670μs 11.0900μs 90.1716 KOps/s 90.4617 KOps/s $\color{#d91a1a}-0.32\%$
test_stacked_getitem 0.1015ms 10.3202μs 96.8973 KOps/s 96.2920 KOps/s $\color{#35bf28}+0.63\%$
test_lock_nested 1.7318ms 0.4606ms 2.1709 KOps/s 2.2143 KOps/s $\color{#d91a1a}-1.96\%$
test_lock_stack_nested 0.6989ms 0.4320ms 2.3146 KOps/s 2.3316 KOps/s $\color{#d91a1a}-0.73\%$
test_unlock_nested 0.7824ms 0.3763ms 2.6577 KOps/s 2.6725 KOps/s $\color{#d91a1a}-0.55\%$
test_unlock_stack_nested 0.5296ms 0.3447ms 2.9013 KOps/s 2.9212 KOps/s $\color{#d91a1a}-0.68\%$
test_flatten_speed 0.2016ms 0.1025ms 9.7582 KOps/s 10.2545 KOps/s $\color{#d91a1a}-4.84\%$
test_unflatten_speed 0.9683ms 0.5414ms 1.8472 KOps/s 1.8784 KOps/s $\color{#d91a1a}-1.66\%$
test_common_ops 4.3316ms 0.8173ms 1.2235 KOps/s 1.3503 KOps/s $\textbf{\color{#d91a1a}-9.39\%}$
test_creation 59.3110μs 2.5145μs 397.6939 KOps/s 398.8308 KOps/s $\color{#d91a1a}-0.29\%$
test_creation_empty 34.6850μs 12.2506μs 81.6286 KOps/s 100.8409 KOps/s $\textbf{\color{#d91a1a}-19.05\%}$
test_creation_nested_1 60.8940μs 15.0769μs 66.3268 KOps/s 78.6192 KOps/s $\textbf{\color{#d91a1a}-15.64\%}$
test_creation_nested_2 50.2240μs 19.7710μs 50.5790 KOps/s 57.7699 KOps/s $\textbf{\color{#d91a1a}-12.45\%}$
test_clone 0.2061ms 13.7638μs 72.6543 KOps/s 74.6216 KOps/s $\color{#d91a1a}-2.64\%$
test_getitem[int] 0.7716ms 13.1608μs 75.9832 KOps/s 80.0537 KOps/s $\textbf{\color{#d91a1a}-5.08\%}$
test_getitem[slice_int] 0.1398ms 25.2190μs 39.6526 KOps/s 42.0850 KOps/s $\textbf{\color{#d91a1a}-5.78\%}$
test_getitem[range] 0.1743ms 50.9847μs 19.6137 KOps/s 21.0974 KOps/s $\textbf{\color{#d91a1a}-7.03\%}$
test_getitem[tuple] 0.1398ms 20.8360μs 47.9938 KOps/s 50.6781 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_getitem[list] 0.1757ms 46.4644μs 21.5219 KOps/s 23.3545 KOps/s $\textbf{\color{#d91a1a}-7.85\%}$
test_setitem_dim[int] 56.0840μs 27.2023μs 36.7616 KOps/s 41.3567 KOps/s $\textbf{\color{#d91a1a}-11.11\%}$
test_setitem_dim[slice_int] 96.2100μs 55.4690μs 18.0281 KOps/s 19.3574 KOps/s $\textbf{\color{#d91a1a}-6.87\%}$
test_setitem_dim[range] 0.1144ms 75.5158μs 13.2423 KOps/s 13.7425 KOps/s $\color{#d91a1a}-3.64\%$
test_setitem_dim[tuple] 76.5330μs 43.6422μs 22.9136 KOps/s 24.5134 KOps/s $\textbf{\color{#d91a1a}-6.53\%}$
test_setitem 82.5630μs 21.5303μs 46.4462 KOps/s 52.0931 KOps/s $\textbf{\color{#d91a1a}-10.84\%}$
test_set 0.1518ms 21.2768μs 46.9995 KOps/s 53.5514 KOps/s $\textbf{\color{#d91a1a}-12.23\%}$
test_set_shared 1.1785ms 0.1707ms 5.8587 KOps/s 6.0046 KOps/s $\color{#d91a1a}-2.43\%$
test_update 0.3041ms 24.1677μs 41.3776 KOps/s 48.1336 KOps/s $\textbf{\color{#d91a1a}-14.04\%}$
test_update_nested 0.3735ms 34.5055μs 28.9809 KOps/s 32.2272 KOps/s $\textbf{\color{#d91a1a}-10.07\%}$
test_update__nested 0.5907ms 34.6472μs 28.8623 KOps/s 30.0819 KOps/s $\color{#d91a1a}-4.05\%$
test_set_nested 0.2893ms 23.4473μs 42.6488 KOps/s 48.7508 KOps/s $\textbf{\color{#d91a1a}-12.52\%}$
test_set_nested_new 97.2110μs 28.5926μs 34.9741 KOps/s 39.2617 KOps/s $\textbf{\color{#d91a1a}-10.92\%}$
test_select 0.1053ms 45.1337μs 22.1564 KOps/s 24.3946 KOps/s $\textbf{\color{#d91a1a}-9.17\%}$
test_select_nested 0.1229ms 62.9693μs 15.8808 KOps/s 16.1964 KOps/s $\color{#d91a1a}-1.95\%$
test_exclude_nested 0.1558ms 83.3851μs 11.9925 KOps/s 12.4878 KOps/s $\color{#d91a1a}-3.97\%$
test_empty[True] 0.7173ms 0.4244ms 2.3563 KOps/s 2.4443 KOps/s $\color{#d91a1a}-3.60\%$
test_empty[False] 6.7400μs 1.3977μs 715.4814 KOps/s 724.0916 KOps/s $\color{#d91a1a}-1.19\%$
test_unbind_speed 0.5209ms 0.2712ms 3.6875 KOps/s 3.7251 KOps/s $\color{#d91a1a}-1.01\%$
test_unbind_speed_stack0 0.4450ms 0.2643ms 3.7834 KOps/s 3.7704 KOps/s $\color{#35bf28}+0.35\%$
test_unbind_speed_stack1 99.9301ms 0.7836ms 1.2762 KOps/s 1.3639 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_split 1.7887ms 1.6159ms 618.8355 Ops/s 572.1426 Ops/s $\textbf{\color{#35bf28}+8.16\%}$
test_chunk 0.1065s 1.9526ms 512.1475 Ops/s 572.1762 Ops/s $\textbf{\color{#d91a1a}-10.49\%}$
test_consolidate_njt[False-None] 8.6133ms 8.2197ms 121.6591 Ops/s 124.1283 Ops/s $\color{#d91a1a}-1.99\%$
test_creation[device0] 4.5817ms 94.0892μs 10.6282 KOps/s 10.8214 KOps/s $\color{#d91a1a}-1.79\%$
test_creation_from_tensor 0.2392ms 94.6658μs 10.5635 KOps/s 10.4551 KOps/s $\color{#35bf28}+1.04\%$
test_add_one[memmap_tensor0] 0.2157ms 4.9457μs 202.1948 KOps/s 215.6424 KOps/s $\textbf{\color{#d91a1a}-6.24\%}$
test_contiguous[memmap_tensor0] 16.2710μs 0.5116μs 1.9548 MOps/s 1.9567 MOps/s $\color{#d91a1a}-0.10\%$
test_stack[memmap_tensor0] 51.4860μs 3.4080μs 293.4293 KOps/s 303.3288 KOps/s $\color{#d91a1a}-3.26\%$
test_memmaptd_index 0.9423ms 0.2414ms 4.1428 KOps/s 4.1801 KOps/s $\color{#d91a1a}-0.89\%$
test_memmaptd_index_astensor 0.7457ms 0.3327ms 3.0058 KOps/s 3.0785 KOps/s $\color{#d91a1a}-2.36\%$
test_memmaptd_index_op 0.9747ms 0.6105ms 1.6379 KOps/s 1.8106 KOps/s $\textbf{\color{#d91a1a}-9.54\%}$
test_serialize_model 0.1396s 0.1167s 8.5699 Ops/s 8.6516 Ops/s $\color{#d91a1a}-0.95\%$
test_serialize_model_pickle 0.4443s 0.3881s 2.5767 Ops/s 2.5225 Ops/s $\color{#35bf28}+2.15\%$
test_serialize_weights 0.1191s 0.1118s 8.9415 Ops/s 8.9459 Ops/s $\color{#d91a1a}-0.05\%$
test_serialize_weights_returnearly 0.2569s 0.1713s 5.8385 Ops/s 6.3687 Ops/s $\textbf{\color{#d91a1a}-8.33\%}$
test_serialize_weights_pickle 0.5637s 0.4202s 2.3798 Ops/s 2.5910 Ops/s $\textbf{\color{#d91a1a}-8.15\%}$
test_serialize_weights_filesystem 0.1497s 0.1459s 6.8559 Ops/s 7.0825 Ops/s $\color{#d91a1a}-3.20\%$
test_serialize_model_filesystem 0.1579s 0.1470s 6.8014 Ops/s 6.0522 Ops/s $\textbf{\color{#35bf28}+12.38\%}$
test_reshape_pytree 61.2740μs 26.1822μs 38.1939 KOps/s 36.1869 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_reshape_td 71.1330μs 32.4472μs 30.8193 KOps/s 29.8905 KOps/s $\color{#35bf28}+3.11\%$
test_view_pytree 80.2300μs 26.3375μs 37.9686 KOps/s 36.8598 KOps/s $\color{#35bf28}+3.01\%$
test_view_td 89.6470μs 39.3617μs 25.4054 KOps/s 26.3126 KOps/s $\color{#d91a1a}-3.45\%$
test_unbind_pytree 90.5980μs 29.9187μs 33.4239 KOps/s 33.7910 KOps/s $\color{#d91a1a}-1.09\%$
test_unbind_td 0.3213ms 40.2202μs 24.8631 KOps/s 25.4624 KOps/s $\color{#d91a1a}-2.35\%$
test_split_pytree 72.7350μs 29.8642μs 33.4849 KOps/s 33.5438 KOps/s $\color{#d91a1a}-0.18\%$
test_split_td 0.5292ms 46.0176μs 21.7308 KOps/s 22.5075 KOps/s $\color{#d91a1a}-3.45\%$
test_add_pytree 98.0220μs 36.5232μs 27.3799 KOps/s 28.2580 KOps/s $\color{#d91a1a}-3.11\%$
test_add_td 0.1535ms 60.6723μs 16.4820 KOps/s 19.4438 KOps/s $\textbf{\color{#d91a1a}-15.23\%}$
test_compile_add_one_nested[tensordict-compile] 0.1326ms 63.2784μs 15.8032 KOps/s 15.7512 KOps/s $\color{#35bf28}+0.33\%$
test_compile_add_one_nested[tensordict-eager] 0.4057ms 0.1713ms 5.8363 KOps/s 5.8779 KOps/s $\color{#d91a1a}-0.71\%$
test_compile_add_one_nested[pytree-compile] 0.1220ms 46.8053μs 21.3651 KOps/s 21.9480 KOps/s $\color{#d91a1a}-2.66\%$
test_compile_add_one_nested[pytree-eager] 0.2588ms 0.1234ms 8.1044 KOps/s 8.3408 KOps/s $\color{#d91a1a}-2.83\%$
test_compile_copy_nested[tensordict-compile] 60.5520μs 26.6251μs 37.5585 KOps/s 39.9828 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_compile_copy_nested[tensordict-eager] 0.1404ms 59.6573μs 16.7624 KOps/s 16.8436 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_copy_nested[pytree-compile] 0.1841ms 79.2504μs 12.6182 KOps/s 12.4856 KOps/s $\color{#35bf28}+1.06\%$
test_compile_copy_nested[pytree-eager] 0.1217ms 66.8844μs 14.9512 KOps/s 14.4961 KOps/s $\color{#35bf28}+3.14\%$
test_compile_add_one_flat[tensordict-compile] 0.2004ms 0.1057ms 9.4627 KOps/s 9.3111 KOps/s $\color{#35bf28}+1.63\%$
test_compile_add_one_flat[tensordict-eager] 0.3856ms 0.2165ms 4.6196 KOps/s 4.6415 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_add_one_flat[tensorclass-compile] 0.1191ms 46.3217μs 21.5882 KOps/s 21.7262 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_add_one_flat[tensorclass-eager] 0.4782ms 66.2118μs 15.1031 KOps/s 15.6771 KOps/s $\color{#d91a1a}-3.66\%$
test_compile_add_one_flat[pytree-compile] 0.1883ms 0.1049ms 9.5303 KOps/s 9.8052 KOps/s $\color{#d91a1a}-2.80\%$
test_compile_add_one_flat[pytree-eager] 0.3137ms 0.2028ms 4.9315 KOps/s 4.9823 KOps/s $\color{#d91a1a}-1.02\%$
test_compile_add_self_flat[tensordict-eager] 0.4047ms 0.2381ms 4.1997 KOps/s 4.2734 KOps/s $\color{#d91a1a}-1.72\%$
test_compile_add_self_flat[tensordict-compile] 0.2192ms 0.1102ms 9.0738 KOps/s 9.3835 KOps/s $\color{#d91a1a}-3.30\%$
test_compile_add_self_flat[tensorclass-eager] 0.2081ms 61.6018μs 16.2333 KOps/s 16.9484 KOps/s $\color{#d91a1a}-4.22\%$
test_compile_add_self_flat[tensorclass-compile] 0.6302ms 46.3413μs 21.5790 KOps/s 21.9848 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_add_self_flat[pytree-eager] 0.5670ms 0.1582ms 6.3192 KOps/s 6.3745 KOps/s $\color{#d91a1a}-0.87\%$
test_compile_add_self_flat[pytree-compile] 0.1819ms 0.1039ms 9.6263 KOps/s 9.7487 KOps/s $\color{#d91a1a}-1.26\%$
test_compile_copy_flat[tensordict-compile] 66.2230μs 20.9059μs 47.8333 KOps/s 46.8017 KOps/s $\color{#35bf28}+2.20\%$
test_compile_copy_flat[tensordict-eager] 0.1869ms 69.0696μs 14.4782 KOps/s 15.0720 KOps/s $\color{#d91a1a}-3.94\%$
test_compile_copy_flat[pytree-compile] 0.1526ms 81.4036μs 12.2845 KOps/s 12.1129 KOps/s $\color{#35bf28}+1.42\%$
test_compile_copy_flat[pytree-eager] 0.1275ms 69.4040μs 14.4084 KOps/s 14.2296 KOps/s $\color{#35bf28}+1.26\%$
test_compile_assign_and_add[tensordict-compile] 0.2973ms 0.2077ms 4.8144 KOps/s 4.9115 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_assign_and_add[tensordict-eager] 1.7441ms 1.3551ms 737.9644 Ops/s 767.7678 Ops/s $\color{#d91a1a}-3.88\%$
test_compile_assign_and_add[pytree-compile] 0.3445ms 0.2058ms 4.8582 KOps/s 5.0184 KOps/s $\color{#d91a1a}-3.19\%$
test_compile_assign_and_add[pytree-eager] 1.0298ms 0.7781ms 1.2851 KOps/s 1.3159 KOps/s $\color{#d91a1a}-2.34\%$
test_compile_assign_and_add_stack[compile] 0.5505ms 0.4539ms 2.2032 KOps/s 2.2073 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_assign_and_add_stack[eager] 3.4719ms 2.8530ms 350.5109 Ops/s 397.3091 Ops/s $\textbf{\color{#d91a1a}-11.78\%}$
test_compile_indexing[tensor-tensordict-compile] 89.0550μs 35.7888μs 27.9417 KOps/s 27.6866 KOps/s $\color{#35bf28}+0.92\%$
test_compile_indexing[tensor-tensordict-eager] 0.5046ms 34.8341μs 28.7075 KOps/s 30.7524 KOps/s $\textbf{\color{#d91a1a}-6.65\%}$
test_compile_indexing[tensor-tensorclass-compile] 85.3390μs 28.8157μs 34.7034 KOps/s 33.6261 KOps/s $\color{#35bf28}+3.20\%$
test_compile_indexing[tensor-tensorclass-eager] 63.1980μs 23.7396μs 42.1238 KOps/s 43.1039 KOps/s $\color{#d91a1a}-2.27\%$
test_compile_indexing[tensor-pytree-compile] 86.5020μs 29.3176μs 34.1092 KOps/s 32.6862 KOps/s $\color{#35bf28}+4.35\%$
test_compile_indexing[tensor-pytree-eager] 62.9070μs 23.6513μs 42.2810 KOps/s 43.0006 KOps/s $\color{#d91a1a}-1.67\%$
test_compile_indexing[slice-tensordict-compile] 0.1144ms 51.6897μs 19.3462 KOps/s 19.2085 KOps/s $\color{#35bf28}+0.72\%$
test_compile_indexing[slice-tensordict-eager] 0.6323ms 21.0795μs 47.4395 KOps/s 49.7339 KOps/s $\color{#d91a1a}-4.61\%$
test_compile_indexing[slice-tensorclass-compile] 90.3680μs 44.7894μs 22.3267 KOps/s 22.5137 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_indexing[slice-tensorclass-eager] 55.8440μs 19.1388μs 52.2500 KOps/s 52.3170 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_indexing[slice-pytree-compile] 0.1075ms 45.5828μs 21.9381 KOps/s 22.3765 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_indexing[slice-pytree-eager] 53.6400μs 19.0920μs 52.3780 KOps/s 53.3801 KOps/s $\color{#d91a1a}-1.88\%$
test_compile_indexing[int-tensordict-compile] 0.1181ms 53.2007μs 18.7967 KOps/s 19.1123 KOps/s $\color{#d91a1a}-1.65\%$
test_compile_indexing[int-tensordict-eager] 1.0372ms 20.7369μs 48.2232 KOps/s 49.8400 KOps/s $\color{#d91a1a}-3.24\%$
test_compile_indexing[int-tensorclass-compile] 97.2210μs 45.3591μs 22.0463 KOps/s 22.1314 KOps/s $\color{#d91a1a}-0.38\%$
test_compile_indexing[int-tensorclass-eager] 50.2140μs 19.0929μs 52.3754 KOps/s 52.7099 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_indexing[int-pytree-compile] 0.1135ms 45.2173μs 22.1154 KOps/s 22.4261 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_indexing[int-pytree-eager] 70.3510μs 18.8099μs 53.1634 KOps/s 52.4611 KOps/s $\color{#35bf28}+1.34\%$
test_mod_add[eager] 89.8070μs 35.6456μs 28.0540 KOps/s 31.1388 KOps/s $\textbf{\color{#d91a1a}-9.91\%}$
test_mod_add[compile] 0.1077ms 47.4500μs 21.0748 KOps/s 20.6978 KOps/s $\color{#35bf28}+1.82\%$
test_mod_add[compile-overhead] 0.1341ms 47.6796μs 20.9733 KOps/s 20.7871 KOps/s $\color{#35bf28}+0.90\%$
test_mod_wrap[eager] 0.4558ms 0.2261ms 4.4222 KOps/s 4.5681 KOps/s $\color{#d91a1a}-3.19\%$
test_mod_wrap[compile] 0.4436ms 0.2189ms 4.5680 KOps/s 4.8991 KOps/s $\textbf{\color{#d91a1a}-6.76\%}$
test_mod_wrap[compile-overhead] 0.4186ms 0.2074ms 4.8207 KOps/s 4.8474 KOps/s $\color{#d91a1a}-0.55\%$
test_mod_wrap_and_backward[eager] 18.5038ms 11.8347ms 84.4971 Ops/s 88.1358 Ops/s $\color{#d91a1a}-4.13\%$
test_mod_wrap_and_backward[compile] 17.7854ms 12.6738ms 78.9031 Ops/s 79.9234 Ops/s $\color{#d91a1a}-1.28\%$
test_mod_wrap_and_backward[compile-overhead] 18.1711ms 12.5378ms 79.7590 Ops/s 79.5108 Ops/s $\color{#35bf28}+0.31\%$
test_seq_add[eager] 0.2514ms 0.1187ms 8.4215 KOps/s 8.9715 KOps/s $\textbf{\color{#d91a1a}-6.13\%}$
test_seq_add[compile] 0.1289ms 63.0348μs 15.8642 KOps/s 15.6342 KOps/s $\color{#35bf28}+1.47\%$
test_seq_add[compile-overhead] 0.1305ms 60.3032μs 16.5829 KOps/s 16.2821 KOps/s $\color{#35bf28}+1.85\%$
test_seq_wrap[eager] 0.6984ms 0.4460ms 2.2422 KOps/s 2.3753 KOps/s $\textbf{\color{#d91a1a}-5.61\%}$
test_seq_wrap[compile] 0.3455ms 0.2292ms 4.3632 KOps/s 4.3847 KOps/s $\color{#d91a1a}-0.49\%$
test_seq_wrap[compile-overhead] 0.3589ms 0.2279ms 4.3872 KOps/s 4.4201 KOps/s $\color{#d91a1a}-0.74\%$
test_func_call_runtime[False-eager] 0.9787ms 0.5603ms 1.7849 KOps/s 1.8429 KOps/s $\color{#d91a1a}-3.15\%$
test_func_call_runtime[False-compile] 0.8218ms 0.4336ms 2.3062 KOps/s 2.4051 KOps/s $\color{#d91a1a}-4.11\%$
test_func_call_runtime[False-compile-overhead] 0.8054ms 0.4323ms 2.3132 KOps/s 2.3777 KOps/s $\color{#d91a1a}-2.71\%$
test_func_call_runtime[True-eager] 1.2615ms 0.7692ms 1.3000 KOps/s 1.3289 KOps/s $\color{#d91a1a}-2.17\%$
test_func_call_runtime[True-compile] 0.8367ms 0.4702ms 2.1266 KOps/s 2.1606 KOps/s $\color{#d91a1a}-1.57\%$
test_func_call_runtime[True-compile-overhead] 0.5929ms 0.4700ms 2.1276 KOps/s 2.1689 KOps/s $\color{#d91a1a}-1.90\%$
test_func_call_cm_runtime[False-eager] 0.9298ms 0.5527ms 1.8092 KOps/s 1.8556 KOps/s $\color{#d91a1a}-2.50\%$
test_func_call_cm_runtime[False-compile] 0.5425ms 0.4285ms 2.3336 KOps/s 2.3830 KOps/s $\color{#d91a1a}-2.07\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5835ms 0.4284ms 2.3342 KOps/s 2.3900 KOps/s $\color{#d91a1a}-2.34\%$
test_func_call_cm_runtime[True-eager] 1.4780ms 0.9118ms 1.0967 KOps/s 1.1167 KOps/s $\color{#d91a1a}-1.79\%$
test_func_call_cm_runtime[True-compile] 0.8727ms 0.4945ms 2.0222 KOps/s 2.0512 KOps/s $\color{#d91a1a}-1.42\%$
test_func_call_cm_runtime[True-compile-overhead] 0.8554ms 0.4972ms 2.0111 KOps/s 2.0674 KOps/s $\color{#d91a1a}-2.72\%$
test_vmap_func_call_cm_runtime[eager] 2.8388ms 1.9272ms 518.8976 Ops/s 525.5861 Ops/s $\color{#d91a1a}-1.27\%$
test_vmap_func_call_cm_runtime[compile] 0.9042ms 0.5249ms 1.9050 KOps/s 1.9304 KOps/s $\color{#d91a1a}-1.32\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.6972ms 0.5236ms 1.9099 KOps/s 1.9402 KOps/s $\color{#d91a1a}-1.56\%$
test_distributed 0.2557ms 0.1264ms 7.9085 KOps/s 7.8330 KOps/s $\color{#35bf28}+0.96\%$
test_tdmodule 0.1221ms 26.7583μs 37.3715 KOps/s 39.4510 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_tdmodule_dispatch 82.7050μs 48.7516μs 20.5121 KOps/s 21.7825 KOps/s $\textbf{\color{#d91a1a}-5.83\%}$
test_tdseq 60.4130μs 29.2122μs 34.2322 KOps/s 35.4830 KOps/s $\color{#d91a1a}-3.52\%$
test_tdseq_dispatch 88.5650μs 54.9177μs 18.2091 KOps/s 19.1863 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_instantiation_functorch 2.0519ms 1.5848ms 630.9972 Ops/s 650.7353 Ops/s $\color{#d91a1a}-3.03\%$
test_exec_functorch 0.2797ms 0.1822ms 5.4872 KOps/s 5.7314 KOps/s $\color{#d91a1a}-4.26\%$
test_exec_functional_call 0.3120ms 0.1765ms 5.6661 KOps/s 5.9956 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_exec_td_decorator 0.4668ms 0.2370ms 4.2200 KOps/s 4.3871 KOps/s $\color{#d91a1a}-3.81\%$
test_vmap_mlp_speed_decorator[True-True] 1.2409ms 0.6694ms 1.4938 KOps/s 1.5313 KOps/s $\color{#d91a1a}-2.45\%$
test_vmap_mlp_speed_decorator[True-False] 0.8693ms 0.6631ms 1.5080 KOps/s 1.5541 KOps/s $\color{#d91a1a}-2.97\%$
test_vmap_mlp_speed_decorator[False-True] 0.7308ms 0.5354ms 1.8678 KOps/s 1.9094 KOps/s $\color{#d91a1a}-2.18\%$
test_vmap_mlp_speed_decorator[False-False] 0.8585ms 0.5379ms 1.8592 KOps/s 1.9004 KOps/s $\color{#d91a1a}-2.17\%$
test_to_module_speed[True] 1.5394ms 1.3433ms 744.4354 Ops/s 742.7790 Ops/s $\color{#35bf28}+0.22\%$
test_to_module_speed[False] 1.5678ms 1.3196ms 757.8049 Ops/s 761.5808 Ops/s $\color{#d91a1a}-0.50\%$
test_tc_init 0.1266ms 49.9162μs 20.0336 KOps/s 22.4835 KOps/s $\textbf{\color{#d91a1a}-10.90\%}$
test_tc_init_nested 0.1535ms 95.7153μs 10.4477 KOps/s 11.6603 KOps/s $\textbf{\color{#d91a1a}-10.40\%}$
test_tc_first_layer_tensor 16.1000μs 1.5962μs 626.4991 KOps/s 651.0546 KOps/s $\color{#d91a1a}-3.77\%$
test_tc_first_layer_nontensor 36.9280μs 4.7918μs 208.6885 KOps/s 214.1258 KOps/s $\color{#d91a1a}-2.54\%$
test_tc_second_layer_tensor 43.7710μs 2.9261μs 341.7542 KOps/s 349.9554 KOps/s $\color{#d91a1a}-2.34\%$
test_tc_second_layer_nontensor 27.2510μs 6.1724μs 162.0106 KOps/s 167.0392 KOps/s $\color{#d91a1a}-3.01\%$
test_unbind 0.2205s 15.0916ms 66.2622 Ops/s 78.2721 Ops/s $\textbf{\color{#d91a1a}-15.34\%}$
test_full_like 9.9184ms 7.5308ms 132.7888 Ops/s 71.3174 Ops/s $\textbf{\color{#35bf28}+86.19\%}$
test_zeros_like 3.8349ms 2.9953ms 333.8515 Ops/s 134.6580 Ops/s $\textbf{\color{#35bf28}+147.93\%}$
test_ones_like 5.0114ms 3.5879ms 278.7138 Ops/s 122.6627 Ops/s $\textbf{\color{#35bf28}+127.22\%}$
test_clone 7.5466ms 5.5375ms 180.5854 Ops/s 104.5356 Ops/s $\textbf{\color{#35bf28}+72.75\%}$
test_squeeze 65.6720μs 12.2545μs 81.6029 KOps/s 84.2360 KOps/s $\color{#d91a1a}-3.13\%$
test_unsqueeze 0.1679ms 94.1699μs 10.6191 KOps/s 10.8648 KOps/s $\color{#d91a1a}-2.26\%$
test_split 0.5857ms 0.2013ms 4.9666 KOps/s 5.1904 KOps/s $\color{#d91a1a}-4.31\%$
test_permute 0.4045ms 0.2141ms 4.6711 KOps/s 4.8359 KOps/s $\color{#d91a1a}-3.41\%$
test_stack 31.5510ms 25.2450ms 39.6118 Ops/s 38.3302 Ops/s $\color{#35bf28}+3.34\%$
test_cat 31.5497ms 24.9342ms 40.1055 Ops/s 39.7962 Ops/s $\color{#35bf28}+0.78\%$
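
For readers checking the Change column, the following is a minimal sketch of how the reported percentages can be recomputed from the Ops and Ops on Repo HEAD columns. The function names `percent_change` and `classify` are hypothetical and are not part of the benchmark tooling. The 5% threshold is an assumption inferred from which rows appear in bold in the table above; the bot does not state it.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of this PR's throughput versus repo HEAD, in percent.

    Both values must be in the same unit (e.g. both KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change. The 5% default is an assumed threshold, not a documented one."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "within threshold"


# (Ops, Ops on Repo HEAD) pairs copied from three rows of the CPU table.
rows = {
    "test_plain_set_nested": (48.3500, 50.2283),  # KOps/s
    "test_creation_empty": (81.6286, 100.8409),   # KOps/s
    "test_full_like": (132.7888, 71.3174),        # Ops/s
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Run on these three rows, the sketch reproduces the table's reported changes of -3.74%, -19.05% and +86.19%.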

github-actions bot commented Jan 7, 2025

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: 62. Worsened: 7.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 31.9200μs 11.5810μs 86.3487 KOps/s 78.0322 KOps/s $\textbf{\color{#35bf28}+10.66\%}$
test_plain_set_stack_nested 32.1800μs 11.8552μs 84.3509 KOps/s 77.1647 KOps/s $\textbf{\color{#35bf28}+9.31\%}$
test_plain_set_nested_inplace 34.3010μs 12.7964μs 78.1470 KOps/s 71.9067 KOps/s $\textbf{\color{#35bf28}+8.68\%}$
test_plain_set_stack_nested_inplace 33.8910μs 12.6555μs 79.0172 KOps/s 71.6669 KOps/s $\textbf{\color{#35bf28}+10.26\%}$
test_items 20.9600μs 2.8997μs 344.8632 KOps/s 344.2328 KOps/s $\color{#35bf28}+0.18\%$
test_items_nested 0.4840ms 0.3628ms 2.7565 KOps/s 2.8037 KOps/s $\color{#d91a1a}-1.68\%$
test_items_nested_locked 0.4016ms 0.3632ms 2.7532 KOps/s 2.7807 KOps/s $\color{#d91a1a}-0.99\%$
test_items_nested_leaf 83.6310μs 59.1970μs 16.8927 KOps/s 16.9269 KOps/s $\color{#d91a1a}-0.20\%$
test_items_stack_nested 0.3954ms 0.3674ms 2.7215 KOps/s 2.7477 KOps/s $\color{#d91a1a}-0.95\%$
test_items_stack_nested_leaf 88.9210μs 61.5065μs 16.2584 KOps/s 16.6128 KOps/s $\color{#d91a1a}-2.13\%$
test_items_stack_nested_locked 0.3970ms 0.3668ms 2.7265 KOps/s 2.7536 KOps/s $\color{#d91a1a}-0.98\%$
test_keys 22.7700μs 3.4995μs 285.7533 KOps/s 289.4395 KOps/s $\color{#d91a1a}-1.27\%$
test_keys_nested 0.1109ms 81.5377μs 12.2643 KOps/s 12.3048 KOps/s $\color{#d91a1a}-0.33\%$
test_keys_nested_locked 0.8283ms 85.7745μs 11.6585 KOps/s 11.4837 KOps/s $\color{#35bf28}+1.52\%$
test_keys_nested_leaf 2.4591ms 72.3896μs 13.8141 KOps/s 13.9386 KOps/s $\color{#d91a1a}-0.89\%$
test_keys_stack_nested 0.1070ms 81.9407μs 12.2039 KOps/s 11.8830 KOps/s $\color{#35bf28}+2.70\%$
test_keys_stack_nested_leaf 97.0620μs 73.3438μs 13.6344 KOps/s 13.3885 KOps/s $\color{#35bf28}+1.84\%$
test_keys_stack_nested_locked 0.1337ms 87.8783μs 11.3794 KOps/s 11.1518 KOps/s $\color{#35bf28}+2.04\%$
test_values 5.3985μs 0.8549μs 1.1697 MOps/s 1.1654 MOps/s $\color{#35bf28}+0.37\%$
test_values_nested 54.2300μs 34.8351μs 28.7067 KOps/s 28.9358 KOps/s $\color{#d91a1a}-0.79\%$
test_values_nested_locked 61.8410μs 36.0988μs 27.7017 KOps/s 27.5624 KOps/s $\color{#35bf28}+0.51\%$
test_values_nested_leaf 67.0910μs 39.2313μs 25.4898 KOps/s 25.4928 KOps/s $\color{#d91a1a}-0.01\%$
test_values_stack_nested 75.6910μs 34.9375μs 28.6225 KOps/s 28.3641 KOps/s $\color{#35bf28}+0.91\%$
test_values_stack_nested_leaf 76.9410μs 39.5764μs 25.2676 KOps/s 25.1888 KOps/s $\color{#35bf28}+0.31\%$
test_values_stack_nested_locked 67.8510μs 36.6601μs 27.2776 KOps/s 27.0618 KOps/s $\color{#35bf28}+0.80\%$
test_membership 1.8215μs 0.5202μs 1.9223 MOps/s 1.9486 MOps/s $\color{#d91a1a}-1.35\%$
test_membership_nested 29.8910μs 2.0965μs 476.9938 KOps/s 471.7162 KOps/s $\color{#35bf28}+1.12\%$
test_membership_nested_leaf 15.9350μs 2.0096μs 497.6038 KOps/s 491.1910 KOps/s $\color{#35bf28}+1.31\%$
test_membership_stacked_nested 47.4900μs 2.1027μs 475.5694 KOps/s 463.6338 KOps/s $\color{#35bf28}+2.57\%$
test_membership_stacked_nested_leaf 31.3410μs 2.0822μs 480.2601 KOps/s 476.6002 KOps/s $\color{#35bf28}+0.77\%$
test_membership_nested_last 32.4210μs 3.1054μs 322.0221 KOps/s 324.3384 KOps/s $\color{#d91a1a}-0.71\%$
test_membership_nested_leaf_last 31.1900μs 3.0975μs 322.8447 KOps/s 315.9784 KOps/s $\color{#35bf28}+2.17\%$
test_membership_stacked_nested_last 46.7910μs 3.6182μs 276.3821 KOps/s 320.4616 KOps/s $\textbf{\color{#d91a1a}-13.76\%}$
test_membership_stacked_nested_leaf_last 26.8410μs 3.5629μs 280.6670 KOps/s 321.6560 KOps/s $\textbf{\color{#d91a1a}-12.74\%}$
test_nested_getleaf 39.3500μs 6.0683μs 164.7898 KOps/s 163.3576 KOps/s $\color{#35bf28}+0.88\%$
test_nested_get 39.1810μs 5.8815μs 170.0254 KOps/s 172.1438 KOps/s $\color{#d91a1a}-1.23\%$
test_stacked_getleaf 33.3500μs 6.1348μs 163.0058 KOps/s 161.7016 KOps/s $\color{#35bf28}+0.81\%$
test_stacked_get 35.4200μs 5.8548μs 170.7995 KOps/s 171.9199 KOps/s $\color{#d91a1a}-0.65\%$
test_nested_getitemleaf 86.1210μs 6.3183μs 158.2700 KOps/s 159.9705 KOps/s $\color{#d91a1a}-1.06\%$
test_nested_getitem 38.3800μs 5.8832μs 169.9748 KOps/s 168.5819 KOps/s $\color{#35bf28}+0.83\%$
test_stacked_getitemleaf 27.5500μs 6.2313μs 160.4807 KOps/s 160.3193 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_getitem 33.0310μs 5.8979μs 169.5518 KOps/s 168.8157 KOps/s $\color{#35bf28}+0.44\%$
test_lock_nested 2.3644ms 0.3788ms 2.6396 KOps/s 2.6742 KOps/s $\color{#d91a1a}-1.29\%$
test_lock_stack_nested 0.3903ms 0.3476ms 2.8771 KOps/s 2.8604 KOps/s $\color{#35bf28}+0.58\%$
test_unlock_nested 0.6712ms 0.3161ms 3.1638 KOps/s 3.1403 KOps/s $\color{#35bf28}+0.75\%$
test_unlock_stack_nested 0.3427ms 0.2865ms 3.4902 KOps/s 3.4771 KOps/s $\color{#35bf28}+0.38\%$
test_flatten_speed 0.1248ms 75.5681μs 13.2331 KOps/s 13.3965 KOps/s $\color{#d91a1a}-1.22\%$
test_unflatten_speed 0.3631ms 0.3191ms 3.1339 KOps/s 3.0652 KOps/s $\color{#35bf28}+2.24\%$
test_common_ops 1.5321ms 0.5902ms 1.6943 KOps/s 1.5735 KOps/s $\textbf{\color{#35bf28}+7.68\%}$
test_creation 0.1272ms 1.7430μs 573.7153 KOps/s 572.0595 KOps/s $\color{#35bf28}+0.29\%$
test_creation_empty 39.8800μs 7.0503μs 141.8386 KOps/s 106.8111 KOps/s $\textbf{\color{#35bf28}+32.79\%}$
test_creation_nested_1 33.1300μs 8.7508μs 114.2751 KOps/s 90.1545 KOps/s $\textbf{\color{#35bf28}+26.75\%}$
test_creation_nested_2 48.9010μs 11.4340μs 87.4588 KOps/s 71.6002 KOps/s $\textbf{\color{#35bf28}+22.15\%}$
test_clone 37.7500μs 10.9849μs 91.0337 KOps/s 93.7749 KOps/s $\color{#d91a1a}-2.92\%$
test_getitem[int] 1.2967ms 10.9822μs 91.0567 KOps/s 92.8675 KOps/s $\color{#d91a1a}-1.95\%$
test_getitem[slice_int] 0.1043ms 21.5910μs 46.3155 KOps/s 47.5745 KOps/s $\color{#d91a1a}-2.65\%$
test_getitem[range] 0.1258ms 37.3665μs 26.7619 KOps/s 26.9623 KOps/s $\color{#d91a1a}-0.74\%$
test_getitem[tuple] 0.1162ms 18.3482μs 54.5014 KOps/s 55.1638 KOps/s $\color{#d91a1a}-1.20\%$
test_getitem[list] 0.1310ms 33.8179μs 29.5702 KOps/s 31.0787 KOps/s $\color{#d91a1a}-4.85\%$
test_setitem_dim[int] 52.2810μs 19.0133μs 52.5948 KOps/s 53.2200 KOps/s $\color{#d91a1a}-1.17\%$
test_setitem_dim[slice_int] 65.2610μs 39.0678μs 25.5965 KOps/s 25.7278 KOps/s $\color{#d91a1a}-0.51\%$
test_setitem_dim[range] 78.0710μs 53.1712μs 18.8072 KOps/s 18.8772 KOps/s $\color{#d91a1a}-0.37\%$
test_setitem_dim[tuple] 52.6110μs 32.7586μs 30.5264 KOps/s 30.5054 KOps/s $\color{#35bf28}+0.07\%$
test_setitem 39.8100μs 14.9014μs 67.1076 KOps/s 62.4946 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_set 0.1102ms 14.2981μs 69.9396 KOps/s 64.8961 KOps/s $\textbf{\color{#35bf28}+7.77\%}$
test_set_shared 1.4656ms 0.1544ms 6.4768 KOps/s 6.5125 KOps/s $\color{#d91a1a}-0.55\%$
test_update 0.4809ms 16.4460μs 60.8052 KOps/s 52.5554 KOps/s $\textbf{\color{#35bf28}+15.70\%}$
test_update_nested 0.1116ms 21.7484μs 45.9805 KOps/s 39.6300 KOps/s $\textbf{\color{#35bf28}+16.02\%}$
test_update__nested 0.5117ms 26.3716μs 37.9195 KOps/s 38.5446 KOps/s $\color{#d91a1a}-1.62\%$
test_set_nested 0.1053ms 15.6984μs 63.7008 KOps/s 59.6106 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_set_nested_new 0.1134ms 19.4429μs 51.4327 KOps/s 52.4635 KOps/s $\color{#d91a1a}-1.96\%$
test_select 0.2117ms 31.6616μs 31.5840 KOps/s 31.1518 KOps/s $\color{#35bf28}+1.39\%$
test_select_nested 76.9510μs 43.8946μs 22.7819 KOps/s 22.1929 KOps/s $\color{#35bf28}+2.65\%$
test_exclude_nested 95.7210μs 64.4857μs 15.5073 KOps/s 15.2444 KOps/s $\color{#35bf28}+1.72\%$
test_empty[True] 0.3634ms 0.2886ms 3.4656 KOps/s 3.4423 KOps/s $\color{#35bf28}+0.68\%$
test_empty[False] 3.6050μs 0.8794μs 1.1371 MOps/s 1.1268 MOps/s $\color{#35bf28}+0.91\%$
test_to 85.9710μs 55.5945μs 17.9874 KOps/s 17.8097 KOps/s $\color{#35bf28}+1.00\%$
test_to_nonblocking 0.1001ms 47.8488μs 20.8992 KOps/s 20.5748 KOps/s $\color{#35bf28}+1.58\%$
test_unbind_speed 1.3167ms 0.2413ms 4.1447 KOps/s 4.0966 KOps/s $\color{#35bf28}+1.17\%$
test_unbind_speed_stack0 0.2985ms 0.2388ms 4.1873 KOps/s 4.1201 KOps/s $\color{#35bf28}+1.63\%$
test_unbind_speed_stack1 92.7464ms 0.6705ms 1.4915 KOps/s 1.4725 KOps/s $\color{#35bf28}+1.29\%$
test_split 0.1023s 1.6410ms 609.3768 Ops/s 580.4213 Ops/s $\color{#35bf28}+4.99\%$
test_chunk 94.5760ms 1.6440ms 608.2829 Ops/s 689.6312 Ops/s $\textbf{\color{#d91a1a}-11.80\%}$
test_consolidate[False-None] 97.0657ms 2.9420ms 339.9036 Ops/s 338.7293 Ops/s $\color{#35bf28}+0.35\%$
test_consolidate[default-None] 1.8557ms 1.6606ms 602.1899 Ops/s 587.1979 Ops/s $\color{#35bf28}+2.55\%$
test_consolidate[reduce-overhead-None] 1.8157ms 1.6654ms 600.4476 Ops/s 582.3687 Ops/s $\color{#35bf28}+3.10\%$
test_consolidate_njt[False-None] 7.0018ms 6.4374ms 155.3426 Ops/s 152.2004 Ops/s $\color{#35bf28}+2.06\%$
test_to[False-False-None] 1.7558ms 1.6903ms 591.6038 Ops/s 575.6510 Ops/s $\color{#35bf28}+2.77\%$
test_to[True-False-None] 1.5757ms 1.2922ms 773.8556 Ops/s 769.0263 Ops/s $\color{#35bf28}+0.63\%$
test_to[within-False-None] 4.1775ms 4.0587ms 246.3849 Ops/s 241.0604 Ops/s $\color{#35bf28}+2.21\%$
test_to[True-default-None] 5.4672ms 5.2765ms 189.5183 Ops/s 187.6885 Ops/s $\color{#35bf28}+0.97\%$
test_to_njt[False-False-None] 7.3878ms 7.1939ms 139.0063 Ops/s 142.5187 Ops/s $\color{#d91a1a}-2.46\%$
test_to_njt[True-False-None] 5.7487ms 5.4789ms 182.5194 Ops/s 178.3925 Ops/s $\color{#35bf28}+2.31\%$
test_to_njt[within-False-None] 12.2645ms 12.1485ms 82.3147 Ops/s 72.2224 Ops/s $\textbf{\color{#35bf28}+13.97\%}$
test_creation[device0] 0.5495ms 81.1264μs 12.3265 KOps/s 11.2823 KOps/s $\textbf{\color{#35bf28}+9.25\%}$
test_creation_from_tensor 0.5017ms 86.4161μs 11.5719 KOps/s 11.0408 KOps/s $\color{#35bf28}+4.81\%$
test_add_one[memmap_tensor0] 0.3691ms 6.8015μs 147.0270 KOps/s 142.9026 KOps/s $\color{#35bf28}+2.89\%$
test_contiguous[memmap_tensor0] 2.1380μs 0.4167μs 2.4000 MOps/s 2.4434 MOps/s $\color{#d91a1a}-1.78\%$
test_stack[memmap_tensor0] 38.5510μs 4.3727μs 228.6923 KOps/s 231.5721 KOps/s $\color{#d91a1a}-1.24\%$
test_memmaptd_index 1.7255ms 0.2484ms 4.0251 KOps/s 3.9610 KOps/s $\color{#35bf28}+1.62\%$
test_memmaptd_index_astensor 0.5828ms 0.3094ms 3.2325 KOps/s 3.1156 KOps/s $\color{#35bf28}+3.75\%$
test_memmaptd_index_op 1.0012ms 0.5616ms 1.7805 KOps/s 1.6743 KOps/s $\textbf{\color{#35bf28}+6.34\%}$
test_serialize_model 0.1321s 0.1314s 7.6118 Ops/s 7.5890 Ops/s $\color{#35bf28}+0.30\%$
test_serialize_model_pickle 1.3460s 1.1876s 0.8420 Ops/s 0.8209 Ops/s $\color{#35bf28}+2.58\%$
test_serialize_weights 0.1325s 0.1310s 7.6312 Ops/s 7.6661 Ops/s $\color{#d91a1a}-0.46\%$
test_serialize_weights_returnearly 0.3336s 63.4441ms 15.7619 Ops/s 14.4756 Ops/s $\textbf{\color{#35bf28}+8.89\%}$
test_serialize_weights_pickle 1.3739s 1.2163s 0.8221 Ops/s 0.8224 Ops/s $\color{#d91a1a}-0.03\%$
test_reshape_pytree 51.9500μs 22.3578μs 44.7271 KOps/s 44.5452 KOps/s $\color{#35bf28}+0.41\%$
test_reshape_td 59.6710μs 26.9967μs 37.0415 KOps/s 35.7033 KOps/s $\color{#35bf28}+3.75\%$
test_view_pytree 50.1110μs 21.6009μs 46.2944 KOps/s 44.9342 KOps/s $\color{#35bf28}+3.03\%$
test_view_td 59.8610μs 30.3092μs 32.9933 KOps/s 29.2340 KOps/s $\textbf{\color{#35bf28}+12.86\%}$
test_unbind_pytree 50.2710μs 27.8523μs 35.9036 KOps/s 35.2426 KOps/s $\color{#35bf28}+1.88\%$
test_unbind_td 0.6800ms 37.0917μs 26.9602 KOps/s 26.6820 KOps/s $\color{#35bf28}+1.04\%$
test_split_pytree 84.6310μs 29.8697μs 33.4788 KOps/s 32.6548 KOps/s $\color{#35bf28}+2.52\%$
test_split_td 0.8649ms 37.5477μs 26.6328 KOps/s 25.4148 KOps/s $\color{#35bf28}+4.79\%$
test_add_pytree 67.4510μs 35.3247μs 28.3088 KOps/s 29.1597 KOps/s $\color{#d91a1a}-2.92\%$
test_add_td 80.0600μs 49.3906μs 20.2468 KOps/s 18.9396 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_compile_add_one_nested[tensordict-compile] 0.1742ms 0.1199ms 8.3376 KOps/s 8.1252 KOps/s $\color{#35bf28}+2.61\%$
test_compile_add_one_nested[tensordict-eager] 0.2254ms 0.1309ms 7.6415 KOps/s 7.5766 KOps/s $\color{#35bf28}+0.86\%$
test_compile_add_one_nested[pytree-compile] 0.1315ms 96.1276μs 10.4028 KOps/s 10.1436 KOps/s $\color{#35bf28}+2.56\%$
test_compile_add_one_nested[pytree-eager] 0.3379ms 0.1480ms 6.7550 KOps/s 6.4166 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_compile_copy_nested[tensordict-compile] 52.4910μs 22.5080μs 44.4286 KOps/s 43.7646 KOps/s $\color{#35bf28}+1.52\%$
test_compile_copy_nested[tensordict-eager] 69.2300μs 29.6788μs 33.6940 KOps/s 33.2408 KOps/s $\color{#35bf28}+1.36\%$
test_compile_copy_nested[pytree-compile] 0.4040ms 65.1314μs 15.3536 KOps/s 15.0758 KOps/s $\color{#35bf28}+1.84\%$
test_compile_copy_nested[pytree-eager] 81.4810μs 49.9144μs 20.0343 KOps/s 19.9567 KOps/s $\color{#35bf28}+0.39\%$
test_compile_add_one_flat[tensordict-compile] 0.1813ms 0.1393ms 7.1808 KOps/s 6.9355 KOps/s $\color{#35bf28}+3.54\%$
test_compile_add_one_flat[tensordict-eager] 0.3023ms 0.2128ms 4.6990 KOps/s 4.6472 KOps/s $\color{#35bf28}+1.12\%$
test_compile_add_one_flat[tensorclass-compile] 0.1363ms 97.3162μs 10.2758 KOps/s 9.6295 KOps/s $\textbf{\color{#35bf28}+6.71\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1087ms 53.4577μs 18.7064 KOps/s 18.2958 KOps/s $\color{#35bf28}+2.24\%$
test_compile_add_one_flat[pytree-compile] 0.1706ms 0.1338ms 7.4749 KOps/s 7.3360 KOps/s $\color{#35bf28}+1.89\%$
test_compile_add_one_flat[pytree-eager] 0.5219ms 0.4769ms 2.0967 KOps/s 2.0564 KOps/s $\color{#35bf28}+1.96\%$
test_compile_add_self_flat[tensordict-eager] 0.3650ms 0.2544ms 3.9309 KOps/s 3.8534 KOps/s $\color{#35bf28}+2.01\%$
test_compile_add_self_flat[tensordict-compile] 0.1849ms 0.1407ms 7.1077 KOps/s 7.0782 KOps/s $\color{#35bf28}+0.42\%$
test_compile_add_self_flat[tensorclass-eager] 0.1432ms 64.3637μs 15.5367 KOps/s 15.4052 KOps/s $\color{#35bf28}+0.85\%$
test_compile_add_self_flat[tensorclass-compile] 0.1378ms 97.3720μs 10.2699 KOps/s 10.0357 KOps/s $\color{#35bf28}+2.33\%$
test_compile_add_self_flat[pytree-eager] 0.4483ms 0.3965ms 2.5222 KOps/s 2.2499 KOps/s $\textbf{\color{#35bf28}+12.10\%}$
test_compile_add_self_flat[pytree-compile] 0.1897ms 0.1332ms 7.5076 KOps/s 7.3430 KOps/s $\color{#35bf28}+2.24\%$
test_compile_copy_flat[tensordict-compile] 65.2400μs 18.6865μs 53.5147 KOps/s 53.8026 KOps/s $\color{#d91a1a}-0.54\%$
test_compile_copy_flat[tensordict-eager] 62.4600μs 31.2137μs 32.0372 KOps/s 31.3977 KOps/s $\color{#35bf28}+2.04\%$
test_compile_copy_flat[pytree-compile] 98.6310μs 71.5256μs 13.9810 KOps/s 14.0690 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_copy_flat[pytree-eager] 83.8610μs 52.1710μs 19.1677 KOps/s 19.0547 KOps/s $\color{#35bf28}+0.59\%$
test_compile_assign_and_add[tensordict-compile] 1.6039ms 0.3871ms 2.5834 KOps/s 2.2439 KOps/s $\textbf{\color{#35bf28}+15.13\%}$
test_compile_assign_and_add[tensordict-eager] 2.7966ms 2.6388ms 378.9656 Ops/s 375.7283 Ops/s $\color{#35bf28}+0.86\%$
test_compile_assign_and_add[pytree-compile] 1.5617ms 0.3783ms 2.6435 KOps/s 2.2959 KOps/s $\textbf{\color{#35bf28}+15.14\%}$
test_compile_assign_and_add[pytree-eager] 3.0672ms 2.6315ms 380.0088 Ops/s 380.8688 Ops/s $\color{#d91a1a}-0.23\%$
test_compile_indexing[tensor-tensordict-compile] 0.5408ms 0.1135ms 8.8119 KOps/s 8.4167 KOps/s $\color{#35bf28}+4.70\%$
test_compile_indexing[tensor-tensordict-eager] 0.5571ms 79.3188μs 12.6074 KOps/s 12.1770 KOps/s $\color{#35bf28}+3.53\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1570ms 0.1091ms 9.1626 KOps/s 9.1922 KOps/s $\color{#d91a1a}-0.32\%$
test_compile_indexing[tensor-tensorclass-eager] 0.4904ms 70.9064μs 14.1031 KOps/s 13.5226 KOps/s $\color{#35bf28}+4.29\%$
test_compile_indexing[tensor-pytree-compile] 0.5252ms 0.1121ms 8.9203 KOps/s 8.5657 KOps/s $\color{#35bf28}+4.14\%$
test_compile_indexing[tensor-pytree-eager] 0.4921ms 71.1533μs 14.0542 KOps/s 13.2496 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_compile_indexing[slice-tensordict-compile] 0.1419ms 0.1023ms 9.7749 KOps/s 9.0016 KOps/s $\textbf{\color{#35bf28}+8.59\%}$
test_compile_indexing[slice-tensordict-eager] 0.4433ms 17.4918μs 57.1695 KOps/s 42.6618 KOps/s $\textbf{\color{#35bf28}+34.01\%}$
test_compile_indexing[slice-tensorclass-compile] 0.5276ms 97.2176μs 10.2862 KOps/s 9.2987 KOps/s $\textbf{\color{#35bf28}+10.62\%}$
test_compile_indexing[slice-tensorclass-eager] 0.4219ms 16.2723μs 61.4540 KOps/s 56.3722 KOps/s $\textbf{\color{#35bf28}+9.01\%}$
test_compile_indexing[slice-pytree-compile] 0.5147ms 98.0239μs 10.2016 KOps/s 9.2245 KOps/s $\textbf{\color{#35bf28}+10.59\%}$
test_compile_indexing[slice-pytree-eager] 50.8900μs 16.0014μs 62.4943 KOps/s 56.3723 KOps/s $\textbf{\color{#35bf28}+10.86\%}$
test_compile_indexing[int-tensordict-compile] 0.5183ms 0.1028ms 9.7264 KOps/s 8.8503 KOps/s $\textbf{\color{#35bf28}+9.90\%}$
test_compile_indexing[int-tensordict-eager] 0.5705ms 16.9816μs 58.8872 KOps/s 50.9253 KOps/s $\textbf{\color{#35bf28}+15.63\%}$
test_compile_indexing[int-tensorclass-compile] 0.5053ms 97.9179μs 10.2126 KOps/s 9.2642 KOps/s $\textbf{\color{#35bf28}+10.24\%}$
test_compile_indexing[int-tensorclass-eager] 50.8610μs 15.9733μs 62.6044 KOps/s 55.4241 KOps/s $\textbf{\color{#35bf28}+12.96\%}$
test_compile_indexing[int-pytree-compile] 0.5065ms 97.7447μs 10.2307 KOps/s 9.2579 KOps/s $\textbf{\color{#35bf28}+10.51\%}$
test_compile_indexing[int-pytree-eager] 49.3310μs 15.8674μs 63.0221 KOps/s 55.8574 KOps/s $\textbf{\color{#35bf28}+12.83\%}$
test_mod_add[eager] 83.9610μs 37.1592μs 26.9112 KOps/s 22.6949 KOps/s $\textbf{\color{#35bf28}+18.58\%}$
test_mod_add[compile] 0.1314ms 79.1119μs 12.6403 KOps/s 11.1728 KOps/s $\textbf{\color{#35bf28}+13.13\%}$
test_mod_add[compile-overhead] 0.3185ms 0.1680ms 5.9537 KOps/s 5.7415 KOps/s $\color{#35bf28}+3.70\%$
test_mod_wrap[eager] 0.3485ms 0.2541ms 3.9362 KOps/s 3.5982 KOps/s $\textbf{\color{#35bf28}+9.39\%}$
test_mod_wrap[compile] 0.3583ms 0.2825ms 3.5403 KOps/s 3.3914 KOps/s $\color{#35bf28}+4.39\%$
test_mod_wrap[compile-overhead] 7.1035ms 3.7376ms 267.5484 Ops/s 284.0270 Ops/s $\textbf{\color{#d91a1a}-5.80\%}$
test_mod_wrap_and_backward[eager] 1.4628ms 1.3637ms 733.3218 Ops/s 685.6287 Ops/s $\textbf{\color{#35bf28}+6.96\%}$
test_mod_wrap_and_backward[compile] 1.4035ms 1.2697ms 787.5975 Ops/s 731.0145 Ops/s $\textbf{\color{#35bf28}+7.74\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3929ms 0.9451ms 1.0580 KOps/s 917.4492 Ops/s $\textbf{\color{#35bf28}+15.32\%}$
test_seq_add[eager] 0.1737ms 0.1144ms 8.7427 KOps/s 7.9733 KOps/s $\textbf{\color{#35bf28}+9.65\%}$
test_seq_add[compile] 0.2239ms 88.2074μs 11.3369 KOps/s 10.6016 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_seq_add[compile-overhead] 0.2220ms 0.1308ms 7.6478 KOps/s 7.4455 KOps/s $\color{#35bf28}+2.72\%$
test_seq_wrap[eager] 0.4757ms 0.4141ms 2.4150 KOps/s 2.2568 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_seq_wrap[compile] 0.3512ms 0.2982ms 3.3532 KOps/s 3.1383 KOps/s $\textbf{\color{#35bf28}+6.85\%}$
test_seq_wrap[compile-overhead] 0.3031ms 0.2252ms 4.4396 KOps/s 4.4258 KOps/s $\color{#35bf28}+0.31\%$
test_func_call_runtime[False-eager] 0.8095ms 0.7347ms 1.3612 KOps/s 1.2808 KOps/s $\textbf{\color{#35bf28}+6.27\%}$
test_func_call_runtime[False-compile] 0.9993ms 0.7399ms 1.3515 KOps/s 1.2427 KOps/s $\textbf{\color{#35bf28}+8.75\%}$
test_func_call_runtime[False-compile-overhead] 0.4137ms 0.3639ms 2.7481 KOps/s 2.6134 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_func_call_runtime[True-eager] 0.9577ms 0.8982ms 1.1134 KOps/s 1.0099 KOps/s $\textbf{\color{#35bf28}+10.25\%}$
test_func_call_runtime[True-compile] 0.8223ms 0.7602ms 1.3154 KOps/s 1.3178 KOps/s $\color{#d91a1a}-0.18\%$
test_func_call_runtime[True-compile-overhead] 0.4333ms 0.3861ms 2.5903 KOps/s 2.5933 KOps/s $\color{#d91a1a}-0.12\%$
test_func_call_cm_runtime[False-eager] 0.7937ms 0.7291ms 1.3715 KOps/s 1.3189 KOps/s $\color{#35bf28}+3.99\%$
test_func_call_cm_runtime[False-compile] 0.7917ms 0.7450ms 1.3422 KOps/s 1.3419 KOps/s $\color{#35bf28}+0.03\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4233ms 0.3664ms 2.7293 KOps/s 2.7303 KOps/s $\color{#d91a1a}-0.04\%$
test_func_call_cm_runtime[True-eager] 1.1019ms 1.0022ms 997.8537 Ops/s 981.4942 Ops/s $\color{#35bf28}+1.67\%$
test_func_call_cm_runtime[True-compile] 0.8486ms 0.7868ms 1.2710 KOps/s 1.2595 KOps/s $\color{#35bf28}+0.91\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4643ms 0.4096ms 2.4417 KOps/s 2.4079 KOps/s $\color{#35bf28}+1.40\%$
test_vmap_func_call_cm_runtime[eager] 2.5653ms 2.0881ms 478.9064 Ops/s 471.5418 Ops/s $\color{#35bf28}+1.56\%$
test_vmap_func_call_cm_runtime[compile] 0.9146ms 0.8147ms 1.2274 KOps/s 1.2337 KOps/s $\color{#d91a1a}-0.51\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4895ms 0.4137ms 2.4171 KOps/s 2.4154 KOps/s $\color{#35bf28}+0.07\%$
test_distributed 3.2015ms 0.1859ms 5.3778 KOps/s 8.3218 KOps/s $\textbf{\color{#d91a1a}-35.38\%}$
test_tdmodule 43.9300μs 19.2564μs 51.9307 KOps/s 47.7854 KOps/s $\textbf{\color{#35bf28}+8.67\%}$
test_tdmodule_dispatch 58.5310μs 34.3470μs 29.1147 KOps/s 25.6213 KOps/s $\textbf{\color{#35bf28}+13.63\%}$
test_tdseq 38.9710μs 20.2790μs 49.3121 KOps/s 45.5595 KOps/s $\textbf{\color{#35bf28}+8.24\%}$
test_tdseq_dispatch 62.0210μs 37.4576μs 26.6969 KOps/s 24.6543 KOps/s $\textbf{\color{#35bf28}+8.28\%}$
test_instantiation_functorch 1.6488ms 1.5751ms 634.8923 Ops/s 636.5040 Ops/s $\color{#d91a1a}-0.25\%$
test_exec_functorch 0.2058ms 0.1481ms 6.7522 KOps/s 6.8860 KOps/s $\color{#d91a1a}-1.94\%$
test_exec_functional_call 0.1855ms 0.1394ms 7.1745 KOps/s 7.1819 KOps/s $\color{#d91a1a}-0.10\%$
test_exec_td_decorator 0.3850ms 0.1862ms 5.3712 KOps/s 5.3137 KOps/s $\color{#35bf28}+1.08\%$
test_vmap_mlp_speed_decorator[True-True] 0.7547ms 0.6865ms 1.4567 KOps/s 1.3735 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8248ms 0.6827ms 1.4647 KOps/s 1.3662 KOps/s $\textbf{\color{#35bf28}+7.21\%}$
test_vmap_mlp_speed_decorator[False-True] 0.7093ms 0.5978ms 1.6729 KOps/s 1.5735 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7109ms 0.5958ms 1.6785 KOps/s 1.5676 KOps/s $\textbf{\color{#35bf28}+7.07\%}$
test_vmap_transformer_speed_decorator[True-True] 19.3633ms 19.1812ms 52.1343 Ops/s 51.4574 Ops/s $\color{#35bf28}+1.32\%$
test_vmap_transformer_speed_decorator[True-False] 19.9269ms 19.2612ms 51.9180 Ops/s 51.2986 Ops/s $\color{#35bf28}+1.21\%$
test_vmap_transformer_speed_decorator[False-True] 19.2654ms 19.1301ms 52.2736 Ops/s 51.4826 Ops/s $\color{#35bf28}+1.54\%$
test_vmap_transformer_speed_decorator[False-False] 19.3817ms 19.0823ms 52.4045 Ops/s 51.7839 Ops/s $\color{#35bf28}+1.20\%$
test_to_module_speed[True] 1.0657ms 0.9748ms 1.0258 KOps/s 1.0112 KOps/s $\color{#35bf28}+1.45\%$
test_to_module_speed[False] 1.3397ms 0.9604ms 1.0412 KOps/s 1.0360 KOps/s $\color{#35bf28}+0.50\%$
test_tc_init 61.0910μs 35.2047μs 28.4053 KOps/s 26.6634 KOps/s $\textbf{\color{#35bf28}+6.53\%}$
test_tc_init_nested 0.1128ms 69.5846μs 14.3710 KOps/s 13.4726 KOps/s $\textbf{\color{#35bf28}+6.67\%}$
test_tc_first_layer_tensor 20.7500μs 0.8281μs 1.2076 MOps/s 1.2239 MOps/s $\color{#d91a1a}-1.33\%$
test_tc_first_layer_nontensor 27.8310μs 2.3162μs 431.7339 KOps/s 432.8706 KOps/s $\color{#d91a1a}-0.26\%$
test_tc_second_layer_tensor 9.1653μs 1.4424μs 693.2900 KOps/s 682.9727 KOps/s $\color{#35bf28}+1.51\%$
test_tc_second_layer_nontensor 25.9700μs 3.0829μs 324.3676 KOps/s 323.9778 KOps/s $\color{#35bf28}+0.12\%$
test_unbind 0.2430s 10.3176ms 96.9216 Ops/s 141.7124 Ops/s $\textbf{\color{#d91a1a}-31.61\%}$
test_full_like 12.1733ms 9.2502ms 108.1055 Ops/s 106.6248 Ops/s $\color{#35bf28}+1.39\%$
test_zeros_like 6.0543ms 4.3367ms 230.5900 Ops/s 230.3083 Ops/s $\color{#35bf28}+0.12\%$
test_ones_like 5.0743ms 4.4411ms 225.1701 Ops/s 225.5189 Ops/s $\color{#d91a1a}-0.15\%$
test_clone 11.4594ms 9.2436ms 108.1829 Ops/s 153.6019 Ops/s $\textbf{\color{#d91a1a}-29.57\%}$
test_squeeze 62.8310μs 9.8033μs 102.0069 KOps/s 103.6975 KOps/s $\color{#d91a1a}-1.63\%$
test_unsqueeze 0.1233ms 74.1847μs 13.4799 KOps/s 13.4128 KOps/s $\color{#35bf28}+0.50\%$
test_split 0.3816ms 0.1629ms 6.1382 KOps/s 5.9653 KOps/s $\color{#35bf28}+2.90\%$
test_permute 0.2197ms 0.1759ms 5.6855 KOps/s 5.3312 KOps/s $\textbf{\color{#35bf28}+6.65\%}$
test_stack 51.3354ms 51.0684ms 19.5816 Ops/s 19.5028 Ops/s $\color{#35bf28}+0.40\%$
test_cat 52.8748ms 51.7185ms 19.3354 Ops/s 19.2463 Ops/s $\color{#35bf28}+0.46\%$
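
For readers checking the table: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below is not the benchmark bot's code. The function name `relative_change` is hypothetical, and the 5% threshold for bold entries is an assumption inferred from the rows (for example, +4.99% is plain while +5.16% is bold).

```python
def relative_change(ops: float, head_ops: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - head_ops) / head_ops * 100.0


def is_highlighted(change_pct: float, threshold: float = 5.0) -> bool:
    """Assumed rule: entries whose magnitude exceeds the threshold are bold."""
    return abs(change_pct) > threshold


# (name, Ops, Ops on Repo HEAD) copied from three rows of the table,
# all within a single unit (KOps/s) so the ratio is meaningful.
rows = [
    ("test_getitem[list]", 29.5702, 31.0787),
    ("test_update", 60.8052, 52.5554),
    ("test_distributed", 5.3778, 8.3218),
]

for name, ops, head_ops in rows:
    change = relative_change(ops, head_ops)
    flag = " (highlighted)" if is_highlighted(change) else ""
    print(f"{name}: {change:+.2f}%{flag}")
```

Both columns must be in the same unit before dividing; a few rows mix KOps/s and Ops/s (for example `test_mod_wrap_and_backward[compile-overhead]`), so those need converting first.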

@vmoens vmoens merged commit 4f26dc7 into gh/vmoens/43/base Jan 7, 2025
51 of 55 checks passed
@vmoens vmoens deleted the gh/vmoens/43/head branch January 7, 2025 17:54