[CI] Fix versioning of h2d tests #1053

Merged 1 commit on Oct 21, 2024

@vmoens (Contributor) commented Oct 21, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 21, 2024
ghstack-source-id: faa425ca71953c1627690e08cf691683f78694d3
Pull Request resolved: #1053
@facebook-github-bot added the CLA Signed label Oct 21, 2024
@vmoens merged commit a6b14d2 into gh/vmoens/31/base Oct 21, 2024
11 of 24 checks passed
@vmoens deleted the gh/vmoens/31/head branch October 21, 2024 14:48

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}28$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 46.9390μs 23.6279μs 42.3228 KOps/s 41.2818 KOps/s $\color{#35bf28}+2.52\%$
test_plain_set_stack_nested 56.9770μs 23.6974μs 42.1988 KOps/s 39.9348 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_plain_set_nested_inplace 73.6180μs 26.2176μs 38.1423 KOps/s 37.4260 KOps/s $\color{#35bf28}+1.91\%$
test_plain_set_stack_nested_inplace 62.9790μs 25.7617μs 38.8173 KOps/s 37.3541 KOps/s $\color{#35bf28}+3.92\%$
test_items 34.6850μs 4.1686μs 239.8901 KOps/s 237.5288 KOps/s $\color{#35bf28}+0.99\%$
test_items_nested 0.5604ms 0.3861ms 2.5898 KOps/s 2.6132 KOps/s $\color{#d91a1a}-0.90\%$
test_items_nested_locked 0.5767ms 0.3851ms 2.5965 KOps/s 2.6160 KOps/s $\color{#d91a1a}-0.74\%$
test_items_nested_leaf 0.1428ms 80.3140μs 12.4511 KOps/s 12.4472 KOps/s $\color{#35bf28}+0.03\%$
test_items_stack_nested 0.5645ms 0.3862ms 2.5896 KOps/s 2.6224 KOps/s $\color{#d91a1a}-1.25\%$
test_items_stack_nested_leaf 0.1455ms 85.1533μs 11.7435 KOps/s 12.0034 KOps/s $\color{#d91a1a}-2.17\%$
test_items_stack_nested_locked 0.6271ms 0.3896ms 2.5669 KOps/s 2.6048 KOps/s $\color{#d91a1a}-1.46\%$
test_keys 23.0430μs 3.4925μs 286.3257 KOps/s 279.4766 KOps/s $\color{#35bf28}+2.45\%$
test_keys_nested 0.2091ms 0.1349ms 7.4110 KOps/s 7.5261 KOps/s $\color{#d91a1a}-1.53\%$
test_keys_nested_locked 0.7247ms 0.1400ms 7.1420 KOps/s 7.2162 KOps/s $\color{#d91a1a}-1.03\%$
test_keys_nested_leaf 0.1970ms 0.1170ms 8.5453 KOps/s 8.6627 KOps/s $\color{#d91a1a}-1.36\%$
test_keys_stack_nested 0.2190ms 0.1343ms 7.4464 KOps/s 7.4953 KOps/s $\color{#d91a1a}-0.65\%$
test_keys_stack_nested_leaf 0.1967ms 0.1170ms 8.5491 KOps/s 8.6418 KOps/s $\color{#d91a1a}-1.07\%$
test_keys_stack_nested_locked 0.7036ms 0.1446ms 6.9132 KOps/s 7.2226 KOps/s $\color{#d91a1a}-4.28\%$
test_values 5.7468μs 1.0370μs 964.3601 KOps/s 937.2274 KOps/s $\color{#35bf28}+2.89\%$
test_values_nested 0.1508ms 91.1685μs 10.9687 KOps/s 10.9409 KOps/s $\color{#35bf28}+0.25\%$
test_values_nested_locked 0.1524ms 93.0569μs 10.7461 KOps/s 10.4931 KOps/s $\color{#35bf28}+2.41\%$
test_values_nested_leaf 0.4316ms 79.5622μs 12.5688 KOps/s 12.6096 KOps/s $\color{#d91a1a}-0.32\%$
test_values_stack_nested 0.3035ms 93.5335μs 10.6914 KOps/s 10.8537 KOps/s $\color{#d91a1a}-1.50\%$
test_values_stack_nested_leaf 0.1351ms 78.9763μs 12.6620 KOps/s 12.5554 KOps/s $\color{#35bf28}+0.85\%$
test_values_stack_nested_locked 0.1652ms 94.4981μs 10.5822 KOps/s 10.8132 KOps/s $\color{#d91a1a}-2.14\%$
test_membership 5.8510μs 0.7536μs 1.3270 MOps/s 1.1027 MOps/s $\textbf{\color{#35bf28}+20.33\%}$
test_membership_nested 36.4380μs 2.6587μs 376.1219 KOps/s 360.1910 KOps/s $\color{#35bf28}+4.42\%$
test_membership_nested_leaf 34.8050μs 2.6859μs 372.3099 KOps/s 360.6299 KOps/s $\color{#35bf28}+3.24\%$
test_membership_stacked_nested 30.2470μs 2.6908μs 371.6338 KOps/s 343.6529 KOps/s $\textbf{\color{#35bf28}+8.14\%}$
test_membership_stacked_nested_leaf 24.2060μs 2.6765μs 373.6245 KOps/s 361.4210 KOps/s $\color{#35bf28}+3.38\%$
test_membership_nested_last 43.7820μs 4.0633μs 246.1072 KOps/s 241.8838 KOps/s $\color{#35bf28}+1.75\%$
test_membership_nested_leaf_last 43.6620μs 4.1139μs 243.0798 KOps/s 240.8320 KOps/s $\color{#35bf28}+0.93\%$
test_membership_stacked_nested_last 45.5860μs 4.7876μs 208.8723 KOps/s 241.3892 KOps/s $\textbf{\color{#d91a1a}-13.47\%}$
test_membership_stacked_nested_leaf_last 43.7620μs 4.7820μs 209.1175 KOps/s 241.0139 KOps/s $\textbf{\color{#d91a1a}-13.23\%}$
test_nested_getleaf 50.4040μs 10.4304μs 95.8736 KOps/s 93.5404 KOps/s $\color{#35bf28}+2.49\%$
test_nested_get 48.4110μs 10.0610μs 99.3937 KOps/s 98.8912 KOps/s $\color{#35bf28}+0.51\%$
test_stacked_getleaf 49.4430μs 10.3809μs 96.3309 KOps/s 93.5745 KOps/s $\color{#35bf28}+2.95\%$
test_stacked_get 0.1128ms 9.6346μs 103.7925 KOps/s 94.6679 KOps/s $\textbf{\color{#35bf28}+9.64\%}$
test_nested_getitemleaf 56.5360μs 10.8385μs 92.2635 KOps/s 88.4635 KOps/s $\color{#35bf28}+4.30\%$
test_nested_getitem 51.5970μs 10.1359μs 98.6590 KOps/s 97.9910 KOps/s $\color{#35bf28}+0.68\%$
test_stacked_getitemleaf 54.0420μs 10.6022μs 94.3202 KOps/s 92.8867 KOps/s $\color{#35bf28}+1.54\%$
test_stacked_getitem 36.7390μs 10.1859μs 98.1754 KOps/s 99.6604 KOps/s $\color{#d91a1a}-1.49\%$
test_lock_nested 1.0322ms 0.5110ms 1.9571 KOps/s 1.9645 KOps/s $\color{#d91a1a}-0.38\%$
test_lock_stack_nested 0.8296ms 0.4776ms 2.0940 KOps/s 2.0824 KOps/s $\color{#35bf28}+0.55\%$
test_unlock_nested 0.1106s 0.5346ms 1.8706 KOps/s 2.3340 KOps/s $\textbf{\color{#d91a1a}-19.85\%}$
test_unlock_stack_nested 0.6035ms 0.3889ms 2.5715 KOps/s 2.5276 KOps/s $\color{#35bf28}+1.74\%$
test_flatten_speed 0.2009ms 99.2321μs 10.0774 KOps/s 9.9764 KOps/s $\color{#35bf28}+1.01\%$
test_unflatten_speed 0.7462ms 0.5107ms 1.9580 KOps/s 1.9126 KOps/s $\color{#35bf28}+2.37\%$
test_common_ops 2.1454ms 1.1163ms 895.7819 Ops/s 841.7870 Ops/s $\textbf{\color{#35bf28}+6.41\%}$
test_creation 23.1240μs 2.0912μs 478.2043 KOps/s 483.6262 KOps/s $\color{#d91a1a}-1.12\%$
test_creation_empty 61.6750μs 18.0443μs 55.4192 KOps/s 50.3281 KOps/s $\textbf{\color{#35bf28}+10.12\%}$
test_creation_nested_1 0.1054ms 20.8663μs 47.9243 KOps/s 42.0775 KOps/s $\textbf{\color{#35bf28}+13.90\%}$
test_creation_nested_2 58.5400μs 24.9688μs 40.0500 KOps/s 36.5383 KOps/s $\textbf{\color{#35bf28}+9.61\%}$
test_clone 99.3060μs 17.1778μs 58.2147 KOps/s 56.5754 KOps/s $\color{#35bf28}+2.90\%$
test_getitem[int] 1.0399ms 16.7982μs 59.5301 KOps/s 59.7612 KOps/s $\color{#d91a1a}-0.39\%$
test_getitem[slice_int] 0.1573ms 30.4869μs 32.8010 KOps/s 32.6182 KOps/s $\color{#35bf28}+0.56\%$
test_getitem[range] 0.2630ms 58.5822μs 17.0700 KOps/s 17.4645 KOps/s $\color{#d91a1a}-2.26\%$
test_getitem[tuple] 0.1547ms 25.0048μs 39.9923 KOps/s 40.1407 KOps/s $\color{#d91a1a}-0.37\%$
test_getitem[list] 0.3312ms 53.0723μs 18.8422 KOps/s 18.6258 KOps/s $\color{#35bf28}+1.16\%$
test_setitem_dim[int] 76.0730μs 33.5397μs 29.8154 KOps/s 30.1999 KOps/s $\color{#d91a1a}-1.27\%$
test_setitem_dim[slice_int] 0.1176ms 62.0797μs 16.1083 KOps/s 16.0799 KOps/s $\color{#35bf28}+0.18\%$
test_setitem_dim[range] 0.1568ms 84.9165μs 11.7763 KOps/s 11.6392 KOps/s $\color{#35bf28}+1.18\%$
test_setitem_dim[tuple] 92.0830μs 50.0111μs 19.9956 KOps/s 19.7413 KOps/s $\color{#35bf28}+1.29\%$
test_setitem 0.1562ms 29.5313μs 33.8624 KOps/s 31.7701 KOps/s $\textbf{\color{#35bf28}+6.59\%}$
test_set 0.1900ms 29.5211μs 33.8741 KOps/s 32.0488 KOps/s $\textbf{\color{#35bf28}+5.70\%}$
test_set_shared 1.2453ms 0.2204ms 4.5375 KOps/s 4.4596 KOps/s $\color{#35bf28}+1.74\%$
test_update 0.2398ms 37.6829μs 26.5372 KOps/s 25.5470 KOps/s $\color{#35bf28}+3.88\%$
test_update_nested 0.2876ms 48.8342μs 20.4774 KOps/s 19.8093 KOps/s $\color{#35bf28}+3.37\%$
test_update__nested 0.4169ms 45.0172μs 22.2137 KOps/s 22.3314 KOps/s $\color{#d91a1a}-0.53\%$
test_set_nested 0.1804ms 32.6431μs 30.6343 KOps/s 29.6149 KOps/s $\color{#35bf28}+3.44\%$
test_set_nested_new 0.2128ms 36.9785μs 27.0427 KOps/s 25.6445 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_select 0.2067ms 54.9565μs 18.1962 KOps/s 17.4987 KOps/s $\color{#35bf28}+3.99\%$
test_select_nested 0.1380ms 59.0657μs 16.9303 KOps/s 16.6686 KOps/s $\color{#35bf28}+1.57\%$
test_exclude_nested 0.1374ms 74.5638μs 13.4113 KOps/s 13.4129 KOps/s $\color{#d91a1a}-0.01\%$
test_empty[True] 0.6494ms 0.3532ms 2.8313 KOps/s 2.8345 KOps/s $\color{#d91a1a}-0.11\%$
test_empty[False] 46.3998μs 1.2688μs 788.1453 KOps/s 785.2253 KOps/s $\color{#35bf28}+0.37\%$
test_unbind_speed 0.4328ms 0.3064ms 3.2635 KOps/s 3.2434 KOps/s $\color{#35bf28}+0.62\%$
test_unbind_speed_stack0 0.6053ms 0.3008ms 3.3244 KOps/s 3.3205 KOps/s $\color{#35bf28}+0.12\%$
test_unbind_speed_stack1 0.1289s 0.8374ms 1.1942 KOps/s 1.3047 KOps/s $\textbf{\color{#d91a1a}-8.48\%}$
test_split 0.1150s 2.2316ms 448.1117 Ops/s 456.2462 Ops/s $\color{#d91a1a}-1.78\%$
test_chunk 0.1102s 2.2139ms 451.6917 Ops/s 455.7799 Ops/s $\color{#d91a1a}-0.90\%$
test_creation[device0] 0.2299ms 0.1164ms 8.5938 KOps/s 8.6382 KOps/s $\color{#d91a1a}-0.51\%$
test_creation_from_tensor 4.2130ms 0.1199ms 8.3372 KOps/s 8.4770 KOps/s $\color{#d91a1a}-1.65\%$
test_add_one[memmap_tensor0] 0.2010ms 7.3096μs 136.8058 KOps/s 141.4598 KOps/s $\color{#d91a1a}-3.29\%$
test_contiguous[memmap_tensor0] 26.9100μs 1.9759μs 506.1077 KOps/s 512.6756 KOps/s $\color{#d91a1a}-1.28\%$
test_stack[memmap_tensor0] 40.2750μs 5.5712μs 179.4930 KOps/s 175.4368 KOps/s $\color{#35bf28}+2.31\%$
test_memmaptd_index 1.1136ms 0.4088ms 2.4460 KOps/s 2.4275 KOps/s $\color{#35bf28}+0.76\%$
test_memmaptd_index_astensor 1.3701ms 0.5202ms 1.9222 KOps/s 1.9398 KOps/s $\color{#d91a1a}-0.90\%$
test_memmaptd_index_op 1.5016ms 1.0444ms 957.5197 Ops/s 930.6575 Ops/s $\color{#35bf28}+2.89\%$
test_serialize_model 0.1290s 0.1196s 8.3625 Ops/s 8.3196 Ops/s $\color{#35bf28}+0.52\%$
test_serialize_model_pickle 0.4585s 0.3973s 2.5173 Ops/s 2.5382 Ops/s $\color{#d91a1a}-0.82\%$
test_serialize_weights 0.2334s 0.1320s 7.5742 Ops/s 7.2399 Ops/s $\color{#35bf28}+4.62\%$
test_serialize_weights_returnearly 0.1756s 0.1631s 6.1321 Ops/s 6.4680 Ops/s $\textbf{\color{#d91a1a}-5.19\%}$
test_serialize_weights_pickle 1.2035s 0.7554s 1.3238 Ops/s 1.0954 Ops/s $\textbf{\color{#35bf28}+20.84\%}$
test_serialize_weights_filesystem 0.1467s 0.1411s 7.0888 Ops/s 7.0785 Ops/s $\color{#35bf28}+0.15\%$
test_serialize_model_filesystem 0.1566s 0.1487s 6.7243 Ops/s 6.1856 Ops/s $\textbf{\color{#35bf28}+8.71\%}$
test_reshape_pytree 98.7050μs 39.3000μs 25.4453 KOps/s 25.4078 KOps/s $\color{#35bf28}+0.15\%$
test_reshape_td 0.1117ms 45.5138μs 21.9714 KOps/s 21.3650 KOps/s $\color{#35bf28}+2.84\%$
test_view_pytree 84.3080μs 39.1358μs 25.5521 KOps/s 25.2680 KOps/s $\color{#35bf28}+1.12\%$
test_view_td 0.1056ms 50.6164μs 19.7565 KOps/s 18.0069 KOps/s $\textbf{\color{#35bf28}+9.72\%}$
test_unbind_pytree 83.8980μs 35.9208μs 27.8390 KOps/s 27.7995 KOps/s $\color{#35bf28}+0.14\%$
test_unbind_td 0.3243ms 44.1497μs 22.6502 KOps/s 21.9851 KOps/s $\color{#35bf28}+3.03\%$
test_split_pytree 0.1085ms 38.3103μs 26.1027 KOps/s 26.2822 KOps/s $\color{#d91a1a}-0.68\%$
test_split_td 0.4703ms 57.6410μs 17.3488 KOps/s 17.7514 KOps/s $\color{#d91a1a}-2.27\%$
test_add_pytree 0.1069ms 44.5589μs 22.4422 KOps/s 22.0333 KOps/s $\color{#35bf28}+1.86\%$
test_add_td 0.8010ms 91.3611μs 10.9456 KOps/s 11.0920 KOps/s $\color{#d91a1a}-1.32\%$
test_compile_add_one_nested[tensordict-compile] 0.1626ms 75.2363μs 13.2915 KOps/s 13.7245 KOps/s $\color{#d91a1a}-3.16\%$
test_compile_add_one_nested[tensordict-eager] 1.5411ms 0.2035ms 4.9128 KOps/s 4.9244 KOps/s $\color{#d91a1a}-0.24\%$
test_compile_add_one_nested[pytree-compile] 0.2947ms 55.8295μs 17.9117 KOps/s 18.3606 KOps/s $\color{#d91a1a}-2.45\%$
test_compile_add_one_nested[pytree-eager] 0.3631ms 0.1427ms 7.0057 KOps/s 6.9159 KOps/s $\color{#35bf28}+1.30\%$
test_compile_copy_nested[tensordict-compile] 88.4260μs 27.9198μs 35.8169 KOps/s 36.4685 KOps/s $\color{#d91a1a}-1.79\%$
test_compile_copy_nested[tensordict-eager] 0.1654ms 75.9353μs 13.1691 KOps/s 13.2451 KOps/s $\color{#d91a1a}-0.57\%$
test_compile_copy_nested[pytree-compile] 0.1293ms 78.1170μs 12.8013 KOps/s 12.7313 KOps/s $\color{#35bf28}+0.55\%$
test_compile_copy_nested[pytree-eager] 0.1366ms 67.3124μs 14.8561 KOps/s 14.4875 KOps/s $\color{#35bf28}+2.54\%$
test_compile_add_one_flat[tensordict-compile] 0.1954ms 0.1220ms 8.1995 KOps/s 8.0184 KOps/s $\color{#35bf28}+2.26\%$
test_compile_add_one_flat[tensordict-eager] 0.4010ms 0.2449ms 4.0825 KOps/s 3.9850 KOps/s $\color{#35bf28}+2.45\%$
test_compile_add_one_flat[tensorclass-compile] 0.1902ms 54.8011μs 18.2478 KOps/s 18.3188 KOps/s $\color{#d91a1a}-0.39\%$
test_compile_add_one_flat[tensorclass-eager] 0.1645ms 78.4376μs 12.7490 KOps/s 12.5464 KOps/s $\color{#35bf28}+1.61\%$
test_compile_add_one_flat[pytree-compile] 0.1853ms 0.1136ms 8.8050 KOps/s 8.8779 KOps/s $\color{#d91a1a}-0.82\%$
test_compile_add_one_flat[pytree-eager] 0.5471ms 0.2986ms 3.3487 KOps/s 3.3682 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_add_self_flat[tensordict-eager] 0.7901ms 0.2782ms 3.5942 KOps/s 3.5515 KOps/s $\color{#35bf28}+1.20\%$
test_compile_add_self_flat[tensordict-compile] 0.2958ms 0.1340ms 7.4646 KOps/s 8.0961 KOps/s $\textbf{\color{#d91a1a}-7.80\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1498ms 75.5291μs 13.2399 KOps/s 13.4444 KOps/s $\color{#d91a1a}-1.52\%$
test_compile_add_self_flat[tensorclass-compile] 0.1459ms 55.8810μs 17.8952 KOps/s 18.0273 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_add_self_flat[pytree-eager] 0.4652ms 0.2420ms 4.1325 KOps/s 4.1492 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_add_self_flat[pytree-compile] 0.2148ms 0.1166ms 8.5785 KOps/s 8.8089 KOps/s $\color{#d91a1a}-2.62\%$
test_compile_copy_flat[tensordict-compile] 82.7560μs 30.3410μs 32.9587 KOps/s 31.6777 KOps/s $\color{#35bf28}+4.04\%$
test_compile_copy_flat[tensordict-eager] 0.1690ms 77.5547μs 12.8941 KOps/s 12.8540 KOps/s $\color{#35bf28}+0.31\%$
test_compile_copy_flat[pytree-compile] 0.1731ms 80.9861μs 12.3478 KOps/s 12.1779 KOps/s $\color{#35bf28}+1.40\%$
test_compile_copy_flat[pytree-eager] 0.1295ms 68.0206μs 14.7014 KOps/s 14.3839 KOps/s $\color{#35bf28}+2.21\%$
test_compile_assign_and_add[tensordict-compile] 0.3970ms 0.2173ms 4.6023 KOps/s 4.5527 KOps/s $\color{#35bf28}+1.09\%$
test_compile_assign_and_add[tensordict-eager] 3.0435ms 1.7810ms 561.4872 Ops/s 556.6438 Ops/s $\color{#35bf28}+0.87\%$
test_compile_assign_and_add[pytree-compile] 0.4704ms 0.2169ms 4.6098 KOps/s 4.7350 KOps/s $\color{#d91a1a}-2.64\%$
test_compile_assign_and_add[pytree-eager] 1.4054ms 1.1451ms 873.2995 Ops/s 868.4884 Ops/s $\color{#35bf28}+0.55\%$
test_compile_assign_and_add_stack[compile] 0.9285ms 0.4655ms 2.1481 KOps/s 2.1432 KOps/s $\color{#35bf28}+0.23\%$
test_compile_assign_and_add_stack[eager] 5.8285ms 4.0695ms 245.7317 Ops/s 238.6062 Ops/s $\color{#35bf28}+2.99\%$
test_compile_indexing[tensor-tensordict-compile] 0.1116ms 43.6714μs 22.8983 KOps/s 22.7612 KOps/s $\color{#35bf28}+0.60\%$
test_compile_indexing[tensor-tensordict-eager] 0.7479ms 50.6046μs 19.7610 KOps/s 19.6845 KOps/s $\color{#35bf28}+0.39\%$
test_compile_indexing[tensor-tensorclass-compile] 95.7290μs 37.7591μs 26.4837 KOps/s 26.1739 KOps/s $\color{#35bf28}+1.18\%$
test_compile_indexing[tensor-tensorclass-eager] 85.1100μs 29.6874μs 33.6843 KOps/s 33.8441 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_indexing[tensor-pytree-compile] 86.5220μs 38.1506μs 26.2119 KOps/s 26.0211 KOps/s $\color{#35bf28}+0.73\%$
test_compile_indexing[tensor-pytree-eager] 0.1054ms 29.5533μs 33.8371 KOps/s 33.9847 KOps/s $\color{#d91a1a}-0.43\%$
test_compile_indexing[slice-tensordict-compile] 0.1891ms 76.3071μs 13.1049 KOps/s 12.6802 KOps/s $\color{#35bf28}+3.35\%$
test_compile_indexing[slice-tensordict-eager] 0.5508ms 28.9933μs 34.4907 KOps/s 33.8550 KOps/s $\color{#35bf28}+1.88\%$
test_compile_indexing[slice-tensorclass-compile] 0.1740ms 70.9424μs 14.0960 KOps/s 14.1397 KOps/s $\color{#d91a1a}-0.31\%$
test_compile_indexing[slice-tensorclass-eager] 66.1240μs 23.7015μs 42.1915 KOps/s 42.7462 KOps/s $\color{#d91a1a}-1.30\%$
test_compile_indexing[slice-pytree-compile] 0.1420ms 70.2436μs 14.2362 KOps/s 14.0408 KOps/s $\color{#35bf28}+1.39\%$
test_compile_indexing[slice-pytree-eager] 85.1800μs 23.5054μs 42.5435 KOps/s 42.6301 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_indexing[int-tensordict-compile] 0.1985ms 77.7630μs 12.8596 KOps/s 12.7903 KOps/s $\color{#35bf28}+0.54\%$
test_compile_indexing[int-tensordict-eager] 0.9221ms 28.8483μs 34.6641 KOps/s 35.4650 KOps/s $\color{#d91a1a}-2.26\%$
test_compile_indexing[int-tensorclass-compile] 0.1606ms 70.2658μs 14.2317 KOps/s 13.9942 KOps/s $\color{#35bf28}+1.70\%$
test_compile_indexing[int-tensorclass-eager] 92.9740μs 23.5023μs 42.5490 KOps/s 42.8367 KOps/s $\color{#d91a1a}-0.67\%$
test_compile_indexing[int-pytree-compile] 0.1900ms 71.4839μs 13.9892 KOps/s 14.1065 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_indexing[int-pytree-eager] 64.4810μs 23.4984μs 42.5561 KOps/s 42.6244 KOps/s $\color{#d91a1a}-0.16\%$
test_mod_add[eager] 73.4680μs 24.7026μs 40.4816 KOps/s 38.0321 KOps/s $\textbf{\color{#35bf28}+6.44\%}$
test_mod_add[compile] 0.1647ms 47.3095μs 21.1374 KOps/s 21.8157 KOps/s $\color{#d91a1a}-3.11\%$
test_mod_add[compile-overhead] 0.1008ms 44.7921μs 22.3253 KOps/s 22.2015 KOps/s $\color{#35bf28}+0.56\%$
test_mod_wrap[eager] 0.4059ms 0.2097ms 4.7698 KOps/s 4.6368 KOps/s $\color{#35bf28}+2.87\%$
test_mod_wrap[compile] 2.0262ms 0.2034ms 4.9157 KOps/s 4.9628 KOps/s $\color{#d91a1a}-0.95\%$
test_mod_wrap[compile-overhead] 2.1934ms 0.2019ms 4.9523 KOps/s 4.6963 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_mod_wrap_and_backward[eager] 12.5021ms 11.1059ms 90.0420 Ops/s 81.0831 Ops/s $\textbf{\color{#35bf28}+11.05\%}$
test_mod_wrap_and_backward[compile] 12.0482ms 11.0200ms 90.7438 Ops/s 70.5321 Ops/s $\textbf{\color{#35bf28}+28.66\%}$
test_mod_wrap_and_backward[compile-overhead] 11.7131ms 10.9550ms 91.2825 Ops/s 75.6337 Ops/s $\textbf{\color{#35bf28}+20.69\%}$
test_seq_add[eager] 0.1764ms 88.6852μs 11.2758 KOps/s 10.2518 KOps/s $\textbf{\color{#35bf28}+9.99\%}$
test_seq_add[compile] 0.1360ms 58.3409μs 17.1406 KOps/s 16.5908 KOps/s $\color{#35bf28}+3.31\%$
test_seq_add[compile-overhead] 0.1616ms 56.7417μs 17.6237 KOps/s 16.6815 KOps/s $\textbf{\color{#35bf28}+5.65\%}$
test_seq_wrap[eager] 0.6430ms 0.3812ms 2.6235 KOps/s 2.4952 KOps/s $\textbf{\color{#35bf28}+5.14\%}$
test_seq_wrap[compile] 0.3732ms 0.2204ms 4.5381 KOps/s 4.4000 KOps/s $\color{#35bf28}+3.14\%$
test_seq_wrap[compile-overhead] 0.3923ms 0.2214ms 4.5168 KOps/s 4.4317 KOps/s $\color{#35bf28}+1.92\%$
test_func_call_runtime[False-eager] 1.8558ms 0.5572ms 1.7946 KOps/s 1.8336 KOps/s $\color{#d91a1a}-2.13\%$
test_func_call_runtime[False-compile] 0.5476ms 0.4224ms 2.3672 KOps/s 2.3734 KOps/s $\color{#d91a1a}-0.26\%$
test_func_call_runtime[False-compile-overhead] 0.5662ms 0.4229ms 2.3648 KOps/s 2.3743 KOps/s $\color{#d91a1a}-0.40\%$
test_func_call_runtime[True-eager] 2.5148ms 0.7770ms 1.2870 KOps/s 1.3303 KOps/s $\color{#d91a1a}-3.25\%$
test_func_call_runtime[True-compile] 1.3026ms 0.4779ms 2.0925 KOps/s 2.1786 KOps/s $\color{#d91a1a}-3.96\%$
test_func_call_runtime[True-compile-overhead] 0.6509ms 0.4571ms 2.1879 KOps/s 2.1691 KOps/s $\color{#35bf28}+0.87\%$
test_func_call_cm_runtime[False-eager] 0.7781ms 0.5377ms 1.8599 KOps/s 1.8387 KOps/s $\color{#35bf28}+1.15\%$
test_func_call_cm_runtime[False-compile] 1.0118ms 0.4221ms 2.3691 KOps/s 2.3719 KOps/s $\color{#d91a1a}-0.12\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5533ms 0.4177ms 2.3940 KOps/s 2.3179 KOps/s $\color{#35bf28}+3.29\%$
test_func_call_cm_runtime[True-eager] 2.1243ms 0.8963ms 1.1156 KOps/s 1.1082 KOps/s $\color{#35bf28}+0.67\%$
test_func_call_cm_runtime[True-compile] 0.6873ms 0.4865ms 2.0555 KOps/s 2.0792 KOps/s $\color{#d91a1a}-1.14\%$
test_func_call_cm_runtime[True-compile-overhead] 0.8853ms 0.4892ms 2.0441 KOps/s 2.0474 KOps/s $\color{#d91a1a}-0.16\%$
test_vmap_func_call_cm_runtime[eager] 2.4872ms 1.9011ms 526.0050 Ops/s 520.5415 Ops/s $\color{#35bf28}+1.05\%$
test_vmap_func_call_cm_runtime[compile] 1.0115ms 0.5122ms 1.9524 KOps/s 1.9413 KOps/s $\color{#35bf28}+0.57\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.8683ms 0.5199ms 1.9234 KOps/s 1.9491 KOps/s $\color{#d91a1a}-1.32\%$
test_distributed 0.3658ms 0.1268ms 7.8842 KOps/s 7.8003 KOps/s $\color{#35bf28}+1.08\%$
test_tdmodule 60.8040μs 17.1834μs 58.1958 KOps/s 51.2055 KOps/s $\textbf{\color{#35bf28}+13.65\%}$
test_tdmodule_dispatch 0.2600ms 44.0987μs 22.6764 KOps/s 25.9207 KOps/s $\textbf{\color{#d91a1a}-12.52\%}$
test_tdseq 47.3290μs 20.2616μs 49.3545 KOps/s 46.3431 KOps/s $\textbf{\color{#35bf28}+6.50\%}$
test_tdseq_dispatch 77.7750μs 39.8466μs 25.0963 KOps/s 23.2221 KOps/s $\textbf{\color{#35bf28}+8.07\%}$
test_instantiation_functorch 2.3749ms 1.5339ms 651.9119 Ops/s 662.6702 Ops/s $\color{#d91a1a}-1.62\%$
test_exec_functorch 0.3149ms 0.1764ms 5.6689 KOps/s 5.6879 KOps/s $\color{#d91a1a}-0.33\%$
test_exec_functional_call 0.3435ms 0.1724ms 5.7994 KOps/s 5.8879 KOps/s $\color{#d91a1a}-1.50\%$
test_exec_td_decorator 0.5542ms 0.2328ms 4.2959 KOps/s 4.2761 KOps/s $\color{#35bf28}+0.46\%$
test_vmap_mlp_speed_decorator[True-True] 1.1661ms 0.6548ms 1.5272 KOps/s 1.5392 KOps/s $\color{#d91a1a}-0.78\%$
test_vmap_mlp_speed_decorator[True-False] 0.8819ms 0.6423ms 1.5570 KOps/s 1.5184 KOps/s $\color{#35bf28}+2.54\%$
test_vmap_mlp_speed_decorator[False-True] 0.9399ms 0.5325ms 1.8779 KOps/s 1.8774 KOps/s $\color{#35bf28}+0.03\%$
test_vmap_mlp_speed_decorator[False-False] 1.6308ms 0.5443ms 1.8371 KOps/s 1.8654 KOps/s $\color{#d91a1a}-1.52\%$
test_to_module_speed[True] 2.6032ms 1.3692ms 730.3699 Ops/s 722.6466 Ops/s $\color{#35bf28}+1.07\%$
test_to_module_speed[False] 2.1405ms 1.3424ms 744.9453 Ops/s 741.9549 Ops/s $\color{#35bf28}+0.40\%$
test_tc_init 95.8500μs 43.9382μs 22.7593 KOps/s 20.8348 KOps/s $\textbf{\color{#35bf28}+9.24\%}$
test_tc_init_nested 0.2046ms 89.5608μs 11.1656 KOps/s 10.5264 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_tc_first_layer_tensor 34.1940μs 1.4845μs 673.6149 KOps/s 661.5947 KOps/s $\color{#35bf28}+1.82\%$
test_tc_first_layer_nontensor 75.8970μs 4.6294μs 216.0084 KOps/s 218.1633 KOps/s $\color{#d91a1a}-0.99\%$
test_tc_second_layer_tensor 46.3380μs 2.7124μs 368.6721 KOps/s 350.7972 KOps/s $\textbf{\color{#35bf28}+5.10\%}$
test_tc_second_layer_nontensor 48.6810μs 5.8604μs 170.6378 KOps/s 166.2498 KOps/s $\color{#35bf28}+2.64\%$
test_unbind 0.2660s 14.5643ms 68.6609 Ops/s 66.6895 Ops/s $\color{#35bf28}+2.96\%$
test_full_like 11.9706ms 8.5562ms 116.8745 Ops/s 120.2378 Ops/s $\color{#d91a1a}-2.80\%$
test_zeros_like 4.6200ms 3.3301ms 300.2891 Ops/s 314.6321 Ops/s $\color{#d91a1a}-4.56\%$
test_ones_like 4.5893ms 3.5573ms 281.1147 Ops/s 284.1186 Ops/s $\color{#d91a1a}-1.06\%$
test_clone 8.6506ms 6.2059ms 161.1364 Ops/s 171.4363 Ops/s $\textbf{\color{#d91a1a}-6.01\%}$
test_squeeze 66.5240μs 12.6762μs 78.8881 KOps/s 75.7178 KOps/s $\color{#35bf28}+4.19\%$
test_unsqueeze 0.1928ms 94.1602μs 10.6202 KOps/s 10.7609 KOps/s $\color{#d91a1a}-1.31\%$
test_split 0.5336ms 0.1962ms 5.0971 KOps/s 5.1102 KOps/s $\color{#d91a1a}-0.26\%$
test_permute 0.4862ms 0.2236ms 4.4732 KOps/s 4.5333 KOps/s $\color{#d91a1a}-1.33\%$
test_stack 31.7804ms 26.9041ms 37.1690 Ops/s 37.2261 Ops/s $\color{#d91a1a}-0.15\%$
test_cat 44.1902ms 28.8057ms 34.7154 Ops/s 36.6055 Ops/s $\textbf{\color{#d91a1a}-5.16\%}$
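
For anyone reading the table above: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the bolded rows (the ones counted as Improved or Worsened in the summary) appear to be those that move by at least 5% in either direction. Neither rule is stated in the report, so the sketch below treats both as assumptions. The function names and the 5% threshold are hypothetical and are not taken from the benchmark tooling.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput relative to repo HEAD.

    Both arguments must use the same unit (e.g. both in KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change. The 5% threshold is inferred from the table, not documented."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from three rows of the CPU table.
rows = {
    "test_plain_set_nested": (42.3228, 41.2818),
    "test_stacked_get": (103.7925, 94.6679),
    "test_unlock_nested": (1.8706, 2.3340),
}

for name, (ops, ops_head) in rows.items():
    pct = relative_change(ops, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because the table prints rounded throughput values, recomputing a row this way can differ from the reported Change in the last digit.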


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1552ms 15.9226μs 62.8038 KOps/s 60.0473 KOps/s $\color{#35bf28}+4.59\%$
test_plain_set_stack_nested 48.9910μs 16.0224μs 62.4125 KOps/s 60.2961 KOps/s $\color{#35bf28}+3.51\%$
test_plain_set_nested_inplace 47.4310μs 17.1298μs 58.3778 KOps/s 56.2531 KOps/s $\color{#35bf28}+3.78\%$
test_plain_set_stack_nested_inplace 47.8510μs 17.2309μs 58.0353 KOps/s 56.2011 KOps/s $\color{#35bf28}+3.26\%$
test_items 33.7500μs 2.8867μs 346.4135 KOps/s 347.7377 KOps/s $\color{#d91a1a}-0.38\%$
test_items_nested 0.3844ms 0.3454ms 2.8955 KOps/s 2.9453 KOps/s $\color{#d91a1a}-1.69\%$
test_items_nested_locked 0.3789ms 0.3451ms 2.8977 KOps/s 2.9230 KOps/s $\color{#d91a1a}-0.87\%$
test_items_nested_leaf 94.0020μs 62.8706μs 15.9057 KOps/s 15.9668 KOps/s $\color{#d91a1a}-0.38\%$
test_items_stack_nested 0.4167ms 0.3447ms 2.9009 KOps/s 2.9221 KOps/s $\color{#d91a1a}-0.72\%$
test_items_stack_nested_leaf 0.2040ms 62.8985μs 15.8986 KOps/s 15.8569 KOps/s $\color{#35bf28}+0.26\%$
test_items_stack_nested_locked 0.3787ms 0.3469ms 2.8826 KOps/s 2.9039 KOps/s $\color{#d91a1a}-0.74\%$
test_keys 28.5110μs 3.4393μs 290.7536 KOps/s 293.0878 KOps/s $\color{#d91a1a}-0.80\%$
test_keys_nested 0.1828ms 72.0934μs 13.8709 KOps/s 13.8725 KOps/s $\color{#d91a1a}-0.01\%$
test_keys_nested_locked 3.2041ms 78.4833μs 12.7416 KOps/s 12.8366 KOps/s $\color{#d91a1a}-0.74\%$
test_keys_nested_leaf 0.1049ms 63.1930μs 15.8245 KOps/s 15.6707 KOps/s $\color{#35bf28}+0.98\%$
test_keys_stack_nested 0.2143ms 72.5110μs 13.7910 KOps/s 13.6486 KOps/s $\color{#35bf28}+1.04\%$
test_keys_stack_nested_leaf 93.5520μs 62.7426μs 15.9381 KOps/s 15.5678 KOps/s $\color{#35bf28}+2.38\%$
test_keys_stack_nested_locked 0.1426ms 78.6669μs 12.7118 KOps/s 12.7137 KOps/s $\color{#d91a1a}-0.01\%$
test_values 4.3568μs 0.8553μs 1.1692 MOps/s 1.1926 MOps/s $\color{#d91a1a}-1.97\%$
test_values_nested 76.9120μs 48.6481μs 20.5558 KOps/s 20.6984 KOps/s $\color{#d91a1a}-0.69\%$
test_values_nested_locked 0.1185ms 50.7556μs 19.7023 KOps/s 19.9593 KOps/s $\color{#d91a1a}-1.29\%$
test_values_nested_leaf 75.5120μs 42.8107μs 23.3586 KOps/s 23.4987 KOps/s $\color{#d91a1a}-0.60\%$
test_values_stack_nested 76.7810μs 48.9248μs 20.4395 KOps/s 20.3259 KOps/s $\color{#35bf28}+0.56\%$
test_values_stack_nested_leaf 0.1760ms 42.9428μs 23.2868 KOps/s 22.9133 KOps/s $\color{#35bf28}+1.63\%$
test_values_stack_nested_locked 86.8810μs 50.3244μs 19.8711 KOps/s 19.8373 KOps/s $\color{#35bf28}+0.17\%$
test_membership 1.5940μs 0.4986μs 2.0057 MOps/s 1.9991 MOps/s $\color{#35bf28}+0.33\%$
test_membership_nested 19.1405μs 1.8843μs 530.7043 KOps/s 541.4387 KOps/s $\color{#d91a1a}-1.98\%$
test_membership_nested_leaf 27.0800μs 1.9356μs 516.6422 KOps/s 528.3786 KOps/s $\color{#d91a1a}-2.22\%$
test_membership_stacked_nested 30.1810μs 1.9166μs 521.7552 KOps/s 509.1584 KOps/s $\color{#35bf28}+2.47\%$
test_membership_stacked_nested_leaf 31.0710μs 1.9160μs 521.9258 KOps/s 504.4794 KOps/s $\color{#35bf28}+3.46\%$
test_membership_nested_last 32.4410μs 3.0661μs 326.1424 KOps/s 332.2788 KOps/s $\color{#d91a1a}-1.85\%$
test_membership_nested_leaf_last 34.5410μs 3.0047μs 332.8147 KOps/s 328.8113 KOps/s $\color{#35bf28}+1.22\%$
test_membership_stacked_nested_last 26.3410μs 2.9919μs 334.2320 KOps/s 332.5319 KOps/s $\color{#35bf28}+0.51\%$
test_membership_stacked_nested_leaf_last 28.8310μs 3.0061μs 332.6622 KOps/s 334.9635 KOps/s $\color{#d91a1a}-0.69\%$
test_nested_getleaf 34.6500μs 6.1038μs 163.8324 KOps/s 163.3683 KOps/s $\color{#35bf28}+0.28\%$
test_nested_get 42.4900μs 5.7665μs 173.4144 KOps/s 173.0744 KOps/s $\color{#35bf28}+0.20\%$
test_stacked_getleaf 28.7300μs 6.0741μs 164.6328 KOps/s 165.8415 KOps/s $\color{#d91a1a}-0.73\%$
test_stacked_get 32.2510μs 5.6601μs 176.6766 KOps/s 176.5559 KOps/s $\color{#35bf28}+0.07\%$
test_nested_getitemleaf 29.3500μs 6.0935μs 164.1086 KOps/s 162.8981 KOps/s $\color{#35bf28}+0.74\%$
test_nested_getitem 0.1953ms 5.7567μs 173.7104 KOps/s 173.9187 KOps/s $\color{#d91a1a}-0.12\%$
test_stacked_getitemleaf 30.7810μs 6.0943μs 164.0886 KOps/s 165.9001 KOps/s $\color{#d91a1a}-1.09\%$
test_stacked_getitem 0.1978ms 5.7535μs 173.8080 KOps/s 175.2374 KOps/s $\color{#d91a1a}-0.82\%$
test_lock_nested 8.2627ms 0.4351ms 2.2983 KOps/s 2.2952 KOps/s $\color{#35bf28}+0.13\%$
test_lock_stack_nested 0.4549ms 0.3925ms 2.5480 KOps/s 2.5444 KOps/s $\color{#35bf28}+0.14\%$
test_unlock_nested 0.7660ms 0.3634ms 2.7515 KOps/s 2.7228 KOps/s $\color{#35bf28}+1.05\%$
test_unlock_stack_nested 0.4580ms 0.3299ms 3.0313 KOps/s 3.0169 KOps/s $\color{#35bf28}+0.48\%$
test_flatten_speed 0.1578ms 77.2579μs 12.9437 KOps/s 13.1528 KOps/s $\color{#d91a1a}-1.59\%$
test_unflatten_speed 0.3543ms 0.3207ms 3.1186 KOps/s 3.0939 KOps/s $\color{#35bf28}+0.80\%$
test_common_ops 1.5485ms 1.2199ms 819.7430 Ops/s 763.8978 Ops/s $\textbf{\color{#35bf28}+7.31\%}$
test_creation 34.6400μs 1.4920μs 670.2510 KOps/s 670.6354 KOps/s $\color{#d91a1a}-0.06\%$
test_creation_empty 47.8610μs 14.0741μs 71.0527 KOps/s 66.5859 KOps/s $\textbf{\color{#35bf28}+6.71\%}$
test_creation_nested_1 81.3320μs 15.7596μs 63.4532 KOps/s 59.1818 KOps/s $\textbf{\color{#35bf28}+7.22\%}$
test_creation_nested_2 0.1482ms 18.1652μs 55.0504 KOps/s 51.4058 KOps/s $\textbf{\color{#35bf28}+7.09\%}$
test_clone 1.4034ms 29.6365μs 33.7422 KOps/s 34.5498 KOps/s $\color{#d91a1a}-2.34\%$
test_getitem[int] 1.1989ms 15.7935μs 63.3173 KOps/s 61.2284 KOps/s $\color{#35bf28}+3.41\%$
test_getitem[slice_int] 0.1581ms 27.6877μs 36.1171 KOps/s 35.6493 KOps/s $\color{#35bf28}+1.31\%$
test_getitem[range] 0.2969ms 0.1095ms 9.1346 KOps/s 9.2357 KOps/s $\color{#d91a1a}-1.09\%$
test_getitem[tuple] 0.1676ms 23.9632μs 41.7307 KOps/s 41.0756 KOps/s $\color{#35bf28}+1.59\%$
test_getitem[list] 0.3441ms 96.7855μs 10.3321 KOps/s 10.2210 KOps/s $\color{#35bf28}+1.09\%$
test_setitem_dim[int] 0.1930ms 43.9221μs 22.7676 KOps/s 22.3991 KOps/s $\color{#35bf28}+1.65\%$
test_setitem_dim[slice_int] 0.1119ms 64.5061μs 15.5024 KOps/s 15.0880 KOps/s $\color{#35bf28}+2.75\%$
test_setitem_dim[range] 0.3064ms 0.1260ms 7.9384 KOps/s 7.9333 KOps/s $\color{#35bf28}+0.07\%$
test_setitem_dim[tuple] 0.2056ms 59.0228μs 16.9426 KOps/s 16.8464 KOps/s $\color{#35bf28}+0.57\%$
test_setitem 0.1922ms 40.4161μs 24.7426 KOps/s 24.3334 KOps/s $\color{#35bf28}+1.68\%$
test_set 0.1932ms 39.9821μs 25.0112 KOps/s 24.5881 KOps/s $\color{#35bf28}+1.72\%$
test_set_shared 0.3556ms 54.4218μs 18.3750 KOps/s 18.0936 KOps/s $\color{#35bf28}+1.56\%$
test_update 0.1980ms 48.3456μs 20.6844 KOps/s 20.0256 KOps/s $\color{#35bf28}+3.29\%$
test_update_nested 0.1874ms 56.4712μs 17.7081 KOps/s 17.2850 KOps/s $\color{#35bf28}+2.45\%$
test_update__nested 0.2126ms 62.0423μs 16.1180 KOps/s 15.8274 KOps/s $\color{#35bf28}+1.84\%$
test_set_nested 0.1946ms 42.1686μs 23.7143 KOps/s 20.9685 KOps/s $\textbf{\color{#35bf28}+13.10\%}$
test_set_nested_new 0.1978ms 47.0610μs 21.2490 KOps/s 19.1703 KOps/s $\textbf{\color{#35bf28}+10.84\%}$
test_select 0.2155ms 59.5126μs 16.8032 KOps/s 15.4790 KOps/s $\textbf{\color{#35bf28}+8.55\%}$
test_select_nested 65.4520μs 41.5740μs 24.0535 KOps/s 23.7001 KOps/s $\color{#35bf28}+1.49\%$
test_exclude_nested 0.1046ms 58.5283μs 17.0857 KOps/s 16.6901 KOps/s $\color{#35bf28}+2.37\%$
test_empty[True] 0.5754ms 0.2600ms 3.8463 KOps/s 3.8711 KOps/s $\color{#d91a1a}-0.64\%$
test_empty[False] 8.5462μs 0.7374μs 1.3562 MOps/s 1.3375 MOps/s $\color{#35bf28}+1.40\%$
test_to 0.1728ms 26.4233μs 37.8453 KOps/s 36.3580 KOps/s $\color{#35bf28}+4.09\%$
test_to_nonblocking 0.2237ms 25.4977μs 39.2192 KOps/s 38.4570 KOps/s $\color{#35bf28}+1.98\%$
test_unbind_speed 1.0449ms 0.2707ms 3.6944 KOps/s 3.5716 KOps/s $\color{#35bf28}+3.44\%$
test_unbind_speed_stack0 0.3464ms 0.2737ms 3.6534 KOps/s 3.5846 KOps/s $\color{#35bf28}+1.92\%$
test_unbind_speed_stack1 0.1019s 0.7066ms 1.4151 KOps/s 1.4044 KOps/s $\color{#35bf28}+0.77\%$
test_split 0.1044s 2.2039ms 453.7310 Ops/s 443.7195 Ops/s $\color{#35bf28}+2.26\%$
test_chunk 0.1032s 2.2025ms 454.0287 Ops/s 443.7904 Ops/s $\color{#35bf28}+2.31\%$
test_to[False] 3.6332ms 3.3343ms 299.9127 Ops/s 287.5022 Ops/s $\color{#35bf28}+4.32\%$
test_to[True] 4.9089ms 4.5005ms 222.1997 Ops/s 217.1842 Ops/s $\color{#35bf28}+2.31\%$
test_to_njt[False] 0.3344s 0.2524s 3.9615 Ops/s 4.3212 Ops/s $\textbf{\color{#d91a1a}-8.32\%}$
test_to_njt[True] 0.3746s 0.2859s 3.4979 Ops/s 3.4745 Ops/s $\color{#35bf28}+0.68\%$
test_creation[device0] 0.3776ms 0.1280ms 7.8096 KOps/s 7.7568 KOps/s $\color{#35bf28}+0.68\%$
test_creation_from_tensor 0.3967ms 0.1303ms 7.6720 KOps/s 7.5833 KOps/s $\color{#35bf28}+1.17\%$
test_add_one[memmap_tensor0] 0.2241ms 8.2849μs 120.7015 KOps/s 115.2304 KOps/s $\color{#35bf28}+4.75\%$
test_contiguous[memmap_tensor0] 38.4700μs 2.1758μs 459.5942 KOps/s 431.6599 KOps/s $\textbf{\color{#35bf28}+6.47\%}$
test_stack[memmap_tensor0] 0.1562ms 6.7190μs 148.8311 KOps/s 142.6672 KOps/s $\color{#35bf28}+4.32\%$
test_memmaptd_index 1.0853ms 0.4315ms 2.3175 KOps/s 2.2177 KOps/s $\color{#35bf28}+4.50\%$
test_memmaptd_index_astensor 1.0152ms 0.5047ms 1.9812 KOps/s 1.9234 KOps/s $\color{#35bf28}+3.01\%$
test_memmaptd_index_op 1.3630ms 1.0022ms 997.8091 Ops/s 955.6684 Ops/s $\color{#35bf28}+4.41\%$
test_serialize_model 0.1315s 0.1309s 7.6371 Ops/s 7.6533 Ops/s $\color{#d91a1a}-0.21\%$
test_serialize_model_pickle 1.3477s 1.2189s 0.8204 Ops/s 0.8219 Ops/s $\color{#d91a1a}-0.18\%$
test_serialize_weights 0.1310s 0.1303s 7.6748 Ops/s 6.9091 Ops/s $\textbf{\color{#35bf28}+11.08\%}$
test_serialize_weights_returnearly 0.2097s 55.4684ms 18.0283 Ops/s 16.9455 Ops/s $\textbf{\color{#35bf28}+6.39\%}$
test_serialize_weights_pickle 1.3539s 1.2135s 0.8241 Ops/s 0.8376 Ops/s $\color{#d91a1a}-1.62\%$
test_reshape_pytree 0.1352ms 35.8459μs 27.8972 KOps/s 27.3362 KOps/s $\color{#35bf28}+2.05\%$
test_reshape_td 0.1610ms 42.5113μs 23.5231 KOps/s 22.8481 KOps/s $\color{#35bf28}+2.95\%$
test_view_pytree 0.1784ms 35.7177μs 27.9973 KOps/s 27.2521 KOps/s $\color{#35bf28}+2.73\%$
test_view_td 0.1865ms 46.7772μs 21.3779 KOps/s 20.5534 KOps/s $\color{#35bf28}+4.01\%$
test_unbind_pytree 0.1773ms 34.2861μs 29.1663 KOps/s 28.4601 KOps/s $\color{#35bf28}+2.48\%$
test_unbind_td 0.4811ms 42.1048μs 23.7502 KOps/s 23.0342 KOps/s $\color{#35bf28}+3.11\%$
test_split_pytree 0.1535ms 47.4297μs 21.0838 KOps/s 21.3699 KOps/s $\color{#d91a1a}-1.34\%$
test_split_td 0.7250ms 56.7982μs 17.6062 KOps/s 17.6211 KOps/s $\color{#d91a1a}-0.08\%$
test_add_pytree 0.2050ms 55.9919μs 17.8597 KOps/s 17.3150 KOps/s $\color{#35bf28}+3.15\%$
test_add_td 0.2678ms 95.6198μs 10.4581 KOps/s 10.4752 KOps/s $\color{#d91a1a}-0.16\%$
test_compile_add_one_nested[tensordict-compile] 0.3047ms 0.1624ms 6.1579 KOps/s 6.1126 KOps/s $\color{#35bf28}+0.74\%$
test_compile_add_one_nested[tensordict-eager] 0.3128ms 0.1606ms 6.2262 KOps/s 6.0981 KOps/s $\color{#35bf28}+2.10\%$
test_compile_add_one_nested[pytree-compile] 0.3003ms 0.1570ms 6.3695 KOps/s 6.2954 KOps/s $\color{#35bf28}+1.18\%$
test_compile_add_one_nested[pytree-eager] 0.3428ms 0.1810ms 5.5243 KOps/s 5.3638 KOps/s $\color{#35bf28}+2.99\%$
test_compile_copy_nested[tensordict-compile] 0.1624ms 21.2228μs 47.1191 KOps/s 45.7480 KOps/s $\color{#35bf28}+3.00\%$
test_compile_copy_nested[tensordict-eager] 0.1389ms 49.3558μs 20.2611 KOps/s 20.5443 KOps/s $\color{#d91a1a}-1.38\%$
test_compile_copy_nested[pytree-compile] 0.4323ms 65.6172μs 15.2399 KOps/s 15.0826 KOps/s $\color{#35bf28}+1.04\%$
test_compile_copy_nested[pytree-eager] 88.9610μs 50.6904μs 19.7276 KOps/s 19.7746 KOps/s $\color{#d91a1a}-0.24\%$
test_compile_add_one_flat[tensordict-compile] 0.4653ms 0.3214ms 3.1111 KOps/s 3.0734 KOps/s $\color{#35bf28}+1.23\%$
test_compile_add_one_flat[tensordict-eager] 0.3740ms 0.2316ms 4.3177 KOps/s 4.1919 KOps/s $\color{#35bf28}+3.00\%$
test_compile_add_one_flat[tensorclass-compile] 0.2757ms 0.1294ms 7.7290 KOps/s 7.4510 KOps/s $\color{#35bf28}+3.73\%$
test_compile_add_one_flat[tensorclass-eager] 0.2106ms 66.3536μs 15.0708 KOps/s 14.0741 KOps/s $\textbf{\color{#35bf28}+7.08\%}$
test_compile_add_one_flat[pytree-compile] 0.4633ms 0.3312ms 3.0197 KOps/s 3.0194 KOps/s $\color{#35bf28}+0.01\%$
test_compile_add_one_flat[pytree-eager] 0.7865ms 0.6113ms 1.6360 KOps/s 1.5757 KOps/s $\color{#35bf28}+3.82\%$
test_compile_add_self_flat[tensordict-eager] 0.4361ms 0.2818ms 3.5481 KOps/s 3.4509 KOps/s $\color{#35bf28}+2.82\%$
test_compile_add_self_flat[tensordict-compile] 0.4647ms 0.3252ms 3.0753 KOps/s 3.0847 KOps/s $\color{#d91a1a}-0.30\%$
test_compile_add_self_flat[tensorclass-eager] 0.2241ms 77.3115μs 12.9347 KOps/s 12.2155 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_compile_add_self_flat[tensorclass-compile] 0.2920ms 0.1308ms 7.6457 KOps/s 7.3087 KOps/s $\color{#35bf28}+4.61\%$
test_compile_add_self_flat[pytree-eager] 0.6854ms 0.5179ms 1.9308 KOps/s 1.8720 KOps/s $\color{#35bf28}+3.15\%$
test_compile_add_self_flat[pytree-compile] 0.4731ms 0.3302ms 3.0287 KOps/s 3.0109 KOps/s $\color{#35bf28}+0.59\%$
test_compile_copy_flat[tensordict-compile] 0.1604ms 19.2804μs 51.8661 KOps/s 50.4638 KOps/s $\color{#35bf28}+2.78\%$
test_compile_copy_flat[tensordict-eager] 0.2123ms 37.7328μs 26.5022 KOps/s 26.2228 KOps/s $\color{#35bf28}+1.07\%$
test_compile_copy_flat[pytree-compile] 0.2444ms 69.7492μs 14.3371 KOps/s 14.2667 KOps/s $\color{#35bf28}+0.49\%$
test_compile_copy_flat[pytree-eager] 0.2716ms 51.8489μs 19.2868 KOps/s 19.2177 KOps/s $\color{#35bf28}+0.36\%$
test_compile_assign_and_add[tensordict-compile] 2.4004ms 0.8356ms 1.1967 KOps/s 1.1066 KOps/s $\textbf{\color{#35bf28}+8.15\%}$
test_compile_assign_and_add[tensordict-eager] 3.4003ms 3.1686ms 315.6002 Ops/s 309.3057 Ops/s $\color{#35bf28}+2.04\%$
test_compile_assign_and_add[pytree-compile] 2.4498ms 0.8550ms 1.1696 KOps/s 1.0771 KOps/s $\textbf{\color{#35bf28}+8.59\%}$
test_compile_assign_and_add[pytree-eager] 3.5240ms 3.1721ms 315.2482 Ops/s 313.1534 Ops/s $\color{#35bf28}+0.67\%$
test_compile_indexing[tensor-tensordict-compile] 0.2725ms 0.1237ms 8.0842 KOps/s 8.3460 KOps/s $\color{#d91a1a}-3.14\%$
test_compile_indexing[tensor-tensordict-eager] 2.2737ms 64.5618μs 15.4890 KOps/s 16.5615 KOps/s $\textbf{\color{#d91a1a}-6.48\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.3176ms 0.1228ms 8.1444 KOps/s 8.7866 KOps/s $\textbf{\color{#d91a1a}-7.31\%}$
test_compile_indexing[tensor-tensorclass-eager] 0.2233ms 45.1685μs 22.1393 KOps/s 23.4761 KOps/s $\textbf{\color{#d91a1a}-5.69\%}$
test_compile_indexing[tensor-pytree-compile] 0.3397ms 0.1252ms 7.9854 KOps/s 8.7210 KOps/s $\textbf{\color{#d91a1a}-8.43\%}$
test_compile_indexing[tensor-pytree-eager] 0.2278ms 45.6010μs 21.9293 KOps/s 23.4369 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_compile_indexing[slice-tensordict-compile] 0.3290ms 0.1521ms 6.5752 KOps/s 6.7358 KOps/s $\color{#d91a1a}-2.38\%$
test_compile_indexing[slice-tensordict-eager] 0.1910ms 25.2007μs 39.6814 KOps/s 38.4826 KOps/s $\color{#35bf28}+3.12\%$
test_compile_indexing[slice-tensorclass-compile] 0.2950ms 0.1473ms 6.7874 KOps/s 7.0257 KOps/s $\color{#d91a1a}-3.39\%$
test_compile_indexing[slice-tensorclass-eager] 0.1394ms 20.4202μs 48.9710 KOps/s 47.4261 KOps/s $\color{#35bf28}+3.26\%$
test_compile_indexing[slice-pytree-compile] 0.3242ms 0.1495ms 6.6903 KOps/s 6.9463 KOps/s $\color{#d91a1a}-3.69\%$
test_compile_indexing[slice-pytree-eager] 0.2131ms 23.4685μs 42.6104 KOps/s 47.2752 KOps/s $\textbf{\color{#d91a1a}-9.87\%}$
test_compile_indexing[int-tensordict-compile] 0.3277ms 0.1544ms 6.4746 KOps/s 6.6978 KOps/s $\color{#d91a1a}-3.33\%$
test_compile_indexing[int-tensordict-eager] 0.5169ms 24.9655μs 40.0553 KOps/s 38.6537 KOps/s $\color{#35bf28}+3.63\%$
test_compile_indexing[int-tensorclass-compile] 0.3299ms 0.1483ms 6.7423 KOps/s 6.9532 KOps/s $\color{#d91a1a}-3.03\%$
test_compile_indexing[int-tensorclass-eager] 0.1105ms 20.1943μs 49.5189 KOps/s 47.5908 KOps/s $\color{#35bf28}+4.05\%$
test_compile_indexing[int-pytree-compile] 0.3163ms 0.1500ms 6.6655 KOps/s 6.9859 KOps/s $\color{#d91a1a}-4.59\%$
test_compile_indexing[int-pytree-eager] 0.1505ms 20.2840μs 49.3000 KOps/s 46.8917 KOps/s $\textbf{\color{#35bf28}+5.14\%}$
test_mod_add[eager] 0.2132ms 32.6735μs 30.6058 KOps/s 30.9435 KOps/s $\color{#d91a1a}-1.09\%$
test_mod_add[compile] 0.2639ms 86.6555μs 11.5400 KOps/s 12.0110 KOps/s $\color{#d91a1a}-3.92\%$
test_mod_add[compile-overhead] 0.3146ms 0.1531ms 6.5321 KOps/s 5.8801 KOps/s $\textbf{\color{#35bf28}+11.09\%}$
test_mod_wrap[eager] 0.4440ms 0.2466ms 4.0556 KOps/s 4.1917 KOps/s $\color{#d91a1a}-3.25\%$
test_mod_wrap[compile] 1.5281ms 0.2998ms 3.3358 KOps/s 3.3087 KOps/s $\color{#35bf28}+0.82\%$
test_mod_wrap[compile-overhead] 7.8777ms 4.0244ms 248.4812 Ops/s 245.1806 Ops/s $\color{#35bf28}+1.35\%$
test_mod_wrap_and_backward[eager] 1.4764ms 1.3031ms 767.4154 Ops/s 713.8027 Ops/s $\textbf{\color{#35bf28}+7.51\%}$
test_mod_wrap_and_backward[compile] 1.5808ms 1.3388ms 746.9177 Ops/s 698.1603 Ops/s $\textbf{\color{#35bf28}+6.98\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4194ms 0.9471ms 1.0559 KOps/s 955.1934 Ops/s $\textbf{\color{#35bf28}+10.54\%}$
test_seq_add[eager] 0.2780ms 97.5496μs 10.2512 KOps/s 10.1674 KOps/s $\color{#35bf28}+0.82\%$
test_seq_add[compile] 0.2615ms 96.5253μs 10.3600 KOps/s 10.7964 KOps/s $\color{#d91a1a}-4.04\%$
test_seq_add[compile-overhead] 0.2777ms 0.1256ms 7.9638 KOps/s 7.9281 KOps/s $\color{#35bf28}+0.45\%$
test_seq_wrap[eager] 0.5945ms 0.3864ms 2.5880 KOps/s 2.6381 KOps/s $\color{#d91a1a}-1.90\%$
test_seq_wrap[compile] 0.5077ms 0.3316ms 3.0161 KOps/s 3.0103 KOps/s $\color{#35bf28}+0.19\%$
test_seq_wrap[compile-overhead] 0.3651ms 0.2229ms 4.4872 KOps/s 4.4598 KOps/s $\color{#35bf28}+0.62\%$
test_func_call_runtime[False-eager] 0.8925ms 0.7160ms 1.3967 KOps/s 1.3677 KOps/s $\color{#35bf28}+2.12\%$
test_func_call_runtime[False-compile] 1.1748ms 0.8202ms 1.2192 KOps/s 1.2440 KOps/s $\color{#d91a1a}-2.00\%$
test_func_call_runtime[False-compile-overhead] 0.5495ms 0.3729ms 2.6817 KOps/s 2.7531 KOps/s $\color{#d91a1a}-2.59\%$
test_func_call_runtime[True-eager] 1.2368ms 0.9234ms 1.0830 KOps/s 1.1261 KOps/s $\color{#d91a1a}-3.83\%$
test_func_call_runtime[True-compile] 0.9784ms 0.8150ms 1.2270 KOps/s 1.2119 KOps/s $\color{#35bf28}+1.24\%$
test_func_call_runtime[True-compile-overhead] 0.5828ms 0.3845ms 2.6010 KOps/s 2.5771 KOps/s $\color{#35bf28}+0.93\%$
test_func_call_cm_runtime[False-eager] 0.8786ms 0.7083ms 1.4118 KOps/s 1.3817 KOps/s $\color{#35bf28}+2.18\%$
test_func_call_cm_runtime[False-compile] 0.9781ms 0.7996ms 1.2507 KOps/s 1.2373 KOps/s $\color{#35bf28}+1.08\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5183ms 0.3658ms 2.7337 KOps/s 2.7466 KOps/s $\color{#d91a1a}-0.47\%$
test_func_call_cm_runtime[True-eager] 1.3539ms 1.0003ms 999.6839 Ops/s 997.4776 Ops/s $\color{#35bf28}+0.22\%$
test_func_call_cm_runtime[True-compile] 1.0231ms 0.8506ms 1.1756 KOps/s 1.1702 KOps/s $\color{#35bf28}+0.47\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5524ms 0.4074ms 2.4544 KOps/s 2.4297 KOps/s $\color{#35bf28}+1.02\%$
test_vmap_func_call_cm_runtime[eager] 2.4835ms 2.0329ms 491.9146 Ops/s 487.1047 Ops/s $\color{#35bf28}+0.99\%$
test_vmap_func_call_cm_runtime[compile] 1.0440ms 0.8659ms 1.1549 KOps/s 1.1470 KOps/s $\color{#35bf28}+0.69\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5606ms 0.4178ms 2.3936 KOps/s 2.4006 KOps/s $\color{#d91a1a}-0.29\%$
test_distributed 3.7067ms 0.1788ms 5.5927 KOps/s 8.7210 KOps/s $\textbf{\color{#d91a1a}-35.87\%}$
test_tdmodule 26.5710μs 14.2878μs 69.9897 KOps/s 67.9673 KOps/s $\color{#35bf28}+2.98\%$
test_tdmodule_dispatch 48.2810μs 26.8598μs 37.2303 KOps/s 35.6887 KOps/s $\color{#35bf28}+4.32\%$
test_tdseq 36.8210μs 14.8406μs 67.3827 KOps/s 64.7871 KOps/s $\color{#35bf28}+4.01\%$
test_tdseq_dispatch 50.9110μs 29.6044μs 33.7787 KOps/s 32.2639 KOps/s $\color{#35bf28}+4.70\%$
test_instantiation_functorch 2.0660ms 1.8631ms 536.7485 Ops/s 531.0919 Ops/s $\color{#35bf28}+1.07\%$
test_exec_functorch 0.3707ms 0.2113ms 4.7320 KOps/s 4.8026 KOps/s $\color{#d91a1a}-1.47\%$
test_exec_functional_call 0.3884ms 0.2074ms 4.8221 KOps/s 4.8378 KOps/s $\color{#d91a1a}-0.32\%$
test_exec_td_decorator 0.4572ms 0.2607ms 3.8354 KOps/s 3.8102 KOps/s $\color{#35bf28}+0.66\%$
test_vmap_mlp_speed_decorator[True-True] 0.9883ms 0.6814ms 1.4676 KOps/s 1.4997 KOps/s $\color{#d91a1a}-2.14\%$
test_vmap_mlp_speed_decorator[True-False] 0.8763ms 0.6848ms 1.4603 KOps/s 1.4990 KOps/s $\color{#d91a1a}-2.59\%$
test_vmap_mlp_speed_decorator[False-True] 0.7966ms 0.6115ms 1.6352 KOps/s 1.7042 KOps/s $\color{#d91a1a}-4.05\%$
test_vmap_mlp_speed_decorator[False-False] 0.7814ms 0.5913ms 1.6913 KOps/s 1.6981 KOps/s $\color{#d91a1a}-0.40\%$
test_vmap_transformer_speed_decorator[True-True] 20.1424ms 19.0910ms 52.3808 Ops/s 52.6255 Ops/s $\color{#d91a1a}-0.46\%$
test_vmap_transformer_speed_decorator[True-False] 20.0402ms 19.0615ms 52.4618 Ops/s 52.5063 Ops/s $\color{#d91a1a}-0.08\%$
test_vmap_transformer_speed_decorator[False-True] 20.2799ms 19.4394ms 51.4420 Ops/s 53.0807 Ops/s $\color{#d91a1a}-3.09\%$
test_vmap_transformer_speed_decorator[False-False] 19.0892ms 18.8539ms 53.0393 Ops/s 52.8757 Ops/s $\color{#35bf28}+0.31\%$
test_to_module_speed[True] 1.4073ms 1.0215ms 978.9883 Ops/s 981.6716 Ops/s $\color{#d91a1a}-0.27\%$
test_to_module_speed[False] 1.4535ms 1.0085ms 991.5692 Ops/s 998.1794 Ops/s $\color{#d91a1a}-0.66\%$
test_tc_init 65.7310μs 33.2814μs 30.0468 KOps/s 28.5201 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_tc_init_nested 0.1713ms 70.7984μs 14.1246 KOps/s 14.4163 KOps/s $\color{#d91a1a}-2.02\%$
test_tc_first_layer_tensor 13.4930μs 0.7051μs 1.4181 MOps/s 1.4432 MOps/s $\color{#d91a1a}-1.73\%$
test_tc_first_layer_nontensor 28.8700μs 2.2620μs 442.0908 KOps/s 437.9161 KOps/s $\color{#35bf28}+0.95\%$
test_tc_second_layer_tensor 8.9802μs 1.3914μs 718.7038 KOps/s 703.2368 KOps/s $\color{#35bf28}+2.20\%$
test_tc_second_layer_nontensor 31.0500μs 2.9641μs 337.3660 KOps/s 333.1632 KOps/s $\color{#35bf28}+1.26\%$
test_unbind 0.2071s 9.8534ms 101.4876 Ops/s 88.3421 Ops/s $\textbf{\color{#35bf28}+14.88\%}$
test_full_like 0.7905ms 0.5721ms 1.7478 KOps/s 1.7442 KOps/s $\color{#35bf28}+0.21\%$
test_zeros_like 0.3808ms 0.1983ms 5.0432 KOps/s 5.0371 KOps/s $\color{#35bf28}+0.12\%$
test_ones_like 0.3740ms 0.1982ms 5.0450 KOps/s 5.0424 KOps/s $\color{#35bf28}+0.05\%$
test_clone 0.5958ms 0.4153ms 2.4081 KOps/s 2.4070 KOps/s $\color{#35bf28}+0.04\%$
test_squeeze 0.1088ms 9.7858μs 102.1890 KOps/s 100.8226 KOps/s $\color{#35bf28}+1.36\%$
test_unsqueeze 0.2195ms 74.5327μs 13.4169 KOps/s 13.0359 KOps/s $\color{#35bf28}+2.92\%$
test_split 0.4256ms 0.1598ms 6.2566 KOps/s 6.0118 KOps/s $\color{#35bf28}+4.07\%$
test_permute 0.2185ms 0.1748ms 5.7202 KOps/s 5.4996 KOps/s $\color{#35bf28}+4.01\%$
test_stack 1.3770ms 0.8521ms 1.1735 KOps/s 1.1904 KOps/s $\color{#d91a1a}-1.42\%$
test_cat 1.3706ms 1.2318ms 811.8402 Ops/s 811.5232 Ops/s $\color{#35bf28}+0.04\%$
