[Feature] COMPOSITE_LP_AGGREGATE env variable #1190

Merged
vmoens merged 1 commit into gh/vmoens/45/base from gh/vmoens/45/head
Jan 21, 2025

vmoens commented Jan 21, 2025

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 21, 2025
ghstack-source-id: 16b07d0eac582cfd419612f87e38e1a7acffcfc0
Pull Request resolved: #1190
facebook-github-bot added the CLA Signed label Jan 21, 2025
vmoens merged commit 1abb8f8 into gh/vmoens/45/base Jan 21, 2025
13 of 18 checks passed
vmoens deleted the gh/vmoens/45/head branch January 21, 2025 09:56

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}28$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 74.0310μs 11.5786μs 86.3661 KOps/s 75.5462 KOps/s $\textbf{\color{#35bf28}+14.32\%}$
test_plain_set_stack_nested 40.9800μs 11.8179μs 84.6172 KOps/s 75.1125 KOps/s $\textbf{\color{#35bf28}+12.65\%}$
test_plain_set_nested_inplace 37.3610μs 12.6428μs 79.0963 KOps/s 70.5168 KOps/s $\textbf{\color{#35bf28}+12.17\%}$
test_plain_set_stack_nested_inplace 40.3810μs 12.6590μs 78.9950 KOps/s 69.9810 KOps/s $\textbf{\color{#35bf28}+12.88\%}$
test_items 28.0710μs 2.9066μs 344.0447 KOps/s 338.0267 KOps/s $\color{#35bf28}+1.78\%$
test_items_nested 0.4283ms 0.3709ms 2.6959 KOps/s 2.7519 KOps/s $\color{#d91a1a}-2.03\%$
test_items_nested_locked 0.4056ms 0.3682ms 2.7156 KOps/s 2.7514 KOps/s $\color{#d91a1a}-1.30\%$
test_items_nested_leaf 0.1698ms 58.6559μs 17.0486 KOps/s 17.2545 KOps/s $\color{#d91a1a}-1.19\%$
test_items_stack_nested 0.3972ms 0.3681ms 2.7165 KOps/s 2.7776 KOps/s $\color{#d91a1a}-2.20\%$
test_items_stack_nested_leaf 85.7520μs 58.3924μs 17.1255 KOps/s 16.7508 KOps/s $\color{#35bf28}+2.24\%$
test_items_stack_nested_locked 0.4353ms 0.3683ms 2.7151 KOps/s 2.7281 KOps/s $\color{#d91a1a}-0.48\%$
test_keys 27.8900μs 3.4500μs 289.8557 KOps/s 291.4224 KOps/s $\color{#d91a1a}-0.54\%$
test_keys_nested 0.1233ms 88.8541μs 11.2544 KOps/s 11.3383 KOps/s $\color{#d91a1a}-0.74\%$
test_keys_nested_locked 0.7566ms 94.4797μs 10.5843 KOps/s 10.6340 KOps/s $\color{#d91a1a}-0.47\%$
test_keys_nested_leaf 0.1122ms 79.3856μs 12.5967 KOps/s 12.6177 KOps/s $\color{#d91a1a}-0.17\%$
test_keys_stack_nested 0.1277ms 89.2500μs 11.2045 KOps/s 11.1652 KOps/s $\color{#35bf28}+0.35\%$
test_keys_stack_nested_leaf 0.1149ms 79.5920μs 12.5641 KOps/s 12.5367 KOps/s $\color{#35bf28}+0.22\%$
test_keys_stack_nested_locked 0.1248ms 95.2600μs 10.4976 KOps/s 10.5631 KOps/s $\color{#d91a1a}-0.62\%$
test_values 6.1585μs 0.8522μs 1.1734 MOps/s 1.1819 MOps/s $\color{#d91a1a}-0.72\%$
test_values_nested 60.9210μs 37.6277μs 26.5762 KOps/s 26.5848 KOps/s $\color{#d91a1a}-0.03\%$
test_values_nested_locked 68.4910μs 39.5044μs 25.3137 KOps/s 25.5128 KOps/s $\color{#d91a1a}-0.78\%$
test_values_nested_leaf 65.3610μs 42.1983μs 23.6977 KOps/s 23.9674 KOps/s $\color{#d91a1a}-1.13\%$
test_values_stack_nested 0.1878ms 38.3019μs 26.1084 KOps/s 25.8399 KOps/s $\color{#35bf28}+1.04\%$
test_values_stack_nested_leaf 78.6010μs 43.0457μs 23.2311 KOps/s 23.3579 KOps/s $\color{#d91a1a}-0.54\%$
test_values_stack_nested_locked 93.4920μs 39.6305μs 25.2331 KOps/s 24.7453 KOps/s $\color{#35bf28}+1.97\%$
test_membership 1.8965μs 0.5203μs 1.9221 MOps/s 1.9810 MOps/s $\color{#d91a1a}-2.97\%$
test_membership_nested 14.1950μs 2.0390μs 490.4313 KOps/s 472.4362 KOps/s $\color{#35bf28}+3.81\%$
test_membership_nested_leaf 18.4305μs 2.0114μs 497.1682 KOps/s 491.0779 KOps/s $\color{#35bf28}+1.24\%$
test_membership_stacked_nested 27.0100μs 2.0726μs 482.4764 KOps/s 478.2236 KOps/s $\color{#35bf28}+0.89\%$
test_membership_stacked_nested_leaf 62.3910μs 2.0880μs 478.9344 KOps/s 481.2868 KOps/s $\color{#d91a1a}-0.49\%$
test_membership_nested_last 31.8300μs 3.1074μs 321.8099 KOps/s 317.4649 KOps/s $\color{#35bf28}+1.37\%$
test_membership_nested_leaf_last 37.0610μs 3.1299μs 319.4963 KOps/s 312.3128 KOps/s $\color{#35bf28}+2.30\%$
test_membership_stacked_nested_last 0.1725ms 3.1276μs 319.7345 KOps/s 322.6004 KOps/s $\color{#d91a1a}-0.89\%$
test_membership_stacked_nested_leaf_last 34.6410μs 3.1139μs 321.1415 KOps/s 314.3216 KOps/s $\color{#35bf28}+2.17\%$
test_nested_getleaf 0.1747ms 6.1329μs 163.0547 KOps/s 161.5533 KOps/s $\color{#35bf28}+0.93\%$
test_nested_get 0.1826ms 5.8826μs 169.9925 KOps/s 170.8474 KOps/s $\color{#d91a1a}-0.50\%$
test_stacked_getleaf 32.3910μs 6.1905μs 161.5371 KOps/s 161.9238 KOps/s $\color{#d91a1a}-0.24\%$
test_stacked_get 0.2025ms 5.9403μs 168.3430 KOps/s 171.0731 KOps/s $\color{#d91a1a}-1.60\%$
test_nested_getitemleaf 38.2410μs 6.4714μs 154.5268 KOps/s 152.7337 KOps/s $\color{#35bf28}+1.17\%$
test_nested_getitem 27.6800μs 6.1735μs 161.9836 KOps/s 160.7831 KOps/s $\color{#35bf28}+0.75\%$
test_stacked_getitemleaf 41.4910μs 6.4437μs 155.1904 KOps/s 153.9642 KOps/s $\color{#35bf28}+0.80\%$
test_stacked_getitem 43.8100μs 6.1193μs 163.4172 KOps/s 162.1157 KOps/s $\color{#35bf28}+0.80\%$
test_lock_nested 8.9141ms 0.3515ms 2.8453 KOps/s 2.8754 KOps/s $\color{#d91a1a}-1.05\%$
test_lock_stack_nested 0.4934ms 0.3471ms 2.8807 KOps/s 2.9013 KOps/s $\color{#d91a1a}-0.71\%$
test_unlock_nested 0.3919ms 0.2865ms 3.4904 KOps/s 3.5728 KOps/s $\color{#d91a1a}-2.31\%$
test_unlock_stack_nested 0.4122ms 0.2842ms 3.5185 KOps/s 3.5328 KOps/s $\color{#d91a1a}-0.40\%$
test_flatten_speed 0.1136ms 75.9746μs 13.1623 KOps/s 13.2249 KOps/s $\color{#d91a1a}-0.47\%$
test_unflatten_speed 0.3663ms 0.3247ms 3.0802 KOps/s 3.1206 KOps/s $\color{#d91a1a}-1.29\%$
test_common_ops 0.7779ms 0.6122ms 1.6335 KOps/s 1.5412 KOps/s $\textbf{\color{#35bf28}+5.99\%}$
test_creation 94.8810μs 1.7498μs 571.5097 KOps/s 570.3175 KOps/s $\color{#35bf28}+0.21\%$
test_creation_empty 65.7710μs 7.0829μs 141.1846 KOps/s 98.9343 KOps/s $\textbf{\color{#35bf28}+42.71\%}$
test_creation_nested_1 33.3000μs 8.8229μs 113.3410 KOps/s 84.1737 KOps/s $\textbf{\color{#35bf28}+34.65\%}$
test_creation_nested_2 31.4810μs 11.7109μs 85.3904 KOps/s 69.1048 KOps/s $\textbf{\color{#35bf28}+23.57\%}$
test_clone 50.3510μs 11.3220μs 88.3234 KOps/s 92.8127 KOps/s $\color{#d91a1a}-4.84\%$
test_getitem[int] 1.3477ms 11.1769μs 89.4700 KOps/s 93.9234 KOps/s $\color{#d91a1a}-4.74\%$
test_getitem[slice_int] 0.1637ms 21.7980μs 45.8758 KOps/s 47.8513 KOps/s $\color{#d91a1a}-4.13\%$
test_getitem[range] 0.1357ms 39.4863μs 25.3252 KOps/s 26.2927 KOps/s $\color{#d91a1a}-3.68\%$
test_getitem[tuple] 0.1092ms 19.1773μs 52.1451 KOps/s 54.6604 KOps/s $\color{#d91a1a}-4.60\%$
test_getitem[list] 0.1579ms 34.9088μs 28.6460 KOps/s 29.4729 KOps/s $\color{#d91a1a}-2.81\%$
test_setitem_dim[int] 0.1255ms 21.3405μs 46.8592 KOps/s 50.3663 KOps/s $\textbf{\color{#d91a1a}-6.96\%}$
test_setitem_dim[slice_int] 80.1310μs 40.3122μs 24.8064 KOps/s 25.2338 KOps/s $\color{#d91a1a}-1.69\%$
test_setitem_dim[range] 80.4610μs 55.5216μs 18.0110 KOps/s 18.3701 KOps/s $\color{#d91a1a}-1.95\%$
test_setitem_dim[tuple] 81.3610μs 34.6511μs 28.8592 KOps/s 29.6454 KOps/s $\color{#d91a1a}-2.65\%$
test_setitem 0.1117ms 15.3776μs 65.0295 KOps/s 61.1153 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_set 53.8200μs 14.9242μs 67.0051 KOps/s 62.6621 KOps/s $\textbf{\color{#35bf28}+6.93\%}$
test_set_shared 0.5097ms 0.1623ms 6.1611 KOps/s 6.1843 KOps/s $\color{#d91a1a}-0.38\%$
test_update 0.3994ms 17.1597μs 58.2760 KOps/s 51.6086 KOps/s $\textbf{\color{#35bf28}+12.92\%}$
test_update_nested 53.2800μs 22.7811μs 43.8961 KOps/s 39.8732 KOps/s $\textbf{\color{#35bf28}+10.09\%}$
test_update__nested 0.5139ms 26.6324μs 37.5482 KOps/s 37.7311 KOps/s $\color{#d91a1a}-0.48\%$
test_set_nested 54.6200μs 16.4916μs 60.6371 KOps/s 57.9622 KOps/s $\color{#35bf28}+4.61\%$
test_set_nested_new 0.1762ms 18.8890μs 52.9410 KOps/s 52.4406 KOps/s $\color{#35bf28}+0.95\%$
test_select 71.3210μs 31.2182μs 32.0326 KOps/s 31.5583 KOps/s $\color{#35bf28}+1.50\%$
test_select_nested 0.1012ms 44.5745μs 22.4343 KOps/s 22.9861 KOps/s $\color{#d91a1a}-2.40\%$
test_exclude_nested 94.0520μs 64.2857μs 15.5556 KOps/s 16.0405 KOps/s $\color{#d91a1a}-3.02\%$
test_empty[True] 0.3260ms 0.2986ms 3.3489 KOps/s 3.4253 KOps/s $\color{#d91a1a}-2.23\%$
test_empty[False] 3.8191μs 0.8332μs 1.2001 MOps/s 1.2056 MOps/s $\color{#d91a1a}-0.45\%$
test_to 88.2110μs 57.5023μs 17.3906 KOps/s 17.3427 KOps/s $\color{#35bf28}+0.28\%$
test_to_nonblocking 93.5710μs 48.5394μs 20.6018 KOps/s 20.8331 KOps/s $\color{#d91a1a}-1.11\%$
test_unbind_speed 0.2862ms 0.2461ms 4.0627 KOps/s 4.1967 KOps/s $\color{#d91a1a}-3.19\%$
test_unbind_speed_stack0 0.3270ms 0.2447ms 4.0860 KOps/s 4.2173 KOps/s $\color{#d91a1a}-3.11\%$
test_unbind_speed_stack1 95.3441ms 0.7453ms 1.3417 KOps/s 1.3585 KOps/s $\color{#d91a1a}-1.24\%$
test_split 95.4225ms 1.6266ms 614.7725 Ops/s 631.2088 Ops/s $\color{#d91a1a}-2.60\%$
test_chunk 97.5497ms 1.6308ms 613.2066 Ops/s 627.7503 Ops/s $\color{#d91a1a}-2.32\%$
test_consolidate[False-None] 2.8127ms 2.7046ms 369.7461 Ops/s 334.4725 Ops/s $\textbf{\color{#35bf28}+10.55\%}$
test_consolidate[default-None] 1.9905ms 1.7419ms 574.0911 Ops/s 587.4905 Ops/s $\color{#d91a1a}-2.28\%$
test_consolidate[reduce-overhead-None] 1.8930ms 1.7766ms 562.8751 Ops/s 571.1336 Ops/s $\color{#d91a1a}-1.45\%$
test_consolidate_njt[False-None] 6.7969ms 6.6019ms 151.4725 Ops/s 151.0708 Ops/s $\color{#35bf28}+0.27\%$
test_to[False-False-None] 1.9363ms 1.7113ms 584.3505 Ops/s 586.2574 Ops/s $\color{#d91a1a}-0.33\%$
test_to[True-False-None] 1.6301ms 1.3795ms 724.8760 Ops/s 724.4853 Ops/s $\color{#35bf28}+0.05\%$
test_to[within-False-None] 4.4992ms 4.2090ms 237.5865 Ops/s 170.1876 Ops/s $\textbf{\color{#35bf28}+39.60\%}$
test_to[True-default-None] 5.7700ms 5.5421ms 180.4357 Ops/s 187.6667 Ops/s $\color{#d91a1a}-3.85\%$
test_to_njt[False-False-None] 7.1320ms 6.9389ms 144.1148 Ops/s 141.9668 Ops/s $\color{#35bf28}+1.51\%$
test_to_njt[True-False-None] 6.0367ms 5.6168ms 178.0366 Ops/s 175.6529 Ops/s $\color{#35bf28}+1.36\%$
test_to_njt[within-False-None] 12.5172ms 12.3466ms 80.9943 Ops/s 79.7008 Ops/s $\color{#35bf28}+1.62\%$
test_creation[device0] 0.4589ms 81.7910μs 12.2263 KOps/s 12.2701 KOps/s $\color{#d91a1a}-0.36\%$
test_creation_from_tensor 0.4460ms 84.6110μs 11.8188 KOps/s 11.7123 KOps/s $\color{#35bf28}+0.91\%$
test_add_one[memmap_tensor0] 0.2323ms 7.4582μs 134.0802 KOps/s 141.4191 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_contiguous[memmap_tensor0] 1.8396μs 0.4374μs 2.2865 MOps/s 2.3410 MOps/s $\color{#d91a1a}-2.33\%$
test_stack[memmap_tensor0] 60.0110μs 4.6666μs 214.2885 KOps/s 230.2495 KOps/s $\textbf{\color{#d91a1a}-6.93\%}$
test_memmaptd_index 1.5371ms 0.2559ms 3.9077 KOps/s 4.1530 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_memmaptd_index_astensor 0.5240ms 0.3173ms 3.1519 KOps/s 3.3222 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_memmaptd_index_op 0.7371ms 0.5898ms 1.6954 KOps/s 1.6367 KOps/s $\color{#35bf28}+3.59\%$
test_serialize_model 0.1320s 0.1311s 7.6261 Ops/s 7.6655 Ops/s $\color{#d91a1a}-0.51\%$
test_serialize_model_pickle 1.3740s 1.2171s 0.8216 Ops/s 0.8177 Ops/s $\color{#35bf28}+0.48\%$
test_serialize_weights 0.1328s 0.1301s 7.6878 Ops/s 7.6618 Ops/s $\color{#35bf28}+0.34\%$
test_serialize_weights_returnearly 0.3297s 55.3887ms 18.0542 Ops/s 14.5219 Ops/s $\textbf{\color{#35bf28}+24.32\%}$
test_serialize_weights_pickle 1.3661s 1.2162s 0.8222 Ops/s 0.8219 Ops/s $\color{#35bf28}+0.03\%$
test_reshape_pytree 0.1075ms 22.4736μs 44.4966 KOps/s 44.7761 KOps/s $\color{#d91a1a}-0.62\%$
test_reshape_td 57.0710μs 27.7553μs 36.0292 KOps/s 35.9327 KOps/s $\color{#35bf28}+0.27\%$
test_view_pytree 44.4910μs 22.2157μs 45.0132 KOps/s 45.9306 KOps/s $\color{#d91a1a}-2.00\%$
test_view_td 66.3910μs 32.2319μs 31.0252 KOps/s 31.3709 KOps/s $\color{#d91a1a}-1.10\%$
test_unbind_pytree 85.8810μs 28.4658μs 35.1299 KOps/s 34.8885 KOps/s $\color{#35bf28}+0.69\%$
test_unbind_td 0.7439ms 37.4259μs 26.7194 KOps/s 27.2849 KOps/s $\color{#d91a1a}-2.07\%$
test_split_pytree 63.1110μs 30.5425μs 32.7413 KOps/s 32.9106 KOps/s $\color{#d91a1a}-0.51\%$
test_split_td 0.9642ms 39.6485μs 25.2217 KOps/s 26.1767 KOps/s $\color{#d91a1a}-3.65\%$
test_add_pytree 0.1735ms 36.4203μs 27.4572 KOps/s 28.3919 KOps/s $\color{#d91a1a}-3.29\%$
test_add_td 0.1866ms 47.5503μs 21.0303 KOps/s 18.7130 KOps/s $\textbf{\color{#35bf28}+12.38\%}$
test_compile_add_one_nested[tensordict-compile] 0.2727ms 0.1244ms 8.0398 KOps/s 7.9319 KOps/s $\color{#35bf28}+1.36\%$
test_compile_add_one_nested[tensordict-eager] 0.5403ms 0.1332ms 7.5079 KOps/s 7.5664 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_add_one_nested[pytree-compile] 0.1331ms 96.3658μs 10.3771 KOps/s 10.1785 KOps/s $\color{#35bf28}+1.95\%$
test_compile_add_one_nested[pytree-eager] 1.3125ms 0.1520ms 6.5774 KOps/s 6.6553 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_copy_nested[tensordict-compile] 0.1625ms 24.2081μs 41.3086 KOps/s 41.2118 KOps/s $\color{#35bf28}+0.23\%$
test_compile_copy_nested[tensordict-eager] 0.1188ms 29.3167μs 34.1103 KOps/s 34.2833 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_copy_nested[pytree-compile] 0.4244ms 64.5303μs 15.4966 KOps/s 15.3549 KOps/s $\color{#35bf28}+0.92\%$
test_compile_copy_nested[pytree-eager] 95.5220μs 48.8437μs 20.4735 KOps/s 20.3238 KOps/s $\color{#35bf28}+0.74\%$
test_compile_add_one_flat[tensordict-compile] 0.1861ms 0.1419ms 7.0477 KOps/s 7.1976 KOps/s $\color{#d91a1a}-2.08\%$
test_compile_add_one_flat[tensordict-eager] 0.3550ms 0.2202ms 4.5422 KOps/s 4.6032 KOps/s $\color{#d91a1a}-1.32\%$
test_compile_add_one_flat[tensorclass-compile] 0.1434ms 98.8163μs 10.1198 KOps/s 10.3808 KOps/s $\color{#d91a1a}-2.51\%$
test_compile_add_one_flat[tensorclass-eager] 0.2067ms 56.1166μs 17.8200 KOps/s 17.9129 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_add_one_flat[pytree-compile] 0.2685ms 0.1355ms 7.3784 KOps/s 7.5384 KOps/s $\color{#d91a1a}-2.12\%$
test_compile_add_one_flat[pytree-eager] 0.6309ms 0.4912ms 2.0357 KOps/s 2.0890 KOps/s $\color{#d91a1a}-2.55\%$
test_compile_add_self_flat[tensordict-eager] 0.3692ms 0.2643ms 3.7833 KOps/s 3.8194 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_add_self_flat[tensordict-compile] 0.2951ms 0.1432ms 6.9829 KOps/s 7.1858 KOps/s $\color{#d91a1a}-2.82\%$
test_compile_add_self_flat[tensorclass-eager] 0.2156ms 67.8916μs 14.7294 KOps/s 14.6456 KOps/s $\color{#35bf28}+0.57\%$
test_compile_add_self_flat[tensorclass-compile] 0.2437ms 99.7147μs 10.0286 KOps/s 10.3159 KOps/s $\color{#d91a1a}-2.78\%$
test_compile_add_self_flat[pytree-eager] 0.5717ms 0.4050ms 2.4689 KOps/s 2.5024 KOps/s $\color{#d91a1a}-1.34\%$
test_compile_add_self_flat[pytree-compile] 0.1809ms 0.1343ms 7.4436 KOps/s 7.5183 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_copy_flat[tensordict-compile] 0.2157ms 24.4546μs 40.8921 KOps/s 53.8403 KOps/s $\textbf{\color{#d91a1a}-24.05\%}$
test_compile_copy_flat[tensordict-eager] 55.8910μs 31.2745μs 31.9749 KOps/s 32.0303 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_copy_flat[pytree-compile] 0.1047ms 71.4541μs 13.9950 KOps/s 14.0867 KOps/s $\color{#d91a1a}-0.65\%$
test_compile_copy_flat[pytree-eager] 85.1010μs 52.0155μs 19.2250 KOps/s 19.3798 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_assign_and_add[tensordict-compile] 1.6634ms 0.3980ms 2.5124 KOps/s 2.1816 KOps/s $\textbf{\color{#35bf28}+15.16\%}$
test_compile_assign_and_add[tensordict-eager] 3.0908ms 2.6794ms 373.2228 Ops/s 383.1160 Ops/s $\color{#d91a1a}-2.58\%$
test_compile_assign_and_add[pytree-compile] 1.6214ms 0.4387ms 2.2796 KOps/s 2.2642 KOps/s $\color{#35bf28}+0.68\%$
test_compile_assign_and_add[pytree-eager] 3.4866ms 2.7100ms 368.9976 Ops/s 383.1150 Ops/s $\color{#d91a1a}-3.68\%$
test_compile_indexing[tensor-tensordict-compile] 0.5288ms 0.1178ms 8.4919 KOps/s 8.6215 KOps/s $\color{#d91a1a}-1.50\%$
test_compile_indexing[tensor-tensordict-eager] 0.5537ms 78.9303μs 12.6694 KOps/s 12.3238 KOps/s $\color{#35bf28}+2.80\%$
test_compile_indexing[tensor-tensorclass-compile] 0.6356ms 0.1105ms 9.0504 KOps/s 8.9528 KOps/s $\color{#35bf28}+1.09\%$
test_compile_indexing[tensor-tensorclass-eager] 0.3149ms 71.8100μs 13.9256 KOps/s 14.4259 KOps/s $\color{#d91a1a}-3.47\%$
test_compile_indexing[tensor-pytree-compile] 0.2997ms 0.1124ms 8.8977 KOps/s 8.8610 KOps/s $\color{#35bf28}+0.41\%$
test_compile_indexing[tensor-pytree-eager] 0.2492ms 71.4343μs 13.9989 KOps/s 13.8179 KOps/s $\color{#35bf28}+1.31\%$
test_compile_indexing[slice-tensordict-compile] 0.2403ms 0.1055ms 9.4795 KOps/s 10.0088 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_compile_indexing[slice-tensordict-eager] 0.1426ms 17.8998μs 55.8666 KOps/s 47.4486 KOps/s $\textbf{\color{#35bf28}+17.74\%}$
test_compile_indexing[slice-tensorclass-compile] 0.2754ms 0.1001ms 9.9894 KOps/s 10.4603 KOps/s $\color{#d91a1a}-4.50\%$
test_compile_indexing[slice-tensorclass-eager] 0.1628ms 16.2781μs 61.4322 KOps/s 62.9642 KOps/s $\color{#d91a1a}-2.43\%$
test_compile_indexing[slice-pytree-compile] 0.2785ms 0.1006ms 9.9388 KOps/s 10.4059 KOps/s $\color{#d91a1a}-4.49\%$
test_compile_indexing[slice-pytree-eager] 0.1539ms 16.1423μs 61.9489 KOps/s 62.9429 KOps/s $\color{#d91a1a}-1.58\%$
test_compile_indexing[int-tensordict-compile] 0.2795ms 0.1076ms 9.2943 KOps/s 9.8678 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_compile_indexing[int-tensordict-eager] 0.5589ms 17.6074μs 56.7942 KOps/s 54.6238 KOps/s $\color{#35bf28}+3.97\%$
test_compile_indexing[int-tensorclass-compile] 0.2818ms 0.1011ms 9.8958 KOps/s 9.9757 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_indexing[int-tensorclass-eager] 0.2074ms 19.2466μs 51.9572 KOps/s 63.9761 KOps/s $\textbf{\color{#d91a1a}-18.79\%}$
test_compile_indexing[int-pytree-compile] 0.2740ms 0.1009ms 9.9073 KOps/s 10.0446 KOps/s $\color{#d91a1a}-1.37\%$
test_compile_indexing[int-pytree-eager] 0.4424ms 16.2562μs 61.5150 KOps/s 63.1534 KOps/s $\color{#d91a1a}-2.59\%$
test_mod_add[eager] 0.1761ms 37.6529μs 26.5584 KOps/s 23.9173 KOps/s $\textbf{\color{#35bf28}+11.04\%}$
test_mod_add[compile] 0.4056ms 83.8949μs 11.9197 KOps/s 12.2626 KOps/s $\color{#d91a1a}-2.80\%$
test_mod_add[compile-overhead] 0.3312ms 0.1699ms 5.8843 KOps/s 5.6902 KOps/s $\color{#35bf28}+3.41\%$
test_mod_wrap[eager] 0.4144ms 0.2525ms 3.9603 KOps/s 3.8981 KOps/s $\color{#35bf28}+1.60\%$
test_mod_wrap[compile] 0.4442ms 0.2845ms 3.5146 KOps/s 3.4791 KOps/s $\color{#35bf28}+1.02\%$
test_mod_wrap[compile-overhead] 7.1310ms 3.7177ms 268.9816 Ops/s 268.8193 Ops/s $\color{#35bf28}+0.06\%$
test_mod_wrap_and_backward[eager] 1.7011ms 1.5095ms 662.4854 Ops/s 686.0508 Ops/s $\color{#d91a1a}-3.43\%$
test_mod_wrap_and_backward[compile] 1.8224ms 1.2724ms 785.9321 Ops/s 727.5407 Ops/s $\textbf{\color{#35bf28}+8.03\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4901ms 0.9532ms 1.0491 KOps/s 978.2418 Ops/s $\textbf{\color{#35bf28}+7.24\%}$
test_seq_add[eager] 0.2595ms 0.1152ms 8.6774 KOps/s 8.2213 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_seq_add[compile] 0.2384ms 89.5902μs 11.1619 KOps/s 10.9887 KOps/s $\color{#35bf28}+1.58\%$
test_seq_add[compile-overhead] 0.2743ms 0.1300ms 7.6932 KOps/s 7.7726 KOps/s $\color{#d91a1a}-1.02\%$
test_seq_wrap[eager] 0.5880ms 0.4192ms 2.3852 KOps/s 2.3135 KOps/s $\color{#35bf28}+3.10\%$
test_seq_wrap[compile] 0.4511ms 0.3003ms 3.3296 KOps/s 3.2827 KOps/s $\color{#35bf28}+1.43\%$
test_seq_wrap[compile-overhead] 0.3008ms 0.2247ms 4.4511 KOps/s 4.3623 KOps/s $\color{#35bf28}+2.04\%$
test_func_call_runtime[False-eager] 0.8981ms 0.7467ms 1.3392 KOps/s 1.3225 KOps/s $\color{#35bf28}+1.27\%$
test_func_call_runtime[False-compile] 0.9582ms 0.7804ms 1.2813 KOps/s 1.3459 KOps/s $\color{#d91a1a}-4.80\%$
test_func_call_runtime[False-compile-overhead] 0.5005ms 0.3661ms 2.7313 KOps/s 2.7133 KOps/s $\color{#35bf28}+0.66\%$
test_func_call_runtime[True-eager] 1.0752ms 0.9197ms 1.0873 KOps/s 1.0771 KOps/s $\color{#35bf28}+0.95\%$
test_func_call_runtime[True-compile] 0.9123ms 0.7663ms 1.3050 KOps/s 1.3029 KOps/s $\color{#35bf28}+0.17\%$
test_func_call_runtime[True-compile-overhead] 0.4360ms 0.3874ms 2.5815 KOps/s 2.5865 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_cm_runtime[False-eager] 0.9350ms 0.7565ms 1.3218 KOps/s 1.3460 KOps/s $\color{#d91a1a}-1.80\%$
test_func_call_cm_runtime[False-compile] 0.8869ms 0.7530ms 1.3281 KOps/s 1.3111 KOps/s $\color{#35bf28}+1.29\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4279ms 0.3697ms 2.7047 KOps/s 2.7180 KOps/s $\color{#d91a1a}-0.49\%$
test_func_call_cm_runtime[True-eager] 1.1586ms 1.0151ms 985.1447 Ops/s 985.1765 Ops/s $-0.00\%$
test_func_call_cm_runtime[True-compile] 1.2218ms 1.0052ms 994.8649 Ops/s 976.6549 Ops/s $\color{#35bf28}+1.86\%$
test_func_call_cm_runtime[True-compile-overhead] 1.1546ms 1.0043ms 995.7042 Ops/s 977.8164 Ops/s $\color{#35bf28}+1.83\%$
test_vmap_func_call_cm_runtime[eager] 2.5063ms 2.1082ms 474.3332 Ops/s 466.2647 Ops/s $\color{#35bf28}+1.73\%$
test_vmap_func_call_cm_runtime[compile] 0.9931ms 0.8213ms 1.2176 KOps/s 1.2215 KOps/s $\color{#d91a1a}-0.32\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5858ms 0.4167ms 2.3997 KOps/s 2.3802 KOps/s $\color{#35bf28}+0.82\%$
test_distributed 6.7719ms 0.2461ms 4.0631 KOps/s 8.4867 KOps/s $\textbf{\color{#d91a1a}-52.12\%}$
test_tdmodule 0.1862ms 20.5206μs 48.7315 KOps/s 45.5527 KOps/s $\textbf{\color{#35bf28}+6.98\%}$
test_tdmodule_dispatch 52.5110μs 35.3603μs 28.2803 KOps/s 26.1973 KOps/s $\textbf{\color{#35bf28}+7.95\%}$
test_tdseq 0.1939ms 21.0166μs 47.5815 KOps/s 45.6766 KOps/s $\color{#35bf28}+4.17\%$
test_tdseq_dispatch 58.8610μs 37.1978μs 26.8833 KOps/s 24.3800 KOps/s $\textbf{\color{#35bf28}+10.27\%}$
test_instantiation_functorch 1.7640ms 1.6165ms 618.6127 Ops/s 635.1508 Ops/s $\color{#d91a1a}-2.60\%$
test_exec_functorch 0.2845ms 0.1516ms 6.5944 KOps/s 6.8011 KOps/s $\color{#d91a1a}-3.04\%$
test_exec_functional_call 0.2394ms 0.1424ms 7.0211 KOps/s 7.0928 KOps/s $\color{#d91a1a}-1.01\%$
test_exec_td_decorator 0.3820ms 0.1930ms 5.1803 KOps/s 5.2306 KOps/s $\color{#d91a1a}-0.96\%$
test_vmap_mlp_speed_decorator[True-True] 0.8366ms 0.6930ms 1.4430 KOps/s 1.4350 KOps/s $\color{#35bf28}+0.56\%$
test_vmap_mlp_speed_decorator[True-False] 0.8305ms 0.6923ms 1.4445 KOps/s 1.3899 KOps/s $\color{#35bf28}+3.93\%$
test_vmap_mlp_speed_decorator[False-True] 0.7907ms 0.6065ms 1.6489 KOps/s 1.6602 KOps/s $\color{#d91a1a}-0.68\%$
test_vmap_mlp_speed_decorator[False-False] 0.7593ms 0.6250ms 1.6000 KOps/s 1.6528 KOps/s $\color{#d91a1a}-3.19\%$
test_vmap_transformer_speed_decorator[True-True] 20.1017ms 19.3565ms 51.6623 Ops/s 51.6597 Ops/s $+0.01\%$
test_vmap_transformer_speed_decorator[True-False] 19.5181ms 19.3668ms 51.6347 Ops/s 51.7428 Ops/s $\color{#d91a1a}-0.21\%$
test_vmap_transformer_speed_decorator[False-True] 19.4166ms 19.2350ms 51.9887 Ops/s 52.1505 Ops/s $\color{#d91a1a}-0.31\%$
test_vmap_transformer_speed_decorator[False-False] 19.4592ms 19.2278ms 52.0081 Ops/s 52.2124 Ops/s $\color{#d91a1a}-0.39\%$
test_to_module_speed[True] 1.4350ms 0.9737ms 1.0270 KOps/s 1.0366 KOps/s $\color{#d91a1a}-0.92\%$
test_to_module_speed[False] 1.0383ms 0.9543ms 1.0478 KOps/s 1.0437 KOps/s $\color{#35bf28}+0.40\%$
test_tc_init 67.2310μs 33.9618μs 29.4449 KOps/s 26.2759 KOps/s $\textbf{\color{#35bf28}+12.06\%}$
test_tc_init_nested 0.1044ms 68.3820μs 14.6237 KOps/s 12.8944 KOps/s $\textbf{\color{#35bf28}+13.41\%}$
test_tc_first_layer_tensor 28.8200μs 0.8138μs 1.2288 MOps/s 1.4211 MOps/s $\textbf{\color{#d91a1a}-13.53\%}$
test_tc_first_layer_nontensor 20.2500μs 2.2426μs 445.9161 KOps/s 439.7544 KOps/s $\color{#35bf28}+1.40\%$
test_tc_second_layer_tensor 8.9000μs 1.4330μs 697.8513 KOps/s 706.5930 KOps/s $\color{#d91a1a}-1.24\%$
test_tc_second_layer_nontensor 31.1600μs 3.0236μs 330.7340 KOps/s 328.4114 KOps/s $\color{#35bf28}+0.71\%$
test_unbind 0.2182s 12.1892ms 82.0397 Ops/s 141.8902 Ops/s $\textbf{\color{#d91a1a}-42.18\%}$
test_full_like 10.8442ms 9.6664ms 103.4512 Ops/s 103.6403 Ops/s $\color{#d91a1a}-0.18\%$
test_zeros_like 4.9704ms 4.3854ms 228.0279 Ops/s 113.4273 Ops/s $\textbf{\color{#35bf28}+101.03\%}$
test_ones_like 5.6214ms 4.3971ms 227.4230 Ops/s 225.4611 Ops/s $\color{#35bf28}+0.87\%$
test_clone 7.5119ms 6.8365ms 146.2726 Ops/s 147.9279 Ops/s $\color{#d91a1a}-1.12\%$
test_squeeze 0.1273ms 10.0111μs 99.8889 KOps/s 102.2445 KOps/s $\color{#d91a1a}-2.30\%$
test_unsqueeze 0.1350ms 72.5473μs 13.7841 KOps/s 13.3165 KOps/s $\color{#35bf28}+3.51\%$
test_split 0.3707ms 0.1595ms 6.2715 KOps/s 6.0887 KOps/s $\color{#35bf28}+3.00\%$
test_permute 0.3289ms 0.1780ms 5.6178 KOps/s 5.3594 KOps/s $\color{#35bf28}+4.82\%$
test_stack 52.3677ms 51.4358ms 19.4417 Ops/s 19.2893 Ops/s $\color{#35bf28}+0.79\%$
test_cat 52.9824ms 51.5133ms 19.4124 Ops/s 19.4026 Ops/s $\color{#35bf28}+0.05\%$
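
The report does not say how the Change column is derived. The figures are consistent with it being the relative difference between Ops (this PR) and Ops on Repo HEAD, so the sketch below assumes that definition and checks it against three rows copied from the table. The function name `relative_change` and the `reported` dictionary are illustrative only and are not part of the benchmark tooling.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to Repo HEAD.

    Assumed formula (inferred from the table, not stated in the report):
        (ops - ops_head) / ops_head * 100
    Both arguments must use the same unit (e.g. both KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD, reported Change in %) copied from the table above.
reported = {
    "test_plain_set_nested": (86.3661, 75.5462, 14.32),
    "test_distributed": (4.0631, 8.4867, -52.12),
    "test_zeros_like": (228.0279, 113.4273, 101.03),
}

for name, (ops, ops_head, change) in reported.items():
    computed = relative_change(ops, ops_head)
    print(f"{name}: computed {computed:+.2f}%, reported {change:+.2f}%")
```

Rows that mix units, such as test_mod_wrap_and_backward[compile-overhead] (KOps/s against Ops/s), need both values converted to a common unit before applying the formula.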
