[Feature] COMPOSITE_LP_AGGREGATE env variable #1190
Merged
vmoens added a commit that referenced this pull request on Jan 21, 2025
ghstack-source-id: 16b07d0eac582cfd419612f87e38e1a7acffcfc0
Pull Request resolved: #1190
facebook-github-bot added the CLA Signed label on Jan 21, 2025
(This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 74.0310μs | 11.5786μs | 86.3661 KOps/s | 75.5462 KOps/s | |
test_plain_set_stack_nested | 40.9800μs | 11.8179μs | 84.6172 KOps/s | 75.1125 KOps/s | |
test_plain_set_nested_inplace | 37.3610μs | 12.6428μs | 79.0963 KOps/s | 70.5168 KOps/s | |
test_plain_set_stack_nested_inplace | 40.3810μs | 12.6590μs | 78.9950 KOps/s | 69.9810 KOps/s | |
test_items | 28.0710μs | 2.9066μs | 344.0447 KOps/s | 338.0267 KOps/s | |
test_items_nested | 0.4283ms | 0.3709ms | 2.6959 KOps/s | 2.7519 KOps/s | |
test_items_nested_locked | 0.4056ms | 0.3682ms | 2.7156 KOps/s | 2.7514 KOps/s | |
test_items_nested_leaf | 0.1698ms | 58.6559μs | 17.0486 KOps/s | 17.2545 KOps/s | |
test_items_stack_nested | 0.3972ms | 0.3681ms | 2.7165 KOps/s | 2.7776 KOps/s | |
test_items_stack_nested_leaf | 85.7520μs | 58.3924μs | 17.1255 KOps/s | 16.7508 KOps/s | |
test_items_stack_nested_locked | 0.4353ms | 0.3683ms | 2.7151 KOps/s | 2.7281 KOps/s | |
test_keys | 27.8900μs | 3.4500μs | 289.8557 KOps/s | 291.4224 KOps/s | |
test_keys_nested | 0.1233ms | 88.8541μs | 11.2544 KOps/s | 11.3383 KOps/s | |
test_keys_nested_locked | 0.7566ms | 94.4797μs | 10.5843 KOps/s | 10.6340 KOps/s | |
test_keys_nested_leaf | 0.1122ms | 79.3856μs | 12.5967 KOps/s | 12.6177 KOps/s | |
test_keys_stack_nested | 0.1277ms | 89.2500μs | 11.2045 KOps/s | 11.1652 KOps/s | |
test_keys_stack_nested_leaf | 0.1149ms | 79.5920μs | 12.5641 KOps/s | 12.5367 KOps/s | |
test_keys_stack_nested_locked | 0.1248ms | 95.2600μs | 10.4976 KOps/s | 10.5631 KOps/s | |
test_values | 6.1585μs | 0.8522μs | 1.1734 MOps/s | 1.1819 MOps/s | |
test_values_nested | 60.9210μs | 37.6277μs | 26.5762 KOps/s | 26.5848 KOps/s | |
test_values_nested_locked | 68.4910μs | 39.5044μs | 25.3137 KOps/s | 25.5128 KOps/s | |
test_values_nested_leaf | 65.3610μs | 42.1983μs | 23.6977 KOps/s | 23.9674 KOps/s | |
test_values_stack_nested | 0.1878ms | 38.3019μs | 26.1084 KOps/s | 25.8399 KOps/s | |
test_values_stack_nested_leaf | 78.6010μs | 43.0457μs | 23.2311 KOps/s | 23.3579 KOps/s | |
test_values_stack_nested_locked | 93.4920μs | 39.6305μs | 25.2331 KOps/s | 24.7453 KOps/s | |
test_membership | 1.8965μs | 0.5203μs | 1.9221 MOps/s | 1.9810 MOps/s | |
test_membership_nested | 14.1950μs | 2.0390μs | 490.4313 KOps/s | 472.4362 KOps/s | |
test_membership_nested_leaf | 18.4305μs | 2.0114μs | 497.1682 KOps/s | 491.0779 KOps/s | |
test_membership_stacked_nested | 27.0100μs | 2.0726μs | 482.4764 KOps/s | 478.2236 KOps/s | |
test_membership_stacked_nested_leaf | 62.3910μs | 2.0880μs | 478.9344 KOps/s | 481.2868 KOps/s | |
test_membership_nested_last | 31.8300μs | 3.1074μs | 321.8099 KOps/s | 317.4649 KOps/s | |
test_membership_nested_leaf_last | 37.0610μs | 3.1299μs | 319.4963 KOps/s | 312.3128 KOps/s | |
test_membership_stacked_nested_last | 0.1725ms | 3.1276μs | 319.7345 KOps/s | 322.6004 KOps/s | |
test_membership_stacked_nested_leaf_last | 34.6410μs | 3.1139μs | 321.1415 KOps/s | 314.3216 KOps/s | |
test_nested_getleaf | 0.1747ms | 6.1329μs | 163.0547 KOps/s | 161.5533 KOps/s | |
test_nested_get | 0.1826ms | 5.8826μs | 169.9925 KOps/s | 170.8474 KOps/s | |
test_stacked_getleaf | 32.3910μs | 6.1905μs | 161.5371 KOps/s | 161.9238 KOps/s | |
test_stacked_get | 0.2025ms | 5.9403μs | 168.3430 KOps/s | 171.0731 KOps/s | |
test_nested_getitemleaf | 38.2410μs | 6.4714μs | 154.5268 KOps/s | 152.7337 KOps/s | |
test_nested_getitem | 27.6800μs | 6.1735μs | 161.9836 KOps/s | 160.7831 KOps/s | |
test_stacked_getitemleaf | 41.4910μs | 6.4437μs | 155.1904 KOps/s | 153.9642 KOps/s | |
test_stacked_getitem | 43.8100μs | 6.1193μs | 163.4172 KOps/s | 162.1157 KOps/s | |
test_lock_nested | 8.9141ms | 0.3515ms | 2.8453 KOps/s | 2.8754 KOps/s | |
test_lock_stack_nested | 0.4934ms | 0.3471ms | 2.8807 KOps/s | 2.9013 KOps/s | |
test_unlock_nested | 0.3919ms | 0.2865ms | 3.4904 KOps/s | 3.5728 KOps/s | |
test_unlock_stack_nested | 0.4122ms | 0.2842ms | 3.5185 KOps/s | 3.5328 KOps/s | |
test_flatten_speed | 0.1136ms | 75.9746μs | 13.1623 KOps/s | 13.2249 KOps/s | |
test_unflatten_speed | 0.3663ms | 0.3247ms | 3.0802 KOps/s | 3.1206 KOps/s | |
test_common_ops | 0.7779ms | 0.6122ms | 1.6335 KOps/s | 1.5412 KOps/s | |
test_creation | 94.8810μs | 1.7498μs | 571.5097 KOps/s | 570.3175 KOps/s | |
test_creation_empty | 65.7710μs | 7.0829μs | 141.1846 KOps/s | 98.9343 KOps/s | |
test_creation_nested_1 | 33.3000μs | 8.8229μs | 113.3410 KOps/s | 84.1737 KOps/s | |
test_creation_nested_2 | 31.4810μs | 11.7109μs | 85.3904 KOps/s | 69.1048 KOps/s | |
test_clone | 50.3510μs | 11.3220μs | 88.3234 KOps/s | 92.8127 KOps/s | |
test_getitem[int] | 1.3477ms | 11.1769μs | 89.4700 KOps/s | 93.9234 KOps/s | |
test_getitem[slice_int] | 0.1637ms | 21.7980μs | 45.8758 KOps/s | 47.8513 KOps/s | |
test_getitem[range] | 0.1357ms | 39.4863μs | 25.3252 KOps/s | 26.2927 KOps/s | |
test_getitem[tuple] | 0.1092ms | 19.1773μs | 52.1451 KOps/s | 54.6604 KOps/s | |
test_getitem[list] | 0.1579ms | 34.9088μs | 28.6460 KOps/s | 29.4729 KOps/s | |
test_setitem_dim[int] | 0.1255ms | 21.3405μs | 46.8592 KOps/s | 50.3663 KOps/s | |
test_setitem_dim[slice_int] | 80.1310μs | 40.3122μs | 24.8064 KOps/s | 25.2338 KOps/s | |
test_setitem_dim[range] | 80.4610μs | 55.5216μs | 18.0110 KOps/s | 18.3701 KOps/s | |
test_setitem_dim[tuple] | 81.3610μs | 34.6511μs | 28.8592 KOps/s | 29.6454 KOps/s | |
test_setitem | 0.1117ms | 15.3776μs | 65.0295 KOps/s | 61.1153 KOps/s | |
test_set | 53.8200μs | 14.9242μs | 67.0051 KOps/s | 62.6621 KOps/s | |
test_set_shared | 0.5097ms | 0.1623ms | 6.1611 KOps/s | 6.1843 KOps/s | |
test_update | 0.3994ms | 17.1597μs | 58.2760 KOps/s | 51.6086 KOps/s | |
test_update_nested | 53.2800μs | 22.7811μs | 43.8961 KOps/s | 39.8732 KOps/s | |
test_update__nested | 0.5139ms | 26.6324μs | 37.5482 KOps/s | 37.7311 KOps/s | |
test_set_nested | 54.6200μs | 16.4916μs | 60.6371 KOps/s | 57.9622 KOps/s | |
test_set_nested_new | 0.1762ms | 18.8890μs | 52.9410 KOps/s | 52.4406 KOps/s | |
test_select | 71.3210μs | 31.2182μs | 32.0326 KOps/s | 31.5583 KOps/s | |
test_select_nested | 0.1012ms | 44.5745μs | 22.4343 KOps/s | 22.9861 KOps/s | |
test_exclude_nested | 94.0520μs | 64.2857μs | 15.5556 KOps/s | 16.0405 KOps/s | |
test_empty[True] | 0.3260ms | 0.2986ms | 3.3489 KOps/s | 3.4253 KOps/s | |
test_empty[False] | 3.8191μs | 0.8332μs | 1.2001 MOps/s | 1.2056 MOps/s | |
test_to | 88.2110μs | 57.5023μs | 17.3906 KOps/s | 17.3427 KOps/s | |
test_to_nonblocking | 93.5710μs | 48.5394μs | 20.6018 KOps/s | 20.8331 KOps/s | |
test_unbind_speed | 0.2862ms | 0.2461ms | 4.0627 KOps/s | 4.1967 KOps/s | |
test_unbind_speed_stack0 | 0.3270ms | 0.2447ms | 4.0860 KOps/s | 4.2173 KOps/s | |
test_unbind_speed_stack1 | 95.3441ms | 0.7453ms | 1.3417 KOps/s | 1.3585 KOps/s | |
test_split | 95.4225ms | 1.6266ms | 614.7725 Ops/s | 631.2088 Ops/s | |
test_chunk | 97.5497ms | 1.6308ms | 613.2066 Ops/s | 627.7503 Ops/s | |
test_consolidate[False-None] | 2.8127ms | 2.7046ms | 369.7461 Ops/s | 334.4725 Ops/s | |
test_consolidate[default-None] | 1.9905ms | 1.7419ms | 574.0911 Ops/s | 587.4905 Ops/s | |
test_consolidate[reduce-overhead-None] | 1.8930ms | 1.7766ms | 562.8751 Ops/s | 571.1336 Ops/s | |
test_consolidate_njt[False-None] | 6.7969ms | 6.6019ms | 151.4725 Ops/s | 151.0708 Ops/s | |
test_to[False-False-None] | 1.9363ms | 1.7113ms | 584.3505 Ops/s | 586.2574 Ops/s | |
test_to[True-False-None] | 1.6301ms | 1.3795ms | 724.8760 Ops/s | 724.4853 Ops/s | |
test_to[within-False-None] | 4.4992ms | 4.2090ms | 237.5865 Ops/s | 170.1876 Ops/s | |
test_to[True-default-None] | 5.7700ms | 5.5421ms | 180.4357 Ops/s | 187.6667 Ops/s | |
test_to_njt[False-False-None] | 7.1320ms | 6.9389ms | 144.1148 Ops/s | 141.9668 Ops/s | |
test_to_njt[True-False-None] | 6.0367ms | 5.6168ms | 178.0366 Ops/s | 175.6529 Ops/s | |
test_to_njt[within-False-None] | 12.5172ms | 12.3466ms | 80.9943 Ops/s | 79.7008 Ops/s | |
test_creation[device0] | 0.4589ms | 81.7910μs | 12.2263 KOps/s | 12.2701 KOps/s | |
test_creation_from_tensor | 0.4460ms | 84.6110μs | 11.8188 KOps/s | 11.7123 KOps/s | |
test_add_one[memmap_tensor0] | 0.2323ms | 7.4582μs | 134.0802 KOps/s | 141.4191 KOps/s | |
test_contiguous[memmap_tensor0] | 1.8396μs | 0.4374μs | 2.2865 MOps/s | 2.3410 MOps/s | |
test_stack[memmap_tensor0] | 60.0110μs | 4.6666μs | 214.2885 KOps/s | 230.2495 KOps/s | |
test_memmaptd_index | 1.5371ms | 0.2559ms | 3.9077 KOps/s | 4.1530 KOps/s | |
test_memmaptd_index_astensor | 0.5240ms | 0.3173ms | 3.1519 KOps/s | 3.3222 KOps/s | |
test_memmaptd_index_op | 0.7371ms | 0.5898ms | 1.6954 KOps/s | 1.6367 KOps/s | |
test_serialize_model | 0.1320s | 0.1311s | 7.6261 Ops/s | 7.6655 Ops/s | |
test_serialize_model_pickle | 1.3740s | 1.2171s | 0.8216 Ops/s | 0.8177 Ops/s | |
test_serialize_weights | 0.1328s | 0.1301s | 7.6878 Ops/s | 7.6618 Ops/s | |
test_serialize_weights_returnearly | 0.3297s | 55.3887ms | 18.0542 Ops/s | 14.5219 Ops/s | |
test_serialize_weights_pickle | 1.3661s | 1.2162s | 0.8222 Ops/s | 0.8219 Ops/s | |
test_reshape_pytree | 0.1075ms | 22.4736μs | 44.4966 KOps/s | 44.7761 KOps/s | |
test_reshape_td | 57.0710μs | 27.7553μs | 36.0292 KOps/s | 35.9327 KOps/s | |
test_view_pytree | 44.4910μs | 22.2157μs | 45.0132 KOps/s | 45.9306 KOps/s | |
test_view_td | 66.3910μs | 32.2319μs | 31.0252 KOps/s | 31.3709 KOps/s | |
test_unbind_pytree | 85.8810μs | 28.4658μs | 35.1299 KOps/s | 34.8885 KOps/s | |
test_unbind_td | 0.7439ms | 37.4259μs | 26.7194 KOps/s | 27.2849 KOps/s | |
test_split_pytree | 63.1110μs | 30.5425μs | 32.7413 KOps/s | 32.9106 KOps/s | |
test_split_td | 0.9642ms | 39.6485μs | 25.2217 KOps/s | 26.1767 KOps/s | |
test_add_pytree | 0.1735ms | 36.4203μs | 27.4572 KOps/s | 28.3919 KOps/s | |
test_add_td | 0.1866ms | 47.5503μs | 21.0303 KOps/s | 18.7130 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.2727ms | 0.1244ms | 8.0398 KOps/s | 7.9319 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.5403ms | 0.1332ms | 7.5079 KOps/s | 7.5664 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1331ms | 96.3658μs | 10.3771 KOps/s | 10.1785 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 1.3125ms | 0.1520ms | 6.5774 KOps/s | 6.6553 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.1625ms | 24.2081μs | 41.3086 KOps/s | 41.2118 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1188ms | 29.3167μs | 34.1103 KOps/s | 34.2833 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.4244ms | 64.5303μs | 15.4966 KOps/s | 15.3549 KOps/s | |
test_compile_copy_nested[pytree-eager] | 95.5220μs | 48.8437μs | 20.4735 KOps/s | 20.3238 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1861ms | 0.1419ms | 7.0477 KOps/s | 7.1976 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3550ms | 0.2202ms | 4.5422 KOps/s | 4.6032 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1434ms | 98.8163μs | 10.1198 KOps/s | 10.3808 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.2067ms | 56.1166μs | 17.8200 KOps/s | 17.9129 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2685ms | 0.1355ms | 7.3784 KOps/s | 7.5384 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6309ms | 0.4912ms | 2.0357 KOps/s | 2.0890 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3692ms | 0.2643ms | 3.7833 KOps/s | 3.8194 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2951ms | 0.1432ms | 6.9829 KOps/s | 7.1858 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2156ms | 67.8916μs | 14.7294 KOps/s | 14.6456 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.2437ms | 99.7147μs | 10.0286 KOps/s | 10.3159 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5717ms | 0.4050ms | 2.4689 KOps/s | 2.5024 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.1809ms | 0.1343ms | 7.4436 KOps/s | 7.5183 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.2157ms | 24.4546μs | 40.8921 KOps/s | 53.8403 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 55.8910μs | 31.2745μs | 31.9749 KOps/s | 32.0303 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1047ms | 71.4541μs | 13.9950 KOps/s | 14.0867 KOps/s | |
test_compile_copy_flat[pytree-eager] | 85.1010μs | 52.0155μs | 19.2250 KOps/s | 19.3798 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 1.6634ms | 0.3980ms | 2.5124 KOps/s | 2.1816 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.0908ms | 2.6794ms | 373.2228 Ops/s | 383.1160 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 1.6214ms | 0.4387ms | 2.2796 KOps/s | 2.2642 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.4866ms | 2.7100ms | 368.9976 Ops/s | 383.1150 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.5288ms | 0.1178ms | 8.4919 KOps/s | 8.6215 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5537ms | 78.9303μs | 12.6694 KOps/s | 12.3238 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.6356ms | 0.1105ms | 9.0504 KOps/s | 8.9528 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.3149ms | 71.8100μs | 13.9256 KOps/s | 14.4259 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.2997ms | 0.1124ms | 8.8977 KOps/s | 8.8610 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.2492ms | 71.4343μs | 13.9989 KOps/s | 13.8179 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.2403ms | 0.1055ms | 9.4795 KOps/s | 10.0088 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1426ms | 17.8998μs | 55.8666 KOps/s | 47.4486 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.2754ms | 0.1001ms | 9.9894 KOps/s | 10.4603 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 0.1628ms | 16.2781μs | 61.4322 KOps/s | 62.9642 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.2785ms | 0.1006ms | 9.9388 KOps/s | 10.4059 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.1539ms | 16.1423μs | 61.9489 KOps/s | 62.9429 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.2795ms | 0.1076ms | 9.2943 KOps/s | 9.8678 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5589ms | 17.6074μs | 56.7942 KOps/s | 54.6238 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2818ms | 0.1011ms | 9.8958 KOps/s | 9.9757 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.2074ms | 19.2466μs | 51.9572 KOps/s | 63.9761 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.2740ms | 0.1009ms | 9.9073 KOps/s | 10.0446 KOps/s | |
test_compile_indexing[int-pytree-eager] | 0.4424ms | 16.2562μs | 61.5150 KOps/s | 63.1534 KOps/s | |
test_mod_add[eager] | 0.1761ms | 37.6529μs | 26.5584 KOps/s | 23.9173 KOps/s | |
test_mod_add[compile] | 0.4056ms | 83.8949μs | 11.9197 KOps/s | 12.2626 KOps/s | |
test_mod_add[compile-overhead] | 0.3312ms | 0.1699ms | 5.8843 KOps/s | 5.6902 KOps/s | |
test_mod_wrap[eager] | 0.4144ms | 0.2525ms | 3.9603 KOps/s | 3.8981 KOps/s | |
test_mod_wrap[compile] | 0.4442ms | 0.2845ms | 3.5146 KOps/s | 3.4791 KOps/s | |
test_mod_wrap[compile-overhead] | 7.1310ms | 3.7177ms | 268.9816 Ops/s | 268.8193 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.7011ms | 1.5095ms | 662.4854 Ops/s | 686.0508 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.8224ms | 1.2724ms | 785.9321 Ops/s | 727.5407 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.4901ms | 0.9532ms | 1.0491 KOps/s | 978.2418 Ops/s | |
test_seq_add[eager] | 0.2595ms | 0.1152ms | 8.6774 KOps/s | 8.2213 KOps/s | |
test_seq_add[compile] | 0.2384ms | 89.5902μs | 11.1619 KOps/s | 10.9887 KOps/s | |
test_seq_add[compile-overhead] | 0.2743ms | 0.1300ms | 7.6932 KOps/s | 7.7726 KOps/s | |
test_seq_wrap[eager] | 0.5880ms | 0.4192ms | 2.3852 KOps/s | 2.3135 KOps/s | |
test_seq_wrap[compile] | 0.4511ms | 0.3003ms | 3.3296 KOps/s | 3.2827 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3008ms | 0.2247ms | 4.4511 KOps/s | 4.3623 KOps/s | |
test_func_call_runtime[False-eager] | 0.8981ms | 0.7467ms | 1.3392 KOps/s | 1.3225 KOps/s | |
test_func_call_runtime[False-compile] | 0.9582ms | 0.7804ms | 1.2813 KOps/s | 1.3459 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.5005ms | 0.3661ms | 2.7313 KOps/s | 2.7133 KOps/s | |
test_func_call_runtime[True-eager] | 1.0752ms | 0.9197ms | 1.0873 KOps/s | 1.0771 KOps/s | |
test_func_call_runtime[True-compile] | 0.9123ms | 0.7663ms | 1.3050 KOps/s | 1.3029 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4360ms | 0.3874ms | 2.5815 KOps/s | 2.5865 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.9350ms | 0.7565ms | 1.3218 KOps/s | 1.3460 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.8869ms | 0.7530ms | 1.3281 KOps/s | 1.3111 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4279ms | 0.3697ms | 2.7047 KOps/s | 2.7180 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1586ms | 1.0151ms | 985.1447 Ops/s | 985.1765 Ops/s | |
test_func_call_cm_runtime[True-compile] | 1.2218ms | 1.0052ms | 994.8649 Ops/s | 976.6549 Ops/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.1546ms | 1.0043ms | 995.7042 Ops/s | 977.8164 Ops/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5063ms | 2.1082ms | 474.3332 Ops/s | 466.2647 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9931ms | 0.8213ms | 1.2176 KOps/s | 1.2215 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5858ms | 0.4167ms | 2.3997 KOps/s | 2.3802 KOps/s | |
test_distributed | 6.7719ms | 0.2461ms | 4.0631 KOps/s | 8.4867 KOps/s | |
test_tdmodule | 0.1862ms | 20.5206μs | 48.7315 KOps/s | 45.5527 KOps/s | |
test_tdmodule_dispatch | 52.5110μs | 35.3603μs | 28.2803 KOps/s | 26.1973 KOps/s | |
test_tdseq | 0.1939ms | 21.0166μs | 47.5815 KOps/s | 45.6766 KOps/s | |
test_tdseq_dispatch | 58.8610μs | 37.1978μs | 26.8833 KOps/s | 24.3800 KOps/s | |
test_instantiation_functorch | 1.7640ms | 1.6165ms | 618.6127 Ops/s | 635.1508 Ops/s | |
test_exec_functorch | 0.2845ms | 0.1516ms | 6.5944 KOps/s | 6.8011 KOps/s | |
test_exec_functional_call | 0.2394ms | 0.1424ms | 7.0211 KOps/s | 7.0928 KOps/s | |
test_exec_td_decorator | 0.3820ms | 0.1930ms | 5.1803 KOps/s | 5.2306 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8366ms | 0.6930ms | 1.4430 KOps/s | 1.4350 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8305ms | 0.6923ms | 1.4445 KOps/s | 1.3899 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7907ms | 0.6065ms | 1.6489 KOps/s | 1.6602 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7593ms | 0.6250ms | 1.6000 KOps/s | 1.6528 KOps/s | |
test_vmap_transformer_speed_decorator[True-True] | 20.1017ms | 19.3565ms | 51.6623 Ops/s | 51.6597 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.5181ms | 19.3668ms | 51.6347 Ops/s | 51.7428 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.4166ms | 19.2350ms | 51.9887 Ops/s | 52.1505 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.4592ms | 19.2278ms | 52.0081 Ops/s | 52.2124 Ops/s | |
test_to_module_speed[True] | 1.4350ms | 0.9737ms | 1.0270 KOps/s | 1.0366 KOps/s | |
test_to_module_speed[False] | 1.0383ms | 0.9543ms | 1.0478 KOps/s | 1.0437 KOps/s | |
test_tc_init | 67.2310μs | 33.9618μs | 29.4449 KOps/s | 26.2759 KOps/s | |
test_tc_init_nested | 0.1044ms | 68.3820μs | 14.6237 KOps/s | 12.8944 KOps/s | |
test_tc_first_layer_tensor | 28.8200μs | 0.8138μs | 1.2288 MOps/s | 1.4211 MOps/s | |
test_tc_first_layer_nontensor | 20.2500μs | 2.2426μs | 445.9161 KOps/s | 439.7544 KOps/s | |
test_tc_second_layer_tensor | 8.9000μs | 1.4330μs | 697.8513 KOps/s | 706.5930 KOps/s | |
test_tc_second_layer_nontensor | 31.1600μs | 3.0236μs | 330.7340 KOps/s | 328.4114 KOps/s | |
test_unbind | 0.2182s | 12.1892ms | 82.0397 Ops/s | 141.8902 Ops/s | |
test_full_like | 10.8442ms | 9.6664ms | 103.4512 Ops/s | 103.6403 Ops/s | |
test_zeros_like | 4.9704ms | 4.3854ms | 228.0279 Ops/s | 113.4273 Ops/s | |
test_ones_like | 5.6214ms | 4.3971ms | 227.4230 Ops/s | 225.4611 Ops/s | |
test_clone | 7.5119ms | 6.8365ms | 146.2726 Ops/s | 147.9279 Ops/s | |
test_squeeze | 0.1273ms | 10.0111μs | 99.8889 KOps/s | 102.2445 KOps/s | |
test_unsqueeze | 0.1350ms | 72.5473μs | 13.7841 KOps/s | 13.3165 KOps/s | |
test_split | 0.3707ms | 0.1595ms | 6.2715 KOps/s | 6.0887 KOps/s | |
test_permute | 0.3289ms | 0.1780ms | 5.6178 KOps/s | 5.3594 KOps/s | |
test_stack | 52.3677ms | 51.4358ms | 19.4417 Ops/s | 19.2893 Ops/s | |
test_cat | 52.9824ms | 51.5133ms | 19.4124 Ops/s | 19.4026 Ops/s |
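
The Change column is empty in the table as captured here, so the comparison against Repo HEAD has to be read off the two Ops columns by hand. A minimal sketch of that arithmetic follows, assuming the change is the relative difference between "Ops" and "Ops on Repo HEAD" in percent; the helper names `parse_ops` and `relative_change` are hypothetical and are not part of tensordict or of the benchmark tooling.

```python
import re

# Multipliers for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '86.3661 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percent change of the PR throughput relative to Repo HEAD.

    Assumption: positive means the PR run was faster than HEAD.
    """
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


if __name__ == "__main__":
    # Rows copied from the table above: (name, Ops, Ops on Repo HEAD).
    rows = [
        ("test_creation_empty", "141.1846 KOps/s", "98.9343 KOps/s"),
        ("test_distributed", "4.0631 KOps/s", "8.4867 KOps/s"),
        ("test_mod_wrap_and_backward[compile-overhead]",
         "1.0491 KOps/s", "978.2418 Ops/s"),
    ]
    for name, ops_pr, ops_head in rows:
        print(f"{name}: {relative_change(ops_pr, ops_head):+.1f}%")
```

Single-run differences in this kind of CI benchmark can be noisy (compare the Max and Mean columns), so large swings on one row are worth re-running before drawing conclusions.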
Stack from ghstack (oldest at bottom):