[Feature] Cudagraphs #986
Merged
facebook-github-bot added the CLA Signed label on Sep 11, 2024

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 44.9230μs | 20.2935μs | 49.2769 KOps/s | 48.1950 KOps/s | |
test_plain_set_stack_nested | 40.4560μs | 20.2741μs | 49.3240 KOps/s | 46.7545 KOps/s | |
test_plain_set_nested_inplace | 64.9810μs | 22.1341μs | 45.1791 KOps/s | 44.2036 KOps/s | |
test_plain_set_stack_nested_inplace | 69.3900μs | 22.3665μs | 44.7098 KOps/s | 44.7502 KOps/s | |
test_items | 18.0340μs | 4.1504μs | 240.9414 KOps/s | 236.3517 KOps/s | |
test_items_nested | 0.5622ms | 0.3298ms | 3.0324 KOps/s | 3.0677 KOps/s | |
test_items_nested_locked | 0.6390ms | 0.3299ms | 3.0314 KOps/s | 3.0491 KOps/s | |
test_items_nested_leaf | 0.1573ms | 84.9869μs | 11.7665 KOps/s | 11.6020 KOps/s | |
test_items_stack_nested | 0.6346ms | 0.3315ms | 3.0166 KOps/s | 3.0437 KOps/s | |
test_items_stack_nested_leaf | 0.1584ms | 86.6158μs | 11.5452 KOps/s | 11.5263 KOps/s | |
test_items_stack_nested_locked | 0.6486ms | 0.3379ms | 2.9593 KOps/s | 3.0109 KOps/s | |
test_keys | 23.0130μs | 3.6910μs | 270.9295 KOps/s | 277.0112 KOps/s | |
test_keys_nested | 0.2683ms | 97.6110μs | 10.2448 KOps/s | 10.5283 KOps/s | |
test_keys_nested_locked | 0.7958ms | 0.1025ms | 9.7570 KOps/s | 10.1924 KOps/s | |
test_keys_nested_leaf | 0.1583ms | 82.6219μs | 12.1033 KOps/s | 12.7868 KOps/s | |
test_keys_stack_nested | 0.2019ms | 97.6924μs | 10.2362 KOps/s | 10.4511 KOps/s | |
test_keys_stack_nested_leaf | 0.1796ms | 82.4706μs | 12.1255 KOps/s | 12.5887 KOps/s | |
test_keys_stack_nested_locked | 0.3090ms | 0.1025ms | 9.7589 KOps/s | 10.0299 KOps/s | |
test_values | 10.8282μs | 1.0767μs | 928.7664 KOps/s | 1.0048 MOps/s | |
test_values_nested | 96.7000μs | 47.7035μs | 20.9628 KOps/s | 21.1891 KOps/s | |
test_values_nested_locked | 0.1220ms | 47.3255μs | 21.1303 KOps/s | 21.0452 KOps/s | |
test_values_nested_leaf | 97.7920μs | 42.0639μs | 23.7733 KOps/s | 23.7393 KOps/s | |
test_values_stack_nested | 98.8570μs | 47.6410μs | 20.9903 KOps/s | 20.8766 KOps/s | |
test_values_stack_nested_leaf | 0.1074ms | 42.4419μs | 23.5616 KOps/s | 23.9206 KOps/s | |
test_values_stack_nested_locked | 95.8890μs | 48.1760μs | 20.7572 KOps/s | 21.0088 KOps/s | |
test_membership | 17.4620μs | 0.9270μs | 1.0788 MOps/s | 1.1799 MOps/s | |
test_membership_nested | 49.2120μs | 2.5806μs | 387.4999 KOps/s | 379.7985 KOps/s | |
test_membership_nested_leaf | 20.3480μs | 2.5677μs | 389.4509 KOps/s | 365.7579 KOps/s | |
test_membership_stacked_nested | 50.0130μs | 2.5734μs | 388.5879 KOps/s | 386.2551 KOps/s | |
test_membership_stacked_nested_leaf | 17.6330μs | 2.5907μs | 385.9894 KOps/s | 381.2843 KOps/s | |
test_membership_nested_last | 50.9050μs | 3.8850μs | 257.4010 KOps/s | 263.9171 KOps/s | |
test_membership_nested_leaf_last | 40.5360μs | 3.8082μs | 262.5890 KOps/s | 261.4246 KOps/s | |
test_membership_stacked_nested_last | 44.7830μs | 3.8342μs | 260.8085 KOps/s | 187.8979 KOps/s | |
test_membership_stacked_nested_leaf_last | 29.1440μs | 3.8692μs | 258.4527 KOps/s | 186.9683 KOps/s | |
test_nested_getleaf | 52.3370μs | 11.2425μs | 88.9478 KOps/s | 93.7320 KOps/s | |
test_nested_get | 65.3420μs | 10.0614μs | 99.3897 KOps/s | 97.6181 KOps/s | |
test_stacked_getleaf | 36.0280μs | 10.6302μs | 94.0716 KOps/s | 92.5325 KOps/s | |
test_stacked_get | 54.0810μs | 10.0832μs | 99.1753 KOps/s | 98.6730 KOps/s | |
test_nested_getitemleaf | 59.5710μs | 10.9185μs | 91.5873 KOps/s | 90.1808 KOps/s | |
test_nested_getitem | 34.6640μs | 10.1537μs | 98.4863 KOps/s | 96.3202 KOps/s | |
test_stacked_getitemleaf | 55.7540μs | 10.9515μs | 91.3116 KOps/s | 91.5176 KOps/s | |
test_stacked_getitem | 31.0270μs | 10.1550μs | 98.4733 KOps/s | 98.7140 KOps/s | |
test_lock_nested | 1.1537ms | 0.4853ms | 2.0608 KOps/s | 2.0636 KOps/s | |
test_lock_stack_nested | 0.8272ms | 0.4533ms | 2.2062 KOps/s | 2.2181 KOps/s | |
test_unlock_nested | 95.0731ms | 0.4981ms | 2.0076 KOps/s | 2.4515 KOps/s | |
test_unlock_stack_nested | 0.7198ms | 0.3687ms | 2.7121 KOps/s | 2.6688 KOps/s | |
test_flatten_speed | 0.6353ms | 0.1096ms | 9.1200 KOps/s | 9.5276 KOps/s | |
test_unflatten_speed | 0.6418ms | 0.4692ms | 2.1312 KOps/s | 2.1521 KOps/s | |
test_common_ops | 5.2125ms | 1.1294ms | 885.4351 Ops/s | 886.7956 Ops/s | |
test_creation | 29.6960μs | 2.0851μs | 479.5972 KOps/s | 478.8909 KOps/s | |
test_creation_empty | 79.5810μs | 17.4802μs | 57.2077 KOps/s | 54.3249 KOps/s | |
test_creation_nested_1 | 46.8570μs | 20.2265μs | 49.4402 KOps/s | 46.6593 KOps/s | |
test_creation_nested_2 | 94.4130μs | 24.3210μs | 41.1167 KOps/s | 37.7734 KOps/s | |
test_clone | 0.1906ms | 17.3746μs | 57.5553 KOps/s | 55.1652 KOps/s | |
test_getitem[int] | 0.7693ms | 16.4333μs | 60.8519 KOps/s | 58.6557 KOps/s | |
test_getitem[slice_int] | 0.1361ms | 30.0574μs | 33.2697 KOps/s | 31.8464 KOps/s | |
test_getitem[range] | 0.1920ms | 58.0061μs | 17.2396 KOps/s | 17.2734 KOps/s | |
test_getitem[tuple] | 0.1291ms | 25.0450μs | 39.9281 KOps/s | 38.9486 KOps/s | |
test_getitem[list] | 0.3352ms | 53.3772μs | 18.7346 KOps/s | 18.8283 KOps/s | |
test_setitem_dim[int] | 54.5210μs | 33.3627μs | 29.9736 KOps/s | 29.0004 KOps/s | |
test_setitem_dim[slice_int] | 0.1061ms | 61.2532μs | 16.3257 KOps/s | 15.8716 KOps/s | |
test_setitem_dim[range] | 0.1560ms | 84.1818μs | 11.8790 KOps/s | 11.4403 KOps/s | |
test_setitem_dim[tuple] | 0.1299ms | 51.2740μs | 19.5031 KOps/s | 19.6915 KOps/s | |
test_setitem | 0.1974ms | 30.3870μs | 32.9088 KOps/s | 32.2092 KOps/s | |
test_set | 0.1545ms | 29.9652μs | 33.3721 KOps/s | 32.7659 KOps/s | |
test_set_shared | 1.3080ms | 0.2217ms | 4.5113 KOps/s | 4.7228 KOps/s | |
test_update | 0.1823ms | 36.6645μs | 27.2743 KOps/s | 26.7648 KOps/s | |
test_update_nested | 0.1850ms | 47.1423μs | 21.2124 KOps/s | 20.5825 KOps/s | |
test_update__nested | 0.1902ms | 35.7440μs | 27.9767 KOps/s | 27.1634 KOps/s | |
test_set_nested | 0.1955ms | 32.3786μs | 30.8846 KOps/s | 29.9505 KOps/s | |
test_set_nested_new | 0.1753ms | 37.7761μs | 26.4718 KOps/s | 25.7835 KOps/s | |
test_select | 0.2235ms | 55.0188μs | 18.1756 KOps/s | 17.4702 KOps/s | |
test_select_nested | 0.1162ms | 59.4482μs | 16.8214 KOps/s | 15.5991 KOps/s | |
test_exclude_nested | 0.1409ms | 75.4135μs | 13.2602 KOps/s | 12.5679 KOps/s | |
test_empty[True] | 0.5865ms | 0.3153ms | 3.1714 KOps/s | 3.1584 KOps/s | |
test_empty[False] | 13.1043μs | 1.1972μs | 835.3161 KOps/s | 815.3780 KOps/s | |
test_unbind_speed | 0.5586ms | 0.2950ms | 3.3895 KOps/s | 3.2168 KOps/s | |
test_unbind_speed_stack0 | 0.5148ms | 0.2951ms | 3.3885 KOps/s | 3.3218 KOps/s | |
test_unbind_speed_stack1 | 0.1002s | 0.8245ms | 1.2129 KOps/s | 1.3490 KOps/s | |
test_split | 2.2368ms | 1.9894ms | 502.6717 Ops/s | 442.3783 Ops/s | |
test_chunk | 0.1004s | 2.3722ms | 421.5580 Ops/s | 437.1694 Ops/s | |
test_creation[device0] | 0.2404ms | 0.1180ms | 8.4764 KOps/s | 8.3648 KOps/s | |
test_creation_from_tensor | 3.5688ms | 0.1190ms | 8.4058 KOps/s | 8.6127 KOps/s | |
test_add_one[memmap_tensor0] | 0.1924ms | 7.5459μs | 132.5227 KOps/s | 131.8736 KOps/s | |
test_contiguous[memmap_tensor0] | 28.4940μs | 1.8931μs | 528.2217 KOps/s | 519.2956 KOps/s | |
test_stack[memmap_tensor0] | 51.8160μs | 5.5005μs | 181.8027 KOps/s | 173.6110 KOps/s | |
test_memmaptd_index | 1.0791ms | 0.3992ms | 2.5048 KOps/s | 2.4579 KOps/s | |
test_memmaptd_index_astensor | 0.9725ms | 0.4803ms | 2.0818 KOps/s | 2.0242 KOps/s | |
test_memmaptd_index_op | 1.5788ms | 1.0090ms | 991.0662 Ops/s | 961.3237 Ops/s | |
test_serialize_model | 0.1286s | 0.1187s | 8.4239 Ops/s | 8.2945 Ops/s | |
test_serialize_model_pickle | 0.4501s | 0.3885s | 2.5743 Ops/s | 2.5043 Ops/s | |
test_serialize_weights | 0.1253s | 0.1166s | 8.5768 Ops/s | 8.4700 Ops/s | |
test_serialize_weights_returnearly | 0.2714s | 0.1739s | 5.7518 Ops/s | 6.3371 Ops/s | |
test_serialize_weights_pickle | 0.4633s | 0.4071s | 2.4563 Ops/s | 1.0700 Ops/s | |
test_serialize_weights_filesystem | 0.1454s | 0.1393s | 7.1805 Ops/s | 6.9728 Ops/s | |
test_serialize_model_filesystem | 0.1592s | 0.1494s | 6.6917 Ops/s | 6.1864 Ops/s | |
test_reshape_pytree | 87.9230μs | 38.1607μs | 26.2050 KOps/s | 26.0186 KOps/s | |
test_reshape_td | 96.0180μs | 45.4522μs | 22.0011 KOps/s | 20.0203 KOps/s | |
test_view_pytree | 0.1150ms | 37.7418μs | 26.4958 KOps/s | 25.9513 KOps/s | |
test_view_td | 0.1323ms | 51.1945μs | 19.5333 KOps/s | 18.1305 KOps/s | |
test_unbind_pytree | 0.1091ms | 35.3821μs | 28.2629 KOps/s | 28.0645 KOps/s | |
test_unbind_td | 0.3179ms | 44.0500μs | 22.7015 KOps/s | 22.1488 KOps/s | |
test_split_pytree | 0.1150ms | 37.8935μs | 26.3898 KOps/s | 26.7752 KOps/s | |
test_split_td | 0.4719ms | 56.7056μs | 17.6350 KOps/s | 16.9400 KOps/s | |
test_add_pytree | 0.1046ms | 44.2861μs | 22.5805 KOps/s | 22.6278 KOps/s | |
test_add_td | 0.1751ms | 80.8281μs | 12.3719 KOps/s | 11.5184 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1230ms | 55.9847μs | 17.8620 KOps/s | 17.5972 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.4491ms | 0.1913ms | 5.2285 KOps/s | 5.1628 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1092ms | 55.6432μs | 17.9717 KOps/s | 17.5278 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.3122ms | 0.1432ms | 6.9844 KOps/s | 7.1369 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.1330ms | 20.5550μs | 48.6501 KOps/s | 47.5267 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1475ms | 67.8650μs | 14.7351 KOps/s | 13.9720 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1517ms | 77.0162μs | 12.9843 KOps/s | 13.3580 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1437ms | 69.6883μs | 14.3496 KOps/s | 14.8160 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3579ms | 0.1709ms | 5.8531 KOps/s | 5.7329 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3771ms | 0.1887ms | 5.3006 KOps/s | 4.9766 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1056ms | 45.5960μs | 21.9318 KOps/s | 20.9762 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.6287ms | 70.1591μs | 14.2533 KOps/s | 14.0212 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.5138ms | 0.1782ms | 5.6127 KOps/s | 5.7211 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.5411ms | 0.2925ms | 3.4194 KOps/s | 3.4210 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3260ms | 0.2009ms | 4.9766 KOps/s | 4.7272 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.3645ms | 0.1729ms | 5.7848 KOps/s | 5.7723 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2768ms | 63.3542μs | 15.7843 KOps/s | 15.3514 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1187ms | 46.9298μs | 21.3084 KOps/s | 20.8859 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.4222ms | 0.2295ms | 4.3568 KOps/s | 4.1970 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2851ms | 0.1744ms | 5.7324 KOps/s | 5.6980 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.2272ms | 0.1020ms | 9.8051 KOps/s | 9.6413 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1297ms | 56.5739μs | 17.6760 KOps/s | 17.2323 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1783ms | 78.0534μs | 12.8117 KOps/s | 13.1015 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1296ms | 69.1689μs | 14.4574 KOps/s | 14.0662 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.3072ms | 0.1981ms | 5.0482 KOps/s | 5.0185 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 1.9909ms | 1.6202ms | 617.2111 Ops/s | 581.5811 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.3910ms | 0.1961ms | 5.0991 KOps/s | 5.2101 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.3513ms | 1.0979ms | 910.8126 Ops/s | 894.0810 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.7460ms | 0.4207ms | 2.3772 KOps/s | 2.3856 KOps/s | |
test_compile_assign_and_add_stack[eager] | 3.9110ms | 3.6417ms | 274.5944 Ops/s | 263.9524 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1221ms | 32.8905μs | 30.4039 KOps/s | 28.2334 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 1.0511ms | 49.0853μs | 20.3727 KOps/s | 20.3076 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1111ms | 29.4633μs | 33.9406 KOps/s | 32.8467 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 75.5410μs | 29.0467μs | 34.4274 KOps/s | 35.3385 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 95.7290μs | 28.9491μs | 34.5433 KOps/s | 32.9601 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 73.3270μs | 28.6086μs | 34.9546 KOps/s | 35.0490 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1638ms | 74.3798μs | 13.4445 KOps/s | 13.4693 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.5501ms | 27.3785μs | 36.5250 KOps/s | 34.2444 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1512ms | 69.3353μs | 14.4227 KOps/s | 14.7557 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 95.7880μs | 22.3375μs | 44.7678 KOps/s | 43.6262 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1462ms | 69.0134μs | 14.4899 KOps/s | 14.6067 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 74.4690μs | 22.4651μs | 44.5136 KOps/s | 43.3276 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1459ms | 74.0870μs | 13.4976 KOps/s | 13.5066 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.1046ms | 27.5795μs | 36.2589 KOps/s | 35.0932 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1622ms | 69.1238μs | 14.4668 KOps/s | 14.8093 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.2910ms | 22.4763μs | 44.4913 KOps/s | 43.9466 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1705ms | 69.8266μs | 14.3212 KOps/s | 14.6290 KOps/s | |
test_compile_indexing[int-pytree-eager] | 64.2700μs | 22.1624μs | 45.1215 KOps/s | 44.0837 KOps/s | |
test_mod_add[eager] | 86.5940μs | 24.3472μs | 41.0724 KOps/s | 38.9623 KOps/s | |
test_mod_add[compile] | 0.1124ms | 36.7599μs | 27.2035 KOps/s | 25.4371 KOps/s | |
test_mod_add[compile-overhead] | 0.1103ms | 37.4027μs | 26.7360 KOps/s | 25.5667 KOps/s | |
test_mod_wrap[eager] | 0.3999ms | 0.2059ms | 4.8574 KOps/s | 4.7879 KOps/s | |
test_mod_wrap[compile] | 0.4364ms | 0.2286ms | 4.3743 KOps/s | 4.2372 KOps/s | |
test_mod_wrap[compile-overhead] | 0.6142ms | 0.2303ms | 4.3425 KOps/s | 4.3084 KOps/s | |
test_mod_wrap_and_backward[eager] | 11.6447ms | 10.7584ms | 92.9502 Ops/s | 83.0654 Ops/s | |
test_mod_wrap_and_backward[compile] | 12.6682ms | 11.2616ms | 88.7974 Ops/s | 80.9293 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 12.8029ms | 11.7793ms | 84.8949 Ops/s | 76.8817 Ops/s | |
test_seq_add[eager] | 0.2220ms | 91.2945μs | 10.9536 KOps/s | 10.8818 KOps/s | |
test_seq_add[compile] | 0.1412ms | 62.2768μs | 16.0573 KOps/s | 15.8304 KOps/s | |
test_seq_add[compile-overhead] | 0.1345ms | 60.3700μs | 16.5645 KOps/s | 16.0084 KOps/s | |
test_seq_wrap[eager] | 0.9736ms | 0.3842ms | 2.6028 KOps/s | 2.6324 KOps/s | |
test_seq_wrap[compile] | 1.3953ms | 0.2676ms | 3.7376 KOps/s | 3.7310 KOps/s | |
test_seq_wrap[compile-overhead] | 1.3553ms | 0.2638ms | 3.7912 KOps/s | 3.7460 KOps/s | |
test_func_call_runtime[False-eager] | 0.9140ms | 0.5100ms | 1.9607 KOps/s | 1.9093 KOps/s | |
test_func_call_runtime[False-compile] | 1.0596ms | 0.5069ms | 1.9729 KOps/s | 1.9354 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.6384ms | 0.4962ms | 2.0153 KOps/s | 1.9451 KOps/s | |
test_func_call_runtime[True-eager] | 1.1820ms | 0.7278ms | 1.3740 KOps/s | 1.3395 KOps/s | |
test_func_call_runtime[True-compile] | 0.8449ms | 0.5120ms | 1.9533 KOps/s | 1.8909 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.8746ms | 0.5095ms | 1.9627 KOps/s | 1.9130 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8985ms | 0.5053ms | 1.9789 KOps/s | 1.9385 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.6190ms | 0.4984ms | 2.0063 KOps/s | 1.9294 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.6502ms | 0.4988ms | 2.0049 KOps/s | 1.9384 KOps/s | |
test_func_call_cm_runtime[True-eager] | 0.9924ms | 0.8487ms | 1.1782 KOps/s | 1.1234 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.0836ms | 0.7246ms | 1.3801 KOps/s | 1.3542 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.2257ms | 0.7350ms | 1.3606 KOps/s | 1.3427 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.4160ms | 1.8641ms | 536.4582 Ops/s | 519.7422 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 2.6960ms | 1.9084ms | 524.0116 Ops/s | 505.5672 Ops/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 2.6632ms | 1.9053ms | 524.8545 Ops/s | 506.0725 Ops/s | |
test_distributed | 0.2607ms | 0.1242ms | 8.0518 KOps/s | 7.7921 KOps/s | |
test_tdmodule | 38.1920μs | 16.9065μs | 59.1490 KOps/s | 53.2644 KOps/s | |
test_tdmodule_dispatch | 54.3320μs | 35.3376μs | 28.2985 KOps/s | 27.4899 KOps/s | |
test_tdseq | 42.1180μs | 20.7290μs | 48.2415 KOps/s | 45.8338 KOps/s | |
test_tdseq_dispatch | 71.5230μs | 40.8272μs | 24.4935 KOps/s | 23.7227 KOps/s | |
test_instantiation_functorch | 1.7908ms | 1.5856ms | 630.6907 Ops/s | 613.4312 Ops/s | |
test_instantiation_td | 1.9742ms | 1.1625ms | 860.1888 Ops/s | 843.1359 Ops/s | |
test_exec_functorch | 0.4194ms | 0.1826ms | 5.4777 KOps/s | 5.4454 KOps/s | |
test_exec_functional_call | 0.2801ms | 0.1719ms | 5.8173 KOps/s | 5.7499 KOps/s | |
test_exec_td | 0.2430ms | 0.1670ms | 5.9872 KOps/s | 5.8888 KOps/s | |
test_exec_td_decorator | 1.0469ms | 0.2199ms | 4.5471 KOps/s | 4.3907 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.9610ms | 0.6419ms | 1.5580 KOps/s | 1.5537 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.8668ms | 0.6456ms | 1.5490 KOps/s | 1.5673 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.8030ms | 0.4988ms | 2.0049 KOps/s | 2.0044 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7794ms | 0.4966ms | 2.0138 KOps/s | 2.0322 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.4049ms | 0.6247ms | 1.6008 KOps/s | 1.5974 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0215ms | 0.6189ms | 1.6159 KOps/s | 1.5905 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7300ms | 0.5062ms | 1.9757 KOps/s | 1.8883 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8666ms | 0.5063ms | 1.9750 KOps/s | 1.9434 KOps/s | |
test_to_module_speed[True] | 1.5307ms | 1.2838ms | 778.9516 Ops/s | 755.1510 Ops/s | |
test_to_module_speed[False] | 1.3788ms | 1.2464ms | 802.3412 Ops/s | 779.9303 Ops/s | |
test_tc_init | 76.4920μs | 44.6406μs | 22.4012 KOps/s | 21.8162 KOps/s | |
test_tc_init_nested | 0.1589ms | 86.5809μs | 11.5499 KOps/s | 11.1188 KOps/s | |
test_tc_first_layer_tensor | 16.8410μs | 1.5769μs | 634.1392 KOps/s | 634.0496 KOps/s | |
test_tc_first_layer_nontensor | 46.7770μs | 4.8887μs | 204.5529 KOps/s | 208.9749 KOps/s | |
test_tc_second_layer_tensor | 18.0430μs | 2.8746μs | 347.8732 KOps/s | 342.8935 KOps/s | |
test_tc_second_layer_nontensor | 46.9070μs | 6.3099μs | 158.4811 KOps/s | 159.8643 KOps/s | |
test_unbind | 0.4668s | 13.1897ms | 75.8165 Ops/s | 77.2346 Ops/s | |
test_full_like | 7.8778ms | 6.8227ms | 146.5697 Ops/s | 123.1951 Ops/s | |
test_zeros_like | 12.1542ms | 6.5106ms | 153.5945 Ops/s | 343.8066 Ops/s | |
test_ones_like | 17.3182ms | 8.1008ms | 123.4452 Ops/s | 283.8320 Ops/s | |
test_clone | 16.0662ms | 9.6616ms | 103.5024 Ops/s | 190.0642 Ops/s | |
test_squeeze | 70.1110μs | 12.8497μs | 77.8226 KOps/s | 77.2793 KOps/s | |
test_unsqueeze | 0.1780ms | 94.6330μs | 10.5671 KOps/s | 10.6218 KOps/s | |
test_split | 0.5272ms | 0.1945ms | 5.1414 KOps/s | 4.9864 KOps/s | |
test_permute | 0.5094ms | 0.2277ms | 4.3917 KOps/s | 4.3368 KOps/s | |
test_stack | 27.9671ms | 25.0587ms | 39.9063 Ops/s | 39.7633 Ops/s | |
test_cat | 26.6968ms | 24.8799ms | 40.1931 Ops/s | 40.6181 Ops/s | |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.6514ms | 14.9261μs | 66.9967 KOps/s | 66.4069 KOps/s | |
test_plain_set_stack_nested | 39.6020μs | 15.0438μs | 66.4727 KOps/s | 65.5422 KOps/s | |
test_plain_set_nested_inplace | 51.8030μs | 15.9367μs | 62.7484 KOps/s | 62.2163 KOps/s | |
test_plain_set_stack_nested_inplace | 47.5920μs | 15.7533μs | 63.4787 KOps/s | 62.6470 KOps/s | |
test_items | 26.5310μs | 2.8922μs | 345.7579 KOps/s | 345.4870 KOps/s | |
test_items_nested | 0.3550ms | 0.3158ms | 3.1668 KOps/s | 3.2197 KOps/s | |
test_items_nested_locked | 0.3633ms | 0.3157ms | 3.1678 KOps/s | 3.1755 KOps/s | |
test_items_nested_leaf | 85.2940μs | 63.5398μs | 15.7382 KOps/s | 15.8589 KOps/s | |
test_items_stack_nested | 0.3662ms | 0.3192ms | 3.1326 KOps/s | 3.2007 KOps/s | |
test_items_stack_nested_leaf | 0.1179ms | 64.6095μs | 15.4776 KOps/s | 15.3885 KOps/s | |
test_items_stack_nested_locked | 0.4911ms | 0.3172ms | 3.1523 KOps/s | 3.1736 KOps/s | |
test_keys | 41.1320μs | 3.4333μs | 291.2648 KOps/s | 293.8637 KOps/s | |
test_keys_nested | 81.0450μs | 55.2874μs | 18.0873 KOps/s | 18.0979 KOps/s | |
test_keys_nested_locked | 1.9791ms | 60.4679μs | 16.5377 KOps/s | 16.6672 KOps/s | |
test_keys_nested_leaf | 75.4850μs | 46.8898μs | 21.3266 KOps/s | 21.4808 KOps/s | |
test_keys_stack_nested | 89.3750μs | 55.5851μs | 17.9904 KOps/s | 18.1204 KOps/s | |
test_keys_stack_nested_leaf | 72.2240μs | 47.0151μs | 21.2698 KOps/s | 21.1171 KOps/s | |
test_keys_stack_nested_locked | 98.5550μs | 60.4043μs | 16.5551 KOps/s | 16.5501 KOps/s | |
test_values | 5.2803μs | 0.8063μs | 1.2403 MOps/s | 1.2466 MOps/s | |
test_values_nested | 55.1430μs | 27.4572μs | 36.4203 KOps/s | 36.2484 KOps/s | |
test_values_nested_locked | 55.3330μs | 29.3581μs | 34.0622 KOps/s | 34.1040 KOps/s | |
test_values_nested_leaf | 52.1730μs | 24.1460μs | 41.4148 KOps/s | 40.9809 KOps/s | |
test_values_stack_nested | 70.2340μs | 27.9314μs | 35.8020 KOps/s | 34.6152 KOps/s | |
test_values_stack_nested_leaf | 0.1904ms | 24.5474μs | 40.7375 KOps/s | 39.4354 KOps/s | |
test_values_stack_nested_locked | 0.1368ms | 29.8349μs | 33.5178 KOps/s | 32.4375 KOps/s | |
test_membership | 1.7696μs | 0.4702μs | 2.1269 MOps/s | 2.1291 MOps/s | |
test_membership_nested | 16.9010μs | 1.7436μs | 573.5240 KOps/s | 572.5567 KOps/s | |
test_membership_nested_leaf | 11.5107μs | 1.6966μs | 589.4073 KOps/s | 581.7905 KOps/s | |
test_membership_stacked_nested | 50.3130μs | 1.7742μs | 563.6446 KOps/s | 562.8179 KOps/s | |
test_membership_stacked_nested_leaf | 17.4910μs | 1.7768μs | 562.8190 KOps/s | 565.5517 KOps/s | |
test_membership_nested_last | 31.4420μs | 2.6454μs | 378.0137 KOps/s | 386.4512 KOps/s | |
test_membership_nested_leaf_last | 24.5120μs | 2.6409μs | 378.6530 KOps/s | 387.2543 KOps/s | |
test_membership_stacked_nested_last | 32.4320μs | 2.6013μs | 384.4222 KOps/s | 311.8175 KOps/s | |
test_membership_stacked_nested_leaf_last | 24.1710μs | 2.6199μs | 381.6881 KOps/s | 313.2047 KOps/s | |
test_nested_getleaf | 37.5420μs | 6.1533μs | 162.5148 KOps/s | 163.7981 KOps/s | |
test_nested_get | 0.2014ms | 5.7306μs | 174.5011 KOps/s | 173.9776 KOps/s | |
test_stacked_getleaf | 30.9610μs | 6.0826μs | 164.4022 KOps/s | 166.3888 KOps/s | |
test_stacked_get | 40.2030μs | 5.7138μs | 175.0152 KOps/s | 177.8551 KOps/s | |
test_nested_getitemleaf | 29.5120μs | 6.1969μs | 161.3722 KOps/s | 162.4935 KOps/s | |
test_nested_getitem | 30.8920μs | 5.7839μs | 172.8928 KOps/s | 172.3563 KOps/s | |
test_stacked_getitemleaf | 52.0320μs | 6.1872μs | 161.6230 KOps/s | 163.4587 KOps/s | |
test_stacked_getitem | 32.6720μs | 5.7103μs | 175.1213 KOps/s | 173.5111 KOps/s | |
test_lock_nested | 4.6840ms | 0.4207ms | 2.3772 KOps/s | 2.3286 KOps/s | |
test_lock_stack_nested | 0.4896ms | 0.3863ms | 2.5888 KOps/s | 2.5613 KOps/s | |
test_unlock_nested | 0.7771ms | 0.3583ms | 2.7907 KOps/s | 2.7175 KOps/s | |
test_unlock_stack_nested | 0.4442ms | 0.3257ms | 3.0704 KOps/s | 3.0250 KOps/s | |
test_flatten_speed | 0.2614ms | 81.2788μs | 12.3033 KOps/s | 12.5452 KOps/s | |
test_unflatten_speed | 0.3220ms | 0.2817ms | 3.5494 KOps/s | 3.5589 KOps/s | |
test_common_ops | 1.4757ms | 1.2784ms | 782.2133 Ops/s | 715.3211 Ops/s | |
test_creation | 22.2810μs | 1.4910μs | 670.7071 KOps/s | 685.7907 KOps/s | |
test_creation_empty | 54.3130μs | 17.2695μs | 57.9055 KOps/s | 58.0553 KOps/s | |
test_creation_nested_1 | 54.5430μs | 19.0441μs | 52.5098 KOps/s | 52.1250 KOps/s | |
test_creation_nested_2 | 51.5030μs | 21.9907μs | 45.4738 KOps/s | 44.8428 KOps/s | |
test_clone | 60.4230μs | 29.1330μs | 34.3254 KOps/s | 33.6433 KOps/s | |
test_getitem[int] | 1.2589ms | 16.3756μs | 61.0665 KOps/s | 59.7698 KOps/s | |
test_getitem[slice_int] | 0.1689ms | 27.8924μs | 35.8520 KOps/s | 34.8501 KOps/s | |
test_getitem[range] | 0.2202ms | 0.1096ms | 9.1257 KOps/s | 9.0603 KOps/s | |
test_getitem[tuple] | 0.1189ms | 23.6985μs | 42.1968 KOps/s | 38.4168 KOps/s | |
test_getitem[list] | 0.2777ms | 97.8718μs | 10.2174 KOps/s | 9.4981 KOps/s | |
test_setitem_dim[int] | 69.1230μs | 44.5219μs | 22.4609 KOps/s | 20.5426 KOps/s | |
test_setitem_dim[slice_int] | 0.2062ms | 67.9904μs | 14.7080 KOps/s | 14.5897 KOps/s | |
test_setitem_dim[range] | 0.1545ms | 0.1268ms | 7.8836 KOps/s | 7.7972 KOps/s | |
test_setitem_dim[tuple] | 0.1887ms | 60.9941μs | 16.3950 KOps/s | 15.3802 KOps/s | |
test_setitem | 0.1886ms | 42.4822μs | 23.5393 KOps/s | 21.2928 KOps/s | |
test_set | 0.2243ms | 43.4876μs | 22.9951 KOps/s | 21.8438 KOps/s | |
test_set_shared | 0.3759ms | 51.3771μs | 19.4639 KOps/s | 19.0598 KOps/s | |
test_update | 0.2019ms | 51.1781μs | 19.5396 KOps/s | 18.0391 KOps/s | |
test_update_nested | 0.2413ms | 58.1344μs | 17.2015 KOps/s | 16.0899 KOps/s | |
test_update__nested | 0.2085ms | 58.3793μs | 17.1293 KOps/s | 15.2957 KOps/s | |
test_set_nested | 0.1889ms | 44.3097μs | 22.5684 KOps/s | 20.7481 KOps/s | |
test_set_nested_new | 0.1969ms | 47.7227μs | 20.9544 KOps/s | 19.3306 KOps/s | |
test_select | 0.5409ms | 61.8753μs | 16.1615 KOps/s | 15.1368 KOps/s | |
test_select_nested | 70.8340μs | 42.3386μs | 23.6191 KOps/s | 23.3867 KOps/s | |
test_exclude_nested | 0.1708ms | 58.5018μs | 17.0935 KOps/s | 16.6223 KOps/s | |
test_empty[True] | 0.2715ms | 0.2431ms | 4.1131 KOps/s | 4.1205 KOps/s | |
test_empty[False] | 3.5252μs | 0.7446μs | 1.3429 MOps/s | 1.3614 MOps/s | |
test_to | 73.3540μs | 25.2846μs | 39.5498 KOps/s | 34.2046 KOps/s | |
test_to_nonblocking | 54.2130μs | 24.5241μs | 40.7761 KOps/s | 34.9949 KOps/s | |
test_unbind_speed | 0.3551ms | 0.2834ms | 3.5281 KOps/s | 3.4417 KOps/s | |
test_unbind_speed_stack0 | 0.4064ms | 0.2826ms | 3.5386 KOps/s | 3.4559 KOps/s | |
test_unbind_speed_stack1 | 93.5340ms | 0.7179ms | 1.3930 KOps/s | 1.3878 KOps/s | |
test_split | 95.4369ms | 2.2020ms | 454.1411 Ops/s | 456.1561 Ops/s | |
test_chunk | 95.4070ms | 2.2101ms | 452.4593 Ops/s | 453.0936 Ops/s | |
test_creation[device0] | 0.3415ms | 0.1262ms | 7.9219 KOps/s | 7.7358 KOps/s | |
test_creation_from_tensor | 0.3751ms | 0.1280ms | 7.8145 KOps/s | 7.4055 KOps/s | |
test_add_one[memmap_tensor0] | 0.1314ms | 8.9055μs | 112.2906 KOps/s | 107.1517 KOps/s | |
test_contiguous[memmap_tensor0] | 38.6720μs | 2.2045μs | 453.6206 KOps/s | 434.0575 KOps/s | |
test_stack[memmap_tensor0] | 35.1420μs | 6.6089μs | 151.3111 KOps/s | 136.7417 KOps/s | |
test_memmaptd_index | 1.0846ms | 0.4308ms | 2.3214 KOps/s | 2.2738 KOps/s | |
test_memmaptd_index_astensor | 0.7358ms | 0.4877ms | 2.0504 KOps/s | 2.0006 KOps/s | |
test_memmaptd_index_op | 1.4485ms | 1.0602ms | 943.2355 Ops/s | 903.1674 Ops/s | |
test_serialize_model | 0.1289s | 0.1285s | 7.7795 Ops/s | 7.7773 Ops/s | |
test_serialize_model_pickle | 1.3709s | 1.2169s | 0.8218 Ops/s | 0.8244 Ops/s | |
test_serialize_weights | 0.2174s | 0.1415s | 7.0683 Ops/s | 7.7707 Ops/s | |
test_serialize_weights_returnearly | 0.2469s | 56.4852ms | 17.7037 Ops/s | 15.7402 Ops/s | |
test_serialize_weights_pickle | 1.3646s | 1.2153s | 0.8228 Ops/s | 0.8219 Ops/s | |
test_reshape_pytree | 0.1223ms | 36.3498μs | 27.5105 KOps/s | 26.1450 KOps/s | |
test_reshape_td | 0.1697ms | 42.4095μs | 23.5796 KOps/s | 22.3848 KOps/s | |
test_view_pytree | 64.3130μs | 36.0009μs | 27.7771 KOps/s | 26.5827 KOps/s | |
test_view_td | 0.1306ms | 47.4525μs | 21.0737 KOps/s | 20.1497 KOps/s | |
test_unbind_pytree | 0.1396ms | 35.1143μs | 28.4784 KOps/s | 27.6548 KOps/s | |
test_unbind_td | 0.4433ms | 43.5653μs | 22.9540 KOps/s | 22.0131 KOps/s | |
test_split_pytree | 0.2231ms | 46.7419μs | 21.3941 KOps/s | 21.2976 KOps/s | |
test_split_td | 0.6813ms | 56.2747μs | 17.7700 KOps/s | 17.5443 KOps/s | |
test_add_pytree | 0.2056ms | 56.5120μs | 17.6954 KOps/s | 17.0978 KOps/s | |
test_add_td | 0.2396ms | 91.6473μs | 10.9114 KOps/s | 10.1651 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.4101ms | 0.2090ms | 4.7848 KOps/s | 4.5530 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.2956ms | 0.1566ms | 6.3848 KOps/s | 6.3454 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.2859ms | 0.1458ms | 6.8572 KOps/s | 6.6952 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.3608ms | 0.1854ms | 5.3923 KOps/s | 5.4190 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 96.2650μs | 21.4911μs | 46.5309 KOps/s | 48.0107 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 91.5750μs | 44.0388μs | 22.7073 KOps/s | 21.8589 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2179ms | 62.4114μs | 16.0227 KOps/s | 15.6158 KOps/s | |
test_compile_copy_nested[pytree-eager] | 81.4540μs | 49.4013μs | 20.2424 KOps/s | 20.2417 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3718ms | 0.3199ms | 3.1264 KOps/s | 2.9682 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3460ms | 0.2072ms | 4.8268 KOps/s | 4.7215 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.2393ms | 0.1276ms | 7.8378 KOps/s | 7.4036 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1230ms | 60.6710μs | 16.4823 KOps/s | 15.4431 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.4645ms | 0.3178ms | 3.1464 KOps/s | 2.8835 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.7824ms | 0.6239ms | 1.6028 KOps/s | 1.5748 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3529ms | 0.2480ms | 4.0322 KOps/s | 3.9637 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.3755ms | 0.3205ms | 3.1197 KOps/s | 2.9339 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1528ms | 71.0347μs | 14.0776 KOps/s | 13.2373 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.2551ms | 0.1287ms | 7.7674 KOps/s | 7.3379 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6917ms | 0.5251ms | 1.9042 KOps/s | 1.8293 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.4046ms | 0.3185ms | 3.1394 KOps/s | 2.9715 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1098ms | 18.4235μs | 54.2786 KOps/s | 53.3083 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 65.5630μs | 27.3091μs | 36.6178 KOps/s | 37.1136 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1127ms | 69.8641μs | 14.3135 KOps/s | 14.3410 KOps/s | |
test_compile_copy_flat[pytree-eager] | 86.9750μs | 51.5708μs | 19.3908 KOps/s | 19.4543 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.3247ms | 0.8121ms | 1.2313 KOps/s | 1.0941 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.5099ms | 3.1492ms | 317.5377 Ops/s | 303.9940 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.2893ms | 0.8028ms | 1.2456 KOps/s | 1.1045 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.3448ms | 3.1310ms | 319.3817 Ops/s | 294.5170 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.2590ms | 0.1088ms | 9.1892 KOps/s | 8.8437 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.2027ms | 60.1744μs | 16.6184 KOps/s | 14.7427 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.2661ms | 0.1037ms | 9.6452 KOps/s | 9.4257 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.2105ms | 45.2470μs | 22.1009 KOps/s | 22.3963 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.2947ms | 0.1068ms | 9.3657 KOps/s | 9.3817 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.2174ms | 45.2771μs | 22.0862 KOps/s | 22.3843 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.2867ms | 0.1394ms | 7.1724 KOps/s | 6.7591 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.2713ms | 25.8138μs | 38.7389 KOps/s | 37.2753 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.2818ms | 0.1326ms | 7.5398 KOps/s | 7.3647 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 70.7940μs | 21.1782μs | 47.2183 KOps/s | 46.5154 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.2983ms | 0.1376ms | 7.2692 KOps/s | 7.3426 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.1422ms | 21.1067μs | 47.3783 KOps/s | 47.2710 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.2778ms | 0.1439ms | 6.9509 KOps/s | 6.8242 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5163ms | 25.7435μs | 38.8447 KOps/s | 38.7692 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2755ms | 0.1358ms | 7.3627 KOps/s | 7.0545 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 60.1640μs | 22.4886μs | 44.4669 KOps/s | 47.7510 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.2970ms | 0.1356ms | 7.3771 KOps/s | 7.0429 KOps/s | |
test_compile_indexing[int-pytree-eager] | 0.3866ms | 22.3070μs | 44.8290 KOps/s | 46.8846 KOps/s | |
test_mod_add[eager] | 0.1962ms | 34.3479μs | 29.1139 KOps/s | 28.4694 KOps/s | |
test_mod_add[compile] | 0.2186ms | 73.3171μs | 13.6394 KOps/s | 13.3429 KOps/s | |
test_mod_add[compile-overhead] | 0.2607ms | 0.1359ms | 7.3610 KOps/s | 6.9809 KOps/s | |
test_mod_wrap[eager] | 0.3864ms | 0.2441ms | 4.0974 KOps/s | 3.7931 KOps/s | |
test_mod_wrap[compile] | 0.4148ms | 0.3112ms | 3.2132 KOps/s | 3.1062 KOps/s | |
test_mod_wrap[compile-overhead] | 7.6371ms | 4.0700ms | 245.7001 Ops/s | 255.1650 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.5508ms | 1.3449ms | 743.5661 Ops/s | 689.5112 Ops/s | |
test_mod_wrap_and_backward[compile] | 2.4506ms | 1.3253ms | 754.5177 Ops/s | 686.7726 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3216ms | 0.9064ms | 1.1033 KOps/s | 971.3738 Ops/s | |
test_seq_add[eager] | 0.2591ms | 0.1041ms | 9.6096 KOps/s | 9.4666 KOps/s | |
test_seq_add[compile] | 0.5786ms | 81.0229μs | 12.3422 KOps/s | 11.6530 KOps/s | |
test_seq_add[compile-overhead] | 0.2559ms | 0.1150ms | 8.6960 KOps/s | 8.3618 KOps/s | |
test_seq_wrap[eager] | 0.5628ms | 0.4045ms | 2.4721 KOps/s | 2.4371 KOps/s | |
test_seq_wrap[compile] | 0.4674ms | 0.3165ms | 3.1593 KOps/s | 2.9739 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3138ms | 0.2257ms | 4.4307 KOps/s | 4.2634 KOps/s | |
test_func_call_runtime[False-eager] | 0.9792ms | 0.7851ms | 1.2738 KOps/s | 1.2522 KOps/s | |
test_func_call_runtime[False-compile] | 0.9952ms | 0.8381ms | 1.1931 KOps/s | 1.2254 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4943ms | 0.3602ms | 2.7759 KOps/s | 2.6744 KOps/s | |
test_func_call_runtime[True-eager] | 1.0529ms | 0.8946ms | 1.1178 KOps/s | 1.0886 KOps/s | |
test_func_call_runtime[True-compile] | 0.9912ms | 0.8276ms | 1.2084 KOps/s | 1.1793 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.5413ms | 0.3944ms | 2.5353 KOps/s | 2.4603 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8320ms | 0.7268ms | 1.3759 KOps/s | 1.3249 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9537ms | 0.8208ms | 1.2184 KOps/s | 1.2195 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4854ms | 0.3631ms | 2.7540 KOps/s | 2.6586 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1361ms | 0.9874ms | 1.0128 KOps/s | 974.6907 Ops/s | |
test_func_call_cm_runtime[True-compile] | 1.0106ms | 0.8526ms | 1.1729 KOps/s | 1.1243 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.5669ms | 0.4198ms | 2.3820 KOps/s | 2.2912 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5694ms | 2.0724ms | 482.5296 Ops/s | 478.0238 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 1.0803ms | 0.8977ms | 1.1140 KOps/s | 1.1174 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5252ms | 0.4230ms | 2.3640 KOps/s | 2.2650 KOps/s | |
test_distributed | 3.1119ms | 0.2189ms | 4.5688 KOps/s | 8.4800 KOps/s | |
test_tdmodule | 45.9230μs | 15.1584μs | 65.9702 KOps/s | 66.6670 KOps/s | |
test_tdmodule_dispatch | 51.3720μs | 30.3527μs | 32.9460 KOps/s | 32.3328 KOps/s | |
test_tdseq | 35.5420μs | 15.9980μs | 62.5077 KOps/s | 63.8824 KOps/s | |
test_tdseq_dispatch | 63.5530μs | 33.1984μs | 30.1220 KOps/s | 29.8781 KOps/s | |
test_instantiation_functorch | 2.0231ms | 1.8516ms | 540.0629 Ops/s | 521.4223 Ops/s | |
test_instantiation_td | 1.7997ms | 1.1885ms | 841.3810 Ops/s | 816.5350 Ops/s | |
test_exec_functorch | 0.2901ms | 0.2094ms | 4.7752 KOps/s | 4.6653 KOps/s | |
test_exec_functional_call | 0.3567ms | 0.2086ms | 4.7939 KOps/s | 4.6622 KOps/s | |
test_exec_td | 0.3587ms | 0.2137ms | 4.6804 KOps/s | 4.5080 KOps/s | |
test_exec_td_decorator | 0.5882ms | 0.2545ms | 3.9288 KOps/s | 3.6734 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.8233ms | 0.6853ms | 1.4591 KOps/s | 1.3949 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.8225ms | 0.6831ms | 1.4639 KOps/s | 1.3809 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.9909ms | 0.5727ms | 1.7462 KOps/s | 1.6439 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7303ms | 0.5746ms | 1.7405 KOps/s | 1.6340 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8747ms | 0.6971ms | 1.4345 KOps/s | 1.4164 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0107ms | 0.7009ms | 1.4266 KOps/s | 1.4089 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7382ms | 0.5863ms | 1.7056 KOps/s | 1.6062 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8015ms | 0.6134ms | 1.6302 KOps/s | 1.6065 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.5432ms | 8.3447ms | 119.8372 Ops/s | 117.4525 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.5293ms | 8.3130ms | 120.2939 Ops/s | 117.6782 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.4607ms | 8.1389ms | 122.8667 Ops/s | 120.2123 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.5834ms | 8.1946ms | 122.0311 Ops/s | 120.6638 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 20.3675ms | 19.5528ms | 51.1434 Ops/s | 50.3290 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.2711ms | 19.6276ms | 50.9488 Ops/s | 50.7365 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 20.2029ms | 19.5081ms | 51.2607 Ops/s | 51.2882 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 20.1613ms | 19.4185ms | 51.4973 Ops/s | 51.0228 Ops/s | |
test_to_module_speed[True] | 1.5128ms | 0.9251ms | 1.0809 KOps/s | 1.0623 KOps/s | |
test_to_module_speed[False] | 1.3270ms | 0.8968ms | 1.1151 KOps/s | 1.0804 KOps/s | |
test_tc_init | 71.6240μs | 34.7541μs | 28.7736 KOps/s | 28.5148 KOps/s | |
test_tc_init_nested | 0.1079ms | 70.0218μs | 14.2813 KOps/s | 13.6737 KOps/s | |
test_tc_first_layer_tensor | 5.0817μs | 0.6548μs | 1.5271 MOps/s | 1.4771 MOps/s | |
test_tc_first_layer_nontensor | 27.4410μs | 2.2216μs | 450.1296 KOps/s | 451.4787 KOps/s | |
test_tc_second_layer_tensor | 9.5403μs | 1.4135μs | 707.4777 KOps/s | 726.4368 KOps/s | |
test_tc_second_layer_nontensor | 82.0840μs | 2.9802μs | 335.5438 KOps/s | 339.1467 KOps/s | |
test_unbind | 0.1962s | 11.9545ms | 83.6507 Ops/s | 92.4827 Ops/s | |
test_full_like | 0.7522ms | 0.5752ms | 1.7386 KOps/s | 1.7306 KOps/s | |
test_zeros_like | 0.3535ms | 0.1981ms | 5.0468 KOps/s | 5.0517 KOps/s | |
test_ones_like | 0.3570ms | 0.1981ms | 5.0475 KOps/s | 5.0547 KOps/s | |
test_clone | 0.5659ms | 0.4140ms | 2.4156 KOps/s | 2.4102 KOps/s | |
test_squeeze | 0.1542ms | 10.8458μs | 92.2014 KOps/s | 96.0070 KOps/s | |
test_unsqueeze | 0.2372ms | 78.1162μs | 12.8014 KOps/s | 13.0979 KOps/s | |
test_split | 0.3846ms | 0.1589ms | 6.2951 KOps/s | 6.0507 KOps/s | |
test_permute | 0.2407ms | 0.1809ms | 5.5280 KOps/s | 5.2144 KOps/s | |
test_stack | 1.3995ms | 0.8780ms | 1.1389 KOps/s | 1.1679 KOps/s | |
test_cat | 1.2587ms | 1.2318ms | 811.8374 Ops/s | 811.7973 Ops/s | |
Labels: CLA Signed, enhancement
Adds a `CudaGraphModule` class that provides a user-friendly interface to CUDA graphs for PyTorch callables.
The class enables fast GPU execution with minimal CPU overhead, because the captured kernel launches are replayed as a single graph, while performing essential checks on the input functions. Documentation and example usage are included.
cc @mikaylagawarecki @albanD @eellison @BoyuanFeng @Chillee
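For readers unfamiliar with CUDA graphs, here is a minimal sketch of the capture-and-replay pattern that a wrapper like `CudaGraphModule` builds on, using only the public `torch.cuda.CUDAGraph` and `torch.cuda.graph` API. It is not the PR's implementation: `make_graphed` and its shape/dtype check are hypothetical illustrations of the kind of input validation such a wrapper needs, and the sketch falls back to eager execution when no CUDA device is available.

```python
import torch


def make_graphed(fn, static_input, warmup=3):
    """Hypothetical helper: capture fn(static_input) in a CUDA graph.

    Returns a callable that copies new inputs into the captured buffer and
    replays the graph. Falls back to eager execution without a GPU.
    """

    def check(x):
        # A captured graph reads from fixed memory, so new inputs must match
        # the captured buffer; copy_ could otherwise broadcast or cast silently.
        if x.shape != static_input.shape or x.dtype != static_input.dtype:
            raise ValueError(
                f"expected shape {tuple(static_input.shape)} and dtype "
                f"{static_input.dtype}, got {tuple(x.shape)} and {x.dtype}"
            )

    if not torch.cuda.is_available():
        def eager(x):
            check(x)
            return fn(x)
        return eager

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(warmup):
            fn(static_input)
    torch.cuda.current_stream().wait_stream(side)

    # Record every kernel launched by fn into a single graph.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = fn(static_input)

    def replay(x):
        check(x)
        static_input.copy_(x)  # write new data into the captured input buffer
        graph.replay()  # relaunch all recorded kernels with one CPU call
        return static_output.clone()  # detach the result from the static buffer

    return replay


def fn(x):
    return (x * 2 + 1).relu()


device = "cuda" if torch.cuda.is_available() else "cpu"
static_x = torch.zeros(4, device=device)
graphed = make_graphed(fn, static_x)
out = graphed(torch.arange(4.0, device=device) - 1)
```

Outputs are cloned because the graph overwrites `static_output` on every replay; a caller that consumes results immediately could skip the clone to save a copy.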