forked from pytorch/torchrec
Commit
Summary:

# context
* added comments and renamed some variables for better readability

# explicit assumptions
* the constraint is the total_hbm
* the objective function is the **sum** of the perf metric over all tables

# implicit assumptions
* each table is evenly sharded across all the devices, or can be treated as evenly sharded, so we don't need to consider the bottleneck effect
* devices don't need to be identical, i.e., they could have different hbm storage
* each hbm_bin is distributed over all the devices (equivalently)

# variables
* `proposal`: includes a sharding option for each table
* `proposal_list`: a list of `proposal`s, each of which has the best perf under a given total_hbm constraint
* `hbm_by_fqn`: memory constraint lookup table: [table_id][sharding_option_id]
* `perf_by_fqn`: performance metrics lookup table: [table_id][sharding_option_id]

NOTE: hbm is measured in units of `bin`, such as 0.4 bin, 1.6 bin, etc.

# dp table
* dimensions: `table_count` x `bin_count` x `case`, where `case` is a tuple of (`perf`, `hbm`)
* the dp table caches the best **case** that covers tables [0 - table_i], under a given `hbm`

NOTE: **memory complexity** is `table_count` x `bin_count`, assuming `bin_count` >> `option_count`, otherwise `table_count` x `option_count`. **time complexity** is `table_count` x `bin_count` x `option_count`

* first, `table_i` loops over the tables to add each table one by one
* second, `option_j` loops over the options of the current `table_i`
* third, `hbm` loops over the bin_count for each hbm constraint to find the minimal perf

# usage
* the dp algorithm is only called once, when `self._inited == False`
* `propose` returns a proposal from the `proposal_list`
* each call to `feedback` moves the current proposal to the next one (from a higher hbm constraint to a lower one)

Differential Revision: D61565731
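The dp table described above can be illustrated with a minimal, self-contained sketch. This is not the code in this commit: the function name `dp_proposals`, the `back` table used for backtracking, and the choice to floor a fractional hbm to its bin index are all assumptions made for illustration. Only `perf_by_fqn`, `hbm_by_fqn`, `bin_count`, and the (`perf`, `hbm`) case tuple come from the summary. The sketch keeps one best case per bin, so it is a bin-granular approximation rather than an exact solver.

```python
import math
from typing import List, Tuple


def dp_proposals(
    perf_by_fqn: List[List[float]],
    hbm_by_fqn: List[List[float]],
    bin_count: int,
) -> List[List[int]]:
    """Hypothetical sketch: one proposal (option index per table) for each
    reachable hbm bin, ordered from the highest bin to the lowest."""
    inf = math.inf
    table_count = len(perf_by_fqn)
    # dp[table_i][bin] caches the best case (perf, hbm) covering tables 0..table_i
    dp: List[List[Tuple[float, float]]] = [
        [(inf, inf)] * bin_count for _ in range(table_count)
    ]
    # back[table_i][bin] = (option chosen for table_i, bin of the previous table)
    back: List[List[Tuple[int, int]]] = [
        [(-1, -1)] * bin_count for _ in range(table_count)
    ]

    # Seed the first table: each option lands in the bin given by floor(hbm).
    for option_j, (perf, hbm) in enumerate(zip(perf_by_fqn[0], hbm_by_fqn[0])):
        b = int(hbm)
        if b < bin_count and perf < dp[0][b][0]:
            dp[0][b] = (perf, hbm)
            back[0][b] = (option_j, -1)

    # Loop order follows the summary: tables, then options, then hbm bins.
    for table_i in range(1, table_count):
        options = zip(perf_by_fqn[table_i], hbm_by_fqn[table_i])
        for option_j, (perf, hbm) in enumerate(options):
            for prev_bin in range(bin_count):
                prev_perf, prev_hbm = dp[table_i - 1][prev_bin]
                if prev_perf == inf:
                    continue  # no feasible case ends in this bin
                new_hbm = prev_hbm + hbm
                b = int(new_hbm)
                if b >= bin_count:
                    continue  # exceeds the total hbm budget
                new_perf = prev_perf + perf
                if new_perf < dp[table_i][b][0]:
                    dp[table_i][b] = (new_perf, new_hbm)
                    back[table_i][b] = (option_j, prev_bin)

    # Backtrack from every reachable bin of the last table, highest hbm first.
    proposals: List[List[int]] = []
    for b in range(bin_count - 1, -1, -1):
        if dp[-1][b][0] == inf:
            continue
        choice = [0] * table_count
        cur = b
        for table_i in range(table_count - 1, -1, -1):
            choice[table_i], cur = back[table_i][cur]
        proposals.append(choice)
    return proposals


# Two tables with two sharding options each; hbm is given in fractional bins.
perf = [[1.0, 3.0], [2.0, 5.0]]
hbm = [[1.6, 0.4], [1.2, 0.3]]
proposal_list = dp_proposals(perf, hbm, bin_count=3)
```

In this sketch a `propose`/`feedback` pair would simply walk `proposal_list` from index 0 onward, which corresponds to moving from a higher hbm constraint to a lower one.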
1 parent b6380be, commit 313d4e7
Showing 1 changed file with 68 additions and 47 deletions.