Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory optimization algorithm #1523

Open
wants to merge 7 commits into
base: repo-refactor
Choose a base branch
from
Open

Conversation

wmdi
Copy link
Collaborator

@wmdi wmdi commented Oct 13, 2024

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

This change is Reviewable

@wmdi wmdi added repo-refactor Topics related to the repo and search refactors rr-compiler Repo refactor compiler library labels Oct 13, 2024
@wmdi wmdi requested a review from lockshaw October 13, 2024 22:11
@wmdi wmdi self-assigned this Oct 13, 2024
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 17 files at r1, all commit messages.
Reviewable status: 11 of 17 files reviewed, 10 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/cost_estimator/cost_estimator.h line 16 at r1 (raw file):

struct ICostEstimator {
  virtual CostMetric estimate_cost(OpCostEstimateKey const &) const = 0;
  virtual CostMetric estimate_cost(TensorSetMovement const &) const = 0;

Why does communication cost return a memory usage?


lib/compiler/include/compiler/machine_mapping/feasible_machine_mapping_result.struct.toml line 16 at r1 (raw file):

[[fields]]
name = "cost"
type = "::FlexFlow::CostMetric"

Separate out into different result type


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r1 (raw file):

    MachineSpecification const &resources,
    MachineMappingConstraints const &constraints,
    MachineMemoryConstraints const &memory_constraints,

I think for now let's separate the memory optimization algorithm from the existing algorithm. There may be a bit of code duplication, but I don't think it should be too much as they can both use a lot of existing infrastructure, and it is likely to make things a lot more maintainable in the long run.


lib/compiler/include/compiler/machine_mapping/machine_mapping_config.struct.toml line 2 at r1 (raw file):

namespace = "FlexFlow"
name = "MachineMappingConfig"

Remove in favor of separate algorithm implementation


lib/compiler/include/compiler/machine_mapping/machine_mapping_result.h line 27 at r1 (raw file):

                       &parallel_split_transformation);
[[nodiscard]] MachineMappingResult
    parallel_combine(MachineMappingConfig const &config,

Add operations for a new result type rather than overriding these


lib/compiler/include/compiler/machine_mapping/machine_memory_constraints/machine_memory_constraints.struct.toml line 2 at r1 (raw file):

namespace = "FlexFlow"
name = "MachineMemoryConstraints"

Probably better to just make this a field on MachineSpecification


lib/compiler/src/compiler/cost_estimator/cost_metric.cc line 10 at r1 (raw file):

      /*memory=*/0,
  };
}

I'm not really convinced this is all that useful of a function?


lib/compiler/src/compiler/cost_estimator/cost_metric.cc line 36 at r1 (raw file):

  for (CostMetric const &cost : costs) {
    result = combine_cost_metrics_intra_device_sequential(result, cost);
  }

foldl1 from utils/containers?

Code quote:

  CostMetric result = zero_cost_metric();
  for (CostMetric const &cost : costs) {
    result = combine_cost_metrics_intra_device_sequential(result, cost);
  }

lib/compiler/src/compiler/cost_estimator/cost_metric.cc line 40 at r1 (raw file):

}

CostMetric combine_cost_metrics_intra_device_parallel(CostMetric const &c1,

What does "intra_device" mean here?

Code quote:

intra_device

lib/compiler/src/compiler/cost_estimator/cost_metric.cc line 51 at r1 (raw file):

  for (CostMetric const &cost : costs) {
    result = combine_cost_metrics_intra_device_parallel(result, cost);
  }

foldl1 from utils/containers?

Code quote:

  CostMetric result = zero_cost_metric();
  for (CostMetric const &cost : costs) {
    result = combine_cost_metrics_intra_device_parallel(result, cost);
  }

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 26 files reviewed, 10 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/cost_estimator/cost_estimator.h line 16 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why does communication cost return a memory usage?

Done.


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I think for now let's separate the memory optimization algorithm from the existing algorithm. There may be a bit of code duplication, but I don't think it should be too much as they can both use a lot of existing infrastructure, and it is likely to make things a lot more maintainable in the long run.

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping_config.struct.toml line 2 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove in favor of separate algorithm implementation

Done.


lib/compiler/include/compiler/machine_mapping/machine_memory_constraints/machine_memory_constraints.struct.toml line 2 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Probably better to just make this a field on MachineSpecification

Done.

Copy link

codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 94.12568% with 43 lines in your changes missing coverage. Please review.

Project coverage is 78.41%. Comparing base (1d5140d) to head (0c0e7b0).

Files with missing lines Patch % Lines
...ompiler/src/compiler/cost_estimator/cost_metric.cc 0.00% 26 Missing ⚠️
...ization/get_optimal_machine_mapping_with_memory.cc 93.33% 10 Missing ⚠️
...ompiler/machine_mapping/cost_estimator_for_test.cc 83.33% 4 Missing ⚠️
..._optimization/machine_mapping_cache_with_memory.cc 80.00% 2 Missing ⚠️
...optimization/machine_mapping_result_with_memory.cc 98.64% 1 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           repo-refactor    #1523      +/-   ##
=================================================
+ Coverage          78.16%   78.41%   +0.24%     
=================================================
  Files                860      866       +6     
  Lines              27994    28917     +923     
  Branches             770      784      +14     
=================================================
+ Hits               21881    22674     +793     
- Misses              6113     6243     +130     
Flag Coverage Δ
unittests 78.41% <94.12%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...r/include/compiler/cost_estimator/cost_estimator.h 100.00% <ø> (ø)
...iler/src/compiler/cost_estimator/cost_estimator.cc 100.00% <100.00%> (ø)
...ization/get_optimal_machine_mapping_with_memory.cc 100.00% <100.00%> (ø)
...optimization/machine_mapping_result_with_memory.cc 100.00% <100.00%> (ø)
...optimization/machine_mapping_result_with_memory.cc 98.64% <98.64%> (ø)
..._optimization/machine_mapping_cache_with_memory.cc 80.00% <80.00%> (ø)
...ompiler/machine_mapping/cost_estimator_for_test.cc 89.47% <83.33%> (-10.53%) ⬇️
...ization/get_optimal_machine_mapping_with_memory.cc 93.33% <93.33%> (ø)
...ompiler/src/compiler/cost_estimator/cost_metric.cc 0.00% <0.00%> (ø)

... and 113 files with indirect coverage changes

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is some unit testing should be added--currently the test coverage for this PR is 3.5%.

Reviewed 2 of 17 files at r1, 22 of 22 files at r2, all commit messages.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/cost_estimator/cost_estimator.h line 18 at r2 (raw file):

  virtual float estimate_cost(TensorSetMovement const &) const = 0;
  virtual CostMetric
      estimate_cost_with_memory(OpCostEstimateKey const &) const = 0;

I don't think we need another function here--communication shouldn't return a memory usage, but returning memory usage for an operator is fine.

it would be nice to get a more descriptive name than CostMetric, maybe something like OperatorCostMetrics to specify it's just for operators and it contains more than one "metric"?

Suggestion:

  virtual CostMetric estimate_cost(OpCostEstimateKey const &) const = 0;
  virtual float estimate_cost(TensorSetMovement const &) const = 0;

lib/compiler/include/compiler/cost_estimator/cost_estimator.h line 31 at r2 (raw file):

  float estimate_cost(OpCostEstimateKey const &k) const;
  float estimate_cost(TensorSetMovement const &m) const;
  CostMetric estimate_cost_with_memory(OpCostEstimateKey const &k) const;

Suggestion:

  CostMetric estimate_cost(OpCostEstimateKey const &k) const;
  float estimate_cost(TensorSetMovement const &m) const;

lib/compiler/include/compiler/cost_estimator/cost_metric.h line 7 at r2 (raw file):

#include <vector>

namespace FlexFlow {

These operations feel like something algorithm-specific rather than relevant to cost_estimator--I have no problem with them existing, but maybe the file should be somewhere else? Would the new machine_mapping_with_memory directory be appropriate?


lib/compiler/include/compiler/machine_mapping/machine_mapping_config.struct.toml line 2 at r1 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Done.

This file still seems to be here? Is it still necessary?


lib/compiler/include/compiler/machine_mapping/memory_optimization/machine_mapping_cache_with_memory.struct.toml line 0 at r2 (raw file):
Rename to machine_mapping_with_memory_cache.struct.toml


lib/compiler/include/compiler/machine_mapping/memory_optimization/machine_mapping_cache_with_memory.struct.toml line 2 at r2 (raw file):

namespace = "FlexFlow"
name = "MachineMappingCacheWithMemory"

Suggestion:

name = "MachineMappingWithMemoryCache"

lib/compiler/include/compiler/machine_mapping/memory_optimization/machine_mapping_result_with_memory.struct.toml line 2 at r2 (raw file):

namespace = "FlexFlow"
name = "MachineMappingResultWithMemory"

Suggestion:

ame = "MachineMappingWithMemoryResult"

lib/compiler/include/compiler/machine_mapping/memory_optimization/machine_mapping_result_with_memory.struct.toml line 20 at r2 (raw file):

[[fields]]
name = "machine_mappings"
type = "std::unordered_set<::FlexFlow::SingleMachineMapping>"

What does this set represent? What is the meaning behind a SingleMachineMapping--is that a machine mapping for a single layer, or a single pareto-optimal solution, or something else entirely? SingleMachineMapping likely needs a clearer name.

Code quote:

type = "std::unordered_set<::FlexFlow::SingleMachineMapping>"

lib/compiler/src/compiler/machine_mapping/memory_optimization/machine_mapping_cache_with_memory.cc line 12 at r2 (raw file):

std::optional<MachineMappingResultWithMemory>
    machine_mapping_cache_with_memory_load(

For all the functions here

Suggestion:

    machine_mapping_with_memory_cache_load(

lib/compiler/src/compiler/machine_mapping/memory_optimization/machine_mapping_result_with_memory.cc line 26 at r2 (raw file):

}

MachineMappingResultWithMemory remove_non_dominating_machine_mapping_result(

Suggestion:

remove_non_pareto_optimal_machine_mapping_results

lib/compiler/test/src/compiler/machine_mapping/cost_estimator_for_test.h line 32 at r2 (raw file):

  CostMetric
      estimate_cost_with_memory(OpCostEstimateKey const &) const override;
};

Suggestion:

struct TestCostEstimator : public ICostEstimator {
  std::function<CostMetric(OpCostEstimateKey const &)> get_operator_cost;
  std::function<float(TensorSetMovement const &)> get_communication_cost;

  TestCostEstimator() = delete;
  TestCostEstimator(decltype(get_operator_cost) const &get_operator_cost,
                    decltype(get_communication_cost)
                        const &get_communication_cost,
                    decltype(get_operator_cost_with_memory)
                        const &get_operator_cost_with_memory);

  CostMetric estimate_cost(OpCostEstimateKey const &) const override;

  float estimate_cost(TensorSetMovement const &) const override;
};

lib/compiler/test/src/compiler/machine_mapping/cost_estimator_for_test.h line 54 at r2 (raw file):

    std::unordered_map<TensorSetMovement, float> const &comm_cost_map,
    std::unordered_map<OpCostEstimateKey, CostMetric> const
        &op_cost_with_memory_map);

Suggestion:

CostEstimator make_fake_cost_estimator(
    std::function<CostMetric(OpCostEstimateKey const &)> const &get_operator_cost,
    std::function<float(TensorSetMovement const &)> const
        &get_communication_cost);

CostEstimator make_fake_cost_estimator(
    std::unordered_map<OpCostEstimateKey, CostMetric> const &op_cost_map,
    std::unordered_map<TensorSetMovement, float> const &comm_cost_map);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
repo-refactor Topics related to the repo and search refactors rr-compiler Repo refactor compiler library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants