Add DLRM Model Computational Graph #1532
base: repo-refactor
Conversation
Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)
lib/models/src/models/dlrm/dlrm.cc
line 44 at r1 (raw file):
}

tensor_guid_t create_dlrm_mlp(ComputationGraphBuilder &cgb,
This MLP implementation refers to the original FlexFlow implementation at https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc#L44. Not sure if there was a specific reason to do this in the past. We can decide what to do here. Torchrec doesn't have this logic for setting up the initializers.
@lockshaw
lib/models/src/models/dlrm/dlrm.cc
line 145 at r1 (raw file):
    /*mlp_layers=*/config.dense_arch_layer_sizes);

std::vector<tensor_guid_t> emb_outputs;
As suggested, I tried this:
std::vector<tensor_guid_t> emb_outputs = transform(
zip(config.embedding_size, sparse_inputs),
[&](std::pair<int, tensor_guid_t> &combined_pair) -> tensor_guid_t {
return create_dlrm_sparse_embedding_network(
/*cgb=*/cgb,
/*config=*/config,
/*input=*/combined_pair.second,
/*input_dim=*/combined_pair.first,
/*output_dim=*/config.embedding_dim);
});
But I couldn't get it to compile using transform and zip...
Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)
lib/models/src/models/dlrm/dlrm.cc
line 128 at r1 (raw file):
// Create input tensors
std::vector<tensor_guid_t> sparse_inputs(
This had the following review comment before:
Quote:
I think this current implementation makes sparse_inputs a vector of the same tensor_guid_t over and over again, rather than being a vector of identically-shaped tensors. Was that intentional? If not, then maybe the below code would be better?

std::vector<tensor_guid_t> sparse_inputs =
    repeat(config.embedding_size.size(), [&]() {
      return create_input_tensor(
          {config.batch_size, config.embedding_bag_size}, DataType::INT64);
    });
The intention was to create multiple tensors as the inputs. I'm not sure whether this vector-construction syntax makes all inputs refer to a single tensor? I would think create_input_tensor would return a separate tensor for each item in the vector?
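For reference, here is a minimal sketch of the distinction (using an int-returning stub in place of create_input_tensor; the names are purely illustrative): the std::vector fill constructor evaluates its value argument once and copies that single value into every slot, whereas calling the factory per element produces distinct values.

#include <vector>

// Hypothetical stand-in for create_input_tensor: returns a fresh id per call.
static int next_id = 0;
int make_fresh() { return next_id++; }

int main() {
  // Fill constructor: make_fresh() is evaluated ONCE, then copied,
  // so shared == {0, 0, 0}.
  std::vector<int> shared(3, make_fresh());

  // Invoking the factory once per element yields distinct ids: {1, 2, 3}.
  std::vector<int> distinct;
  for (int i = 0; i < 3; i++) {
    distinct.push_back(make_fresh());
  }
}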
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff             @@
##           repo-refactor    #1532      +/-   ##
=================================================
- Coverage          78.16%    78.13%    -0.04%
=================================================
  Files                860       862        +2
  Lines              27994     28286      +292
  Branches             770       772        +2
=================================================
+ Hits               21881     22100      +219
- Misses              6113      6186       +73
Flags with carried forward coverage won't be shown.
Check FB, Torchrec, and the paper for the DLRM initializer implementation to decide whether we want to keep the special logic.
Try to generate the dot representation.
Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @easyeasydev, @lockshaw, and @reyna-abhyankar)
lib/models/include/models/dlrm/dlrm_config.struct.toml
line 29 at r1 (raw file):
[[fields]] name = "embedding_bag_size" type = "size_t"
Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)
lib/models/include/models/dlrm/dlrm_config.struct.toml
line 37 at r1 (raw file):
[[fields]] name = "dense_arch_layer_sizes" type = "std::vector<size_t>"
Prefer int to avoid confusing failures if negative values are used (the long-term solution is to use the nonnegative_int getting added in #1533, but for now this is the best solution)
Suggestion:
type = "std::vector<int>
lib/models/include/models/dlrm/dlrm_config.struct.toml
line 41 at r1 (raw file):
[[fields]] name = "over_arch_layer_sizes" type = "std::vector<size_t>"
Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)
lib/models/include/models/dlrm/dlrm_config.struct.toml
line 45 at r1 (raw file):
[[fields]] name = "arch_interaction_op" type = "std::string"
Would this make more sense as an enum? Is there a list of the allowed arch interaction ops somewhere? Based on https://github.com/facebookresearch/dlrm/blob/64063a359596c72a29c670b4fcc9450bb342e764/dlrm_s_pytorch.py#L483-L515 it seems that the only possible values are dot and cat -- is that the implementation you were referring to?
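If we did switch to an enum, a minimal sketch of the C++ side might look like this (the type and value names are hypothetical, not existing code):

// Hypothetical enum covering the two ops in the reference implementation.
enum class DLRMArchInteractionOp {
  DOT,  // dot-product feature interaction
  CAT,  // concatenation feature interaction
};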
lib/models/include/models/dlrm/dlrm_config.struct.toml
line 49 at r1 (raw file):
[[fields]] name = "batch_size" type = "size_t"
Prefer int
(see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)
lib/models/src/models/dlrm/dlrm.cc
line 10 at r1 (raw file):
/**
 * @brief Get the default DLRM config.
Why have this in the .cc file rather than the .h file? Does it make any difference in the rendered doxygen docs? (recall that you can run proj doxygen --browser to see the docs)
lib/models/src/models/dlrm/dlrm.cc
line 13 at r1 (raw file):
 *
 * @details The configs here refer to the example at
 * https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc.
This link is great--always welcome to add more of these! 🙂
lib/models/src/models/dlrm/dlrm.cc
line 13 at r1 (raw file):
 *
 * @details The configs here refer to the example at
 * https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc.
Can you change this to a permalink? Right now if inference moves or deletes the dlrm.cc file this url is going to 404, or could also just end up going out-of-date and being confusing. For this type of thing I'd prefer permalinks wherever possible.
lib/models/src/models/dlrm/dlrm.cc
line 44 at r1 (raw file):
Previously, easyeasydev wrote…
This MLP implementation refers to the original FlexFlow implementation at https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc#L44. Not sure if there was a specific reason to do this in the past. We can decide what to do here. Torchrec doesn't have this logic for setting up the initializers.
@lockshaw
This at least looks similar? https://github.com/facebookresearch/dlrm/blob/64063a359596c72a29c670b4fcc9450bb342e764/dlrm_s_pytorch.py#L218-L228
lib/models/src/models/dlrm/dlrm.cc
line 49 at r1 (raw file):
std::vector<size_t> const &mlp_layers) { tensor_guid_t t = input; for (size_t i = 0; i < mlp_layers.size() - 1; i++) {
You could also just zip mlp_layers and mlp_layers[1:] if you want to avoid all the careful index calculation. (A sketch follows the code quote below.)
Code quote:
for (size_t i = 0; i < mlp_layers.size() - 1; i++) {
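For illustration, a sketch of what the zipped loop could look like. This assumes the zip helper from lib/utils/containers pairs up elements of two vectors (returning pairs, truncated to the shorter input); the tail vector is built with plain std::vector here since a [1:] slice helper isn't guaranteed to exist:

// Pair each layer size with its successor to avoid manual index math.
std::vector<size_t> tail(mlp_layers.begin() + 1, mlp_layers.end());
for (auto const &[in_dim, out_dim] : zip(mlp_layers, tail)) {
  float std_dev = sqrt(2.0f / (out_dim + in_dim));
  // ... build the dense layer from in_dim to out_dim ...
}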
lib/models/src/models/dlrm/dlrm.cc
line 50 at r1 (raw file):
tensor_guid_t t = input;
for (size_t i = 0; i < mlp_layers.size() - 1; i++) {
  float std_dev = sqrt(2.0f / (mlp_layers[i + 1] + mlp_layers[i]));
Suggestion:
float std_dev = sqrt(2.0f / (mlp_layers.at(i + 1) + mlp_layers.at(i)));
lib/models/src/models/dlrm/dlrm.cc
line 58 at r1 (raw file):
}};
std_dev = sqrt(2.0f / mlp_layers[i + 1]);
Suggestion:
std_dev = sqrt(2.0f / mlp_layers.at(i + 1));
lib/models/src/models/dlrm/dlrm.cc
line 66 at r1 (raw file):
t = cgb.dense(/*input=*/t,
              /*outDim=*/mlp_layers[i + 1],
Suggestion:
/*outDim=*/mlp_layers.at(i + 1),
lib/models/src/models/dlrm/dlrm.cc
line 102 at r1 (raw file):
tensor_guid_t const &bottom_mlp_output, std::vector<tensor_guid_t> const &emb_outputs) { if (config.arch_interaction_op != "cat") {
Considering that there are only two interaction ops, it might be cleaner to just use an enum?
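To make that concrete, here is a sketch of the string comparison rewritten against the hypothetical enum from the config comment above (again illustrative, not existing code):

#include <stdexcept>

enum class DLRMArchInteractionOp { DOT, CAT };  // hypothetical, as sketched above

void check_interaction_op(DLRMArchInteractionOp op) {
  switch (op) {
    case DLRMArchInteractionOp::CAT:
      return;  // supported: concatenate bottom-MLP output with embedding outputs
    case DLRMArchInteractionOp::DOT:
      throw std::runtime_error("dot interaction is not yet implemented");
  }
}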
lib/models/src/models/dlrm/dlrm.cc
line 128 at r1 (raw file):
Previously, easyeasydev wrote…
This had the following review comment before:
Quote:
I think this current implementation makes sparse_inputs a vector of the same tensor_guid_t over and over again, rather than being a vector of identically-shaped tensors. Was that intentional? If not, then maybe the below code would be better?

std::vector<tensor_guid_t> sparse_inputs =
    repeat(config.embedding_size.size(), [&]() {
      return create_input_tensor(
          {config.batch_size, config.embedding_bag_size}, DataType::INT64);
    });

The intention was to create multiple tensors as the inputs. I'm not sure whether this vector-construction syntax makes all inputs refer to a single tensor? I would think create_input_tensor would return a separate tensor for each item in the vector?
Using a lambda would fix it. I'm actually a bit confused about how this current code compiles--based on the source code of repeat (result.push_back(f());), doesn't F need to be callable?
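For context, here is a sketch of what a callable-requiring repeat looks like (an illustrative reimplementation, not the actual lib/utils code):

#include <cstddef>
#include <type_traits>
#include <vector>

// Illustrative repeat: f must be invocable with no arguments, so passing a
// plain value (e.g. a tensor_guid_t) instead of a lambda should not compile.
template <typename F, typename T = std::invoke_result_t<F>>
std::vector<T> repeat_sketch(size_t n, F const &f) {
  std::vector<T> result;
  result.reserve(n);
  for (size_t i = 0; i < n; i++) {
    result.push_back(f());  // one invocation per element => distinct results
  }
  return result;
}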
lib/models/src/models/dlrm/dlrm.cc
line 135 at r1 (raw file):
tensor_guid_t dense_input = create_input_tensor(
    {config.batch_size, config.dense_arch_layer_sizes.front()},
    DataType::HALF); // TODO: change this to DataType::FLOAT
Leaving a comment so we don't accidentally merge this PR before this gets addressed
lib/models/src/models/dlrm/dlrm.cc
line 145 at r1 (raw file):
Previously, easyeasydev wrote…
As suggested, I tried this:
std::vector<tensor_guid_t> emb_outputs = transform( zip(config.embedding_size, sparse_inputs), [&](std::pair<int, tensor_guid_t> &combined_pair) -> tensor_guid_t { return create_dlrm_sparse_embedding_network( /*cgb=*/cgb, /*config=*/config, /*input=*/combined_pair.second, /*input_dim=*/combined_pair.first, /*output_dim=*/config.embedding_dim); });
But I couldn't get it to compile using transform and zip...
Can you give me an error message? It's a bit hard to debug without any information.
Also, if you'd like a slightly syntactically-cleaner solution (I'd probably recommend this if you don't mind the little bit of extra work), feel free to just add the zip_with.h implementation from my current branch into this PR, along with the associated tests: https://github.com/lockshaw/FlexFlow/blob/e635b1fe6a1c8a2e8ecb14fe301a4e3843be65fb/lib/utils/include/utils/containers/zip_with.h
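For what it's worth, with zip_with pulled in, the embedding construction could plausibly collapse to something like this. This is only a sketch: the zip_with(lhs, rhs, f) argument order is an assumption based on the linked header, so double-check it against the actual signature:

std::vector<tensor_guid_t> emb_outputs = zip_with(
    config.embedding_size,
    sparse_inputs,
    [&](int input_dim, tensor_guid_t const &input) -> tensor_guid_t {
      return create_dlrm_sparse_embedding_network(
          /*cgb=*/cgb,
          /*config=*/config,
          /*input=*/input,
          /*input_dim=*/input_dim,
          /*output_dim=*/config.embedding_dim);
    });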
lib/pcg/src/pcg/computation_graph_builder.cc
line 168 at r1 (raw file):
std::optional<std::string> const &name) { // NOT_IMPLEMENTED() return input;
Will do.
Reviewed 3 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @easyeasydev and @reyna-abhyankar)
lib/models/src/models/dlrm/dlrm.cc
line 135 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Leaving a comment so we don't accidentally merge this PR before this gets addressed
Done.
lib/pcg/src/pcg/computation_graph_builder.cc
line 168 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Will do.
Done.
Description of changes:
This PR adds the computational graph of the DLRM model.
Related Issues:
Linked Issues:
Issues closed by this PR: