Commit

Fix format bullet points
PaulZhang12 committed Oct 2, 2024
1 parent 875c97b commit d4ed6bd
Showing 1 changed file with 13 additions and 12 deletions.
25 changes: 13 additions & 12 deletions intermediate_source/torchrec_interactive_tutorial.py
@@ -78,8 +78,8 @@
 # Embeddings are trained in RecSys through the following process:
 #
 # * **Input/lookup indices are fed into the model, as unique IDs**. IDs are
-# hashed to the total size of the embedding table to prevent issues when
-# the ID > number of rows
+#   hashed to the total size of the embedding table to prevent issues when
+#   the ID > number of rows
 #
 # * Embeddings are then retrieved and **pooled, such as taking the sum or
 # mean of the embeddings**. This is required as there can be a variable number of
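For readers who want to see the two bullets above in code, here is a minimal plain-PyTorch sketch (not part of this commit) of hashing raw IDs into a table and pooling a variable number of embeddings per example; the table size, the IDs, and the ``nn.EmbeddingBag`` usage are illustrative assumptions rather than the tutorial's own code.

```python
import torch
import torch.nn as nn

num_rows, dim = 10, 4                                # hypothetical embedding table size
table = nn.EmbeddingBag(num_rows, dim, mode="sum")   # pooled lookup (sum of embeddings)

raw_ids = torch.tensor([3, 57, 12, 10_000_001])      # unique IDs, possibly > number of rows
ids = raw_ids % num_rows                             # hash IDs into the table's index range

# Two "bags" of variable length: the first example has 3 IDs, the second has 1.
offsets = torch.tensor([0, 3])
pooled = table(ids, offsets)                         # shape (2, dim): one pooled vector per example
print(pooled.shape)
```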
@@ -220,7 +220,7 @@
 # ------------------------------
 #
 # This section goes over TorchRec Modules and data types including such
-# entities as ``EmbeddingCollection``and ``EmbeddingBagCollection``,
+# entities as ``EmbeddingCollection`` and ``EmbeddingBagCollection``,
 # ``JaggedTensor``, ``KeyedJaggedTensor``, ``KeyedTensor`` and more.
 #
 # From ``EmbeddingBag`` to ``EmbeddingBagCollection``
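As a quick, hedged illustration of the modules this hunk names (a sketch assuming TorchRec's public ``torchrec`` API; the table names, sizes, and feature values below are made up), an ``EmbeddingBagCollection`` wraps several pooled tables and consumes sparse IDs packed into a ``KeyedJaggedTensor``, returning a ``KeyedTensor`` of pooled embeddings:

```python
import torch
import torchrec

# Two pooled embedding tables, one per sparse feature (names/sizes are illustrative).
ebc = torchrec.EmbeddingBagCollection(
    device=torch.device("cpu"),
    tables=[
        torchrec.EmbeddingBagConfig(
            name="product_table", embedding_dim=16, num_embeddings=4096,
            feature_names=["product"], pooling=torchrec.PoolingType.SUM,
        ),
        torchrec.EmbeddingBagConfig(
            name="user_table", embedding_dim=16, num_embeddings=4096,
            feature_names=["user"], pooling=torchrec.PoolingType.SUM,
        ),
    ],
)

# KeyedJaggedTensor: for each key ("product", "user") and each of 2 examples,
# a variable-length list of IDs; lengths is laid out per key, then per example.
kjt = torchrec.KeyedJaggedTensor(
    keys=["product", "user"],
    values=torch.tensor([1, 2, 1, 5, 3, 4]),
    lengths=torch.tensor([3, 1, 1, 1]),
)

pooled = ebc(kjt)                         # KeyedTensor of pooled embeddings
print(pooled.to_dict()["product"].shape)  # (2, 16)
```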
@@ -918,17 +918,18 @@ def _wait_impl(self) -> torch.Tensor:
 # very sensitive to **performance and size of the model**. Running just
 # the trained model in a Python environment is incredibly inefficient.
 # There are two key differences between inference and training
-# environments: \* **Quantization**: Inference models are typically
-# quantized, where model parameters lose precision for lower latency in
-# predictions and reduced model size. For example FP32 (4 bytes) in
-# trained model to INT8 (1 byte) for each embedding weight. This is also
-# necessary given the vast scale of embedding tables, as we want to use as
-# few devices as possible for inference to minimize latency.
+# environments:
+# * **Quantization**: Inference models are typically
+#   quantized, where model parameters lose precision for lower latency in
+#   predictions and reduced model size. For example FP32 (4 bytes) in
+#   trained model to INT8 (1 byte) for each embedding weight. This is also
+#   necessary given the vast scale of embedding tables, as we want to use as
+#   few devices as possible for inference to minimize latency.
 #
 # * **C++ environment**: Inference latency is very important, so in order to ensure
-# ample performance, the model is typically ran in a C++ environment,
-# along with the situations where we don't have a Python runtime, like on
-# device.
+#   ample performance, the model is typically ran in a C++ environment,
+#   along with the situations where we don't have a Python runtime, like on
+#   device.
 #
 # TorchRec provides primitives for converting a TorchRec model into being
 # inference ready with:
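To make the FP32-to-INT8 point in this hunk concrete, here is a small plain-PyTorch sketch of symmetric per-row INT8 quantization of an embedding table (an illustration of the arithmetic only, not TorchRec's actual inference quantization path):

```python
import torch

weights = torch.randn(4096, 16)                        # FP32 table: 4 bytes per element

# Symmetric per-row quantization to INT8 (1 byte per element).
scales = weights.abs().amax(dim=1, keepdim=True) / 127.0
q_weights = torch.clamp((weights / scales).round(), -128, 127).to(torch.int8)
dequantized = q_weights.to(torch.float32) * scales     # approximate reconstruction for lookups

fp32_bytes = weights.numel() * weights.element_size()      # 4096 * 16 * 4 = 262144
int8_bytes = q_weights.numel() * q_weights.element_size()  # 4096 * 16 * 1 = 65536
print(fp32_bytes, int8_bytes)  # roughly 4x smaller, ignoring the small per-row scales
```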
