MetricCountSeqAndTokens counts tokens in label (if it exists) + improve epoch metrics print #352

Merged: 8 commits, May 23, 2024

Conversation

michalozeryflato (Collaborator):

  1. Add an optional decoder_input to MetricCountSeqAndTokens.
  2. Print the traceback before the throw; helps in debugging.
  3. Modify epoch stats printing: each column gets its own (fixed) width.

@@ -134,6 +135,7 @@ def op_call(
+ f"error in __call__ method of op={op}, op_id={op_id}, sample_id={get_sample_id(sample_dict)} - more details below"
+ "*************************************************************************************************************************************\n"
)
print(traceback.print_exc())
Collaborator:

Are you sure you need to wrap it with print()? And I think we still get the traceback, so I'm not sure why it's needed.

Collaborator Author:

Removed the print().
Adding traceback.print_exc() helps me debug: it makes it easier to locate the source of the problem.
If you don't find this useful, I will revert the change and keep it only in my local copy of the file.
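For context, a standalone sketch of the behavior in question: traceback.print_exc() writes the formatted traceback to stderr itself and returns None, so wrapping it in print() only adds a stray "None" line to stdout.

import traceback

try:
    1 / 0
except ZeroDivisionError:
    traceback.print_exc()  # prints the traceback to stderr, returns None
    # print(traceback.print_exc())  # would print the traceback, then "None"
    raise  # re-raise so the error still propagates, as op_call does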

ignore_index: Optional[int] = None,
) -> Dict[str, torch.Tensor]:
"""Count number of sequences and tokens
Args:
encoder_input_key:
key to encoder_input
decoder_input_key:
key to encoder_input
Collaborator:

key to *decoder_input :)

Collaborator:

☝️

Comment on lines +114 to +118
if decoder_input_key is not None and decoder_input_key in batch_dict:
decoder_input = batch_dict[decoder_input_key].detach()
mask2 = get_mask(decoder_input, ignore_index)
assert mask2.shape[0] == mask.shape[0]
token_num += mask2.sum().to(dtype=torch.int64)
Collaborator:

Just making sure: is this the support for counting the label tokens as well?

If so, best to double-check with @mosheraboh :)

Collaborator Author:

@SagiPolaczek I updated @mosheraboh, but better to confirm the change with him.
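For reference, a minimal sketch of what the added branch computes. The get_mask helper here is an assumption about its behavior (True for every countable token), not the repository's exact implementation:

from typing import Optional

import torch

def get_mask(tokens: torch.Tensor, ignore_index: Optional[int]) -> torch.Tensor:
    # Assumed behavior: True wherever the token should be counted.
    if ignore_index is None:
        return torch.ones_like(tokens, dtype=torch.bool)
    return tokens.ne(ignore_index)

# Toy batch with pad id -100; decoder_input holds the label tokens.
batch_dict = {
    "encoder_input": torch.tensor([[5, 7, -100], [3, -100, -100]]),
    "decoder_input": torch.tensor([[9, -100, -100], [2, 4, -100]]),
}
mask = get_mask(batch_dict["encoder_input"].detach(), -100)
token_num = mask.sum().to(dtype=torch.int64)    # 3 encoder tokens
decoder_input = batch_dict["decoder_input"].detach()
mask2 = get_mask(decoder_input, -100)
assert mask2.shape[0] == mask.shape[0]          # same batch size
token_num += mask2.sum().to(dtype=torch.int64)  # + 3 label tokens -> 6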

mosheraboh previously approved these changes May 16, 2024

@mosheraboh left a comment:

LGTM

-    col_width = max(max_col_width, max_val_width, col_width)
-    dashes = (col_width + 2) * len(df.columns.values)
+    max_val_width_per_col = np.vectorize(len)(df.values.astype(str)).max(
Collaborator:

Here you've modified the code to set the width per column instead of a single width for all columns?

Collaborator Author:

Yes, to make the overall width smaller so it better fits the screen.
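A minimal sketch of the per-column idea (the DataFrame and the formatting loop here are illustrative, not the PR's exact code):

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"metric": ["train.loss", "validation.accuracy"], "value": [0.123456, 0.98]}
)

# One width per column: the longest of the header and any value in that column.
max_val_width_per_col = np.vectorize(len)(df.values.astype(str)).max(axis=0)
col_widths = [
    max(len(str(c)), int(w)) for c, w in zip(df.columns, max_val_width_per_col)
]

header = " | ".join(str(c).ljust(w) for c, w in zip(df.columns, col_widths))
print(header)
print("-" * len(header))
for row in df.values.astype(str):
    print(" | ".join(v.ljust(w) for v, w in zip(row, col_widths)))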


Note that the previous command preds[:, target] created a matrix of size [n,n].
Another simple alternative is
preds[torch.arange(preds.shape[0]), target]
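A standalone illustration of the shapes involved (toy values, not the PR's code; torch.gather is the variant the final version mentions):

import torch

n, vocab = 4, 10
preds = torch.randn(n, vocab)
target = torch.randint(vocab, (n,))

# Plain column indexing broadcasts to an [n, n] matrix:
assert preds[:, target].shape == (n, n)

# Both of these pick one logit per row, giving shape [n]:
by_index = preds[torch.arange(n), target]
by_gather = torch.gather(preds, dim=1, index=target.unsqueeze(1)).squeeze(1)
assert torch.equal(by_index, by_gather)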
mosheraboh previously approved these changes May 23, 2024

@mosheraboh left a comment:

Looks great

@@ -177,18 +194,18 @@ def _perplexity_update(
preds = preds.detach()
target = target.detach()

-    preds = preds.reshape(-1, preds.shape[-1])
-    target = target.reshape(-1)
+    preds = preds.view(-1, preds.shape[-1])
Collaborator:

Why? To save memory?

Collaborator Author:

@mosheraboh From the documentation of reshape: "When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior."

The question is:
(1) do we assume view will always work, and otherwise alert the user, or
(2) do we want to be permissive and copy the data (despite the additional GPU memory) when a view is not possible (i.e., the memory is not contiguous)?

Collaborator:

Is it really that expensive in memory?

If it is, then I would warn the user conditionally on the Tensor.is_contiguous() result :)
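A sketch of that suggestion, with an illustrative helper name (not code from the PR):

import warnings

import torch

def flatten_logits(preds: torch.Tensor) -> torch.Tensor:
    # view() never copies and raises on inputs it cannot view;
    # reshape() silently copies those inputs instead, costing memory.
    if not preds.is_contiguous():
        warnings.warn("preds is not contiguous; flattening will copy the tensor")
        return preds.reshape(-1, preds.shape[-1])
    return preds.view(-1, preds.shape[-1])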

@@ -137,6 +153,7 @@ def __init__(

# Copied internal function https://github.com/Lightning-AI/metrics/blob/825d17f32ee0b9a2a8024c89d4a09863d7eb45c3/src/torchmetrics/functional/text/perplexity.py#L68
# copied and not imported to not be affected by internal interface modifications.
# modifications: (1) reshape => view (2) apply mask at the beginning of computation (3) use torch.gather
Collaborator:

Nice solution.

SagiPolaczek previously approved these changes May 23, 2024

@SagiPolaczek left a comment:

THANK YOU!

Two comments inline

Comment on lines -185 to +202
-    target = target.where(
-        target != ignore_index, torch.tensor(0, device=target.device)
-    )
+    target = target[mask]
Collaborator:

Note that we change the shape of target here: with Tensor.where() we keep the same shape, but with target[mask] we take only the values where the mask is True.

Collaborator:

I quickly ran an example:

>>> t = torch.tensor([[1, 2], [3, 4]])
>>> t
tensor([[1, 2],
        [3, 4]])
>>> t.ne(1)
tensor([[False,  True],
        [ True,  True]])
>>> m = t.ne(1)
>>> t[m]
tensor([2, 3, 4])
>>> t.where( t != 1, 0)
tensor([[0, 2],
        [3, 4]])

Collaborator:

I'm not sure it's critical (maybe it's even better, saving memory)

Collaborator Author:

Yes, I have reduced the size of preds and target, since at the end the entries corresponding to the mask are dropped anyway. So it produces the same results, but with less memory/time (not having to process the masked entries).
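A small sketch of the effect with toy values: masking first shrinks both tensors, and the dropped entries never enter the softmax.

import torch

vocab, ignore_index = 5, -100
target = torch.tensor([2, ignore_index, 4])
preds = torch.randn(3, vocab)

mask = target.ne(ignore_index)
target = target[mask]  # shape [2] instead of [3]
preds = preds[mask]    # shape [2, vocab]; masked rows are never processed
probs = preds.softmax(dim=-1)
nll = -torch.log(probs[torch.arange(target.numel()), target])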

@SagiPolaczek left a comment:

LGTM! Thanks!

Comment on lines 202 to 203
if ignore_index is not None:
mask = target.ne(ignore_index)
Collaborator:

I guess we can use get_mask() here

@michalozeryflato merged commit 1810465 into master May 23, 2024
5 checks passed