Update aggregator.py #2995

zina-cs · 2024-09-27T15:52:09Z

Changes

Added an error message

Reason for changes

Emit a warning when the dataset size is smaller than the provided or default 'subset_size', because inconsistencies arise in that case.

Related tickets

Closes: #2562

I have a question:
I noticed that subset_size is sometimes set to 100 or 300, or is specified in the advanced parameters. Should a default be used here, or could you point me to where I can find the correct subset_size to import?

l-bat · 2024-09-30T09:55:27Z

nncf/common/tensor_statistics/aggregator.py

@@ -82,6 +84,11 @@ def collect_statistics(self, model: TModel, graph: NNCFGraph) -> None:
 empty_statistics = False
 if empty_statistics:
 raise nncf.ValidationError(EMPTY_DATASET_ERROR)
+
+ if len(self.dataset) < subset_size:


There is no guarantee that __len__ is implemented for the dataset. Instead, you can count the number of processed input_data items in the statistics collection loop and, after the loop, check whether subset_size is higher than the number of samples in the provided dataset. I think we should display a warning instead of an error in this case.
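The counting approach described in this comment can be sketched as follows. This is a minimal illustration, not the NNCF implementation: the function name `collect_with_count` and the logger name are hypothetical, and the actual statistics collection is replaced by a comment.

```python
import logging
from itertools import islice
from typing import Iterable, Optional

# Hypothetical logger used only for this sketch.
logger = logging.getLogger("subset_size_sketch")


def collect_with_count(dataset: Iterable, subset_size: Optional[int]) -> int:
    """Consume at most subset_size items and return how many were processed.

    Works for any iterable, including generators that do not implement __len__.
    """
    processed_samples = 0
    # islice(dataset, None) iterates until the dataset is exhausted.
    for input_data in islice(dataset, subset_size):
        # Statistics for input_data would be collected here.
        processed_samples += 1

    # The check runs after the loop, so len(dataset) is never needed.
    if subset_size is not None and subset_size > processed_samples:
        logger.warning(
            f"Dataset contains only {processed_samples} samples, "
            f"smaller than the requested subset size {subset_size}."
        )
    return processed_samples
```

Because the count is taken while iterating, the check costs nothing extra and does not require a second pass over the data.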

l-bat · 2024-10-04T08:59:50Z

nncf/common/tensor_statistics/aggregator.py

@@ -23,6 +23,8 @@
 from nncf.data.dataset import DataItem
 from nncf.data.dataset import Dataset
 from nncf.data.dataset import ModelInput
+import warnings


Suggested change

-import warnings
+from nncf.common.logging import nncf_logger

l-bat · 2024-10-04T09:02:08Z

nncf/common/tensor_statistics/aggregator.py

 raise nncf.ValidationError(EMPTY_DATASET_ERROR)
+
+ if subset_size > processed_samples:
+ warnings.warn(f"Dataset contains only {processed_samples} samples, smaller than the requested subset size {subset_size}.") 


Suggested change

-warnings.warn(f"Dataset contains only {processed_samples} samples, smaller than the requested subset size {subset_size}.")
+nncf_logger.warning(f"Dataset contains only {processed_samples} samples, smaller than the requested subset size {subset_size}.")
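To illustrate the difference between the two calls in this suggestion, here is a small sketch using only the standard library. The logger below is a hypothetical stand-in for nncf_logger, and `report_small_dataset` is an illustrative helper, not part of NNCF. With a project logger, the message goes through the same handlers and level settings as the rest of the library's output, whereas `warnings.warn` is routed through Python's warning filters, which by default show a given warning only once per call location.

```python
import logging
import warnings

# Hypothetical stand-in for the project logger (assumption for this sketch).
logger = logging.getLogger("example_project")


def report_small_dataset(processed_samples: int, subset_size: int, use_logger: bool = True) -> str:
    """Report that the dataset is smaller than the requested subset size."""
    message = (
        f"Dataset contains only {processed_samples} samples, "
        f"smaller than the requested subset size {subset_size}."
    )
    if use_logger:
        # Handled by the logging configuration (handlers, levels, formatting).
        logger.warning(message)
    else:
        # Handled by the warnings machinery (filters, -W flags, once-per-location default).
        warnings.warn(message)
    return message
```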

l-bat · 2024-10-04T09:02:44Z

nncf/common/tensor_statistics/aggregator.py

+
+

Please avoid making unnecessary changes

Update aggregator.py

4a88a47

zina-cs requested a review from a team as a code owner September 27, 2024 15:52

github-actions bot added the NNCF Common Pull request that updates NNCF Common label Sep 27, 2024

alexsu52 requested a review from l-bat September 30, 2024 09:41

l-bat reviewed Sep 30, 2024

Update aggregator.py

af21404

l-bat reviewed Oct 4, 2024
