
Update aggregator.py #2995

Open: wants to merge 2 commits into base: develop


@zina-cs commented Sep 27, 2024

Changes

Added an error message

Reason for changes

Send a warning message to flag the inconsistency that arises when the dataset size is smaller than the provided or default 'subset_size'.

Related tickets

Closes: #2562

I have a question:
I noticed that subset_size is sometimes set to 100 or 300, or specified in the advanced parameters. Should a default be used here, or could you point me to where I can find the correct subset_size to import?

@zina-cs requested a review from a team as a code owner on September 27, 2024 15:52
@github-actions bot added the NNCF Common label (Pull request that updates NNCF Common) on Sep 27, 2024
@alexsu52 requested a review from @l-bat on September 30, 2024 09:41
@@ -82,6 +84,11 @@ def collect_statistics(self, model: TModel, graph: NNCFGraph) -> None:
            empty_statistics = False
        if empty_statistics:
            raise nncf.ValidationError(EMPTY_DATASET_ERROR)

        if len(self.dataset) < subset_size:
Collaborator:

There is no guarantee that __len__ is implemented for the dataset. In the statistics collection loop, you can count the number of processed input_data items and, after the loop, check whether subset_size is larger than the number of samples actually provided by the dataset. I think we should display a warning rather than raise an error in this case.
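The counting approach described in this comment can be sketched in plain Python. This is a minimal sketch, not NNCF's implementation: it assumes a plain Python iterable stands in for the NNCF Dataset, and collect_statistics_sketch, process, and logger are hypothetical names. ValueError stands in for nncf.ValidationError(EMPTY_DATASET_ERROR).

```python
import itertools
import logging
from typing import Any, Callable, Iterable

# Hypothetical stand-in for nncf_logger: a plain standard-library logger.
logger = logging.getLogger("statistics_sketch")


def collect_statistics_sketch(
    dataset: Iterable[Any],
    subset_size: int,
    process: Callable[[Any], None],
) -> int:
    """Process up to subset_size items and return how many were actually seen.

    Only iteration is required, so generators without __len__ are supported.
    """
    processed_samples = 0
    # islice stops after subset_size items or when the dataset runs out,
    # whichever comes first.
    for input_data in itertools.islice(dataset, subset_size):
        process(input_data)  # hypothetical callback: inference + statistics registration
        processed_samples += 1

    if processed_samples == 0:
        # Stand-in for raising nncf.ValidationError(EMPTY_DATASET_ERROR).
        raise ValueError("The dataset is empty.")

    if subset_size > processed_samples:
        # Warn rather than fail: statistics were still collected, just from fewer samples.
        logger.warning(
            "Dataset contains only %d samples, smaller than the requested subset size %d.",
            processed_samples,
            subset_size,
        )
    return processed_samples


# Usage: a generator has no __len__, so len(samples) would raise TypeError.
samples = (i for i in range(120))
seen = []
count = collect_statistics_sketch(samples, subset_size=300, process=seen.append)
print(count)  # 120
```

Counting inside the loop avoids relying on __len__ entirely, and the resulting count is exactly the number of samples the statistics were built from.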

@@ -23,6 +23,8 @@
from nncf.data.dataset import DataItem
from nncf.data.dataset import Dataset
from nncf.data.dataset import ModelInput
import warnings
Collaborator:

Suggested change
import warnings
from nncf.common.logging import nncf_logger

            raise nncf.ValidationError(EMPTY_DATASET_ERROR)

        if subset_size > processed_samples:
            warnings.warn(f"Dataset contains only {processed_samples} samples, smaller than the requested subset size {subset_size}.")
Collaborator:

Suggested change
warnings.warn(f"Dataset contains only {processed_samples} samples, smaller than the requested subset size {subset_size}.")
nncf_logger.warning(f"Dataset contains only {processed_samples} samples, smaller than the requested subset size {subset_size}.")
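A minimal sketch of why routing this message through a logger is convenient, assuming nncf_logger behaves like a standard-library logging.Logger. The logger name nncf_sketch and the helper warn_if_subset_too_large are hypothetical and exist only for this illustration.

```python
import logging
from typing import Optional

# Assumption: nncf_logger is modeled here as a standard-library logger.
# "nncf_sketch" is a hypothetical name used only for this example.
sketch_logger = logging.getLogger("nncf_sketch")


def warn_if_subset_too_large(subset_size: int, processed_samples: int) -> Optional[str]:
    """Log a warning when fewer samples than requested were processed.

    Returns the message that was passed to the logger, or None if no warning was needed.
    """
    if subset_size <= processed_samples:
        return None
    message = (
        f"Dataset contains only {processed_samples} samples, "
        f"smaller than the requested subset size {subset_size}."
    )
    sketch_logger.warning(message)
    return message


# Users can silence or redirect the message with ordinary logging configuration,
# e.g. by raising the logger's level.
sketch_logger.setLevel(logging.ERROR)
warn_if_subset_too_large(300, 120)  # returns the message, but nothing is printed
```

With a logger, the warning follows the same verbosity controls as the library's other messages, whereas warnings.warn is governed separately by the warnings filter machinery.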

Comment on lines +37 to +38


Collaborator:

Please avoid making unnecessary changes

Labels: NNCF Common (Pull request that updates NNCF Common)
Projects: None yet
Development: successfully merging this pull request may close these issues:
[Good First Issue][NNCF]: Dump actual_subset_size to ov.Model
2 participants