Add PANDASmall dataset #664
Conversation
thanks @nkaenzig, looks good. How did you determine the data size (20% of slides & 200 patches) -- do we know that, for example, 10% of the data or 100 patches would not be sufficient?
@roman807 Good question. The number of 200 patches was determined experimentally. Regarding the 20%: this dataset has 6 classes, and we want to make sure that each of the train, val & test splits still has sufficient examples per class. With the current ratio, we have 166 WSIs per class in the train set and 83 per class in each of the val/test splits; especially for the val/test splits I don't want to go lower in terms of sample count. Also, at 20% the evaluation runtime for ViT-S inference becomes reasonable.
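For illustration only, a minimal sketch of how a class-stratified 20% subsample of the slide manifest could be drawn; the column name `isup_grade`, the `train.csv` path and the `stratified_subsample` helper are assumptions here, not the actual implementation in this PR:

```python
import pandas as pd


def stratified_subsample(
    manifest: pd.DataFrame, fraction: float = 0.2, seed: int = 42
) -> pd.DataFrame:
    """Draws `fraction` of the slides per ISUP grade so every class keeps enough examples."""
    return (
        manifest.groupby("isup_grade")
        .sample(frac=fraction, random_state=seed)
        .reset_index(drop=True)
    )


# Hypothetical usage: `train.csv` lists one WSI per row together with its ISUP grade (0-5).
manifest = pd.read_csv("train.csv")
small = stratified_subsample(manifest)
print(small["isup_grade"].value_counts())  # roughly equal counts across the 6 classes
```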
Thinking about terminology: should we use "small" instead of "tiny"? I think "tiny" usually refers to something very small, e.g. minimal data for a unit or integration test.
Closes #662
-> results in 25x fewer patches, and therefore runs approximately 25x faster than the full PANDA benchmark, given that patch embedding generation takes up most of the compute time
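A quick back-of-the-envelope check of the 25x figure; the 1,000 patches per slide assumed for the full benchmark is an assumption for this sketch, not a value stated in this PR:

```python
# Assumed full-benchmark settings (not confirmed in this thread): all slides, 1000 patches each.
full_slide_fraction, full_patches_per_slide = 1.0, 1000
small_slide_fraction, small_patches_per_slide = 0.2, 200

reduction = (full_slide_fraction * full_patches_per_slide) / (
    small_slide_fraction * small_patches_per_slide
)
print(reduction)  # 25.0 -> roughly 25x fewer patch embeddings to compute
```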