
Lizard contains subsets of other datasets #25

Open

ClaudiaWinklmayr opened this issue Dec 5, 2024 · 0 comments



Quotes from the Lizard publication

"To generate our final dataset, we considered data from 6 different sources: GlaS [34], CRAG [17], CoNSeP [20], DigestPath, PanNuke [15] and TCGA [21]." (Lizard paper)
"For both CoNSeP and PanNuke, we only utilised the images and not the associated annotations from the original datasets to ensure that the same label generation pipeline was used on all input data" (Lizard paper)

Caution when training

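When training or testing with the prepared datasets, we need to be aware that they are not mutually exclusive but overlap to some degree: Lizard includes images from GlaS, CRAG, CoNSeP, DigestPath, PanNuke and TCGA.
In particular, when using CoNSeP and PanNuke, we need to be aware that their images are part of Lizard but that their annotations in Lizard may differ from the original ones, because Lizard relabelled all input images with its own label generation pipeline.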
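A minimal Python sketch of one way to act on this, assuming each Lizard image can be tagged with its upstream source. The `LizardImage` record, its `source` field and `exclude_overlapping_sources` are hypothetical names; how the source is actually recorded depends on the metadata of the prepared dataset. The function drops Lizard images that come from a dataset also used elsewhere in the experiment, for example keeping PanNuke-derived images out of Lizard training data when PanNuke is the test set.

```python
from dataclasses import dataclass

# Lizard's six upstream sources, as listed in the Lizard paper.
LIZARD_SOURCES = {"glas", "crag", "consep", "digestpath", "pannuke", "tcga"}


@dataclass(frozen=True)
class LizardImage:
    # Hypothetical record: the real metadata format of the prepared
    # dataset may store the source differently.
    name: str
    source: str  # lower-case upstream source, e.g. "pannuke"


def exclude_overlapping_sources(images, other_datasets):
    """Drop Lizard images whose upstream source is another dataset in the experiment.

    other_datasets: names of datasets used alongside Lizard (e.g. the test set).
    Names that are not Lizard sources are simply ignored.
    """
    overlap = {d.lower() for d in other_datasets} & LIZARD_SOURCES
    return [img for img in images if img.source not in overlap]


images = [
    LizardImage("img_a", "consep"),
    LizardImage("img_b", "pannuke"),
    LizardImage("img_c", "crag"),
]
# Training on Lizard, testing on PanNuke: keep PanNuke-derived images out of training.
train = exclude_overlapping_sources(images, ["PanNuke"])
print([img.name for img in train])  # ['img_a', 'img_c']
```

Filtering by source only removes duplicated images; it does not reconcile the differing CoNSeP and PanNuke annotations, so mixing original and Lizard labels for the same images still needs care.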

Potential for leakage

Phikon-v2 was trained on (among other data) TCGA, which is also one of Lizard's sources, so Phikon-v2 may already have seen tissue that also appears in Lizard. See this issue
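A similar sketch for the leakage case, under the same assumption that each Lizard image can be mapped to its upstream source (`partition_by_pretraining_overlap` and the name-to-source mapping are hypothetical). It splits Lizard images into those from sources a feature extractor was pretrained on and the rest, so results can be reported on both parts. Only TCGA is passed for Phikon-v2 here, because that is the only pretraining source named in this issue.

```python
def partition_by_pretraining_overlap(image_sources, pretraining_sources):
    """Split Lizard image names into (possibly_seen, unseen) by upstream source.

    image_sources: mapping of image name -> upstream source
        (hypothetical format; adapt to the dataset's actual metadata).
    pretraining_sources: sources the feature extractor was pretrained on.
    """
    seen = {s.lower() for s in pretraining_sources}
    possibly_seen, unseen = [], []
    # Sort for a deterministic order of the returned names.
    for name, source in sorted(image_sources.items()):
        (possibly_seen if source.lower() in seen else unseen).append(name)
    return possibly_seen, unseen


sources = {"img_1": "tcga", "img_2": "crag", "img_3": "glas"}
# Phikon-v2 was pretrained on (among others) TCGA.
leaky, clean = partition_by_pretraining_overlap(sources, ["TCGA"])
print(leaky, clean)  # ['img_1'] ['img_2', 'img_3']
```

Reporting the two subsets separately can make a gap between them visible instead of folding it into a single score.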
