Lizard contains subsets of other datasets #25

Open
ClaudiaWinklmayr opened this issue Dec 5, 2024 · 0 comments

@ClaudiaWinklmayr
Contributor

Quotes from the Lizard publication

  • "To generate our final dataset, we considered data from 6 different sources: GlaS [34], CRAG [17], CoNSeP [20], DigestPath, PanNuke [15] and TCGA [21]." (Lizard paper)
  • "For both CoNSeP and PanNuke, we only utilised the images and not the associated annotations from the original datasets to ensure that the same label generation pipeline was used on all input data" (Lizard paper)

Caution when training

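  • When training or testing with the prepared datasets, we need to keep in mind that they are not mutually exclusive: some images appear in more than one dataset.
  • In particular, CoNSeP and PanNuke images are part of Lizard, but their Lizard annotations were regenerated with Lizard's own labelling pipeline and may differ from the original annotations. Training on Lizard and testing on CoNSeP or PanNuke (or vice versa) can therefore reuse the same images under different labels.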
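
One way to avoid this overlap is to drop training samples whose original image source is one of the evaluation datasets. A minimal sketch, assuming each sample carries a record of where its image originally came from; the `Sample` class and `drop_overlapping` helper are hypothetical, and the actual Lizard metadata layout may differ:

```python
from dataclasses import dataclass


# Hypothetical per-image record; Lizard's real metadata may be organised differently.
@dataclass(frozen=True)
class Sample:
    image_id: str
    dataset: str  # dataset the sample was loaded from, e.g. "Lizard"
    source: str   # original source of the image, e.g. "CoNSeP"


def drop_overlapping(train, eval_datasets):
    """Remove training samples whose original image source is an evaluation dataset."""
    held_out = {name.lower() for name in eval_datasets}
    return [s for s in train if s.source.lower() not in held_out]


# Illustrative records only (hypothetical IDs, not real Lizard filenames).
train = [
    Sample("lizard_001", "Lizard", "GlaS"),
    Sample("lizard_002", "Lizard", "CoNSeP"),
    Sample("lizard_003", "Lizard", "PanNuke"),
    Sample("lizard_004", "Lizard", "CRAG"),
]

clean = drop_overlapping(train, eval_datasets=["CoNSeP", "PanNuke"])
print([s.image_id for s in clean])  # ['lizard_001', 'lizard_004']
```

Matching on the image source rather than the dataset name is what catches the Lizard copies of CoNSeP and PanNuke images, since their labels differ from the originals.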

Potential for leakage

  • Phikon-v2 was trained on TCGA (among other data), and TCGA is also one of Lizard's sources, so evaluating Phikon-v2 features on Lizard may overestimate performance. See this issue
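
A quick way to flag this kind of leakage is to intersect the sources a foundation model was pretrained on with the sources of each evaluation dataset. A minimal sketch: the Lizard sources come from the quote above, while the Phikon-v2 entry lists only the TCGA overlap named in this issue (its full pretraining set is larger). `PRETRAINING_SOURCES`, `EVAL_SOURCES` and `leakage_report` are hypothetical names:

```python
# Hypothetical lookup tables; extend them from each paper or model card.
PRETRAINING_SOURCES = {
    # Incomplete: only the overlap mentioned in this issue is listed.
    "Phikon-v2": {"TCGA"},
}

EVAL_SOURCES = {
    # The six sources listed in the Lizard paper.
    "Lizard": {"GlaS", "CRAG", "CoNSeP", "DigestPath", "PanNuke", "TCGA"},
}


def leakage_report(model, eval_dataset):
    """Return the sources shared by a model's pretraining data and an evaluation dataset."""
    return PRETRAINING_SOURCES[model] & EVAL_SOURCES[eval_dataset]


print(sorted(leakage_report("Phikon-v2", "Lizard")))  # ['TCGA']
```

A non-empty result does not prove leakage at the image level, only that the two datasets draw on a common source and need a closer look.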