Access to dataset used in the paper #1

Open
shntnu opened this issue Apr 12, 2022 · 4 comments

@shntnu

shntnu commented Apr 12, 2022

Alessandro Palma asked:

Since I am dealing with generative modeling, I am mostly using the CytoGAN paper as a reference, together with other publications employing the dataset.
I also tried to employ CellProfiler to perform cell segmentation on the dataset, but I wasn’t able to make it work remotely on my servers. For now, I am working with 96x96x3 crops of the plates, which can contain more than one cell (not an ideal condition). I was therefore wondering if the CytoGAN dataset is available anywhere online for direct usage.

Alessandro, I suppose you are already familiar with the primary resource, https://bbbc.broadinstitute.org/BBBC021, and that you are looking for the processed version?

@allepalma

Hi,

Thanks very much for your fast answer. Yes, exactly. What I am looking for is a segmentation mask for the cells in BBBC021. I have tried to get the CellProfiler projects in the repository to work, but the setup does not seem to match the names of the downloaded files. I also could not manage to use the software on remote servers. For now I am using 96x96x3 crops as in the uploaded image. However, I would definitely benefit from exact single-cell outlines like the ones in the CytoGAN paper.

Is there any public binary mask for BBBC021?

Thank you again for your support and for the great datasets!

image

@shntnu

shntnu commented Apr 12, 2022

Sounds good

These files are available internally at s3://imaging-platform/projects/dp_treatment-classification_az/

We will need to move them to a publicly accessible location and then ensure that all the contents can be made available as is.

I can't promise a fast turnaround but I'll keep this on my list.

Update: we should move it to s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/deep_learning

  • dp-project restore
  • dp-project transfer
  • analysis restore
  • analysis transfer
  • load_data_csv restore
  • load_data_csv transfer
  • backend restore
  • backend transfer
  • sync to correct location
  • delete wrong location: aws s3 rm --recursive s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/
  • restore images
  • sync images
  • delete s3://imaging-platform/projects/dp_treatment-classification_az

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/dp-project/  --max_workers 8 --logfile dp-project_log.csv


source=s3://imaging-platform/projects/dp_treatment-classification_az/dp-project/ 
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/deep_learning

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l
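
The two counts above are compared by eye. A minimal sketch of automating that comparison is below; `counts_match` is a hypothetical helper name (not part of the AWS CLI), and the check only confirms that the number of listed objects agrees, not that their contents do.

```shell
# Hypothetical helper: succeed only when the two object counts are equal.
# Arithmetic expansion strips any padding that `wc -l` may add.
counts_match() {
  cm_src=$(( $1 ))
  cm_dst=$(( $2 ))
  if [ "$cm_src" -eq "$cm_dst" ]; then
    echo "OK: ${cm_src} objects on both sides"
  else
    echo "MISMATCH: source=${cm_src} destination=${cm_dst}" >&2
    return 1
  fi
}

# Usage (needs AWS credentials, so it is left commented out here):
# counts_match \
#   "$(aws s3 ls --recursive "$source" | wc -l)" \
#   "$(aws s3 ls --recursive "$destination" | wc -l)"
```

The same function could be reused after each of the sync steps that follow.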

Similarly

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/analysis/ljosa_2013/  --max_workers 8 --logfile analysis_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/analysis/ljosa_2013/
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/analysis/ljosa_2013/

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/load_data_csv/ljosa_2013/  --max_workers 8 --logfile load_data_csv_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/load_data_csv/ljosa_2013/
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/load_data_csv/ljosa_2013/

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/backend/ljosa_2013/  --max_workers 8 --logfile backend_csv_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/backend/ljosa_2013/
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/backend/ljosa_2013/

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/images  --max_workers 8 --logfile images_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/images
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/broad-az/images

parallel \
  --dry-run \
  aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source}/Week{1}/ \
  ${destination}/Week{1}/images/ ::: 1 2 3 4 5 6 7 8 9 10

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l
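
For readers without GNU parallel, the per-week path mapping in the `parallel` command above can be previewed with a plain loop. This is only a sketch: `print_week_syncs` is a hypothetical helper, it prints the commands rather than running them (similar in spirit to `--dry-run`), and the `--profile`, `--acl`, and other flags from the original command are omitted for brevity.

```shell
source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/images
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/broad-az/images

# Hypothetical helper: print (do not run) one sync command per week,
# mirroring the {1} substitution over 1..10 in the parallel command.
print_week_syncs() {
  for week in 1 2 3 4 5 6 7 8 9 10; do
    echo "aws s3 sync ${source}/Week${week}/ ${destination}/Week${week}/images/"
  done
}

print_week_syncs
```

Note that each `Week<N>/` source prefix lands under `Week<N>/images/` at the destination, so the two layouts differ by one level.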

For our notes:

This should be the same (processed version of the) dataset that we used in all these publications:

  1. Caicedo JC, McQuin C, Goodman A, Singh S, & Carpenter AE (2018). Weakly Supervised Learning of Single-Cell Feature Embeddings. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 9309–9318. PMID: 30918435. PMCID: PMC6432648 (Conference paper)
  2. Goldsborough P, Pawlowski N, Caicedo JC, Singh S, & Carpenter AE (2017). CytoGAN: Generative Modeling of Cell Images. Workshop on Machine Learning in Computational Biology, Neural Information Processing Systems (NeurIPS). bioRxiv, p. 227645. (Conference paper)
  3. Pawlowski N, Caicedo JC, Singh S, Carpenter AE, & Storkey A (2016). Automating Morphological Profiling with Generic Deep Convolutional Networks. Neural Information Processing Systems (NeurIPS) MLCB Workshop 2016. PMCID: N/A (Conference paper)

@allepalma

Thank you again for your help!

@shntnu

shntnu commented Jun 7, 2022

@allepalma The files are now available at s3://cellpainting-gallery/cpg0010-caie-drugresponse/

Unfortunately, the documentation is pretty sparse, so you'd need to figure out the structure yourself (and please add notes to this issue if you have any clarifications for future users).
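
One possible way to start exploring the structure is to list the prefix level by level. The sketch below only builds and prints the listing commands; `gallery_ls_cmd` is a hypothetical helper, `--no-sign-request` assumes the bucket allows anonymous reads, and the `workspace/` sub-prefix is taken from the sync destinations in the notes above, so it may not match the final layout.

```shell
bucket_prefix=s3://cellpainting-gallery/cpg0010-caie-drugresponse/

# Hypothetical helper: print the listing command for a sub-prefix.
# --no-sign-request assumes anonymous read access to the bucket.
gallery_ls_cmd() {
  echo "aws s3 ls --no-sign-request ${bucket_prefix}${1}"
}

gallery_ls_cmd ""            # top level of the dataset
gallery_ls_cmd "workspace/"  # one level down (prefix assumed from the notes)
```

Running the printed commands one level at a time, and recording what each level contains, would be a reasonable basis for the notes requested above.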
