Identify example samples #1

Closed
lcolladotor opened this issue Jul 6, 2020 · 6 comments

@lcolladotor
Member

Add a directory with an Rmd or R file containing the code for selecting around 40 RNA-seq samples from one of the public LIBD projects (BrainSeq Phase I, II, or maybe even III; stem cell; DG; BPD; ...) with:

  • Some samples that are part of a complex swap, which would need to be dropped unless re-genotyping were an option (it is not an option in the example scenario).
  • Some samples that are part of a simpler swap that can be un-swapped (pending confirmation by re-genotyping; these will be used in the example scenario).

After dropping the first set of samples (complex swaps), the remaining samples should be balanced across diagnosis status, age (similar mean/median and range), and sex.
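A minimal base-R sketch of the balance check described above. It uses a made-up toy table; the column names (RNum, SwappingCase, PrimaryDx, Sex, Age) and every value are hypothetical stand-ins for the real subset of colData(rse).

```r
# Hypothetical toy data: column names and values are placeholders, not LIBD data
samples <- data.frame(
    RNum = paste0("R", 1:8),
    SwappingCase = c("no_swap", "no_swap", "swap_simple", "swap_complex",
                     "no_swap", "swap_simple", "no_swap", "swap_complex"),
    PrimaryDx = c("Control", "Bipolar", "Control", "Bipolar",
                  "Bipolar", "Bipolar", "Control", "Control"),
    Sex = c("M", "F", "F", "M", "M", "F", "F", "M"),
    Age = c(34, 41, 52, 29, 47, 38, 45, 60)
)

# Drop the samples involved in complex swaps (no re-genotyping in this scenario)
kept <- samples[samples$SwappingCase != "swap_complex", ]

# Sample counts per diagnosis (rows) and sex (columns)
dx_by_sex <- table(kept$PrimaryDx, kept$Sex)
print(dx_by_sex)

# Age mean, median and range per diagnosis
age_summary <- do.call(rbind, lapply(split(kept$Age, kept$PrimaryDx), function(a) {
    c(mean = mean(a), median = median(a), min = min(a), max = max(a))
}))
print(age_summary)
```

With real data, the same two summaries can be recomputed after each tweak to the candidate list until the groups look comparable.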

Use the main LIBD RSE with ~5,500 samples to find the JHPCE paths to the FASTQ files for the selected samples.

Main output

Provide a small table (as an .Rdata or CSV file) with the following columns:

  • RNum
  • BrNum
  • SwappingCase: no_swap, swap_simple, swap_complex
  • FASTQpath: there could be more than one per sample, which you can store as an IRanges::CharacterList() column in an S4Vectors::DataFrame() object.
  • Age
  • Sex
  • Primary diagnosis
  • Brain region (ideally all samples should be from one brain region, though this will depend on which LIBD project you choose)
  • RIN
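A minimal sketch of the table above as an S4Vectors::DataFrame, with the multiple FASTQ paths per sample stored in an IRanges::CharacterList column. All values (IDs, paths, ages, region, RIN) are made-up placeholders, and the column names PrimaryDx and BrainRegion are assumptions.

```r
library("S4Vectors") # DataFrame()
library("IRanges")   # CharacterList()

# Made-up placeholder values; real ones would come from colData(rse)
example_samples <- DataFrame(
    RNum = c("R0001", "R0002"),
    BrNum = c("Br0001", "Br0002"),
    SwappingCase = factor(c("no_swap", "swap_simple"),
        levels = c("no_swap", "swap_simple", "swap_complex")),
    Age = c(45.2, 38.7),
    Sex = c("M", "F"),
    PrimaryDx = c("Control", "Bipolar"),
    BrainRegion = c("Amygdala", "Amygdala"),
    RIN = c(7.8, 8.1)
)

# A sample can have several FASTQ files (paired-end reads, multiple lanes),
# so the paths are stored as one CharacterList element per sample
example_samples$FASTQpath <- CharacterList(
    c("/path/R0001_R1.fastq.gz", "/path/R0001_R2.fastq.gz"),
    c("/path/R0002_L001_R1.fastq.gz", "/path/R0002_L001_R2.fastq.gz",
      "/path/R0002_L002_R1.fastq.gz", "/path/R0002_L002_R2.fastq.gz")
)

# Number of FASTQ files per sample
print(lengths(example_samples$FASTQpath))

# Save as .Rdata; a CSV would need the CharacterList column flattened first
save(example_samples, file = file.path(tempdir(), "example_samples.Rdata"))
```

The list column is added with `$<-` after construction so that the DataFrame() constructor does not try to expand it into separate columns.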
@lahuuki
Member

lahuuki commented Jul 9, 2020

Issues 5+9 and 10+100+101 are all from psychENCODE_BP and may be good candidates.

@lahuuki
Member

lahuuki commented Jul 9, 2020

Breakdown of all single-dataset issues (dataset: issue numbers):

  • Nicotine_NAc: 12
  • psychENCODE_MDD: 13, 15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 85, 86, 87, 88
  • psychENCODE_BP: 5, 9, 10, 100, 101
  • BrainSeq_Phase3_Caudate: 3, 4
  • BrainSeq_Phase1: 52, 53, 91, 92, 93, 96, 97, 98, 99

@lcolladotor
Member Author

The FASTQ info can be saved as a samples.manifest file like the ones used by SPEAQeasy (@Nick-Eagles can help if needed). That is, subset /dcl01/lieber/ajaffe/lab/zandiHyde_bipolar_rnaseq/preprocessed_data/.samples_unmerged.manifest, which has the correct paths (from Nina and/or Andrew). So there's no need to add FASTQpath to the subset of colData(rse) anymore (save this small file and commit it).
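A minimal base-R sketch of subsetting an unmerged manifest to the selected samples. It writes a toy manifest to a temporary file instead of reading the real JHPCE one, and it assumes the SPEAQeasy-style layout (tab-separated, no header, sample ID in the last column); the RNums and paths are hypothetical.

```r
# Toy stand-in for .samples_unmerged.manifest (tab-separated, no header).
# Assumed layout per line: read 1 path, MD5 (0), read 2 path, MD5 (0), sample ID.
manifest_file <- tempfile(fileext = ".manifest")
writeLines(c(
    "/path/R0001_L001_R1.fastq.gz\t0\t/path/R0001_L001_R2.fastq.gz\t0\tR0001",
    "/path/R0001_L002_R1.fastq.gz\t0\t/path/R0001_L002_R2.fastq.gz\t0\tR0001",
    "/path/R0002_L001_R1.fastq.gz\t0\t/path/R0002_L001_R2.fastq.gz\t0\tR0002",
    "/path/R0003_L001_R1.fastq.gz\t0\t/path/R0003_L001_R2.fastq.gz\t0\tR0003"
), manifest_file)

# Read every column as character so paths, MD5 placeholders and IDs stay verbatim
full_manifest <- read.table(manifest_file, header = FALSE, sep = "\t",
    colClasses = "character", quote = "")

# Hypothetical RNums selected for the example
selected_rnums <- c("R0001", "R0003")

# The sample ID is the last column; keep every line (lane) of the selected samples
id_col <- ncol(full_manifest)
new_manifest <- full_manifest[full_manifest[[id_col]] %in% selected_rnums, ]
print(new_manifest)
```

Because the manifest is unmerged, one sample can span several lines, so filtering on the ID column (rather than taking one line per sample) keeps all of its FASTQ files.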

@lcolladotor
Member Author

For writing the manifest file, use write.table(new_manifest, file = opt$sampleids, row.names = FALSE, col.names = FALSE, quote = FALSE, sep = '\t'), as done in https://github.com/LieberInstitute/RNAseq-pipeline/blob/ab71dedb36bcc3dad57233e645fabd5deb96d446/sh/find_sample_info.R#L91
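A small self-contained sketch of that write.table() call. Here new_manifest is a hypothetical two-sample table in the assumed SPEAQeasy column layout, and out_file is a temporary path standing in for opt$sampleids from the linked script.

```r
# Hypothetical subset; columns follow the assumed layout
# (read 1 path, MD5, read 2 path, MD5, sample ID)
new_manifest <- data.frame(
    V1 = c("/path/R0001_L001_R1.fastq.gz", "/path/R0003_L001_R1.fastq.gz"),
    V2 = c("0", "0"),
    V3 = c("/path/R0001_L001_R2.fastq.gz", "/path/R0003_L001_R2.fastq.gz"),
    V4 = c("0", "0"),
    V5 = c("R0001", "R0003")
)

# Stand-in for opt$sampleids in find_sample_info.R
out_file <- file.path(tempdir(), "samples.manifest")

# Same arguments as the linked script: no row or column names, no quotes, tab-separated
write.table(new_manifest, file = out_file, row.names = FALSE,
    col.names = FALSE, quote = FALSE, sep = "\t")

# Inspect the raw lines that were written
manifest_lines <- readLines(out_file)
print(manifest_lines)
```

Turning off row names, column names and quoting is what keeps each line as plain tab-separated fields with no header, matching the manifest format.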

@lcolladotor
Member Author

@lahuuki can you verify whether it's ok to close this issue? Thanks!

@lahuuki
Member

lahuuki commented May 3, 2021

We can close this; we finished up that example a while ago.

lahuuki closed this as completed on May 3, 2021