Identify example samples #1

lcolladotor · 2020-07-06T15:48:05Z

Add a directory with an Rmd or R file containing the code for selecting around 40 RNA-seq samples from one of the public LIBD projects (BrainSeq Phase I, II, or maybe even III; stem cell; DG; BPD, ...) with:

Some samples that are part of a complex problem that would need to be dropped unless re-genotyping was an option (not for the example scenario)
Some samples that are part of a simpler problem that can be un-swapped (pending confirmation by re-genotyping; they'll be used in the example scenario).

So after dropping the first set of samples (complex problem) we want a balanced scenario of samples across diagnosis status, age (similar mean/median & range), and sex.
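A minimal sketch of how that balance could be checked once the complex swaps are dropped. The toy table below is entirely made up (the sample IDs, ages, and the `PrimaryDx` column name are assumptions for illustration, not values from any LIBD project), and it only uses base R:

```r
# Hypothetical sample table; IDs, ages and column names are illustrative only
pd <- data.frame(
    RNum = paste0("R", 1001:1008),
    PrimaryDx = rep(c("Control", "Bipolar"), each = 4),
    Sex = rep(c("F", "M"), times = 4),
    Age = c(34, 52, 47, 61, 36, 50, 45, 63),
    SwappingCase = c(
        "no_swap", "no_swap", "swap_simple", "swap_complex",
        "no_swap", "swap_simple", "no_swap", "swap_complex"
    ),
    stringsAsFactors = FALSE
)

# Drop the complex swaps first, then check balance on what is left
kept <- pd[pd$SwappingCase != "swap_complex", ]

# Sample counts by diagnosis and sex
dx_by_sex <- table(kept$PrimaryDx, kept$Sex)

# Age mean, median and range per diagnosis (returned as a matrix column)
age_summary <- aggregate(Age ~ PrimaryDx, data = kept, FUN = function(x) {
    c(mean = mean(x), median = median(x), min = min(x), max = max(x))
})

print(dx_by_sex)
print(age_summary)
```

If the counts or the age summaries differ a lot between diagnosis groups, a different set of `no_swap` samples could be tried before settling on the final ~40.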

Use the main LIBD RSE with ~5,500 samples to find the JHPCE paths to the FASTQ files for the selected samples.

Main output

Provide a small table (an Rdata file or maybe a CSV file) with:

RNum
BrNum
SwappingCase: no_swap, swap_simple, swap_complex
FASTQpath: could be more than one per sample, which you can store as an IRanges::CharacterList() in an S4Vectors::DataFrame() object.
Age
Sex
Primary diagnosis
Brain region (ideally all samples should be from 1 brain region, though this will depend on what LIBD project you choose)
RIN
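As a rough sketch of what that table might look like, here is a base R version written to a CSV file. All values, IDs, and FASTQ paths are placeholders, and the column names `PrimaryDx` and `BrainRegion` are assumptions. Because a CSV file cannot hold a list column, multiple FASTQ paths are collapsed with `;` here rather than stored in a `CharacterList`:

```r
# Placeholder values only; nothing here comes from a real LIBD sample
example_samples <- data.frame(
    RNum = c("R0001", "R0002", "R0003"),
    BrNum = c("Br0001", "Br0002", "Br0003"),
    SwappingCase = c("no_swap", "swap_simple", "swap_complex"),
    # More than one FASTQ file per sample is collapsed into one string
    FASTQpath = c(
        "/path/to/R0001_L001.fastq.gz;/path/to/R0001_L002.fastq.gz",
        "/path/to/R0002.fastq.gz",
        "/path/to/R0003.fastq.gz"
    ),
    Age = c(41.2, 55.0, 38.7),
    Sex = c("M", "F", "M"),
    PrimaryDx = c("Control", "Bipolar", "Control"),
    BrainRegion = "Amygdala",
    RIN = c(7.9, 8.3, 6.8),
    stringsAsFactors = FALSE
)

# Write to a temporary location and read back to confirm the round trip
out_csv <- file.path(tempdir(), "example_samples.csv")
write.csv(example_samples, file = out_csv, row.names = FALSE)
round_trip <- read.csv(out_csv, stringsAsFactors = FALSE)

# Recover the individual paths for the first sample
first_paths <- strsplit(round_trip$FASTQpath[1], ";", fixed = TRUE)[[1]]
print(first_paths)
```

Saving the same object with `save()` instead would keep the column types as they are, at the cost of a file that is harder to inspect outside R.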


lahuuki · 2020-07-09T17:54:40Z

Issues 5+9 and 10+100+101 are all from psychENCODE_BP and may be good candidates.

lahuuki · 2020-07-09T18:50:01Z

Breakdown of all single dataset issues:
Nicotine_NAc: 12
psychENCODE_MDD: 13,15,16,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,85,86,87,88
psychENCODE_BP: 5,9,10,100,101
BrainSeq_Phase3_Caudate: 3,4
BrainSeq_Phase1: 52,53,91,92,93,96,97,98,99

lcolladotor · 2020-07-09T19:59:59Z

The FASTQ info can be saved as a samples.manifest file like the ones used by SPEAQeasy (@Nick-Eagles can help if needed). That is, subset /dcl01/lieber/ajaffe/lab/zandiHyde_bipolar_rnaseq/preprocessed_data/.samples_unmerged.manifest with the correct paths (from Nina and/or Andrew), save this small file, and commit it. So there is no need to add FASTQpath to the subset of colData(rse) anymore.

lcolladotor · 2020-07-09T20:10:52Z

For writing the manifest file, use write.table(new_manifest, file = opt$sampleids, row.names = FALSE, col.names = FALSE, quote = FALSE, sep = '\t') https://github.com/LieberInstitute/RNAseq-pipeline/blob/ab71dedb36bcc3dad57233e645fabd5deb96d446/sh/find_sample_info.R#L91
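A small sketch of the subset-then-write step using the `write.table()` call above. The toy manifest, its five-column paired-end layout (read 1, checksum placeholder, read 2, checksum placeholder, sample ID), the column names, and the `selected` IDs are all assumptions for illustration. The real manifest would be read from the JHPCE path instead, and a temporary file stands in for `opt$sampleids`:

```r
# Toy manifest with hypothetical paths; the real one would come from
# read.table(<manifest path>, header = FALSE, sep = "\t")
manifest <- data.frame(
    read1 = c("/path/to/R0001_1.fastq.gz", "/path/to/R0002_1.fastq.gz", "/path/to/R0003_1.fastq.gz"),
    md5_1 = 0,
    read2 = c("/path/to/R0001_2.fastq.gz", "/path/to/R0002_2.fastq.gz", "/path/to/R0003_2.fastq.gz"),
    md5_2 = 0,
    sample_id = c("R0001", "R0002", "R0003"),
    stringsAsFactors = FALSE
)

# Hypothetical IDs of the samples chosen for the example
selected <- c("R0001", "R0003")
new_manifest <- manifest[manifest$sample_id %in% selected, ]

# Temporary output path used here in place of opt$sampleids
out_file <- file.path(tempdir(), "samples.manifest")
write.table(new_manifest,
    file = out_file, row.names = FALSE, col.names = FALSE,
    quote = FALSE, sep = "\t"
)

# Read the file back as plain lines to inspect what was written
written <- readLines(out_file)
print(written)
```

With `col.names = FALSE` and `quote = FALSE`, the file has no header and no quoting, only the tab-separated fields.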

lcolladotor · 2021-05-02T02:22:00Z

@lahuuki can you verify whether it's ok to close this issue? Thanks!

lahuuki · 2021-05-03T14:37:48Z

We can close this; we finished up that example a while ago.

lcolladotor assigned lahuuki and joshstolz Jul 6, 2020

This was referenced Jul 6, 2020

DNA genotyping required info for the example data #2

Closed

Script for downloading example data #3

Closed

lahuuki referenced this issue Jul 10, 2020

Add fastq filepaths from manifest

57c28fe

lahuuki closed this as completed May 3, 2021
