convert to complete pipeline #1

Open: wants to merge 2 commits into base: master
19 changes: 18 additions & 1 deletion README.md
@@ -3,11 +3,28 @@ RNASeq tools

A collection of tools for analysis of RNA-seq and transcriptomic data used by the [Hibberd Lab](http://hibberdlab.github.io).

## Batch read processing scripts
### setup

Scripts for organising raw data, for example by processing data downloaded from sequencing services to concatenate and name files by sample.

* **prepare_samples_TGAC.rb** - parses the TGAC SampleAlias.txt file and concatenates and renames gzipped FASTQ files by sample name.
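
A minimal sketch of the concatenate-and-rename step, not the script's actual implementation. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip stream, so no decompression is needed. The helper name `concat_by_sample`, the `file => sample` mapping, and the `<sample>.fastq.gz` output naming are all assumptions; the layout of SampleAlias.txt is not shown in this diff.

```ruby
require 'fileutils'

# Hypothetical helper (not part of prepare_samples_TGAC.rb).
# file_to_sample: Hash of input path => sample name (assumed to have been
# parsed from SampleAlias.txt, whose format is not shown here).
# Returns a Hash of sample name => path of the concatenated output file.
def concat_by_sample(file_to_sample, outdir)
  FileUtils.mkdir_p(outdir)
  outputs = {}
  file_to_sample.group_by { |_file, sample| sample }.each do |sample, pairs|
    outpath = File.join(outdir, "#{sample}.fastq.gz")
    File.open(outpath, 'wb') do |out|
      # Sort inputs so the concatenation order is reproducible.
      pairs.map(&:first).sort.each do |file|
        # Raw byte copy: gzip members can simply be appended to each other.
        File.open(file, 'rb') { |f| IO.copy_stream(f, out) }
      end
    end
    outputs[sample] = outpath
  end
  outputs
end
```

For paired-end data the forward and reverse files of a sample would need to be grouped separately and concatenated in the same order, which this sketch does not handle.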

### preprocess

* **trim-batch** - run Trimmomatic on a series of FASTQ read files, optionally trimming paired and single reads in the same run. After quality analysis, this is the first step in an RNASeq pipeline.
  - todo:
    - run multiple Trimmomatic processes in parallel
* **khmer-batch** - run digital normalisation on a series of FASTQ read files, preserving the k-mer counting hash between runs, to create a single normalised read dataset. Useful for incorporating a new read dataset with old data to generate an improved *de novo* assembly.
  - todo:
    - add option to use filter-by-abund

### expression

* **express_sample.rb** - run eXpress on each replicate of a sample, collating results into a single CSV.
* **sailfish_sample.rb** - run Sailfish on each replicate of a sample, collating results into a single CSV.
* **EBSeq_experiment.R** - run differential expression analysis using EBSeq.
* **GO_analyse.R** - run GO term enrichment analysis.
* **plot_GO_analysis.R** - generate plots of GO term analysis, including a representation of relative enrichment between samples and of which genes are important in each category.
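
The "collating results into a single CSV" step for the eXpress and Sailfish wrappers might look roughly like the sketch below. It is illustrative only: the function name `collate_replicates` and the in-memory input shape (replicate name mapped to a hash of transcript id => expression value) are assumptions, and parsing the tools' own output files is deliberately left out.

```ruby
require 'csv'

# Hypothetical collation step (not taken from express_sample.rb or
# sailfish_sample.rb). `replicates` maps a replicate name to a Hash of
# transcript id => expression value. Returns the collated table as a CSV
# string with one row per transcript and one column per replicate.
def collate_replicates(replicates)
  names = replicates.keys
  ids = replicates.values.flat_map(&:keys).uniq.sort
  CSV.generate do |csv|
    csv << ['transcript', *names]
    ids.each do |id|
      # A transcript missing from a replicate becomes an empty field.
      csv << [id, *names.map { |name| replicates[name][id] }]
    end
  end
end
```

Leaving missing values empty rather than writing zero keeps "not reported" distinguishable from "zero expression" in downstream analysis.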

## License

File renamed without changes.
5 changes: 5 additions & 0 deletions prep_and_trim.sh
@@ -0,0 +1,5 @@
prepare_samples_TGAC.rb | tee sample_preparation.log
trim-batch.rb --singlefile files_for_trimming.txt \
--jar /home/rds45/apps/Trimmomatic-0.30/trimmomatic-0.30.jar \
--adapters /data/adapters/adapters_list.fa \
--cleanup | tee quality_adapter_trimming.log
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions trim-batch.rb → preprocess/trim-batch.rb
@@ -108,8 +108,8 @@ def check_list(inlist, outlist)
cmd = pairedcmd
cmd = cmd.gsub(/INFILEF/, infilef)
cmd = cmd.gsub(/INFILER/, infiler)
-inpathl = File.dirname(infilef)
-infilel = File.basename(infilef)
+inpathf = File.dirname(infilef)
+infilef = File.basename(infilef)
inpathr = File.dirname(infiler)
infiler = File.basename(infiler)
cmd = cmd.gsub(/OUTFILEF/, "#{inpathf}/#{TRIMPREFIX}#{infilef}")
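
To make the intent of this hunk concrete, here is a small standalone sketch of how the prefixed output path is built from the directory and base name of an input file. The method name `trimmed_path` and the `TRIMPREFIX` value are assumptions for illustration; the real constant is defined elsewhere in trim-batch.rb and its value is not visible in this diff.

```ruby
# Assumed value for illustration only; the real TRIMPREFIX is defined
# elsewhere in trim-batch.rb.
TRIMPREFIX = 't.'

# Hypothetical helper mirroring the OUTFILEF substitution: split the input
# path into directory and file name, then put the prefix on the file name.
def trimmed_path(infile)
  dir = File.dirname(infile)
  base = File.basename(infile)
  "#{dir}/#{TRIMPREFIX}#{base}"
end

puts trimmed_path('/data/reads/s1_R1.fastq.gz')
```

With the assumed prefix this prints `/data/reads/t.s1_R1.fastq.gz`. Both `dir` and `base` must be derived from the same file: if the forward-read variables are assigned under different names, as in the old `inpathl`/`infilel` lines, the interpolation of `inpathf` presumably has nothing to refer to unless it is defined elsewhere.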
@@ -148,7 +148,7 @@ def check_list(inlist, outlist)
logline['file'] = infile
unpaired_trimlog << logline
end
-File.delete infile if opts.cleanup
+# File.delete infile if opts.cleanup
end

datestr = Time.now.strftime('%d_%m_%Y_%H_%M_%S')
File renamed without changes.