Add new experiment processing/reprocessing rules to gxa #24

Merged: 164 commits merged into develop on May 20, 2022


@pmb59 (Member) commented on Nov 12, 2021

The aim of this PR is to add new rules, and make use of existing ones (recalculations), to handle the processing of new experiments and reprocessing steps, in order to build a superset of data analysis rules for GXA. This workflow now covers the previous atlas_prod scripts, which were used before our systems moved to the new EBI infrastructure.

The superset of rules can now handle the following experiment types:

  • baseline
  • differential rnaseq
  • differential microarray

This PR includes:

  • logic that lets Snakemake rules collect I/O files depending on the analysis goal (recalculations or reprocessing)
  • (re)processing/recalculations for selected species only
  • an updated differential expression strategy
  • a number of Atlas production scripts (these will be moved to atlas-analysis and then included here as a submodule)
  • communication with the ISL DB (only for new experiments or reprocessing)
  • major code changes/new files are in:
    - Snakefile
    - Snakefile-reprocess
    - Snakefile-sorting-hat
    - bin/reprocessing_routines.sh
    - bin/round_log2_fold_changes.R (reviewed)
    - envs/isl-db.yaml
    - run_sorting_hat_test_data.sh
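As a rough illustration of the first two items above (goal-dependent I/O files and species selection), a Snakefile input function could look like the sketch below. Everything in it is hypothetical: the goal keys, the file-name patterns and `collect_inputs` itself are assumptions for illustration, not code from this PR.

```python
# Hypothetical input patterns per analysis goal; the real Snakefile's
# file names are not shown in this PR description.
INPUTS_BY_GOAL = {
    "recalculations": ["{acc}-configuration.xml", "{acc}-analytics.tsv"],
    "reprocessing": ["{acc}-configuration.xml", "{acc}.condensed-sdrf.tsv"],
}


def collect_inputs(accession, goal, species, selected_species=None):
    """Return input paths for one experiment.

    Returns an empty list when species filtering is active and the
    experiment's species is not among the selected ones.
    """
    if goal not in INPUTS_BY_GOAL:
        raise ValueError(f"unknown analysis goal: {goal!r}")
    if selected_species and species not in selected_species:
        return []
    # Expand each pattern for this accession, under a per-accession directory.
    return [f"{accession}/" + pattern.format(acc=accession)
            for pattern in INPUTS_BY_GOAL[goal]]


if __name__ == "__main__":
    print(collect_inputs("E-MTAB-1234", "reprocessing", "homo_sapiens"))
```

In a Snakefile, a function like this would typically be wired in as an input function, e.g. `input: lambda wildcards: collect_inputs(wildcards.accession, config["goal"], ...)`, so the same rule can serve both analysis goals.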

Future merge requests will address:

  • replacing generic bash routines with atlas-bash-util, installed via conda
  • refactoring scripts into an atlas-analysis submodule
  • adding proteomics and differential proteomics analysis rules
  • adding conditional structures that handle optional/required parameters and files
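The last planned item, handling optional/required parameters and files, might take a shape like the following minimal sketch. The key names, defaults, `resolve_config` and `optional_file` are all hypothetical and not part of this PR; the idea is to fail early on missing required parameters and to let a rule treat a missing optional file as an empty input.

```python
import os

# Hypothetical parameter names; the real workflow's config keys may differ.
REQUIRED_KEYS = ("accession", "species", "goal")
OPTIONAL_DEFAULTS = {"skip_steps": (), "tmp_dir": "tmp"}


def resolve_config(config):
    """Check required keys, then fill in defaults for optional ones."""
    missing = [k for k in REQUIRED_KEYS if config.get(k) in (None, "")]
    if missing:
        raise KeyError("missing required parameter(s): " + ", ".join(missing))
    resolved = dict(OPTIONAL_DEFAULTS)
    resolved.update(config)  # user-supplied values override defaults
    return resolved


def optional_file(path):
    """Return [path] if the file exists, else [] so a rule can skip it."""
    return [path] if os.path.exists(path) else []
```

Returning a list from `optional_file` means an absent file simply contributes no input, rather than making the rule fail.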

pmb59 and others added 30 commits November 12, 2021 10:37
If the env variable $ACCESSIONS is defined before running the script, only those accessions will be considered for recalculations/reprocessing. If more than one accession is specified, separate them with ":", e.g. `ACCESSIONS=E-MTAB-1234:E-MTAB-2544`
…accession before commencing the reprocessing.
…akemake, further modifications to variables needed for gxa_generate_methods.pl
…anscript_relative_isoforms. This is in agreement with the previous logic
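The `ACCESSIONS` filter described in the first commit message above can be sketched in Python as follows. The PR's actual implementation is in its scripts and is not shown here; `selected_accessions` is a hypothetical helper, and the choice to silently drop unknown accessions is an assumption.

```python
import os


def selected_accessions(all_accessions, env=None):
    """Restrict experiments to the colon-separated $ACCESSIONS list, if set."""
    env = os.environ if env is None else env
    raw = env.get("ACCESSIONS", "").strip()
    if not raw:
        # Variable unset or empty: process every experiment.
        return list(all_accessions)
    wanted = {a for a in raw.split(":") if a}
    # Keep the original ordering; accessions not in all_accessions are ignored.
    return [a for a in all_accessions if a in wanted]
```

For example, running with `ACCESSIONS=E-MTAB-1234:E-MTAB-2544` exported would keep only those two experiments.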
@pmb59 merged commit ae6bfd8 into develop on May 20, 2022
3 participants