All notable changes to this project will be documented in this file. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- CELLECT-GENES: a workflow to identify genes 'driving' the prioritization of cell types. These genes are found by intersecting the top specifically expressed genes with genes enriched for genetic signal. See the CELLECT-GENES tutorial for details.
- Add option to exclude MHC region for CELLECT-MAGMA
- Removed example config and references to it
- Fixed minor bugs with KEEP_ANNOTS - now prints genes correctly.
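
To illustrate the CELLECT-GENES entry above (intersecting the top specifically expressed genes with genes enriched for genetic signal), here is a minimal sketch. It is not the CELLECT-GENES implementation: the function name `driver_genes`, the `top_fraction` and `p_threshold` cutoffs, and the toy gene names and values are all assumptions made for illustration.

```python
def driver_genes(es_scores, gene_pvalues, top_fraction=0.1, p_threshold=0.05):
    """Hypothetical helper: genes that are both highly specific and genetically enriched.

    es_scores    -- dict of gene -> expression specificity score for one cell type
    gene_pvalues -- dict of gene -> gene-level association p-value
    """
    # Rank genes from most to least specifically expressed.
    ranked = sorted(es_scores, key=es_scores.get, reverse=True)
    # Keep the top fraction (at least one gene).
    n_top = max(1, int(len(ranked) * top_fraction))
    top_specific = set(ranked[:n_top])
    # Genes passing the (assumed) significance cutoff.
    enriched = {gene for gene, p in gene_pvalues.items() if p < p_threshold}
    # The candidate "driver" genes are the intersection of the two sets.
    return sorted(top_specific & enriched)


# Toy data, invented for this example.
es = {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.1, "GENE_D": 0.05}
pvals = {"GENE_A": 0.001, "GENE_B": 0.4, "GENE_C": 0.002, "GENE_D": 0.9}

print(driver_genes(es, pvals, top_fraction=0.5))  # ['GENE_A']
```

The actual cutoffs and inputs used by the workflow are described in the CELLECT-GENES tutorial.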
- CELLECT-MAGMA gene coordinates updated to exactly match CELLECT-LDSC (which remains the same). See #51 - Unify LDSC and MAGMA gene coordination files
- Added CELLECT-LDSC option to save the SNP:ES mappings as an output file if KEEP_ANNOTS is True in the config. See #53 - Exporting genes assigned to GWAS variants
- Removed WINDOW_LD_BASED option (CELLECT-LDSC only feature).
- CELLECT-MAGMA (analysis types supported: prioritization, conditional). See https://ctg.cncr.nl/software/magma for more information about the algorithm and auxiliary files.
- Extra output directory layer. The results are output into 2 separate subdirectories of the base output directory: CELLECT-LDSC and CELLECT-MAGMA respectively.
- Wiki (respective sections for CELLECT-MAGMA)
- Minimized the overlap of functions between CELLECT-LDSC and CELLECT-MAGMA via includes.
- Config file. The file was divided into sections of common and LDSC-/MAGMA-specific parameters.
- Wiki (respective sections for CELLECT-LDSC)
- README Documentation
- Improved CELLECT-LDSC handling of continuous gene annotations. Importantly, this changes the prioritization results slightly: results from CELLECT v1.1.0 and v1.0.0 have a Pearson correlation of ~0.95.
- Redundant "make multigeneset" functionality
- Some legacy references to incorrect paths
See commit for overview of important changes: https://github.com/perslab/CELLECT/commit/c7513a19c0dfc02b905e507f4c94924cee6a4ca4 See also perslab#8 for a description of the problem.
- Replaced use of BEDtools for merging overlapping genes into a single track with BEDOPS, which finds all unique overlapping regions and is more appropriate for continuous data
- General restructuring of rules and scripts to account for above
- Added BEDOPS to env
- Some additional tidying and minor bugfixes
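
The switch noted above, from merging overlapping genes to finding all unique overlapping regions, can be illustrated in plain Python. This is only a sketch of the two interval operations on a single chromosome; it does not call BEDtools or BEDOPS, and the function names and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching half-open (start, end) intervals into single spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def partition_intervals(intervals):
    """Split intervals at every boundary into disjoint segments.

    Each result is (start, end, indices of the input intervals covering the segment).
    """
    points = sorted({p for interval in intervals for p in interval})
    segments = []
    for left, right in zip(points, points[1:]):
        covering = [i for i, (s, e) in enumerate(intervals) if s <= left and right <= e]
        if covering:  # skip gaps that no interval covers
            segments.append((left, right, covering))
    return segments


# Two overlapping genes (made-up coordinates).
genes = [(100, 200), (150, 300)]

print(merge_intervals(genes))      # [(100, 300)]
print(partition_intervals(genes))  # [(100, 150, [0]), (150, 200, [0, 1]), (200, 300, [1])]
```

With a partition, each segment can be given its own value from the genes that cover it, whereas a merged span can only carry one value for the whole region. That is presumably why the partition-style result suits continuous annotations better.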
- CELLECT-LDSC conditional analysis type
- CELLECT-LDSC h2 analysis type
- Matrix expression specificity input file format
- Result parser (results/*.csv files)
- README documentation
- Log files in Snakemake workflow
- Result file headers
- Config file restructured (breaking changes)
- Several config settings
- Snakemake workflow optimization (wildcards etc.)
First stable release for CELLECT S-LDSC prioritization. No support for S-LDSC h2 or conditional analysis.