
Commit

Merge branch 'develop' into patch_chipseeker
Maarten-vd-Sande authored Feb 3, 2024
2 parents f5506b7 + c4ac63a commit 036e259
Showing 5 changed files with 9 additions and 8 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@ All changed fall under either one of these types: `Added`, `Changed`, `Deprecate
### Fixed

- chipseeker env got corrupted, it should work again.
+ - replaced deprecated --split-e flag with --split-3 flag for fastq downloading
+ - removed support for GSA as their "API" changed

## [1.2.1] - 2023-11-15

3 changes: 1 addition & 2 deletions docs/content/workflows/download_fastq.md
@@ -9,7 +9,7 @@ Downloading public data in bulk from the NCBI, ENA, and DDBJ databases has never

#### Download SRA file

- The five most popular databases that store sequencing data are National Center for Biotechnology Information (NCBI), the European Nucleotide Archive (ENA), the DNA Data Bank of Japan (DDBJ), the Genome Sequence Archive (GSA), and the Encode project (ENCODE).
+ The five most popular databases that store sequencing data are National Center for Biotechnology Information (NCBI), the European Nucleotide Archive (ENA), the DNA Data Bank of Japan (DDBJ), the Genome Sequence Archive (GSA) (GSA is currently not supported anymore), and the Encode project (ENCODE).
ENA, ENCODE, and GSA store the actual fastq files, and DDBJ and NCBI store the raw data (as a sra file) from which a fastq can be derived.
For this reason for each sample on DDBJ and NCBI seq2science will first check if it can be downloaded from ENA as a fastq directly.
Otherwise we will download the samples in its raw format. To convert this data to a fastq it has to be "*dumped*".
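The lookup order described in this paragraph can be sketched in a few lines of Python. This is only an illustration, not seq2science's actual implementation: the names `PREFIX_TO_DATABASE`, `database_for`, and `download_plan`, and the `on_ena` flag, are assumptions made for this sketch. The prefix table follows the usual accession conventions (for example `SRR`/`SRX` for NCBI and `DRR`/`DRX` for DDBJ).

```python
# Hypothetical sketch; names and structure are not taken from seq2science.
PREFIX_TO_DATABASE = {
    "SRR": "NCBI", "SRX": "NCBI",
    "ERR": "ENA", "ERX": "ENA",
    "DRR": "DDBJ", "DRX": "DDBJ",
    "CRR": "GSA", "CRX": "GSA",
    "ENCSR": "ENCODE", "ENCFF": "ENCODE",
}


def database_for(accession: str) -> str:
    """Return the database an accession belongs to, judged by its prefix."""
    for prefix, database in PREFIX_TO_DATABASE.items():
        if accession.startswith(prefix):
            return database
    raise ValueError(f"unrecognised accession: {accession}")


def download_plan(accession: str, on_ena: bool) -> str:
    """Describe how a sample would be fetched.

    `on_ena` stands in for the real check of whether ENA offers the
    sample as a fastq; how that check is performed is not modelled here.
    """
    database = database_for(accession)
    if database == "GSA":
        # GSA support was removed in this commit.
        return "unsupported"
    if database == "ENCODE":
        return "download fastq"
    if database == "ENA" or on_ena:
        return "download fastq from ENA"
    # NCBI or DDBJ sample that ENA does not provide as a fastq.
    return "download sra, then dump to fastq"
```

For instance, `download_plan("DRR032791", on_ena=False)` falls through to the sra download and dump, while the same accession with `on_ena=True` is fetched from ENA directly.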
@@ -22,7 +22,6 @@ As an example, the `samples.tsv` could look something like this:

```
sample
- CRX123 <-- GSA experiment
DRX890 <-- DDBJ experiment
DRR098 <-- DDBJ run
ENCSR765 <-- ENCODE assay
2 changes: 1 addition & 1 deletion seq2science/rules/get_fastq.smk
@@ -155,7 +155,7 @@ rule sra2fastq_PE:
# dump to tmp dir
parallel-fastq-dump -s {input} -O {output.tmpdir} \
- --threads {threads} --split-e --skip-technical --dumpbase \
+ --threads {threads} --split-3 --skip-technical --dumpbase \
--readids --clip --read-filter pass --defline-seq '@$ac.$si.$sg/$ri' \
--defline-qual '+' --gzip >> {log} 2>&1
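For context, `--split-3` requests a three-way split: a spot with two biological reads goes to the `_1` and `_2` files, and a spot with a single biological read goes to an unnumbered file. The Python sketch below models only that routing, assuming the behaviour described in the sra-tools `fastq-dump` help text. The function `split_3` and its arguments are hypothetical, and the file names omit the `.gz` suffix that `--gzip` would add.

```python
from collections import defaultdict


def split_3(accession, spots):
    """Route reads to output files as a three-way split is documented to.

    `spots` is a list of spots, each given as a list of biological reads.
    """
    files = defaultdict(list)
    for reads in spots:
        if len(reads) == 2:
            # Paired spot: first read to _1, second read to _2.
            files[f"{accession}_1.fastq"].append(reads[0])
            files[f"{accession}_2.fastq"].append(reads[1])
        elif len(reads) == 1:
            # Unpaired spot: goes to the unnumbered file.
            files[f"{accession}.fastq"].append(reads[0])
        # Simplification: spots with any other number of reads are skipped.
    return dict(files)
```

Calling `split_3("SRR800037", [["A", "B"], ["C"]])` places `A` and `B` in the paired files and `C` in `SRR800037.fastq`.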
2 changes: 1 addition & 1 deletion seq2science/workflows/download_fastq/samples.tsv
@@ -9,6 +9,6 @@ SRX257149
SRR800037
DRX029591
DRR032791
- CRX269079
+ # CRX269079 # currently not supported
ENCSR535GFO
ENCFF172MDS
8 changes: 4 additions & 4 deletions tests/dag_tests.sh
@@ -67,10 +67,10 @@ if [ $1 = "alignment" ]; then
assert_rulecount $1 'ena2fastq_PE|sra2fastq_PE' 1
assert_rulecount $1 'ena2fastq_PE|sra2fastq_PE' 1

- printf "\ndownload gsa\n"
- seq2science run download-fastq -nr --configfile tests/$WF/default_config.yaml --snakemakeOptions quiet=True config={samples:tests/download_fastq/gsa_encode_samples.tsv} | tee tests/local_test_results/${1}_dag
- assert_rulecount $1 'gsa_or_encode2fastq_SE' 5
- assert_rulecount $1 'gsa_or_encode2fastq_PE' 1
+ # printf "\ndownload gsa\n"
+ # seq2science run download-fastq -nr --configfile tests/$WF/default_config.yaml --snakemakeOptions quiet=True config={samples:tests/download_fastq/gsa_encode_samples.tsv} | tee tests/local_test_results/${1}_dag
+ # assert_rulecount $1 'gsa_or_encode2fastq_SE' 5
+ # assert_rulecount $1 'gsa_or_encode2fastq_PE' 1

# alignment workflow
WF=alignment
