Commit

update_reference_sh
Yong committed May 31, 2023
1 parent 7a3a976 commit bd7903a
Showing 3 changed files with 14 additions and 15 deletions.
6 changes: 3 additions & 3 deletions assets/README.md
@@ -10,15 +10,15 @@ You can download reference genome, pre-build BWA index and annotated regions (e.

```bash
## eg: ./download_build_reference.sh hg38 /your/genome/data/path/hg38
-$ ./workflow/download_reference.sh [GENOME] [DEST_DIR]
+$ ./assets/Reference/download_reference.sh [GENOME] [DEST_DIR]
```

* Build reference genomes index
If your sequencing libraries come with spike-ins, you can build new aligner index after combining spike-in genome with human genome. The new index information will be appended to corresponding manifest file.

```bash
-## eg: ./build_reference_index.sh hg38 ./data/BAC_F19K16_F24B22.fa hg38_BAC_F19K16_F24B22 /your/genome/data/path/hg38
-$ ./workflow/build_reference_index.sh [GENOME] [SPIKEIN_FA] [INDEX_PREFIX] [DEST_DIR]
+## eg: ./assets/Reference/build_reference_index.sh hg38 ./data/BAC_F19K16_F24B22.fa hg38_BAC_F19K16_F24B22 /your/genome/data/path/hg38
+$ ./assets/Reference/build_reference_index.sh [GENOME] [SPIKEIN_FA] [INDEX_PREFIX] [DEST_DIR]
```


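The README hunk documents the new invocation `./assets/Reference/download_reference.sh [GENOME] [DEST_DIR]`. The script header lists hg19 and hg38 as the supported genomes. Below is a minimal sketch of an argument check such a script could run before it downloads anything. The function name `check_download_args` is hypothetical and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical argument check for: download_reference.sh [GENOME] [DEST_DIR]
# Assumption: only hg19 and hg38 are accepted, as the script header states.

check_download_args() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: ./assets/Reference/download_reference.sh [GENOME] [DEST_DIR]" >&2
    return 1
  fi
  case "$1" in
    hg19|hg38) ;;   # supported genomes
    *) echo "Unsupported genome: $1 (expected hg19 or hg38)" >&2; return 1 ;;
  esac
  mkdir -p "$2"     # create DEST_DIR if it does not exist yet
}
```

For example, `check_download_args hg38 /your/genome/data/path/hg38 || exit 1` would stop the script early on a typo such as `hg39`.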
7 changes: 3 additions & 4 deletions assets/Reference/build_reference_index.sh
@@ -1,9 +1,8 @@
#!/bin/bash

## the script will build BWA index for combined human and spike-in genomes.
-## "Usage: ./build_reference_index.sh [GENOME] [SPIKEIN_FA] [INDEX_PREFIX] [DEST_DIR]"
-## "Example: ./build_reference_index.sh hg38 ./data/BAC_F19K16_F24B22.fa hg38_BAC_F19K16_F24B22 /your/genome/data/path/hg38"
-## "Example: ./build_reference_index.sh hg19 ./data/BAC_F19K16_F24B22.fa hg19_BAC_F19K16_F24B22 /cluster/projects/tcge/DB/cfmedip-seq-pepeline/hg19"
+## "Usage: ./assets/Reference/build_reference_index.sh [GENOME] [SPIKEIN_FA] [INDEX_PREFIX] [DEST_DIR]"
+## "Example: ./assets/Reference/build_reference_index.sh hg38 ./assets/Spike-in_genomes/BAC_F19K16_F24B22.fa hg38_BAC_F19K16_F24B22 /your/genome/data/path/hg38"

#################
## initilizaiton
@@ -33,7 +32,7 @@ cat ${hg_fa} ${SPIKEIN_FA} > ${DEST_DIR}/${INDEX_PREFIX}.fa
cd ${DEST_DIR}

echo "=== Building bwa index for mereged genomes ..."
-conda activate tcge-cfmedip-seq-pipeline
+conda activate MEDIPIPE

bwa index -a bwtsw ${INDEX_PREFIX}.fa

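After this commit, `build_reference_index.sh` still concatenates the human FASTA and the spike-in FASTA into `${DEST_DIR}/${INDEX_PREFIX}.fa`. It then runs `bwa index -a bwtsw` on the combined file, now inside the `MEDIPIPE` conda environment. Below is a minimal sketch of those two steps as separate functions. It assumes both inputs are uncompressed FASTA files. The function names are hypothetical, and the indexing step returns non-zero with a message when `bwa` is not on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the combine-and-index steps in build_reference_index.sh.
# Assumption: HG_FA and SPIKEIN_FA are uncompressed FASTA files.

combine_genomes() {
  local hg_fa=$1 spikein_fa=$2 dest_dir=$3 index_prefix=$4
  mkdir -p "${dest_dir}" || return 1
  # Same order as the script: human genome first, spike-in sequences appended
  cat "${hg_fa}" "${spikein_fa}" > "${dest_dir}/${index_prefix}.fa"
}

index_genome() {
  local dest_dir=$1 index_prefix=$2
  if ! command -v bwa >/dev/null 2>&1; then
    echo "bwa not found on PATH; skipping index build" >&2
    return 1
  fi
  # -a bwtsw selects the BWT-SW construction algorithm, meant for large genomes such as human
  (cd "${dest_dir}" && bwa index -a bwtsw "${index_prefix}.fa")
}
```

Typical use would be `combine_genomes hg38.fa ./assets/Spike-in_genomes/BAC_F19K16_F24B22.fa /your/genome/data/path/hg38 hg38_BAC_F19K16_F24B22 && index_genome /your/genome/data/path/hg38 hg38_BAC_F19K16_F24B22`. Splitting the steps lets the concatenation be checked without waiting for the index build.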
16 changes: 8 additions & 8 deletions assets/Reference/download_reference.sh
@@ -7,8 +7,8 @@
## "A TSV file [DEST_DIR]/[GENOME].tsv will be generated. Use it for pipeline."
## "Supported genomes: hg19 and hg38"; Arabidopsis TAIR10 genome will be downloaded,
## as well as building bwa index for merged genomes.
-## "Usage: ./download_build_reference.sh [GENOME] [DEST_DIR]"
-## "Example: ./download_build_reference.sh hg38 /your/genome/data/path/hg38"
+## "Usage: ./assets/Reference/download_build_reference.sh [GENOME] [DEST_DIR]"
+## "Example: ./assets/Reference/download_build_reference.sh hg38 /your/genome/data/path/hg38"


#################
@@ -47,7 +47,7 @@ if [[ "${GENOME}" == "hg38" ]]; then
PROM="https://www.encodeproject.org/files/ENCFF140XLU/@@download/ENCFF140XLU.bed.gz"
ENH="https://www.encodeproject.org/files/ENCFF212UAV/@@download/ENCFF212UAV.bed.gz"

-REF_FA_TAIR10="https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas"
+#REF_FA_TAIR10="https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas"

fi

@@ -68,7 +68,7 @@ if [[ "${GENOME}" == "hg19" ]]; then
ENH="https://storage.googleapis.com/encode-pipeline-genome-data/hg19/ataqc/reg2map_honeybadger2_dnase_enh_p2.bed.gz"

## Arabidopsis
-REF_FA_TAIR10="https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas"
+# REF_FA_TAIR10="https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas"

fi

@@ -84,12 +84,12 @@ wget -c -O $(basename ${REF_MITO_FA}) ${REF_MITO_FA}
wget -c -O $(basename ${CHRSZ}) ${CHRSZ}

## TAIR10
-wget -c -O $(basename ${REF_FA_TAIR10}) ${REF_FA_TAIR10}
-sed -i -e 's/^>/>tair10_chr/' TAIR10_chr_all.fas
-gzip TAIR10_chr_all.fas
+#wget -c -O $(basename ${REF_FA_TAIR10}) ${REF_FA_TAIR10}
+#sed -i -e 's/^>/>tair10_chr/' TAIR10_chr_all.fas
+#gzip TAIR10_chr_all.fas

## combine genomes
-cat $(basename ${REF_FA}) TAIR10_chr_all.fas.gz > ${GENOME}_tair10.fa.gz
+# cat $(basename ${REF_FA}) TAIR10_chr_all.fas.gz > ${GENOME}_tair10.fa.gz

## annotated regions
wget -N -c ${BLACKLIST}
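This commit comments out three Arabidopsis TAIR10 steps:

* the TAIR10 download;
* the `sed` step that renames FASTA headers to `>tair10_chr...`;
* the concatenation into `${GENOME}_tair10.fa.gz`.

Below is a minimal sketch of the rename-and-combine logic, in case those steps are re-enabled. It avoids GNU-specific `sed -i`. It relies on the fact that concatenated gzip files form a valid multi-member gzip stream, which `gzip -dc` decompresses as one. Assumptions: the human genome file is already gzip-compressed, and the function and intermediate file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the commented-out TAIR10 steps in download_reference.sh.
# Assumption: GENOME_FA_GZ is gzip-compressed; TAIR_FA is plain FASTA.

prefix_and_combine() {
  local genome_fa_gz=$1 tair_fa=$2 out_gz=$3
  # Prefix every FASTA header (">1 ..." becomes ">tair10_chr1 ..."),
  # presumably so Arabidopsis contig names cannot collide with human ones
  sed 's/^>/>tair10_chr/' "${tair_fa}" | gzip -c > "${tair_fa}.prefixed.gz" || return 1
  # Concatenated gzip members form a valid gzip file; gzip -dc reads them in order
  cat "${genome_fa_gz}" "${tair_fa}.prefixed.gz" > "${out_gz}"
}
```

Writing the prefixed copy to a separate file leaves the downloaded `TAIR10_chr_all.fas` untouched. This makes the step safe to rerun, unlike an in-place `sed -i` followed by `gzip`.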