vk_useful_ngs_oneliners

https://training.bactgen.sanger.ac.uk/#/Advanced_Bioinformatics/


https://wwood.github.io/singlem
https://github.com/wwood/singlem?tab=readme-ov-file

A hybrid assembly and MAG recovery pipeline 
https://github.com/rhysnewell/aviary
########################## Ssuis Analysis ###
https://github.com/boasvdp/Ssuis_genomic_epidemiology/blob/master/workflow/scripts/summary.py
https://github.com/boasvdp/Ssuis_genomic_epidemiology/blob/master/workflow/scripts/abricate_to_phandango.py
#!/usr/bin/env python3

import pandas as pd
import argparse
import sys

parser = argparse.ArgumentParser(description='Convert an ABRicate summary to Phandango format')

parser.add_argument('input', help="Input ABRicate summary file", type=str)
parser.add_argument("-o", "--output", dest="output", help="Output Phandango file", type=str, default=sys.stdout)

args = parser.parse_args()

df = pd.read_csv(args.input, dtype=str, sep = '\t', index_col=0)
df.replace('.', 'absent', inplace=True)
df.replace('[0-9.][0-9.]*', 'present', regex=True, inplace=True)
df['name'] = df.index
df['name'] = df['name'].str.replace(r'\.tsv$', '', regex=True)
df['name'] = df['name'].str.replace('abricate_virulence_out/', '', regex=False)
df.set_index('name', inplace=True)
df.drop('NUM_FOUND', axis=1, inplace=True)
df.to_csv(args.output, sep = ',')
##########################################
https://github.com/KathrynCampbell/MADDOG/blob/main/Nigeria_report.Rmd
https://github.com/agmcfarland/FluPipeline/blob/main/sample_report.Rmd
#########################################
associating tree with data
https://xiayh17.gitee.io/treedata-book/chapter9.html
https://github.com/MMID-coding-workshop?tab=repositories

https://github.com/MMID-coding-workshop/2022-03-02-Data-visualization-using-ggtree/blob/main/2022-03-02-Introduction%20to%20ggTree.pdf
#########################################
https://chiliubio.github.io/microeco_tutorial/
#########################################
ggtree: plotting data next to a tree and arranging the panels
https://github.com/theiagen/reports/blob/master/report_template.Rmd
#########################################
https://nbviewer.org/github/donnemartin/data-science-ipython-notebooks/blob/master/aws/aws.ipynb#s3cmd
#########################################
https://bioinformaticsworkbook.org/list.html#gsc.tab=0
#########################################
Very good nextflow tutorial
https://sateeshperi.github.io/nextflow_varcal/nextflow/
#########################################
https://github.com/staehlo/Demo_workflow_fastq_to_vcf
#########################################
sudo nextflow run metashot/mag-illumina --reads 'D22-004208*_R{1,2}*.fastq.gz' --run_metaplasmidspades --outdir results
#########################################
https://github.com/jhayer/nf-metavir/blob/master/modules/diamond.nf
script:
        """
        #Use of format 102 for producing a taxonomy table output file (it uses LCA)
        #This output will later be made compatible for kraken-report for import into Pavian
        diamond blastx -d ${db_diamond} -q ${contigs} -o ${id}_dx_tax.tab -f 102 -p 8
        # adding the first column U/C to the tab file for kraken-report
        awk -F'\\t' '{if(\$2>0)\$1="C" FS \$1;else \$1="U" FS \$1;}1' OFS='\\t' ${id}_dx_tax.tab > ${id}_dx_tax_UC.tab
        #the diamond output can now be converted into kraken report for pavian
        ./${kraken_report} --db ${kraken1_nt_db} ${id}_dx_tax_UC.tab > ${id}_dx_krak-report.txt
        rm ${id}_dx_tax_UC.tab
        """
###########################################
https://github.com/NCGAS/Microbial-visualization/blob/master/visualization_of_metagenomes.ipynb
###########################################
docker from conda yaml file
https://github.com/lifebit-ai/dry-bench-skills-for-researchers/blob/main/classes/4-intro-to-nextflow/nextflow.md
https://github.com/lifebit-ai/dry-bench-skills-for-researchers/blob/main/classes/3-intro-to-conda-docker/2-build-test-share-reuse-docker.ipynb
https://github.com/ISCB-Academy/Elements-of-Style-Reproducible-Workflow-Creation-Maintenance-Tutorial/blob/main/lessons/build-test-share-dockerfiles-github.md
###########################################
ETE tree plot
https://github.com/emiracherif/ONTdeCIPHER/blob/main/Scripts/plot_tree.py
###########################################
This repository hosts the scripts for analyzing the nucleotide diversity (π) via deep-sequencing data in the study "Vaccine induced selection pressure on seasonal influenza in mice". The viral population diversity measurements including πN and πS were estimated using SNPGenie.
https://github.com/Leo-Poon-Lab/Vaccine-induced-selection-pressure-flu
###########################################
metagenomics good tutorial
https://github.com/GenomicsAotearoa/metagenomics_summer_school
###########################################
nextflow training
https://github.com/GenomicsAotearoa/training-nextflow
###########################################
very good 16S analysis sop
https://github.com/cb-42/Dickson_16S_SOP
##############################################
To get taxonomy ranks information with ETE3 Python3 module
https://github.com/linzhi2013/taxonomy_ranks
#################################
create kraken database and add new genomes
https://github.com/IdoBar/Pathogen_diagnostics_analysis/blob/master/Pathogen_diagnostics_analysis.Rmd
#############################
https://github.com/valery-shap/parse_tormes_result/blob/main/sum_table_from_tormes.py
##############################
slurm_cheatsheet
https://github.com/cambiotraining/hpc-intro/blob/main/99-slurm_cheatsheet.md
##############################
joining files based on common column

https://github.com/npbhavya/Scripts/blob/master/counts_table.py
#############################

https://github.com/vpeddu/bmebootcamp-metagenomics/blob/master/main.nf
#############################
https://github.com/biocorecrg
############################
concat tables

https://github.com/stevekm/nextflow-boilerplate/blob/master/bin/concat-tables.py

DESCRIPTION: This script will concatenate multiple flat text
based tables which have a common 1-line header
bash equivalent:
$ head -1 $(echo $FILES | cut -d ' ' -f1) > test_output.tsv
$ for i in $FILES; do tail -n +2 "$i" >> test_output.tsv; done
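# The bash equivalent above relies on an unquoted $FILES, so it breaks on paths with spaces.
# A minimal sketch of the same idea as a function; concat_tables and the demo tables are
# hypothetical names used only for illustration, not part of concat-tables.py.

```shell
# Hypothetical helper: merge tables that share a 1-line header.
concat_tables() {
    # $1 = output file, remaining arguments = input tables
    out=$1
    shift
    head -n 1 "$1" > "$out"           # keep the header of the first table only
    for f in "$@"; do
        tail -n +2 "$f" >> "$out"     # append each body, skipping its header
    done
}

# Demo on two throwaway tables
tmpdir=$(mktemp -d)
printf 'id\tcount\nA\t1\n' > "$tmpdir/t1.tsv"
printf 'id\tcount\nB\t2\n' > "$tmpdir/t2.tsv"
concat_tables "$tmpdir/merged.tsv" "$tmpdir/t1.tsv" "$tmpdir/t2.tsv"
cat "$tmpdir/merged.tsv"
```

# The sketch assumes every table has the same header; it does not check that.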
###########################
recombination analysis

python3 -m openrdp prrsv_aligned.fasta ./prrsv_aligned.csv -cfg ~/softwares/OpenRDP/openrdp/tests/test_cfg.ini -all

simplot
http://babarlelephant.free-hoster.net/simplotfirstN.html
###########################


https://github.com/marbl/canu/issues/1674

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"
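
# A small worked example of the two expansions above, plus their "longest match" variants.
# The value of a is made up for illustration.

```shell
a="run1_sampleA_R1"     # made-up name for illustration
tmp=${a#*_}             # shortest prefix ending in "_" removed     -> sampleA_R1
b=${tmp%_*}             # shortest suffix starting with "_" removed -> sampleA
first=${a%%_*}          # longest suffix starting with "_" removed  -> run1
last=${a##*_}           # longest prefix ending in "_" removed      -> R1
printf '%s\n' "$b" "$first" "$last"
```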

## Metaphlan4 (https://huttenhower.sph.harvard.edu/metaphlan)
conda activate metaphalan4.1.1
for i in *R1_001.fastq.gz;do metaphlan $i,${i/R1/R2} --add_viruses --bowtie2out ${i/_S*/_metagenome.bowtie2.bz2} --nproc 50 --input_type fastq -o ${i/_S*/_profiled_metagenome.txt};done
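
# The loop above builds the R2 name with ${i/R1/R2}, which replaces the first "R1" found
# anywhere in the file name, so a sample ID that itself contains "R1" would be mangled.
# A dry-run sketch that only strips/swaps the trailing read tag; the file name is made up
# and the _S[0-9]* pattern is an assumption about Illumina-style sample numbering.

```shell
r1="R1plate_S3_R1_001.fastq.gz"              # made-up name with "R1" inside the sample ID
r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"  # swap only the trailing read tag
sample="${r1%%_S[0-9]*}"                     # drop everything from the sample number on

# Print the command instead of running it, to check the derived names first.
echo "metaphlan $r1,$r2 -o ${sample}_profiled_metagenome.txt"
```

# With ${r1/R1/R2} the same input would give R2plate_S3_R1_001.fastq.gz, which is not the mate file.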


## Metaphlan3 (https://huttenhower.sph.harvard.edu/metaphlan)
conda activate metaphlan3-env
for i in *R1_001.fastq.gz;do metaphlan $i,${i/R1/R2} --add_viruses --bowtie2out ${i/_S*/_metagenome.bowtie2.bz2} --nproc 50 --input_type fastq -o ${i/_S*/_profiled_metagenome.txt};done
merge_metaphlan_tables.py *.txt > merged_abundance_table.txt
grep -E "s__|clade" merged_abundance_table.txt | sed 's/^.*s__//g'| cut -f1,3- | sed -e 's/clade_name/Species/g' > merged_abundance_table_species.txt

# Taken from http://bioinformatics.cvr.ac.uk/blog/short-command-lines-for-manipulation-fastq-and-fasta-sequence-files/

# bcl2fastq conversion of nextseq (https://anaconda.org/bioconda/bcl2fastq-nextseq (https://github.com/brwnj/bcl2fastq))

conda install -c bioconda bcl2fastq-nextseq
bcl_to_fastq --runfolder ./190827_NB551648_0010_AH2C2JBGXB

# make an expression matrix from multiple htseq-count files

awk '{arr[$1]=arr[$1]"\t"$2}END{for(i in arr)print i arr[i]}' *count.txt >> merged_htseq_counts.tsv
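
# The merged matrix above has no header and its row order is arbitrary (awk's "for in" order).
# A sketch that adds a header built from the file names and sorts the rows; the *_count.txt
# naming and the two demo files are assumptions for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'geneA\t5\ngeneB\t0\n' > "$tmpdir/s1_count.txt"
printf 'geneA\t7\ngeneB\t3\n' > "$tmpdir/s2_count.txt"

{
    # Header: "gene" plus one column per input file, in glob order.
    printf 'gene'
    for f in "$tmpdir"/*_count.txt; do
        printf '\t%s' "$(basename "$f" _count.txt)"
    done
    printf '\n'
    # Same accumulation idea as the one-liner; rows sorted for a stable order.
    awk '{ arr[$1] = arr[$1] "\t" $2 } END { for (g in arr) print g arr[g] }' \
        "$tmpdir"/*_count.txt | sort
} > "$tmpdir/merged_htseq_counts.tsv"
cat "$tmpdir/merged_htseq_counts.tsv"
```

# Columns line up only if every file lists the same genes; a gene missing from one file shifts its row.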

#replace by matching ids file in fasta file
awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' Seg1_PB2_T_Lineage.ids Seg1_PB2_T_Lineage.fasta

    Explanation:

      FNR==NR { ... }   # FNR is the record number within the current file, NR is the
                        # total record number across all files, so FNR==NR means:
                        # "while we process the first file listed"
                        # (here, Seg1_PB2_T_Lineage.ids)
      array[$1]=$2      # add column 1 to an array with a value of column 2
      next              # go onto the next record

      {                 # this could be written as: FNR!=NR
                        # so this means "while we process the second file listed..."
      for (i in array)  # means "for every element/key in the array..."
      gsub(i, array[i]) # perform a global substitution on each line, replacing the key
                        # with its value if found
      }1                # this is shorthand for 'print'
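
# Because gsub() treats each key as a regular expression and runs on every line, a key such
# as seq1 also matches inside seq10, and sequence lines are scanned too. A sketch of a
# stricter variant that renames header lines only, on an exact ID match. File names and IDs
# are made up; note that this variant replaces the whole header, dropping any description.

```shell
tmpdir=$(mktemp -d)
printf 'seq1\tH1N1_A\n' > "$tmpdir/map.ids"
printf '>seq1\nACGT\n>seq10\nGGCC\n' > "$tmpdir/in.fasta"

awk 'FNR == NR { map[$1] = $2; next }     # first file: old ID -> new ID
     /^>/ {                                # header lines of the second file
         id = substr($1, 2)
         if (id in map) $0 = ">" map[id]   # exact ID match only
     }
     { print }' "$tmpdir/map.ids" "$tmpdir/in.fasta" > "$tmpdir/renamed.fasta"
cat "$tmpdir/renamed.fasta"
```

# Like the original, this assumes the ids file is non-empty; with an empty first file FNR==NR
# would also be true for the FASTA.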
      
      
# replace by matching ids file in fasta file (V2)
 awk 'FNR==NR{  a[">"$1]=$2;next}$1 in a{  sub(/>/,">"a[$1]"|",$1)}1' D20-012995-2.tanoti.blastn2.ids.2.3 D20-012995-2.tanoti.sorted.fasta | cut -d"|" -f1 > output.fasta


# lower case fasta to UPPERCASE fasta

awk 'BEGIN{FS=" "}{if(!/>/){print toupper($0)}else{print $1}}' in.fasta > out.fasta

# fasta to tab

for i in *fasta; do perl -e ' $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print "\n" } s/ |$/\t/; $count++; $_ .= "\t"; } else { s/ //g; $len += length($_) } print $_; } print "\n"; warn "\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n"; ' $i > `basename ${i/fasta/fasta.tab}`;done 

# Trinity assembly command

for i in *R1_paired_trimmed.fastq.gz;do Trinity --seqType fq --left $i,${i/paired/single} --right ${i/_R1/_R2},${i/R1_paired/R2_single} --max_memory 10G --CPU 50 --output ${i/_R1*/.trinity_out};done

# blastn command

for i in *trinity_out;do blastn -query $i/inchworm.K25.L25.DS.fa -db ~/softwares/ncbi_database/reovirus/reovirus.fasta -out `basename ${i/out/blastn2}` -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 5 -num_threads 50;done

# blastx command with Diamond
for i in *_out; do diamond blastx -d ~/softwares/ncbi_database/virus_proteome/viral_refseq.dmnd -q $i/Trinity.fasta -f 6 qseqid qlen qcovhsp stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore --sensitive --no-auto-append --top 10 --out ${i/out/blastx};done

# Convert a multi-line fasta to a singleline fasta

awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' sample1.fa > sample1_singleline.fa 

# To convert a fastq file to fasta in a single line using sed

sed '/^@/!d;s//>/;N' sample1.fq > sample1.fa
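
# The sed version keys on lines starting with "@", so a quality line that begins with "@" can
# be mistaken for a header. A sketch that goes by line position instead, assuming strict
# 4-line FASTQ records (no wrapped sequences). fastq_to_fasta is a hypothetical helper name.

```shell
fastq_to_fasta() {
    # Line 1 of each record is the header, line 2 the sequence.
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

tmpdir=$(mktemp -d)
# Second line 4 deliberately starts with "@" to mimic a tricky quality string.
printf '@r1\nACGT\n+\n@III\n@r2\nGGCC\n+\nIIII\n' > "$tmpdir/demo.fq"
fastq_to_fasta "$tmpdir/demo.fq"
```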
 
# Dirty way to count the number of sequences in a fastq

grep -c '^@' sample1.fq

#It’s dirty because sometimes the quality information line may also start with “@” so the number of sequences could be overestimated.
# A more precise way is to count the lines and divide by four:

cat sample1.fq | echo $((`wc -l`/4))
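
# The same line-count idea as a small function that also accepts gzipped input. A sketch
# only: count_fastq_reads is a hypothetical helper and it assumes 4-line records.

```shell
count_fastq_reads() {
    # Decompress on the fly when the name ends in .gz, then divide the line count by 4.
    case $1 in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}

tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\n@III\n@r2\nGGCC\n+\nIIII\n' > "$tmpdir/demo.fq"
gzip -c "$tmpdir/demo.fq" > "$tmpdir/demo.fq.gz"
count_fastq_reads "$tmpdir/demo.fq"
count_fastq_reads "$tmpdir/demo.fq.gz"
```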

# One liner to remove the description information from a fasta file and just keep the identifier

perl -p -i -e 's/>(.+?) .+/>$1/g' sample1.fa
 
# Get all the identifier names from a fasta file

perl -ne 'if(/^>(\S+)/){print "$1\n"}' sample1.fa
 
# Extract sequences by their ID from a fasta file
# For example, you want to get the sequences with id1 and id2 as identifiers

perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw(id1 id2)}print if $c' sample1.fa

# If you have a long list of identifiers in a file called ids.txt, then the following should do the trick:

perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.txt sample1.fa
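
# An awk sketch of the same ID-list extraction, which may be easier to read than the perl
# one-liner. It matches the first word of each header exactly and keeps multi-line records.
# The file names and IDs below are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'id2\n' > "$tmpdir/ids.txt"
printf '>id1 first\nAA\n>id2 second\nCC\nGG\n>id20\nTT\n' > "$tmpdir/in.fa"

awk 'FNR == NR { want[$1] = 1; next }            # first file: IDs to keep
     /^>/ { keep = (substr($1, 2) in want) }     # decide once per record
     keep' "$tmpdir/ids.txt" "$tmpdir/in.fa" > "$tmpdir/subset.fa"
cat "$tmpdir/subset.fa"
```

# id20 is not pulled in by id2 because the lookup is an exact key match, not a pattern match.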
 
# Convert from a two column text tab-delimited file (ID and sequence) to a fasta file

awk -vOFS='' '{print ">",$1,"\n",$2,"\n";}' two_column_sample_tab.txt > sample1.fa
 
# Get the length of a fasta sequence (the sequence must be on a single line)

cat sample1_singleline.fa | awk 'NR%2==0' | awk '{print length($1)}'
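
# A sketch that lifts the single-line requirement by summing the line lengths per record and
# printing "ID<TAB>length". fasta_lengths is a hypothetical helper name; the demo file is made up.

```shell
fasta_lengths() {
    awk '/^>/ { if (id != "") print id "\t" len     # flush the previous record
                id = substr($1, 2); len = 0; next }
         { len += length($0) }                      # accumulate wrapped sequence lines
         END { if (id != "") print id "\t" len }' "$1"
}

tmpdir=$(mktemp -d)
printf '>a desc\nACGT\nAC\n>b\nGG\n' > "$tmpdir/demo.fa"
fasta_lengths "$tmpdir/demo.fa"
```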

## MinION Kraken2 analysis

for i in fastq_pass/*fastq;do kraken2 --db ~/softwares/minikrake2_db/minikraken2_v2_8GB_201904_UPDATE --threads 50 $i --report `basename ${i/.fastq/.report.txt}` --output `basename ${i/.fastq/.output}`;done

##Ssuis serotyping
srst2 --input_pe *fastq.gz --forward _R1_001 --reverse _R2_001 --output SsuisSerotype --log --mlst_db ../Ssuis_Serotyping.fasta --mlst_definitions ../Ssuis_Serotyping_Definitions.txt

##Ssuis serotyping 2
./Ssuis_serotypingPipeline.pl --fastq_directory /home/vsingh/vdl/Mor_Project_147/serotyping_github/SsuisSerotyping_pipeline/SsuisSerotyping_pipeline/data --forward _R1_001 --reverse _R2_001 --ends pe


for i in ../03-NanoFilt/*fastq;do flye --nano-raw $i --genome-size 5m --meta --threads 50 --min-overlap 1000 --out-dir `basename ${i/_nanofilt*}`;done


grep -i "\[.*virus" ../D20.014264.1repeat.blastx | awk -F$'\t' '!seen[$1]++' | cut -d "[" -f2 | sed "s/\].*//g" | sort | uniq -c | sed "s/^ \+//g" |sed -e "s/ /$(printf '\t')/" | sort -k1,1nr | less


## Reference-based assembly
tanoti -P 50 -p 1 -r KHV.fasta -i D20-007170-KHV_VRIS_S4_R1_001.fastq D20-007170-KHV_VRIS_S4_R2_001.fastq -o tanoti.sam
python2 ~/vdl/Mor_Project_151/rota/D20.008687.1/sam2consensus/sam2consensus.py -i tanoti.sam -o tanoti.fasta


nextflow run peterk87/nf-illmap --reads "./*_R{1,2}_001.fastq.gz" --outdir results2 --refs "ref.seq.fasta" -profile singularity


##Lazypipe virome
perl vk_pipeline_trinity.pl -1 /home/vsingh/softwares/lazypipe/data/D20-012995_S10_R1_001.fastq.gz -2 /home/vsingh/softwares/lazypipe/data/D20-012995_S10_R2_001.fastq.gz --hostgen /home/vsingh/softwares/lazypipe/genomes_host/pig_genomic.fna.gz --res /home/vsingh/softwares/lazypipe --label D20-012995-3 --numth 50 --inlen 300 --ass trinity --gen mga --ann sans --pipe 1:7,9:11

for i in /home/vsingh/vdl/Mor_Project_158/prrsv/*R1_001.fastq.gz;do perl vk_pipeline_trinity.pl -1 $i -2 ${i/R1/R2} --hostgen /home/vsingh/softwares/lazypipe/genomes_host/pig_genomic.fna.gz --res /home/vsingh/vdl/Mor_Project_158/prrsv --label `basename ${i/_R1*}` --numth 50 --inlen 300 --ass trinity --gen mgm --ann blastp --pipe 1:3,5:7,9:11;done

for i in /home/vsingh/vdl/Mor_Project_196/VSSW/*R1_001.fastq.gz;do perl vk_pipeline_trinity4.pl -1 $i -2 ${i/R1/R2} --hostgen /home/vsingh/softwares/lazypipe/genomes_host/pig_genomic.fna.gz --res /home/vsingh/vdl/Mor_Project_196/VSSW --label `basename ${i/_R1*}` --numth 50 --inlen 300 --ass trinity --gen mga --ann blastp --pipe 1:3,5:7,9:11;done

## multiple fasta to individual fasta files
cat cat_sigma-C_aligned_nogaps.fasta | awk '{if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fa")}print $0 > filename}'
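
# The one-liner above uses the whole header as the file name, so headers with spaces, "|" or
# "/" give awkward or invalid names, and many records can exhaust open file handles. A sketch
# that names files after the first word of the header, replaces unusual characters with "_"
# and closes each file when done. split_fasta and the demo data are hypothetical.

```shell
split_fasta() {
    # $1 = multi-FASTA, $2 = output directory
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ {
            if (out != "") close(out)              # release the previous file handle
            name = substr($1, 2)
            gsub(/[^A-Za-z0-9._-]/, "_", name)     # keep file names shell-friendly
            out = dir "/" name ".fa"
        }
        out != "" { print > out }' "$1"
}

tmpdir=$(mktemp -d)
printf '>seg1|sigma-C desc\nACGT\n>seg2\nGG\n' > "$tmpdir/multi.fa"
split_fasta "$tmpdir/multi.fa" "$tmpdir/out"
ls "$tmpdir/out"
```

# Two records whose sanitized IDs collide would overwrite each other; the sketch does not guard against that.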

# metaphlan2
for i in *R1_001.fastq.gz;do metaphlan2.py $i,${i/R1/R2} -o ${i/_R1*/.txt} --input_type multifastq --nproc 50 --bowtie2out ${i/_R1*/.bz2};done

## Extract text between words (e.g. w1,w2)
grep -o -P '(?<=w1).*(?=w2)'

## Arranging a fasta file in a given order
samtools faidx sequences.fasta
samtools faidx sequences.fasta $(cat order.txt) > sequences.reordered.fasta
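
# Where samtools is not available, the same reordering can be sketched in awk. This holds the
# whole FASTA in memory, so it is only meant for small files; reorder_fasta is a hypothetical
# helper name, and IDs are taken as the first word of each header.

```shell
reorder_fasta() {
    # $1 = order file (one ID per line), $2 = FASTA
    awk 'FNR == NR { order[++n] = $1; next }            # first file: wanted order
         /^>/ { id = substr($1, 2); rec[id] = $0; next } # start a new record
         { rec[id] = rec[id] "\n" $0 }                   # append sequence lines
         END { for (i = 1; i <= n; i++)
                   if (order[i] in rec) print rec[order[i]] }' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf 'b\na\n' > "$tmpdir/order.txt"
printf '>a\nAA\n>b\nBB\nBB\n' > "$tmpdir/sequences.fasta"
reorder_fasta "$tmpdir/order.txt" "$tmpdir/sequences.fasta"
```

# IDs listed in the order file but absent from the FASTA are skipped silently.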

##prokka annotation
prokka --kingdom viruses --genus Cyprinivirus --species "Cyprinid herpesvirus 3" --strain MN-2020 --prefix D20-007170-KHV --outdir D20-007170-KHV_annotation2 --locustag D20-007170-KHV D20-007170-KHV_VRIS_S4.Pilon.fasta

##nf-core/ampliseq pipeline
nextflow run -r 1.1.2 nf-core/ampliseq -profile docker --reads "/Users/vikashsingh/vdl/data" --FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer GGACTACNVGGGTWTCTAAT --max_memory '60.GB' --max_cpus 12

##Reovirus
sort -k2,2nr -k3,3nr D20-019670-17_VRIS_S3.blastx | awk -F"\t" '!seen[$1]++' | sort -k4,4 -k2,2nr | grep -i orthoreo | awk -F"\t" '!seen[$4]++' | sort -k2,2nr |awk '{if($2 > 900){print $0}}'| less

##guppy singularity

cd ~/softwares

singularity pull shub://photocyte/guppy_gpu_singularity

singularity exec --nv guppy_gpu_singularity_latest.sif guppy_basecaller --help

                      OR
https://www.chpc.utah.edu/documentation/software/singularity.php


nextflow run angelovangel/nextflow-kraken2 --fqpattern "*_R{1,2}_001.fastq.gz" --kraken_db ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken_8GB_202003.tgz --outdir test2

##Virontus viral Oxford Nanopore sequence analysis pipeline
nextflow run peterk87/nf-virontus -profile singularity --reads "fastq_pass/all.fastq" --ref_fasta ref.fasta --low_coverage 1

## 16S pipeline
Kraken-based
sudo nextflow run angelovangel/nextflow-kraken2 --readsdir . --fqpattern "*R{1,2}*.fastq.gz" --taxlevel G --kraken_db s3://genome-idx/kraken/16S_Silva138_20200326.tgz

git clone https://github.com/h3abionet/16S-rDNA-dada2-pipeline

cd 16S-rDNA-dada2-pipeline

nextflow run main.nf -profile standard --reads="/home/vsingh/vdl/Mor_Project_163/D20-021867/16S-rDNA-dada2-pipeline/fastq/*_R{1,2}_001.fastq.gz" --trimFor 24 --trimRev 25 --reference="/home/vsingh/vdl/Mor_Project_163/D20-021867/16S-rDNA-dada2-pipeline/silva_nr_v132_train_set.fa.gz" --species="/home/vsingh/vdl/Mor_Project_163/D20-021867/16S-rDNA-dada2-pipeline/silva_species_assignment_v132.fa.gz" --outdir="/home/vsingh/vdl/Mor_Project_163/D20-021867/16S-rDNA-dada2-pipeline/out"


Ampliseq nf-core
sudo nextflow run nf-core/ampliseq -r 2.1.1 -profile docker --input "data" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --dada_tax_agglom_max 8 --qiime_tax_agglom_max 7


sudo nextflow run nf-core/ampliseq -r 2.11.0 -profile docker --input_folder './species/' --outdir nfcore_ampliseq_GTDB --min_read_counts 1000 --ignore_empty_input_files --ignore_failed_trimming --ignore_failed_filtering --skip_cutadapt --trunclenf 200 --trunclenr 150 --vsearch_cluster --filter_ssu "bac" --exclude_taxa "mitochondria,chloroplast,archaea" --metadata_category_barplot "condition" --tax_agglom_max 7 --picrust --ancombc --dada_ref_taxonomy gtdb=R09-RS220 --dada_taxonomy_rc


nextflow run nf-core/ampliseq -r 2.11.0 -profile singularity \
--input samplesheet.tsv \
--metadata metadata.tsv \
--outdir nfcore_ampliseq_GTDB \
--min_read_counts 1000 \
--ignore_empty_input_files \
--ignore_failed_trimming \
--ignore_failed_filtering \
--skip_cutadapt \
--trunclenf 200 \
--trunclenr 150 \
--vsearch_cluster \
--filter_ssu "bac" \
--exclude_taxa "mitochondria,chloroplast,archaea" \
--metadata_category_barplot "condition" \
--tax_agglom_max 7 \
--picrust \
--ancombc \
--dada_ref_taxonomy gtdb=R09-RS220 \
--dada_taxonomy_rc


## Rota
conda deactivate
PERL5LIB="";
conda activate trinity-env

for i in *R1_001.fastq.gz;do Trinity --seqType fq --left $i --right ${i/R1/R2} --max_memory 200G --CPU 50 --output ${i/R1*/trinity_out};done

for i in Reoviridae.fa;do blastn -query $i -db /home/vsingh/softwares/ncbi_database/virus_genomes/rota.complete.fasta -out `basename ${i/fa/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 100 -num_threads 50;done

#RVA
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.A" | sort  -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-A};done;done
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.A" | sort  -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-A2};done;done
for i in *rota-A2;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do sed -i "s/$j.*/$j\t$j/g" $i;done;done
for i in *rota-A2;do awk -F"\t" '{print $1"\t"$1"|"$5}' $i > ${i/rota-A2/rota-A-replace.ids};done
for i in *rota-A;do vk_correct_orient_fasta_blastn.sh $i Reoviridae.fa;done
for i in *rota-A.corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/.corr.fasta/-replace.ids} $i > ${i}2;done
for i in *rota-A.corr.fasta2;do mv $i ${i/.corr.fasta2/.fasta};done
for i in *rota-A.fasta;do sed -i "s/>/>${i/-rota-A*}|/g" $i;done

#RVB
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.B" | sort -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-B};done;done
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.B" | sort -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-B2};done;done
for i in *rota-B2;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do sed -i "s/$j.*/$j\t$j/g" $i;done;done
for i in *rota-B2;do awk -F"\t" '{print $1"\t"$1"|"$5}' $i > ${i/rota-B2/rota-B-replace.ids};done
for i in *rota-B;do vk_correct_orient_fasta_blastn.sh $i Reoviridae.fa;done
for i in *rota-B.corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/.corr.fasta/-replace.ids} $i > ${i}2;done
for i in *rota-B.corr.fasta2;do mv $i ${i/.corr.fasta2/.fasta};done
for i in *rota-B.fasta;do sed -i "s/>/>${i/-rota-B*}|/g" $i;done

#RVC
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.C" | sort -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-C};done;done
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.C" | sort -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-C2};done;done
for i in *rota-C2;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do sed -i "s/$j.*/$j\t$j/g" $i;done;done
for i in *rota-C2;do awk -F"\t" '{print $1"\t"$1"|"$5}' $i > ${i/rota-C2/rota-C-replace.ids};done
for i in *rota-C;do vk_correct_orient_fasta_blastn.sh $i Reoviridae.fa;done
for i in *rota-C.corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/.corr.fasta/-replace.ids} $i > ${i}2;done
for i in *rota-C.corr.fasta2;do mv $i ${i/.corr.fasta2/.fasta};done
for i in *rota-C.fasta;do sed -i "s/>/>${i/-rota-C*}|/g" $i;done


#RVH
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.H" | sort -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-H};done;done
for i in *blastn;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do grep -i "$j" $i | grep "virus.H" | sort -k15,15nr -k2,2nr| grep -vi partial|awk -F"\t" '{if($3 >= 50){print $0}}' |head -1 >> ${i/blastn/rota-H2};done;done
for i in *rota-H2;do for j in VP1 VP2 VP3 VP4 VP6 VP7 NSP1 NSP2 NSP3 NSP4 NSP5; do sed -i "s/$j.*/$j\t$j/g" $i;done;done
for i in *rota-H2;do awk -F"\t" '{print $1"\t"$1"|"$5}' $i > ${i/rota-H2/rota-H-replace.ids};done
for i in *rota-H;do vk_correct_orient_fasta_blastn.sh $i Reoviridae.fa;done
for i in *rota-H.corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/.corr.fasta/-replace.ids} $i > ${i}2;done
for i in *rota-H.corr.fasta2;do mv $i ${i/.corr.fasta2/.fasta};done
for i in *rota-H.fasta;do sed -i "s/>/>${i/-rota-H*}|/g" $i;done


# TRV denovo
for i in *R1_001.fastq.gz;do /home/vsingh/softwares/lazypipe/lazypipe/trinityrnaseq-v2.10.0/Trinity --seqType fq --left $i --right ${i/R1/R2} --max_memory 200G --CPU 50 --no_run_chrysalis --output ${i/R1*/trinity_out};done
for i in *_out; do diamond blastx -d ~/softwares/ncbi_database/virus_proteome/viral_refseq.dmnd -q $i/inchworm.DS.fa -f 6 qseqid qlen qcovhsp stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore --sensitive --no-auto-append --top 10 --out ${i/out/blastx};done
for i in *blastx;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do grep -i "$j" $i | awk -F"\t" '{if($15 >= 20){print $0}}' | sort  -k2,2nr -k15,15nr | head -1 >> ${i}2;done;done
for i in *blastx2;do sed "s/\t/ /g" $i | cut -d" " -f1,5 | awk -F" " '{print $1"\t"$1"|"$2}' > ${i/blastx2/rep.ids};done
for i in *blastx2;do vk_correct_orient_fasta_blastx.sh $i ${i/blastx2/out}/inchworm.DS.fa;done
for i in *corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/blastx2*/rep.ids} $i > ${i/_trinity*/.fasta};done
rm *corr.fasta
for i in *fasta;do sed -i "s/>/>${i/_S*/\|}/g" $i;done
for i in *fasta;do sed -i "s/(.*)//g" $i;done


##TRV reference based
#### MEGAHIT ###
cp ~/softwares/ncbi_database/reovirus/reovirus.fasta* .
for i in *R1_001.fastq.gz;do megahit -1 $i -2 ${i/R1/R2} -o ${i/R1*/megahit_out};done
for i in *out;do blastn -task blastn -query $i/final.contigs.fa -db reovirus.fasta -out `basename ${i/out/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 50 -num_threads 50;done
for i in *blastn;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do grep -i "$j" $i | grep -v KR9979 | grep -v MG869811 |awk -F"\t" '{if($15 >= 20){print $0}}' | sort  -k2,2nr -k15,15nr | grep -v -i partial | head -1 >> ${i}2;done;done
for i in *blastn2;do cut -f4 $i > ${i/blastn2/ref.ids};done
for i in *ref.ids;do for j in `cat $i | cut -d" " -f1`;do grep -A1 ${j} reovirus.fasta | sed "/^--/d" | sed "s/ //g" >> ${i/ids/fasta};done;done
pigz -d *fastq.gz
for i in *R1_001.fastq;do tanoti -P 50 -p 1 -r ${i/R1*/megahit_ref.fasta} -i $i ${i/_R1/_R2} -o ${i/R1*/tanoti.sam};done
for i in *sam;do sam2consensus.py -i $i -o ${i/sam/fasta};done
for i in *tanoti.fasta;do cat $i/* > $i/${i/tanoti/merged};done
for i in *tanoti.fasta;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do grep -A1 -i "$j" $i/${i/tanoti/merged} >> ${i/tanoti/refASM};done;done
for i in *refASM.fasta;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do sed -i "s/${j}.*/$j|$j/g" $i;done;done
pigz *fastq


###### TRINITY #####

cp ~/softwares/ncbi_database/reovirus/reovirus.fasta* .
for i in *R1_001.fastq.gz;do /home/vsingh/softwares/lazypipe/lazypipe/trinityrnaseq-v2.10.0/Trinity --seqType fq --left $i --right ${i/R1/R2} --max_memory 200G --CPU 50 --no_run_chrysalis --output ${i/R1*/trinity_out};done
for i in *out;do blastn -task blastn -query $i/inch*fa -db reovirus.fasta -out `basename ${i/out/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 50 -num_threads 50;done
for i in *blastn;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do grep -i "$j" $i | grep -v KR9979 | grep -v MG869811 |awk -F"\t" '{if($15 >= 20){print $0}}' | sort  -k2,2nr -k15,15nr | grep -v -i partial | head -1 >> ${i}2;done;done
for i in *blastn2;do cut -f4 $i > ${i/blastn2/ref.ids};done
for i in *ref.ids;do for j in `cat $i | cut -d" " -f1`;do grep -A1 ${j} reovirus.fasta | sed "/^--/d" | sed "s/ //g" >> ${i/ids/fasta};done;done
pigz -d *fastq.gz
for i in *R1_001.fastq;do tanoti -P 50 -p 1 -r ${i/R1*/trinity_ref.fasta} -i $i ${i/_R1/_R2} -o ${i/R1*/tanoti.sam};done
for i in *sam;do sam2consensus.py -i $i -o ${i/sam/fasta};done
for i in *tanoti.fasta;do cat $i/* > $i/${i/tanoti/merged};done
for i in *tanoti.fasta;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do grep -A1 -i "$j" $i/${i/tanoti/merged} >> ${i/tanoti/refASM};done;done
for i in *refASM.fasta;do for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do sed -i "s/${j}.*/$j|$j/g" $i;done;done
pigz *fastq


########## TREE MAKING REOVIRUS ###########
for j in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS; do grep -A1 -i "$j" tt | sed "/^--/d" | rev | cut -d"|" -f2- | rev > ${j}.fasta;done

conda activate assembly

for i in lambda-A lambda-B lambda-C mu-A mu-B mu-NS sigma-A sigma-B sigma-C sigma-NS;do mafft --globalpair --thread 50 --maxiterate 1000 ${i}.fasta > ${i}_aligned.fasta;done
for i in *aligned.fasta;do trimal -in $i -out ${i/.fasta/_nogaps.fasta} -gt 0.6;done
mkdir cocat_tree
mv *nogaps.fasta cocat_tree
cd cocat_tree
vk_concat_alignments.py . > concat.fasta
# for i in *nogap*;do sed -i "/^>/ s/$/\|${i/_ali*}/g" $i;done
# for i in concat.fasta;do sed -i "/^>/ s/$/\|concat/g" $i;done
# for i in *.fasta;do ~/softwares/raxml-ng/build/raxml-ng --all --msa $i --model GTR+G --tree pars{10} --bs-trees 100 --threads 50;done
for i in *fasta;do FastTree -nt -gtr $i > ${i/fasta/tree};done
for i in *.fasta;do mafft --globalpair  --reorder --thread 50 --maxiterate 1000 ${i} > ${i}2;done
for i in *fasta2;do panito $i > ${i/fasta2/csv};done

ls -ltr


for i in *support;do java -jar ~/softwares/FigTree_v1.4.4/lib/figtree.jar -graphic PDF -width 900 -height 1300 $i $i.pdf;done


##flu
for i in *out;do blastn -query $i/scaffolds.fasta -db ~/softwares/ncbi_database/flu_database/flu.fasta -out `basename ${i/out/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 50 -num_threads 50;done
for i in *blastn;do for j in `seq 1 1 8`; do grep -i "segment $j" $i | awk -F"\t" '$3 >= 50 {print $0}' |head -1 >> ${i/blastn/blastn2};done;done
for i in *blastn2;do cut -f1,4 $i | grep -o -P '(?<=\)\)).*(?=\()'| cut -d" " -f1,2,3 | sed "s/^ //g" | sed "s/ /-/g" | paste $i - | awk -F"\t" '{print $1"\t"$1"|"$16}' > ${i/blastn2/rep.ids};done
for i in *blastn2;do vk_correct_orient_fasta_blastn.sh $i ${i/blastn2/out}/scaffolds.fasta;done
for i in *corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/blastn2*/rep.ids} $i > ${i/ASM*/fasta};done
rm *corr.fasta
for i in *fasta;do sed -i "s/>/>${i/_S*/\|}/g" $i;done
for i in *R1_001.fastq;do tanoti -P 50 -p 1 -r ${i/R1*/trinity_ref.fasta} -i $i ${i/_R1/_R2} -o ${i/R1*/tanoti.sam};done
for i in *sam;do sam2consensus.py -i $i -o ${i/sam/fasta};done
for i in *tanoti.fasta;do cat $i/* > $i/${i/tanoti/merged};done

##flu trinity denovo assembly
conda deactivate
PERL5LIB="";
conda activate trinity-env
for i in *R1_001.fastq.gz;do Trinity --seqType fq --left $i --right ${i/R1/R2} --max_memory 200G --CPU 50 --output ${i/R1*/trinity_out};done
for i in *trinity_out;do blastn -task blastn -query $i/Trinity.fasta -db /home/vsingh/softwares/ncbi_database/flu_database/flu.fasta -out `basename ${i/out/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 50 -num_threads 50;done
for i in *blastn;do for j in `seq 1 1 8`;do grep -i "segment $j" $i | awk -F"\t" '$3 >= 50 {print $0}' | sed "s/ /./g" | sort  -k2,2nr -k15,15nr | grep -v -i partial | head -1 >> ${i}2;done;done
for i in *blastn2;do cut -f1,4 $i | sed "s/\t.*segment/\tsegment/g" | sed "s/,/ /g" | sed "s/\./ /g" | sed "s/segment /segment-/g" | cut -d" " -f1 | awk -F"\t" '{print $1"\t"$1"|"$2}' > ${i/blastn2/rep.ids};done
for i in *blastn2;do vk_correct_orient_fasta_blastn.sh $i ${i/blastn2/out}/Trinity.fasta;done
for i in *corr.fasta;do awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' ${i/blastn2*/rep.ids} $i > ${i/_trinity*/.fasta};done
pigz -d *fastq.gz
for i in *VRIS_S*.fasta;do tanoti -P 50 -p 1 -r $i -i ${i/VRIS*/VRIS*R1*fastq} ${i/VRIS*/VRIS*R2*fastq} -o ${i/VRIS*/VRIS.sam};done
for i in *sam;do sam2consensus.py -i $i -o ${i/.sam/.fasta};done
for i in *VRIS.fasta;do for j in `seq 1 1 8`; do grep -i "segment-$j" $i/*merged.fasta >> ${i/VRIS/refASM};done;done
for i in *VRIS.fasta;do for j in `seq 1 1 8`; do grep -A1 -i "segment-$j" $i/*merged.fasta >> ${i/VRIS/refASM};done;done
pigz *fastq

##Flu reference based
for i in *R1_001.fastq.gz;do /home/vsingh/softwares/lazypipe/lazypipe/trinityrnaseq-v2.10.0/Trinity --seqType fq --left $i --right ${i/R1/R2} --max_memory 200G --CPU 50 --no_run_chrysalis --output ${i/R1*/trinity_out};done
for i in *out;do blastn -query $i/inch*fa -db ~/softwares/ncbi_database/flu_database/flu.fasta -out `basename ${i/out/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 50 -num_threads 50;done
for i in *blastn;do for j in `seq 1 1 8`;do grep -i "segment $j" $i | awk -F"\t" '$3 >= 50 {print $0}' | sed "s/ /./g" | sort  -k2,2nr -k15,15nr | grep -v -i partial | head -1 >> ${i}2;done;done
for i in *blastn2;do cut -f4 $i | cut -d"." -f1 > ${i/blastn2/ref.ids};done
#for i in *ref.ids;do epost -input $i -db nucleotide | efetch -format fasta > ${i/ids/fasta};done
for i in *ref.ids;do for j in `cat $i | cut -d" " -f1`;do grep -A1 ${j} /home/vsingh/softwares/ncbi_database/flu_database/flu.fasta | sed "/^--/d" | sed "s/ //g" >> ${i/ids/fasta};done;done
for i in *ref.fasta;do for j in `seq 1 1 8`; do sed -i "s/segment\ ${j}.*/segment\ ${j}/g" $i;done;done
for i in *ref.fasta;do sed -i "s/ .*segment/_segment/g" $i;done
for i in *ref.fasta;do sed -i "s/ /-/g" $i;done
pigz -d *fastq.gz
for i in *R1_001.fastq;do tanoti -P 50 -p 1 -r ${i/R1*/trinity_ref.fasta} -i $i ${i/_R1/_R2} -o ${i/R1*/tanoti.sam};done
for i in *sam;do sam2consensus.py -i $i -o ${i/.sam/.fasta};done
for i in *tanoti.fasta;do for j in `seq 1 1 8`;do cat $i/* | grep -A1 -i segment-$j >> ${i/tanoti*/refASM.fasta};done;done
for i in *refASM.fasta;do sed -i "s/c25.*_segment/segment/g" $i;done
for i in *refASM.fasta;do sed -i "s/_tanoti//g" $i;done
for i in *refASM.fasta;do sed -i "s/ /\|/g" $i;done
for i in *refASM.fasta;do cut -d"|" -f1,2,3,4 $i > ${i/ASM/ASM2};done


for i in *refASM.fasta;do for j in `seq 1 1 8`; do sed -i "s/segment-${j}.*/segment-${j}|segment-${j}/g" $i;done;done
for i in *refASM.fasta;do sed -i "s/_tanoti//g" $i;done
for i in *refASM.fasta;do cut -d"|" -f1,3 $i > ${i/ASM/ASM2};done
pigz *.fastq
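
Both flu workflows above pick one best BLAST hit per segment (qcovs >= 50, not partial, longest query, then highest bitscore). A rough sketch of that selection as a single function, assuming the same 15-column -outfmt 6 layout (qlen in column 2, qcovs in column 3, stitle in column 4, bitscore in column 15). best_hit_per_segment is a hypothetical name. Unlike the grep version, it tests only the stitle column and sorts on tabs, so the space-to-dot substitution is not needed.

```shell
best_hit_per_segment() {
    # $1: BLAST tabular file in the 15-column layout described above
    tab=$(printf '\t')
    for seg in 1 2 3 4 5 6 7 8; do
        awk -F'\t' -v seg="$seg" '
            $3 >= 50 &&
            tolower($4) ~ ("segment " seg "([^0-9]|$)") &&
            tolower($4) !~ /partial/
        ' "$1" | sort -t "$tab" -k2,2nr -k15,15nr | head -n 1
    done
}
```

Usage would look like: best_hit_per_segment sample_trinity_blastn > sample_trinity_blastn2 (placeholder file names).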

#### octoflu ##########
docker pull flucrew/octoflu
docker run -it -v ${PWD}:/data flucrew/octoflu:latest /bin/bash
git clone https://github.com/flu-crew/octoFLU.git
cd octoFLU
makeblastdb -in ./reference_data/reference.fa -dbtype nucl
bash octoFLU.sh /data/flu-sequence.fasta


########## nf-flu ############
echo -e "sample,fastq_1,fastq_2" > samplesheet.csv
for i in *R1_001.fastq.gz;do echo -e "${i/_S*},`pwd`/$i,`pwd`/${i/R1/R2}" >> samplesheet.csv;done
sudo nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv --platform illumina -profile docker
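
The samplesheet loop above can be wrapped so the header is written once and samples without an R2 mate are skipped. This is only a sketch under the same naming assumption (files ending in _R1_001.fastq.gz and _R2_001.fastq.gz, sample name being everything before the first _S). make_samplesheet is a hypothetical helper name.

```shell
make_samplesheet() {
    # $1: directory holding the paired *_R1_001.fastq.gz / *_R2_001.fastq.gz files
    dir=$1
    echo "sample,fastq_1,fastq_2"
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                  # glob matched nothing
        r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no R2 mate" >&2
            continue
        fi
        sample=$(basename "$r1")
        sample=${sample%%_S*}                     # strip from the first "_S" onward
        echo "$sample,$r1,$r2"
    done
}
```

Usage would look like: make_samplesheet "$PWD" > samplesheet.csv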


## PCV3
for i in *R1_001.fastq.gz;do /home/vsingh/softwares/lazypipe/lazypipe/trinityrnaseq-v2.10.0/Trinity --seqType fq --left $i --right ${i/R1/R2} --max_memory 200G --CPU 50 --no_run_chrysalis --output ${i/R1*/trinity_out};done
for i in *trinity_out;do blastn -query $i/inchworm.DS.fa -db ~/softwares/ncbi_database/pcv3_database/PCV3.fasta -out `basename ${i/out/blastn}` -max_hsps 1 -outfmt "6 qseqid qlen qcovs stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore" -evalue 1e-05 -max_target_seqs 50 -num_threads 50;done
for i in *blastn;do sort -k2,2nr $i| grep -i "Porcine circovirus 3" | grep -v partial | cut -f4 | cut -d" " -f1 | head -1 >${i/trinity*/epost.ids};done
for i in *ids;do epost -input $i -db nucleotide | efetch -format fasta > ${i/epost*/PCV3_ref.fasta};done
pigz -d *gz
for i in *R1_001.fastq;do tanoti -P 50 -p 1 -r ${i/R1*/PCV3_ref.fasta} -i $i ${i/R1/R2}  -o ${i/R1*/tanoti.sam};done
for i in *sam;do python2 ~/vdl/Mor_Project_151/rota/D20.008687.1/sam2consensus/sam2consensus.py -i $i -o ${i/.sam/.fasta};done
for i in *tanoti.fasta;do cp $i/*fasta ${i/tanoti*/PCV3.fasta};done
for i in *PCV3.fasta;do sed -i "s/_tanoti.*/|PCV3/g" $i;done


##################################################################################################################################################################

docker image rm $(docker image ls -q) --force

## PuMA papillomavirus annotation

git clone https://github.com/KVD-lab/puma.git

cd puma/data_dir

docker run --rm -v `pwd`/../data_dir:/data -v `pwd`/../input_and_output:/in_out -v `pwd`/../scripts:/script2 kvdlab/puma:1.2.1 /script2/run_puma.py -i /in_out/mallard-duck.fasta -o /in_out/puma_out2 -d /data

##RAxML-NG tree
./raxml-ng --all --msa toti-orf2-AA.fasta --model LG+G8+F --tree pars{10} --bs-trees 100 --threads 50

##Kraken
sudo nextflow run angelovangel/nextflow-kraken2 --readsdir . --fqpattern '*_R{1,2}*.fastq.gz' --kraken_db /home/vsingh/db/kraken/minikraken_8GB_202003.tgz


##V-pipe

conda deactivate
PERL5LIB="";
conda activate V-pipe
mkdir my_project_work_dir
cd my_project_work_dir
/home/vsingh/softwares/v-pipe/V-pipe/init_project.sh


### PRRSv annotation + RFLP analysis #####
for i in *fasta;do /home/vsingh/softwares/SPAR/SPAR/run2.py annotate --output ${i/fasta}gff3 $i;done

for i in *fasta;do /home/vsingh/softwares/SPAR/SPAR/run2.py rflp $i > ${i/fasta}rflp.fna;done

for i in ORF1a ORF1ab ORF2 ORF2b ORF3 ORF4 ORF5a ORF5 ORF6 ORF7;do grep -A1 -E "$i$" D21-037947-SERM_pool11-13.cds3.fasta | sed "/^--$/d" > ${i}.fasta;done

conda activate agat
PERL5LIB="";
for i in *gff3;do agat_sp_extract_sequences.pl --gff $i --fasta ${i/gff3}fasta -t cds -o ${i/gff3}cds.fasta;done
for i in *cds.fasta;do sed "s/ .*//g" $i | sed "s/gene/CDS/g" | sed "s/\:/_/g" | sed "s/|/_/g" | sed "s/\//_/g" > ${i/cds/cds2};done
for i in *cds2.fasta;do vk_multiFastaSingleFasta.sh $i;done
for i in ORF1a ORF1ab ORF2 ORF3 ORF4 ORF5 ORF6 ORF7;do java -jar ~/softwares/FigTree_v1.4.4/lib/figtree.jar -graphic PDF -width 600 -height 800 ${i}_aligned_nogaps.fasta.raxml.support ${i}_tree.pdf;done
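
The per-ORF extraction above runs one grep -A1 per ORF name and relies on single-line sequences. A sketch of a one-pass alternative, assuming each header ends in the ORF name (for example ">sample_ORF5a"), which is what the "$i$" pattern above implies. split_by_orf is a hypothetical helper name. Records whose header does not end in an ORF name are dropped.

```shell
split_by_orf() {
    # $1: multi-FASTA whose headers end in an ORF name
    # $2: existing output directory; writes <ORF>.fasta files there
    awk -v outdir="$2" '
        /^>/ {
            out = ""
            if (match($0, /ORF[0-9]+[a-z]*$/))
                out = outdir "/" substr($0, RSTART) ".fasta"
        }
        out != "" { print > out }     # header and all of its sequence lines
    ' "$1"
}
```

Usage would look like: mkdir -p orfs; split_by_orf sample.cds3.fasta orfs (placeholder names).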

### Steph PRRSv annotation + RFLP analysis #####
for i in Arteriviridae.fa; do diamond blastx -d ~/softwares/ncbi_database/virus_proteome/viral_refseq.dmnd -q $i -f 6 qseqid qlen qcovhsp stitle slen pident length mismatch gapopen qstart qend sstart send evalue bitscore --sensitive --no-auto-append --top 10 --out ${i/fa/blastx};done
vk_correct_orient_fasta_blastx.sh Arteriviridae.blastx Arteriviridae.fa

for i in *fasta;do /home/vsingh/softwares/SPAR/SPAR/run2.py annotate --output ${i/fasta}gff3 $i;done
for i in *gff3;do sed '5d;16d' $i | sed "s/ORF1ab/ORF1b/g" > ${i/.gff3/_ORF1b.gff3};done
conda activate agat
PERL5LIB="";
for i in *_ORF1b.gff3;do agat_sp_extract_sequences.pl --gff $i --fasta ${i/_ORF1b.gff3}.fasta -t cds -o ${i/gff3}ORF.fasta;done
for i in *ORF.fasta;do sed "s/-gene.*//g" $i| sed "s/:/_/g" > ${i/fasta/fasta2};done
for i in *ORF.fasta2;do awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' $i > ${i/fasta2/fasta3};done
for i in *ORF.fasta3;do vk_multiFastaSingleFasta.sh $i;done
rm *ORF.fasta *ORF.fasta2

### calicivirus annotation
mkdir calci
staphb-tk vadr fcv.fasta calci --out_allfasta


## nanopore

viroconstrictor --input reads --output virocons --primers NONE --platform nanopore --reference ref.fa --threads 50 --features NONE --amplicon-type end-to-end


### RISST (A. pleuropneumoniae cps locus)
python3 RISST.py -i *.fasta -r A_pleuropneumoniae_cps_locus_reference.gbk -f

##nf-core/mag
nextflow run nf-core/mag -r 2.2.1 -profile singularity --input '*_R{1,2}*.fastq.gz' --outdir myco


##Reference-free clustering and consensus forming of long-read amplicon sequencing
https://github.com/ksahlin/NGSpeciesID


####### S. hyicus
awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' core_gene_alignment.aln > core_gene_alignment.aln2
for i in `cat tt2`;do grep -w -A1 "${i}" core_gene_alignment.aln2 >> core_gene_alignment.aln3; done
panito core_gene_alignment.aln3 > core_gene_alignment.csv
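
The linearize-then-grep steps above can be folded into one awk pass. A minimal sketch, assuming tt2 holds one sequence ID per line and that each ID equals the first token of a FASTA header. That is stricter than grep -w, which matches the ID anywhere on the line. subset_fasta is a hypothetical helper name, and output follows the order of the alignment, not of the ID list.

```shell
subset_fasta() {
    # $1: file with one sequence ID per line (assumed non-empty)
    # $2: FASTA or alignment file, sequences may span several lines
    awk '
        FNR == NR { if ($1 != "") want[$1] = 1; next }   # load wanted IDs
        /^>/ {
            if (seq != "") print seq                     # flush previous kept record
            seq = ""
            id = substr($1, 2)
            keep = (id in want)
            if (keep) print
            next
        }
        keep { seq = seq $0 }                            # join wrapped sequence lines
        END { if (seq != "") print seq }
    ' "$1" "$2"
}
```

Usage would look like: subset_fasta tt2 core_gene_alignment.aln > core_gene_alignment.aln3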

####### Amazon S3 data backups
https://www.msi.umn.edu/support/faq/how-do-i-use-second-tier-storage-command-line

ssh vsingh@login.msi.umn.edu
srun -N 1 -n 1 -t 4:00:00 -p interactive --tmp 20gb --pty bash
ssh vsingh@mesabi.msi.umn.edu
s3cmd sync /home/vdl/data_delivery/umgc/ s3://vikash
s3cmd sync /home/vdl/data_release/umgc/miseq s3://vikash
s3cmd ls s3://vikash/


s3cmd sync /home/skumar/data_release/umgc/data_delivery s3://sunil-mor