

fix merge conflicts
lrauschning committed Dec 19, 2024
2 parents c8d8f99 + d1784b6 commit 12cbd50
Showing 22 changed files with 555 additions and 272 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,31 @@
# This workflow will lint Python and Cython files using pylint and cython-lint, respectively
# adapted from https://github.com/actions/starter-workflows/blob/main/ci/pylint.yml
# Linting can be configured in pyproject.toml

name: "Lint using pylint and cython-lint"

on:
push:
branches: ["main", "dev"]
pull_request:
branches: ["main", "dev"]

jobs:
lint:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: "Set up Python 3.12"
uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: pip
- name: Install linters
run: |
python -m pip install --upgrade pip
pip install pylint cython-lint
- name: Run lint
run: |
pylint $(git ls-files 'msyd/*.py')
cython-lint --no-pycodestyle $(git ls-files '*.pyx')
44 changes: 0 additions & 44 deletions .github/workflows/python-package.yml

This file was deleted.

19 changes: 19 additions & 0 deletions .github/workflows/run_test_example.sh
@@ -0,0 +1,19 @@
#!/bin/bash


# hacky way to hopefully alias the calls
# normal alias does not seem to work in GitHub CI
# necessary, as the hacky git install does not install the CLI entrypoints
#echo "#!/bin/python" > syri
#echo "import syri.scripts.syri;syri.scripts.syri.main()" >> syri
#chmod +x ./syri
#echo "minimap2.py" > ./minimap2
#chmod +x ./minimap2
#PATH=$PATH:./
#syri --version
#minimap2
## run using source to preserve alias
#source ./example/example_workflow.sh

#$CONDA/bin/conda activate msyd
$(tail -n +2 ./example/example_workflow.sh | sed -e 's/^syri/python <(echo "import syri.scripts.syri;syri.scripts.syri.main()")/' -e 's/^minimap2/.\/minimap2/' )
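The `tail | sed` pipeline above drops the shebang and rewrites every line that begins with `syri` or `minimap2`. A minimal Python sketch of the same line rewrite, handy for checking what the generated commands look like; `rewrite_line` and `rewrite_script` are hypothetical names, not part of the repository:

```python
import re

# What the first sed expression substitutes for a leading "syri": run SyRI's
# entry point through the interpreter, since the git install lacks the CLI script.
SYRI_CALL = 'python <(echo "import syri.scripts.syri;syri.scripts.syri.main()")'


def rewrite_line(line: str) -> str:
    """Apply both sed expressions in order; each is anchored at the line start."""
    # A function replacement inserts the string literally (no escape processing).
    line = re.sub(r"^syri", lambda _m: SYRI_CALL, line, count=1)
    line = re.sub(r"^minimap2", "./minimap2", line, count=1)
    return line


def rewrite_script(text: str) -> list:
    """Skip the shebang line (like `tail -n +2`), then rewrite each remaining line."""
    return [rewrite_line(line) for line in text.splitlines()[1:]]


if __name__ == "__main__":
    demo = "#!/bin/sh\nminimap2 -cx asm10 --eqx ref.fna ler.fna > ler.paf\n"
    print(rewrite_script(demo))
```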
61 changes: 61 additions & 0 deletions .github/workflows/test_build.yml
@@ -0,0 +1,61 @@
# This workflow will use the build action to build the msyd python package, and call the CLI interface to check the install worked
name: Test build

on:
push:
branches: [ "main", "dev" ]
pull_request:
branches: [ "main", "dev" ]

# Cancel if a newer run is started
# taken from https://github.com/nf-core/modules/blob/master/.github/workflows/nf-test.yml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true


jobs:
test:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
steps:
- name: Checkout repo
uses: actions/checkout@v4
# use old install manually
# since switching to conda no longer supports different python versions
# - name: Build msyd
# uses: schneebergerlab/msyd@main
# # with:
# # python-version: ${{ matrix.python-version }}
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: pip
- name: Update pip
run: python -m pip install --upgrade pip setuptools
shell: bash
- name: Install SyRI manually
run: |
# manually install SyRI's dependencies
# the python version spoofing requires the --no-deps flag, so this is necessary
pip install Cython numpy pandas scipy psutil igraph longestrunsubsequence pysam pulp
# manually use pip to install syri from github, as it isn't on pypi
# spoof python version to get around bounds check
pip install 'git+https://github.com/schneebergerlab/syri.git' --python-version '3.10' --no-deps --no-warn-conflicts --target $(python -m site --user-site)
shell: bash
- name: Install other dependencies
run: pip install -r requirements.txt
shell: bash
- name: Build msyd
run: pip install .
shell: bash
- name: Test installation
run: |
msyd --version
msyd -h
msyd call -h
msyd view -h
33 changes: 33 additions & 0 deletions .github/workflows/test_example.yml
@@ -0,0 +1,33 @@
name: Test example.sh

on:
# do not run, as the conda package is currently broken,
# making getting the right environment not possible.
# push:
# branches: [ "main", "dev" ]
# pull_request:
# branches: [ "main", "dev" ]

# Cancel if a newer run is started
# taken from https://github.com/nf-core/modules/blob/master/.github/workflows/nf-test.yml
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true


jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Build msyd
uses: schneebergerlab/msyd@main
# with:
# python-version: "3.12"
# - name: Install minimap2 manually # conda doesn't seem to work
# run: |
# curl -L https://github.com/lh3/minimap2/releases/download/v2.28/minimap2-2.28_x64-linux.tar.bz2 | tar -jxvf -
# mv minimap2-2.28_x64-linux\/minimap2 ./
# ./minimap2 -h # test it worked & is callable
- name: Test example_run.sh
run: ./.github/workflows/run_test_example.sh
shell: bash
4 changes: 4 additions & 0 deletions README.md
@@ -1,4 +1,8 @@
# msyd
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
![lint](https://github.com/schneebergerlab/msyd/actions/workflows/lint.yml/badge.svg)
![build](https://github.com/schneebergerlab/msyd/actions/workflows/test_build.yml/badge.svg)


msyd is still under active development, so expect some bugs and changes!
If in doubt about the behaviour of msyd or how it might change, feel free to reach out by opening an issue!
48 changes: 48 additions & 0 deletions action.yml
@@ -0,0 +1,48 @@
# adapted from https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Build msyd
description: "Checks out, installs dependencies and builds the msyd package. Formulated as a composite action to reduce code duplication in testing. Composite workflows cannot preserve state. Currently broken because of a version bound in the SyRI conda package."

#inputs:
# python-version:
# description: 'Python version to use'
# required: true
# default: '3.12'

runs:
using: composite
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Set up Python 3.10
uses: actions/setup-python@v5
with:
python-version: '3.10'
cache: pip
# - name: Update pip
# run: python -m pip install --upgrade pip setuptools
# shell: bash
# - name: Install SyRI manually
# run: |
# # manually install syris dependencies
# # the python version spoofing requires the --no-deps flag, so this is necessary
# pip install Cython numpy pandas scipy psutil igraph longestrunsubsequence pysam pulp
# # manually use pip to install syri from github, as it isn't on pypi
# # spoof python version to get around bounds check
# pip install 'git+https://github.com/schneebergerlab/syri.git' --python-version '3.10' --no-deps --no-warn-conflicts --target $(python -m site --user-site)
# shell: bash
# - name: Install other dependencies
# run: pip install -r requirements.txt
# shell: bash
- name: Setup conda env, install msyd
run: |
#$CONDA/bin/conda init
#source ~/.bashrc
#$CONDA/bin/conda env create -n msyd --file ./environment.yml
#$CONDA/bin/conda activate msyd
conda env update -n base --file ./environment.yml
shell: bash
# python -m pip install mappy
- name: Build msyd
run: pip install .
shell: bash
9 changes: 9 additions & 0 deletions environment.yml
@@ -0,0 +1,9 @@
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::syri=1.7.0
- bioconda::minimap2=2.28
- bioconda::mappy=2.28
- conda-forge::cython=3.0.11
- conda-forge::intervaltree=3.1.0
50 changes: 30 additions & 20 deletions example/example_workflow.sh
100644 → 100755
@@ -1,6 +1,10 @@
#!/bin/sh

## Download some genomes
# This file serves as an example workflow illustrating how and when to use msyd.
# It is a part of the msyd CI, and should pass so long as your system


## Download three publicly available, high quality A. thaliana genomes

# download the Col-CC assembly
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_028009825.1/download?include_annotation_type=GENOME_FASTA&filename=GCA_028009825.1.zip" -H "Accept: application/zip"
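The Col-CC download goes through the NCBI Datasets `v2alpha` endpoint with the accession embedded in the URL. The remaining downloads are elided from this diff; assuming they follow the same pattern, a small sketch can rebuild the URL per accession. `genome_download_url` is a hypothetical helper, and the accession-to-name pairs are taken from the `.fna` file names renamed further down:

```python
from urllib.parse import urlencode

# Endpoint copied from the curl call above.
BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession"

# Accessions as they appear in the downloaded file names renamed below.
ACCESSIONS = {
    "col": "GCA_028009825.1",
    "ler": "GCA_001651475.1",
    "sha": "GCA_902460295.1",
    "swe": "GCA_024498555.1",
}


def genome_download_url(accession: str) -> str:
    """Build the genome-FASTA download URL for one GenBank assembly accession."""
    query = urlencode({
        "include_annotation_type": "GENOME_FASTA",
        "filename": f"{accession}.zip",
    })
    return f"{BASE}/{accession}/download?{query}"


if __name__ == "__main__":
    for name, acc in ACCESSIONS.items():
        # Each URL is fetched with the request header "Accept: application/zip".
        print(name, genome_download_url(acc))
```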
@@ -15,14 +19,15 @@ curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GC
unzip "./*.zip"
mv ncbi_dataset/data/*/*.fna ./

## Prepare them for running msyd
### Prepare them for running msyd

# rename them to shorter names
## rename them to shorter names
mv GCA_001651475.1_Ler_Assembly_genomic.fna ler.fna
mv GCA_028009825.1_Col-CC_genomic.fna col.fna
mv GCA_902460295.1_Arabidopsis_thaliana_Sha_genomic.fna sha.fna
mv GCA_024498555.1_ASM2449855v1_genomic.fna swe.fna

## filter out small scaffolds
grep -n -P ">" ./*.fna
# col and swe do not require truncating,
# for ler the small scaffolds start at line 1442097
@@ -33,45 +38,50 @@ head -n 1480076 sha.fna > sha.filtered.fna
mv sha.filtered.fna sha.fna
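The script truncates `ler.fna` and `sha.fna` at hard-coded line numbers found with `grep -n`, which only works for these exact files. A hedged alternative (a different technique from the one used here) drops records by sequence length instead. `filter_fasta` and the `min_len` threshold are assumptions, not msyd or script parameters:

```python
def filter_fasta(text: str, min_len: int) -> str:
    """Keep only FASTA records whose total sequence length is >= min_len."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, seq_lines))
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, seq_lines))

    kept = []
    for hdr, lines in records:
        if sum(len(l.strip()) for l in lines) >= min_len:
            kept.append(hdr)
            kept.extend(lines)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Hypothetical usage: drop scaffolds shorter than 1 Mb.
    with open("ler.fna") as fh:
        print(filter_fasta(fh.read(), 1_000_000), end="")
```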


## Generate inputs for msyd
### Generate inputs for msyd

# generate alignments to col-CC
## generate alignments to col-CC
mv col.fna ref.fna
minimap2 -cx asm5 --eqx ref.fna ler.fna > ler.paf
minimap2 -cx asm5 --eqx ref.fna sha.fna > sha.paf
minimap2 -cx asm5 --eqx ref.fna swe.fna > swe.paf
minimap2 -cx asm10 --eqx ref.fna ler.fna > ler.paf
minimap2 -cx asm10 --eqx ref.fna sha.fna > sha.paf
minimap2 -cx asm10 --eqx ref.fna swe.fna > swe.paf
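With `-c --eqx`, minimap2 writes PAF: 12 mandatory tab-separated columns followed by SAM-style `TAG:TYPE:VALUE` fields, including a `cg:Z:` CIGAR that uses `=` and `X` for matches and mismatches. A minimal reader for inspecting these alignment files; the column names follow the PAF format description, while `parse_paf_line` and `cigar_ops` are hypothetical helpers:

```python
import re

# The 12 mandatory PAF columns, in order.
PAF_FIELDS = ("qname", "qlen", "qstart", "qend", "strand", "tname",
              "tlen", "tstart", "tend", "nmatch", "alnlen", "mapq")
INT_FIELDS = set(PAF_FIELDS) - {"qname", "strand", "tname"}


def parse_paf_line(line: str) -> dict:
    """Split one PAF record into its mandatory columns and optional tags."""
    cols = line.rstrip("\n").split("\t")
    rec = {k: int(v) if k in INT_FIELDS else v
           for k, v in zip(PAF_FIELDS, cols[:12])}
    tags = {}
    for field in cols[12:]:
        name, typ, value = field.split(":", 2)
        # 'i' = integer, 'f' = float; other types (e.g. 'Z', 'A') stay text.
        tags[name] = int(value) if typ == "i" else float(value) if typ == "f" else value
    rec["tags"] = tags
    return rec


def cigar_ops(cigar: str) -> list:
    """Turn a CIGAR string like '50=1X' into [(50, '='), (1, 'X')]."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


if __name__ == "__main__":
    with open("ler.paf") as fh:
        for line in fh:
            rec = parse_paf_line(line)
            print(rec["tname"], rec["tstart"], rec["tend"], rec["mapq"])
```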

# run syri on the alignments
## run syri on the alignments
# make sure to pass --cigar and specify appropriate prefixes, so the msyd output is more easily interpretable
syri --nc 5 -F P --cigar --prefix ler -c ler.paf -r ref.fna -q ler.fna --lf ler.syri.log --samplename ler
syri --nc 5 -F P --cigar --prefix sha -c sha.paf -r ref.fna -q sha.fna --lf sha.syri.log --samplename sha
syri --nc 5 -F P --cigar --prefix swe -c swe.paf -r ref.fna -q swe.fna --lf swe.syri.log --samplename swe
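The three `syri` calls differ only in the sample name, so the argument list can be generated per sample. A sketch assuming the same flags as above; `syri_command` is a hypothetical helper, and the returned list could be passed to `subprocess.run(cmd, check=True)`:

```python
import shlex


def syri_command(sample: str, ref: str = "ref.fna", threads: int = 5) -> list:
    """Argument list for one SyRI run, mirroring the calls above."""
    return [
        "syri", "--nc", str(threads), "-F", "P", "--cigar",
        "--prefix", sample,        # samplesheet below expects <sample>syri.out/.vcf
        "-c", f"{sample}.paf",     # minimap2 alignments from the previous step
        "-r", ref, "-q", f"{sample}.fna",
        "--lf", f"{sample}.syri.log",
        "--samplename", sample,
    ]


if __name__ == "__main__":
    for s in ("ler", "sha", "swe"):
        # Print rather than run; subprocess.run(syri_command(s), check=True) would execute it.
        print(shlex.join(syri_command(s)))
```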

## construct genomes.tsv file
# as msyd needs many input files, the paths are stored in a samplesheet
echo "#name\taln\tsyri\tvcf\tseq" > genomes.tsv
for f in *syri.out
do
bs=$(basename $f syri.out)
echo "$bs\t$bs.paf\t${bs}syri.out\t${bs}syri.vcf\t${bs}.fna" >> genomes.tsv
done
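The loop assumes each SyRI output is named `<prefix>syri.out`, so stripping the `syri.out` suffix recovers the sample name. Whether `echo` expands the `\t` escapes depends on which shell executes the script, so building the rows in Python sidesteps that. A sketch under those assumptions; `samplesheet_rows` and `write_samplesheet` are hypothetical names:

```python
import glob
from pathlib import Path

HEADER = ["#name", "aln", "syri", "vcf", "seq"]
SUFFIX = "syri.out"


def samplesheet_rows(syri_outputs) -> list:
    """One samplesheet row per SyRI output, named like the shell loop does."""
    rows = [HEADER]
    for f in syri_outputs:
        name = Path(f).name
        if not name.endswith(SUFFIX) or name == SUFFIX:
            continue  # no prefix to recover a sample name from
        bs = name[: -len(SUFFIX)]
        rows.append([bs, f"{bs}.paf", f"{bs}syri.out", f"{bs}syri.vcf", f"{bs}.fna"])
    return rows


def write_samplesheet(rows, path: str = "genomes.tsv") -> None:
    """Write real tab characters, independent of the shell's echo."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("\t".join(row) + "\n")


if __name__ == "__main__":
    # sorted() matches the shell's alphabetical glob order.
    for row in samplesheet_rows(sorted(glob.glob("*" + SUFFIX))):
        print("\t".join(row))
```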

# run msyd to call pansynteny
msyd call -i genomes.tsv -o athalianas.pff -m athalianas.vcf -r ref.fna
### run msyd to call multisynteny
msyd call -c 5 -i genomes.tsv -o athalianas.psf -m athalianas.vcf -r ref.fna

### work with the output

## work with the output
## export multisynteny on Chr3 for use in visualization/other software

# CP116282 is the id corresponding to chromosome 3 in Col-CC
msyd view -e "on CP116283.1" -i athalianas.pff -o athalianas-chr3.pff
# CP116282 is the id corresponding to chromosome 3 in Col-CC
# filter for multisynteny on this chromosome
msyd view -e "on CP116282.1" -i athalianas.psf -o athalianas-chr3.psf

# convert to VCF for use in visualization/other software
msyd view -i athalianas-chr3.pff -o athalianas-chr3-syn.vcf
# export to VCF; this could also be done in the command above
msyd view -i athalianas-chr3.psf -o athalianas-chr3-syn.vcf

## download 1001 genome project VCF, filter for vars in pansyntenic regions
## download 1001 genome project VCF, filter for small variants in structurally conserved regions

curl https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-56/plants/variation/vcf/arabidopsis_thaliana/arabidopsis_thaliana.vcf.gz -o ensembl_athaliana.vcf.gz
gunzip ensembl_athaliana.vcf.gz

# change from ids to chr numbers, to match vcf nomenclature
sed -e s/CP116280.1/1/ -e s/CP116281.1/2/ -e s/CP116282.1/3/ -e s/CP116283.1/4/ -e s/CP116284.1/5/ athalianas.pff > athalianas-chrnames.pff
sed -e s/CP116280\.1/Chr1/ -e s/CP116281\.1/Chr2/ -e s/CP116282\.1/Chr3/ -e s/CP116283\.1/Chr4/ -e s/CP116284\.1/Chr5/ athalianas.psf > athalianas-chrnames.psf
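The `sed` call maps the Col-CC GenBank IDs to chromosome names. The same mapping as a lookup table, applied with one regex alternation. Note that `sed` without the `g` flag replaces only the first match of each expression per line, whereas `re.sub` below replaces every occurrence. `CHR_NAMES` mirrors the new `sed` line; `rename_chroms` and `rename_file` are hypothetical:

```python
import re

# Col-CC GenBank IDs -> chromosome names, as in the sed call above.
CHR_NAMES = {
    "CP116280.1": "Chr1",
    "CP116281.1": "Chr2",
    "CP116282.1": "Chr3",
    "CP116283.1": "Chr4",
    "CP116284.1": "Chr5",
}
# re.escape keeps the "." literal, like the escaped "\." in the sed expressions.
_PATTERN = re.compile("|".join(re.escape(k) for k in CHR_NAMES))


def rename_chroms(line: str) -> str:
    """Replace every known ID on the line with its chromosome name."""
    return _PATTERN.sub(lambda m: CHR_NAMES[m.group(0)], line)


def rename_file(src: str, dst: str) -> None:
    """Stream src to dst, renaming IDs line by line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(rename_chroms(line))
```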

# filter for variants in pansyntenic regions!
msyd view -i athalianas-chrnames.pff -e "deg >= 3" -o pansynt-vars.vcf --intersect ensembl_athaliana.vcf
# filter for variants in coresyntenic regions!
msyd view -i athalianas-chrnames.psf -e "deg >= 3" -o coresynt-snvs.vcf --intersect ensembl_athaliana.vcf
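Judging from the comments, `--intersect` keeps the VCF records that fall inside the selected multisyntenic regions. The following is not msyd's implementation, only a minimal sketch of position-in-interval filtering. It assumes 1-based inclusive, non-overlapping regions per chromosome, so a binary search over start coordinates suffices; all names are hypothetical:

```python
from bisect import bisect_right


def build_index(regions) -> dict:
    """regions: iterable of (chrom, start, end); assumed non-overlapping per chrom."""
    index = {}
    for chrom, start, end in sorted(regions):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index


def in_regions(index: dict, chrom: str, pos: int) -> bool:
    """True if pos lies in the region with the largest start <= pos."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def filter_vcf_lines(lines, index):
    """Yield header lines unchanged and records whose CHROM/POS hit a region."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_regions(index, chrom, int(pos)):
            yield line
```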
2 changes: 1 addition & 1 deletion msyd/annotate_sv.py
@@ -100,4 +100,4 @@ def concatsyriout(syrifins, qrynames):

return
# END
CP116280.1,784681,893258,OX291513.1,791184,899980,INV,IP-Evs-12
#CP116280.1,784681,893258,OX291513.1,791184,899980,INV,IP-Evs-12