Merge pull request #17 from clatworthylab/major-revision
major revision
zktuong authored Aug 12, 2022
2 parents f72ec62 + 8b5cdf1 commit 834f6eb
Showing 885 changed files with 4,745 additions and 57,330 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/test_py3.yaml
@@ -21,7 +21,7 @@ jobs:
max-parallel: 5
matrix:
os: [ubuntu-latest]
python-version: [3.7, 3.8, 3.9, "3.10"]
python-version: [3.8, 3.9, "3.10"]
runs-on: ${{ matrix.os }}
env:
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
@@ -53,16 +53,17 @@ jobs:
with:
auto-activate-base: true
auto-update-conda : true
activate-environment: bulkBCRseq
activate-environment: isotyper
channels: conda-forge, bioconda, anaconda, defaults
channel-priority: true
python-version: ${{ matrix.python-version }}
environment-file: environment.yml
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!

- name: Test with pytest
run: |
pytest --cov=BIN --cov-report=xml
export PYTHONPATH=/home/runner/work/bulkBCRseq/bulkBCRseq
pytest --cov=isotyper --cov-report=xml
- name: Show coverage
run: |
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,4 +2,5 @@
*.pyc
*.ipynb_checkpoints
SOP_RBR_RT-PCR Barcoded IsoTyper.pdf
SOP_RBR_RT-PCR Barcoded IsoTyper_murine.pdf
SOP_RBR_RT-PCR Barcoded IsoTyper_murine.pdf
tests/output/
2,815 changes: 0 additions & 2,815 deletions BIN/Generate_repertoire_statistics.py

This file was deleted.

3,758 changes: 0 additions & 3,758 deletions BIN/Read_processing_and_quality.py

This file was deleted.

9 changes: 0 additions & 9 deletions BIN/__init__.py

This file was deleted.

Binary file removed BIN/bam2fastx
Binary file not shown.
Binary file removed BIN/blastall
Binary file not shown.
47 changes: 0 additions & 47 deletions BIN/fasta_to_fastq.pl

This file was deleted.

24 changes: 0 additions & 24 deletions BIN/functions/__init__.py

This file was deleted.

252 changes: 0 additions & 252 deletions BIN/functions/_functions.py

This file was deleted.

578 changes: 0 additions & 578 deletions Processing_sequences_large_scale.py

This file was deleted.

135 changes: 88 additions & 47 deletions README.md
@@ -1,24 +1,24 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5717959.svg)](https://doi.org/10.5281/zenodo.5717959)
[![codecov](https://codecov.io/gh/clatworthylab/bulkBCRseq/branch/master/graph/badge.svg?token=I6APMCARTA)](https://codecov.io/gh/clatworthylab/bulkBCRseq)

# bulk_BCR_analysis
Bulk BCR-seq processing scripts use in Fitzpatrick et al., Nature (2020). Package belongs to Rachael Bashford-Rogers.
# bulkBCRseq : isotyper
Bulk BCR-seq processing package used in `Fitzpatrick et al., Nature (2020)`. The original (legacy) scripts were provided by Dr. Rachael Bashford-Rogers (Oxford).

This repo is an older version of what seems to be now at https://github.com/rbr1/BCR_TCR_PROCESSING_PIPELINE.
This repository is a `python3` reimplementation of the original `python2` scripts (found in [legacy branch](https://github.com/clatworthylab/bulkBCRseq/tree/legacy)); the original script is an older version of what seems to be now at https://github.com/rbr1/BCR_TCR_PROCESSING_PIPELINE.

Requires python>=3.7 (or 2.7 if using the legacy branch). Currently only works when cloned onto farm with all paths set up pointing to this folder properly.
Requires `python>=3.8` (or `python==2.7.9` if using the [legacy branch](https://github.com/clatworthylab/bulkBCRseq/tree/legacy)).

## Citation
Please cite the following papers:
```
Fitzpatrick, Z., Frazer, G., Ferro, A., Clare, S., Bouladoux, N., Ferdinand, J., Tuong, Z.K., Negro-Demontel, M.L., Kumar, N., Suchanek, O. and Tajsic, T., 2020. Gut-educated IgA plasma cells defend the meningeal venous sinuses. Nature, 587(7834), pp.472-476.
*Fitzpatrick, Z., Frazer, G., Ferro, A., Clare, S., Bouladoux, N., Ferdinand, J., Tuong, Z.K., Negro-Demontel, M.L., Kumar, N., Suchanek, O. and Tajsic, T., 2020. Gut-educated IgA plasma cells defend the meningeal venous sinuses. Nature, 587(7834), pp.472-476.*

*Bashford-Rogers, R.J., Palser, A.L., Huntly, B.J., Rance, R., Vassiliou, G.S., Follows, G.A. and Kellam, P., 2013. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations. Genome research, 23(11), pp.1874-1884.*

*Bashford-Rogers, R.J.M., Bergamaschi, L., McKinney, E.F., Pombal, D.C., Mescia, F., Lee, J.C., Thomas, D.C., Flint, S.M., Kellam, P., Jayne, D.R.W. and Lyons, P.A., 2019. Analysis of the B cell receptor repertoire in six immune-mediated diseases. Nature, 574(7776), pp.122-126.*
Bashford-Rogers, R.J., Palser, A.L., Huntly, B.J., Rance, R., Vassiliou, G.S., Follows, G.A. and Kellam, P., 2013. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations. Genome research, 23(11), pp.1874-1884.
Bashford-Rogers, R.J.M., Bergamaschi, L., McKinney, E.F., Pombal, D.C., Mescia, F., Lee, J.C., Thomas, D.C., Flint, S.M., Kellam, P., Jayne, D.R.W. and Lyons, P.A., 2019. Analysis of the B cell receptor repertoire in six immune-mediated diseases. Nature, 574(7776), pp.122-126.
```

## Pre-requisites:
## Setup:
```bash
# create a conda virtual environment
# sample for python 3 set up, switch to python 2 where appropriate
@@ -28,63 +28,104 @@ wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
eval "$(/path/to/miniconda2/bin/conda shell.bash hook)"
conda init
conda create --name py3 python=3.9
conda create --name isotyper python=3.9

# clone this repo
# clone this repository
git clone https://github.com/clatworthylab/bulkBCRseq

# change into the directory and install dependencies
cd bulkBCRseq
conda env update --name py3 --file environment.yml
conda env update --name isotyper --file environment.yml
```

Usage instructions on Farm:
```bash
conda activate py3
# eithe run this everytime or just
# export to your ~/.bashrc or ~/.bash_profile
export PYTHONPATH=/path/to/bulkBCRseq:$PYTHONPATH
# always activate the environment before proceeding
conda activate isotyper
# main usage
python /path/to/bulkBCRseq/isotyper.py [options]
```

# if necessary, change path to where bulkBCRseq folder is
# cd path/to/bulkBCRseq
```
usage: isotyper.py [-h] [-i INPUT] [-s STEP] [-l LENGTH] [-dr] [-b] [-c CORES] [-m MEM] [-q QUEUE] [-p PROJECT] [-g GROUP]
options:
-h, --help show this help message and exit
main arguments:
-i INPUT, --input INPUT
input meta.txt file to run isotyper.
file must contain the following four columns:
1st column - name of sample.
2nd column - path to input file. Either .cram file or read 1 fastq(.gz) file.
3rd column - path to output folder.
4th column - organism. Either HOMO_SAPIENS or MUS_MUSCULUS.
no column names allowed.
-s STEP, --step STEP step to perform:
1 - Convert raw sequencing files to fastq and perform QC.
2 - Trim and filter reads.
3 - Generate networks.
4 - Generate network statistics.
-l LENGTH, --length LENGTH
minimum length of reads to keep. [Default 100]
-dr, --dryrun if passed, prints commands but don't actually run.
bsub arguments:
-b, --bsub if passed, submits each row in meta.txt file as a job to bsub.
-c CORES, --cores CORES
number of cores to run this on. [Default 10]
-m MEM, --mem MEM job memory request. [Default 8000]
-q QUEUE, --queue QUEUE
job queue to submit to. [Default normal]
-p PROJECT, --project PROJECT
sanger project to send as job. [Default team205]
-g GROUP, --group GROUP
sanger group to send as job. [Default teichlab]
```
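
Because `isotyper.py` expects a headerless four-column `meta.txt`, it can help to check the file before submitting jobs. The following is a minimal sketch, not part of the package: the function name `check_meta` is hypothetical, and it assumes the columns are tab-separated (the help text above does not state the delimiter).

```python
import csv
import io

# Values taken from the help text above.
VALID_ORGANISMS = {"HOMO_SAPIENS", "MUS_MUSCULUS"}
VALID_SUFFIXES = (".cram", ".fastq", ".fastq.gz")


def check_meta(handle):
    """Return a list of human-readable problems found in a meta.txt handle.

    Assumption: columns are tab-separated; adjust `delimiter` if not.
    """
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    for lineno, row in enumerate(reader, start=1):
        # Skip completely empty lines.
        if not row or all(not field.strip() for field in row):
            continue
        if len(row) != 4:
            problems.append(f"line {lineno}: expected 4 columns, found {len(row)}")
            continue
        sample, input_path, output_dir, organism = (field.strip() for field in row)
        if not sample:
            problems.append(f"line {lineno}: sample name is empty")
        if not input_path.endswith(VALID_SUFFIXES):
            problems.append(f"line {lineno}: input is not a .cram or fastq(.gz) file")
        if not output_dir:
            problems.append(f"line {lineno}: output folder is empty")
        if organism not in VALID_ORGANISMS:
            problems.append(f"line {lineno}: unknown organism {organism!r}")
    return problems


# Example with an in-memory file: the second row has two problems.
example = (
    "sampleA\t/data/sampleA_R1_001.fastq.gz\t/out/sampleA\tHOMO_SAPIENS\n"
    "sampleB\t/data/sampleB.bam\t/out/sampleB\tMOUSE\n"
)
for problem in check_meta(io.StringIO(example)):
    print(problem)
```

A header row would normally be reported as an unknown organism, which matches the "no column names allowed" rule.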

## Note!
If you are starting from fastq files directly, please change the 5th column in the `.txt` file (path to `.cram`) to path to `_R1_001.fastq.gz` (read1) instead. If your read1 suffix isn't this pattern, please modify the `R1PATTERN` variable in here directly, after cloning this repository:
https://github.com/clatworthylab/bulkBCRseq/blob/3d17a2752a6b482f50c0b8d211db94ddf5e655d1/BIN/Read_processing_and_quality.py#L3641-L3643
If you are starting from fastq files directly, set the 2nd column of the `.txt` file to the path of the read 1 file (`_R1_001.fastq.gz`) instead of the path to a `.cram` file. If your read 1/read 2 suffixes do not follow this pattern, modify the `R1PATTERN` and `R2PATTERN` variables in `_settings.py` after cloning this repository:
https://github.com/clatworthylab/bulkBCRseq/blob/5d310de8863b64352d68230977c6e7e62d5c0b8f/isotyper/utilities/_settings.py#L25-L27


## Basic usage:
### Basic usage
```bash
python Processing_sequences_large_scale.py [sample file list] [commands (comma separated list)] [bsub command: Y/N] [print commands: Y/N] [run commands: Y/N]
# initial QC
python isotyper.py -i meta.txt -s 1
# trimming
python isotyper.py -i meta.txt -s 2
# generate network
python isotyper.py -i meta.txt -s 3
# generate network statistics
python isotyper.py -i meta.txt -s 4
```
Available commands: 1, 2, 3, 4

### Basic analysis: 1 - Converting raw sequencing files to fastq, QC
If using Sanger's farm:
```bash
python Processing_sequences_large_scale.py Samples_Mouse_Zach.txt 1 Y Y Y
# initial QC
python isotyper.py -i meta.txt -s 1 --bsub
# trimming
python isotyper.py -i meta.txt -s 2 --bsub
# generate network
python isotyper.py -i meta.txt -s 3 --bsub
# generate network statistics
python isotyper.py -i meta.txt -s 4 --bsub
```
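
The four invocations above differ only in the `-s` value, so they can be generated rather than typed. The sketch below only builds and prints the command lines, using the `-i`, `-s`, `--bsub` and `--dryrun` flags from the help text; the helper name `build_commands` is hypothetical and nothing is executed.

```python
import shlex

# Step numbers and descriptions as listed in the help text.
STEPS = {
    1: "Convert raw sequencing files to fastq and perform QC",
    2: "Trim and filter reads",
    3: "Generate networks",
    4: "Generate network statistics",
}


def build_commands(meta, script="isotyper.py", bsub=False, dryrun=False):
    """Return one argument list per step, in the order the steps should run."""
    commands = []
    for step in sorted(STEPS):
        cmd = ["python", script, "-i", meta, "-s", str(step)]
        if bsub:
            cmd.append("--bsub")
        if dryrun:
            cmd.append("--dryrun")
        commands.append(cmd)
    return commands


# Print the commands for review instead of running them.
for step, cmd in zip(sorted(STEPS), build_commands("meta.txt", dryrun=True)):
    print(f"# step {step}: {STEPS[step]}")
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to run the steps one after another. With `--bsub`, each row is submitted as a separate job, so a later step presumably should not be started until the earlier jobs have finished.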

### Basic analysis: 2 - Trimming and filtering reads
```bash
python Processing_sequences_large_scale.py Samples_Mouse_Zach.txt 2 Y Y Y
```
### Basic analysis: 3 - Network generation
```bash
python Processing_sequences_large_scale.py Samples_Mouse_Zach.txt 3 Y Y Y
```
### Basic analysis: 4 - Generating network and population statistics
```bash
python Processing_sequences_large_scale.py Samples_Mouse_Zach.txt 4 Y Y Y
```
Take a look [here](https://github.com/clatworthylab/bulkBCRseq/tree/master/tests/data) for example files to provide to the tool.


### Post-processing

After running steps `1` to `4`, please annotate the `Fully_reduced_{sample_id}.fasta` file for downstream analysis. You can annotate with [IMGT/HighV-QUEST](https://imgt.org/HighV-QUEST/home.action) or via other software e.g. [MiXCR](https://mixcr.readthedocs.io/en/latest/) in shotgun mode.

## Advanced usage - some private adjustments - not complete!:
```bash
python Processing_sequences_large_scale.py [sample file list] [concat file list] [commands (comma separated list)] [bsub command: Y/N] [print commands: Y/N] [run commands: Y/N]
```
Available commands: 3.5, 3.51
### Create the network from fully reduced fasta sequences: 3.5
```bash
python Processing_sequences_large_scale.py Samples_Mouse_DSS_2020.txt Samples_Mouse_DSS_2020_combined.txt 3.5 Y Y Y
mixcr analyze shotgun -s hsa --starting-material rna --receptor-type igh Fully_reduced_{sample_id}.fasta {sample_id}
# export to AIRR format
mixcr exportAirr --imgt-gaps in.[vdjca|clns|clna] out.tsv
```
### rerun the network generation pipeline using AIRR files: 3.51
```bash
python Processing_sequences_large_scale.py Samples_Mouse_DSS_2020.txt Samples_Mouse_DSS_2020_combined.txt 3.51 Y Y Y
```

To generate the network plots, use the node table (`Att_{sample_id}.txt`) and edge table (`Edges_{sample_id}.txt`) and feed them into graphing software such as `networkx` or `igraph`, then continue as usual. The `orphan` folder has example scripts (probably buggy) showing how to use `python-igraph` to generate the plots.
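
As a rough illustration of what the edge table carries, the sketch below groups sequences into connected clusters using only the standard library. It rests on an unverified assumption about the file layout: each line of `Edges_{sample_id}.txt` is taken to be tab-separated, with the two connected node identifiers in the first two fields. Check this against your own output before relying on it; the function names are hypothetical.

```python
import io
from collections import defaultdict


def read_edges(handle):
    """Build an undirected adjacency map from an edge-table handle.

    Assumption: tab-separated lines, node identifiers in fields 1 and 2.
    """
    adjacency = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # skip blank or malformed lines
        a, b = fields[0], fields[1]
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return clusters (sets of node ids), largest first, via depth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy edge table: one cluster of three sequences and one of two.
toy = "seq1\tseq2\nseq2\tseq3\nseq4\tseq5\n"
clusters = connected_components(read_edges(io.StringIO(toy)))
print([len(c) for c in clusters])
```

Sequences without any edge never appear in the edge table, so singleton clusters would have to be added from the node table. The same adjacency map can be handed to `networkx` or `igraph` for layout and plotting.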

3 changes: 0 additions & 3 deletions Samples_Menna_Ondrej1.txt

This file was deleted.

7 changes: 0 additions & 7 deletions bulkBCRseq/__init__.py

This file was deleted.

525 changes: 0 additions & 525 deletions database/blast/human/human_BCR_C.fasta

This file was deleted.

Binary file removed database/blast/human/human_BCR_C.fasta.ndb
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.nhr
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.nin
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.nog
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.nos
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.not
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.nsq
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.ntf
Binary file not shown.
Binary file removed database/blast/human/human_BCR_C.fasta.nto
Binary file not shown.
154 changes: 0 additions & 154 deletions database/blast/mouse/mouse_BCR_C.fasta

This file was deleted.

Binary file removed database/blast/mouse/mouse_BCR_C.fasta.ndb
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.nhr
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.nin
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.nog
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.nos
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.not
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.nsq
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.ntf
Binary file not shown.
Binary file removed database/blast/mouse/mouse_BCR_C.fasta.nto
Binary file not shown.
5 changes: 0 additions & 5 deletions database/germlines/imgt/IMGT.yaml

This file was deleted.

1,121 changes: 0 additions & 1,121 deletions database/germlines/imgt/human/constant/imgt_human_IGHC.fasta

This file was deleted.

36 changes: 0 additions & 36 deletions database/germlines/imgt/human/constant/imgt_human_IGKC.fasta

This file was deleted.

92 changes: 0 additions & 92 deletions database/germlines/imgt/human/constant/imgt_human_IGLC.fasta

This file was deleted.

10 changes: 0 additions & 10 deletions database/germlines/imgt/human/constant/imgt_human_TRAC.fasta

This file was deleted.

31 changes: 0 additions & 31 deletions database/germlines/imgt/human/constant/imgt_human_TRBC.fasta

This file was deleted.

10 changes: 0 additions & 10 deletions database/germlines/imgt/human/constant/imgt_human_TRDC.fasta

This file was deleted.

74 changes: 0 additions & 74 deletions database/germlines/imgt/human/constant/imgt_human_TRGC.fasta

This file was deleted.

624 changes: 0 additions & 624 deletions database/germlines/imgt/human/leader/imgt_human_IGHL.fasta

This file was deleted.

321 changes: 0 additions & 321 deletions database/germlines/imgt/human/leader/imgt_human_IGKL.fasta

This file was deleted.

177 changes: 0 additions & 177 deletions database/germlines/imgt/human/leader/imgt_human_IGLL.fasta

This file was deleted.

200 changes: 0 additions & 200 deletions database/germlines/imgt/human/leader/imgt_human_TRAL.fasta

This file was deleted.

215 changes: 0 additions & 215 deletions database/germlines/imgt/human/leader/imgt_human_TRBL.fasta

This file was deleted.

36 changes: 0 additions & 36 deletions database/germlines/imgt/human/leader/imgt_human_TRDL.fasta

This file was deleted.

43 changes: 0 additions & 43 deletions database/germlines/imgt/human/leader/imgt_human_TRGL.fasta

This file was deleted.

89 changes: 0 additions & 89 deletions database/germlines/imgt/human/vdj/imgt_human_IGHD.fasta

This file was deleted.

31 changes: 0 additions & 31 deletions database/germlines/imgt/human/vdj/imgt_human_IGHJ.fasta

This file was deleted.

2,831 changes: 0 additions & 2,831 deletions database/germlines/imgt/human/vdj/imgt_human_IGHV.fasta

This file was deleted.

19 changes: 0 additions & 19 deletions database/germlines/imgt/human/vdj/imgt_human_IGKJ.fasta

This file was deleted.

757 changes: 0 additions & 757 deletions database/germlines/imgt/human/vdj/imgt_human_IGKV.fasta

This file was deleted.

21 changes: 0 additions & 21 deletions database/germlines/imgt/human/vdj/imgt_human_IGLJ.fasta

This file was deleted.

686 changes: 0 additions & 686 deletions database/germlines/imgt/human/vdj/imgt_human_IGLV.fasta

This file was deleted.

180 changes: 0 additions & 180 deletions database/germlines/imgt/human/vdj/imgt_human_TRAJ.fasta

This file was deleted.

785 changes: 0 additions & 785 deletions database/germlines/imgt/human/vdj/imgt_human_TRAV.fasta

This file was deleted.

7 changes: 0 additions & 7 deletions database/germlines/imgt/human/vdj/imgt_human_TRBD.fasta

This file was deleted.

33 changes: 0 additions & 33 deletions database/germlines/imgt/human/vdj/imgt_human_TRBJ.fasta

This file was deleted.

1,029 changes: 0 additions & 1,029 deletions database/germlines/imgt/human/vdj/imgt_human_TRBV.fasta

This file was deleted.

7 changes: 0 additions & 7 deletions database/germlines/imgt/human/vdj/imgt_human_TRDD.fasta

This file was deleted.

9 changes: 0 additions & 9 deletions database/germlines/imgt/human/vdj/imgt_human_TRDJ.fasta

This file was deleted.

169 changes: 0 additions & 169 deletions database/germlines/imgt/human/vdj/imgt_human_TRDV.fasta

This file was deleted.

14 changes: 0 additions & 14 deletions database/germlines/imgt/human/vdj/imgt_human_TRGJ.fasta

This file was deleted.

134 changes: 0 additions & 134 deletions database/germlines/imgt/human/vdj/imgt_human_TRGV.fasta

This file was deleted.

1,218 changes: 0 additions & 1,218 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_IGHV.fasta

This file was deleted.

325 changes: 0 additions & 325 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_IGKV.fasta

This file was deleted.

295 changes: 0 additions & 295 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_IGLV.fasta

This file was deleted.

340 changes: 0 additions & 340 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_TRAV.fasta

This file was deleted.

442 changes: 0 additions & 442 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_TRBV.fasta

This file was deleted.

73 changes: 0 additions & 73 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_TRDV.fasta

This file was deleted.

58 changes: 0 additions & 58 deletions database/germlines/imgt/human/vdj_aa/imgt_aa_human_TRGV.fasta

This file was deleted.

581 changes: 0 additions & 581 deletions database/germlines/imgt/mouse/constant/imgt_mouse_IGHC.fasta

This file was deleted.

57 changes: 0 additions & 57 deletions database/germlines/imgt/mouse/constant/imgt_mouse_IGKC.fasta

This file was deleted.

43 changes: 0 additions & 43 deletions database/germlines/imgt/mouse/constant/imgt_mouse_IGLC.fasta

This file was deleted.

17 changes: 0 additions & 17 deletions database/germlines/imgt/mouse/constant/imgt_mouse_TRAC.fasta

This file was deleted.

21 changes: 0 additions & 21 deletions database/germlines/imgt/mouse/constant/imgt_mouse_TRBC.fasta

This file was deleted.

10 changes: 0 additions & 10 deletions database/germlines/imgt/mouse/constant/imgt_mouse_TRDC.fasta

This file was deleted.

42 changes: 0 additions & 42 deletions database/germlines/imgt/mouse/constant/imgt_mouse_TRGC.fasta

This file was deleted.

783 changes: 0 additions & 783 deletions database/germlines/imgt/mouse/leader/imgt_mouse_IGHL.fasta

This file was deleted.

423 changes: 0 additions & 423 deletions database/germlines/imgt/mouse/leader/imgt_mouse_IGKL.fasta

This file was deleted.

15 changes: 0 additions & 15 deletions database/germlines/imgt/mouse/leader/imgt_mouse_IGLL.fasta

This file was deleted.

409 changes: 0 additions & 409 deletions database/germlines/imgt/mouse/leader/imgt_mouse_TRAL.fasta

This file was deleted.

131 changes: 0 additions & 131 deletions database/germlines/imgt/mouse/leader/imgt_mouse_TRBL.fasta

This file was deleted.

51 changes: 0 additions & 51 deletions database/germlines/imgt/mouse/leader/imgt_mouse_TRDL.fasta

This file was deleted.

24 changes: 0 additions & 24 deletions database/germlines/imgt/mouse/leader/imgt_mouse_TRGL.fasta

This file was deleted.

77 changes: 0 additions & 77 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGHD.fasta

This file was deleted.

19 changes: 0 additions & 19 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGHJ.fasta

This file was deleted.

2,769 changes: 0 additions & 2,769 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGHV.fasta

This file was deleted.

21 changes: 0 additions & 21 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGKJ.fasta

This file was deleted.

1,065 changes: 0 additions & 1,065 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGKV.fasta

This file was deleted.

15 changes: 0 additions & 15 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGLJ.fasta

This file was deleted.

122 changes: 0 additions & 122 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_IGLV.fasta

This file was deleted.

169 changes: 0 additions & 169 deletions database/germlines/imgt/mouse/vdj/imgt_mouse_TRAJ.fasta

This file was deleted.
