Commit

update readme
shaomingfu committed Jul 16, 2021
1 parent dd33f61 commit a2f3c1e
Showing 1 changed file with 21 additions and 20 deletions.
41 changes: 21 additions & 20 deletions README.md
@@ -1,23 +1,24 @@
 # Introduction

-Scallop-UMI is a reference-based transcript assembler for barcode-linked RNA-seq data.
-The development of Scallop-UMI has been based on the [Scallop](https://github.com/Kingsford-Group/scallop) assembler.
+Scallop2 is a reference-based transcript assembler
+specifically optimized for paired-/multiple-end RNA-seq data.
+The development of Scallop2 has been based on the [Scallop](https://github.com/Kingsford-Group/scallop) assembler.

 # Installation

-Scallop-UMI can be easily installed with conda: [![Anaconda-Server Badge](https://anaconda.org/bioconda/scallop-umi/badges/installer/conda.svg)](https://anaconda.org/bioconda/scallop-umi)
+Scallop2 can be easily installed with conda: [![Anaconda-Server Badge](https://anaconda.org/bioconda/scallop2/badges/installer/conda.svg)](https://anaconda.org/bioconda/scallop2)

-Install from source code: download the source code of Scallop-UMI from
-[here](https://github.com/Shao-Group/scallop-umi/releases/download/v1.1.0/scallop-umi-1.1.0.tar.gz).
-Scallop-UMI uses additional libraries of Boost and htslib.
+Install from source code: download the source code of Scallop2 from
+[here](https://github.com/Shao-Group/scallop2/releases/download/v1.1.1/scallop2-1.1.1.tar.gz).
+Scallop2 uses additional libraries of Boost and htslib.
 If they have not been installed in your system, you first
 need to download and install them. You might also need to
 export the runtime library path to certain environmental
 variable (for example, `LD_LIBRARY_PATH`, for most linux distributions).
-After install these dependencies, you then compile the source code of Scallop-UMI.
+After install these dependencies, you then compile the source code of Scallop2.
 If some of the above dependencies are not installed to the default system
 directories (for example, `/usr/local`, for most linux distributions),
-their corresponding installing paths should be specified to `configure` of Scallop-UMI.
+their corresponding installing paths should be specified to `configure` of Scallop2.

 ## Download Boost
 If Boost has not been downloaded/installed, download Boost
@@ -55,24 +56,24 @@ is an additional `lib` following the installation path):
 export LD_LIBRARY_PATH=/path/to/your/htslib/lib:$LD_LIBRARY_PATH
 ```

-## Build Scallop-UMI
+## Build Scallop2

-Use the following to compile Scallop-UMI:
+Use the following to compile Scallop2:
 ```
 ./configure --with-htslib=/path/to/your/htslib --with-boost=/path/to/your/boost
 make
 ```

 If some of the dependencies are installed in the default system directory (for example, `/usr/lib`),
 then the corresponding `--with-` option might not be necessary.
-The executable file `scallop-umi` will appear at `src/scallop-umi`.
+The executable file `scallop2` will appear at `src/scallop2`.


 # Usage

-The usage of `scallop-umi` is:
+The usage of `scallop2` is:
 ```
-./scallop-umi -i <input.bam> -o <output.gtf> [options]
+./scallop2 -i <input.bam> -o <output.gtf> [options]
 ```

 The `input.bam` is the read alignment file generated by some RNA-seq aligner, (for example, STAR or HISAT2).
@@ -83,17 +84,17 @@ samtools sort input.bam > input.sort.bam

 The reconstructed transcripts shall be written as gtf format into `output.gtf`.

-Scallop-UMI support the following parameters. Please refer
+Scallop2 support the following parameters. Please refer
 to the additional explanation below the table.

 Parameters | Default Value | Description
 ------------------------- | ------------- | ----------
---help | | print usage of Scallop-UMI and exit
---version | | print version of Scallop-UMI and exit
+--help | | print usage of Scallop2 and exit
+--version | | print version of Scallop2 and exit
 --preview | | show the inferred `library_type` and exit
 --verbose | 1 | chosen from {0, 1, 2}
 --library_type | empty | chosen from {empty, unstranded, first, second}
---min_transcript_coverage | 0.5 | the minimum coverage required to output a multi-exon transcript
+--min_transcript_coverage | 1.5 | the minimum coverage required to output a multi-exon transcript
 --min_single_exon_coverage | 20 | the minimum coverage required to output a single-exon transcript
 --min_transcript_length_base |150 | the minimum base length of a transcript
 --min_transcript_length_increase | 50 | the minimum increased length of a transcript with each additional exon
@@ -108,13 +109,13 @@ to the additional explanation below the table.

 2. `--library_type` is highly recommended to provide. The `unstranded`, `first`, and `second`
 correspond to `fr-unstranded`, `fr-firststrand`, and `fr-secondstrand` used in standard Illumina
-sequencing libraries. If none of them is given, i.e., it is `empty` by default, then Scallop-UMI
+sequencing libraries. If none of them is given, i.e., it is `empty` by default, then Scallop2
 will try to infer the `library_type` by itself (see `--preview`). Notice that such inference is based
 on the `XS` tag stored in the input `bam` file. If the input `bam` file do not contain `XS` tag,
-then it is essential to provide the `library_type` to Scallop-UMI. You can try `--preview` to see
+then it is essential to provide the `library_type` to Scallop2. You can try `--preview` to see
 the inferred `library_type`.

-3. `--min_transcript_coverage` is used to filter lowly expressed transcripts: Scallop-UMI will filter
+3. `--min_transcript_coverage` is used to filter lowly expressed transcripts: Scallop2 will filter
 out transcripts whose (predicted) raw counts (number of moleculars) is less than this number.

 4. `--min_transcript_length_base` and `--min_transcript_length_increase` is combined to filter

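Note 2 of the README says that when `--library_type` is left `empty`, Scallop2 infers the library type from the `XS` tag. The README does not spell out the rule. The following is therefore a hypothetical Python sketch of one common approach and is not Scallop2's implementation. For each read with an `XS` tag, the sketch compares the read's alignment strand with the `XS` strand, flipping mate 2 so that it stands in for mate 1. A strong majority then decides between `first` (`fr-firststrand`) and `second` (`fr-secondstrand`). The `Alignment` record, `infer_library_type`, and the 0.8 cutoff are assumptions made for illustration.

```python
# Hypothetical sketch; not Scallop2 source code.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Alignment:
    """Assumed minimal view of one BAM record (not an htslib or pysam type)."""
    is_reverse: bool      # SAM flag 0x10: read aligned to the reverse strand
    is_read2: bool        # SAM flag 0x80: second mate of a pair
    xs: Optional[str]     # value of the XS tag ('+' or '-'), None if absent


def infer_library_type(alns: Iterable[Alignment],
                       min_fraction: float = 0.8) -> Optional[str]:
    """Guess unstranded/first/second from reads carrying an XS tag."""
    same = opposite = 0
    for a in alns:
        if a.xs not in ("+", "-"):
            continue  # no usable XS tag: no strand evidence from this read
        read_strand = "-" if a.is_reverse else "+"
        if a.is_read2:
            # Flip mate 2 so every read votes as if it were mate 1.
            read_strand = "+" if read_strand == "-" else "-"
        if read_strand == a.xs:
            same += 1
        else:
            opposite += 1
    total = same + opposite
    if total == 0:
        return None  # no XS tags at all: library type must be given explicitly
    if same / total >= min_fraction:
        return "second"      # fr-secondstrand: mate 1 on the transcript strand
    if opposite / total >= min_fraction:
        return "first"       # fr-firststrand: mate 1 opposite the transcript strand
    return "unstranded"      # fr-unstranded: no strong strand bias


reads = [
    Alignment(is_reverse=True, is_read2=False, xs="+"),
    Alignment(is_reverse=False, is_read2=True, xs="+"),
    Alignment(is_reverse=False, is_read2=False, xs="-"),
]
print(infer_library_type(reads))  # first
```

The sketch returns `None` when no read carries an `XS` tag. This matches the README's advice that `--library_type` must then be provided explicitly.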

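Note 3 of the README describes `--min_transcript_coverage` as a lower bound on a transcript's predicted raw count. The parameter table lists a separate `--min_single_exon_coverage` for single-exon transcripts. The following is a minimal Python sketch of those two coverage thresholds. It assumes each assembled transcript is summarized by its exon count and coverage, and that a transcript is kept when its coverage is at least the threshold. `Transcript`, `passes_coverage_filter`, and `filter_transcripts` are hypothetical names, not Scallop2 code. The defaults are the updated table values of 1.5 and 20.

```python
# Hypothetical sketch; not Scallop2 source code.
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Assumed summary of one assembled transcript."""
    name: str
    num_exons: int
    coverage: float  # predicted raw count (abundance)


def passes_coverage_filter(t: Transcript,
                           min_transcript_coverage: float = 1.5,
                           min_single_exon_coverage: float = 20.0) -> bool:
    """Drop a transcript only if its coverage is below the threshold for its type."""
    if t.num_exons == 1:
        return t.coverage >= min_single_exon_coverage
    return t.coverage >= min_transcript_coverage


def filter_transcripts(ts: List[Transcript], **thresholds: float) -> List[Transcript]:
    """Keep the transcripts that pass the coverage filter, preserving order."""
    return [t for t in ts if passes_coverage_filter(t, **thresholds)]


candidates = [
    Transcript("tx1", num_exons=4, coverage=2.0),   # multi-exon, 2.0 >= 1.5: kept
    Transcript("tx2", num_exons=3, coverage=1.0),   # multi-exon, 1.0 < 1.5: dropped
    Transcript("tx3", num_exons=1, coverage=12.0),  # single-exon, 12 < 20: dropped
    Transcript("tx4", num_exons=1, coverage=35.0),  # single-exon, 35 >= 20: kept
]
print([t.name for t in filter_transcripts(candidates)])  # ['tx1', 'tx4']
print([t.name for t in filter_transcripts(candidates, min_transcript_coverage=0.5)])  # ['tx1', 'tx2', 'tx4']
```

In this diff the multi-exon default moves from 0.5 to 1.5. Under the new default, a low-abundance multi-exon transcript such as `tx2` is dropped unless the option is lowered explicitly.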