added support for gs2
gf777 committed Feb 23, 2021
1 parent b995a6c commit d70009b
Showing 3 changed files with 12 additions and 2 deletions.
12 changes: 10 additions & 2 deletions README.md
@@ -18,13 +18,21 @@ make -j 12

Merfin can be used to assess collapsed or duplicated region of the assembly (`-hist`, `-dump`) or to evaluate variant calls (`-vmer`). QV estimates for all scaffolds will also be generated with `-hist` and `-dump`.

In all cases a haploid/diploid peak estimate must be provided (`-peak`), either from the kmer histogram, or computed using the script `lookup.R` available under `scripts/lookup` (kcov).
In all cases a haploid/diploid peak estimate must be provided (`-peak`), either from the kmer histogram, or computed using [Genomescope 2.0](https://github.com/gf777/genomescope2.0) (kcov).

As a rule of thumb, the `-peak` should be:
- haploid, if the reference used for read mapping and variant calling contains both the primary and the haplotigs, or both haplotypes of a trio
- diploid (i.e. twice the haploid peak), for haploid representations of diploid genomes
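
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the peak finder is a naive stand-in for the kcov estimate from Genomescope 2.0, and it assumes a whitespace-delimited, 2-column histogram (multiplicity, count). Whether the tallest peak is the haploid or the diploid one still has to be checked by eye.

```python
from typing import Iterable, List, Tuple


def read_histogram(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 2-column (multiplicity, count) lines, skipping malformed rows."""
    hist = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        hist.append((int(fields[0]), int(fields[1])))
    return sorted(hist)


def tallest_peak_after_trough(hist: List[Tuple[int, int]]) -> int:
    """Return the multiplicity with the highest count past the first trough.

    The first trough separates low-multiplicity error kmers from genomic
    kmers. If the counts never rise again, the global maximum is returned.
    """
    counts = [count for _, count in hist]
    trough = 0
    for i in range(len(counts) - 1):
        if counts[i + 1] > counts[i]:
            trough = i
            break
    multiplicity, _ = max(hist[trough:], key=lambda row: row[1])
    return multiplicity


def merfin_peak(haploid_peak: float, reference_has_both_haplotypes: bool) -> float:
    """Apply the rule of thumb: haploid peak if the reference holds both
    haplotypes (primary + haplotigs, or a trio), otherwise twice that."""
    if reference_has_both_haplotypes:
        return haploid_peak
    return 2 * haploid_peak
```

With a haploid peak of 6, for example, `merfin_peak(6, True)` keeps 6 for a reference containing both haplotypes, while `merfin_peak(6, False)` gives 12 for a haploid representation of a diploid genome.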

Optionally, a custom lookup table of kmer copy numbers with associated multiplicities and probabilities can be provided (`-lookup`). The lookup table is also generated using the script `lookup.R` under `scripts/lookup`. This is recommended, and can significantly improve the accuracy of all analyses.
Optionally, a custom lookup table of kmer copy numbers with fitted multiplicities and probabilities can be provided (`-lookup`). The lookup table is generated when running our modified version of [Genomescope 2.0](https://github.com/gf777/genomescope2.0). This is recommended, and can significantly improve the accuracy of all analyses:

```
Rscript genomescope.R <kmer_histogram> <k_size> <output_folder> --fitted_hist [ploidy] [verbose]
kmer_histogram tab-delimited, 2-column file (same format as for Genomescope2; usually generated by meryl hist or jellyfish)
k_size kmer length used for the histogram
ploidy haploid/diploid (default = 2)
--fitted_hist generates lookup_table.txt
```

### Assess collapses/duplications ###

2 changes: 2 additions & 0 deletions scripts/lookup_table/README.md
@@ -1,5 +1,7 @@
# Generate lookup table

--This is discontinued, follow the main readme instead--

The script `lookup.R` is based on [Genomescope 2.0](http://qb.cshl.edu/genomescope/genomescope2.0/).

In addition to the canonical Genomescope output it generates fitted read multiplicity values and probabilities (`lookup_table.txt`).
File renamed without changes.
