
How to get chromosome bins #53

Open
mictadlo opened this issue Dec 17, 2018 · 6 comments


@mictadlo

mictadlo commented Dec 17, 2018

Hi,
Running LACHESIS as shown below did not give the expected number of chromosomes: I got 115 groups.

/usr/local/bin/Lachesis lachesis.ini
/LACHESIS/src/bin/CreateScaffoldedFasta.pl QMg_NbQ4P_RN.fasta lachesis

The output of cat lachesis/REPORT.txt was:

SPECIES = plant
OUTPUT_DIR = lachesis
DRAFT_ASSEMBLY_FASTA = QMg_NbQ4P_RN.fasta
SAM_DIR = /QRISdata/Q0231/lachesis
SAM_FILES = N_Ben_HiC2_rep1.bam N_Ben_HiC4_rep1.bam
RE_SITE_SEQ = GATC
USE_REFERENCE = 0
SIM_BIN_SIZE = 0
REF_ASSEMBLY_FASTA = test_case/hg19/Homo_sapiens_assembly19.fasta
BLAST_FILE_HEAD = test_case/draft_assembly/assembly
DO_CLUSTERING = 1
DO_ORDERING   = 1
DO_REPORTING  = 1
OVERWRITE_GLM = 0
OVERWRITE_CLMS = 0
CLUSTER_N = 19
CLUSTER_CONTIGS_WITH_CENS = -1
CLUSTER_MIN_RE_SITES = 25
CLUSTER_MAX_LINK_DENSITY = 2
CLUSTER_NONINFORMATIVE_RATIO = 3
CLUSTER_DRAW_HEATMAP = 1
CLUSTER_DRAW_DOTPLOT = 1
ORDER_MIN_N_RES_IN_TRUNK = 15
ORDER_MIN_N_RES_IN_SHREDS = 15
ORDER_DRAW_DOTPLOTS = 1
REPORT_EXCLUDED_GROUPS = -1
REPORT_QUALITY_FILTER = 1
REPORT_DRAW_HEATMAP = 1

ReportChart!

Info about input assembly:
DE NOVO ASSEMBLY, with no reference genome (less validation available)
Species: benth
N contigs:      1512            Total length:   2774612304              N50:    4284592
N clusters (derived):   115
N non-singleton clusters:       22
N orderings found:      115


############################
#                          #
#    CLUSTERING METRICS    #
#                          #
############################

Number of contigs in clusters:  1495            (98.88% of all contigs)
Length of contigs in clusters:  2773873324      (99.97% of all sequence length)

+----------+-----------+-------------+
|  CLUSTER | NUMBER OF |  LENGTH OF  |
|  NUMBER  |  CONTIGS  |   CONTIGS   | 
+----------+-----------+-------------+
|      0   |     114   |   285238080 |
|      1   |      97   |   232421157 |
|      2   |     117   |   197340285 |
|      3   |      84   |   187516710 |
|      4   |      80   |   179402476 |
|      5   |      89   |   165376221 |
|      6   |      65   |   157315626 |
|      7   |      80   |   151833938 |
|      8   |      80   |   148910574 |
|      9   |      79   |   140080377 |
|     10   |      88   |   137055451 |
|     11   |      65   |   135577112 |
|     12   |      60   |   133912412 |
|     13   |      70   |   117818930 |
|     14   |      65   |   116531146 |
|     15   |      63   |   102263122 |
|     16   |      28   |    93089711 |
|     17   |      48   |    87456991 |
|     18   |      15   |      964930 |
|     19   |       6   |      294111 |
|     20   |       7   |      283069 |
|     21   |       1   |      239832 |
|     22   |       1   |      145336 |
|     23   |       1   |      104136 |
|     24   |       1   |      101472 |
|     25   |       1   |       94178 |
|     26   |       1   |       77308 |
|     27   |       1   |       67648 |
|     28   |       1   |       67087 |
|     29   |       1   |       64664 |
|     30   |       1   |       59313 |
|     31   |       1   |       59081 |
|     32   |       1   |       57897 |
|     33   |       1   |       53810 |
|     34   |       1   |       50546 |
|     35   |       1   |       49583 |
|     36   |       2   |       48675 |
|     37   |       1   |       48060 |
|     38   |       1   |       44526 |
|     39   |       1   |       39160 |
|     40   |       1   |       37315 |
|     41   |       1   |       35095 |
|     42   |       1   |       32532 |
|     43   |       1   |       29921 |
|     44   |       1   |       28202 |
|     45   |       1   |       26998 |
|     46   |       1   |       26886 |
|     47   |       1   |       26813 |
|     48   |       1   |       26698 |
|     49   |       1   |       26687 |
|     50   |       1   |       26517 |
|     51   |       1   |       26501 |
|     52   |       1   |       26414 |
|     53   |       1   |       26363 |
|     54   |       1   |       26348 |
|     55   |       1   |       26272 |
|     56   |       1   |       26153 |
|     57   |       1   |       26101 |
|     58   |       1   |       26099 |
|     59   |       1   |       26012 |
|     60   |       1   |       25913 |
|     61   |       1   |       25836 |
|     62   |       1   |       25798 |
|     63   |       1   |       25728 |
|     64   |       1   |       25694 |
|     65   |       1   |       25584 |
|     66   |       1   |       25530 |
|     67   |       1   |       25343 |
|     68   |       1   |       25268 |
|     69   |       1   |       25212 |
|     70   |       1   |       25077 |
|     71   |       1   |       24936 |
|     72   |       1   |       24853 |
|     73   |       1   |       24700 |
|     74   |       1   |       24228 |
|     75   |       1   |       23985 |
|     76   |       1   |       23909 |
|     77   |       1   |       23321 |
|     78   |       1   |       23283 |
|     79   |       1   |       23222 |
|     80   |       1   |       23141 |
|     81   |       1   |       23114 |
|     82   |       1   |       22951 |
|     83   |       1   |       22856 |
|     84   |       1   |       22373 |
|     85   |       1   |       22328 |
|     86   |       1   |       22169 |
|     87   |       1   |       20926 |
|     88   |       1   |       20183 |
|     89   |       1   |       19684 |
|     90   |       1   |       19675 |
|     91   |       1   |       19626 |
|     92   |       1   |       19153 |
|     93   |       1   |       18885 |
|     94   |       1   |       18838 |
|     95   |       1   |       18639 |
|     96   |       1   |       18249 |
|     97   |       1   |       18248 |
|     98   |       1   |       18233 |
|     99   |       1   |       18201 |
|    100   |       1   |       18200 |
|    101   |       1   |       18180 |
|    102   |       1   |       18142 |
|    103   |       1   |       17982 |
|    104   |       1   |       17787 |
|    105   |       1   |       17473 |
|    106   |       1   |       17401 |
|    107   |       1   |       17265 |
|    108   |       1   |       16586 |
|    109   |       1   |       16091 |
|    110   |       1   |       16056 |
|    111   |       1   |       15989 |
|    112   |       1   |       15859 |
|    113   |       1   |       15540 |
|    114   |       1   |       15213 |
+----------+-----------+-------------+
|   TOTAL  |    1495   |  2773873324 |
+----------+-----------+-------------+


############################
#                          #
#     ORDERING METRICS     #
#                          #
############################


Number of contigs in orderings: 0               (0% of all contigs in clusters, 0% of all contigs)
Length of contigs in orderings: 0       (0% of all length in clusters, 0% of all sequence length)
Number of contigs in trunks:    0               (-nan% of contigs in orderings)
Length of contigs in trunks:    0       (-nan% of length in orderings)

Fraction of contigs in orderings with high orientation quality: 0 (-nan%), with length 0 (-nan%)
Fraction of contigs in trunks    with high orientation quality: 0 (-nan%), with length 0 (-nan%)

How can I obtain the expected 19 chromosomes?

Thank you in advance,

Michal

@JingaJenga
Member

Hi Michal,

Thanks for your e-mail, and for your interest in the LACHESIS software! The first thing I should mention is that LACHESIS is no longer being actively developed or maintained, as stated on the GitHub front page. I recommend you take a look at the Juicer software from the Aiden lab (https://github.com/theaidenlab), a more recently developed and actively maintained piece of code that serves roughly the same purpose. Also, if you want a research kit that will ensure high-quality Hi-C results, I suggest contacting the folks at Phase Genomics (https://phasegenomics.com/).

As for your concern about 19 chromosomes: as stated in the original paper, LACHESIS can predict the number of chromosomes in the assembly roughly, but not precisely. Your assembly actually shows a pretty steep drop-off in size after the first 19 scaffolds (#0-#18). This suggests that LACHESIS has correctly picked up on intra-chromosomal signals; even without external information, you could have estimated roughly 19 chromosomes from the scaffold sizes. I suggest you treat the 19 largest scaffolds as roughly equivalent to the 19 chromosomes, with some possible noise at the boundary (cluster #18 in particular is borderline in size). The other, smaller scaffolds are likely true chromosomal sequence that should have been merged into scaffolds #0-#18, but LACHESIS did not see a strong enough signal to make that merge. Note that the combined length of scaffolds #19-#114 is only about 3.8 Mb.

-- Josh
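The size-based reasoning above can be checked mechanically. The bash function below is a minimal sketch, not part of LACHESIS: `size_dropoff` is a hypothetical helper name, and it assumes the cluster table rows look exactly like those in the REPORT.txt excerpts in this thread (`| cluster | contigs | length |`, largest cluster first). It finds the largest fold-drop between consecutive cluster lengths, reports how many clusters sit above that drop, and sums the length of everything below it.

```shell
# Hypothetical helper (not part of LACHESIS): find the steepest size drop-off
# in a LACHESIS cluster table read from stdin. Assumes rows shaped like
# "|  <cluster> |  <n contigs> |  <length> |", listed largest cluster first.
size_dropoff() {
  awk -F'|' '
    # Data rows only: field 2 is a cluster number and field 4 a length,
    # so header, separator and TOTAL rows are skipped.
    $2 ~ /^ *[0-9]+ *$/ && $4 ~ /^ *[0-9]+ *$/ { n++; len[n] = $4 + 0 }
    END {
      if (n < 2) { print "need at least two cluster rows"; exit 1 }
      best = 1; ratio = 0
      for (i = 1; i < n; i++) {
        # Fold-drop between consecutive clusters i and i+1.
        if (len[i + 1] > 0 && len[i] / len[i + 1] > ratio) {
          ratio = len[i] / len[i + 1]; best = i
        }
      }
      tail = 0
      for (i = best + 1; i <= n; i++) tail += len[i]   # everything below the drop
      printf "large clusters: %d\n", best
      printf "largest fold-drop: %.1fx\n", ratio
      printf "tail length: %d bp\n", tail
    }'
}
```

For example, `size_dropoff < lachesis/REPORT.txt`. The function reads every pipe-delimited numeric row, so if your report contains other tables of that shape, cut out the clustering section first. A fold-drop is only a heuristic: borderline clusters right at the cut, like #18 here, still need judgment.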

@mictadlo
Author

mictadlo commented Dec 19, 2018

Hi Josh,
Thank you for your explanation. By any chance, do you know why none of the contigs have been ordered?

@JingaJenga
Member

I'm not sure. The clusters are pretty large, so there should be enough signal to order them. Either there is a severe lack of Hi-C link density, or some of your assembly files may not have been created completely. Try setting OVERWRITE_CLMS = 1.
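Setting this option means editing the ini file LACHESIS is run with (lachesis.ini in the first comment, where OVERWRITE_CLMS = 0). A minimal sketch, using a hypothetical helper name `set_overwrite_clms`: it rewrites an existing `OVERWRITE_CLMS` line and leaves every other line, and the file's permissions, unchanged.

```shell
# Hypothetical helper (not part of LACHESIS): set OVERWRITE_CLMS = 1 in the
# given ini file. Only an existing OVERWRITE_CLMS line is rewritten; if the key
# is absent, the file content stays as it was.
set_overwrite_clms() {
  local ini="$1" tmp status=0
  tmp="$(mktemp)" || return 1
  # Write to a temp file instead of using `sed -i`, whose syntax differs
  # between GNU and BSD sed; `cat >` then updates the original in place.
  sed 's/^OVERWRITE_CLMS[[:space:]]*=.*/OVERWRITE_CLMS = 1/' "$ini" > "$tmp" \
    && cat "$tmp" > "$ini" || status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_overwrite_clms lachesis.ini` before rerunning `/usr/local/bin/Lachesis lachesis.ini`.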

@mictadlo
Author

mictadlo commented Jan 2, 2019

Hi Josh,
I wish you a Happy New Year. I have now created the BAM files as recommended by Phase Genomics:

bwa mem -5SP [assembly.fasta] [fwd_hic.fastq] [rev_hic.fastq] | samblaster | samtools view -S -h -b -F 2316 > [aligned.bam]

This reduced the number of clusters from 115 to 20.
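The `-F 2316` filter in this command is a bitmask over SAM FLAG values: `samtools view -F` drops any record that has at least one of the mask's bits set. The bash function below is a small sketch (the name `decode_sam_flag` is hypothetical; recent samtools releases also provide `samtools flags` for the same lookup) that lists the bits in a mask, using the bit meanings from the SAM specification. For 2316 it shows that unmapped reads, reads whose mate is unmapped, secondary alignments and supplementary alignments are removed. Bit 1024 (duplicate) is not in the mask, so records that samblaster marks as duplicates are kept.

```shell
# Hypothetical helper: list the SAM FLAG bits contained in a numeric mask,
# e.g. the value given to `samtools view -F` or `-f`.
decode_sam_flag() {
  local mask="$1" bit
  # Index i holds the name of the bit with value 2^i (SAM specification).
  local -a names=(
    "paired" "proper_pair" "unmapped" "mate_unmapped"
    "reverse" "mate_reverse" "read1" "read2"
    "secondary" "qc_fail" "duplicate" "supplementary"
  )
  for bit in "${!names[@]}"; do
    if (( (mask >> bit) & 1 )); then
      printf '%d\t%s\n' $((1 << bit)) "${names[bit]}"
    fi
  done
}
```

For example, `decode_sam_flag 2316` prints one line per set bit: 4 unmapped, 8 mate_unmapped, 256 secondary and 2048 supplementary.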

ReportChart!

Info about input assembly:
DE NOVO ASSEMBLY, with no reference genome (less validation available)
Species: benth
N contigs:      1512            Total length:   2774612304              N50:    4284592
N clusters (derived):   20
N non-singleton clusters:       20
N orderings found:      20


############################
#                          #
#    CLUSTERING METRICS    #
#                          #
############################


Number of contigs in clusters:  1495            (98.88% of all contigs)
Length of contigs in clusters:  2773948172      (99.98% of all sequence length)

+----------+-----------+-------------+
|  CLUSTER | NUMBER OF |  LENGTH OF  |
|  NUMBER  |  CONTIGS  |   CONTIGS   | 
+----------+-----------+-------------+
|      0   |     207   |   304822244 |
|      1   |     104   |   251236598 |
|      2   |     103   |   215915806 |
|      3   |      85   |   185990618 |
|      4   |      96   |   185821186 |
|      5   |     137   |   169943199 |
|      6   |      79   |   169694706 |
|      7   |      87   |   160635652 |
|      8   |      80   |   155356232 |
|      9   |      80   |   128553045 |
|     10   |      59   |   121698875 |
|     11   |      53   |   120471892 |
|     12   |      62   |   114055062 |
|     13   |      45   |   105996889 |
|     14   |      53   |   105077856 |
|     15   |      57   |    88736847 |
|     16   |      44   |    76993531 |
|     17   |      30   |    68241346 |
|     18   |      28   |    44391277 |
|     19   |       6   |      315311 |
+----------+-----------+-------------+
|   TOTAL  |    1495   |  2773948172 |
+----------+-----------+-------------+

Unfortunately, the contigs are still not ordered or oriented:

Number of contigs in orderings: 0               (0% of all contigs in clusters, 0% of all contigs)
Length of contigs in orderings: 0       (0% of all length in clusters, 0% of all sequence length)
Number of contigs in trunks:    0               (-nan% of contigs in orderings)
Length of contigs in trunks:    0       (-nan% of length in orderings)

Fraction of contigs in orderings with high orientation quality: 0 (-nan%), with length 0 (-nan%)
Fraction of contigs in trunks    with high orientation quality: 0 (-nan%), with length 0 (-nan%)

I also tried OVERWRITE_CLMS = 1, without success. Could this be caused by the files below, which were created outside the output folder?

-rw-r--r--  1 1032814217 root  24K Jan  2 03:39 QMg_NbQ4P_RN.fasta.counts_GATC.txt
-rw-r--r--  1 1032814217 root  17K Jan  2 04:13 QMg_NbQ4P_RN.fasta.names
-rw-r--r--  1 1032814217 root  102 Jan  2 05:46 heatmap.chrom_breaks.txt
-rw-r--r--  1 1032814217 root    6 Jan  2 05:46 heatmap.txt

Thank you in advance,

Michal

@baozg

baozg commented Apr 29, 2019

@mictadlo I have the same trouble with ordering. Did you solve it?

@jazberna1

Hi,

I had the same issue: no contig ordering at all. I then found that my SAM file was not sorted by read name.

Jorge
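This is easy to check before rerunning LACHESIS. As a minimal sketch (the helper name `check_name_grouped` is hypothetical, and it is an assumption, not something settled in this thread, that LACHESIS only needs all records of a read to be adjacent rather than strictly name-ordered), the awk function below reads SAM text and fails if any read name (QNAME, column 1) reappears after a different one. That cannot happen in name-sorted or name-collated output.

```shell
# Hypothetical helper: succeed only if SAM records on stdin are grouped by
# read name (QNAME, column 1), i.e. every QNAME forms one contiguous run.
# Header lines starting with "@" are ignored.
check_name_grouped() {
  awk -F'\t' '
    /^@/ { next }
    $1 != prev {
      # A new run starts; if this QNAME was seen before, the input is not grouped.
      if ($1 in seen) {
        printf "QNAME %s reappears at line %d\n", $1, NR > "/dev/stderr"
        bad = 1
        exit
      }
      seen[$1] = 1
      prev = $1
    }
    END { exit bad + 0 }
  '
}
```

For example, `samtools view [aligned.bam] | check_name_grouped`. The function keeps every read name in memory, so on a full Hi-C BAM it is cheaper to test a slice, e.g. by inserting `head -n 1000000` before it in the pipe. If the check fails, `samtools sort -n -o [namesorted.bam] [aligned.bam]` writes a name-sorted copy.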


4 participants