Syngraph

A toolkit for evolutionary analyses of linkage groups

Dependencies

Best addressed via conda

$ conda install pandas docopt ete3 pygraphviz matplotlib tqdm networkx=2.4 numpy=1.20.3=py38h9894fe3_0 python=3.8 more-itertools

Usage

Usage: syngraph <module> [<args>...] [-D -V -h]

  [Modules]
    build               Build graph from orthology data (e.g. BUSCO *.full_table.tsv)
    infer               Model rearrangements over a tree
    tabulate            Get table of extant and ancestral genomes
    viz                 Visualise graph/data [Under development]
    
  [Options]
    -h, --help          Show this screen.
    -D, --debug         Print debug information [TBI]
    -v, --version       Show version

  [Dependencies] 
    ---------------------------------------------------------------------------------------------
    | $ conda install -c conda-forge networkx=2.4 pandas docopt tqdm ete3 pygraphviz matplotlib |
    ---------------------------------------------------------------------------------------------

Build a syngraph from BUSCO data, allowing for missingness

syngraph build -d directory_of_tsv_files -m -o test

Model fissions and fusions over a tree, recording rearrangements using taxon_1 as a reference

syngraph infer -g test.pickle -t newick.txt -r 2 -s taxon_1 -o test

Model translocations, fissions and fusions over a tree

syngraph infer -g test.pickle -t newick.txt -r 3 -s taxon_1 -o test

Tabulate extant and inferred genomes

syngraph tabulate -g test.with_ancestors.pickle -o test

Input data

Input data should only contain markers from chromosome-scale sequences, because unscaffolded contigs will cause excess fission events to be inferred.

If using BUSCO data, tsv files should be named My_taxon.*.tsv, where My_taxon is also a leaf in the newick tree. Each row should contain the BUSCO ID, sequence, start position, and end position. These can be extracted from the *full_table.tsv file generated by BUSCO (columns Busco_id, Sequence, Gene_Start, Gene_End). E.g.:

0at7088 HG995313.1      5723272 5863707
1at7088 HG995286.1      19966914        20084934
2at7088 HG995296.1      11128843        11215510
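The four columns above can be pulled out of a BUSCO table with a few lines of Python. This is a minimal sketch, not part of syngraph: the function name `busco_to_markers` is hypothetical, and it assumes the common full_table.tsv layout in which the tab-separated columns begin with Busco id, Status, Sequence, Gene Start, Gene End and comment lines start with `#`. Keeping only rows with status `Complete` is also an assumption; adjust `keep_status` to suit your data.

```python
def busco_to_markers(full_table_text, keep_status=("Complete",)):
    """Return (busco_id, sequence, start, end) tuples from BUSCO full_table text.

    Assumes tab-separated columns beginning with:
    Busco id, Status, Sequence, Gene Start, Gene End.
    """
    rows = []
    for line in full_table_text.splitlines():
        # Skip blank lines and the commented header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Missing BUSCOs have no coordinates, so they have fewer than five fields.
        if len(fields) < 5 or fields[1] not in keep_status:
            continue
        busco_id, _status, sequence, start, end = fields[:5]
        rows.append((busco_id, sequence, start, end))
    return rows


def format_markers(rows):
    """Format marker tuples as tab-separated lines, one marker per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


if __name__ == "__main__":
    # Illustrative input only: status, strand, score, and length values are made up.
    example = (
        "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n"
        "0at7088\tComplete\tHG995313.1\t5723272\t5863707\t+\t1000.0\t500\n"
        "1at7088\tMissing\n"
        "2at7088\tComplete\tHG995296.1\t11128843\t11215510\t-\t900.0\t400\n"
    )
    print(format_markers(busco_to_markers(example)), end="")
```

The output of `format_markers` can be written to a file named after the taxon, e.g. My_taxon.busco.tsv, so that the file name prefix matches the corresponding leaf in the tree.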

Inferring rearrangements

After building a syngraph, inter-chromosomal rearrangements can be inferred with syngraph infer. This requires a newick tree relating the taxa in the analysis. Syngraph uses branch lengths only to decide how the tree is traversed, so approximate branch lengths are fine.
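Because each tsv file name prefix must match a leaf in the newick tree, it can be worth checking this before running syngraph infer. The sketch below is not part of syngraph; the function names are hypothetical. It assumes the taxon name is everything before the first `.` in the file name, and it extracts leaf names with a simple regular expression that does not handle quoted newick labels.

```python
import re
from pathlib import Path


def newick_leaves(newick):
    """Return the set of leaf names in a newick string.

    A leaf label directly follows '(' or ','; internal node labels follow ')'
    and are therefore not matched. Quoted labels are not supported.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def taxa_from_filenames(filenames):
    """Return taxon names, assumed to be the file name up to the first '.'."""
    return {Path(name).name.split(".", 1)[0] for name in filenames}


def missing_from_tree(filenames, newick):
    """Return the sorted taxon names that have a tsv file but no matching leaf."""
    return sorted(taxa_from_filenames(filenames) - newick_leaves(newick))


if __name__ == "__main__":
    # Illustrative tree and file names only.
    tree = "((taxon_1:1,taxon_2:1)n1:1,taxon_3:2);"
    files = ["data/taxon_1.busco.tsv", "data/taxon_4.busco.tsv"]
    print(missing_from_tree(files, tree))
```

An empty list means every tsv file has a matching leaf. If ete3 is installed (it is one of the listed dependencies), its tree parser would be a more robust way to obtain leaf names than the regular expression used here.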

The -r option sets the inference mode: 2 for fissions and fusions, or 3 for fissions, fusions, and reciprocal translocations (mode 3 is currently experimental).

In syngraph infer, the -m option sets the minimum number of markers that can be involved in a rearrangement. Setting -m 1 means that a rearrangement is reported when a single marker 'moves' between chromosomes. By contrast, setting a high value, e.g. -m 100, means that chromosome fissions or sets of complex rearrangements involving fewer markers will be missed. A reasonable starting point is -m 5, although this may need to be adjusted given the density of markers, size of chromosomes, and accuracy of marker orthology.

The most useful output file is *.rearrangements.tsv, which lists the rearrangements inferred over the tree. The branch on which a rearrangement happened is denoted by its parent and child nodes. The event is reported as fission, fusion, or translocation. Multiplicity is the number of events; this is normally 1, but can be higher if a chromosome has fissioned into multiple fragments. The last column, ref_seqs, shows which chromosomes are involved in the rearrangement with respect to an extant genome, an inferred ancestral genome, or a predefined list of marker --> chromosome relationships.

#parent child   event   multiplicity    ref_seqs
n7      Brenthis_ino    fusion  1       [['n5_2', 'n5_17'], ['n5_20']]
n5      n7      fusion  1       [['n5_6'], ['n5_19']]
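A table like the one above can be loaded for downstream summaries with the Python standard library. This is a minimal sketch under stated assumptions, not syngraph's own code: the function names are hypothetical, the columns are assumed to be tab-separated in the order shown in the header line, and ref_seqs is assumed to be written as a Python-style nested list, as in the example rows.

```python
import ast
from collections import Counter


def read_rearrangements(text):
    """Parse the text of a *.rearrangements.tsv file into a list of dicts."""
    events = []
    for line in text.splitlines():
        # Skip blank lines and the '#parent ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        parent, child, event, multiplicity, ref_seqs = line.split("\t")[:5]
        events.append({
            "parent": parent,
            "child": child,
            "event": event,
            "multiplicity": int(multiplicity),
            # literal_eval safely turns the printed nested list into real lists.
            "ref_seqs": ast.literal_eval(ref_seqs),
        })
    return events


def events_per_branch(events):
    """Sum multiplicity for each (parent, child, event) combination."""
    counts = Counter()
    for e in events:
        counts[(e["parent"], e["child"], e["event"])] += e["multiplicity"]
    return counts


if __name__ == "__main__":
    example = (
        "#parent\tchild\tevent\tmultiplicity\tref_seqs\n"
        "n7\tBrenthis_ino\tfusion\t1\t[['n5_2', 'n5_17'], ['n5_20']]\n"
        "n5\tn7\tfusion\t1\t[['n5_6'], ['n5_19']]\n"
    )
    for branch, n in events_per_branch(read_rearrangements(example)).items():
        print(branch, n)
```

Summing multiplicity rather than counting rows means that a chromosome fissioning into several fragments on one branch is counted as several events.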

Help

Syngraph is still under active development. Please open an issue if you have any questions about running the software or interpreting your results.

Cite

If you use syngraph in your research then please cite this preprint.