Playing around with the 1D PG-SGD parameters
subwaystation committed Feb 1, 2024
1 parent a19163e commit 360edff
Showing 6 changed files with 70 additions and 3 deletions.
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -20,13 +20,13 @@
# -- Project information -----------------------------------------------------

project = u'odgi'
-copyright = '2020-2024, *Guarracino A., *Heumos S., Nahnsen S., Prins P., Garrison E. Revision v0.8.4-d7ef5c6b'
+copyright = '2020-2024, *Guarracino A., *Heumos S., Nahnsen S., Prins P., Garrison E. Revision v0.8.4-a19163ea'
author = u'*Andrea Guarracino, *Simon Heumos, Sven Nahnsen, Pjotr Prins, Erik Garrison'

# The short X.Y version
version = 'v0.8.4'
# The full version, including alpha/beta/rc tags
-release = 'd7ef5c6b'
+release = 'a19163ea'


# -- General configuration ---------------------------------------------------
Binary file added docs/img/DRB1-3123_sorted.U1000.png
Binary file added docs/img/DRB1-3123_sorted.j10000.png
Binary file added docs/img/DRB1-3123_sorted.x2.png
2 changes: 1 addition & 1 deletion docs/rst/tutorials/exploratory_analysis.rst
@@ -76,7 +76,7 @@ Color with respect to the node position

This is a linearized visualization, but the pangenome graphs are not linear when the embedded genomes present structural
variation. However, a graph can be optimized for being better visualized in 1-Dimension by sorting its nodes properly
-(see the :ref:`sorting-layouting` tutorial for more information).
+(see the :ref:`sort-layout` tutorial for more information).

To color the bars with respect to the node position in each path, execute:

67 changes: 67 additions & 0 deletions docs/rst/tutorials/sort_layout.rst
@@ -187,6 +187,73 @@ This prints to stdout:
Compared to before, these metrics show that the goodness of the sorting of the graph improved significantly.

--------------------------------------------
Playing around with the 1D PG-SGD parameters
--------------------------------------------

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What happens if the maximum number of iterations is very low?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    odgi sort -i DRB1-3123_unsorted.og --threads 2 -P -Y -x 2 -o DRB1-3123_sorted.x2.og
    odgi viz -i DRB1-3123_sorted.x2.og -o DRB1-3123_sorted.x2.png

.. image:: /img/DRB1-3123_sorted.x2.png

The graph appears very complex and is hardly human-readable. That's because, with ``-x 2``, the total number of node
position updates was only two times the total number of path steps, instead of one hundred times the total number of
path steps, which is the current default. For very complex graphs, one may even have to increase the maximum number of
iterations beyond the default.
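
As a rough back-of-the-envelope check, the total number of node position updates can be estimated as the maximum number
of iterations times the total number of path steps. The sketch below assumes a default of 100 iterations, consistent with
the text above; the helper ``total_term_updates`` and the path step count are hypothetical and not taken from the
DRB1-3123 graph.

.. code-block:: python

    def total_term_updates(iter_max, total_path_steps):
        """Rough number of node position updates in a 1D PG-SGD run (hypothetical helper).

        Assumption: one iteration performs about as many updates as there are path steps.
        """
        return iter_max * total_path_steps


    total_path_steps = 20_000  # made-up value, for illustration only

    low = total_term_updates(2, total_path_steps)        # like: odgi sort ... -x 2
    default = total_term_updates(100, total_path_steps)  # assumed default of 100 iterations
    print(low, default, default // low)  # → 40000 2000000 50

Whatever the size of the graph, ``-x 2`` thus leaves only 2% of the default update budget.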

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What happens if the minimum number of term updates is very high?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    odgi sort -i DRB1-3123_unsorted.og --threads 2 -P -Y -U 1000 -o DRB1-3123_sorted.U1000.og
    odgi viz -i DRB1-3123_sorted.U1000.og -o DRB1-3123_sorted.U1000.png

.. image:: /img/DRB1-3123_sorted.U1000.png

The graph has lost its complexity and is now linear. Compared to the 1D visualization obtained with the default
parameters, it is hard to spot any differences, so let's take a look at the metrics:

.. code-block:: bash

    odgi stats -i DRB1-3123_sorted.U1000.og -s -d -l -g

This prints to stdout:

.. code-block:: bash

    #mean_links_length
    path in_node_space in_nucleotide_space num_links_considered num_gap_links_not_penalized
    all_paths 1.00361 8.30677 21870 15195
    #sum_of_path_node_distances
    path in_node_space in_nucleotide_space nodes nucleotides num_penalties num_penalties_different_orientation
    all_paths 3.23238 3.73489 21882 163416 3750 1

The metrics actually improved compared to those obtained with the default parameters. However, the runtime increased
from under 1 second to about 30 seconds, so this parameter should be used with care: on very large graphs, such an
increase in runtime would be a huge waste compared to the modest gain in linearity.
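
A similar estimate hints at why ``-U 1000`` is so expensive. This sketch assumes that ``-U`` is expressed as a multiple
of the number of nodes in the graph and that, without it, one iteration performs about one update per path step; the
helper ``updates_per_iteration`` and the node and step counts are hypothetical.

.. code-block:: python

    def updates_per_iteration(num_nodes, total_path_steps, min_term_updates_nodes=None):
        """Minimum number of term updates per 1D PG-SGD iteration (hypothetical helper).

        Assumptions: without -U, about one update per path step;
        with -U N, at least N times the number of nodes.
        """
        if min_term_updates_nodes is None:
            return total_path_steps
        return min_term_updates_nodes * num_nodes


    num_nodes, total_path_steps = 3_000, 20_000  # made-up values, for illustration only

    default = updates_per_iteration(num_nodes, total_path_steps)
    high_u = updates_per_iteration(num_nodes, total_path_steps, min_term_updates_nodes=1000)  # like -U 1000
    print(default, high_u, high_u // default)  # → 20000 3000000 150

The resulting ratio depends entirely on the node and step counts of the graph, so it is an order-of-magnitude hint
rather than a runtime prediction.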

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What happens if the threshold of the maximum distance of two nodes is very high?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    odgi sort -i DRB1-3123_unsorted.og --threads 2 -P -Y -j 10000 -o DRB1-3123_sorted.j10000.og
    odgi viz -i DRB1-3123_sorted.j10000.og -o DRB1-3123_sorted.j10000.png

.. image:: /img/DRB1-3123_sorted.j10000.png

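The graph appears very complex and is hardly human-readable. That's because the iterations stop as soon as the largest
node displacement within an iteration falls below the ``-j`` threshold, here 10000 (approximately in nucleotides). Each
displacement moves two nodes so that their distance in the layout approaches their expected distance, i.e. the
nucleotide distance between two randomly chosen steps of a path. With such a high threshold, this happens very early.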
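
The effect of such a stopping threshold can be reproduced with a toy 1D path-guided SGD. This is a minimal sketch, not
odgi's implementation: the function ``toy_path_sgd``, the learning-rate schedule, the ``1/d^2`` weights and the tiny
two-path graph are all assumptions made for illustration. Each update moves two nodes of a path towards their
nucleotide distance, and ``delta`` plays the role of ``-j``: once the largest displacement of an iteration is at most
``delta``, the loop stops.

.. code-block:: python

    import random


    def toy_path_sgd(paths, node_len, iter_max=100, delta=0.0, eps=0.01, seed=1):
        """Toy 1D path-guided SGD layout (hypothetical, not odgi's implementation).

        paths: list of paths, each a list of at least two node ids (no node repeated)
        node_len: dict mapping node id -> node length in bp
        Returns (positions, number of iterations run).
        """
        rng = random.Random(seed)
        total_len = sum(node_len.values())
        # Random initial 1D coordinate for every node.
        x = {n: rng.uniform(0, total_len) for n in node_len}
        # Nucleotide offset of every step along its path.
        offsets = []
        for path in paths:
            pos, offs = 0, []
            for n in path:
                offs.append(pos)
                pos += node_len[n]
            offsets.append(offs)
        # Assumption: about one term update per path step and iteration.
        updates_per_iter = sum(len(p) for p in paths)
        # Assumed schedule: learning rate decays exponentially from eta_max to eps.
        eta_max = float(max(offs[-1] for offs in offsets)) ** 2
        for it in range(iter_max):
            frac = it / (iter_max - 1) if iter_max > 1 else 1.0
            eta = eta_max * (eps / eta_max) ** frac
            max_disp = 0.0
            for _ in range(updates_per_iter):
                k = rng.randrange(len(paths))
                i, j = rng.sample(range(len(paths[k])), 2)  # two distinct steps
                a, b = paths[k][i], paths[k][j]
                d = abs(offsets[k][i] - offsets[k][j])  # expected distance in bp
                mu = min(1.0, eta / d ** 2)  # step size with weight 1/d^2, capped at 1
                diff = x[a] - x[b]
                dist = abs(diff)
                direction = diff / dist if dist > 0 else 1.0
                r = mu * (dist - d) / 2  # move both nodes by half the scaled error
                x[a] -= r * direction
                x[b] += r * direction
                max_disp = max(max_disp, abs(r))
            # Stopping criterion analogous to -j: largest displacement within threshold.
            if max_disp <= delta:
                return x, it + 1
        return x, iter_max


    # Tiny hypothetical graph: two paths that differ in one node (a bubble).
    node_len = {1: 5, 2: 3, 3: 4, 4: 6, 5: 2}
    paths = [[1, 2, 4, 5], [1, 3, 4, 5]]

    _, iters_loose = toy_path_sgd(paths, node_len, delta=10_000)  # like a very high -j
    _, iters_strict = toy_path_sgd(paths, node_len, delta=0.0)
    print(iters_loose)  # → 1
    print(iters_strict)

With a threshold far above any displacement this small graph can produce, the toy stops after a single iteration, long
before its learning rate has decayed.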

=========================================================
1D reference-guided grooming and reference-guided sorting
=========================================================