-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
51 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# Paths and Accessions | ||
|
||
Graphs have particular challenges when it comes to referring to regions within one. For example, take the following | ||
chromosome and a simple change to it: | ||
|
||
```mermaid | ||
flowchart LR | ||
A[ATCG] --> B[AAA] --> C[TTT] | ||
A --> D[A] --> C | ||
``` | ||
|
||
If we specified an insertion at position 6, would it refer to `ATCGA_A_` or `ATCGA_T_`? Paths offer a mechanism | ||
to provide a linear sequence through a graph. Paths are an ordered series of edges that provide a mechanism for | ||
addressing the space within a graph. For example, we could assign `Chr1` to the graph in the following manner: | ||
|
||
```mermaid | ||
flowchart LR | ||
A[ATCG] -->|chr1| B[AAA] -->|chr1| C[TTT] | ||
A --> D[A] --> C | ||
``` | ||
|
||
and thus provide a reference frame for modifications. On fasta import, each header automatically becomes a defined path, | ||
so `Chr1` and so on will be automatically assigned. | ||
|
||
## Accessions | ||
|
||
When inserting changes to a genome, we may want to refer to the modifications. For example, in the above change, we have | ||
no way to refer to the `A` node. Accessions allow us to provide an addressable subspace within the graph. This is | ||
especially useful for a stable reference to a sub-sequence such as a gene of interest or a landing pad, as iterative | ||
changes to the genome may alter its relative position. | ||
|
||
To illustrate this, suppose we are inserting a small landing pad into a genome. | ||
|
||
```mermaid | ||
flowchart LR | ||
A[ATCG] -->|chr1| B[AAA] -->|chr1| C[TTT] | ||
A -->|landing-pad| D[CCC] -->|landing-pad| C | ||
``` | ||
|
||
Now, suppose in a downstream sample we insert a sequence before the landing pad. | ||
|
||
```mermaid | ||
flowchart LR | ||
E[GGG] -->|chr1| A | ||
A[ATCG] -->|chr1| B[AAA] -->|chr1| C[TTT] | ||
A -->|landing-pad| D[CCC] -->|landing-pad| C | ||
``` | ||
|
||
While chr1's length has changed, `landing-pad`'s coordinates have not. This permits operations to simply refer to | ||
`landing-pad` instead of knowing the chromosomal coordinates of the landing pad. |