forked from jokergoo/circlize_book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path08-genomics-introduction.Rmd
64 lines (49 loc) · 2.55 KB
/
08-genomics-introduction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
```{r, echo = FALSE}
library(circlize)
```
# (PART) Applications in Genomics {-}
# Introduction {#genomic-introduction}
Circular visualization is popular in Genomics and related omics fields. It is
efficient in revealing associations in high dimensional genomic data. In genomic
plots, categories are usually chromosomes and data on x axes are genomic
positions, but it can also be any kind of general genomic categories.
To make is easy for Genomics analysis, **circlize** package particularly
provides functions which focus on genomic plots. These functions are
synonymous to the basic graphic functions but expect special format of input
data:
- `circos.genomicTrack()`: create a new track and add graphics.
- `circos.genomicPoints()`: low-level function, add points.
- `circos.genomicLines()`: low-level function, add lines or segments.
- `circos.genomicRect()`: low-level function, add rectangles.
- `circos.genomicText()`: low-level function, add text.
- `circos.genomicLink()`: add links.
The genomic functions are implemented by basic circlize functions (e.g.
`circos.track()`, `circos.points()`), thus, the use of genomic functions can
be mixed with the basic circlize functions.
## Input data {#input-data}
Genomic data is usually stored as a table where the first three columns
define the genomic regions and following columns are values associated with
the corresponding regions. Each genomic region is composed by three elements:
genomic category (in most case, it is the chromosome), start position on the
genomic category and the end position. Such data structure is known as
[_BED_](https://genome.ucsc.edu/FAQ/FAQformat#format1)
format and is broadly used in genomic research.
**circlize** provides a simple function `generateRandomBed()` which generates
random genomic data. Positions are uniformly generated from human genome and
the number of regions on chromosomes approximately proportional to the length
of chromosomes. In the function, `nr` and `nc` control the number of rows and
numeric columns that users need. Please note `nr` are not exactly the same as
the number of rows which are returned by the function. `fun` argument is a
self-defined function to generate random values.
```{r}
set.seed(999)
bed = generateRandomBed()
head(bed)
bed = generateRandomBed(nr = 200, nc = 4)
nrow(bed)
bed = generateRandomBed(nc = 2, fun = function(k) sample(letters, k, replace = TRUE))
head(bed)
```
All genomic functions in **circlize** expect input variable as a data frame
which contains genomic data or a list of data frames which contains genomic
data in different conditions.