tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.
Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.
There are two fundamental verbs of data tidying:
- `gather()` takes multiple columns and gathers them into key-value pairs: it makes "wide" data longer.
- `spread()` takes two columns (key and value) and spreads them into multiple columns: it makes "long" data wider.
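Here is a minimal sketch of the two verbs round-tripping a small data frame. The `name`, `y2013` and `y2014` columns and their values are made-up illustration data. `gather()` collapses the year columns into `year`/`score` key-value pairs, and `spread()` turns them back into columns.

```r
library(tidyr)

# Hypothetical "wide" data: one row per person, one column per year
wide <- data.frame(
  name  = c("Alice", "Bob"),
  y2013 = c(5, 7),
  y2014 = c(6, 9),
  stringsAsFactors = FALSE
)

# gather(data, key, value, ...): stack y2013 and y2014 into key-value pairs
long <- gather(wide, year, score, y2013, y2014)
long

# spread(data, key, value): one new column per distinct value of `year`
wide_again <- spread(long, year, score)
wide_again
```

`long` has one row per person-year combination, so the 2 x 2 grid of scores becomes 4 rows. `spread()` reverses this, sorting rows by the remaining `name` column.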
These verbs have a number of synonyms:
| tidyr        | gather  | spread |
|--------------|---------|--------|
| reshape(2)   | melt    | cast   |
| spreadsheets | unpivot | pivot  |
| databases    | fold    | unfold |
tidyr also provides `separate()` and `extract()` functions, which make it easier to pull apart a column that represents multiple variables. The complement to `separate()` is `unite()`.
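The sketch below uses a hypothetical `date` column that packs two variables, year and month, into one string. `separate()` splits it on a separator. `extract()` pulls the same pieces out with regular-expression capture groups. `unite()` pastes them back together. The column names and values are illustrative assumptions.

```r
library(tidyr)

# Hypothetical data: `date` holds two variables, year and month
df <- data.frame(
  id   = 1:3,
  date = c("2014-01", "2014-02", "2015-07"),
  stringsAsFactors = FALSE
)

# separate(): split `date` on "-" into two new columns
parts <- separate(df, date, into = c("year", "month"), sep = "-")

# extract(): same result, but each capture group becomes one column
extracted <- extract(df, date, into = c("year", "month"),
                     regex = "([0-9]{4})-([0-9]{2})")

# unite(): the complement of separate(), joining the columns back with "-"
united <- unite(parts, date, year, month, sep = "-")
united
```

`separate()` is convenient when a fixed delimiter exists. `extract()` suits messier strings where a regular expression describes the pieces more reliably.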
tidyr is available from CRAN. Install it with:

```r
install.packages("tidyr")
```
The development version can be installed using:

```r
# install.packages("devtools")
devtools::install_github("hadley/tidyr")
```
To get started, read the tidy data vignette (`vignette("tidy-data")`) and check out the demos (`demo(package = "tidyr")`).
Note that tidyr is designed for use in conjunction with dplyr, so you should always load both:

```r
library(tidyr)
library(dplyr)
```
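tidyr and dplyr both provide the `%>%` pipe, so tidying and summarising can be chained in a single pipeline. Here is a minimal sketch with hypothetical data: the columns and values are made up, and `group_by()`/`summarise()` are dplyr's grouping and aggregation verbs.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: one row per person, one column per year
wide <- data.frame(
  name  = c("Alice", "Bob"),
  y2013 = c(5, 7),
  y2014 = c(6, 9),
  stringsAsFactors = FALSE
)

# Tidy with tidyr, then aggregate with dplyr, in one pipeline
totals <- wide %>%
  gather(year, score, y2013, y2014) %>%  # wide -> long: one score per row
  group_by(name) %>%                     # one group per person
  summarise(total = sum(score))          # sum each person's scores
totals
```

Gathering first puts every score into a single column, which is what lets `summarise()` aggregate over it directly.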
If you'd like to learn more about these data reshaping operators, I'd recommend the following papers: