forked from SurgicalInformatics/healthyr_book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
14_version_control.Rmd
133 lines (92 loc) · 7.59 KB
/
14_version_control.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
# Version control{#chap14-h1}
\index{version control@\textbf{version control}}
> Always be a first rate version of yourself and not a second rate version of someone else.
> Judy Garland
Version control is essential for keeping track of data analysis projects, as well as collaborating.
It allows backup of scripts and collaboration on complex projects.
RStudio works really well with Git, an open source distributed version control system, and GitHub, a web-based Git repository hosting service.
\index{version control@\textbf{version control}!Git and GitHub}
Git is a piece of software which runs locally.
It may need to be installed first.
It is important to highlight the difference between Git (local version control software) and GitHub (a web-based repository store).
## Setup Git on RStudio and associate with GitHub
In RStudio, go to Tools -> Global Options and select the **Git/SVN** tab.
Ensure the path to the Git executable is correct.
This is particularly important in Windows where it may not default correctly (e.g., `C:/Program Files (x86)/Git/bin/git.exe`).
## Create an SSH RSA key and add to your GitHub account
\index{version control@\textbf{version control}!SSH RSA key}
In the **Git/SVN** tab, hit *Create RSA Key* (Figure \@ref(fig:chap14-fig-ssh)A).
In the window that appears, hit the *Create* button (Figure \@ref(fig:chap14-fig-ssh)B).
Close this window.
Click, *View public key* (Figure \@ref(fig:chap14-fig-ssh)C), and copy the displayed public key (Figure \@ref(fig:chap14-fig-ssh)D).
If you haven’t already, create a GitHub account.
On the GitHub website, open the account settings tab and click the SSH keys tab (Figure \@ref(fig:chap14-fig-github)A).
Click *Add SSH key* and paste in the public key you have copied from RStudio Figure \@ref(fig:chap14-fig-github)B).
```{r chap14-fig-ssh, echo = FALSE, fig.cap="Creating an SSH key in RStudio's Global Options."}
knitr::include_graphics("images/chapter14/1.png", auto_pdf = TRUE)
```
```{r chap14-fig-github, echo = FALSE, fig.cap="Adding your RStudio SSH key to your GitHub account."}
knitr::include_graphics("images/chapter14/2.png", auto_pdf = TRUE)
```
## Create a project in RStudio and commit a file
\index{version control@\textbf{version control}!Commit}
Next, return to RStudio and configure Git via the **Terminal** (Figure \@ref(fig:chap14-fig-globalsettings)A)).
Remember Git is a piece of software running on your own computer.
This is distinct to GitHub, which is the repository website.
We will now create a new project which we want to backup to GitHub.
In RStudio, click *New project* as normal (Figure \@ref(fig:chap14-fig-globalsettings)B).
Click *New Directory*.
Name the project and check *Create a git repository*.
Now in RStudio, create a new script which you will add to your repository.
After saving your new script (e.g., test.R), it should appear in the **Git** tab beside **Environment**.
Tick the file you wish to add, and the status should turn to a green ‘A’ (Figure \@ref(fig:chap14-fig-globalsettings)C).
Now click *Commit* and enter an identifying message in Commit message (Figure \@ref(fig:chap14-fig-globalsettings)D).
It makes sense to do this prior to linking the project and the GitHub repository, otherwise you'll have nothing to push to GitHub.
You have now committed the current version of this file to a Git repository on your computer/server.
```{r chap14-fig-globalsettings, echo = FALSE, fig.cap="Configuring your GitHub account via RStudio, creating a new project, commiting a script and pushing it to GitHub."}
knitr::include_graphics("images/chapter14/4.png", auto_pdf = TRUE)
```
## Create a new repository on GitHub and link to RStudio project
\index{version control@\textbf{version control}!repository}
Now you may want to push the contents of this commit to GitHub, so it is also backed-up off site and available to collaborators.
As always, you must exercise caution when working with sensitive data.
Take steps to stop yourself from accidentally pushing whole datasets to GitHub.^[It's fine to push some data to GitHub, especially if you want to make it publicly available, but you should do so consciously, not accidentally.]
You only want to push R code to GitHub, not the (sensitive) data.
When you see a dataset appear in the Git tab of your RStudio, select it, then click on More, and then Ignore.
This means the file does not get included in your Git repository, and it does not get pushed to GitHub.
GitHub is not for backing up sensitive datasets, it's for backing up R code.
And make sure your R code does not include passwords or access tokens.
In GitHub, create a *New repository*, called here myproject (Figure \@ref(fig:chap14-fig-newrepo)A).
You will now see the *Quick setup* page on GitHub.
Copy the code below *push an existing repository from the command line* (Figure \@ref(fig:chap14-fig-newrepo)B).
```{r chap14-fig-newrepo, echo = FALSE, fig.cap="Create a new repository (repo) on GitHub."}
knitr::include_graphics("images/chapter14/5.png", auto_pdf = TRUE)
```
Back in RStudio, paste the code into the **Terminal**.
Add your GitHub username and password (important!) (Figure \@ref(fig:chap14-fig-link)A).
You have now pushed your commit to GitHub, and should be able to see your files in your GitHub account.
\index{version control@\textbf{version control}!pull and push}
The **Pull** and **Push** buttons in RStudio will now also work (Figure \@ref(fig:chap14-fig-link)B).
To avoid always having to enter your password, copy the SSH address from GitHub and enter the code shown in Figure \@ref(fig:chap14-fig-link)C and D.
Check that the **Pull** and **Push** buttons work as expected (Figure \@ref(fig:chap14-fig-link)E).
Remember, after each Commit, you have to Push to GitHub, this doesn't happen automatically.
```{r chap14-fig-link, echo = FALSE, fig.cap="Linking an RStudio project with a GitHub repository."}
knitr::include_graphics("images/chapter14/6.png", auto_pdf = TRUE)
```
## Clone an existing GitHub project to new RStudio project
\index{version control@\textbf{version control}!clone}
An alternative situation is where a project already exists on GitHub and you want to copy it to an RStudio project. In version control world, this is called **cloning**.
In RStudio, click *New project* as normal. Click *Version Control* and select *Git*.
In *Clone Git Repository*, enter the GitHub repository URL as per Figure \@ref(fig:chap14-fig-clone)C.
Change the project directory name if necessary.
As above, to avoid repeatedly having to enter passwords, follow the steps in Figure \@ref(fig:chap14-fig-clone)D and E.
```{r chap14-fig-clone, echo = FALSE, fig.cap="Clone a GitHub repository to an RStudio project."}
knitr::include_graphics("images/chapter14/7.png", auto_pdf = TRUE)
```
## Summary
If your project is worth doing, then it is worth backing up!
This means you should use version control for every single project you are involved in.
You will quickly be surprised at how many times you wish to go back and rescue some deleted code, or to restore an accidentally deleted file.
It becomes even more important when collaborating and many individuals may be editing the same file.
As with the previous chapter, this is an area which data scientists have discovered much later than computer scientists.
Get it up and running in your own workflow and you will reap the rewards in the future.