forked from GeoScripting-WUR/RProjectManagement
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.Rmd
450 lines (277 loc) · 41.1 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
---
pagetitle: "Linux & version control"
author: "Loïc Dutrieux, Jan Verbesselt, Johannes Eberenz, Dainius Masiliūnas"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output:
rmdformats::html_clean:
highlight: zenburn
---
```{css, echo=FALSE}
@import url("https://netdna.bootstrapcdn.com/bootswatch/3.0.0/simplex/bootstrap.min.css");
.main-container {max-width: none;}
div.figcaption {display: none;}
pre {color: inherit; background-color: inherit;}
code[class^="sourceCode"]::before {
content: attr(class);
display: block;
text-align: right;
font-size: 70%;
}
code[class^="sourceCode r"]::before { content: "R Source";}
code[class^="sourceCode python"]::before { content: "Python Source"; }
code[class^="sourceCode bash"]::before { content: "Bash Source"; }
```
<font size="6">[WUR Geoscripting](https://geoscripting-wur.github.io/)</font> <img src="https://www.wur.nl/upload/854757ab-168f-46d7-b415-f8b501eebaa5_WUR_RGB_standard_2021-site.svg" alt="WUR logo" style="height: 35px; margin:inherit;"/>
# Course setup
```{block type="alert alert-info"}
**Important**: The Geoscripting course is a Master-level course given in Wageningen University. This set of documents that you are reading provide the theoretical material from the course for use both in the course itself, as well as for people who are following (parts of) the course externally or are in general interested in the topics that we cover. As such, these documents aim to be generic for all of the user groups above.
If you are a student following the course at Wageningen University (WUR), **please read** the information in the course guide in Teams and on [Brightspace](https://brightspace.wur.nl). All course-specific information and exercises can be found there. Information in the course guide overrules any information written in these pages, so **please read it carefully** and **check it often**. You will also find all the information on deliverables and exercises there.
```
# Linux & version control
## Introduction
Welcome to the Geoscripting course! Today we will get familiar with Linux, which is an advanced environment optimised for scripting, and with version control software that helps you collaborate with one another and keep track of your file versions. These tools are very important, as we will use them throughout the course for all course activities, and they will continue to be very useful after the end of the course for all your scripting work. Additionally you will learn about project structure, and familiarize yourself with RStudio.
```{block type="alert alert-info"}
Throughout the whole course, we will be working in a Linux environment, and all of **the material has only been tested on (and assumes) a Linux environment**. Every WUR student will get access to a Linux virtual machine.
```
## Learning objectives
At the end of the tutorial, you should be able to:
* Know what Linux is & what you can do with it
* Get comfortable working within a Linux environment
* Explain why software licenses are important and what software license options there are
* Apply a software license to your own code
* Use version control to develop, maintain, and share your code with others
* Set up a project structure
* Get familiar with (relative) paths
* Submit an exercise using Git and GitLab
# Linux
*Linux* is a free and open-source operating system kernel. The kernel interacts with computer hardware and exposes its capabilities for your scripts! Together with a lot of small, handy programs, it forms an *operating system* called *GNU/Linux*. However, unlike e.g. Windows, there is not a single "GNU/Linux operating system". Rather, there is a [huge variety](https://en.wikipedia.org/wiki/List_of_Linux_distributions) of Linux distributions. Each Linux distribution provides the same kernel, but different programs and environments, suitable for different use cases.
For example, one distribution that is very handy for geo-information science work is OSGeo-Live, which is an Ubuntu-based *Linux distribution* that has a wide range of free and open-source GIS and Remote sensing tools preinstalled.
See [this website](http://live.osgeo.org/) for more information.
These tools are also available in other distributions, but they have to be installed manually. A general-use distribution such as *Ubuntu* itself, *openSUSE* or *Fedora* is more suitable for regular day-to-day tasks, since not having the unnecessary tools installed takes less space and makes it work faster. It is also easier to find help for them than for specialised distributions.
For the Geoscripting course, we have developed what is effectively our very own Linux distribution, with the use case of providing all of the tools necessary to finish the course. These tools are also very useful after the end of the course to continue data processing, for example for writing Master theses. Within our laboratory, we have several computers that are running this Geoscripting distribution, so that transferring over the work from one computer running it to another one would be as easy as possible, so you can continue working uninterrupted even after the end of the course. The Geoscripting distribution is nothing more than a set of scripts that install the necessary tools on top of what plain Ubuntu provides.
## Why use a Linux distribution?
A Linux environment makes it much easier to install and combine a variety of open-source software, such as Python modules and GDAL, compared to other operating systems like Windows or macOS. In addition, open-source scientific software is often developed primarily for Linux (since that's what most supercomputers and servers run!), and so it tends to be more stable and have more features on Linux. Lastly, Linux has a set of standards that allow programs to interoperate with each other, so that e.g. you can access GRASS GIS from R, QGIS from Python, GRASS GIS from QGIS, Python from R etc. All of this is managed and checked for quality so that you can always use the latest and greatest software without worrying about version mismatches and compatibility between software tools.
For the course, it also makes it possible to use the wide variety of tools that we will work with, all from a single supported environment. That way, we can be sure that the tools work the same way for all of the students, and that we also test the exercise submissions using the same versions of the tools to get the same output.
## Getting started on Linux
During the course we will work in a Linux environment. **See the [Linux system setup](../system_setup/index.html) page on how to install and run the Linux virtual machine on your own computer.** The page also explains how to run Linux from a USB stick in case you don't have enough space for a virtual machine.
```{block, type="alert alert-danger"}
**Notice**: Make sure you **read the page linked above** and have no problems logging into and using the VM. From here on out, we will try to work from within the VM exclusively.
```
```{block, type="alert alert-info"}
**In case you can't get the VM running successfully** (and **only** in that case, so hopefully you don't need to do this!), there is an alternative: we have the possibility of providing access to a SURF Research Cloud VM setup. [See this page for instructions on gaining access to the SURF Research Cloud VM](../Intro2Linux/surfsara_tutorial.html).
If you are a power user and want to install Linux on your own laptop directly to have it run at full performance, see also a [theoretical overview of running Linux on your own hardware](../Intro2Linux/installation.html).
**The VMs are strongly recommended**. If you go for installing Linux yourself, the systems need to be set up manually and we do not have the time and manpower to support every student with this.
```
Once you have everything ready, login into your Linux VM, try out RStudio/RKWard, and also open QGIS. Explore the environment a little to get used to it.
# Software licenses
One key advantage of Linux is that it is free and open-source software. While it is free as in free beer, that is, it can be used at no cost, more importantly it is free as in free speech: all of the source code of the kernel and the absolute majority of the applications is licensed under a free software license.
A *software license* is a legal text that describes how the software and its source code can be used by other people. Software licenses are grounded in the framework of copyright: the protection of authors' intellectual rights. A *free* software license is a software license that gives others the freedom to run, copy, read, modify, and distribute changes to the original software and its source code. This is in addition to an *open-source* license, which makes the source code available and redistributable, but [does not necessarily make the source code free](https://www.gnu.org/philosophy/open-source-misses-the-point.html). Both free an open-source licenses have their overseeing bodies: the [Free Software Foundation](https://www.gnu.org/philosophy/free-sw.html) for free software licenses, and the [Open Source Initiative](https://opensource.org/osd) for open-source licenses. When a software fits both definitions (they often overlap), it is referred to as Free and Open-Source Software (FOSS), or les often as Free, Libre and Open-Source Software (FLOSS).
There are many advantages to FOSS. One advantage is that it fosters collaboration: one person implementing a feature makes it available for all of the users in the world. This enables such a massive effort required to create GNU/Linux distributions based on volunteer work, without needing to rely on commercial licensing, advertisements, donations or spyware to finance the work. It also allows anyone to remove such undesired parts of any software component, therefore ensuring higher quality of the software. Thus, while FOSS projects initially start weaker than *proprietary* (non-free or closed-source) software, in the long run the collaboration potential brings it on par and even overtaking the propriatary counterparts. See for example QGIS, which is FOSS, vs the proprietary ArcGIS.
A software license defines what others can do with your code, therefore before starting to write any code, **it is vital to think about the license** you would like to release your code under. This is because if you do not define any license, the default copyright terms apply: even if you publish the source code publicly, nobody is allowed to copy, redistribute or modify the code, in fact nobody is even allowed to read it! As an author, you are free to choose any license, both proprietary and FOSS licenses (or in fact no license altogether), but a proprietary license restricts the freedoms of others and therefore diminishes chances that others would want to collaborate with you to improve the code in the future. In addition, do not confuse a *software license* with *commercial licensing*, i.e. the requirement to activate a license subscription to use
There are two types of FOSS licenses: copyleft and permissive. A *permissive* license is one that allows copying, modifying and redistributing the code with no serious restrictions (usually with a restriction that the original author be credited for the work). A *copyleft* license adds a restriction that any modified versions that are distributed must be under the same (or equivalent) license. This restriction restricts others from restricting the terms of the software license in the future, therefore keeping the source code free forever. In other words, it's following the philosophy that if we want to achieve the most freedom, we must restrict the freedom to restrict freedom!
Lastly, there is also an option to *dedicate software to the public domain*, which is not a license per se, but a waiver of copyright. Software in the public domain allows anyone to do anything with it without any restrictions, therefore it is radically permissive. There is no requirement to credit the original author, for example. Since some jurisdictions do not allow authors to waive copyright (including Germany, France and Italy), there are licenses such as [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/) that are aimed to make a work as free as possible by either dedicating it to the public domain, or if it is not possible, by giving it a permissive license.
How can you choose a software license in practice? There are [multiple](https://choosealicense.com/) [websites](https://www.gnu.org/licenses/license-recommendations.en.html) that give an overview of the most popular licenses that you can choose. Once you choose one, you need to follow the terms of the license about how to apply it. In most cases, it is sufficient to copy the terms of the license next to your source code and include it in your version control repository.
```{block, type="alert alert-success"}
> **Question 1**: If you wanted to contribute to a project that is licensed under the GNU General Public License v3 (copyleft), under which license(s) could you contribute? Which license would you choose in the end?
```
# Version control
Have you ever worked on a project and ended up having so many versions of your work that you didn't know which one was the latest, and what were the differences between the versions? Does the image below look familiar to you? Then you need to use version control (also called revision control). You will quickly understand that although it is designed primarily for big software development projects, being able to work with version control can be very helpful for scientists as well.
<center>
![file name](figs/fileNames.png)
</center>
The video below explains some basic concepts of version control and what the benefits of using it are.
<center>
<iframe src="https://player.vimeo.com/video/41027679" width="500" height="300" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p><a href="https://vimeo.com/41027679">What is VCS? (Git-SCM) • Git Basics #1</a> from <a href="https://vimeo.com/github">GitHub</a> on <a href="https://vimeo.com">Vimeo</a>.</p>
</center>
So to sum up, version control allows to keep track of:
* When you made changes to your files
* Why you made these changes
* What you changed
Additionally, version control:
* Facilitates collaboration with others
* Allows you to keep your code archived in a safe place (the cloud)
* Allows you to go back to previous version of your code
* Allows you to find out what changes broke your code
* Allows you to have experimental branches without breaking your code
* Allows you to keep different versions of your code without having to worry about file names and archiving organization
```{block, type="alert alert-success"}
> **Question 2**: Think of examples where you could use version control for things other than code.
```
The three most popular version control software are **Git**, **Mercurial** (abbreviated as hg) and **Subversion** (abbreviated as svn). *Git* is by far the most modern and popular one, so we will only use *Git* in this course.
## Git
<img src="figs/Git_logo.png" alt="git" style="width: 80px"/>
### What git does
**Git** keeps track of changes in a **local repository** you set up on your computer. Typically that is a directory that contains all your code and optionally the data your code needs in order to run. The local repository contains all your files, but also (in a hidden directory) all the changes to the files you have made. It does not keep track of all files automatically: you need to tell git which files to track and which not. Therefore a repository contains your current tracked files (**workspace**), an **index** of files that are being tracked, and the version history.
Every time you make significant changes to the files in your workspace, you have to **add** the changed files to the index, which selects the files whose changes you want to save, and **commit** them, which means saving the changes to the history tracking of your local repository.
Often you also setup a **remote repository**, stored on an online platform like [GitHub](https://github.com/), [GitLab](https://gitlab.com) or others. It is simply a remotely-hosted mirror of your local repository and allows you to have your work stored in a safe place and accessible from your other computers and potential collaborators. Once in a while (at the end of the day, or every new commit if you want) you can **push** your commits, which means sending them to the remote repository so it keeps in sync with your local one. When you want to update your local repository based on the content of a remote repository, you have to **pull** the commits from the remote repository.
### Summary of git semantics
+ **add**: Tell git that you want a file or changes to be tracked. These files/changes are not yet saved in the repository! They are listed as "staged" in the index or staging area for the next *commit*.
+ **commit**: Save the *staged* changes to your *local repository*. This is like putting a milestone or taking a snapshot of your project at that moment. A commit describes what has been changed, why and when. In the future you can always revert all tracked files to the state they were at when you created the commit.
+ **push**: Send previous changes you committed to the local repository to the remote repository.
+ **pull**: Update your local repository (and your workspace) with all new stuff from the remote repository. This command is simple, but potentially destructive, since it overwrites your files with the ones in the remote server. Hence it is not available in the Git GUI.
+ **fetch**: Get information about the latest commits from the remote repository, but do not apply them to your local repository automatically. This is always safe as it does not change your workspace.
+ **merge**: Merges two versions (branches) into one, applying the result to the workspace. This includes merging commits from the remote repository with the commits of the local repository. In effect, a **fetch** followed by a **merge** is the same as a **pull**, but it allows you more fine-grained control and is available through the Git GUI.
+ **clone** : Copy the content of a remote repository locally for the first time.
+ more advanced:
+ **branch** : Create a branch (a parallel version of the code in the repository)
+ **checkout**: load the status of a *branch* into your workspace
<center>
<img src="figs/git-flows.svg" alt="git flows" style="width: 600px"/>
</center>
## Setting up a Git project
Effective use of git includes two components: local software to manage the files on your computer (git client) and an online git hosting service to make them centrally accessible. While git is a single system, there is a variety of clients and a variety of hosts.
In this course, we will primarily use Git GUI as the client. It is a simple client that is included with Git itself, and is language-agnostic. There are more graphical clients as well, including one integrated into RStudio itself, but these clients are outside the scope of this course. Note that Git is language-agnostic, and we will be using it with both R and Python, so it's best to learn the language-neutral GUI, rather than an R-specific GUI.
```{block, type="alert alert-info"}
**Useful tip**: For those who are comfortable with working from the Linux terminal (which we recommend you all try!), the command line client is often the most efficient choice. Knowing how to use git from the command line is also useful when working on cloud virtual machines/servers for big data processing. So in pro-tip boxes like this you will find command line equivalents to the GUI actions we will perform. Choose whichever way you find the most convenient for yourself.
In the [next tutorial](https://geoscripting-wur.github.io/Intro2Linux/) we will introduce you to more command line commands.
In addition, while this tutorial describes how to use the built-in `git-gui`, you can also use the more modern `git-cola`, which may be more intuitive to you. The general steps are the same as in `git-gui`. You can find Git Cola in the applications menu (or else you may need to install it).
```
Throughout the Geoscripting course, for hosting our code, we will be using the university's very own instance of [GitLab](https://git.wur.nl), the most popular self-hosted Git hosting platform.
Let's jump right into it! We will start by making our very own GitLab scripting project from scratch, and also try forking someone else's project.
### Client setup
1. *Launch Git GUI*
Launch a program called *Git GUI* (e.g. from the *Applications* → *Programming* menu). Git GUI is a graphical interface to Git that comes with Git itself, and is thus cross-platform and always available. When launched, it looks something like this:
![Main screen of Git GUI](figs/git_gui_main.png)
```{block, type="alert alert-info"}
**Note**: External users should [download git](https://git-scm.com/download/) and install it to obtain Git GUI.
**Useful tip**: You can launch Git GUI from the terminal with `git gui`.
```
2. *Create an SSH key pair*
In order for GitLab (and other services) to identify that the machine connecting to it is indeed owned by you, there are two options: using a password, or using an SSH key. SSH keys are much more secure than passwords, and it doesn't require you to enter a password every time you try to communicate with the server. Therefore throughout the course we will use SSH keys.
You can generate a new SSH key pair in Git GUI by going to *Help* → *Show SSH Key* and pressing the *Generate Key* button. It will ask you for a **passphrase**. This is **not a password** and is completely **optional**: it is useful in the case your SSH key is stolen, for instance by a thief stealing your laptop or a virus; however, SSH keys are specific to each machine and are never sent over the network, so most of the time it is completely fine to **leave the passphrase empty**. If you keep it empty, you will **not need to enter it every time** you try to push your changes, yet the connection will be even more secure than when using a password.
Once done, you will see your new public key:
![SSH public key generated](figs/git_gui_sshkey.png)
```{block, type="alert alert-info"}
**Useful tip**: From the command line you generate a key pair by running `ssh-keygen -t rsa -b 4096`. In both cases, by default the public key is stored in the file `~/.ssh/id_rsa.pub` (where `~` indicates the user's home directory).
```
### Account setup
Next, we will link our client with a Git host so that we can download and upload repositories.
3. *Log into GitLab*
Go to [GitLab](https://git.wur.nl/) and log in (using your **email address** as the user name and your password). You also need to set up two-factor authentication by going to [your profile page](https://git.wur.nl/-/profile/account).
4. *Enrol the public key to your user account*
The SSH key pair is used to identify that you own the machine. <!--On WUR Windows PCs, the keys are stored on your *M:* drive, so they will follow you on any computer in the university.--> On WUR Linux virtual machines (VMs), the keys are stored on your personal VM, so you can access them from any computer that can access the VM.
Now you need to tell GitLab about your new key. To do that, copy the public key from the dialog, then in GitLab click on your avatar in the top right and go to *Edit profile* → *SSH keys*. Paste the public key in the box, give it a title describing your machine, and press *Add key*. You might need to confirm the key by email for added security.
This only has to be done once (per machine/OS you use GitLab on).
### Creating a new repository
5. *Create remote repository*
Now we are ready to start making new repositories! In GitLab, press the "+" button at the top right, select *New repository/project* (GitLab uses both terms somewhat interchangeably) and *Create blank project*. Give it a descriptive name and a short description, choose the visibility of the repository and check *Initialize repository with a README*.
![New repository creation on GitLab](figs/GitLab-new-project.png)
<!-- **Note:** During the course you are asked to submit answers to exercises as **separate projects**, using a specific scheme for the repository name: `Geoscripting-Exercise<id>-<team>` where `<id>` is the exercise number and `<team>` is your team name, for instance, `Geoscripting-Exercise3-TheGitters`. In addition, the visibility level **must** be set to *Private*. **Submissions that do not follow these rules may be rejected!** -->
<!--Next, GitHub asks what software license you'd like to apply to your code, as we have discussed in the previous chapter. See [Choose a License](http://choosealicense.com/) for a quick overview of what licenses are available. Make sure to choose a license, otherwise basic copyright applies to your code.
In addition, you can add a `.gitignore` file. This is useful to prevent Git from tracking files that you don't want to keep track of, like temporary files or your R command history. The options are sorted by language, so you can select your language of choice, e.g. `R`.-->
6. *Configure repository settings*
Explore your new blank repository a bit. In the middle, you have buttons to add new files. Choose to add a `LICENSE` file, as we have discussed in the previous chapter. See [Choose a License](http://choosealicense.com/) for a quick overview of what licenses are available. Make sure to choose a license, otherwise basic copyright applies to your code.
On the tabs to the left, you can find that the repository can have issues and merge requests assigned to them. **Issues** is what is used to give feedback on code, so try and make a few issues and close them. It is useful to know how to use these, as for personal projects it can be used as a to-do list, and for others' projects you can use it to report bugs or propose suggestions. You may be surprised how responsive developers can be to newly raised issues!
![Example issue on GitLab](figs/GitLab-issues.png)
Next, check out the repository settings. Under the *Members* subtab of the *Project information* tab, you can invite other people to collaborate on your repository. Go ahead and invite your team member to be a collaborator with a maintainer role.
<!-- **Note:** During the course you are also required to add a staff member to your project with *Master* privileges before the submission deadline of each exercise in order for your exercise answer to be graded. The username of the staff member you need to add can be found on Blackboard. Do not share the repository with a group. **Submissions that do not follow this rule will be rejected!** -->
7. *Get the URL of your new repository*
Now that you have a remote repository, it's time to create a local repository that links to it! Open the main page of your new repository, click the blue *Clone* button at the top right of the page, and copy the *Clone with **SSH** * address of your new repository.
<!-- ![Clone or Download → Clone with SSH](figs/github_screenCapture.png) -->
![Blank GitLab repository](figs/GitLab-empty-repository.png)
8. *Clone your repository*
Go back to Git GUI, and press *Clone Existing Repository*. Paste the URL you just copied to the *Source Location* field, and choose a directory you want to store your code in in the *Target Directory* field. **Note**: the *Target Directory* must **not** already exist! Git GUI will create it for you.
Once you click *Clone*, you will get a question about whether you trust the remote machine (if you ran `git gui` from a command line, it will appear stuck, but actually **the question will appear in the terminal** and you need to answer it there). You need to answer this with `yes` (the full word). This puts the GitLab server into a list of trusted servers, to guard against potential impostor servers.
You will end up in an empty Git GUI window:
![Git GUI in an empty directory](figs/git_gui_blank.png)
```{block, type="alert alert-danger"}
**Notice**: Sometimes Git GUI crashes or gets stuck at this stage. When you restart it, you may also find that the panes are collapsed (you need to drag them out from the borders of the window) and that the repository branch is set to `master` instead of `main`. To avoid this issue, and because it is much faster and more convenient in general, we recommend always cloning repositories from the terminal (see the next box), and then continuing to open the repository with Git GUI.
```
```{block, type="alert alert-info"}
**Useful tip**: From the terminal, `cd` into the directory you want to clone into, and run `git clone <url>`, e.g. `git clone [email protected]:you001/example.git`. The repository will be cloned into a subdirectory with a matching name. This is much faster than using any GUI!
```
9. *Tell Git who you are*
Before you start using Git, you should tell it what your name and email address is. You need to do that only once per Git installation. You should go to *Edit* → *Options...* and fill out the *Global (All Repositories)* options *User Name* and *Email Address*. These will be displayed on GitLab.
```{block, type="alert alert-info"}
**Useful tip**: To set your user name and email from the command line:
git config --global user.name "Your Name"
git config --global user.email [email protected]
```
### Working with Git GUI
10. *Make changes*
To see Git in action, you need to make some changes in your repository. Try it by creating a new R script file in the directory where you cloned your new project.
Once you are done, go back to Git GUI. If you closed the window, you can get back to your repository by launching Git GUI and clicking on its path in the *Open Recent Repository* list. If you did not close it, click the *Rescan* button. You will see some changes:
![Changes pending in Git GUI](figs/git_gui_changes.png)
```{block, type="alert alert-info"}
**Useful tip**: To see a list of files with pending changes from the command line, use `git status` while in a git repository. To see what exactly changed in each of these files, use `git diff`.
```
At the top left corner, the *Unstaged Changes* panel, you can see all the files that changed in your workspace. If you click on the **name** of the file, the main panel will show you what changed since the last commit. Unless it is a non-text (data) file, in which case it will just note that something has changed. **Note**: Git is very efficient with storing changes in text files: these *diff* files are all it stores internally, it does not copy the whole file on each commit. However, it does not deal efficiently with non-text files, and thus you should limit the amount and size of such files as much as possible.
If you click on the **icon** of the file in the *Unstaged Changes* panel, the file changes will be *staged* and appear at the *Staged Changes (Will Commit)* panel. These are the file changes you want to save and sent to GitLab. You don't have to stage all files for each commit, only those you actually want to be tracked by git. You can safely ignore some files such as manual backups, temporary files, and the like and they will remain untracked by git, as long as you never stage them. If you do want to stage everything, you can press the *Stage Changed* button. If you staged more than you wanted to, you can click on the file **icon** in the *Staged Changes* panel to unstage it.
Remember: clicking the **name** of the file shows the changes you made, clicking the **icon** of the file stages or unstages the change!
```{block, type="alert alert-info"}
**Useful tip**: To stage a change from the command line, use `git add path/to/file.ext` where `path/to/file.ext` is the file you want to stage. To unstage, use `git reset HEAD path/to/file.ext`.
If you have files that you don't want git to track, you can add them into the `.gitignore` file. It could be the name of a file, a directory, a wildcard (e.g. `*.pdf`), or any combination of these. To list several, put them on separate lines.
```
11. *Commit changes*
Once you staged the files that you want to commit, you need to fill out the *commit message*. This is a brief description of what changes you made between the last commit and the one you are about to create. The **first line** you enter is the **title** of the commit, keep that one short. Subsequent lines are the **description**. You may notice that the *Commit message* box does not have a horizontal scrollbar: that is intentional, because your commit message should fit within that box without the need for scrolling. Use new lines to break the text.
If it is the first time you use Git GUI to make a commit, and you haven't filled out your user name and email, it might complain about it not knowing who you are. In that case go back to step 9.
Next press the *Commit* button and your commit will be saved locally. A commit is like a saved state: you are always able to roll back the contents of your tracked files to the state they were in when you committed the changes.
```{block, type="alert alert-info"}
**Useful tip**: To commit a change from the command line, use `git commit`. If you want to stage all tracked and changed files and commit, use `git commit -a`. To include a message with your commit you can use the commit command with a `-m` flag, for example: `git commit -m "added plots to webapp" `.
```
In the case you made a mistake (a mistake in the message, forgot to stage something, etc.), you can press the *Amend Last Commit* button and get right back to where you were when you made the last commit; but use this functionality very sparingly, as it does not work with changes that have already been pushed to GitLab.
```{block, type="alert alert-info"}
**Useful tip**: To amend the last commit from the command line, use `git commit \-\-amend`.
```
10. *Push changes to the server*
Press the *Push* button, and confirm the push, to send all your changes to your GitLab repository. You can now refresh the GitLab page to see your changes. Well done!
![GitLab repository with content](figs/GitLab-repository.png)
```{block, type="alert alert-info"}
**Useful tip**: To push changes from the command line, use `git push`.
```
11. *Pull changes from the server*
One of the major uses of Git is collaboration and the ability to synchronise changes across different devices. Multiple users can do changes in the same Git repository (as long as you change the repository settings in GitLab to allow another user to do that), and you can work on the same code on different devices yourself. In both cases, it is important to keep all local repositories in sync with the remote repository. That is done via Git GUI by using *Fetch* and *Merge*. If you like, you can test it by cloning the same repository in another directory, making changes and pushing them to the server, then using fetch in the other copy.
If there are any changes on GitLab that are not on your local copy yet, in Git GUI go to *Remote* → *Fetch from* → *origin* to download all changes. This will not apply them yet, however.
To attempt to apply the changes, go to *Merge* → *Local Merge...*. If all goes well, the changes will be applied.
```{block, type="alert alert-info"}
**Useful tip**: To do a fetch and merge together the command line, use `git pull`.
```
There may be cases where files go out of sync in incompatible ways, however, like two people editing one file at the same time. In that case you may hit a *merge conflict*. It is best to try to avoid them. In case it happens, you need to go through the conflicting files in a text editor and edit them by hand, keeping the parts of the files you need. The conflicting parts will be in between lines of of `>>>>` and `<<<<` symbols. Once you remove the parts you don't need (including the separators), you can solve the conflict by committing the changes.
### Forks and merge requests
Now we know how to work with Git and GitLab for our personal work, and how to collaborate on a project with your team member. But what if you want to contribute code to someone else who has not given you access rights, or what if you want to review the code before it's accepted to your repository? That's where forking and merge requests come in handy (respectively)!
A fork is your own personal copy of someone else's repository. GitLab allows you to fork any public repository. You want to make forks whenever you want to edit code but do not have direct commit rights.
```{block, type="alert alert-info"}
**Useful tip**: In fact, if you click the edit button on a file on GitLab and do changes to a repository that you don't have the rights to write into, GitLab will helpfully make a fork for you, followed by a proposal to make a merge request for your changes.
```
12. *Fork a repository*
Go to your team member's repository that they created by following the steps above, and then click the *Fork* button at the top right. (If you can't find it, alternatively you can go to some other repository, and fork it.) You will find a new repository under your profile, by default with the same name as the original.
<!--![Fork button](figs/GitHub-fork.png)-->
13. *Make changes, commit and push*
After you have your own fork, it is the same as having your own personal repository with the code from the original (*upstream*) repository in it. Clone it locally, make some changes, commit them and push them back to GitLab, as per steps 7-10. You should see that your changes take effect in your own *downsteam* fork, but not in the upstream repository.
14. *Make a merge request*
If you are ready to ask the upstream developers to incorporate your code into their repository, go to the *Merge Requests* tab and press the blue *New merge request* button. Select your `main` branch of your fork as the source.
<!-- ![New pull request button](figs/GitHub-pr.png) -->
This will show you all the changes you have made, and if that is what you want to propose for the upstream developers to incorporate, give the name for your merge request (changeset) and a description as to why the upstream developers would want to incorporate your code. After you confirm clicking *Create merge request*, the merge request will be visible in the merge requests tab of the *upstream* repository:
![A submitted pull request](figs/GitLab-pr-submitted.png)
Then it's up to the upstream developers to perform a code review and either accept or reject the pull request in the end.
```{block, type="alert alert-info"}
**Useful tip**: For code review, GitLab also has special tools. If you look at the *Changes* tab of a merge request, you will see that you can press a bubble button next to any line of code and write a comment about it. Once finished, there is a "Submit review" button at the bottom to send all comments at once.
```
### Other Git GUI functionality
You might run into a situation when you have made changes in tracked files, but do not want to keep some of the changes. You can revert one file by selecting it in Git GUI, then clicking *Commit* → *Revert changes*.
```{block, type="alert alert-info"}
**Useful tip**: The command line equivalent is `git checkout \-\- path/to/file.ext`, or if you want to reset all changed files, `git reset \-\-hard`.
```
Git GUI not only provides a way to make, push and pull commits, but also to visualise the commit history of your repository in a tree graph. Go to *Repository* → *Visualise Master's History* to see it. For larger and more complex projects with lots of contributors and merges, it might look like some sort of a subway map:
![Git GUI history (gitk)](figs/git_gui_gitk.png)
```{block, type="alert alert-info"}
**Useful tip**: The command line equivalent is `git log`.
```
The history view also allows you to reset the state of the repository to any previous commit by using the context menu. Note, however, that you can only push if you are on the latest commit. So the easiest way to revert changes is to copy over the files to a temporary directory outside of git, reset back, and move the files back into your repository.
```{block, type="alert alert-info"}
**Useful tip**: A few more options are available from the command line. `git revert <commit>` will undo changes from a given commit, where `<commit>` is the commit ID (you can get commit IDs from `git log`, they look like a long string of letters and numbers). `git checkout <commit> \-\- path/to/file.ext` will reset a single file to the state it was at the given commit.
```
You can also browse the history of a repository from your Git hosting service, and GitHub/GitLab even allow editing files from a web interface.
```{block, type="alert alert-success"}
> **Question 3**: How do you find commit history and old versions of your files on GitHub/GitLab?
```
Below you can see a visual summary of what we have described above.
![Git workflow overview](figs/Git_overview.png)
That's it: now you know how to keep track of all your files, so you will never lose them again, and no longer have to worry about making backups or saving multiple versions. In addition, this is the way that free and open-source code development happens in actuality. Also, the exercises and assignments in the course will be delivered and submitted this way, so make sure you are familiar with the whole process!
# References
* Great 15 min interactive git commands tutorial: [try.github.io](https://try.github.io)
* Hadley Wickham on [Rstudio and git](http://r-pkgs.had.co.nz/git.html#undefined)
* R Studio documentation on version control: [Using Version Control with RStudio](http://www.rstudio.com/ide/docs/version_control/overview)
* Video tutorial to use revision control with R Studio and GitHub/BitBucket [Youtube link](http://www.youtube.com/watch?v=jGeCCxdZsDQ&noredirect=1)
* Advanced Git: [A successful git branching model](http://nvie.com/posts/a-successful-git-branching-model/)