---
title: "intRo 3"
subtitle: "Sharing"
author: "Billy Fryer"
output:
  html_document:
    code_download: TRUE
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction to GitHub
Sharing both code and results is super important when doing data analysis of any kind, but especially in sports analytics. In intRo 2, we went over how to make plots and learned that there are sports data packages on a website called GitHub. Now, it's our turn to share our work, specifically the code, with other sports analytics people. This can be to show off all of the hard work we have done or to collaborate with partners on a project.
This tutorial will be pretty different from the rest of the intRo series because it is not specifically about R. Instead, it covers a more general coding skill that people should learn.
## What is GitHub?
GitHub is a site where users upload files to be tracked with version control (using a tool called Git). This allows coders to collaborate on code without fear of losing a previous version if they make a major mistake. It also allows multiple coders to work on the same piece of code at the same time.
## Why is GitHub important?
Having a GitHub account is important because it gives you a place to display all of your code publicly as a portfolio. It also allows other people to see your current projects and give feedback to improve them. GitHub fosters collaboration, which is vital to the coding community.
## Words to Know
A *repository* is similar to a folder on your computer. All of your files for a project go inside the repository.
A *branch* is a version of a repository. Branches are like alternate realities of a repository. Creating a branch lets you make your own changes without touching the main version. A *fork* is a related idea: it is your own copy of someone else's entire repository. Branches and forks are the reason why GitHub is one of the best platforms for collaboration: they allow multiple people to work on a project at the same time without messing up the current correct version.
A *pull request* is a request to the owner of the repository to merge your edits into the *main branch*, the one that contains the current correct version.
The *README file* is the explanation of all of your work. In your README file, you should outline what every file in your repository does or contains. If you are creating a package in R, this should be an explanation of important functions inside your package.
More words will be introduced as appropriate. New terms are italicized, and the buttons and tabs you need to click are bolded.
\newpage
## Creating a New Repository
After creating an account, click your **profile picture** in the top right corner. Then click the **Repositories** tab. Finally, click **New** to create a new repository.
This should open up a new page, where you need to enter a repository name and a general description of the repository. The description should be around a paragraph. It should not be a full-length explanation of every file; that's what the README file is for. From there, click the check-box next to **Add a README file** in order to add a README file. Finally, click **Create Repository** to create your new repo.
## Adding Files to the Repository
Adding files to your repo is pretty simple. Drag the file from your computer to the main screen of your repository. Then, attach a name and a description to your *commit* and click **Commit changes** to submit to the main branch.
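
The same add-and-commit step can also be done from R instead of the website. Here is a minimal sketch, assuming the `gert` package (not used elsewhere in this tutorial) is installed. The temporary folder, file contents, and author details are all made up for illustration.

```{r LocalCommit, eval=FALSE}
library(gert)

# Hypothetical throwaway repository in a temporary folder
repo <- tempfile("intRo-demo-")
dir.create(repo)
git_init(repo)

# Hypothetical author details attached to the commit
me <- git_signature("Example Author", "author@example.com")

# Create a file, stage it, and commit it with a short message
writeLines("# intRo demo", file.path(repo, "README.md"))
git_add("README.md", repo = repo)
git_commit("Add README file", author = me, repo = repo)

# The commit history now contains one entry
history <- git_log(repo = repo)
print(history$message)
```

The chunk is marked `eval=FALSE` so that knitting this document does not require `gert`. Run it by hand if you want to try it.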
## Tracking Commits
The best way to look at how a repository has changed over time is with a timeline. To see this view, click the **Insights** tab and then click **Network**. This shows the main branch as well as all other branches that were created or *merged* back into the main branch.
## Forking and Branching
Branching is very useful when you do not want to mess up your current repository. (On GitHub, *forking* technically means copying someone else's whole repository to your own account. Inside your own repository, you create a branch instead.) A branch is like an alternate timeline in the movie Back to the Future. However, you can always delete the branch and go back to the original timeline. To create a new branch, go to the **Code** tab. Then click the button that says **main** and type in a name for a new branch. Finally, click **Create Branch** to create this branch.
From there, make the changes in this branch as you would in the main branch. To merge the branch you created back into the main branch, click on the **Pull requests** tab, then click **Compare & pull request**. Create a name and a quick description of the changes you made from the original branch. Finally, click **Create pull request** to submit the pull request. The owner of the repository will merge the pull request.
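
To see why a branch behaves like an alternate timeline, here is a rough sketch done locally from R, again assuming the `gert` package is installed. The repository, the branch name `new-plot`, the file names, and the author details are hypothetical.

```{r LocalBranch, eval=FALSE}
library(gert)

# Hypothetical throwaway repository with one starting commit
repo <- tempfile("intRo-branch-demo-")
dir.create(repo)
git_init(repo)
me <- git_signature("Example Author", "author@example.com")

writeLines("# intRo demo", file.path(repo, "README.md"))
git_add("README.md", repo = repo)
git_commit("Add README file", author = me, repo = repo)

# Remember the name of the starting branch (often "main" or "master")
base_branch <- git_branch(repo = repo)

# Create a new branch and switch to it
git_branch_create("new-plot", repo = repo)

# Commit a change that only exists on the new branch
writeLines("# plotting code goes here", file.path(repo, "plot.R"))
git_add("plot.R", repo = repo)
git_commit("Add plotting script", author = me, repo = repo)

# Compare the number of commits on each branch
nrow(git_log(ref = base_branch, repo = repo))
nrow(git_log(ref = "new-plot", repo = repo))
```

The starting branch never sees the second commit until the two branches are merged, which is the job of the pull request.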
## Merging Pull Requests
If you are the owner of the repository, you are the one who gets the final say on what gets committed to the main branch of the repository. To see the changes, click on the **Pull requests** tab and click on the pull request that is available. Look through the changes that were made and decide if you like them. If you do, click the **Merge pull request** button and then **Confirm merge** to merge with the main branch. After you merge the branch, you should delete it. To do this, click the **Delete branch** button.
\newpage
# Sharing Your Content
One of the most important things is sharing the work that you create. One of the most popular ways to do so and to get feedback in the sports analytics community is by posting on Twitter. Here are some tips about posting your work for others to see:
* Make sure to say where your data is from! It's always nice to give credit where it is due, so in the caption of any plot that you make, explicitly state what package and/or what website you got your data from. An example of this is "Data from ESPN via the wehoop package".
* On the flip side, make sure to put your social media handle or your name on graphics you make. Unfortunately, people may try to steal your work, but if you tag yourself, it is much harder for them to claim it as their own. On the other hand, if someone shares your work, it's even easier for people to look you up if your tag is already part of the graphic. This can also be listed as part of the caption for your plot.
* Make sure to be colorblind friendly in your plots! This includes not mixing reds, greens, and browns that may be easily confused. To check that your plot is colorblind friendly, look up colorblind-friendly color palettes on Google.
* Feel free to tag the Sports Analytics Club at NC State on Twitter! We'd love to post cool graphics created by members to share with people that may not be in your current network.
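
As one way to follow the colorblind tip above, here is a small sketch that colors a plot with the Okabe-Ito palette, a widely used colorblind-friendly set of colors. The choice of palette, the three teams, and the object names are assumptions made for illustration. The data comes from the same Lahman package used in the example below.

```{r Colorblind}
library(dplyr)
library(ggplot2)
library(Lahman)

# Okabe-Ito colorblind-friendly palette (hex codes)
okabe_ito <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7")

# 2004 pitchers from three teams, chosen only for illustration
al_east <- Pitching %>%
  filter(yearID == 2004, teamID %in% c("BOS", "NYA", "BAL"))

# Fill each team's box with a color from the palette
cb_plot <- ggplot(al_east, aes(x = teamID, y = HR, fill = teamID)) +
  geom_boxplot() +
  scale_fill_manual(values = okabe_ito) +
  labs(title = "Home Runs Allowed by Pitcher",
       subtitle = "2004 Season",
       caption = "Data from Lahman Data Sets")
print(cb_plot)
```

`scale_fill_manual()` takes the colors in order, so only the first three are used here.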
# Example
```{r Graphic}
# Packages
library(tidyverse)
library(Lahman)

# Data Cleaning: keep only 2004 Boston Red Sox pitchers
redsox <- Pitching %>%
  filter(yearID == 2004, teamID == "BOS")

# Plot 1: boxplot of home runs allowed (HR) per pitcher
box_plot <- ggplot(redsox, aes(x = teamID, y = HR)) +
  geom_boxplot()
print(box_plot)

# Plot 2: violin plot of the same data
violin_plot <- ggplot(redsox, aes(x = teamID, y = HR)) +
  geom_violin()
print(violin_plot)

# Adding Labels: title, subtitle, and a caption with credit and data source
full_plot <- violin_plot +
  labs(title = "Red Sox Pitchers Home Runs Allowed",
       subtitle = "2004 World Series Season",
       caption = "Data Viz by Billy Fryer (@_b4billy_) | Data from Lahman Data Sets")
print(full_plot)
```