
Using GitHub as a Reproducible Research Platform

Tiffany J. Callahan edited this page Mar 27, 2019 · 15 revisions

This page documents our attempt to implement reproducible research in every aspect of our projects using GitHub. Why GitHub? Honestly, because it has several built-in features that make it well suited, out of the box, for use as a reproducible research platform. In this post, we introduce several native GitHub tools and provide examples of how we use them to make our projects reproducible. We hope this post will be useful to collaborators, regardless of their experience with GitHub.

Finally, we have drafted some simple guidelines that we (and hopefully our collaborators 😄) will follow in our quest for reproducibility.

Tour of GitHub




Notifications

Before we get started, please update your notifications for this repository!
To ensure that GitHub only emails you when you are specifically @mentioned or assigned to an issue, or when big changes are made to the code base, you need to update your notification settings. This change will keep you from getting annoyed with me by ensuring that you are only notified when something ACTUALLY needs your attention. To make this change, follow the two-step instructions and screenshot below.

Steps:

  • Go to the “Unwatch” drop-down in the Abra-Collaboratory repository
  • When you click on the arrow, you will notice that “Watching” is checked (by default); change this to “Releases only”

That’s it!

Homepage

When you search for and navigate to a repository on GitHub, you will be taken to a page that looks like the image below. This view is the homepage, or command center: all of the important functionality you need to interact with your project can be reached from here.





In the image above, you will notice I have highlighted several things in red and gold:

  • The gold box is drawn around the directories and files that make up the Git repository. Most often, this is where code and other project files or data are stored.
  • The red circles are drawn around different tools you can use to interact with your project. These tools are the most important for enabling reproducible research. Let's take a closer look at each of these, starting with Issues.

Issues

"Use issues to track ideas, enhancements, tasks, or bugs for work on GitHub. You can collect user feedback, report software bugs, and organize tasks you'd like to accomplish with issues in a repository. Issues can act as more than just a place to report software bugs". GitHub - About Issues

If you were to click on the Issues tab from the current repository, you would see a page similar to what is shown below.





This page serves as a dashboard for the project, providing information like the number of closed (i.e., resolved or completed) and open (i.e., unresolved or uncompleted) issues, as well as each issue's label. In the image shown above, you can see that there are three open issues that have been labeled as "TODO: Manuscript Tasks". You can also see that there are 3 issues that I have closed.
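The counts on the Issues dashboard can also be reproduced from data returned by GitHub's REST API. The sketch below is a minimal illustration, assuming a list of issue dicts shaped like the API's issue objects (a `state` of "open" or "closed" and a `labels` list of `{"name": ...}` dicts); the sample data is hypothetical and only mirrors the example above.

```python
from collections import Counter


def summarize_issues(issues):
    """Return (counts by state, counts of open issues per label).

    Assumes each item looks like a GitHub REST API issue object. Items that
    carry a "pull_request" key are pull requests, which the issues endpoint
    also returns, so they are skipped here.
    """
    real_issues = [i for i in issues if "pull_request" not in i]
    by_state = Counter(i["state"] for i in real_issues)
    open_by_label = Counter(
        label["name"]
        for i in real_issues
        if i["state"] == "open"
        for label in i["labels"]
    )
    return by_state, open_by_label


# Hypothetical sample: three open manuscript tasks and three closed issues.
sample = (
    [{"state": "open", "labels": [{"name": "TODO: Manuscript Tasks"}]}] * 3
    + [{"state": "closed", "labels": []}] * 3
)
by_state, open_by_label = summarize_issues(sample)
print(by_state["open"], by_state["closed"])  # 3 3
print(open_by_label["TODO: Manuscript Tasks"])  # 3
```

Counting per label is what makes consistent labeling worthwhile: the same tally works for any of the project's issue types.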

A new issue is created by clicking on the green "New Issue" button, which presents you with several different types of issues you can submit (as shown in the figure below).





As you can see in this image, there are several kinds of issues that can be selected. In addition to the built-in issue types that GitHub provides for every repository, I have built some issue templates specifically for this project.

Whenever possible, the issues shown above should be used according to their description:

  • Bug report: A built-in issue template that should be used when you find an issue in the code base that needs to be fixed.
  • Coding Tasks: A custom template that should be used when you want to request a change be made to existing code or when you want to suggest new code that could be added to the code base.
  • Feature request: A built-in issue template that is used when you have a new idea or suggestion that you would like to share with the project developers.
  • Help: A custom template that should be used whenever you have a question about how to contribute to this repository.
  • Manuscript Tasks: A custom template that should be used when you want to create a task that is related to a manuscript being written about/using this project.
  • Meetings: A custom template that should be used when you want to follow up on a task assigned during a meeting or when you want to suggest a new topic for discussion at an upcoming meeting.
  • Other: A custom template that should be used when you are unable to use any of the other issue templates (e.g. general questions about
  • Project Organization Tasks: A custom template that should be used when you want to add a task related to the organization of the project (e.g. adding collaborators or modifying project boards or milestones).
  • Wiki: A custom template that should be used when you want to suggest an edit to the project Wiki page.
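GitHub stores custom issue templates as Markdown files in a `.github/ISSUE_TEMPLATE/` directory of the repository, each starting with YAML front matter (`name`, `about`, `title`, `labels`, `assignees`). The sketch below renders such a file for a "Manuscript Tasks" template; the file name and body prompts are hypothetical placeholders, not the project's actual template. Values are serialized with `json.dumps` because double-quoted JSON strings are also valid YAML, which keeps a label containing a colon (such as "TODO: Manuscript Tasks") from breaking the front matter.

```python
import json
from pathlib import Path


def render_issue_template(name, about, labels, body):
    """Build the text of a GitHub issue template (YAML front matter + body)."""
    front_matter = [
        "---",
        f"name: {json.dumps(name)}",
        f"about: {json.dumps(about)}",
        'title: ""',
        # Labels written as one quoted, comma-separated string.
        f"labels: {json.dumps(', '.join(labels))}",
        'assignees: ""',
        "---",
    ]
    return "\n".join(front_matter) + "\n\n" + body.strip() + "\n"


def write_issue_template(repo_root, filename, text):
    """Write the template into <repo_root>/.github/ISSUE_TEMPLATE/<filename>."""
    template_dir = Path(repo_root) / ".github" / "ISSUE_TEMPLATE"
    template_dir.mkdir(parents=True, exist_ok=True)
    path = template_dir / filename
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical body; the real template's prompts may differ.
text = render_issue_template(
    name="Manuscript Tasks",
    about="Create a task related to a manuscript written about or using this project",
    labels=["TODO: Manuscript Tasks"],
    body="**Task description**\n\n**Related manuscript section**\n\n**Due date**",
)
print(text)
```

Calling `write_issue_template(".", "manuscript-tasks.md", text)` from the repository root would save the file (the name is hypothetical); once committed, the template is versioned alongside the code it supports.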

Once you have selected the type of issue you want to submit, you will be presented with an empty template specific to that issue type and asked to provide certain information. The image below provides an example of what a new "Manuscript Task" issue request looks like.





There are two components that must be filled out in order to submit a new issue (shown above):

  1. Issue Title: Provide a brief description or label of the task you are submitting
  2. Description: If a template has been provided, fill it out to the best of your ability. The template is by no means a requirement; rather, it suggests information that will help the project developers address your request.

Aside from these two components, there are other things that you should consider adding when making a new issue request (see green boxes in the image below).





The green boxes highlight additional information that a user can add to a new issue request:

  1. Assignees: Assign the issue to a specific collaborator on the project.
  2. Labels: Assign a label to the issue. For this project, we have created labels that match the issue template types in order to make issue search and categorization easier.
  3. Projects: Assign the issue to a Project Board, which helps to keep specific sets of issues grouped or organized.
  4. Milestone: Assign the issue to a Milestone, that is, one of the goals set up by the project developers.
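Title, description, assignees, labels, and milestone can also be set programmatically through GitHub's REST API, which can be handy for opening a batch of related tasks in a reproducible way. This is a minimal sketch, assuming a personal access token with permission to create issues; the owner, repository, token, and milestone number are placeholders. The create-issue endpoint accepts `title`, `body`, `assignees`, `labels`, and a milestone *number*; adding the issue to a project board is not part of this call.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_issue_request(owner, repo, token, title, body="",
                        assignees=None, labels=None, milestone=None):
    """Prepare (but do not send) a POST /repos/{owner}/{repo}/issues request."""
    payload = {"title": title, "body": body}
    if assignees:
        payload["assignees"] = list(assignees)
    if labels:
        payload["labels"] = list(labels)
    if milestone is not None:
        payload["milestone"] = milestone  # the milestone's number, not its title
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_issue_request(
    owner="OWNER", repo="REPO", token="YOUR_TOKEN",
    title="Draft the methods section",
    body="See the manuscript outline in the wiki.",
    assignees=["callahantiff"],
    labels=["TODO: Manuscript Tasks"],
    milestone=1,  # hypothetical milestone number
)
# Sending it would create a real issue:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["html_url"])
```

Building the request separately from sending it makes it easy to review exactly what will be submitted before anything changes on GitHub.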

Once the issue template is filled out, preview it and then submit it. Once it is submitted, you should see a screen that looks similar to the image shown below.





In this image, you will see blue and green boxes:

  • Blue Box: In this box, I am pointing out how you can include collaborators in your issue without assigning it to them (when you assign an issue to someone, you are asking them to address it). When the @ symbol is used with a collaborator's username (e.g., @callahantiff), that person receives a notification that they have been mentioned or referenced in an issue. This is most commonly used when you want to ask someone a question or make them aware of an issue.
  • Green Box: In this box, you will see confirmation of the Assignees, Milestones, and the progress of the issue with respect to a Project Board. As a reminder, these are all options that are selected when creating a new issue request.
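GitHub detects @mentions for you, but a small script can help with tasks such as auditing who was pulled into a discussion. The pattern below is an approximation assuming GitHub's username rules (letters, digits, and single hyphens, not starting or ending with a hyphen, at most 39 characters). It does not handle team mentions such as `@org/team`, and it does not skip text inside code blocks, which GitHub typically does not treat as mentions. The usernames in the example are illustrative.

```python
import re

# Approximate GitHub username pattern: starts with a letter or digit and uses
# hyphens only between letters/digits. The lookbehind skips e-mail addresses
# such as "name@example.com".
MENTION = re.compile(r"(?<![\w@])@([A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9]))*)")


def find_mentions(text):
    """Return the distinct @mentioned usernames in order of first appearance."""
    found = []
    for name in MENTION.findall(text):
        if len(name) <= 39 and name not in found:
            found.append(name)
    return found


comment = "Thanks @callahantiff! Could @jane-doe take a look? (mail: me@example.com)"
print(find_mentions(comment))  # ['callahantiff', 'jane-doe']
```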

Milestones

"You can use milestones to track progress on groups of issues or pull requests in a repository. When you create a milestone, you can associate it with issues and pull requests. To better manage your project, you can view details about your milestone". GitHub - About Milestones

As shown in the image below, milestones can be reached from the Issues tab.





As described above, milestones can be thought of as project goals made up of specific issues. Progress towards a milestone is a direct reflection of how many of its assigned issues have been completed. In the image above, you will see that each milestone includes a brief description and a due date. You will also notice a progress bar that shows what percentage of its issues have been completed.
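The percentage in the progress bar is the share of a milestone's issues that are closed. GitHub's REST API reports each milestone's `open_issues` and `closed_issues` counts, and the sketch below computes the percentage from those two fields. The milestone dict is a hypothetical example, and GitHub's own display may round the number differently.

```python
def milestone_progress(milestone):
    """Percentage of a milestone's issues that are closed (0 if it has none)."""
    open_count = milestone["open_issues"]
    closed_count = milestone["closed_issues"]
    total = open_count + closed_count
    if total == 0:
        return 0.0
    return 100.0 * closed_count / total


# Hypothetical milestone with 1 of 4 issues closed.
example = {"title": "Submit abstract", "open_issues": 3, "closed_issues": 1}
print(f"{milestone_progress(example):.0f}% complete")  # 25% complete
```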


Project Boards

"Project boards on GitHub help you organize and prioritize your work. You can create project boards for specific feature work, comprehensive roadmaps, or even release checklists. With project boards, you have the flexibility to create customized workflows that suit your needs. Project boards are made up of issues, pull requests, and notes that are categorized as cards in columns of your choosing. You can drag and drop or use keyboard shortcuts to reorder cards within a column, move cards from column to column, and change the order of columns. Project board cards contain relevant metadata for issues and pull requests, like labels, assignees, the status, and who opened it. You can view and make lightweight edits to issues and pull requests within your project board by clicking on the issue or pull request's title." GitHub - About Project Boards

As shown in the image below, we have created three projects within the PheKnowVec repository.





You will notice that this page serves as a mini-dashboard and provides basic details like the name of each project, a brief description of its purpose, and its progress, shown as green (completed issues) and purple (in-progress issues) bars.

If you click on the "Manuscript Tasks" project, you will see three boards (shown below).





The three boards that make up this project are New Tasks, In Progress, and Completed Tasks, and each board contains issues. It is important to note that projects can be automated: if an issue is assigned to a project when it is created, it will automatically be added to that project's boards. This is a very useful feature and goes a long way towards keeping things organized.
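The automation described above behaves roughly like the toy model below: an issue assigned to the project on creation lands in the first board. This is a hypothetical in-memory sketch to illustrate the workflow, not GitHub's implementation or API. The board names follow the "Manuscript Tasks" project; moving a card into "In Progress" is treated as a manual step, and moving closed issues to "Completed Tasks" assumes that the project's close automation is also turned on.

```python
from dataclasses import dataclass, field

# Board (column) names taken from the "Manuscript Tasks" project.
COLUMNS = ("New Tasks", "In Progress", "Completed Tasks")


@dataclass
class ProjectBoard:
    name: str
    columns: dict = field(default_factory=lambda: {c: [] for c in COLUMNS})

    def on_issue_opened(self, issue_number):
        """Automation: a new issue assigned to this project starts in the first board."""
        self.columns[COLUMNS[0]].append(issue_number)

    def move(self, issue_number, column):
        """Manual step: drag a card to another board."""
        for cards in self.columns.values():
            if issue_number in cards:
                cards.remove(issue_number)
        self.columns[column].append(issue_number)

    def on_issue_closed(self, issue_number):
        """Automation (assumed enabled): closed issues move to the last board."""
        self.move(issue_number, COLUMNS[-1])


board = ProjectBoard("Manuscript Tasks")
board.on_issue_opened(12)  # hypothetical issue number
board.move(12, "In Progress")
board.on_issue_closed(12)
print(board.columns)
# {'New Tasks': [], 'In Progress': [], 'Completed Tasks': [12]}
```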


Wiki

"You can host documentation for your repository in a wiki, so that others can use and contribute to your project. Every GitHub Enterprise repository comes equipped with a section for hosting documentation, called a wiki. You can use your repository's wiki to share long-form content about your project, such as how to use it, how you designed it, or its core principles." GitHub - About Wikis

Wikis are one of the most powerful reproducibility tools that GitHub provides. As described in the documentation, wikis are meant to provide additional information about a project. In my case, the wiki will be used to keep collaborators up to speed, provide detailed information regarding project decisions, and track accomplishments (i.e., publications and conference submissions).




The image above shows what you would see if you navigated to the PheKnowVec Wiki Home page. In the green box, I'm pointing out the other wiki pages that have been created for this project. Currently, these pages are organized into three sections (i.e., "Project Information", "Discussions", and "Analyses"). Wikis are very flexible, allowing additional pages to be easily added as the project progresses.
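Because a GitHub wiki is itself a Git repository (it can be cloned from the repository's `.wiki.git` URL), its navigation can be kept under version control too. GitHub wikis display a page named `_Sidebar` alongside every page, and `[[Page Name]]` creates a link to another wiki page. The sketch below renders a sidebar grouped into the three sections mentioned above; the page names are hypothetical placeholders.

```python
from pathlib import Path


def render_sidebar(sections):
    """Render a wiki sidebar: a bold heading per section, then a list of page links."""
    lines = []
    for heading, pages in sections.items():
        lines.append(f"**{heading}**")
        lines.extend(f"- [[{page}]]" for page in pages)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_sidebar(wiki_dir, sections):
    """Save the sidebar as _Sidebar.md in a locally cloned wiki repository."""
    path = Path(wiki_dir) / "_Sidebar.md"
    path.write_text(render_sidebar(sections), encoding="utf-8")
    return path


# Hypothetical page names for each section.
sections = {
    "Project Information": ["Home", "Project Goals"],
    "Discussions": ["Meeting Notes"],
    "Analyses": ["Data Sources"],
}
print(render_sidebar(sections))
```

After cloning the wiki, `write_sidebar(path_to_clone, sections)` would write the file, which can then be committed and pushed like any other change.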



Other Ways to be Reproducible

It's worth noting that GitHub is not the only way to implement a reproducible research environment. I highly recommend ReproducibleResearch.net, which provides helpful blogs and links to resources, and also celebrates those in the field who are currently conducting research in a reproducible way. Other platforms/tools/environments include:


