Handbook: A General Lab Workflow Using GitHub

Taylor Salo edited this page Mar 14, 2022 · 2 revisions

A lab workflow using GitHub

  1. Try to avoid graphical user interfaces (GUIs) when there is a code-based alternative available. Results from GUIs are difficult to reproduce.
    • If you must use a GUI, look at the tool's documentation to determine if there is an option to generate a macro, script, or config file based on the steps taken in the GUI.
    • At minimum, write out each step taken in detail.
      • Project review should include having another lab member follow your written instructions to reproduce your GUI-based results.
  2. All code should be pushed to GitHub regularly. This includes the following:
    1. Preprocessing scripts
    2. Statistical analyses
    3. Figure-generating scripts
    4. Job scripts
    5. Anything used to explore the data
  3. Work on forks for development, and open pull requests when you are ready for your code to be reviewed.
    1. Do not commit directly to NBCLab repositories.
    2. Request pull request reviews from folks who are working on the project or who are solid coders.
    3. Add branch protection rules, if necessary, to enforce the pull request workflow and block direct commits.
  4. Use GitHub issues to track code- and data-related tasks.
    1. We can also have repository- or organization-level project boards to track issues.
    2. @-mention people when you need their help.
    3. Assign issues to folks when they agree to take them on.
    4. Remember to post summaries of pertinent offline conversations in issues.
    5. Trello is used for tasks related to writing and IRBs.
  5. Use GitHub issues and project boards to track research assistant tasks.
    1. You can access each research assistant's work history with a search:
      1. Use the search pattern "is:issue user:NBCLab assignee:username"
      2. E.g., https://github.com/issues?q=is%3Aissue+user%3ANBCLab+assignee%3Atsalo
    2. Research assistant recommendation letters should use these work histories to identify accomplishments.
  6. Every repository should have a README, a license (generally Apache 2.0), and a .gitignore file.
  7. Repository names should follow lab convention:
    1. [project]-project: Base project repository, containing code for BIDSification and anonymization.
    2. [project]-[title-info]: Specific subproject under the banner of the overall project. Preferably linked to a paper, poster, or talk.
  8. After analyses have been run, use a tool such as pip freeze to generate a list of dependencies (and their versions) to cite in the manuscript. Always cite your dependencies, at least those that provide citation instructions.
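
The work-history search in step 5 can be generated rather than typed by hand. The following is a minimal sketch, assuming the search pattern and example URL given above; the function name work_history_url and its org parameter are hypothetical and are not part of any lab tooling.

```python
from urllib.parse import urlencode


def work_history_url(username: str, org: str = "NBCLab") -> str:
    """Build a GitHub issue-search URL for issues assigned to `username`.

    Hypothetical helper: it only mirrors the search pattern
    "is:issue user:<org> assignee:<username>" from the handbook.
    """
    query = f"is:issue user:{org} assignee:{username}"
    # urlencode percent-encodes the colons and turns spaces into "+".
    return "https://github.com/issues?" + urlencode({"q": query})


if __name__ == "__main__":
    print(work_history_url("tsalo"))
```

Calling work_history_url("tsalo") should reproduce the example link in step 5. Swapping in another username gives that person's assigned issues.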
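
For step 8, pip freeze is the tool the handbook names. As one possible alternative, the sketch below collects the same kind of name and version list from inside Python using the standard library's importlib.metadata. The function names installed_versions and freeze_lines are hypothetical, and the output only approximates pip freeze (for example, it does not treat editable installs specially).

```python
from importlib import metadata


def installed_versions() -> dict:
    """Map each installed distribution's name to its version string."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            versions[name] = dist.version
    return versions


def freeze_lines(versions: dict) -> list:
    """Format a name-to-version mapping as sorted "name==version" lines."""
    ordered = sorted(versions.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


if __name__ == "__main__":
    # Print the environment's dependencies, one per line.
    print("\n".join(freeze_lines(installed_versions())))
```

Run this in the same environment used for the analyses, and save the output alongside the code so the versions cited in the manuscript can be traced back to a file in the repository.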

Miscellaneous notes

  • Do not automatically watch new repositories in the organization.
    • Watching every repository leads to an overabundance of GitHub notifications. You will start to ignore them, which means you will miss cases where people actually need your input or help.
  • Do not include any protected data or information in GitHub repositories, including in the code, issues, or comments.
  • Repositories can be kept private until the preprint is uploaded, after which they should be made public.
  • Do not comment on GitHub issues or pull requests via email. Your automated salutations stay in the comment and they look terrible.

Future developments

  • All data should also be version controlled.
    • Tabular and text files should be managed with git/GitHub.
      • This, of course, must be done in a HIPAA-compliant manner.
    • Large files should be managed with git-annex/DataLad.
  • All manuscripts should be version controlled.