This is a dummy project to be used as a template to maintain a standardized project structure.
There are two alternatives to use this template as a blueprint for your own project:
- Simply fork the repo
- Clone the repo and change the upstream remote:
  git clone [email protected]:Rexthor/scientific-project-template.git foobar
  git remote set-url origin [email protected]:{namespace}/{project}.git
  Double check with git remote -v, then git push.
This is a generic template to serve as reference for simple scientific projects.
The general structure is as follows:
- dat: data sets
- dev: development (scripts)
- doc: documentation (e.g. reports, dissemination)
- gis: GIS projects
- org: organizational stuff (e.g. contract, accounting)
- plt: plots / figures
Specifically, the data folder dat is structured as follows:
dat
├── interim » Intermediate data that has been transformed.
├── processed » Canonical output data sets.
├── raw » The original, immutable data dump.
└── reporting » Final data sets for delivery/reporting.
In addition, generic .gitignore and .Rproj files are included.
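The folder layout described above could be recreated for a new project with a few lines of Python. This is only a minimal sketch and not part of the template itself; the function name create_structure and the target folder passed to it are hypothetical.

```python
from pathlib import Path

# Top-level folders and the sub-folders of dat, as listed above.
TOP_LEVEL = ["dat", "dev", "doc", "gis", "org", "plt"]
DATA_SUBFOLDERS = ["interim", "processed", "raw", "reporting"]


def create_structure(root):
    """Create the template folders below root and return their paths."""
    root = Path(root)
    folders = [root / name for name in TOP_LEVEL]
    folders += [root / "dat" / name for name in DATA_SUBFOLDERS]
    for folder in folders:
        # exist_ok makes the call safe to repeat on an existing project
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling create_structure("foobar") would create foobar/dat/raw, foobar/plt and so on, leaving existing folders untouched.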
Remarks:
- Please do not commit large data sets (small csv files are fine), reports, plots/images, spreadsheets etc. These should reside on a share.
- Use pre-commit hooks. A sample .pre-commit-config.yaml is provided. Run pre-commit install to install/set up the hooks specified in the configuration file.
Further reading:
- Noble (2009): A Quick Guide to Organizing Computational Biology Projects.
- Wilson et al. (2017): Good enough practices in scientific computing.
For additional information on project structure see:
Filenames
- File names should be meaningful and concise. Avoid excessively long filenames.
- Avoid blanks. Use underscores (_) or dashes (-).
- Avoid special characters.
- Start the filename with the current timestamp if feasible: YYYY-MM-DD_filename.end
- If files need to be run in sequence, prefix them with a two-digit number: 01-download_data.py
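The naming rules above can be sketched as a small helper. The function make_filename and its parameters are hypothetical and only illustrate one way to apply the conventions; what counts as a "special character" is assumed here to be anything other than ASCII letters, digits, underscores and dashes.

```python
import re
from datetime import date


def make_filename(stem, extension, day=None, sequence=None):
    """Build a file name that follows the conventions listed above."""
    # Replace blanks with underscores, then drop anything that is not
    # an ASCII letter, digit, underscore or dash (assumed rule).
    clean = re.sub(r"\s+", "_", stem.strip())
    clean = re.sub(r"[^A-Za-z0-9_-]", "", clean)
    if sequence is not None:
        # Files that run in sequence get a two-digit prefix, e.g. 01-
        return f"{sequence:02d}-{clean}.{extension}"
    day = day or date.today()
    # Other files start with the date in YYYY-MM-DD form
    return f"{day.isoformat()}_{clean}.{extension}"
```

For example, make_filename("download data", "py", sequence=1) gives 01-download_data.py, matching the example in the list.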
Filetypes
- Please use GeoPackages instead of shapefiles.
- If you use shapefiles, put them in dedicated folders or zip them.
- Use Arrow when seeking to access files in both R and Python.
- Use scripts wherever possible, for the sake of reproducibility and automation. This is especially true for GIS operations.
- If files need to be run in sequence, prefix them with a two-digit number.
- For automation use workflow tools like doit, snakemake or targets.
- Consider using tools like mermaid or drawio for illustrating your workflows. Check in the xml file for documentation purposes.
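As an illustration of the workflow tools mentioned above, a doit task file (dodo.py) for two numbered scripts might look roughly like this. The script names and data files are placeholders and do not ship with the template; the sketch assumes doit's usual convention of task_* functions that return a dictionary with actions, file_dep and targets.

```python
# dodo.py: a hypothetical two-step pipeline for doit.
# Script and file names are placeholders, not files shipped with the template.


def task_download():
    """Fetch the raw data."""
    return {
        "actions": ["python dev/01-download_data.py"],
        "targets": ["dat/raw/data.csv"],
    }


def task_clean():
    """Transform the raw data into an interim data set."""
    return {
        "actions": ["python dev/02-clean_data.py"],
        # Depends on the file produced by task_download
        "file_dep": ["dat/raw/data.csv"],
        "targets": ["dat/interim/data_clean.csv"],
    }
```

Running doit in the folder that contains dodo.py should then execute the download step before the cleaning step, since the second task lists the first task's target as a file dependency.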
Python
R
- tidyverse style guide
- Use tidyverse
- Use sf for vector data and stars for raster data
- Recommended editor:
- Use of colors: For color coding data visualizations it is crucial to choose a palette that appropriately captures the underlying information. Please refer to color palettes as provided in the HCL Wizard and use the respective colorspace packages for R and Python.
- Color advice for maps is available at the Color Brewer.
- Check out Question-based visualizations for help on visualizing the underlying scientific questions of interest clearly and explicitly.
- Commit messages should be clear and unambiguous.
- They can span more than one line to explain the change if needed. The additional lines are not shown in commit overviews, only in the detailed views.
- Please use imperative present tense for commit messages and avoid a dot at the end.
- It is quite common to start commit messages with a short abbreviation declaring what type of change the commit contains (cf. the list from the NumPy development workflow).
- Please use the following prefixes for commit messages:
  - API: an (incompatible) API change
  - BUG: bug fix
  - DEP: deprecate something, or remove a deprecated object
  - DEV: development (tool or utility)
  - DOC: documentation
  - MNT: maintenance (e.g. renaming files)
  - REF: code refactoring
  - REV: revert an earlier commit
  - STY: style changes / formatting
  - TST: addition or modification of tests
- Certain references in GitLab can be added automatically (copied from help):
- @foo : for team members
- @all : for the whole team
- #123 : for issues
- !123 : for merge requests
- $123 : for snippets
- 1234567 : for commits
- [file](path/to/file) : for file references
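The commit message conventions above (a known prefix, no dot at the end) could be checked automatically, for instance from a commit-msg hook. The following is a minimal sketch; the function check_commit_message is hypothetical, and the exact "PREFIX: text" layout with a single space after the colon is an assumption.

```python
import re

# Prefixes taken from the list above.
PREFIXES = ("API", "BUG", "DEP", "DEV", "DOC", "MNT", "REF", "REV", "STY", "TST")
# Assumed layout: prefix, colon, one space, then the subject text.
SUBJECT_PATTERN = re.compile(r"^(%s): \S.*$" % "|".join(PREFIXES))


def check_commit_message(message):
    """Return a list of problems found in the subject line (empty if none)."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT_PATTERN.match(subject):
        problems.append("subject must start with a known prefix, e.g. 'BUG: '")
    if subject.rstrip().endswith("."):
        problems.append("subject must not end with a dot")
    return problems
```

Only the first line is checked, so additional explanatory lines in the message body are left alone. Whether the subject is written in the imperative is not verified here.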
- git status: list files you've changed and those you still need to add or commit
- git add <file>: add to staging (index)
- git commit -m "Commit message": commit changes to head
- git fetch: fetch changes from remote
- git merge: merge changes from remote to local (also: merge branches)
- git pull: git fetch & git merge (not recommended)
- git push: send changes made in local version to remote
- git log: show all commits
- git log -p <file>: show changes over time for a specific file
- git blame <file>: who changed what and when in <file>
- git stash: stash the changes in a dirty working directory away
- git stash pop: apply stashed state on top of current working directory state