Skip to content

NBISweden/workshop-pgip-H23

Repository files navigation

Population Genomics in Practice • Course materials

PGIP • Population genomics in practice

Course materials for course Population Genomics in Practice. Please make sure to read the entire README before adding material.

Terms of use

You are free to use the course material for online learning, journal clubs, or whatever way you see fit. If you find errors or would like to suggest improvements, please consider filing a github issue.

Installation and setup, TL;DR

Note: this is only for developers! To view the website, navigate to https://nbisweden.github.io/workshop-pgip/.

Clone the repo and cd to directory

git clone [email protected]:NBISweden/workshop-pgip.git
cd workshop-pgip

Since pushing to the main branch is disallowed, make sure you create a development branch named dev-yourgithubusername:

git checkout -b dev-yourgithubusername

Make edits on and push this branch to the repo. For this to work you need to properly setup github authentication. Alternatively, you can fork the repository to your own github user account.

Whenever a pull request is accepted, you need to sync your development branch with the new main. If you have cloned from NBISweden do

git fetch -a
git checkout main
git merge origin/main
git checkout dev-yourgithubusername
git merge main

If you are working on a forked copy it is easiest to sync the main branch via the github interface, after which you can follow the previous commands.

Installation

You can change the environment name by modifying the PGIP environment variable:

export PGIP=pgip
make install-pgip
conda activate pgip
make install-R
make install-kernels
make install-pixy
make install-dev

If the above commands have worked without issues you are done and you can head over to the section on rendering documents. If not, start by reading the following sections that describe each installation step in more detail. There is also a section on known installation issues.

Installation and setup

Create pgip conda environment

Create a conda environment called pgip using the conda lock file

mamba env create -n pgip conda-linux-64.lock

and activate the environment

conda activate pgip

The environment can also be installed with the make command make install-pgip.

Install R packages

UPDATE: the tinytex distribution causes all kinds of trouble. See section Installation issues below.

A number of R packages need to be installed manually, notably dotenv, tinytex, devtools, and the local package pgip which resides in src/latex. The easiest way to do so is to issue

make install-R

in the root directory.

Install kernels and python package

There is a helper package pgip-tools with code to run simulations and more. To make use of it install with

python -m pip install git+https://github.com/NBISweden/pgip-tools

or

make install-kernels

The make command will also install a python kernel that can be used as the main engine to render documents. The kernel is named pgip and is detailed as a jupyter kernelspec in the project configuration file. See Using Python for more information.

Install quarto

Install Quarto version Quarto>=1.2.475.

bcftools manual install (OBSOLETE)

OBSOLETE: installing pgip using conda-linux-64.lock obviates the need to install bcftools manually.

Due to dependency issues, bcftools has to be manually installed:

make install-bcftools

or

./scripts/install-bcftools.sh

in the root directory.

Install pixy

There currently is no pixy conda package for Python>3.9 which requires manuall installation with pip:

make install-pixy

or

python -m pip install git+https://github.com/ksamuk/pixy.git

Development tools

There are a number of development tools in environments-dev.yaml that help maintain consistent coding styles and perform code quality checks prior to committing. They can be installed either with

mamba env update -n pgip --file environment-dev.yml

or

make install-dev

To activate pre-commit (recommended but not enforced), run

pre-commit install

From here, pre-commit will be run whenever you attempt to commit code.

Setup environment variables

Finally, you need to set some environment variables to enable correct rendering. For reproducibility, set these in a file .envrc in the pgip root directory and source it with source .envrc.

On a first run you will likely encounter an MissingEnvVarsError for the PARTICIPANT_DATA variable, which has to be set. The first run should trigger a setup script that generates the file docs/_environment.local where PARTICIPANT_DATA is initialized. Rerunning quarto preview in the docs folder should suffice. Should this for some reason fail, you can always set the variable manually:

export PARTICIPANT_DATA="foo.csv"

The repo contains custom LaTeX code and the path src/latex needs to be set via the TEXINPUTS variable:

export TEXINPUTS=${TEXINPUTS}:/path/to/pgip/src/latex

Installation issues

tinytex

The current TinyTeX installation setup causes errors from the TeXLive manager tlmgr that are difficult to fix. As a workaround, install texlive and texlive-latex-extra. This is the working solution in the github CI actions. See .github/workflow/build.yml for details.

Missing libz2 and lzma library headers

In case you get errors related to libz2 and lzma it is likely due to missing header files. On Ubuntu, run

sudo apt install libz2-dev
sudo apt install liblzma-dev

Local preview/render

For local preview (edits to files immediately triggers regeneration of output), cd to docs directory and run quarto preview. You can specify a port to consistently reenter the same local web page:

quarto preview --port 8888

There are also make rules to render single files or the entire project ('production') as they would appear online:

make docs/_site/slides/demo/index.html
make production

Adding/Modifying topics

Add subdirectories to docs/exercises and docs/slides that describe the topic in one or a few words. Shorter is better. Add an index.qmd file to each directory and edit away. Look at the demo files (docs/slides/demo/index.qmd, docs/exercises/demo/index.qmd and docs/exercises/demopy/index.qmd) for examples.

For some directories, figures and commands are run on a small data set that is installed in docs/data on the first rendering. The data is setup with the script scripts/setup-exercise-data.sh. To "register" a directory for data setup, edit the script environment variable OOA_OUTGROUPS, add custom setup as appropriate or create a custom pre-render script (remember to list it in _quarto.yml).

Some resources on developing content

Style guide

It is recommended you follow pre-defined style guides. To help enforce styles and catch formatting errors early, use the linters listed below. To this end, it is also recommended to setup pre-commit hooks. The linters are listed as dependencies in the development requirements file environment-dev.yml.

Bibliographic entries

Bibliographic entries are stored in BibTeX format in docs/assets/bibliography.bib. Add entries when necessary and cite using quarto's citation syntax (e.g., [@citation]).

Development

Pushing to the main branch is prohibited so any updates to the online material must be added via pull requests. Create a branch from main prefixed with dev- (e.g., dev-yourgithubusername) to use as your main development branch.

Common linting errors

markdownlint is run on quarto and regular markdown files. It is possible to disable a linting error by adding a comment. For instance,

<!-- markdownlint-disable MD013 -->

Some code here

<!-- markdownlint-enable MD013 -->

will disable error MD013/line-length between the two comment statements above. This is particularly useful for code blocks that are difficult to wrap. It can also be convenient to disable error MD041/first-line-heading/first-line-h1 when including external files before the first heading.

Test / development data

Test data is managed with the pgip-data repository. The data is setup on the first rendering with the script scripts/setup-data.sh, as defined in the project definition file _quarto.yml (section project:pre-render).

Conda lock environment file

The conda environment file environment.yml lists required binaries needed to generate the pages. Whenever a dependency is added, the script scripts/condalock.sh should be run to generate a new conda-linux-64.lock file that is used to install packages in the CI environment.