Developer and curator setup guide

Introduction

This is a general guide to bootstrapping and maintaining a complete development environment for working as a curator or developer on the NIF-Ontology, protc, sparc-curation, scibot, etc. For a general introduction to the SPARC curation process see ./background.org. The environment bootstrapped by running this file was originally developed on Gentoo, and is portable to other distributions with a few tweaks.

Please report any bugs you find in this file or during the execution of any of the workflows described in this file to the sparc-curation GitHub issue tracker.

Setup

Setup takes about 3 hours. OS level setup takes about an hour, and user setup takes about two hours.

If you do not have root or sudo access or do not administer the computer you are following this guide on you should start at user setup.

If you do have admin access then do the OS level setup first and then come back to the user setup once you are done.

User

If you are already on a system that has the prerequisites installed start here. If you are not you will find out fairly quickly when the following commands fail.

Git name and email

These workflows make extensive use of git. Git needs to know who you are (and so do we) so that it can stash files that you change (for example this file, which logs to itself). Use the email that you will use for curation or development for this. You should not use your primary email account for this because it will get a whole bunch of development related emails.

Run the following in a terminal replacing the examples with the fields that apply to you.

git config --global user.name "FIRST_NAME LAST_NAME"
git config --global user.email "[email protected]"
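
You can confirm the settings took effect with:

git config --global --get user.name
git config --global --get user.email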

Bootstrapping this setup.org file

You can run all the code in this setup.org file automatically using emacs org-mode. The easiest way to accomplish this is to install scimax which is an emacs starterkit for scientists and engineers that has everything we will need. The following steps will do this automatically for you.

All the code blocks in this Bootstrapping section need to be pasted into a terminal (shell) where you are logged in as your user. Run every code block in the order that they appear on this page. Do not skip any blocks. Read all the text between blocks. It will tell you what to do next.

When pasting blocks into the terminal (middle mouse, or control-shift-v in the ubuntu terminal), if you do not copy the last newline of the block then you will have to hit enter to run the last command.

mkdir -p ~/.local/bin
mkdir ~/bin
mkdir ~/opt
mkdir ~/git
mkdir ~/files
source ~/.profile

Run the following block to clone this repository and the scimax repository.

pushd ~/git
git clone https://github.com/SciCrunch/sparc-curation.git
popd
pushd ~/opt
git clone https://github.com/jkitchin/scimax.git
popd

Run the following command to initialize texlive for your user. It is needed for scimax to install correctly.

tlmgr init-usertree

Run the following commands to create the scimax command ( ~/bin/scimax on linux and macos, ~/bin/scimax.ps1 on windows), and the config file user.el that is needed for the rest of the process.

echo '(defvar *path-to-setup.org* "~/git/sparc-curation/docs/setup.org")' > vars.el
emacs --batch --load vars.el --load org --load ob-shell --eval '(org-babel-tangle-file *path-to-setup.org*)' --load ~/opt/scimax/user/user.el --eval '(org-babel-tangle-file *path-to-setup.org*)'
rm vars.el

When running the next block scimax will launch emacs and install a number of packages (DON’T PANIC). It is normal to see errors during this step. When everything finishes installing you should find yourself staring at the next section of this file, Per user setup, and can continue from there in scimax.

scimax --find-file ~/git/sparc-curation/docs/setup.org --eval "(add-hook 'window-setup-hook (lambda () (org-goto-section *section-per-user-setup*)))"

Per user setup

You should now have this file open in scimax and can run the code blocks directly by clicking on a block and typing C-c C-c (control c control c). In the default scimax setup code blocks will appear as yellow or green. Note that not all yellow blocks are source code; some may be examples. You can tell because examples won’t execute and they start with #+BEGIN_EXAMPLE instead of #+BEGIN_SRC.

All the following should be run as your user in scimax. If you run these blocks from the command line be sure to run nameref:remote-exports first.

When you run this block emacs will think for about 3 minutes as it retrieves everything. You can tell that it is thinking because the mouse cursor will show a busy indicator when you hover over emacs, and because the minibuffer at the bottom of the window will show messages to the effect of Wrote /tmp/babel-nonsense/ob-input-nonsense. If an error window appears when running this block just run it again.

You can also run this block to update an existing installation.

After running this block you can move on to the Configuration files section.

See Developer setup code in the appendix for the source for this block.

Configuration files

The config files for this section should have already been tangled to the correct locations when setup.org was tangled. If you want to see their source it is contained in the Config Templates appendix.

If the basic configuration files have been tangled correctly you should be able to run this block with C-c C-c and get results.

scig t brain

At this point installation is complete. Congratulations!

You should log out and log back in to your window manager so that any new terminal you open will have access to all the programs you just installed. Logout on the default ubuntu window manager is located in the upper right.

When you log back in run the following command to start at the next step.

scimax --find-file ~/git/sparc-curation/docs/setup.org --eval "(add-hook 'window-setup-hook (lambda () (org-goto-section *section-accounts-and-api-access*)))"

When you exit emacs it may ask you if you want to save; say yes so that the logs of the install are saved.

The next section will walk you through the steps needed to get access to all the various systems holding different pieces of data that we need.

Accounts and API access

At this point you should open your secrets.yaml file so that you can edit it as you work through the next section where you will get the various API keys that you will need to replace the fake values (seen in the template below). Direct links per platform are listed below. Clicking on the link will open it in another buffer. While editing the file you can save using the file menu, C-x C-s (emacs keys), or :w (vim keys).
Linux    ~/.config/orthauth/secrets.yaml
Macos    ~/Library/Application Support/orthauth/secrets.yaml
Windows  ~/AppData/Local/orthauth/secrets.yaml

When you are done there should be NO entries containing *replace-me-with: left in the file.

The notation (-> key1 key2 key3) indicates a path in the secrets.yaml file. In a yaml file this looks like the block below. Replace the fake-value with the real value you obtain in the following sections.

key1:
  key2:
    key3: fake-value

Pennsieve

Once you have a Pennsieve account on the sparc org go to your profile and create an API key. Put the key in (-> blackfynn sparc key) and the secret in (-> blackfynn sparc secret). While you are there you should also connect your ORCiD (button at the bottom of the page).

Google API

Enable the google sheets API from the google api dashboard. If you need other APIs you can enable them via the library page.

If you do not do this, then at the end of the client flow you will receive an invalid_client Unauthorized error.

The instructions below are probably incomplete/missing steps.

Useful docs for (-> google api creds-file):
https://developers.google.com/identity/protocols/OAuth2
https://developers.google.com/api-client-library/python/guide/aaa_oauth

You will need to get API access for an OAuth client.

  1. https://console.developers.google.com/apis/credentials
  2. create credentials -> OAuth client ID
  3. Fill in the consent screen, you only need the Application name field.
  4. Download JSON
  5. Add the name of the downloaded JSON file to (-> google api creds-file).
  6. Run googapis auth sheets and then googapis auth sheets --readonly.

Those commands will run the auth workflow and create the file specified at (-> google api store-file) for you. During the process you will be taken to (or need to paste a link to) a google login page to confirm that you want to give the google API project you created access to your account.

Google sheets

Get the document ids for the following.

  • (-> google sheets sparc-master)
  • (-> google sheets sparc-consistency)
  • (-> google sheets sparc-affiliations)
  • (-> google sheets sparc-field-alignment)

The document id matches the pattern https://docs.google.com/spreadsheets/d/{document_id}/edit.
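For example, a minimal shell helper to pull the id out of a sheet URL (the URL here is a hypothetical placeholder):

url='https://docs.google.com/spreadsheets/d/1AbCdEfG/edit#gid=0'  # placeholder url
echo "${url}" | sed -E 's|.*/spreadsheets/d/([^/]+).*|\1|'  # prints 1AbCdEfG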

protocols.io

To get protocols.io API keys create an account, login, and go to your developer page.

You will need to set the redirect uri on that page to match the redirect uri in the json below.

Use the information from that page to fill in a json file with the structure below. Add the full path to that json file to (-> protocols-io api creds-file) in secrets.yaml like you did for the google json file.

{
    "installed": {
        "client_id": "pr_live_id_fake-client-id<<<",
        "client_secret": "pr_live_sc_fake-client-secret<<<",
        "auth_uri": "https://www.protocols.io/api/v3/oauth/authorize",
        "token_uri": "https://www.protocols.io/api/v3/oauth/token",
        "redirect_uris": [
            "https://sparc.olympiangods.org/curation/"
        ]
    }
}

You will be prompted for your protocols.io email and password the first time you run.

Refresh

If you hit a refresh error because something expired, move or delete the pickle file and then run

spc report protocols

and you should be prompted to renew the token. At the moment you will also have to install and manually fix robobrowser.

Hypothes.is

As your user, install the hypothesis client in chrome.

google-chrome-stable https://chrome.google.com/webstore/detail/hypothesis-web-pdf-annota/bjfhmglciegochdpefhhlphglcehbmek

To get Hypothes.is API keys create an account, login, and go to your developer page.

Add your API key to (-> hypothesis api user-default-hypothesis).

SciGraph

For some use cases you will need access to the SciCrunch production SciGraph endpoint. Register for an account and get an api key. Edit config.yaml and update the scigraph-api-key: path: entry to point to scicrunch api name-of-user-or-name-for-the-key. Edit secrets.yaml and add the api key to (-> scicrunch api name-of-user-or-name-for-the-key).

Developer extras

Python debugger settings

POSIX

If you can use python3.7 or later (>=ubuntu-19.04) you can set the debugger invoked by breakpoint() as follows.

pip install --user pudb

Add the following to ~/.bashrc.

export PYTHONBREAKPOINT=pudb.set_trace
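A quick way to verify the hook works (it should drop you straight into pudb; quit with q):

python -c "breakpoint()"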

Windows

Sadly pudb doesn’t support windows, so we have to use ipdb instead.

pip install --user ipdb

Add the following to your powershell $profile.

$Env:PYTHONBREAKPOINT = "ipdb.set_trace"

Prevent vim from removing xattrs

~/.vimrc settings to prevent clobbering of xattrs.

augroup HasXattrs
 autocmd BufRead,BufNewFile * let x=system('getfattr ' . bufname('%')) | if len(x) | call HasXattrs() | endif
augroup END

function HasXattrs()
 " don't create new inodes
 setlocal backupcopy=yes
endfunction
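
A minimal sanity check for the setting, assuming getfattr/setfattr are installed (the attribute name here is arbitrary):

touch /tmp/xattr-test
setfattr -n user.test -v hello /tmp/xattr-test
vim /tmp/xattr-test            # make an edit and save
getfattr -d /tmp/xattr-test    # user.test should still be present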

One shot

These bits are os specific setup instructions that need to be run as root. They only need to be run once.

Gentoo

app-editors/emacs
app-editors/gvim
app-text/texlive
dev-vcs/git
dev-scheme/racket
dev-lisp/sbcl
www-client/google-chrome-stable

Ubuntu

18.10 cosmic cuttlefish (and presumably other debian derivatives)

The following need to be run in a shell where you have root (e.g. via sudo su -).

apt install openssh-server net-tools

Add your ssh public key to ~/.ssh/authorized_keys if you want to run this remotely.

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' \
>> /etc/apt/sources.list.d/google-chrome.list
add-apt-repository ppa:plt/racket
add-apt-repository ppa:kelleyk/emacs
add-apt-repository ppa:pypy/ppa
apt update
apt install build-essential lib64readline-dev rxvt-unicode htop attr tree sqlite curl git
apt install emacs26 vim-gtk3 texlive-full pandoc hunspell
apt install librdf0-dev python3-dev python3-pip pypy3 jupyter racket sbcl r-base r-base-dev maven
apt install inkscape gimp krita graphviz firefox google-chrome-stable xfce4
apt install nginx
update-alternatives --install /usr/bin/python python /usr/bin/python3 10
update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 10

Ubuntu struggles to set user specific PATHs correctly via ~/.profile. The following code works when the user logs in. It does not work correctly if you su to the user (not entirely sure why), and apparently doesn’t work on xfce either. The absolute madness.

{ cat <<'EOL'
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi

# set PATH so it includes user's private .local/bin if it exists
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi
EOL
} > /etc/profile.d/user-home-paths.sh
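
After logging in as the user (not via su) you can confirm the paths took effect:

# prints the two user bin directories if they made it into PATH
echo "$PATH" | tr ':' '\n' | grep -E "^${HOME}/(bin|\.local/bin)$"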

Other software that you will probably need at some point but that is not packaged on ubuntu.

Windows

Config

Environment variables

In order for Emacs to work in a way that is similar to other operating systems the HOME environment variable needs to be set.

setx HOME %USERPROFILE%
setx PATH "C:\Program Files\Git\bin;%PATH%"
Group policy for file system issues
Long Paths

Datasets in SPARC often have long names. Windows 10 default settings break long paths in Python. To fix this enable win32 long paths.

Microsoft Documentation

You can use gpedit.msc to grant these permissions by adding the user by navigating the menu tree below. You can run gpedit.msc directly with Win-r or often Win gpedit enter.

Computer configuration
└── Administrative Templates
    └── System
        └── Filesystem
            Enable Win32 long paths

Double click Enable Win32 long paths to open the settings dialog, select the Enabled radio button on the left, and then click OK.

Alternately you can set the registry value directly from an Administrator powershell.

Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem -Name LongPathsEnabled -Value 1

Any program that is run after this is enabled will work as expected if it supports long paths. No restart or logout is required.

Symlinks

augpathlib makes extensive use of symlinks to store metadata for remote files that have not been downloaded. By default normal users cannot create symlinks on windows. The best way to fix this is by granting the user that will run sparcur permission to create symlinks (NOT to run the process as Administrator).

Three relevant links: stackoverflow superuser powershell script source.

You will need to log out and log back in for the setting to take effect.

You can use gpedit.msc to grant these permissions by adding the user by navigating the menu tree below. You can run gpedit.msc directly with Win-r or often Win gpedit enter.

Computer configuration
└── Windows Settings
    └── Security Settings
        └── Local Policies
            └── User Rights Assignment
                Create symbolic links

Alternately you can define and run the function below as Administrator. Run it as addSymLinkPermissions("user-to-add").

function addSymLinkPermissions($accountToAdd){
    Write-Host "Checking SymLink permissions.."
    $sidstr = $null
    try {
        $ntprincipal = new-object System.Security.Principal.NTAccount "$accountToAdd"
        $sid = $ntprincipal.Translate([System.Security.Principal.SecurityIdentifier])
        $sidstr = $sid.Value.ToString()
    } catch {
        $sidstr = $null
    }
    Write-Host "Account: $($accountToAdd)" -ForegroundColor DarkCyan
    if( [string]::IsNullOrEmpty($sidstr) ) {
        Write-Host "Account not found!" -ForegroundColor Red
        exit -1
    }
    Write-Host "Account SID: $($sidstr)" -ForegroundColor DarkCyan
    $tmp = [System.IO.Path]::GetTempFileName()
    Write-Host "Export current Local Security Policy" -ForegroundColor DarkCyan
    secedit.exe /export /cfg "$($tmp)" 
    $c = Get-Content -Path $tmp 
    $currentSetting = ""
    foreach($s in $c) {
        if( $s -like "SECreateSymbolicLinkPrivilege*") {
            $x = $s.split("=",[System.StringSplitOptions]::RemoveEmptyEntries)
            $currentSetting = $x[1].Trim()
        }
    }
    if( $currentSetting -notlike "*$($sidstr)*" ) {
        Write-Host "Need to add permissions to SymLink" -ForegroundColor Yellow
        
        Write-Host "Modify Setting ""Create SymLink""" -ForegroundColor DarkCyan

        if( [string]::IsNullOrEmpty($currentSetting) ) {
            $currentSetting = "*$($sidstr)"
        } else {
            $currentSetting = "*$($sidstr),$($currentSetting)"
        }
        Write-Host "$currentSetting"
    $outfile = @"
[Unicode]
Unicode=yes
[Version]
signature="`$CHICAGO`$"
Revision=1
[Privilege Rights]
SECreateSymbolicLinkPrivilege = $($currentSetting)
"@
    $tmp2 = [System.IO.Path]::GetTempFileName()
        Write-Host "Import new settings to Local Security Policy" -ForegroundColor DarkCyan
        $outfile | Set-Content -Path $tmp2 -Encoding Unicode -Force
        Push-Location (Split-Path $tmp2)
        try {
            secedit.exe /configure /db "secedit.sdb" /cfg "$($tmp2)" /areas USER_RIGHTS 
        } finally { 
            Pop-Location
        }
    } else {
        Write-Host "NO ACTIONS REQUIRED! Account already in ""Create SymLink""" -ForegroundColor DarkCyan
        Write-Host "Account $accountToAdd already has permissions to SymLink" -ForegroundColor Green
        return $true;
    }
}

ssh

You can skip this if you will only be using the windows computer locally. In a local administrator powershell install OpenSSH. The rest can then be done remotely.

Get-WindowsCapability -Online | ? Name -like 'OpenSSH*'
Add-WindowsCapability -Online -Name OpenSSH.Client~~~~0.0.1.0
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Set-Service sshd -StartupType Automatic
Start-Service sshd
# add your ssh key to %programdata%\ssh\administrators_authorized_keys
# disable password login in %programdata%\ssh\sshd_config
Restart-Service sshd

Set default login shell.

New-ItemProperty -Path "HKLM:\SOFTWARE\OpenSSH" -Name DefaultShell -Value "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -PropertyType String -Force

Package manager

For managing a windows development/curation environment I highly recommend using the chocolatey package manager. Install chocolatey.

choco install `
autohotkey `
clisp `
emacs `
firefox `
GoogleChrome `
nodejs `
poshgit `
procexp `
python `
racket `
rsync `
vim

Update system Path to include packages that don’t add themselves. This needs to be run as administrator.

$path = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine)
$prefix_path = "C:\Program Files\Racket;C:\Program Files\Git\cmd;C:\Program Files\Git\bin;"
[Environment]::SetEnvironmentVariable("Path",
                                      $prefix_path + $path,
                                      [EnvironmentVariableTarget]::Machine)

If you are logged in remotely restarting sshd is the easiest way to refresh the environment so commands are in PATH. This is because new shells inherit the environment of sshd at the time that it was started.

Restart-Service sshd

You will need to reconnect to a new ssh session in order to have access to git and other newly installed commands.

Manual install

texlive

https://www.tug.org/texlive/windows.html
https://www.tug.org/texlive/acquire-netinstall.html
http://mirror.ctan.org/systems/texlive/tlnet/install-tl-windows.exe

This takes quite a while, about 50 mins on a good connection with a fast computer.

protege

https://github.com/protegeproject/protege-distribution/releases/latest

redland

rdf tools
http://librdf.org/raptor/INSTALL.html
https://github.com/dajobe/raptor

Unfortunately to get the latest version of these it seems you have to build them yourself.

old

add to PATH so we can just link everything there %HOMEPATH%\bin %APPDATA%\Python\Python37\Scripts

TODO -l %HOMEPATH%/opt/scimax/init.el setup.org in the shortcut … also %HOMEPATH% for the start in …

OS X

ssh

You can skip this if you will only be using the osx computer locally.

sudo systemsetup -setremotelogin on
# scp your key over to ~/.ssh/authorized_keys
# set PasswordAuthentication no in /etc/ssh/sshd_config
# set ChallengeResponseAuthentication no in /etc/ssh/sshd_config
sudo launchctl unload  /System/Library/LaunchDaemons/ssh.plist
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist

Package manager

Install homebrew.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/5ecca39372cffdc4c9fbacee6e22328a0dc61eac/install)"
brew install --cask \
docker \
emacs \
firefox \
gimp \
google-chrome \
inkscape \
krita \
mactex \
macvim \
protege \
racket

brew install \
coreutils \
curl \
git \
htop \
hunspell \
libmagic \
pandoc \
postgres \
pyenv \
python \
redland \
rxvt-unicode \
sbcl \
sqlite \
tree

Add the following to your ~/.bash_profile

# This file is sourced by bash for login shells.  The following line
# runs your .bashrc and is recommended by the bash info pages.
[[ -f ~/.bashrc ]] && . ~/.bashrc

Add the following to your ~/.bashrc

export PATH=${HOME}/bin:${HOME}/Library/Python/3.7/bin:${PATH}

Run the following to symlink python3 to python

mkdir ~/bin
ln -s /usr/local/bin/python3 ~/bin/python
ln -s /usr/local/bin/pip3 ~/bin/pip

Workflows

General

Updating an installation

pushd ~/git
for d in */; do if [ -d "${d}/.git" ]; then pushd "${d}"; git pull || { popd; break; }; popd; fi; done
popd
function Git-Pull-All {
    if($pwd.Path -eq $HOME) {
        pushd ~/git }
    foreach($p in Get-ChildItem -directory) {
        if($p.GetDirectories(".git")) {
            pushd $p; git pull; popd } } }

SPARC

WARNINGS

  1. DO NOT USE cp -a to copy files with xattrs!
     INSTEAD use rsync -X -u -v (see the check below).
     cp does not remove absent fields from the xattrs of the file previously occupying that name! OH NO (is this a cp bug!?)
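
A minimal check that xattrs survived a copy, assuming getfattr is installed (both paths are placeholders):

rsync -X -u -v /path/to/src-file /path/to/dest-file
# compare the xattr dumps, skipping the differing "# file:" header lines
diff <(getfattr -d /path/to/src-file | tail -n +2) \
     <(getfattr -d /path/to/dest-file | tail -n +2)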

Get data

If you have never retrieved the data before run the following.

pushd ~/files/blackfynn_local/
spc clone ${SPARC_ORG_ID}  # initialize a new repo and pull existing structure
spc refresh -f
spc fetch  # actually download files
spc find -n '*.xlsx' -n '*.csv' -n '*.tsv' -n '*.msexcel'  # see what to fetch
spc find -n '*.xlsx' -n '*.csv' -n '*.tsv' -n '*.msexcel' -f  # fetch
spc find -n '*.xlsx' -n '*.csv' -n '*.tsv' -n '*.msexcel' -f -r 10  # slow down you are seeing errors!
ls -Q | xargs -P10 -r -n 1 sh -c 'spc refresh -r 4 "${1}"' _
find -maxdepth 1 -type d -name '[C-Z]*' -exec spc refresh -r 8 {} \;
find \( -name '*.xlsx' -o -name '*.csv' -o -name '*.tsv' \) -exec ls -hlS {} \+

Open the dataset page for all empty directories in the browser.

find -maxdepth 1 -type d -empty -exec spc pull {} \+
find -maxdepth 1 -type d -empty -exec spc meta -u --browser {} \+
find -maxdepth 1 -type d -empty -exec rmdir {} \;
find -maxdepth 1 -type d -exec getfattr -n user.bf.id {} \;

Pull local copy of data to a new computer. Note the double escape needed for the space.

rsync -X -u -v -r -e ssh ${REMOTE_HOST}:/home/${DATA_USER}/files/blackfynn_local/SPARC\\\ Consortium ~/files/blackfynn_local/

  • -X copy extended attributes
  • -u update files
  • -v verbose
  • -r recursive
  • -e remote shell to use

Fetch missing files

Fetch a whole dataset or a subset of a dataset.

spc ** -f

Export

pushd ${SPARCDATA}
spc export
popd

Setup as root

mkdir -p /var/www/sparc/sparc/archive/exports/
chown -R nginx:nginx /var/www/sparc
# export vs exports, no wonder this is so confusing >_<
function sparc-export-to-server () {
    : ${SPARCUR_EXPORTS:=/var/lib/sparc/.local/share/sparcur/export}
    EXPORT_BASE=${SPARCUR_EXPORTS}/N:organization:618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0/integrated/
    FOLDERNAME=$(readlink ${EXPORT_BASE}/LATEST)
    FULLPATH=${EXPORT_BASE}/${FOLDERNAME}
    pushd /var/www/sparc/sparc
    cp -a "${FULLPATH}" archive/exports/ && chown -R nginx:nginx archive && unlink exports ; ln -sT "archive/exports/${FOLDERNAME}" exports
    popd
    echo Export complete. Check results at:
    echo fill-in-the-url-here
}

Export and report

You can’t run this directly because the venvs create their own subshell.

# git repos are in ~/files/venvs/sparcur-dev/git
# use the development pull code
source ~/files/venvs/sparcur-dev/bin/activate
spc pull
# switch to the production export pipeline
source ~/files/venvs/sparcur-1/bin/activate
spc export
<<&sparc-export-to-server-function>>
sparc-export-to-server
function fetch-and-run-reports () {
    local FN="/tmp/curation-export-$(date -Is).json"
    curl https://cassava.ucsd.edu/sparc/preview/exports/curation-export.json -o "${FN}"
    spc sheets update Organs --export-file "${FN}"
    spc report all --sort-count-desc --to-sheets --export-file "${FN}"
}
fetch-and-run-reports

Export v3

function preview-sparc-export-to-server () {
    : ${SPARCUR_EXPORTS:=/var/lib/sparc/.local/share/sparcur/export}
    EXPORT_BASE=${SPARCUR_EXPORTS}/618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0/integrated/
    FOLDERNAME=$(readlink ${EXPORT_BASE}/LATEST)
    FULLPATH=${EXPORT_BASE}/${FOLDERNAME}
    pushd /var/www/sparc/sparc/preview
    cp -a "${FULLPATH}" archive/exports/ && chown -R nginx:nginx archive && unlink exports ; ln -sT "archive/exports/${FOLDERNAME}" exports
    popd
    echo Export complete. Check results at:
    echo https://cassava.ucsd.edu/sparc/preview/archive/exports/${FOLDERNAME}
}

Shared state on the file system is evil because there may be multiple processes reading and writing it. The way to mitigate the issue is to run everything locally with a read only local cache for certain files.

function preview-export-rest () {
    local DATE1=${1} # 2021-03-09T17\:26\:54\,980772-08\:00  # from spc export
    local DATE2=${2} # 2021-03-09T164046,487692-0800  # from the path created by sparc-get-all-remote-data
    cp -a /var/lib/sparc/.local/share/sparcur/export/protcur/LATEST/protcur.ttl /var/www/sparc/sparc/preview/archive/exports/${DATE1}/  # this may not update and should be versioned independently
    cp -a /var/lib/sparc/files/${DATE2}/exports/datasets /var/www/sparc/sparc/preview/archive/exports/${DATE1}/path-metadata  # NOTE these will not change unless the files or the code/format change
    chown -R nginx:nginx /var/www/sparc/sparc/preview/archive/exports/${DATE1}/
}
# git repos are in ~/files/venvs/sparcur-dev/git
# use the development pull code
source ~/files/venvs/sparcur-dev/bin/activate
source ~/files/venvs/sparcur-dev/git/sparc-curation/bin/pipeline-functions.sh
export PYTHONBREAKPOINT=0  # ensure that breakpoints do not hang export
pushd ~/files/
PARENT_PATH=$(sparc-time-friendly)
sparc-get-all-remote-data \
    --symlink-objects-to ~/files/blackfynn_local/SPARC\ Consortium_20200601/.operations/objects/ \
    --parent-path "${PARENT_PATH}"
pushd "${PARENT_PATH}/SPARC Consortium"
spc export
find -maxdepth 1 -type d -not -path '.operations*' -not -path '.' -print0 | \
     xargs -0 -I{} -P8 -r -n 1 python -m sparcur.simple.path_metadata_validate --export-path ../exports/ {}
pushd ~/.local/share/sparcur/export/618*/integrated/LATEST/; python -m sparcur.export.published; popd
echo "${PARENT_PATH}"
unset PARENT_PATH

An example. Get DATE1 from spc export or from the output of preview-sparc-export-to-server. Get DATE2 from the file system path created by the initial call to sparc-get-all-remote-data. Export time is usually later than parent time.

preview-sparc-export-to-server
preview-export-rest ${EXPORT_PATH_TIME} ${PARENT_PATH_TIME}

Export published

Generate curation-export-published.ttl for existing exports.

pushd /var/www/sparc/sparc/preview/archive/exports
find -maxdepth 1 -type d -exec sudo chown $UID:$UID {} \;
find -name curation-export.ttl -execdir python -m sparcur.export.published \;
find -maxdepth 1 -type d -exec sudo chown -R nginx:nginx {} \;
popd

Reporting

turtle diff
spc report changes \
--ttl-file https://cassava.ucsd.edu/sparc/preview/archive/exports/2021-05-25T125039,817048-0700/curation-export.ttl \
--ttl-compare https://cassava.ucsd.edu/sparc/preview/archive/exports/2021-05-24T141309,920776-0700/curation-export.ttl
spc report completeness
spc server --latest --count
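# asdf below is assumed to be the blob loaded from a curation export, e.g.
# import json; asdf = json.load(open('curation-export.json'))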
keywords = sorted(set([k for d in asdf['datasets'] if 'meta' in d and 'keywords' in d['meta']
                       for k in d['meta']['keywords']]))

Queries

Human datasets queries

import rdflib
from pyontutils.core import OntResIri
from pyontutils.namespaces import sparc, TEMP, dc, rdfs

ori = OntResIri('https://cassava.ucsd.edu/sparc/exports/curation-export.ttl')
g = ori.graph
gns = g.namespace_manager

def fmt(s, u):
    return f'[[{u}][{s.n3(gns)}]]'

species = set([fmt(do, urih) for s, p, o in g
              if isinstance(o, rdflib.Literal) and
              ('human' in o.lower() or 'homo' in o.lower()) and
              p == sparc.animalSubjectIsOfSpecies
              for do in g[s:TEMP.hasDerivedInformationAsParticipant]
              for urih in g[do:TEMP.hasUriHuman]])

hlabel = set([fmt(s, urih) for s, p, o in g
             if isinstance(o, rdflib.Literal) and
             ('human' in o.lower() or 'homo' in o.lower()) and
             p == rdfs.label
             for urih in g[s:TEMP.hasUriHuman]])

htitle = set([fmt(s, urih) for s, p, o in g
              if isinstance(o, rdflib.Literal) and
              ('human' in o.lower() or 'homo' in o.lower()) and
              p == dc.title
              for urih in g[s:TEMP.hasUriHuman]])

htd = set([fmt(s, urih) for s, p, o in g
           if isinstance(o, rdflib.Literal) and
           ('human' in o.lower() or 'homo' in o.lower()) and
           (p == dc.title or p == dc.description)
           for urih in g[s:TEMP.hasUriHuman]])

counts = dict(species=len(species),
              label=len(hlabel),
              title=len(htitle),
              title_and_desc=len(htd))

[print(_ + r' \\') for _ in ['species n= ' + str(counts['species'])] +
sorted(species) +
['label n= ' + str(counts['label'])] +
sorted(hlabel) +
['title n= ' + str(counts['title'])] +
sorted(htitle) +
['td n= ' + str(counts['title_and_desc'])] +
sorted(htd)]

Archiving files with xattrs

tar is the only one of the ‘usual’ suspects for file archiving that supports xattrs; zip cannot.
tar --force-local --xattrs -cvzf 2019-07-17T10\:44\:16\,457344.tar.gz '2019-07-17T10:44:16,457344/'
tar --force-local --xattrs -xvzf 2019-07-17T10\:44\:16\,457344.tar.gz
find 2019-07-17T10\:44\:16\,457344 -exec getfattr -d {} \;

Archiving releases

in place

Manually remove the echo after checking that you are removing what you expect.

pushd /var/www/sparc/sparc/
    pushd archive/exports
        find -maxdepth 1 -not -path '.' -type d -exec tar -cvJf '{}.tar.xz' '{}' \;
        chown nginx:nginx *.tar.xz
        # remove all but the one currently symlinked to exports
        find -maxdepth 1 -not -path '.' -not -path "*$(basename $(readlink ../../exports))*" -type d -exec echo rm -r '{}' \;
    popd

    pushd preview/archive/exports
        find -maxdepth 1 -not -path '.' -type d -newer $(ls -At *.tar.xz | head -n 1) -exec tar -cvJf '{}.tar.xz' '{}' \;
        chown nginx:nginx *.tar.xz
        # remove previous years
        find -maxdepth 1 -not -path '.' -not -path "*$(date +%Y)-*" -type d -exec echo rm -r '{}' \+
        # remove all the but most recent 8 folders
        find -maxdepth 1 -not -path '.' -type d | sort -u | head -n -8 | xargs echo rm -r
    popd

elsewhere

pushd /path/to/backup
rsync -z -v -r -e ssh cassava:/var/www/sparc sparc-$(date -I)
pushd /path/to/backup
pushd sparc-*/sparc/archive/exports
find -maxdepth 1 -not -path '.' -type d -exec tar -cvJf '{}.tar.xz' '{}' \;
find -maxdepth 1 -not -path '.' -type d -exec rm -r '{}' \;
popd
pushd sparc-*/sparc/preview/archive/exports
find -maxdepth 1 -not -path '.' -type d -exec tar -cvJf '{}.tar.xz' '{}' \;
find -maxdepth 1 -not -path '.' -type d -exec rm -r '{}' \;
popd

Other random commands

Duplicate top level and ./.operations/objects

function sparc-copy-pull () {
    : ${SPARC_PARENT:=${HOME}/files/blackfynn_local/}
    local TODAY=$(date +%Y%m%d)
    pushd ${SPARC_PARENT} &&
        mv SPARC\ Consortium "SPARC Consortium_${TODAY}" &&
        rsync -ptgo -A -X -d --no-recursive --exclude=* "SPARC Consortium_${TODAY}/"  SPARC\ Consortium &&
        mkdir SPARC\ Consortium/.operations &&
        mkdir SPARC\ Consortium/.operations/trash &&
        rsync -X -u -v -r "SPARC Consortium_${TODAY}/.operations/objects" SPARC\ Consortium/.operations/ &&
        pushd SPARC\ Consortium &&
        spc pull || echo "spc pull failed"
    popd
    popd
}

Simplified error report

jq -r '[ .datasets[] |
         {id: .id,
          name: .meta.folder_name,
          se: [ .status.submission_errors[].message ] | unique,
          ce: [ .status.curation_errors[].message   ] | unique } ]' curation-export.json

File extensions

List all file extensions

Get a list of all file extensions.

find -type l -o -type f | grep -o '\(\.[a-zA-Z0-9]\+\)\+$' | sort -u
Get ids with files matching a specific extension

Arbitrary information about a dataset with files matching a pattern. The example here gives ids for all datasets that contain xml files. Nesting find -exec does not work, so the first pattern here uses shell globbing to get the datasets.

function datasets-matching () {
    for d in */; do
        find "$d" \( -type l -o -type f \) -name "*.$1" \
        -exec getfattr -n user.bf.id --only-values "$d" \; -printf '\n' -quit ;
    done
}
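
Example usage, listing the ids of datasets that contain xml files:

datasets-matching xml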
Fetch files matching a specific pattern

Fetch files that have zero size (indication that fetch is broken).

find -type f -name '*.xml' -empty -exec spc fetch {} \+

Sort of manifest generation

This is slow, but prototypes functionality useful for the curators.
find -type d -not -name 'ephys' -name 'ses-*' -exec bash -c \
'pushd $1 1>/dev/null; pwd >> ~/manifest-stuff.txt; spc report size --tab-table ./* >> ~/manifest-stuff.txt; popd 1>/dev/null' _ {} \;

Path ids

This one is fairly slow, but is almost certainly i/o limited due to having to read the xattrs. Maintaining the backup database of the mappings would make this much faster.

# folders and files
find . -not -type l -not -path '*operations*' -exec getfattr -n user.bf.id --only-values {} \; -print
# broken symlink format, needs work, hard to parse
find . -type l -not -path '*operations*' -exec readlink -n {} \; -print

Path counts per dataset

for d in */; do printf "$(find "${d}" -print | wc -l) "; printf "$(getfattr --only-values -n user.bf.id "${d}") ${d}\n" ; done | sort -n

Debug units serialization

Until we fix compound units parsing for the round trip we might accidentally encounter an error along the lines of ValueError: Unit expression cannot have a scaling factor.

jq -C '.. | .units? // empty' /tmp/curation-export-*.json | sort -u

protocols cache

pushd ~/.cache/idlib
mv protocol_json protocol_json-old
# run export
find protocol_json -size -2 -exec cat {} \+
# check to make sure that there weren't any manually provided caches
find protocol_json -size -2 -execdir cat ../protocol_json-old/{} \;

SODA

Have to clone SODA and fetch the files for testing.

from pprint import pprint
import pysoda
from sparcur.paths import Path
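# parent_folder and path are placeholders pointing at the cloned SODA test data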
p = Path(parent_folder, path).expanduser().resolve()
children = list(p.iterdir())
blob = pysoda.create_folder_level_manifest(
    {p.resolve().name: children},
    {k.name + '_description': ['some description'] * len(children)
     for k in [p] + list(p.iterdir())})
manifest_path = Path(blob[p.name][-1])
manifest_path.xopen()
pprint(manifest_path)

Developer

See also the sparcur developer guide

Releases

DatasetTemplate

Commit any changes and push to master.

make-template-zip () {
    local CLEANROOM=/tmp/cleanroom/
    mkdir ${CLEANROOM} || return 1
    pushd ${CLEANROOM}
    git clone https://github.com/SciCrunch/sparc-curation.git &&
    pushd ${CLEANROOM}/sparc-curation/resources
    zip -x '*.gitkeep' -r DatasetTemplate.zip DatasetTemplate
    mv DatasetTemplate.zip ${CLEANROOM}
    popd
    rm -rf ${CLEANROOM}/sparc-curation
    popd
}
make-template-zip

Once that is done open /tmp/cleanroom/DatasetTemplate.zip in file-roller or similar and make sure everything is as expected.

Create the GitHub release. The tag name should have the format dataset-template-1.1 where the version number should match the metadata version embedded in dataset_description.xlsx. Minor versions such as dataset-template-1.2.1 are allowed.

Attach ${CLEANROOM}/DatasetTemplate.zip as a release asset. Update https://github.com/Pennsieve/docs.sparc.science/blob/master/pages/data_submission/submit_data.md and https://github.com/Pennsieve/docs.sparc.science/blob/master/pages/sparc_portal/sparc_data_format.md with the new link. Link to the local copy.

Getting to know the codebase

Use inspect.getclasstree along with pyontutils.utils.subclasses to display hierarchies of classes.
from inspect import getclasstree
from pyontutils.utils import subclasses
from IPython.lib.pretty import pprint

# classes to inspect
import pathlib
from sparcur import paths

def class_tree(root):
    return getclasstree(list(subclasses(root)))

pprint(class_tree(pathlib.PurePosixPath))

Viewing logs

View the latest log file with colors using less.
less -R $(ls -d ~sparc/files/blackfynn_local/export/log/* | tail -n 1)

For a permanent fix add the following alias to your shell startup file.

alias less='less -R'
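
To make the alias permanent you can append it to your ~/.bashrc (assuming bash):

echo "alias less='less -R'" >> ~/.bashrc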

Debugging terminal pipeline errors

You have an error!
maybe_size = c.cache.meta.size  # << AttributeError here

Modify to wrap code

try:
    maybe_size = c.cache.meta.size
except AttributeError as e:
    breakpoint()  # << investigate error

Temporarily squash the error by logging it as an exception, with an optional explanation.

try:
    maybe_size = c.cache.meta.size
except AttributeError as e:
    log.exception(e)
    log.error(f'explanation for error and local variables {c}')

Dataset removed

If a dataset is removed, just move it manually to the trash IF it is clear that it was supposed to be removed; otherwise consult the curation team. You can confirm that it was actually removed by checking Pennsieve directly using the DATASETID from the error trace.
spc meta -u "$(spc goto ${DATASETID})"

Example trace.

Future exception was never retrieved
future: <Future finished exception=Exception("No dataset matching name or ID 'N:dataset:83e0ebd2-dae2-4ca0-ad6e-81eb39cfc053'.",)>
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/lib/sparc/git/pyontutils/pyontutils/utils.py", line 416, in <lambda>
    generator = (lambda:list(limited_gen(chunk, smooth_offset=(i % lc)/lc, time_est=time_est, debug=debug, thread=i))  # this was the slowdown culpret
  File "/var/lib/sparc/git/pyontutils/pyontutils/utils.py", line 455, in limited_gen
    yield element()
  File "/var/lib/sparc/git/pyontutils/pyontutils/utils.py", line 376, in inner
    return function(*args, **kwargs)
  File "/var/lib/sparc/git/sparc-curation/sparcur/paths.py", line 1156, in refresh
    size_limit_mb=size_limit_mb)
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 816, in refresh
    old_meta = self.meta
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 872, in meta
    return PathMeta(size=self.size,
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 603, in size
    if isinstance(self.bfobject, File):
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 401, in bfobject
    bfobject = self._api.get(self._seed)
  File "/var/lib/sparc/git/sparc-curation/sparcur/blackfynn_api.py", line 795, in get
    thing = self.bf.get_dataset(id)  # heterogenity is fun!
  File "/var/lib/sparc/.local/lib/python3.6/site-packages/blackfynn/client.py", line 231, in get_dataset
    raise Exception("No dataset matching name or ID '{}'.".format(name_or_id))
Exception: No dataset matching name or ID 'N:dataset:83e0ebd2-dae2-4ca0-ad6e-81eb39cfc053'.
sparc@cassava:~/files/blackfynn_local/SPARC Consortium$ spc goto 'N:dataset:83e0ebd2-dae2-4ca0-ad6e-81eb39cfc053'
Hackathon Team Materials
sparc@cassava:~/files/blackfynn_local/SPARC Consortium$ mv Hackathon\ Team\ Materials ../.trash/
sparc@cassava:~/files/blackfynn_local/SPARC Consortium$ spc pull

Variables

If you make any changes to this section be sure to run the source blocks and #+CALL: blocks below.

GitHub repositories

augpathlib idlib hyputils orthauth ontquery parsercomb pyontutils protc rrid-metadata rkdf orgstrap racket-breadcrumb racket-json-view
NIF-Ontology scibot sparc-curation
Ophirr33/pda zussitarze/qrcode

Repository local roots. The ordering of the entries matters.

augpathlib idlib pyontutils/htmlfn pyontutils/ttlser hyputils orthauth ontquery parsercomb pyontutils pyontutils/nifstd pyontutils/neurondm protc/protcur sparc-curation scibot
qrcode/ pda/ protc/protc-lib protc/protc-tools-lib protc/protc protc/protc-tools rkdf/rkdf-lib rkdf/rkdf rrid-metadata/rrid sparc-curation/sparcur_internal/sparcur NIF-Ontology/

Make repos

from itertools import chain
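# trl, srl, and orl are assumed to be bound via org-babel :var headers
# from the repository tables in the Variables section above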
urs = chain((('tgbugs', r) for tr in trl for rs in tr for r in rs.split(' ')),
            (('SciCrunch', r) for sr in srl for rs in sr for r in rs.split(' ')),
            (ur.split('/') for o_r in orl for urs in o_r for ur in urs.split(' ')))
#print(trl, srl, orl)
#print(list(urs))  # will express the generator so there will be no result

out = []
for user, repo in urs:
    out.append(f'https://github.com/{user}/{repo}')
return [' '.join(out)]

Variables testing

for repo in ${REPOS}; do echo ${repo}; done
echo '-------------'
for repo in ${PYROOTS}; do echo ${repo}; done
echo '-------------'
for repo in ${RKTROOTS}; do echo ${repo}; done

Remote exports code

export REPOS='
https://github.com/tgbugs/augpathlib
https://github.com/tgbugs/idlib
https://github.com/tgbugs/hyputils
https://github.com/tgbugs/orthauth
https://github.com/tgbugs/ontquery
https://github.com/tgbugs/parsercomb
https://github.com/tgbugs/pyontutils
https://github.com/tgbugs/protc
https://github.com/tgbugs/rrid-metadata
https://github.com/tgbugs/rkdf
https://github.com/tgbugs/orgstrap
https://github.com/tgbugs/racket-breadcrumb
https://github.com/tgbugs/racket-json-view
https://github.com/SciCrunch/NIF-Ontology
https://github.com/SciCrunch/scibot
https://github.com/SciCrunch/sparc-curation
https://github.com/Ophirr33/pda
https://github.com/zussitarze/qrcode
'
export PYROOTS='
augpathlib
idlib
pyontutils/htmlfn
pyontutils/ttlser
hyputils
orthauth
ontquery
parsercomb
pyontutils
pyontutils/nifstd
pyontutils/neurondm
protc/protcur
sparc-curation
scibot
'
export RKTROOTS='
qrcode/
pda/
protc/protc-lib
protc/protc-tools-lib
protc/protc
protc/protc-tools
rkdf/rkdf-lib
rkdf/rkdf
rrid-metadata/rrid
NIF-Ontology/
'

Appendix

Code

Config Templates

To get up-to-date versions of these run
mkdir /tmp/fakehome
HOME=/tmp/fakehome python -m sparcur.cli
less /tmp/fakehome/.config/*/*.yaml

~/.config/idlib/config.yaml

auth-stores:
  secrets:
    path: '{:user-config-path}/orthauth/secrets.yaml'
auth-variables:
  cache-path:
  log-path:
  protocols-io-api-creds-file: protocols-io api creds-file
  protocols-io-api-store-file: protocols-io api store-file

~/.config/pyontutils/config.yaml

auth-stores:
  secrets:
    path: '{:user-config-path}/orthauth/secrets.yaml'
auth-variables:
  curies:
  git-local-base: ~/git
  git-remote-base:
  google-api-creds-file:
    path: google api creds-file
  google-api-store-file:
    path: google api store-file
  google-api-store-file-readonly:
    path: google api store-file-readonly
  nifstd-checkout-ok:
  ontology-local-repo:
  ontology-org:
  ontology-repo:
  patch-config:
  resources:
  scigraph-api: https://scigraph.olympiangods.org/scigraph
  scigraph-api-key:
  scigraph-graphload:
  scigraph-services:
  zip-location:

~/.config/sparcur/config.yaml

auth-stores:
  secrets:
    path: '{:user-config-path}/orthauth/secrets.yaml'
auth-variables:
  cache-path:
  datasets-no:
  datasets-noexport:
  datasets-sparse:
  datasets-test:
  export-path:
  google-api-service-account-file-readonly: google api saro
  google-api-service-account-file-rw:
  hypothesis-api-key: hypothesis api default-user
  hypothesis-group: hypothesis group sparc-curation
  hypothesis-user:
  log-path:
  never-update:
  preview:
  remote-backoff-factor:
  remote-cli-path:
  remote-organization: N:organization:618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0
  resources:
  sparse-limit:

Secrets template for full development setup. ~/.config/orthauth/secrets.yaml

pennsieve:
  N:organization:618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0:
    key: *replace-me-with:your-pennsieve-api-key*
    secret: *replace-me-with:your-pennsieve-api-secret*
google:
  api:
    creds-file: *replace-me-with:/path/to/creds-file.json*
    store-file: google-api-token-rw.pickle
    store-file-readonly: google-api-token.pickle
  sheets:
    sparc-consistency: *replace-me-with:document-hash-id*
    sparc-master: *replace-me-with:document-hash-id*
    sparc-affiliations: *replace-me-with:document-hash-id*
    sparc-field-alignment: *replace-me-with:document-hash-id*
    spc-reports: *replace-me-with:document-hash-id*
    spc-reports-preview: *replace-me-with:document-hash-id*
    anno-tags: *replace-me-with:document-hash-id*
hypothesis:
  api:
    user-default-hypothesis: *replace-me-with:your-hypothesis-api-key*
  group:
    sparc-curation: *replace-me-with:sparc-curation-group-id*
protocols-io:
  api:
    creds-file: *replace-me-with:/path/to/creds-file.json*
    store-file: protocols-io-api-token-rw.pickle

Secrets template for minimal viewer setup. ~/.config/orthauth/secrets.yaml

pennsieve:
  N:organization:618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0:
    key: *replace-me-with:your-pennsieve-api-key*
    secret: *replace-me-with:your-pennsieve-api-secret*
google:
  api:
    saro: *replace-me-with:path-to-service-account.json*
  sheets:
    sparc-field-alignment: *replace-me-with:document-hash-id*
    sparc-affiliations: *replace-me-with:document-hash-id*
    anno-tags: *replace-me-with:document-hash-id*
protocols-io:
  # FIXME robobrowser not a dependency in some setup.py
  # FIXME robobrowser has a broken werkzeug import that has to be fixed manually
  api:
    creds-file: client_secret_protocols.io.json
    store-file: protocols-io-api-token-rw.pickle

Secrets template for minimal SODA sparcur.simple.retrieve setup. ~/.config/orthauth/secrets.yaml

pennsieve:
  N:organization:618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0:
    key: *replace-me-with:your-pennsieve-api-key*
    secret: *replace-me-with:your-pennsieve-api-secret*

Bootstrap code

user.el

Tangle the following blocks with C-c C-v C-t in vanilla emacs, or paste them into scimax’s user.el.

;; silence ob-ipython complaining about missing command
;; THIS CAN CAUSE RUNTIME ERRORS
(setq ob-ipython-html-to-image-program "/dev/null")
(defun config-paths (&optional os)
  (cl-case (or os system-type)
    ;; ucp udp uchp ulp
    (gnu/linux '("~/.config"
                 "~/.local/share"
                 "~/.cache"
                 "~/.cache/log"))
    (darwin '("~/Library/Application Support"
              "~/Library/Application Support"
              "~/Library/Caches"
              "~/Library/Logs"))
    (windows-nt (let ((ucp "~/AppData/Local"))
                  (list ucp ucp ucp (concat ucp "/Logs"))))
    (otherwise (error (format "Unknown OS %s" (or os system-type))))))

(eval-when-compile (defvar *config-paths* (config-paths)))

(defun fcp (position &optional suffix)
  (let ((base-path (funcall position *config-paths*)))
    (if suffix
        (format "%s/%s" base-path suffix)
      base-path)))

(defun user-config-path (&optional suffix) (fcp #'cl-first  suffix))
(defun user-data-path   (&optional suffix) (fcp #'cl-second suffix))
(defun user-cache-path  (&optional suffix) (fcp #'cl-third  suffix))
(defun user-log-path    (&optional suffix) (fcp #'cl-fourth suffix))
;; org goto heading
(defun org-goto-section (heading)
  "\`heading' should be a string matching the desired heading"
  (goto-char (org-find-exact-headline-in-buffer heading)))

;; workaround for powershell cmd windows braindead handling of strings
(defvar *section-per-user-setup* "Per user setup")
(defvar *section-accounts-and-api-access* "Accounts and API access")

;; recenter a line set using --eval to be at the top of the buffer
(add-hook 'emacs-startup-hook (lambda () (recenter-top-bottom 0)))

;; line numbers so it is harder to get lost in a big file
(when (>= emacs-major-version 26)
  (setq display-line-numbers-grow-only 1)
  (global-display-line-numbers-mode 1))

;; open setup.org symlink without prompt
(setq vc-follow-symlinks 1)

;; sane python indenting
(setq-default indent-tabs-mode nil)
(setq tab-width 4)
(setq org-src-preserve-indentation nil)
(setq org-src-tab-acts-natively nil)

;; don't hang on tlmgr since it is broken on ubuntu
(setq scimax-installed-latex-packages t)

;; save command history
(setq history-length t)
(savehist-mode 1)
(setq savehist-additional-variables '(kill-ring search-ring regexp-search-ring))

;; racket
(when (fboundp 'use-package)
  (use-package racket-mode
    :mode "\\.ptc\\'" "\\.rkt\\'" "\\.sxml\\'"
    :bind (:map racket-mode-map
                ("<f5>" . recompile-quietly))
    :init
    (defun my/buffer-local-tab-complete ()
      "Make \`tab-always-indent' a buffer-local variable and set it to 'complete."
      (make-local-variable 'tab-always-indent)
      (setq tab-always-indent 'complete))
    (defun rcc ()
      (set (make-local-variable 'compile-command)
           (format "raco make %s" (file-name-nondirectory buffer-file-name))))
    (add-hook 'racket-mode-hook 'rcc)
    (add-hook 'racket-mode-hook 'hs-minor-mode)
    (add-hook 'racket-mode-hook 'goto-address-mode)
    (add-hook 'racket-mode-hook 'my/buffer-local-tab-complete)
    (add-hook 'racket-repl-mode-hook 'my/buffer-local-tab-complete)))

;; config paths

<<user-config>>

;; vim bindings if you need them
;; if undo-tree fails to install for strange reasons M-x list-packages C-s undo-tree
;; to manually install, mega gnu elpa weirdness
(setq evil-want-keybinding nil)
(when (fboundp 'use-package)
  (require 'scimax-evil))

scimax launch scripts

emacs -q -l ~/opt/scimax/init.el $args
emacs -q -l ~/opt/scimax/init.el $@

Developer setup code

# implicit check for bash by being able to run this block at all

# git check on the off chance that we made it here without cloning this repo
git --version || { echo git is missing; exit 1; }

# python version check
python -c "print('python ok') if __import__('sys').version_info.major >= 3 else __import__('sys').exit(1)" || { echo bad python version; exit 2; }
pip --version || { echo pip is missing; exit 3; }

# git email check
[[ -n "$(git config --list | grep user.email)" ]] || { echo git user.email has not been configured; exit 4; }
pushd ~/git
for repo_url in ${REPOS}; do git clone ${repo_url}.git 2>&1; done
popd
[ -z "$VIRTUAL_ENV" ] || pip install --user wheel  # if in a venv wheel will be missing
pushd ~/git
for repo in ${PYROOTS}; do pushd ${repo}; pip install --user --editable .[dev,test] 2>&1 || break; popd; done
popd
ln -s ~/git/rkdf/bin/ttl-to-rkt ~/bin/ttl-to-rkt
ln -s ~/git/rkdf/bin/rkdf-convert-all ~/bin/rkdf-convert-all
pushd ~/git/NIF-Ontology
git checkout dev
rkdf-convert-all
git checkout master
popd
pushd ~/git
# XXX note the special cases
raco pkg install --name breadcrumb racket-breadcrumb/
raco pkg install --name json-view racket-json-view/
raco pkg install --skip-installed --auto --batch ${RKTROOTS} 2>&1
popd

Remote exports

Paste the results of this block into your shell if you are running the code from this file by pasting it into a terminal.

NOTE: DO NOT EDIT THE CODE BELOW, IT WILL BE OVERWRITTEN.

Bootstrap

(defun orgstrap---advise-ob-scor-windows (command &rest args)
  (let ((exec-path (cons "C:/Program Files/Git/bin/" exec-path)))
    (when (string= (downcase (executable-find "bash"))
                   (downcase "C:/WINDOWS/system32/bash.exe"))
      ;; git bash (and I assume mingw bash) is ok so only fail on wsl bash
      ;; this check needs to happen inside the advice to prevent any bash block
      ;; from running with bad semantics since wsl bash could damage the system
      (error "WSL bash detected! Not running bash block since WSL bash is completely broken and will skip whole commands and/or drop their output."))
    (apply command args)))

<<user-config>>

(when (eq system-type 'windows-nt)
  (advice-add #'org-babel--shell-command-on-region :around #'orgstrap---advise-ob-scor-windows))

(org-babel-do-load-languages 'org-babel-load-languages
                             ;; TODO powershell
                             (append org-babel-load-languages
                                     '((shell . t)
                                       (python . t))))