Commit

Conda pre link (#8)
* try pre-link again

* and make noarch

* fix reduce path maybe

* update tests

* allow build without git tag

* h5py patch

* h5py patch

* i am going to scream

* ok I don't think that one was my fault

* explicitly run pre-link script

* make it a post-link not a pre-link

* run pre-link with prefix + correct install location

* touch log file

* one more touch

* try new way to install DSSP

* Update test_clean_pdb.py

* Update run-tests.yml

* Update run-tests.yml

* update readme

* Update README.md
william-galvin authored Jun 12, 2024
1 parent 4ba1875 commit a259e7d
Showing 12 changed files with 150 additions and 117 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_and_upload_conda.yaml
@@ -12,7 +12,7 @@ jobs:
strategy:
matrix:
python-version: ["3.9"]
os: [ubuntu-latest, macos-13, macos-14]
os: [ubuntu-latest]
steps:
- uses: actions/checkout@v3
with:
36 changes: 0 additions & 36 deletions .github/workflows/python-package-conda.yml

This file was deleted.

53 changes: 53 additions & 0 deletions .github/workflows/run-tests.yml
@@ -0,0 +1,53 @@
name: Build from source and run tests

on: [push]

jobs:
build-linux:
runs-on: ubuntu-latest
strategy:
max-parallel: 5

steps:
- uses: actions/checkout@v3
- name: Set up Python 3.9
uses: actions/setup-python@v3
with:
python-version: '3.9'

- name: Add conda to system path
run: |
# $CONDA is an environment variable pointing to the root of the miniconda directory
echo $CONDA/bin >> $GITHUB_PATH
- name: Set up Homebrew for DSSP
id: set-up-homebrew
uses: Homebrew/actions/setup-homebrew@master
- name: Install DSSP
run: |
brew install brewsci/bio/dssp
- name: Install Zernikegrams
run: |
conda install -y python=3.9
conda update conda
conda install -y conda-build
PREFIX=$HOME/local bash devtools/conda-build/pre-link.sh
conda build devtools/conda-build --no-test --output-folder ./build -c conda-forge
conda install zernikegrams -c ./build -c conda-forge
- name: reinstall h5py
run: |
echo "uninstalling and reinstalling h5py to avoid GH actions issues"
pip uninstall h5py -y
pip install h5py
- name: Install modern gcc
run: |
echo "Installing new GNU runtime so that openmm can find GLIBCXX_3.4.30"
conda install libstdcxx-ng -c conda-forge
- name: Test with pytest
run: |
conda install pytest
pytest
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
*.hdf5
*.slurm

build/
reduce/

# Byte-compiled / optimized / DLL files
75 changes: 58 additions & 17 deletions README.md
@@ -1,21 +1,19 @@
# Zernikegrams

## Installation
This package currently depends on packages which can only be installed with `conda` or built from source, so a conda environment is recommended.
1. Clone this repo and `cd zernikegrams/`
2.
- To create a new conda environment with necessary dependencies and this package installed:
```
conda env create -f env.yml
conda activate zernikegrams
conda install zernikegrams -c statphysbio -c conda-forge
```
- To install into an existing conda environment:
```
conda env update --name <existing env name> -f env.yml
```
- To install without conda (coming soon: pip install support):
- see env.yml for a list of required packages
- From the root directory, run `pip install .`

### Requirements
Zernikegrams is distributed through the Anaconda package manager, which provides most dependencies in most cases. Notable exceptions include:
- `foldcomp`, which is optional and only needed when using `--foldcomp` with `structural-info`. If you use it, you probably already have it installed; if not, install it with `pip install foldcomp`.
- `DSSP`, which is optional and only needed when using `--DSSP` with `structural-info`. On Mac, install it with `brew install brewsci/bio/dssp`; on Linux, use `conda install dssp -c salilab`.
- A modern GNU C++ runtime (libstdc++). Most machines have one already, but if you see cryptic messages referencing "GLIBCXX_3.4.30 not found", try `conda install libstdcxx-ng -c conda-forge` (on Linux).
- `argparse`, which ships with almost all Python distributions, but (apparently) not all, and is (apparently) not installable with conda. Try `pip install argparse`.

### Supported Platforms
This package should work on any combination of {mac, linux} × {x86-64, arm64}. Although the package itself is OS-agnostic, its dependencies (notably OpenMM) are not.
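
To see which of the optional pieces listed under Requirements are present on a machine, a small check along these lines may help. It is only a sketch: `check_optional_dependencies` is a hypothetical helper, not part of zernikegrams, and it assumes the DSSP executable is on `PATH` under the name `mkdssp` or `dssp`.

```python
import importlib.util
import shutil


def check_optional_dependencies(dssp_names=("mkdssp", "dssp")):
    """Report which optional dependencies appear to be available.

    Hypothetical helper for illustration; not part of zernikegrams.
    The executable names in `dssp_names` are an assumption.
    """
    # foldcomp is a Python package, so look for an importable module.
    has_foldcomp = importlib.util.find_spec("foldcomp") is not None
    # DSSP is an external executable, so search PATH for the first match.
    dssp_path = next(
        (path for path in map(shutil.which, dssp_names) if path is not None),
        None,
    )
    return {"foldcomp": has_foldcomp, "dssp": dssp_path}


if __name__ == "__main__":
    for name, found in check_optional_dependencies().items():
        print(f"{name}: {found if found else 'not found'}")
```

A missing entry is not an error by itself, since both tools are only needed for the corresponding `structural-info` flags.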

## Usage
### CLI
@@ -41,15 +39,58 @@ from zernikegrams.holograms.get_holograms import get_single_zernikegram
```

## Development
### Building locally
1. Clone this repo
2. If necessary, install conda-build with `conda install conda-build`
3. Install the dependencies for this repo with
```
conda build devtools/conda-build --no-test --output-folder ./build -c conda-forge
conda install zernikegrams -c ./build -c conda-forge --only-deps
```
This installs the dependencies, but not zernikegrams itself. If the dependencies haven't changed since
the latest release, you can also use `conda install zernikegrams -c statphysbio -c conda-forge --only-deps`.

4. Build Reduce, the program for adding hydrogens. The easiest way to do this is with
```
PREFIX=$HOME/local bash devtools/conda-build/pre-link.sh
```
This installs the reduce executable in `$HOME/local/bin`, where `structural_info_core` expects it. `$PREFIX` can be something
other than `$HOME/local`, as long as it's not the current working directory. Changing it does not change where reduce is installed, only where some
temporary files live.

5. Install zernikegrams
```
pip install -e . -vv
```
`-e` is for editable mode (changes take effect without reinstalling) and `-vv` is very verbose. Either can be changed.

6. Optionally, install DSSP. This is only necessary for calculating secondary structure, but is needed for some tests. Use `conda install dssp -c salilab` or `brew install brewsci/bio/dssp` depending on your OS.

7. Run `pytest` from the root directory--if everything passes, you're good to go!
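
Before running the tests, it can be worth confirming that step 4 actually left an executable where `structural_info_core` looks for it. A minimal sketch, assuming the default `$HOME/local/bin/reduce` location described above (both function names are hypothetical):

```python
import os


def expected_reduce_path(home=None):
    """Return <home>/local/bin/reduce, the location used in the steps above."""
    home = home if home is not None else os.path.expanduser("~")
    return os.path.join(home, "local", "bin", "reduce")


def reduce_is_installed(home=None):
    """True if a regular file exists at the expected path and is executable."""
    path = expected_reduce_path(home)
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    status = "found" if reduce_is_installed() else "missing"
    print(f"{expected_reduce_path()}: {status}")
```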

### Testing
Tests should be run with `pytest` from the root directory.
Tests should be run with `pytest` from the root directory. It is polite to include new tests with new code (if it can be reasonably tested) and to ensure that new code doesn't break old tests. Bug fixes should include at least one test that fails without the fix and passes with it.

### Releases
If you would like your changes to be reflected in the latest version of the package that's pulled from conda, create a new release. In GitHub, find the "Releases" section (on the right) and follow the instructions. It's polite to version your release as [MAJOR.MINOR.PATCH](https://semver.org/) and provide a detailed description of updates. When you create a new version, a GitHub Actions script will run and (if successful) automatically update the Anaconda repository.
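
If you want to sanity-check a version string before creating a release, something like the following works as a rough guard. It is a simplified sketch: `parse_release_tag` is a hypothetical helper, and the pattern only covers plain `MAJOR.MINOR.PATCH`, not the pre-release or build suffixes that the full semver grammar allows.

```python
import re

# Three non-negative integers without leading zeros, separated by dots.
_VERSION_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag):
    """Return (major, minor, patch) as ints, or raise ValueError."""
    match = _VERSION_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return tuple(int(part) for part in match.groups())
```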

### GitHub Actions
On every push to every branch, `.github/workflows/run-tests.yml` builds the repo from scratch using conda, configures the environment, installs DSSP, and runs `pytest`. If it passes, a green check mark shows up. Currently, we only test using Python 3.9 on a Linux x86-64 machine. In the future, we might want to use `matrix` to test on more Python versions and on macos-13 (x86) and macos-latest (arm), probably only on pushes to main.

On every release, `.github/workflows/build_and_upload_conda.yaml` runs, which updates the Anaconda repository. It was configured using [this tutorial](https://github.com/marketplace/actions/build-and-upload-conda-packages). In `devtools/` there is the `meta.yaml` file that conda uses to build the package.

### Building Dependencies from Source
Most Anaconda packages distribute compiled binaries, not source code. Due to limitations of GH Actions runners (e.g., no Linux arm support), it's problematic to rely on GH Actions to compile and distribute code that we need to build from source.

Currently, the only example of this is Reduce. Our approach is to put the build-from-source code in `devtools/conda-build/pre-link.sh`, which conda automatically runs on every local machine when zernikegrams is installed.

Another great candidate for build-from-source would be DSSP (currently, users are responsible for getting it themselves), probably from the [PDB-REDO implementation](https://github.com/PDB-REDO/dssp). Note: I (William) have never been able to build this package from source. You'll know you're successful when you have a DSSP executable, can run `<executable> path-to-pdb-file`, and the output looks reasonable. If you can do that, then in `structural_info_core`, find the call to `dssp_dict_from_pdb_file` and pass `DSSP=<executable-path>` as a keyword argument.

In general: all of the binaries we compile should be installed in the same, **non-root**, location. Currently, `$HOME/local` seems reasonable. If the code we're compiling uses `cmake`, then `-DCMAKE_INSTALL_PREFIX=$HOME/local` should do that.
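
As a rough illustration of that convention, the configure, build, and install steps for a CMake project can be assembled like this. This is a sketch rather than what `pre-link.sh` does (that script calls `make` and `make install` directly); the function names are hypothetical, and the `-S`/`-B` and `--install` options assume CMake 3.15 or newer.

```python
import os
import subprocess


def cmake_commands(source_dir, build_dir, prefix=None):
    """Build the three cmake invocations that install into a shared prefix."""
    # Default to the non-root location suggested above: $HOME/local.
    prefix = prefix or os.path.join(os.path.expanduser("~"), "local")
    return [
        ["cmake", "-S", source_dir, "-B", build_dir,
         f"-DCMAKE_INSTALL_PREFIX={prefix}"],
        ["cmake", "--build", build_dir],
        ["cmake", "--install", build_dir],
    ]


def build_and_install(source_dir, build_dir, prefix=None):
    """Run configure, build, and install, stopping at the first failure."""
    for command in cmake_commands(source_dir, build_dir, prefix):
        subprocess.run(command, check=True)
```

Keeping every compiled dependency under one prefix means only a single `bin/` directory has to be known to the Python code.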

### Roadmap
- Support MMTF files as well as .pdb files. Currently there is support for `foldcomp`, but not MMTF.
- Support for other radial basis functions (e.g., Bessel)
- Dataloaders
- From zernikegram .hdf5 files
- On the fly: pdb files --> zernikegrams
- Make pip installable
- Need to build conda-only dependencies from source
- Publish on pypi
3 changes: 2 additions & 1 deletion devtools/conda-build/meta.yaml
@@ -2,7 +2,7 @@

package:
name: {{ name|lower }}
version: "{{ environ['GIT_DESCRIBE_TAG'] }}"
version: "{{ environ.get('GIT_DESCRIBE_TAG', 'versionNotFound') }}"

source:
path: ../../
@@ -13,6 +13,7 @@ build:
- neighborhoods = zernikegrams.neighborhoods.get_neighborhoods:main
- zernikegrams = zernikegrams.holograms.get_holograms:main
- noise-neighborhoods = zernikegrams.add_noise.get_noised_nh:main
noarch: python
script: {{ PYTHON }} -m pip install . -vv --no-deps
number: 0
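
The `version` change in this file swaps a plain lookup for a lookup with a default, which is what lets the build proceed without a git tag. In plain Python the same pattern looks roughly like this (illustrative only: conda-build evaluates the Jinja expression itself, and `resolve_version` is a hypothetical name):

```python
def resolve_version(environ, default="versionNotFound"):
    """Return the git tag if it is set, otherwise a placeholder version."""
    # environ["GIT_DESCRIBE_TAG"] would raise KeyError when the key is absent;
    # .get() returns the default instead.
    return environ.get("GIT_DESCRIBE_TAG", default)
```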

23 changes: 23 additions & 0 deletions devtools/conda-build/pre-link.sh
@@ -0,0 +1,23 @@
#!/bin/bash

set -e

mkdir -p $PREFIX
touch $PREFIX/.messages.txt

echo "PREFIX is $PREFIX" >> $PREFIX/.messages.txt

# install reduce -- only available from source
echo "installing Reduce" >> $PREFIX/.messages.txt
git clone https://github.com/rlabduke/reduce.git
mv reduce $PREFIX

mkdir -p $PREFIX/build/reduce
cd $PREFIX/build/reduce
cmake $PREFIX/reduce -DCMAKE_INSTALL_PREFIX=$HOME/local >> $PREFIX/.messages.txt
make >> $PREFIX/.messages.txt
make install >> $PREFIX/.messages.txt

rm -rf $PREFIX/reduce
rm -rf $PREFIX/build/reduce
rm $PREFIX/reduce_wwPDB_het_dict.txt
31 changes: 0 additions & 31 deletions env.yml

This file was deleted.

25 changes: 0 additions & 25 deletions setup.py
@@ -1,28 +1,4 @@
import setuptools
from setuptools.command.install import install
import os
import subprocess


class CustomInstall(install):
def run(self):
install.run(self)

dir_path = os.path.dirname(os.path.realpath(__file__))
reduce_path = os.path.join(dir_path, "zernikegrams/structural_info/reduce")
os.makedirs(reduce_path, exist_ok=True)
os.chdir(reduce_path)
subprocess.run(["unzip", "reduce.zip"])
os.chdir(os.path.join(reduce_path, "reduce"))

subprocess.run(["make", "clean"])
subprocess.run(["make"])

# reduce's tests break our tests
with open(os.path.join(reduce_path, "reduce/test/test_reduce.py"), "w") as w:
w.write("\n")

os.chdir(dir_path)

setuptools.setup(
name="zernikegrams",
@@ -41,5 +17,4 @@ def run(self):
]
},
include_package_data=True,
cmdclass={"install": CustomInstall},
)
11 changes: 6 additions & 5 deletions tests/struct_info/test_clean_pdb.py
@@ -1,5 +1,6 @@
from Bio.PDB import PDBParser
from zernikegrams.structural_info.RaSP import clean_pdb
from zernikegrams.structural_info.structural_info_core import REDUCER
import tempfile

def count_element_atoms(pdb_file, element="H"):
@@ -21,7 +22,7 @@ def test_no_hydrogens_respected():
clean_pdb(
"tests/data/pdbs/1MBO.pdb",
tmp.name,
"zernikegrams/structural_info/reduce/reduce/reduce_src/reduce",
REDUCER,
hydrogens=False,
extra_molecules=False
)
@@ -34,7 +35,7 @@ def test_yes_hydorgens_adds_H():
clean_pdb(
"tests/data/pdbs/1MBO.pdb",
tmp.name,
"zernikegrams/structural_info/reduce/reduce/reduce_src/reduce",
REDUCER,
hydrogens=True,
extra_molecules=False
)
@@ -47,7 +48,7 @@ def test_no_extra_molecules_removes_FE():
clean_pdb(
"tests/data/pdbs/1MBO.pdb",
tmp.name,
"zernikegrams/structural_info/reduce/reduce/reduce_src/reduce",
REDUCER,
hydrogens=False,
extra_molecules=False
)
@@ -60,10 +61,10 @@ def test_yes_extra_molecules_keeps_FE():
clean_pdb(
"tests/data/pdbs/1MBO.pdb",
tmp.name,
"zernikegrams/structural_info/reduce/reduce/reduce_src/reduce",
REDUCER,
hydrogens=False,
extra_molecules=True
)

assert count_element_atoms(tmp.name, "FE") == 1


Binary file removed zernikegrams/structural_info/reduce/reduce.zip
Binary file not shown.
7 changes: 6 additions & 1 deletion zernikegrams/structural_info/structural_info_core.py
@@ -30,7 +30,12 @@

logger = logging.getLogger(__name__)

REDUCER = os.path.join(os.path.dirname(__file__), "reduce/reduce/reduce_src/reduce")
REDUCER = os.path.join(
os.path.expanduser("~"),
"local",
"bin",
"reduce"
)
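
Review note: the new `REDUCER` constant hard-codes `~/local/bin/reduce`. One possible variation, shown here only as a sketch and not as part of this commit, is to fall back to whatever `reduce` is on `PATH` when that file is absent. `find_reducer` and `DEFAULT_REDUCER` are hypothetical names.

```python
import os
import shutil

# Same location the commit uses for REDUCER.
DEFAULT_REDUCER = os.path.join(os.path.expanduser("~"), "local", "bin", "reduce")


def find_reducer(default=DEFAULT_REDUCER):
    """Return `default` if that file exists, else a `reduce` found on PATH.

    Returns None when neither is available.
    """
    if os.path.isfile(default):
        return default
    return shutil.which("reduce")
```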

##################### Copied from https://github.com/nekitmm/DLPacker/blob/main/utils.py
# read in the charges from special file
