Merge pull request #79 from zooniverse/offline-config-files

v 1.0.0
zooniverse · Aug 23, 2018 · c986722
2 parents b5637be + b9b9740
commit c986722
Showing 104 changed files with 1,470 additions and 17,304 deletions.
diff --git a/.gitignore b/.gitignore
@@ -63,6 +63,7 @@ instance/
 
 # Sphinx documentation
 docs/_build/
+docs/source/*.md
 
 # PyBuilder
 target/

diff --git a/.travis.yml b/.travis.yml
@@ -2,8 +2,8 @@ language: "python"
 python:
   - "3.6"
 install:
-  - cat requirements.txt | xargs -n 1 -L 1 pip install --no-cache-dir
-  - pip install -U .
+  - pip install cython
+  - pip install -U .[online,test,doc]
 script: nosetests
 notifications:
   email: false
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,46 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at [email protected]. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
+
+[homepage]: http://contributor-covenant.org
+[version]: http://contributor-covenant.org/version/1/4/
diff --git a/Contributing.md b/Contributing.md
@@ -0,0 +1,101 @@
+# Contributing
+
+## Code Style
+Use [PEP8](https://www.python.org/dev/peps/pep-0008/) syntax.
+
+---
+
+## Building Documentation
+Automatic documentation will be created using [sphinx](http://www.sphinx-doc.org/en/stable/) so add doc strings to any files created and functions written.  Documentation can be compiled with the `make_docs.sh` bash script.
+
+---
+
+## Writing Extractors
+Extractors are used to take classifications coming out of Panoptes and extract the relevant data needed to calculate an aggregated answer for one task on a subject.  Ideally this extraction should be as flat as possible (i.e. no deeply nested dictionaries), but sometimes this cannot be avoided.
+
+### 1. Make a new function for the extractor
+
+1. Create a new file for the function in the `extractors` folder
+2. Define a new function `*_extractor` that takes in the raw classification json (as it appears in the classification dump `csv` from Panoptes) and returns a `dict`-like object of the extracted data.
+3. Use the `@extractor_wrapper` decorator on the function (can be imported with `from .extractor_wrapper import extractor_wrapper`).
+4. Use the `@subtask_extractor_wrapper` and `@tool_wrapper` decorators if the function is for a drawing tool (can be imported with `from .extractor_wrapper import subtask_extractor_wrapper`).
+5. Write tests for the extractor in the `tests/extractor_tests` folder.  The `ExtractorTest` class from the `tests/extractor_tests/base_test_class.py` file should be used to create the test function.  This class ensures that both the "offline" and "online" versions of the code are tested and produce the expected results.  See the other tests in that folder for examples of how to use the `ExtractorTest` class.
+
+#### The `@extractor_wrapper` decorator
+
+This decorator removes the boilerplate code that goes along with making an extractor function that works with both the classification dump `csv` files (offline) and API requests from caesar (online).  If a `request` is passed into the function, it will pull the data out as json and pass it into the extractor; if anything else is passed in, the function will be called directly.  This decorator also does the following:
+ - filter the classifications using the `task` and `tools` keywords passed into the extractor
+ - add the aggregation version number to the final extract
+
+#### The `@subtask_extractor_wrapper` decorator
+This decorator removes the boilerplate code that goes along with extracting subtask data from drawing tasks.  This decorator looks for the `details` keyword passed into the extractor function and will apply the specified extractor to the proper subtask data and return the extracts as a list in the same order the subtask presented them.
+
+Note: It is assumed that the first level of the extracted dictionary refers to the subject's frame index (e.g. `frame0` or `frame1`) even when the subject only has one frame.
+
+#### The `@tool_wrapper` decorator
+This decorator removes the boilerplate code for filtering classifications based on the `tools` keyword.  This allows each tool for a drawing task to have its extractors set up independently.
+
+### 2. Create the route to the extractor
+The routes are automatically constructed using the `extractors` dictionary in the `__init__.py` file:
+
+1. Import the new extractor into the `__init__.py` file with the following format: `from .*_extractor import *_extractor`
+2. Add the `*_extractor` function to the `extractors` dictionary with a sensible route name as the `key` (typically the `key` should be the same as the extractor name)
+
+### 3. Allow the offline version of the code to automatically detect this extractor type from a workflow object
+
+1. Update the `workflow_config.py` file with the new task type.  The value used for the type should be the same `key` used in the `__init__.py` file
+2. Update the `tests/utility_tests/test_workflow_config.py` test with this new task type
+
+### 4. Add to documentation
+The code is auto-documented using [sphinx](http://www.sphinx-doc.org/en/stable/index.html).
+
+1. Add a doc string to every function written and a "heading" doc string at the top of any new files created (follow the [numpy doc string convention](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt))
+2. Add a reference to the new file to `docs/source/extractors.rst`
+3. Add to the extractor/reducer lookup table `docs/source/Task_lookup_table.rst`
+4. Build the docs with the `make_docs.sh` bash script
+
+### 5. Make sure everything still works
+1. Run `nosetests` and ensure all tests still pass
+
+---
+
+## Writing Reducers
+Reducers are functions that take a list of extracts and combine them into aggregated values.  Ideally this reduction should be as flat as possible (i.e. no deeply nested dictionaries), but sometimes this cannot be avoided.
+
+### 1. Make new functions for the reducer
+Typically two functions need to be defined for a reducer.
+
+1. `process_data` is a helper function that takes a list of raw extracted data objects and pre-processes them into a form the main reducer function can use (e.g. arranging the data into arrays, creating `Counter` objects, etc...)
+2. The `*_reducer` function that takes in the output of the `process_data` function and returns the reduced data as a `dict`-like object.
+3. The `*_reducer` function should use the `@reducer_wrapper` decorator with the `process_data` function passed as the `process_data` keyword.
+4. If the reducer exposes keywords the user can specify, a `DEFAULTS` dictionary must be defined of the form: `DEFAULTS = {'<keyword name>': {'default': <default value>, 'type': <data type>}}`
+5. If these keywords are passed into the `process_data` function, the `DEFAULTS` dictionary should be passed into the `@reducer_wrapper` as the `defaults_process` keyword.  If these keywords are passed into the main `*_reducer` function, the `DEFAULTS` dictionary should be passed into the `@reducer_wrapper` as the `defaults_data` keyword.  Note: any combination of these two can be used.
+6. Write tests for all the above functions and place them in the `tests/reducer_tests/` folder.  The decorator exposes the original function on the `._original` attribute of the decorated function, which allows it to be tested directly.  The `ReducerTest` class from the `tests/reducer_tests/base_test_class.py` file should be used to create the test function.  This class ensures that both the "offline" and "online" versions of the code are tested and produce the expected results.  See the other tests in that folder for examples of how to use the `ReducerTest` class.
+
+#### The `@reducer_wrapper` decorator
+
+This decorator removes the boilerplate needed to set up a reducer function to work with extractions from either a `csv` file (offline) or an API request from caesar (online).  It will also run an optional `process_data` function and pass the results into the wrapped function.  Various user-defined keywords are also passed into either the `process_data` function or the wrapped function.  All keywords are parsed and type-checked before being used, so no invalid keywords are passed into either function.  This wrapper will also do the following:
+ - Remove the `aggregation_version` keyword from each extract so it is not passed into the reducer function
+ - Add the `aggregation_version` keyword to the final reduction dictionary
+
+#### The `@subtask_reducer_wrapper` decorator
+This decorator removes the boilerplate code that goes along with reducing subtask data from drawing tasks.  This decorator looks for the `details` keyword passed into the reducer function and will apply the specified reducer to the proper subtask data within each *cluster* found on the subject and return the reductions as a list in the same order the subtask presented them.
+
+Note: It is assumed that the first level of the reduced dictionary refers to the subject's frame index (e.g. `frame0` or `frame1`) even when the subject only has one frame.
+
+### 2. Create the route to the reducer
+The routes are automatically constructed using the `reducers` dictionary in the `__init__.py` file:
+
+1. Import the new reducer into the `__init__.py` file with the following format: `from .*_reducer import *_reducer`
+2. Add the `*_reducer` function to the `reducers` dictionary with a sensible route name as the `key` (typically the `key` should be the same as the reducer name)
+
+### 3. Add to documentation
+The code is auto-documented using [sphinx](http://www.sphinx-doc.org/en/stable/index.html).
+
+1. Add a doc string to every function written and a "heading" doc string at the top of any new files created (follow the [numpy doc string convention](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt))
+2. Add a reference to the new file to `docs/source/reducers.rst`
+3. Add to the extractor/reducer lookup table `docs/source/Task_lookup_table.rst`
+4. Build the docs with the `make_docs.sh` bash script
+
+### 4. Make sure everything still works
+1. Run `nosetests` and ensure all tests still pass
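
Note on the Contributing.md diff above: the wrapper pattern it describes for extractors and reducers can be illustrated with a rough, offline-only sketch. This is not the package's implementation. Every name here (`extractor_wrapper_sketch`, `reducer_wrapper_sketch`, `toy_question_extractor`, `toy_question_reducer`, `VERSION`, the `top` keyword) is hypothetical, and the classification shape (an `annotations` list of `task`/`value` pairs) is an assumption made for illustration. The online `request` handling, the `tools` filter, and `defaults_process` are left out.

```python
from collections import Counter
from functools import wraps

VERSION = "0.0.0-sketch"  # hypothetical placeholder, not the real package version


def extractor_wrapper_sketch(func):
    """Hypothetical stand-in for `@extractor_wrapper` (offline path only)."""
    @wraps(func)
    def wrapper(classification, task=None, **kwargs):
        # Keep only the annotations that belong to the requested task.
        annotations = [
            a for a in classification["annotations"]
            if task is None or a["task"] == task
        ]
        extract = func({"annotations": annotations}, **kwargs)
        # Add the aggregation version number to the final extract.
        extract["aggregation_version"] = VERSION
        return extract
    return wrapper


def reducer_wrapper_sketch(process_data=None, defaults_data=None):
    """Hypothetical stand-in for `@reducer_wrapper` (offline path only)."""
    def decorator(func):
        @wraps(func)
        def wrapper(extracts, **kwargs):
            # Remove the version keyword so the reducer never sees it.
            cleaned = [
                {k: v for k, v in e.items() if k != "aggregation_version"}
                for e in extracts
            ]
            # Parse user keywords against DEFAULTS, coercing each to its type.
            parsed = {
                name: spec["type"](kwargs.get(name, spec["default"]))
                for name, spec in (defaults_data or {}).items()
            }
            data = process_data(cleaned) if process_data else cleaned
            reduction = func(data, **parsed)
            # Add the version keyword to the final reduction.
            reduction["aggregation_version"] = VERSION
            return reduction
        # Expose the undecorated function so it can be tested directly.
        wrapper._original = func
        return wrapper
    return decorator


@extractor_wrapper_sketch
def toy_question_extractor(classification):
    # Flat extract: one key per answer, valued by how often it was given.
    return dict(Counter(str(a["value"]) for a in classification["annotations"]))


DEFAULTS = {"top": {"default": 1, "type": int}}


def process_data(extracts):
    # Pre-process raw extracts into Counter objects for the reducer.
    return [Counter(e) for e in extracts]


@reducer_wrapper_sketch(process_data=process_data, defaults_data=DEFAULTS)
def toy_question_reducer(counters, top=1):
    # Sum the votes and keep the `top` most common answers.
    total = sum(counters, Counter())
    return dict(total.most_common(top))


classifications = [
    {"annotations": [{"task": "T0", "value": "yes"}, {"task": "T1", "value": "blue"}]},
    {"annotations": [{"task": "T0", "value": "yes"}]},
    {"annotations": [{"task": "T0", "value": "no"}]},
]
extracts = [toy_question_extractor(c, task="T0") for c in classifications]
reduction = toy_question_reducer(extracts, top="1")  # "1" is coerced to int
```

Because the sketch attaches the undecorated function as `_original`, a test can call `toy_question_reducer._original(...)` with pre-processed data and skip the wrapper's keyword parsing and version handling, which mirrors the testing advice in the reducer steps.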
diff --git a/Dockerfile b/Dockerfile
@@ -4,12 +4,18 @@ ENV LANG=C.UTF-8
 
 WORKDIR /usr/src/aggregation
 
-# install requirements
-COPY requirements.txt ./
 RUN pip install --upgrade pip
-RUN cat requirements.txt | xargs -n 1 -L 1 pip install --no-cache-dir
 
-COPY . ./
+# this line is still needed until hdbscan pushes to pip next
+RUN pip install cython numpy
+
+# install dependencies
+COPY setup.py .
+RUN pip install .[online,test,doc]
+
+# install package
+COPY . .
+RUN pip install -U .[online,test,doc]
 
 # make documentation
 RUN /bin/bash -lc ./make_docs.sh

diff --git a/Dockerfile.bin_cmds b/Dockerfile.bin_cmds
@@ -4,15 +4,15 @@ ENV LANG=C.UTF-8
 
 WORKDIR /usr/src/aggregation
 
-# install requirements
-COPY requirements.txt ./
 RUN pip install --upgrade pip
-RUN cat requirements.txt | xargs -n 1 -L 1 pip install --no-cache-dir
 
-COPY . ./
-RUN pip install .
+# this line is still needed until hdbscan pushes to pip next
+RUN pip install cython numpy
 
-# make documentation
-RUN /bin/bash -lc ./make_docs.sh
+# install dependencies
+COPY setup.py .
+RUN pip install .[test]
 
-CMD python routes.py
+# install package
+COPY . .
+RUN pip install -U .[test]