new: standardisation (PR #83)
alycejenni authored Nov 28, 2022
2 parents 1f92f68 + 7fe8ba1 commit 1bd86e2
Showing 111 changed files with 3,417 additions and 2,774 deletions.
9 changes: 7 additions & 2 deletions .github/SUPPORT.md
@@ -2,9 +2,14 @@

## Documentation
- [Our API documentation](https://naturalhistorymuseum.github.io/dataportal-docs)
- [CKAN documentation](http://docs.ckan.org/en/latest)
- [Official CKAN documentation](http://docs.ckan.org/en/latest)

## Issues
- [The NHM on Github](https://github.com/NaturalHistoryMuseum)
- [General issue tracker](https://github.com/NaturalHistoryMuseum/data-portal-issues/issues)
- [Our CKAN extensions](https://github.com/search?q=topic%3Ackan+org%3ANaturalHistoryMuseum&type=repositories) (for more specific issues)

## Contact Us
- [Gitter](https://gitter.im/nhm-data-portal/lobby)
- [Email _[email protected]_](mailto:[email protected])
- [Twitter](https://twitter.com/nhm_data)
- [Twitter](https://twitter.com/nhm_data)
2 changes: 1 addition & 1 deletion .github/nhm-logo.svg
31 changes: 31 additions & 0 deletions .github/workflows/bump.yml
@@ -0,0 +1,31 @@
name: Bump version

on:
  push:
    branches:
      - main

jobs:
  bump-version:
    if: "!startsWith(github.event.head_commit.message, 'bump:')"
    runs-on: ubuntu-latest
    name: "Bump version and create changelog"
    steps:
      - name: Check out
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          fetch-depth: 0
      - name: Create bump and changelog
        uses: commitizen-tools/commitizen-action@master
        with:
          github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          changelog_increment_filename: CURRENT.md
          extra_requirements: "cz-nhm"
      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          body_path: "CURRENT.md"
          tag_name: v${{ env.REVISION }}
        env:
          GITHUB_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
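
The `if:` guard in the `bump-version` job skips any push whose head commit message starts with `bump:`, presumably so that the commit created by the bump step does not trigger the workflow again. A minimal Python sketch of an equivalent check follows. The function name is hypothetical and is not part of this repository. The comparison is lowercased because the GitHub Actions `startsWith` function is not case sensitive.

```python
def should_run_bump(head_commit_message: str) -> bool:
    """Return True when the bump job would run for this commit message.

    Mirrors `!startsWith(github.event.head_commit.message, 'bump:')`.
    GitHub's startsWith ignores case, so both sides are lowercased here.
    """
    return not head_commit_message.lower().startswith("bump:")


# An ordinary commit triggers the job; a bump commit does not.
ordinary = should_run_bump("new: standardisation (PR #83)")
bump = should_run_bump("bump: hypothetical version change")
```

Under this guard, only commits that were not produced by the bump step lead to a new version and changelog.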
11 changes: 4 additions & 7 deletions .github/workflows/main.yml
@@ -6,19 +6,16 @@ on:

jobs:
  test:
    runs-on: ubuntu-18.04
    runs-on: ubuntu-latest

    steps:
      - name: Checkout source code
        uses: actions/checkout@v2
        uses: actions/checkout@v3

      - name: Build images
        run: docker-compose build

      - name: Run tests
        run: docker-compose run ckan

      - name: Run coveralls
        env:
          COVERALLS_REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: docker-compose run -e COVERALLS_REPO_TOKEN ckan coveralls --service=github
          COVERALLS_REPO_TOKEN: ${{ secrets.COVERALLS_REPO_TOKEN }}
        run: docker-compose run -e COVERALLS_REPO_TOKEN ckan bash /opt/scripts/run-tests.sh -c ckanext.versioned_datastore
35 changes: 35 additions & 0 deletions .github/workflows/pypi-publish.yml
@@ -0,0 +1,35 @@
name: Upload Python Package

on:
  push:
    tags:
      - "*"

permissions:
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Check out
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - name: Build package
        run: python -m build
      - name: Publish package
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
          skip_existing: true
5 changes: 1 addition & 4 deletions .gitignore
@@ -4,9 +4,6 @@
.pytest_cache
.coverage
build/

dist/
.idea


node_modules/
**/node_modules/
27 changes: 27 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,27 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: check-merge-conflict
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: name-tests-test
        args: ["--pytest-test-first"]
        exclude: ^tests/helpers/
      - id: trailing-whitespace
  - repo: https://github.com/commitizen-tools/commitizen
    rev: v2.37.0
    hooks:
      - id: commitizen
        additional_dependencies: ["cz-nhm"]
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/docformatter
    rev: v1.5.0
    hooks:
      - id: docformatter
        # these can't be pulled directly from the config atm, not sure why
        args: ["-i", "--wrap-summaries=88", "--wrap-descriptions=88",
               "--pre-summary-newline", "--make-summary-multi-line"]
8 changes: 8 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,8 @@
version: 2

python:
  install:
    - requirements: docs/requirements.txt

mkdocs:
  configuration: mkdocs.yml
10 changes: 0 additions & 10 deletions CHANGELOG.md
@@ -1,11 +1 @@
# Changelog

(This file may not be historically complete, as it is a recent addition to the project).


## [1.0.0-alpha] - 2019-07-23

- Updated to work with CKAN 2.9.0a, e.g.:
- uses toolkit wherever possible
- references to Pylons removed
- Standardised README, CHANGELOG, setup.py and .github files to match other Museum extensions
14 changes: 7 additions & 7 deletions LICENSE
@@ -1,7 +1,7 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2016 Natural History Museum <http://nhm.ac.uk/>
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

@@ -631,8 +631,8 @@ to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

{one line to give the program's name and a brief idea of what it does.}
Copyright (C) {year} {name of author}
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -645,14 +645,14 @@ the "copyright" line and a pointer to where the full notice is found.
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
along with this program. If not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

{project} Copyright (C) {year} {fullname}
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
@@ -664,11 +664,11 @@ might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.
<https://www.gnu.org/licenses/>.

The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
<https://www.gnu.org/licenses/why-not-lgpl.html>.
1 change: 0 additions & 1 deletion MANIFEST.in

This file was deleted.

55 changes: 35 additions & 20 deletions README.md
@@ -1,16 +1,21 @@
<!--header-start-->
<img src=".github/nhm-logo.svg" align="left" width="150px" height="100px" hspace="40"/>

# ckanext-versioned-datastore

[![Tests](https://img.shields.io/github/workflow/status/NaturalHistoryMuseum/ckanext-versioned-datastore/Tests?style=flat-square)](https://github.com/NaturalHistoryMuseum/ckanext-versioned-datastore/actions/workflows/main.yml)
[![Coveralls](https://img.shields.io/coveralls/github/NaturalHistoryMuseum/ckanext-versioned-datastore/master.svg?style=flat-square)](https://coveralls.io/github/NaturalHistoryMuseum/ckanext-versioned-datastore)
[![Coveralls](https://img.shields.io/coveralls/github/NaturalHistoryMuseum/ckanext-versioned-datastore/main?style=flat-square)](https://coveralls.io/github/NaturalHistoryMuseum/ckanext-versioned-datastore)
[![CKAN](https://img.shields.io/badge/ckan-2.9.1-orange.svg?style=flat-square)](https://github.com/ckan/ckan)
[![Python](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue.svg?style=flat-square)](https://www.python.org/)
[![Docs](https://img.shields.io/readthedocs/ckanext-versioned-datastore?style=flat-square)](https://ckanext-versioned-datastore.readthedocs.io)

_A CKAN extension providing a versioned datastore using MongoDB and Elasticsearch._

<!--header-end-->

# Overview

<!--overview-start-->
This plugin provides a complete replacement for ckan's datastore plugin and therefore shouldn't be used in conjunction with it.
Rather than storing data in PostgreSQL, resource data is stored in MongoDB and then made available to frontend APIs using Elasticsearch.

@@ -22,11 +27,13 @@ This allows this plugin to:
- store large resources (millions of rows) and still provide high speed search responses
- store complex data as both MongoDB and Elasticsearch are JSON based, allowing object nesting and arrays

This plugin is built on [Eevee](https://github.com/NaturalHistoryMuseum/eevee).
This plugin is built on [Splitgill](https://github.com/NaturalHistoryMuseum/splitgill).

<!--overview-end-->

# Installation

<!--installation-start-->
Path variables used below:
- `$INSTALL_FOLDER` (i.e. where CKAN is installed), e.g. `/usr/lib/ckan/default`
- `$CONFIG_FILE`, e.g. `/etc/ckan/default/development.ini`
@@ -64,8 +71,23 @@ Path variables used below:
ckan.plugins = ... versioned_datastore
```
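
One way to confirm that the plugin is enabled is to parse the config file and inspect the `ckan.plugins` value. The sketch below uses only the Python standard library. It assumes the option sits in an `[app:main]` section and that plugin names are separated by whitespace. The helper name and the example config text are hypothetical.

```python
import configparser


def plugin_enabled(ini_text: str, plugin: str = "versioned_datastore") -> bool:
    """Return True if `plugin` is listed under ckan.plugins in [app:main]."""
    # Interpolation is disabled so that '%' characters in other options
    # of a real config file do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    plugins = parser.get("app:main", "ckan.plugins", fallback="")
    return plugin in plugins.split()


# Hypothetical config contents, for illustration only.
example_ini = """
[app:main]
ckan.plugins = stats versioned_datastore
"""
enabled = plugin_enabled(example_ini)
```

To check a real file, read its contents and pass the text to the helper, for example the file at `$CONFIG_FILE`.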

## Further Setup

At the version of Splitgill this plugin uses, you will also need to:

- install MongoDB 4.x
- install Elasticsearch 6.7.x (6.x is probably ok, but untested)

See the [Splitgill](https://github.com/NaturalHistoryMuseum/splitgill) repository for more details.

This plugin also requires CKAN's job queue, which is included in recent versions of CKAN or can be added to old versions using the ckanext-rq plugin.
<!--installation-end-->
# Configuration
<!--configuration-start-->
There are a number of options that can be specified in your .ini config file.
All configuration options are currently required.
@@ -94,21 +116,11 @@ Name|Description|Example
`ckanext.versioned_datastore.dwc_org_email`|The contact email to use in DwC-A metadata. Default: the value of `smtp.mail_from`|`[email protected]`
`ckanext.versioned_datastore.dwc_default_license`|The license to use in DwC-A metadata if the resources have differing licenses or no license is specified. Default: `null`|`http://creativecommons.org/publicdomain/zero/1.0/legalcode`

# Further Setup

At the version of Eevee this plugin uses, you will also need to:

- install MongoDB 4.x
- install Elasticsearch 6.7.x (6.x is probably ok, but untested)

See the [Eevee](https://github.com/NaturalHistoryMuseum/eevee) repository for more details.

This plugin also requires CKAN's job queue, which is included in recent versions of CKAN or can be added to old versions using the ckanext-rq plugin.
<!--configuration-end-->
# Usage
<!--usage-start-->
A brief tour!
The plugin automatically detects resources on upload that can be added to the datastore.
@@ -125,8 +137,8 @@ Note that only the first sheet in multisheet XLS and XLSX files will be processe
Adding data to the datastore is accomplished in two steps:
1. Ingesting the records into MongoDB. A document is used per unique record ID to store all versions, and the documents for a specific resource are stored in a collection named after the resource's ID. For more information on the structure of these documents, see the [Eevee](https://github.com/NaturalHistoryMuseum/eevee) repository.
2. Indexing the documents from MongoDB into Elasticsearch. One index is used for all versions of the records and a document in Elasticsearch is created per version of each record. The index is named after the resource's ID with the configured prefix prepended. For more information on the structure of these indexed documents, see the [Eevee](https://github.com/NaturalHistoryMuseum/eevee) repository.
1. Ingesting the records into MongoDB. A document is used per unique record ID to store all versions, and the documents for a specific resource are stored in a collection named after the resource's ID. For more information on the structure of these documents, see the [Splitgill](https://github.com/NaturalHistoryMuseum/splitgill) repository.
2. Indexing the documents from MongoDB into Elasticsearch. One index is used for all versions of the records and a document in Elasticsearch is created per version of each record. The index is named after the resource's ID with the configured prefix prepended. For more information on the structure of these indexed documents, see the [Splitgill](https://github.com/NaturalHistoryMuseum/splitgill) repository.
Ingesting and indexing are completed in the background using CKAN's job queue.
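
The naming rules described in the two steps can be sketched as follows. This illustrates the convention only and is not the plugin's own code. The function names, the resource ID and the prefix value are hypothetical.

```python
def mongo_collection_name(resource_id: str) -> str:
    """Records for a resource are stored in a collection named after its ID."""
    return resource_id


def elasticsearch_index_name(resource_id: str, prefix: str) -> str:
    """The index name is the resource ID with the configured prefix prepended."""
    return f"{prefix}{resource_id}"


# Hypothetical values, for illustration only.
resource_id = "05ff2255-c38a-40c9-b657-4ccb55ab2feb"
prefix = "nhm-"
collection = mongo_collection_name(resource_id)
index = elasticsearch_index_name(resource_id, prefix)
```

Because both names derive from the resource ID, the MongoDB collection and the Elasticsearch index for a resource can be located from its ID and the configured prefix alone.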

@@ -204,11 +216,12 @@ Here is a brief overview of its functions:

See the interface definition in this plugin for more details about these functions.

<!--usage-end-->

# Testing
_Test coverage is currently extremely limited._

To run the tests in this extension, there is a Docker compose configuration available in this
repository to make it easy.
<!--testing-start-->
There is a Docker compose configuration available in this repository to make it easier to run tests.

To run the tests against ckan 2.9.x on Python3:

@@ -225,4 +238,6 @@ docker-compose build
docker-compose run ckan
```
The ckan image uses the Dockerfile in the `docker/` folder which is based on `openknowledge/ckan-dev:2.9`.
The ckan image uses the Dockerfile in the `docker/` folder.
<!--testing-end-->