This guide covers how to set up your environment to work on VxIngest. It also covers code standards, linting, and testing.
VxIngest is containerized for deployment. For more on using the application, see the README.md in the repo root.
VxIngest is a Python application that uses Poetry for dependency management. Ruff is used in the codebase for linting and formatting. The repo follows a `src`-style layout.
VxIngest outputs Couchbase JSON documents to disk as part of the "ingest" process. Those documents are then imported separately by the bash script `scripts/VXingest_utilities/run-import.sh`.
First, download and install Poetry. We use Poetry to manage our Python virtual environments (venvs) and dependencies.
Poetry can be used to run the application locally. Note that you will need a `config.yaml` or `credentials` file with connection information for a Couchbase instance.
`config.yaml`:

```yaml
cb_host: "url.for.couchbase"
cb_user: "user"
cb_password: "password"
cb_bucket: "vxdata"
cb_scope: "_default"
cb_collection: "METAR"
```
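Before running anything, it can help to confirm that the file actually defines every key listed above. The following is a minimal bash sketch; `check_config` is a hypothetical helper, not part of VxIngest, and it only does a line-based text check for `key:` at the start of a line rather than parsing YAML.

```shell
# check_config: hypothetical helper (not part of VxIngest). Does a rough,
# line-based check that a config file defines every Couchbase key shown above.
# It does NOT parse YAML; it only looks for "key:" at the start of a line.
check_config() {
  local file="$1" key missing=0
  if [[ ! -f "$file" ]]; then
    echo "config file not found: $file" >&2
    return 1
  fi
  for key in cb_host cb_user cb_password cb_bucket cb_scope cb_collection; do
    if ! grep -qE "^${key}:" "$file"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  # 0 if every key was found, 1 if any were missing
  return "$missing"
}
```

For example, `check_config config.yaml && echo "config looks complete"`. A real YAML parser would catch more problems (bad quoting, nested keys), but this catches the common case of a forgotten field.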
To run locally:

```shell
mkdir -p tmp/output
poetry install
# -c is the path to the file with your database credentials
poetry run ingest \
  -m tmp/output/metrics \
  -o tmp/output/out \
  -x tmp/output/xfer \
  -l tmp/output/log \
  -c config.yaml \
  -j JOB-TEST:V01:METAR:CTC:CEILING:MODEL:OPS
```
For debug output, you can set the `DEBUG` environment variable:

```shell
mkdir -p tmp/output
poetry install
# -c is the path to the file with your database credentials
env DEBUG=true poetry run ingest \
  -m tmp/output/metrics \
  -o tmp/output/out \
  -x tmp/output/xfer \
  -l tmp/output/log \
  -c config.yaml \
  -j JOB-TEST:V01:METAR:CTC:CEILING:MODEL:OPS
```
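Since the local and debug invocations differ only in their environment, you may find it convenient to wrap the command. Below is a minimal bash sketch under the assumption that you want the same four output subdirectories used above; `run_ingest` and its `DRY_RUN` switch are hypothetical names, not part of VxIngest.

```shell
# run_ingest: hypothetical wrapper (not part of VxIngest) around the local
# `poetry run ingest` command above. All four output directories are derived
# from one base directory. Usage: run_ingest BASE_DIR CONFIG JOB_ID
# Set DRY_RUN=1 to print the command instead of running it.
run_ingest() {
  local base="$1" config="$2" job="$3"
  local -a cmd
  cmd=(poetry run ingest
    -m "$base/metrics"
    -o "$base/out"
    -x "$base/xfer"
    -l "$base/log"
    -c "$config"
    -j "$job")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Display only: words are joined with spaces, quoting is not preserved.
    printf '%s\n' "${cmd[*]}"
  else
    mkdir -p "$base"
    "${cmd[@]}"
  fi
}
```

For example, `run_ingest tmp/output config.yaml JOB-TEST:V01:METAR:CTC:CEILING:MODEL:OPS`. To get debug output, run `export DEBUG=true` first; the exported variable is inherited by the `poetry` process.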
Linting, formatting, type checking, and unit testing can be done through Poetry like so:
```shell
# Lint
poetry run ruff check .
# Format
poetry run ruff format .
# Type check
poetry run mypy src
# Unit test
CREDENTIALS=config.yaml poetry run pytest tests
# Coverage report
CREDENTIALS=config.yaml poetry run coverage run -m pytest tests && \
  poetry run coverage report && \
  poetry run coverage html
```
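If you want to run all of these checks in one go before pushing, a small function can chain them and stop at the first failure. This is a hedged bash sketch: `check_all` is a hypothetical name, not a script in the repo; it swaps the format step for `ruff format --check` so formatting problems are reported rather than rewritten; and the `POETRY` override exists only so the function can be exercised with a stub.

```shell
# check_all: hypothetical convenience function (not in the repo). Runs the
# lint, format check, type check, and unit tests in order, stopping at the
# first failure. POETRY can be overridden (e.g. with a stub); defaults to poetry.
check_all() {
  local poetry="${POETRY:-poetry}"
  local creds="${CREDENTIALS:-config.yaml}"
  "$poetry" run ruff check . || return 1
  # --check reports files that would be reformatted without changing them
  "$poetry" run ruff format --check . || return 1
  "$poetry" run mypy src || return 1
  CREDENTIALS="$creds" "$poetry" run pytest tests || return 1
}
```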
You will need some data files downloaded locally in order to use the test suite. For more details, see tests/vxingest/README.md.
If you are using VSCode, the test suite should be picked up automatically. However, to set the `CREDENTIALS` environment variable in VSCode, put the value in a `.env` file in the root of the repo like so:

`.env`:

```
CREDENTIALS=config.yaml
```
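If your `.env` already holds other settings, you may prefer to add the line without overwriting the file. A minimal bash sketch, assuming the default file name and value shown above; `ensure_env_credentials` is a hypothetical helper, not part of the repo.

```shell
# ensure_env_credentials: hypothetical helper. Appends CREDENTIALS=<file> to
# the .env file only if no CREDENTIALS line exists yet, so it is safe to run
# repeatedly and leaves other entries untouched. Creates the file if missing.
ensure_env_credentials() {
  local env_file="${1:-.env}" creds="${2:-config.yaml}"
  # -s silences the error grep prints when the file does not exist yet
  if ! grep -qs '^CREDENTIALS=' "$env_file"; then
    printf 'CREDENTIALS=%s\n' "$creds" >> "$env_file"
  fi
}
```

Run it from the repo root as `ensure_env_credentials`, or pass a different path and value, e.g. `ensure_env_credentials .env my-config.yaml`.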
Be aware that there are different Dockerfiles in this repo, one for each service. This keeps our container images small and targeted to just the application that needs to be run.
You can build the Docker container with the following:

```shell
docker build \
  --build-arg BUILDVER=dev \
  --build-arg COMMITBRANCH=$(git branch --show-current) \
  --build-arg COMMITSHA=$(git rev-parse HEAD) \
  -f ./docker/ingest/Dockerfile \
  -t vxingest/ingest:dev \
  .
```
Then run it via Docker Compose with the command below. You'll need to update the `compose.yaml` file in the repo with your image tag. Note that the `data` and `public` environment variables point to where the input data resides and where you'd like the container to write its output. These are currently (12/2023) mounted to `/opt/data` inside the container.
```shell
data=/data-ingest/data \
public=/public \
docker compose run ingest
```
Alternatively, note that there are several targets in the Dockerfile. You can use the `--target=dev` flag to build a dev version. If you do so and want to run tests, you'll need to update the `src` mount path below to point to your test data. If you're using Rancher Desktop, the data will need to be somewhere in your home directory in order to mount it in the container. You can then run the container directly like so:
```shell
docker build \
  --target=dev \
  -f ./docker/ingest/Dockerfile \
  -t vxingest/ingest:test \
  .
docker run \
  --rm \
  --mount "type=bind,src=$(pwd)/tmp/test-data/opt/data,dst=/opt/data" \
  -it \
  vxingest/ingest:test \
  bash
# Then, from the shell inside the container:
CREDENTIALS=config.yaml poetry run pytest tests
```
To build and run the prod version of the image instead, build it with:
```shell
docker build \
  -f ./docker/ingest/Dockerfile \
  -t vxingest/ingest:prod \
  .
```
Then run it with the following. Note that we're mounting two things into the container: the directory we want to write output to (`$HOME/output`, mounted to `/opt/data`) and the credentials file we want to use (`$(pwd)/config.yaml`). Both are relatively flexible. However, if you're running the NetCDF or GRIB ingest, you will also need to mount a local directory containing those files to the location in the container specified by the import job doc in the database (typically `/public`).
```shell
docker run --rm \
  --env DEBUG=true \
  --mount "type=bind,src=$HOME/output,dst=/opt/data" \
  --mount "type=bind,src=$(pwd)/config.yaml,dst=/app/config.yaml,readonly" \
  vxingest/ingest:prod \
  -m /opt/data/metrics \
  -o /opt/data/out \
  -x /opt/data/xfer \
  -l /opt/data/log \
  -c /app/config.yaml \
  -j JOB-TEST:V01:METAR:CTC:CEILING:MODEL:OPS
```
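For the NetCDF or GRIB ingest, the extra input mount can be added conditionally. This is a hedged bash sketch: `run_prod_ingest` is a hypothetical wrapper, it uses the `vxingest/ingest:prod` tag from the build command above, it assumes your job doc expects input at `/public` (check yours, as it may specify another path), and it leaves out the `--env DEBUG=true` line. Docker bind mounts need absolute host paths.

```shell
# run_prod_ingest: hypothetical wrapper (not part of the repo) around the prod
# `docker run`. Usage: run_prod_ingest OUT_DIR CONFIG JOB_ID [INPUT_DIR]
# OUT_DIR and CONFIG must be absolute paths. If INPUT_DIR is given, it is
# mounted read-only at /public (assumed job-doc location for NetCDF/GRIB).
run_prod_ingest() {
  local out_dir="$1" config="$2" job="$3" input_dir="${4:-}"
  local -a args
  args=(run --rm
    --mount "type=bind,src=$out_dir,dst=/opt/data"
    --mount "type=bind,src=$config,dst=/app/config.yaml,readonly")
  if [[ -n "$input_dir" ]]; then
    args+=(--mount "type=bind,src=$input_dir,dst=/public,readonly")
  fi
  # Image name first, then the ingest CLI arguments
  args+=(vxingest/ingest:prod
    -m /opt/data/metrics -o /opt/data/out -x /opt/data/xfer -l /opt/data/log
    -c /app/config.yaml -j "$job")
  # DOCKER can be overridden (e.g. with a stub for testing); defaults to docker
  "${DOCKER:-docker}" "${args[@]}"
}
```

For example, `run_prod_ingest "$HOME/output" "$(pwd)/config.yaml" JOB-TEST:V01:METAR:CTC:CEILING:MODEL:OPS /path/to/grib`, where `/path/to/grib` stands in for your local input directory.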
You can build the "import" image in a similar way:

```shell
docker build \
  -f docker/import/Dockerfile \
  --build-arg BUILDVER=dev \
  --build-arg COMMITBRANCH=$(git branch --show-current) \
  --build-arg COMMITSHA=$(git rev-parse HEAD) \
  -t vxingest/import:dev \
  .
```
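The ingest and import builds differ only in the service name, which selects both the Dockerfile and the tag. A minimal bash sketch of that pattern, assuming the `docker/<service>/Dockerfile` layout seen in the two commands; `build_dev_image` is a hypothetical helper, and presetting `COMMITBRANCH`/`COMMITSHA` (otherwise taken from git) and `DOCKER` are conveniences added for this sketch.

```shell
# build_dev_image: hypothetical helper capturing the pattern shared by the
# ingest and import dev builds: docker/<service>/Dockerfile is tagged
# vxingest/<service>:dev with the same build args.
# COMMITBRANCH/COMMITSHA may be preset; otherwise they are read from git.
build_dev_image() {
  local service="$1"
  local branch="${COMMITBRANCH:-$(git branch --show-current)}"
  local sha="${COMMITSHA:-$(git rev-parse HEAD)}"
  # DOCKER can be overridden (e.g. with a stub for testing); defaults to docker
  "${DOCKER:-docker}" build \
    --build-arg BUILDVER=dev \
    --build-arg COMMITBRANCH="$branch" \
    --build-arg COMMITSHA="$sha" \
    -f "./docker/$service/Dockerfile" \
    -t "vxingest/$service:dev" \
    .
}
```

From the repo root, `build_dev_image ingest` and `build_dev_image import` then produce the two dev images.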
You can run the "import" via Docker Compose as in this example. You will need to use the same value for `data` as you used for the "ingest".

```shell
data=/data-ingest/data \
docker compose run import
```
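Because the import must see the same `data` value as the ingest, one way to avoid mismatches is to pass it once and run both services in sequence. A minimal bash sketch; `ingest_then_import` is a hypothetical wrapper, not part of the repo, and it assumes both host directories should already exist (otherwise Docker may create them itself as empty directories).

```shell
# ingest_then_import: hypothetical wrapper (not part of the repo). Runs the
# ingest and import Compose services with the same data value, after checking
# that the host directories exist. Stops if the ingest step fails.
ingest_then_import() {
  local data_dir="$1" public_dir="$2" d
  for d in "$data_dir" "$public_dir"; do
    if [[ ! -d "$d" ]]; then
      echo "directory does not exist: $d" >&2
      return 1
    fi
  done
  # DOCKER can be overridden (e.g. with a stub for testing); defaults to docker
  data="$data_dir" public="$public_dir" "${DOCKER:-docker}" compose run ingest || return 1
  data="$data_dir" "${DOCKER:-docker}" compose run import
}
```

For example, `ingest_then_import /data-ingest/data /public`.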
There is currently a Docker Compose file with options to run unit tests and ingest from within the container. This may be a useful option for local development as well.
NOTE: if you're using Rancher Desktop, you won't be able to access /opt on your system as it's not mounted into the VM by default. You'll need to move your test files into your home directory.
The services in the Compose file expect the following mounts:

- `shell`: expects `/data` and `/public` for mounting
- `test`: expects `/opt/data` for mounting
- `ingest`: expects `/data` and `/public` for mounting
- `import`: expects `/data` for mounting
They can be run like so:

```shell
data=/home/path/to/a/copy/of/opt/data docker compose run test
```
See `docs/general-notes.md` for a general overview of the architecture, the data model, and other useful things.

See `docs/couchbase.md` for more on Couchbase.