Commit

Merge pull request #74 from Agri-Hub/67-poc-harvester-cli-for-gateway
67 poc harvester cli for gateway
fbalaban authored Nov 8, 2024
2 parents de8e8b4 + 642c56a commit 092f951
Showing 18 changed files with 419 additions and 58 deletions.
7 changes: 7 additions & 0 deletions noa-harvester/CHANGELOG.md
@@ -7,6 +7,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.8.0] - 2024-11-08
+### Added
+- Support for uuid list download of CDSE. Gateway CLI support and postgres read/update (#67)
+### Changed
+- Docker compose now also has secrets for db connection
+- Updated python version: 3.12.0
+
 ## [0.7.0] - 2024-10-25
 ### Changed
 - Bump version of CDSEtool to include bug fix of not just appending .zip (https://github.com/CDSETool/CDSETool/issues/180)
2 changes: 1 addition & 1 deletion noa-harvester/Dockerfile
@@ -1,5 +1,5 @@
 # Use the official Python 3.10 image as the base image
-FROM python:3.11.8-slim
+FROM python:3.12.0-slim

 RUN pip install --upgrade pip setuptools
 RUN apt-get update
28 changes: 27 additions & 1 deletion noa-harvester/README.md
@@ -200,17 +200,33 @@ Cli can be executed with the following:

 - Commands
 * `download` - The main option. Downloads according with the config file parameters.
+* `from-uuid-list` - Downloads products from a list of uuids (e.g. the `id` field in Sentinel 2 product metadata). Must be combined with the `-u` option. Requires a db connection (TODO: make optional)
 * `query` - Queries the collection(s) for products according to the parameters present in the config file.
 * `describe` (Copernicus only) - Describes the available query parameters for the collections as defined in the config file.
 - Options
-* `--output_path` (download only) Custom download location.
+* `--output_path` (download and from-uuid-list) Custom download location. Default is `./data`
+* `-u, --uuid` [**multiple**] (from-uuid-list only) Uuid of a product to download. Can be given multiple times.
 * `-bb, --bbox_only` Draw total bbox, not individual polygons in multipolygon shapefile.
 * `-v`, `--verbose` Shows the progress indicator when downloading (Copernicus - only for download command)
 * `--log LEVEL (INFO, DEBUG, WARNING, ERROR)` Shows the logs depending on the selected `LEVEL`
 - Arguments
 * `config_file` - Necessary argument for the commands, indicating which config file will be used.
 * `shape_file` - Optional. Create the query/download bounding box from a shapefile instead of the config file. Please note that this argument receives the base name of `.shp` and `.prj` files (e.g. argument should be `Thessalia` for `Thessalia.shp` and `Thessalia.prj` files)

+## DB Considerations for uuid list download
+
+Please note that uuid list download currently requires a postgres db.
+You can provide credentials either through env vars or by filling in the `database.ini` file under the `db` folder.
+The necessary env vars are:
+`DB_USER`
+`DB_PASSWORD`
+`DB_HOST`
+`DB_PORT`
+`DB_NAME`
+
+Moreover, Harvester queries the db for the UUID (matched against the input uuid) and the Title of each product to be downloaded (it does not query CDSE for metadata - it only downloads).
+So make sure that the postgres db has a table named "Products" that includes at least a `uuid` field and a `name` field.
+
 ## Examples

 * Show available query parameters for Copernicus Collections as defined in the config file:
@@ -221,6 +237,16 @@ docker run -it \
 noaharvester describe config/config.json
 ```

+* Download (with download indicator) from Copernicus, providing a uuid list, and store the products in the mount point:
+
+```
+docker run -it \
+-v ./config/config.json:/app/config/config.json \
+-v /mnt/data:/app/data \
+noaharvester from-uuid-list -v -u caf8620d-974d-5841-b315-7489ffdd853b config/config.json
+```
+
+
 * Download (with download indicator) from Copernicus and Earthdata as defined in the config file, for an area provided by the shapefile files (`area.shp` and `area.prj`) located in folder `/home/user/project/strange_area`:

 ```
12 changes: 12 additions & 0 deletions noa-harvester/config/config_from_uri.json
@@ -0,0 +1,12 @@
+[
+    {
+        "copernicus_login": "",
+        "copernicus_password": "",
+        "earthdata_login": "",
+        "earthdata_password": "",
+        "db_uri": "",
+        "db_name": "",
+        "db_username": "",
+        "db_password": ""
+    }
+]
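
Reviewer note: the template above is a JSON list holding a single object whose values are all empty strings. How Harvester consumes this file is not shown in the diff, so the following is only a minimal sketch of a pre-flight check, assuming the file keeps exactly this shape. The helper names `load_credentials` and `empty_fields` are hypothetical and not part of the commit.

```python
import json
from pathlib import Path

# Keys taken from config_from_uri.json as added in this commit.
EXPECTED_KEYS = (
    "copernicus_login",
    "copernicus_password",
    "earthdata_login",
    "earthdata_password",
    "db_uri",
    "db_name",
    "db_username",
    "db_password",
)


def load_credentials(path):
    """Return the single credentials object stored in the JSON list.

    Assumption: the file is a list with exactly one object, as in the template.
    """
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(entries, list) or len(entries) != 1:
        raise ValueError("expected a JSON list with exactly one object")
    return entries[0]


def empty_fields(credentials):
    """List the expected keys that are missing or still an empty string."""
    return [key for key in EXPECTED_KEYS if not credentials.get(key)]
```

Calling `empty_fields(load_credentials("config/config_from_uri.json"))` on the unmodified template would list all eight keys, which makes it easy to spot credentials that still need a value.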
19 changes: 17 additions & 2 deletions noa-harvester/docker-compose.yml
@@ -1,6 +1,6 @@
 name: noaharvester
 services:
-  app:
+  noaharvester:
     image: noaharvester:latest
     build:
       context: .
@@ -9,6 +9,11 @@ services:
       - COPERNICUS_PASSWORD
       - EARTHDATA_LOGIN
       - EARTHDATA_PASSWORD
+      - DB_USER
+      - DB_PASSWORD
+      - DB_HOST
+      - DB_PORT
+      - DB_NAME
     working_dir: /app
     volumes:
       - ./config:/app/config
@@ -23,4 +28,14 @@ secrets:
   EARTHDATA_LOGIN:
     environment: EARTHDATA_LOGIN
   EARTHDATA_PASSWORD:
-    environment: EARTHDATA_PASSWORD
+    environment: EARTHDATA_PASSWORD
+  DB_USER:
+    environment: DB_USER
+  DB_PASSWORD:
+    environment: DB_PASSWORD
+  DB_HOST:
+    environment: DB_HOST
+  DB_PORT:
+    environment: DB_PORT
+  DB_NAME:
+    environment: DB_NAME
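
Reviewer note: Compose secrets declared with `environment:` take their value from the host environment, and a secret granted to a service is mounted inside the container as a file under `/run/secrets/<name>`. The hunks above do not show which service key the `- DB_USER` style entries belong to, so the following is only a sketch of a lookup that works in both cases, assuming the entries are either plain environment variables or granted secrets. The helper name `read_setting` is hypothetical and not part of the commit.

```python
import os
from pathlib import Path


def read_setting(name, secrets_dir="/run/secrets", environ=None):
    """Return a setting from the environment, else from a mounted secret file.

    Assumption: the environment variable wins when both sources are present.
    Returns None when neither source provides a value.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value:
        return value
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Secret files often end with a newline; strip it before use.
        return secret_file.read_text(encoding="utf-8").strip()
    return None
```

For example, `read_setting("DB_PASSWORD")` would return the value regardless of whether the container received it as an environment variable or as a mounted secret.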
2 changes: 1 addition & 1 deletion noa-harvester/noaharvester/__init__.py
@@ -1,2 +1,2 @@
 # pylint:disable=missing-module-docstring
-__version__ = "0.6.0"
+__version__ = "0.8.0"
123 changes: 88 additions & 35 deletions noa-harvester/noaharvester/cli.py
@@ -39,46 +39,28 @@ def cli(log):
     logging.basicConfig(level=numeric_level, format="%(asctime)s %(message)s")


-@cli.command(
-    help=(
-        "Queries for available products according to the config file."
-        "You can also provide (optional) a [SHAPE_FILE] path in order to define "
-        "the bounding box there instead of the config file."
-    )
-)
+@cli.command(help="Describe collection query fields (Copernicus only)")
 @click.argument("config_file", required=True)
-@click.argument("shape_file", required=False)
-@click.option(
-    "--bbox_only",
-    "-bb",
-    is_flag=True,
-    help="Only use multipolygon total bbox, not individual",
-)
-def query(
-    config_file: Argument | str, shape_file: Argument | str, bbox_only: Option | bool
-) -> None:
+def describe(config_file: Argument | str) -> None:
     """
-    Instantiate Harvester class and call query function in order to search for
-    available products for the selected collections.
+    Instantiate Harvester Class and call "describe" for available query terms
+    of the selected collections (only available for Copernicus)
     Parameters:
         config_file (click.Argument | str): config json file listing
             providers, collections and search terms
     """
     if config_file:
-        logger.debug("Cli query for config file: %s", config_file)
+        logger.debug("Cli describing for config file: %s", config_file)

-    click.echo("Querying providers for products:\n")
-    harvest = harvester.Harvester(
-        config_file, shape_file=shape_file, bbox_only=bbox_only
-    )
-    harvest.query_data()
+    harvest = harvester.Harvester(config_file=config_file)
+    click.echo("Available parameters for selected collections:\n")
+    harvest.describe()


 # TODO download location as an optional argument for download
 @cli.command(
     help=(
-        "Downloads data from the selected providers and query terms"
+        "Downloads data from the selected providers and query terms. "
         "You can also provide (optional) a [SHAPE_FILE] path in order to define "
         "the bounding box there instead of the config file."
     )
@@ -130,23 +112,94 @@ def download(
     click.echo("Done.\n")


-@cli.command(help="Describe collection query fields (Copernicus only)")
+# TODO v2: integrate functionality in download command
+@cli.command(
+    help=(
+        "Download data from the provided provider and UUID list. "
+        "Command also expects the provider credentials but can also "
+        "get them from the optional config file."
+    )
+)
+@click.option(
+    "--verbose",
+    "-v",
+    is_flag=True,
+    help="Shows the progress indicator (for Copernicus only)",
+)
 @click.argument("config_file", required=True)
-def describe(config_file: Argument | str) -> None:
+@click.option("--output_path", default="./data", help="Output path")
+@click.option("--uuid", "-u", multiple=True, help="Uuid. Can be set multiple times")
+def from_uuid_list(
+    config_file: Argument | str,
+    output_path: Option | str,
+    uuid: Option | tuple[str],
+    verbose: Option | bool,
+) -> None:
     """
-    Instantiate Harvester Class and call "describe" for available query terms
-    of the selected collections (only available for Copernicus)
+    Instantiate Harvester class and call the uuid list download function.
+    Downloads the products identified by the given uuids.
     Parameters:
         config_file (click.Argument | str): config json file listing
             providers, collections and search terms
+        output_path (click.Option | str): where to download to
+        uuid (click.Option | tuple[str]): A tuple of uuids to download
+        verbose (click.Option | bool): to show download progress indicator or not
     """
     if config_file:
-        logger.debug("Cli describing for config file: %s", config_file)
+        logger.debug("Cli download for config file: %s", config_file)

-    harvest = harvester.Harvester(config_file=config_file)
-    click.echo("Available parameters for selected collections:\n")
-    harvest.describe()
+    click.echo(f"Downloading at: {output_path}\n")
+    harvest = harvester.Harvester(
+        config_file=config_file,
+        output_path=output_path,
+        verbose=verbose,
+        from_uri=True
+    )
+    downloaded_uuids, failed_uuids = harvest.download_from_uuid_list(uuid)
+    if failed_uuids:
+        logger.error("Failed uuids: %s", failed_uuids)
+    # TODO The following is a dev test: to be converted to unit tests
+    # harvest.test_db_connection()
+    print(downloaded_uuids)
+    click.echo("Done.\n")
+    return downloaded_uuids
+
+
+@cli.command(
+    help=(
+        "Queries for available products according to the config file. "
+        "You can also provide (optional) a [SHAPE_FILE] path in order to define "
+        "the bounding box there instead of the config file."
+    )
+)
+@click.argument("config_file", required=True)
+@click.argument("shape_file", required=False)
+@click.option(
+    "--bbox_only",
+    "-bb",
+    is_flag=True,
+    help="Only use multipolygon total bbox, not individual",
+)
+def query(
+    config_file: Argument | str, shape_file: Argument | str, bbox_only: Option | bool
+) -> None:
+    """
+    Instantiate Harvester class and call query function in order to search for
+    available products for the selected collections.
+    Parameters:
+        config_file (click.Argument | str): config json file listing
+            providers, collections and search terms
+    """
+    if config_file:
+        logger.debug("Cli query for config file: %s", config_file)
+
+    click.echo("Querying providers for products:\n")
+    harvest = harvester.Harvester(
+        config_file, shape_file=shape_file, bbox_only=bbox_only
+    )
+    harvest.query_data()

 if __name__ == "__main__": # pragma: no cover
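
Reviewer note: `from_uuid_list` receives every `-u` value as a tuple of strings and hands it to `download_from_uuid_list` unchanged. The following is only a sketch of an optional pre-check that could run before the db lookup, assuming malformed or duplicated values should be filtered out early. The helper name `validate_uuids` is hypothetical and not part of the commit; it uses only the standard `uuid` module.

```python
import uuid


def validate_uuids(values):
    """Split raw -u values into well-formed uuids and rejected strings.

    Well-formed values are normalized to the canonical lowercase form and
    de-duplicated while keeping the input order.
    """
    valid, rejected = [], []
    for value in values:
        try:
            parsed = str(uuid.UUID(value))
        except ValueError:
            rejected.append(value)
            continue
        if parsed not in valid:
            valid.append(parsed)
    return valid, rejected
```

With this in place, the command could log the rejected values and pass only the valid list on to the harvester, so that a typo in one uuid does not reach the database query.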
Empty file.
6 changes: 6 additions & 0 deletions noa-harvester/noaharvester/db/database.ini
@@ -0,0 +1,6 @@
+[sentinel_products]
+host=localhost
+port=port
+dbname=suppliers
+user=YourUsername
+password=YourPassword
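
Reviewer note: the README in this commit says db credentials come either from env vars (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) or from `database.ini`. The actual loading code is not part of the visible diff, so the following is only a sketch of one way to combine both sources, assuming env vars take precedence over the ini file. The helper name `load_db_params` and the precedence rule are assumptions.

```python
import configparser
import os

# Env var names from the README, mapped to the keys used in database.ini.
ENV_TO_KEY = {
    "DB_HOST": "host",
    "DB_PORT": "port",
    "DB_NAME": "dbname",
    "DB_USER": "user",
    "DB_PASSWORD": "password",
}


def load_db_params(ini_text, section="sentinel_products", environ=None):
    """Merge connection parameters from ini text and environment variables.

    Assumption: a non-empty env var overrides the matching ini value.
    """
    environ = os.environ if environ is None else environ
    # Disable interpolation so a '%' inside a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    params = dict(parser[section]) if parser.has_section(section) else {}
    for env_name, key in ENV_TO_KEY.items():
        if environ.get(env_name):
            params[key] = environ[env_name]
    return params
```

The resulting keys (`host`, `port`, `dbname`, `user`, `password`) match the usual PostgreSQL connection keywords, so the dictionary could, for instance, be passed as keyword arguments to a postgres driver's connect call.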