Data Pipelines

This package aims to provide a set of Python code for geospatial format conversion, as well as the ability to create your own STAC Catalog and interact with your data in an object store.

Prerequisites

You can run this script by setting the credentials for the BUCKET Object Store manually. Or you can set the following environment variables:

BUCKET_TOKEN
BUCKET_SECRET
BUCKET_API_URL

Create and set up a pyenv environment:

pyenv virtualenv data_pipelines
pyenv local data_pipelines

Install the package and requirements:

pip install -e .

Data Conversion

Currently, code for converting the following data formats has been implemented:

Input: TIF, NETCDF, GeoJSON
Output: COG

How to Use

Import the function and initialize the variables you will use:

from data_pipelines.data_pipelines import DataFormatConverter

input_file = '../../raw_data/shipborne_2018.tif'
input_format = 'tif'
output_format = 'cog'
output_filename = 'shipborne_2018.tif'
output_filepath = '../../raw_data/'
bucket_name = 'haig-fras'
bucket_folder = 'layers/bathymetry/shipborne_2018'

Create an instance of the class and perform the data conversion:

d = DataFormatConverter(input_file=input_file, input_format=input_format)

d.convert_format(output_filename=output_filename,
                 output_filepath=output_filepath,
                 output_format=output_format,
                 upload_file=True,
                 bucket_name=bucket_name,
                 bucket_folder=bucket_folder)

Work with Files in an Object Store

This package allows you to perform some simple actions in an object store, such as ls, cp, mv, rm, and upload.

How to Use

Import the function:

from access_bucket.access_bucket import AccessBucket

If you did not set the ENV variables, you need to pass these values to the class constructor:

a = AccessBucket(aws_access_key_id,
                 aws_secret_access_key,
                 endpoint_url,
                 bucket)

If you set the ENV variables, you just pass:

a = AccessBucket(bucket)

To perform operations in the bucket:

# List files
a.ls(bucket_folder='layers/seabed_images/hf2012/images', prefix="M58_10441297_1298774480")

# Remove files
a.rm(bucket_folder='frontend/assets', verbose=1)

# Create a directory
a.mkdir(bucket_folder='layers/sidescan/sidescan_2012')

# Move files
a.mv(
    bucket_folder='layers/bathymetry/',
    output_folder='layers/seabed_images/nbn',
    verbose=1,
)

# Rename files
a.mv(
    bucket_folder='layers/seabed_images/jncc/images',
    sufix='.JPG',
    replace=['.JPG', '.jpg'],
    verbose=1
)

# Copy files
a.mv(
    bucket_folder='mbtiles',
    output_folder='layers/seabed_habitats/habitat_map',
    input_filename='habitats_new-65536.mbtiles',
    output_filename='habitat_map.mbtiles'
)

# Upload files
a.upload(filenames, path, bucket_folder)

The notebook code_test.ipynb provides usage examples.

Create STAC Catalogs

To create STAC Catalogs, you must first create a set of configuration files called "metadata."

Metadata files consist of a primary JSON file representing the main catalog. Depending on the data group, if you want to add sub-catalogs to your catalog, you can create auxiliary JSON files.

The main catalog file should follow the format described in the example file. This file provides examples of adding different types of data to the STAC, such as COG files, GeoJSON, PNG files, WMS links, and links to Asset ION.

How to Use

To generate the STAC Catalog, follow these steps:

Create the metadata folder following the example file. In the same folder, there is also an example for adding a sub-catalog to your main catalog.
Import the function and create an instance of the class:

from create_stac.stac_gen import STACGen

# Create an instance of the class and provide the path to your metadata files (JSON files)
s = STACGen(metadata_path='../metadatas/')

Generate the STAC Catalog. If you did not set the ENV variables, you need to pass these values at this step:

s.stac_gen(upload_bucket=True,
           stac_path='stac',
           aws_access_key_id=aws_access_key_id,
           aws_secret_access_key=aws_secret_access_key,
           endpoint_url=endpoint_url
          )

If you set the ENV variables, you can just pass:

s.stac_gen(upload_bucket=True,
           stac_path='stac',
          )

In the end, a folder called "stac" will be created with your STAC Catalog.

Convert STAC Catalogs to JSON for use in a Web Application

To use STAC Catalogs in a frontend, importing all these files may take a long time. To speed up this process, a set of codes has been created to convert the STAC into a single, user-friendly JSON format accepted by the web application. Currently, we are converting the STAC to a JSON format accepted by the "Haig Fras Digital Twin" project.

How to Use

To generate the JSON file, follow these steps:

Import the function and create an instance of the class:

from create_stac.stac_convert import STACConvert

c = STACConvert(bucket_path=''
               stac_path='stac')

Generate the JSON file from the STAC Catalog:

c.convert()

Save the final JSON file and upload it to the object store. If you did not set the ENV variables, you need to pass these values at this step:

c.save_and_upload(filename='layers.json',
                  aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key,
                  endpoint_url=endpoint_url)

If you set the ENV variables, you can just pass:

c.save_and_upload(filename='layers.json')

In the end, a file called "layers.json" will be created and ready to be used by the web application.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
access_bucket		access_bucket
create_stac		create_stac
data_pipelines		data_pipelines
scripts		scripts
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
add_license.sh		add_license.sh
dev_requirements.txt		dev_requirements.txt
environment.yml		environment.yml
metadata_example.json		metadata_example.json
post-activate.sh		post-activate.sh
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Pipelines

Prerequisites

Data Conversion

How to Use

Work with Files in an Object Store

How to Use

Create STAC Catalogs

How to Use

Convert STAC Catalogs to JSON for use in a Web Application

How to Use

About

Releases

Packages

Languages

License

NOC-OI/data_pipelines

Folders and files

Latest commit

History

Repository files navigation

Data Pipelines

Prerequisites

Data Conversion

How to Use

Work with Files in an Object Store

How to Use

Create STAC Catalogs

How to Use

Convert STAC Catalogs to JSON for use in a Web Application

How to Use

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages