Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1pt] PR: Create inputs for selected HUC(s) #1178

Open
wants to merge 39 commits into
base: dev
Choose a base branch
from
Open

Conversation

mluck
Copy link
Contributor

@mluck mluck commented May 22, 2024

Summary

Contains files to generate data to run and evaluate FIM (fim_pipeline.sh and synthesize_test_cases.py) for specified HUC(s) as well as related patches to replace absolute file paths with relative file paths, a fix for evaluating when IFC data are not available, and update code to generate pre-clip data so that WBD for Alaska contains only one layer. NOTE: this PR requires wbd.gpkg to be created by the updated generate_pre_clip_fim_huc8.py to be copied to the pre-clip HUC folders to remove a warning in synthesize_test_case.py.

Resolves #1265 and resolves #1296.

Usage

python /foss_fim/data/sandbox/get_sample_data.py -u 03100204 -i /data -o /foss_fim/data/sample-data

Additions

  • data/
    • sandbox/ [archived]
      • Dockerfile: A copy of the root Dockerfile that also pulls code and data into the build image [archived]
      • fim-in-a-box.ipynb: Jupyter notebook to run and evaluate an example HUC [archived]
      • README.md [archived]
    • get_sample_data.py: Copies relevant data (e.g., inputs and test_cases) from the FIM data folder for specified HUC(s) and saves it to a separate location
    • wbd/generate_pre_clip_fim_huc8.py: Fix file paths and layers
  • src/bash_variables.sh: Add paths for landsea and nws_lid
  • tools/run_test_case.py: Skip evaluation for missing validation data and fix hardcoded paths for masks

Testing

fim_pipeline.sh and synthesize_test_cases.py ran on HUCs 12090301 and 19020302 without errors.

Deployment Plan (For developer use)

How does the changes affect the product?

  • Code only?
  • Require new or adjusted data inputs? Does it have start, end and duration code (in UTC)?
  • If new or updated data sets, has the FIM code been updated and tested with the new/adjusted data (subset is fine, but must be a subset of the new data)?
  • Require new pre-clip set?
  • Has new or updated python packages?
  • If applicable, has a deployment plan be created with the deployment person/team?

Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another other PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • Passes all unit tests locally (inside interactive Docker container, at /foss_fim/, run: pytest unit_tests/)
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Reviewers requested
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

@mluck mluck added enhancement New feature or request refactoring Refacting code to obtain the same result High Priority labels May 22, 2024
@mluck mluck requested a review from RobHanna-NOAA May 22, 2024 16:14
@mluck
Copy link
Contributor Author

mluck commented May 23, 2024

Here are some example steps to add data to a Docker image and save it as a file using the command line

Run Docker container

docker run --rm -it --name sandbox fim_4:4.5.2.1

Add data to a container

docker cp ~/sample-data sandbox:data

Save container to an image

docker commit sandbox fim_4:4.5.2.1-sandbox

Export image to an image file

docker save fim_4:4.5.2.1-sandbox | gzip > ~/Downloads/fim-in-a-box.tar.gz

Load image file to Docker image

docker load < ~/Downloads/fim-in-a-box.tar.gz

@RobHanna-NOAA
Copy link
Contributor

We are putting this on hold for a bit. As of one or two days ago the Whitebox installer is failing. They have been making updates, and even one hotfix last night, so they may still be actively fixing it. We will wait until next week and try again on our Docker builds tests. If worse comes to worse, we can see if we can extract Whitebox's old zip file from one of our current Docker images, put it in our Repo and use that to build new Docker image. We will see how this plays out over the next few weeks.

@mluck
Copy link
Contributor Author

mluck commented May 28, 2024

We are putting this on hold for a bit. As of one or two days ago the Whitebox installer is failing. They have been making updates, and even one hotfix last night, so they may still be actively fixing it. We will wait until next week and try again on our Docker builds tests. If worse comes to worse, we can see if we can extract Whitebox's old zip file from one of our current Docker images, put it in our Repo and use that to build new Docker image. We will see how this plays out over the next few weeks.

The Docker build error is now a separate issue #1181.

@RobHanna-NOAA
Copy link
Contributor

Test against ESIP failed for get_sample_data:

clear ; python /foss_fim/data/sandbox/get_sample_data.py -u 12040101 -i "s3://noaa-nws-owp-fim/hand_fim/" -o /foss_fim/data/sandbox/sample-data-12040101 -s3 -ak (hidden) -sk (hidden)

Traceback (most recent call last):
File "/foss_fim/data/sandbox/get_sample_data.py", line 330, in
get_sample_data(**vars(args))
File "/foss_fim/data/sandbox/get_sample_data.py", line 184, in get_sample_data
nws_validation_hucs = get_validation_hucs(root_dir, 'nws')
File "/foss_fim/data/sandbox/get_sample_data.py", line 57, in get_validation_hucs
[
File "/foss_fim/data/sandbox/get_sample_data.py", line 57, in
[
File "/usr/local/lib/python3.10/dist-packages/boto3/resources/collection.py", line 81, in iter
for page in self.pages():
File "/usr/local/lib/python3.10/dist-packages/boto3/resources/collection.py", line 171, in pages
for page in pages:
File "/usr/local/lib/python3.10/dist-packages/botocore/paginate.py", line 269, in iter
response = self._make_request(current_kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/paginate.py", line 357, in _make_request
return self._method(**current_kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 915, in _make_api_call
api_params = self._emit_api_params(
File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 1027, in _emit_api_params
self.meta.events.emit(
File "/usr/local/lib/python3.10/dist-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/hooks.py", line 239, in emit
response = handler(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/handlers.py", line 285, in validate_bucket_name
raise ParamValidationError(report=error_msg)
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "s3:": Bucket name must match the regex "^[a-zA-Z0-9.-
]{1,255}$" or be an ARN matching the regex "^arn:(aws).:(s3|s3-object-lambda):[a-z-0-9]:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9-]{1,63}$"

--- I am not sure it is critical to keep this functionality. I see two options:

  1. Fix the bug
  2. Comment out that functionality and put in a ticket to fix it another time.

Also: The README.md should have the word "python" in front of the script calls. e.g. python /foss_fim/data/sandbox/get_sample_data.py. I fixed it in the PR notes already.

@mluck
Copy link
Contributor Author

mluck commented Jun 14, 2024

The last error occurred because s3:// was included in the -i argument. I've added code to strip s3:// if it's included so it will no longer throw this error, and updated README.md as well.

@RobHanna-NOAA
Copy link
Contributor

Update. We can no longer use explicit aws keys in our enviro's. Those arguments will need to be removed and we will need to use our SSO system. If this ever gets opened up externally, users will need their keys installed on their machines.

@RobHanna-NOAA
Copy link
Contributor

We will also need to review the inputs and paths. Like loading bridge data I think. We need to check DEM paths as well. Also, we can not load any validatation or test case data of any kinds for IFC as it is not considered to be public.

@RobHanna-NOAA
Copy link
Contributor

/data/sandbox/docker file needs updates too (sorry)

@mluck
Copy link
Contributor Author

mluck commented Nov 5, 2024

data/sandbox/get_sample_data.py doesn't work anymore due to changes in S3 access and changes in files and file locations. This script is currently being updated.

@mluck
Copy link
Contributor Author

mluck commented Nov 12, 2024

As of facca2e, the sandbox (FIM in a Box) code in this feature branch will be archived. Refer to commit 4e6ae2e for the latest version of the sandbox code.

This pull request will contain scripts to create a new inputs folder with data for specified HUC(s) as well as related patches to replace absolute file paths with relative file paths, a fix for evaluating when IFC data are not available, and update code to generate pre-clip data so that WBD for Alaska contains only one layer. NOTE: this PR requires wbd.gpkg to be created by the updated generate_pre_clip_fim_huc8.py to be copied to the pre-clip HUC folders to remove a warning in synthesize_test_case.py.

@mluck mluck changed the title [1pt] PR: Create a self-contained FIM sandbox demo image [1pt] PR: Create inputs for selected HUC(s) Nov 12, 2024
@RobHanna-NOAA
Copy link
Contributor

With opening the s3 input option, we will need to update the readme.md (and/or others) to tell them it exists and how to either use it or a pointer for instructions. I personally think it is better to add a some text inside the README.md on how to do it. Pointing to the PR seems loose. Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request High Priority refactoring Refacting code to obtain the same result
Projects
None yet
3 participants